Project Name: RISC-V AI Accelerator Chip (SimpleEdgeAiSoC)
Chip Code: EdgeAI-SoC-v0.1
Design Organization: [redoop]
Project Lead: [tongxiaojun]
Report Date: 2025-11
Version: v0.1
With the widespread application of artificial intelligence on edge devices, there is a growing demand for low-power, high-efficiency AI accelerators. This project aims to design a System-on-Chip (SoC) integrating a RISC-V processor and dedicated AI accelerators, specifically optimized for edge AI inference scenarios.
┌─────────────────────────────────────────────────────────────┐
│ SimpleEdgeAiSoC │
│ │
│ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ PicoRV32 │◄───────►│ Address Decoder │ │
│ │ CPU Core │ │ (Memory Map) │ │
│ │ (RV32I) │ └──────────┬───────────────┘ │
│ └──────────────┘ │ │
│ │ │ │
│ │ ├──► CompactAccel │
│ │ │ (8x8 Matrix) │
│ │ │ │
│ │ ├──► BitNetAccel │
│ │ │ (16x16 BitNet) │
│ │ │ │
│ │ ├──► UART │
│ │ │ │
│ │ └──► GPIO │
│ │ │
│ └──► Interrupt Controller │
│ │
└─────────────────────────────────────────────────────────────┘
| Address Range | Size | Module | Description |
|---|---|---|---|
| 0x00000000 - 0x0FFFFFFF | 256 MB | RAM | Main memory |
| 0x10000000 - 0x10000FFF | 4 KB | CompactAccel | Traditional matrix accelerator |
| 0x10001000 - 0x10001FFF | 4 KB | BitNetAccel | BitNet accelerator |
| 0x20000000 - 0x2000FFFF | 64 KB | UART | Serial peripheral |
| 0x20020000 - 0x2002FFFF | 64 KB | GPIO | General-purpose I/O |
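The address decoding implied by the memory map can be sketched in software. This is a minimal behavioral model, not the Chisel implementation; the names `MEMORY_MAP`, `decode`, and `region_size` are illustrative assumptions, and only the address ranges come from the table.

```python
# Memory map transcribed from the table: (base, last address, module).
MEMORY_MAP = [
    (0x00000000, 0x0FFFFFFF, "RAM"),
    (0x10000000, 0x10000FFF, "CompactAccel"),
    (0x10001000, 0x10001FFF, "BitNetAccel"),
    (0x20000000, 0x2000FFFF, "UART"),
    (0x20020000, 0x2002FFFF, "GPIO"),
]


def decode(addr):
    """Return (module, offset) for a 32-bit address, or None if unmapped."""
    for base, last, module in MEMORY_MAP:
        if base <= addr <= last:
            return module, addr - base
    return None


def region_size(module):
    """Return the size in bytes of the named region."""
    for base, last, name in MEMORY_MAP:
        if name == module:
            return last - base + 1
    raise KeyError(module)


if __name__ == "__main__":
    print(decode(0x10001004))        # an address inside the BitNetAccel window
    print(decode(0x20010000))        # falls in the gap between UART and GPIO
    print(region_size("RAM") // (1024 * 1024), "MB")
```

Addresses outside every listed range, such as the gap between the UART and GPIO windows, decode to `None` in this model. How the hardware responds to such accesses is not stated in the report.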
The BitNet architecture is based on 1-bit LLM concepts, quantizing neural network weights to {-1, 0, +1} using 2-bit encoding:
- 00 = 0 (zero weight, skip computation)
- 01 = +1 (positive weight, perform addition)
- 10 = -1 (negative weight, perform subtraction)
- 11 = reserved

With these weights, the multiplication result = activation × weight reduces to:

- result = activation (addition)
- result = -activation (subtraction)

| Metric | CompactAccel | BitNetAccel | Total |
|---|---|---|---|
| Matrix Size | 8x8 | 16x16 | - |
| Peak Performance @ 100MHz | 1.6 GOPS | 4.8 GOPS | 6.4 GOPS |
| Data Width | 32-bit | 32-bit (activation) + 2-bit (weight) | - |
| Multiplier Count | 1 | 0 | 1 |
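The zero multiplier count for BitNetAccel follows from the 2-bit weight encoding: each weight selects skip, add, or subtract. The sketch below is a behavioral model of one dot product under that encoding. The function names are hypothetical, and rejecting the reserved code `11` with an exception is an assumption, since the report does not say how the hardware treats it.

```python
# 2-bit weight codes from the BitNet encoding; 0b11 is reserved.
WEIGHT_CODES = {0b00: 0, 0b01: +1, 0b10: -1}


def bitnet_accumulate(activations, weight_codes):
    """Multiplier-free dot product: skip, add, or subtract per weight code."""
    acc = 0
    for activation, code in zip(activations, weight_codes):
        if code == 0b00:
            continue                 # zero weight: no computation
        elif code == 0b01:
            acc += activation        # +1 weight: addition
        elif code == 0b10:
            acc -= activation        # -1 weight: subtraction
        else:
            raise ValueError("reserved weight code 0b11")
    return acc


def reference_dot(activations, weight_codes):
    """Same result computed with explicit multiplications, for comparison."""
    return sum(a * WEIGHT_CODES[c] for a, c in zip(activations, weight_codes))


if __name__ == "__main__":
    acts = [3, -5, 7, 2]
    codes = [0b01, 0b10, 0b00, 0b10]
    print(bitnet_accumulate(acts, codes), reference_dot(acts, codes))
```

Both functions return the same value for any valid codes, which is the property that lets the accelerator drop its multipliers.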
| Resource Type | Quantity | Description |
|---|---|---|
| LUTs | ~8,000 | Logic units |
| FFs | ~6,000 | Flip-flops |
| BRAMs | ~20 | Block RAM |
| DSPs | 1 | Digital signal processing units (CompactAccel only) |
Static Power (synthesis results): 627.4 uW
Dynamic Power Estimate (@ 100MHz):
| Module | Power (mW) | Percentage |
|---|---|---|
| PicoRV32 CPU | 30 | 30% |
| CompactAccel | 25 | 25% |
| BitNetAccel | 20 | 20% |
| Peripherals | 15 | 15% |
| Others | 10 | 10% |
| Total | 100 | 100% |
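As a rough cross-check, the power breakdown can be combined with the 6.4 GOPS peak figure reported elsewhere in this document to estimate energy efficiency. Pairing peak throughput with the whole-chip dynamic power estimate is an assumption made here for illustration; the report itself does not state an efficiency figure.

```python
# Per-module dynamic power estimates in mW, transcribed from the table.
POWER_MW = {
    "PicoRV32 CPU": 30,
    "CompactAccel": 25,
    "BitNetAccel": 20,
    "Peripherals": 15,
    "Others": 10,
}

PEAK_GOPS = 6.4  # combined peak performance at 100 MHz, from this report


def total_power_mw(power_table):
    """Sum the per-module power estimates."""
    return sum(power_table.values())


def gops_per_watt(gops, power_mw):
    """Throughput per watt, assuming peak throughput at the given power."""
    return gops / (power_mw / 1000.0)


def picojoules_per_op(gops, power_mw):
    """Energy per operation in pJ under the same assumption."""
    return (power_mw / 1000.0) / (gops * 1e9) * 1e12


if __name__ == "__main__":
    total = total_power_mw(POWER_MW)
    print(total, "mW total")
    print(round(gops_per_watt(PEAK_GOPS, total), 2), "GOPS/W")
    print(round(picojoules_per_op(PEAK_GOPS, total), 3), "pJ/op")
```

Under these assumptions the estimate is about 64 GOPS/W, or roughly 15.6 pJ per operation. Real workloads will fall below the peak throughput, so this is an upper bound on efficiency.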
| Parameter | Target | Measured | Description |
|---|---|---|---|
| Design Frequency | 50 MHz | - | Synthesis constraint |
| Max Operating Frequency | 100 MHz | 178.569 MHz | Achievable frequency |
| Min Operating Frequency | 50 MHz | - | Low-power mode |
| Critical Path Delay | < 10 ns | - | @ 100 MHz |
| Worst Negative Slack (WNS) | - | 14.400 ns | No violations |
| Total Negative Slack (TNS) | - | 0.000 ns | No violations |
| Timing Violations | 0 | 0 | Pass |
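The measured maximum frequency is consistent with the reported slack if the 14.400 ns figure is taken relative to the 50 MHz synthesis constraint, which is an assumption based on the "Design Frequency" row. A short calculation shows the relationship; `fmax_mhz` is an illustrative helper, not part of any tool.

```python
def fmax_mhz(constraint_mhz, slack_ns):
    """Estimate the maximum frequency from a clock constraint and setup slack.

    critical path delay = clock period - slack
    fmax                = 1 / critical path delay
    """
    period_ns = 1000.0 / constraint_mhz
    critical_path_ns = period_ns - slack_ns
    if critical_path_ns <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / critical_path_ns


if __name__ == "__main__":
    # 50 MHz constraint (20 ns period) with 14.400 ns of positive slack.
    print(round(fmax_mhz(50.0, 14.4), 3), "MHz")
```

With a 20 ns period and 14.400 ns of slack, the critical path is about 5.6 ns, giving roughly 178.57 MHz. The small difference from the reported 178.569 MHz is plausibly due to rounding of the slack value.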
Multi-level verification approach:
All test cases passed, with test coverage exceeding 95%. Detailed test reports are available in the chisel/test_run_dir/ directory.
Selected Process: CX55nm Open-Source PDK
Process Advantages:
Design Scale Limits: fewer than 100K instances
Area Estimation (based on CX55nm process): ~0.5 mm² chip area (core: 0.3 mm²)
Design Scale Statistics: 73,829 instances
This project supports two complete open-source EDA toolchains:
Option 1: International Community Solution (OpenROAD)
| Stage | Tool | Purpose | Source |
|---|---|---|---|
| RTL Design | Chisel/Scala | Hardware description | UC Berkeley, USA |
| Simulation | Verilator | Functional verification | International open-source |
| Synthesis | Yosys | Logic synthesis | Austria |
| Place & Route | OpenROAD | Physical implementation | UCSD, USA |
| Static Timing Analysis | OpenSTA | Timing verification | USA |
| Physical Verification | Magic / KLayout | DRC/LVS | International open-source |
| Waveform Viewer | GTKWave | Waveform analysis | International open-source |
Advantages:
Option 2: Chinese Open-Source Solution (iEDA) ⭐ Recommended
| Stage | Tool | Purpose | Source |
|---|---|---|---|
| RTL Design | Chisel/Scala | Hardware description | UC Berkeley, USA |
| Simulation | Verilator | Functional verification | International open-source |
| Synthesis | iMAP | Logic synthesis | iEDA, China |
| Floorplan | iFP | Floorplanning | iEDA, China |
| Placement | iPL | Cell placement | iEDA, China |
| Clock Tree Synthesis | iCTS | Clock tree | iEDA, China |
| Routing | iRT | Global/detailed routing | iEDA, China |
| Static Timing Analysis | iSTA | Timing verification | iEDA, China |
| Power Analysis | iPW | Power evaluation | iEDA, China |
| Physical Verification | iDRC | Design rule check | iEDA, China |
| Waveform Viewer | GTKWave | Waveform analysis | International open-source |
Advantages:
iEDA Project Information:
| Scenario | Recommended Solution | Reason |
|---|---|---|
| Teaching & Research | iEDA | Chinese-language support, easy to learn |
| Domestic Chips | iEDA | Domestically controlled toolchain, good process adaptation |
| International Collaboration | OpenROAD | Mature ecosystem, good compatibility |
| Commercial Production | Commercial Tools | Optimal performance, comprehensive technical support |
RTL Design (Chisel)
↓
Functional Simulation (Verilator)
↓
Logic Synthesis (Yosys) ✅ Completed
├── Design Scale: 73,829 instances
├── Operating Frequency: 178.569 MHz
└── Static Power: 627.4 uW
↓
Static Timing Analysis (OpenSTA)
↓
Floorplan (OpenROAD - Floorplan)
↓
Place & Route (OpenROAD - Place & Route)
↓
Clock Tree Synthesis (OpenROAD - CTS)
↓
Optimization (OpenROAD - Optimization)
↓
Sign-off
├── Timing Sign-off (OpenSTA)
├── Power Sign-off (OpenROAD)
├── Physical Verification (Magic/KLayout - DRC/LVS)
└── Formal Verification (Yosys - Equivalence)
↓
GDSII Generation (Magic/KLayout)
↓
Tape-out
RTL Design (Chisel)
↓
Functional Simulation (Verilator)
↓
Logic Synthesis (iMAP) ✅ Completed
├── Design Scale: 73,829 instances
├── Operating Frequency: 178.569 MHz
└── Static Power: 627.4 uW
↓
Netlist Optimization (iTO - Timing Optimization)
↓
Floorplan (iFP - Floorplan)
├── Die Size Planning
├── Power Network Planning
└── I/O Planning
↓
Placement (iPL - Placement)
├── Global Placement
├── Detailed Placement
└── Legalization
↓
Clock Tree Synthesis (iCTS)
├── Clock Tree Construction
├── Clock Buffer Insertion
└── Clock Skew Optimization
↓
Routing (iRT - Routing)
├── Global Routing
├── Track Assignment
└── Detailed Routing
↓
Static Timing Analysis (iSTA)
├── Setup Time Check
├── Hold Time Check
└── Timing Report Generation
↓
Power Analysis (iPW - Power Analysis)
├── Dynamic Power
├── Static Power
└── Power Optimization
↓
Physical Verification (iDRC - Design Rule Check)
├── DRC Check
├── LVS Verification
└── Antenna Effect Check
↓
Sign-off
├── Timing Sign-off (iSTA)
├── Power Sign-off (iPW)
├── Physical Verification (iDRC)
└── Formal Verification (iEDA-FV)
↓
GDSII Generation (iEDA)
↓
Tape-out
iEDA Process Advantages:
Design Scale Verification:
iEDA (Infrastructure for EDA) is a domestic open-source EDA platform developed jointly by the Chinese Academy of Sciences, Peking University, Peng Cheng Laboratory, and other institutions. It aims to break the monopoly of foreign EDA tools and give designers autonomous control over the chip design toolchain.
Core Features:
Main Tool Modules: iMAP (synthesis), iFP (floorplan), iPL (placement), iCTS (clock tree), iRT (routing), iSTA (timing analysis), iPW (power analysis), iDRC (physical verification)
More Information:
| Risk | Level | Mitigation |
|---|---|---|
| Timing Closure Difficulty | Medium | Reserve timing margin, adopt a pipelined design |
| Power Exceeding Target | Low | BitNet architecture is inherently low-power and fully verified |
| Area Exceeding Target | Low | Compact design, resource usage evaluated |
| Insufficient Verification | Medium | Increase test cases, improve coverage |
| EDA Tool Compatibility | Low | Support both iEDA and OpenROAD solutions |
| Risk | Level | Mitigation |
|---|---|---|
| Schedule Delay | Medium | Reasonable time planning, reserve buffer |
| Resource Shortage | Low | Advance planning, ensure resource availability |
| Tool Issues | Low | Dual toolchain strategy, iEDA + OpenROAD |
| International Restrictions | Low | Prioritize iEDA domestic toolchain |
| Specification | Value |
|---|---|
| Process | CX55nm Open-Source PDK |
| Design Scale | 73,829 instances (< 100K limit) |
| Chip Area | ~0.5 mm² (core: 0.3 mm²) |
| Operating Frequency | 50-100 MHz (measured up to 178.569 MHz) |
| Computing Performance | 6.4 GOPS @ 100MHz |
| Power Consumption | < 100 mW (static power: 627.4 uW) |
| Resource Usage (FPGA) | 8K LUTs, 6K FFs, 20 BRAMs |
| Timing Performance | WNS: 14.400ns, TNS: 0.000ns, no violations |
With the rapid development of edge AI, this chip has broad market prospects:
| Abbreviation | Full Name | Description |
|---|---|---|
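| SoC | System on Chip | Processor, accelerators, and peripherals integrated on a single chip |
| RISC-V | Reduced Instruction Set Computer - V | Open standard instruction set architecture, the fifth generation of Berkeley RISC designs |
| AI | Artificial Intelligence | Here, neural network inference workloads |
| GOPS | Giga Operations Per Second | Billions (10^9) of operations per second |
| PDK | Process Design Kit | Device models, design rules, and cell libraries for a manufacturing process |
| EDA | Electronic Design Automation | Software tools used to design and verify integrated circuits |
| RTL | Register Transfer Level | Hardware description in terms of registers and the logic between them |
| GDSII | Graphic Database System II | Layout database format delivered for tape-out |
| DRC | Design Rule Check | Checks that the layout obeys the process design rules |
| LVS | Layout Versus Schematic | Checks that the layout matches the netlist |
| STA | Static Timing Analysis | Timing verification without simulation |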
Project Lead: [tongxiaojun]
Email: [tongxiaojun@redoop.com]
Phone: [Contact Number]
Project Website: [https://github.com/redoop/riscv-ai-accelerator]
Code Repository: [GitHub/GitLab Link]
End of Report
This report is the RISC-V AI Accelerator Chip Tape-out Report, containing complete information on design, verification, and implementation. For questions, please contact the project lead.