SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
Model Overview
SiliconMind-V1 is a family of open-source Large Language Models (LLMs) specialized for Verilog code generation, testing, and debugging. Unlike previous approaches that rely heavily on commercial models or external EDA tools, SiliconMind-V1 is locally fine-tuned to iteratively generate, test, and debug RTL designs through test-time scaling.
The SiliconMind-V1 models are trained with a unified multi-agent framework for reasoning-oriented training-data generation, combined with integrated testbench-driven verification, and achieve state-of-the-art functional correctness on major benchmarks.
Key Features:
- Reasoning-Oriented: Trained to "think" before coding, producing reasoning traces that guide functional correctness.
- Self-Testing & Debugging: Generates its own test reports and fixes bugs without tool-calling.
- Tool-Free Verification: Reduces reliance on expensive, proprietary EDA software during the generation loop.
- Multi-Strategy Inference: Supports Regular, Deep Thinking, and Agentic inference modes for scalable performance.
Model Variants
We provide SiliconMind-V1 variants fine-tuned from the following base models:
| Model Name | Base Model | Size |
|---|---|---|
| SiliconMind-V1-Qwen2.5-C-7B-I | Qwen2.5-Coder-7B-Instruct | 7B |
| SiliconMind-V1-Qwen3-4B-T-2507 | Qwen3-4B-Thinking-2507 | 4B |
| SiliconMind-V1-Qwen3-8B | Qwen3-8B | 8B |
| SiliconMind-V1-Olmo-3-7B-Think | Olmo-3-7B-Think | 7B |
Model Sources
- Project Page: https://AS-SiliconMind.github.io/SiliconMind-V1
- Repositories:
  - Inference Engine: https://github.com/AS-SiliconMind/SiliconMind-V1
- Paper: arxiv
Usage & Inference Strategies
SiliconMind-V1 is designed to work with three distinct inference strategies, allowing users to trade off between latency/cost and accuracy. Please refer to our inference engine for more details on how to get started with SiliconMind-V1.
1. Regular Strategy
The model acts as a standard code generator but is prompted to produce a reasoning trace before the final code.
- Best for: Quick prototyping and simple modules.
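In practice, the Regular Strategy only requires that the final code be recoverable from the completion. A minimal sketch, assuming the model emits its reasoning trace followed by a fenced ```verilog block (the exact output format of SiliconMind-V1 is an assumption here, not a documented contract):

```python
import re

def extract_verilog(completion: str):
    """Return the last fenced ```verilog block in a completion, or None.

    Assumes the reasoning trace precedes the code and that the final
    answer is the last such block in the output.
    """
    blocks = re.findall(r"```verilog\s*\n(.*?)```", completion, re.DOTALL)
    return blocks[-1].strip() if blocks else None

# Toy completion in the assumed reasoning-then-code format.
completion = (
    "First, I need a 1-bit register with synchronous reset...\n"
    "```verilog\n"
    "module dff(input clk, rst, d, output reg q);\n"
    "  always @(posedge clk) q <= rst ? 1'b0 : d;\n"
    "endmodule\n"
    "```\n"
)
print(extract_verilog(completion))
```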
2. Deep Thinking Strategy
The model is explicitly instructed to solve the problem by:
- Drafting an initial solution.
- Mentally "testing" it against scenarios.
- Self-debugging within the reasoning trace.
- Best for: Complex logic where single-pass generation often fails.
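The three steps above can be encoded directly in the prompt. A purely illustrative template (the released Deep Thinking prompts may be worded differently):

```python
# Illustrative Deep Thinking prompt template; it simply mirrors the
# draft -> mentally test -> self-debug steps listed above.
DEEP_THINKING_TEMPLATE = """You are a Verilog expert. Solve the task below.
1. Draft an initial solution.
2. Mentally test it against edge-case scenarios.
3. If any scenario fails, debug it within your reasoning before answering.
Task: {task}
End with the final code in a fenced verilog block."""

prompt = DEEP_THINKING_TEMPLATE.format(task="4-bit ripple-carry adder")
print(prompt)
```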
3. Agentic Strategy (Recommended for SOTA Results)
A multi-turn workflow where the model plays different "Agent" roles sequentially:
- Solution Agent: Generates initial code + reasoning.
- Test Agent: Generates a test report for the code.
- Debug Agent: Reviews the test report and fixes errors.
- Performance: Achieves the highest pass rates (Pass@1) by allowing iterative refinement (up to 3 interactions recommended).
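The Agentic Strategy is a plain sequential loop over the three roles. A minimal sketch with a stubbed model call (`ask` stands in for an actual SiliconMind-V1 invocation; the role prompts and the "PASS" convention are illustrative assumptions, not the released prompts):

```python
def agentic_generate(problem: str, ask, max_rounds: int = 3) -> str:
    """Solution -> Test -> Debug loop, up to max_rounds Test/Debug interactions."""
    code = ask(f"[Solution Agent] Reason step by step, then write Verilog for: {problem}")
    for _ in range(max_rounds):
        report = ask(f"[Test Agent] Write a test report for this code:\n{code}")
        if "PASS" in report:  # model judges its own code correct
            break
        code = ask(f"[Debug Agent] Fix the code given this report:\n{report}\n{code}")
    return code

# Toy stub standing in for the model: reports FAIL until the
# Debug Agent has revised the code once.
def make_stub():
    state = {"fixed": False}
    def ask(prompt: str) -> str:
        if prompt.startswith("[Solution Agent]"):
            return "module buggy; endmodule"
        if prompt.startswith("[Test Agent]"):
            return "PASS" if state["fixed"] else "FAIL: output mismatch"
        state["fixed"] = True
        return "module fixed; endmodule"
    return ask

print(agentic_generate("2-to-1 mux", make_stub()))
```

With the stub, one Debug round suffices; against a real model the loop simply stops early whenever the Test Agent reports success.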
Training
The models were trained on a Multi-Faceted Dataset constructed via a custom two-phase pipeline:
- Code Generation Phase: A multi-agent system (Revision, Solution, Testbench, and Verification Agents) synthesized 36k functionally verified (problem, reasoning, code, testbench) tuples from public sources.
- Self-Correction Phase: The model was stress-tested against these problems. Hard samples (those the model failed) were augmented with "Test" and "Debug" curriculum data, teaching the model to write test reports and fix its own errors.
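The Verification Agent's role amounts to a filtering step: keep only tuples whose code passes its generated testbench. A sketch with the simulator call abstracted away (the paper's actual toolchain is not specified here, so `run_sim` and the dict schema are illustrative):

```python
def filter_verified(tuples, run_sim):
    """Keep (problem, reasoning, code, testbench) dicts whose code passes.

    run_sim(code, testbench) -> bool abstracts the simulator invocation;
    it returns True iff every testbench check passes.
    """
    kept = []
    for t in tuples:
        if run_sim(t["code"], t["testbench"]):
            kept.append(t)
    return kept

# Toy predicate standing in for a real simulation run.
samples = [
    {"problem": "p1", "reasoning": "...", "code": "ok", "testbench": "tb"},
    {"problem": "p2", "reasoning": "...", "code": "bad", "testbench": "tb"},
]
verified = filter_verified(samples, run_sim=lambda code, tb: code == "ok")
print(len(verified))  # only the passing tuple survives
```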
Evaluation: Pass@1 Performance (%) Across Major Verilog Benchmarks
| Model Name | Base Model | RTLLM-v2 | VerilogEval-v2 | VerilogEval-v2-NTU | CVDP-cid02&03 |
|---|---|---|---|---|---|
| Foundation Models: | | | | | |
| DeepSeek-R1-0528 | -- | 68.7 | 80.9 | 86.4 | 25.6 |
| gpt-oss-120b (high) | -- | 70.0 | 83.2 | 87.9 | 27.6 |
| Qwen3-32B | -- | 55.4 | 70.3 | 76.3 | 12.8 |
| Qwen3-14B | -- | 50.0 | 64.2 | 69.5 | 12.9 |
| Qwen2.5-C-7B-I | -- | 29.3 | 31.5 | 33.6 | 7.3 |
| Qwen3-4B-T-2507 | -- | 36.4 | 48.2 | 52.5 | 12.4 |
| Qwen3-8B | -- | 40.2 | 53.7 | 57.4 | 11.9 |
| Olmo-3-7B-Think | -- | 10.4 | 7.8 | 8.9 | 1.2 |
| Fine-tuned Models: | | | | | |
| CodeV-R1-7B-Distill | Qwen2.5-C-7B-I | 58.5 | 66.4 | 69.6 | 19.0 |
| CodeV-R1-7B | Qwen2.5-C-7B-I | 🥉 66.1 | 69.7 | 73.2 | 21.3 |
| SiliconMind-V1 | Qwen2.5-C-7B-I | 63.8 | 69.7 | 73.9 | 🥉 22.3 |
| SiliconMind-V1 | Qwen3-4B-T-2507 | 🥇 67.9 | 🥈 76.4 | 🥇 82.0 | 🥈 23.5 |
| SiliconMind-V1 | Qwen3-8B | 🥈 66.6 | 🥇 76.5 | 🥈 81.0 | 🥇 24.0 |
| SiliconMind-V1 | Olmo-3-7B-Think | 63.3 | 🥉 73.5 | 🥉 79.5 | 21.2 |
Notes:
- Bold values denote the better-performing model between CodeV-R1 and ours on the same base model.
- Rankings among specialized models: 🥇 first, 🥈 second, 🥉 third.
- For brevity, we refer to Qwen2.5-Coder-7B-Instruct as Qwen2.5-C-7B-I and Qwen3-4B-Thinking-2507 as Qwen3-4B-T-2507.
- SiliconMind-V1 results were obtained with the Agentic Strategy, allowing up to 3 Test/Debug Agent interactions.
License
SiliconMind-V1 is licensed under Apache 2.0.
The base models' licenses:
- Qwen2.5-Coder-7B-Instruct
- Qwen3-4B-Thinking-2507
- Qwen3-8B
- Olmo-3-7B-Think (Responsible Use Guidelines)
Acknowledgements
We acknowledge the financial support from Academia Sinica's SiliconMind Project (AS-IAIA-114-M11). We also thank the National Center for High-Performance Computing (NCHC) for providing computational and storage resources, and Taipei-1 for providing H100 computing resources. In addition, we acknowledge financial support from the National Science and Technology Council.
Citation
BibTeX:
@misc{Chen2026SiliconMindV1,
title = {{SiliconMind-V1}: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation},
author = {Mu-Chi Chen and Yu-Hung Kao and Po-Hsuan Huang and Shao-Chun Ho
and Hsiang-Yu Tsou and I-Ting Wu and En-Ming Huang
and Yu-Kai Hung and Wei-Po Hsin and Cheng Liang
and Chia-Heng Tu and Shih-Hao Hung and H.T. Kung},
year = {2026},
url = {https://AS-SiliconMind.github.io/SiliconMind-V1}
}