SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
Model Overview
SiliconMind-V1 is a family of open-source Large Language Models (LLMs) specialized for Verilog code generation, testing, and debugging. Unlike previous approaches that rely heavily on commercial models or external EDA tools, SiliconMind-V1 is locally fine-tuned to iteratively generate, test, and debug RTL designs through test-time scaling.
The SiliconMind-V1 models are trained with a unified multi-agent framework for reasoning-oriented training-data generation, combined with integrated testbench-driven verification, and achieve state-of-the-art functional correctness on major benchmarks.
Key Features:
- Reasoning-Oriented: Trained to "think" before coding, producing reasoning traces that guide functional correctness.
- Self-Testing & Debugging: Generates its own test reports and fixes bugs without tool-calling.
- Tool-Free Verification: Reduces reliance on expensive, proprietary EDA software during the generation loop.
- Multi-Strategy Inference: Supports Regular, Deep Thinking, and Agentic inference modes for scalable performance.
Model Variants
We provide SiliconMind-V1 variants fine-tuned from the following base models:
| Model Name | Base Model | Size |
|---|---|---|
| SiliconMind-V1-Qwen2.5-C-7B-I | Qwen2.5-Coder-7B-Instruct | 7B |
| SiliconMind-V1-Qwen3-4B-T-2507 | Qwen3-4B-Thinking-2507 | 4B |
| SiliconMind-V1-Qwen3-8B | Qwen3-8B | 8B |
| SiliconMind-V1-Olmo-3-7B-Think | Olmo-3-7B-Think | 7B |
Model Sources
- Project Page: https://AS-SiliconMind.github.io/SiliconMind-V1
- Repositories:
  - Inference Engine: https://github.com/AS-SiliconMind/SiliconMind-V1
- Paper: arxiv
Usage & Inference Strategies
SiliconMind-V1 is designed to work with three distinct inference strategies, allowing users to trade off between latency/cost and accuracy. Please refer to our inference engine for more details on how to get started with SiliconMind-V1.
1. Regular Strategy
The model acts as a standard code generator but is prompted to produce a reasoning trace before the final code.
- Best for: Quick prototyping and simple modules.
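In practice, the Regular Strategy only requires that the final code be recoverable from the completion. A minimal sketch, assuming the model emits its reasoning trace followed by a fenced ```verilog block (the exact output format of SiliconMind-V1 is an assumption here, not a documented contract):

```python
import re

def extract_verilog(completion: str):
    """Return the last fenced ```verilog block in a completion, or None.

    Assumes the reasoning trace precedes the code and that the final
    answer is the last such block in the output.
    """
    blocks = re.findall(r"```verilog\s*\n(.*?)```", completion, re.DOTALL)
    return blocks[-1].strip() if blocks else None

# Toy completion in the assumed reasoning-then-code format.
completion = (
    "First, I need a 1-bit register with synchronous reset...\n"
    "```verilog\n"
    "module dff(input clk, rst, d, output reg q);\n"
    "  always @(posedge clk) q <= rst ? 1'b0 : d;\n"
    "endmodule\n"
    "```\n"
)
print(extract_verilog(completion))
```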
2. Deep Thinking Strategy
The model is explicitly instructed to solve the problem by:
- Drafting an initial solution.
- Mentally "testing" it against scenarios.
- Self-debugging within the reasoning trace.
- Best for: Complex logic where single-pass generation often fails.
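The three steps above can be encoded directly in the prompt. A purely illustrative template (the released Deep Thinking prompts may be worded differently):

```python
# Illustrative Deep Thinking prompt template; it simply mirrors the
# draft -> mentally test -> self-debug steps listed above.
DEEP_THINKING_TEMPLATE = """You are a Verilog expert. Solve the task below.
1. Draft an initial solution.
2. Mentally test it against edge-case scenarios.
3. If any scenario fails, debug it within your reasoning before answering.
Task: {task}
End with the final code in a fenced verilog block."""

prompt = DEEP_THINKING_TEMPLATE.format(task="4-bit ripple-carry adder")
print(prompt)
```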
3. Agentic Strategy (Recommended for SOTA Results)
A multi-turn workflow where the model plays different "Agent" roles sequentially:
- Solution Agent: Generates initial code + reasoning.
- Test Agent: Generates a test report for the code.
- Debug Agent: Reviews the test report and fixes errors.
- Performance: Achieves the highest pass rates (Pass@1) by allowing iterative refinement (up to 3 interactions recommended).
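The Agentic Strategy is a plain sequential loop over the three roles. A minimal sketch with a stubbed model call (`ask` stands in for an actual SiliconMind-V1 invocation; the role prompts and the "PASS" convention are illustrative assumptions, not the released prompts):

```python
def agentic_generate(problem: str, ask, max_rounds: int = 3) -> str:
    """Solution -> Test -> Debug loop, up to max_rounds Test/Debug interactions."""
    code = ask(f"[Solution Agent] Reason step by step, then write Verilog for: {problem}")
    for _ in range(max_rounds):
        report = ask(f"[Test Agent] Write a test report for this code:\n{code}")
        if "PASS" in report:  # model judges its own code correct
            break
        code = ask(f"[Debug Agent] Fix the code given this report:\n{report}\n{code}")
    return code

# Toy stub standing in for the model: reports FAIL until the
# Debug Agent has revised the code once.
def make_stub():
    state = {"fixed": False}
    def ask(prompt: str) -> str:
        if prompt.startswith("[Solution Agent]"):
            return "module buggy; endmodule"
        if prompt.startswith("[Test Agent]"):
            return "PASS" if state["fixed"] else "FAIL: output mismatch"
        state["fixed"] = True
        return "module fixed; endmodule"
    return ask

print(agentic_generate("2-to-1 mux", make_stub()))
```

With the stub, one Debug round suffices; against a real model the loop simply stops early whenever the Test Agent reports success.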
Training
The models were trained on a Multi-Faceted Dataset constructed via a custom two-phase pipeline:
- Code Generation Phase: A multi-agent system (Revision, Solution, Testbench, and Verification Agents) synthesized 36k functionally verified (problem, reasoning, code, testbench) tuples from public sources.
- Self-Correction Phase: The model was stress-tested against these problems. Hard samples (those the model failed) were augmented with "Test" and "Debug" curriculum data, teaching the model to write test reports and fix its own errors.
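The Verification Agent's role amounts to a filtering step: keep only tuples whose code passes its generated testbench. A sketch with the simulator call abstracted away (the paper's actual toolchain is not specified here, so `run_sim` and the dict schema are illustrative):

```python
def filter_verified(tuples, run_sim):
    """Keep (problem, reasoning, code, testbench) dicts whose code passes.

    run_sim(code, testbench) -> bool abstracts the simulator invocation;
    it returns True iff every testbench check passes.
    """
    kept = []
    for t in tuples:
        if run_sim(t["code"], t["testbench"]):
            kept.append(t)
    return kept

# Toy predicate standing in for a real simulation run.
samples = [
    {"problem": "p1", "reasoning": "...", "code": "ok", "testbench": "tb"},
    {"problem": "p2", "reasoning": "...", "code": "bad", "testbench": "tb"},
]
verified = filter_verified(samples, run_sim=lambda code, tb: code == "ok")
print(len(verified))  # only the passing tuple survives
```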
Evaluation: Pass@1 Performance (%) Across Major Verilog Benchmarks
| Model Name | Base Model | RTLLM-v2 | VerilogEval-v2 | VerilogEval-v2-NTU | CVDP-cid02&03 |
|---|---|---|---|---|---|
| Foundation Models: | | | | | |
| DeepSeek-R1-0528 | -- | 68.7 | 80.9 | 86.4 | 25.6 |
| gpt-oss-120b (high) | -- | 70.0 | 83.2 | 87.9 | 27.6 |
| Qwen3-32B | -- | 55.4 | 70.3 | 76.3 | 12.8 |
| Qwen3-14B | -- | 50.0 | 64.2 | 69.5 | 12.9 |
| Qwen2.5-C-7B-I | -- | 29.3 | 31.5 | 33.6 | 7.3 |
| Qwen3-4B-T-2507 | -- | 36.4 | 48.2 | 52.5 | 12.4 |
| Qwen3-8B | -- | 40.2 | 53.7 | 57.4 | 11.9 |
| Olmo-3-7B-Think | -- | 10.4 | 7.8 | 8.9 | 1.2 |
| Fine-tuned Models: | | | | | |
| CodeV-R1-7B-Distill | Qwen2.5-C-7B-I | 58.5 | 66.4 | 69.6 | 19.0 |
| CodeV-R1-7B | Qwen2.5-C-7B-I | 🥉 66.1 | 69.7 | 73.2 | 21.3 |
| SiliconMind-V1 | Qwen2.5-C-7B-I | 63.8 | 69.7 | 73.9 | 🥉 22.3 |
| SiliconMind-V1 | Qwen3-4B-T-2507 | 🥇 67.9 | 🥈 76.4 | 🥇 82.0 | 🥈 23.5 |
| SiliconMind-V1 | Qwen3-8B | 🥈 66.6 | 🥇 76.5 | 🥈 81.0 | 🥇 24.0 |
| SiliconMind-V1 | Olmo-3-7B-Think | 63.3 | 🥉 73.5 | 🥉 79.5 | 21.2 |
Notes:
- Bold values denote the better-performing model between CodeV-R1 and ours on the same base model.
- Rankings among specialized models: 🥇 first, 🥈 second, 🥉 third.
- For brevity, we refer to Qwen2.5-Coder-7B-Instruct as Qwen2.5-C-7B-I and Qwen3-4B-Thinking-2507 as Qwen3-4B-T-2507.
- SiliconMind-V1 results were obtained with the Agentic Strategy, allowing up to 3 Test/Debug Agent interactions.
License
SiliconMind-V1 is licensed under Apache 2.0.
The base models' licenses:
- Qwen2.5-Coder-7B-Instruct
- Qwen3-4B-Thinking-2507
- Qwen3-8B
- Olmo-3-7B-Think (Responsible Use Guidelines)
Acknowledgements
We acknowledge the financial support from Academia Sinica's SiliconMind Project (AS-IAIA-114-M11). We also thank the National Center for High-Performance Computing (NCHC) for providing computational and storage resources, and Taipei-1 for providing H100 computing resources. In addition, we acknowledge financial support from the National Science and Technology Council.
Citation
BibTeX:
@misc{Chen2026SiliconMindV1,
title = {{SiliconMind-V1}: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation},
author = {Mu-Chi Chen and Yu-Hung Kao and Po-Hsuan Huang and Shao-Chun Ho
and Hsiang-Yu Tsou and I-Ting Wu and En-Ming Huang
and Yu-Kai Hung and Wei-Po Hsin and Cheng Liang
and Chia-Heng Tu and Shih-Hao Hung and H.T. Kung},
year = {2026},
url = {https://AS-SiliconMind.github.io/SiliconMind-V1}
}