---
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
language:
- en
library_name: transformers
tags:
- solidity
- smart-contracts
- code-generation
- foundry
- blockchain
- ethereum
- security-audit
- rejection-fine-tuning
- qwen
datasets:
- ASSERT-KTH/DISL
- braindao/solidity-base-sft-v2
- samscrack/solidity-audit-cot
pipeline_tag: text-generation
---
# Qwen 3.6 Solidity (27B)
A 5-stage Solidity-specialist fine-tune of `Qwen/Qwen3.6-27B`, trained to produce
Foundry-compilable Solidity contracts and matching test suites from
natural-language specs, and to reason about smart-contract security with
long-CoT audit traces.
This is the **final merged checkpoint**: all five stages (CPT → SFT instruction
→ SFT audit/CoT → SFT Opus distillation → RFT) folded into a single bf16 model.
It loads directly with `AutoModelForCausalLM.from_pretrained(...)`; there are no
adapters to apply.
## Solidity Eval (2026): pass@1 leaderboard
Top of the pass@1 leaderboard on [`samscrack/solidity-eval-2026`](https://huggingface.co/datasets/samscrack/solidity-eval-2026) (`lite` split, 200 real Etherscan contracts):
| Agent / model | pass@1 | Wall-clock |
|---|---|---|
| **This model β Qwen 3.6 Solidity 27B** | **46.5%** (93/200) | ~27 min |
| Claude Code 2.1.128 (Claude Opus 4.7) | 39.0% (78/200, 1 timeout) | ~34 min |
`pass@1` here is SolBench's `echidna()` rule: a single agentic attempt scores 1.0 only if Diffusc compiles the candidate AND Echidna's differential fuzzing finds no behavioral divergence vs. the ground-truth body, with B3 canary + stub-residue guards. No resampling. Conditions are identical across rows: 16-way concurrency, `max_agent_turns=40`, `agent_temperature=0.6`, `fuzz_test_calls=50000`, `fuzz_seed=0xDEADBEEF`, same sandbox image, same host. This model was served locally via vLLM TP=2 FP8 (qwen3_xml tool parser) on 2× Blackwell GPUs through the in-process Hermes agent loop; Claude Code ran via the Anthropic API through the CLI agent backend.
See the dataset card for the full reproduction recipe and harness-agnostic scoring instructions.
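To make the rule concrete, here is a minimal sketch in Python. `run_diffusc_echidna` and `FuzzReport` are hypothetical stand-ins for the SolBench harness (its real API lives in the dataset's scoring code); only the gate order and parameter values mirror the description above:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FuzzReport:
    # Hypothetical result type; the real harness lives in the eval dataset.
    compiled: bool           # Diffusc build outcome
    canary_tripped: bool     # B3 canary guard
    stub_residue: bool       # stub-residue guard
    divergence_found: bool   # Echidna differential-fuzz outcome

def pass_at_1(run_diffusc_echidna: Callable[..., FuzzReport],
              candidate_body: str,
              ground_truth_body: str) -> float:
    # Single agentic attempt, no resampling.
    report = run_diffusc_echidna(
        candidate_body,
        ground_truth_body,
        test_calls=50_000,   # fuzz_test_calls=50000
        seed=0xDEADBEEF,     # fuzz_seed=0xDEADBEEF
    )
    if not report.compiled:
        return 0.0
    if report.canary_tripped or report.stub_residue:
        return 0.0
    return 0.0 if report.divergence_found else 1.0
```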
## Pipeline
| # | Stage | Method | Adapter | Training data |
|---|---|---|---|---|
| 0 | Continued pretrain | LoRA r=64, ~500M Solidity tokens | folded in | `ASSERT-KTH/DISL` (514k deployed contracts, CC-BY 4.0) + ~80 curated blue-chip GitHub repos |
| 1B | Instruction SFT | LoRA r=64, 178 steps | folded in | `final.jsonl` (~315k rows: braindao/solidity-base-sft-v2 + andstor/smart_contract_code_comments + lohoz/Smart-Contract-MultiTask + slither-audited + Pyano-fun) + 4,240 unverified `foundry_tests.jsonl` rows |
| 2 | Audit / long-CoT | LoRA r=16, 2 epochs | folded in | `samscrack/solidity-audit-cot` (~6,140 Opus 4.7 long-form audit traces, all `confidence=high`, ≤30k chars to fit the 8K ctx) |
| 3 | Opus distillation SFT | LoRA r=16, 2 epochs, lr=5e-5 | folded in | 4,000 of 4,919 forge-verified Opus pairs (`foundry_tests.verified.jsonl`); 919 held out from training |
| 4 | Rejection fine-tuning (RFT) | LoRA r=16, 2 epochs, lr=5e-5 | folded in (this checkpoint) | 926 model-generated contract+test pairs that passed the `forge build && forge test` self-oracle, with a non-triviality gate (≥3 test fns, ≥2 distinct asserts) |
**Stages 0/1B/2** were the original recipe (specification + Opus-CoT distillation).
**Stages 3/4** are the addition: directly distill the highest-quality forge-verified
Opus pairs (Stage 3), then rejection-sample the model's own forge-passing outputs
to anchor self-consistent generation (Stage 4).
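The Stage 4 non-triviality gate can be pictured as a static filter over the generated test file; a rough sketch (the regexes are illustrative approximations, not the exact filter used):

```python
import re

def passes_nontriviality_gate(test_source: str) -> bool:
    """Rough sketch of the Stage 4 gate: keep a forge-passing sample only if
    its test file defines >=3 test functions and >=2 distinct assertions.
    The regexes are illustrative, not the exact filter used."""
    test_fns = re.findall(r"function\s+test\w*\s*\(", test_source)
    distinct_asserts = {
        m.strip() for m in re.findall(r"\bassert\w*\s*\([^;]*\)", test_source)
    }
    return len(test_fns) >= 3 and len(distinct_asserts) >= 2
```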
## Eval: Stage 3 → Stage 4 (RFT) comparison
200 prompts × N=4 candidates from a held-out slice (never trained on at any
stage). Each model-generated `(contract, test_file)` pair is dropped into a
fresh Foundry project and scored end-to-end with `forge build && forge test`
(a sketch of this scorer follows the results table):
| Metric (200 prompts × N=4 candidates) | Post-Stage-3 | **Post-Stage-4** | Δ |
|---|---|---|---|
| extract success | 80.5% | **86.4%** | +5.9 pp |
| compile success | 46.8% | **50.6%** | +3.8 pp |
| test pass | 19.2% | **21.4%** | +2.2 pp |
| **prompts with ≥1 pass** | 45.0% | **54.0%** | **+9.0 pp** |
Stage 4 RFT lifted prompt-level yield by **+9 percentage points** (45% → 54%).
Per-candidate compile rate jumped more than 10× across the full pipeline
(4.5% pre-Stage-3 → 50.6% post-Stage-4); the model now produces
Foundry-compilable contracts with matching test suites at >50% per
individual candidate.
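For reference, a minimal sketch of that per-candidate scorer, assuming a pre-built `forge init` template project (the file names, template layout, and timeouts here are illustrative choices, not the exact harness):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def score_candidate(contract_src: str, test_src: str, template: Path) -> dict:
    """Drop one generated (contract, test_file) pair into a fresh copy of a
    `forge init` template project and run the two-step oracle. File names and
    the template layout are illustrative."""
    with tempfile.TemporaryDirectory() as tmp:
        proj = Path(tmp) / "proj"
        shutil.copytree(template, proj)
        (proj / "src" / "Contract.sol").write_text(contract_src)
        (proj / "test" / "Contract.t.sol").write_text(test_src)
        build = subprocess.run(["forge", "build"], cwd=proj,
                               capture_output=True, timeout=300)
        if build.returncode != 0:
            return {"compiled": False, "tests_passed": False}
        tests = subprocess.run(["forge", "test"], cwd=proj,
                               capture_output=True, timeout=600)
        return {"compiled": True, "tests_passed": tests.returncode == 0}
```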
## What this model is good at
- **Producing self-consistent, Foundry-compilable contract + test pairs from an NL spec.**
  Self-oracle test pass rate is 21.4% per candidate; 54% of prompts have ≥1 of 4 candidates passing.
- **Long-CoT audit reasoning.** Stage 2 was trained on ~6k Opus 4.7 audit traces with
  reasoning steps + structured findings (severity / category / location / impact / fix);
  see the illustrative finding after this list.
- **Solidity-idiomatic generation.** Stage 0 CPT shifts the base distribution toward
modern Solidity patterns (`mapping`, `msg.sender`, `pragma`, custom errors, etc.).
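For illustration only, a structured finding in that schema has roughly this shape; the field names follow the list above, and every value below is invented:

```python
finding = {
    "severity": "high",
    "category": "reentrancy",
    "location": "Vault.withdraw() (src/Vault.sol)",   # invented example
    "impact": "An attacker can re-enter withdraw() before the balance is "
              "zeroed and drain the contract.",
    "fix": "Follow checks-effects-interactions (zero the balance before the "
           "external call) or add a reentrancy guard.",
}
```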
## Limitations
- **Synthetic-data lineage.** Stage 1B includes braindao/solidity-base-sft-v2
whose teacher model is undisclosed (likely commodity GPT, not GPT-4-class).
Quality ceiling is bounded by the teacher.
- **Audit-corpus legality.** Stage 2 corpus (`samscrack/solidity-audit-cot`) is
Opus-generated under Anthropic API terms over braindao seed contracts. Legal
review recommended before any commercial use of the audit-finding outputs.
- **Held-out eval.** This model has never seen `samscrack/solidity-eval-2026`
  (SolBench RACR-4k + differential fuzz) at any stage; that's the gold benchmark.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-Solidity-27B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tok = AutoTokenizer.from_pretrained("samscrack/Qwen3.6-Solidity-27B")

# Spec → contract + tests
spec = (
    "Implement a Solidity contract that holds a mapping from address to uint256 "
    "balance. Owner can mint to any address. Anyone can transfer their balance to "
    "another address. Include a Foundry test suite covering happy paths and the "
    "owner-only invariant.\n\nProduce both the Solidity contract and a Foundry "
    "test suite that exercises it."
)

msgs = [{"role": "user", "content": spec}]
inputs = tok.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=True,
)
toks = tok(inputs, return_tensors="pt").to(model.device)
out = model.generate(
    **toks, max_new_tokens=4096, temperature=0.7, top_p=0.9, do_sample=True,
)
print(tok.decode(out[0][toks.input_ids.shape[-1]:], skip_special_tokens=True))
```
The generated assistant turn has the shape:

````text
<think>...short design rationale...</think>

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.x;

contract MyContract { ... }
```

```solidity
// test/Contract.t.sol
import "forge-std/Test.sol";
import "../src/Contract.sol";

contract MyContractTest is Test { ... }
```
````
## Format envelope
The model was trained on a canonical envelope: a `<think>...</think>` block,
then a `` ```solidity `` fence containing the contract, a blank line, and a
second `` ```solidity `` fence that opens with `// test/Contract.t.sol` and
contains the tests. The most reliable reproduction is to end the user prompt
with: *"Produce both the Solidity contract and a Foundry test suite that
exercises it."* A sketch for parsing replies in this envelope follows.
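A minimal sketch of that parse, assuming the reply follows the envelope exactly (first fence is the contract, second is the test file):

```python
import re

def split_envelope(reply: str) -> tuple[str, str]:
    """Return (contract_source, test_source) from a reply in the trained
    envelope. Raises ValueError if the reply deviates from it."""
    body = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    blocks = re.findall(r"```solidity\n(.*?)```", body, flags=re.DOTALL)
    if len(blocks) != 2:
        raise ValueError(f"expected 2 solidity blocks, got {len(blocks)}")
    contract, tests = (b.strip() for b in blocks)
    return contract, tests
```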
## Training infrastructure
- 2× NVIDIA RTX PRO 6000 Blackwell Workstation GPUs (96 GB each)
- Trainer: TRL 0.22 + Unsloth 2026.4.7 + PyTorch 2.8.0 + cu128
- Inference (sampling for Stage 4 RFT): vLLM 0.19.1 with FP8 dynamic quant +
FLASH_ATTN backend + Qwen3 reasoning parser
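For local serving in the spirit of the Stage 4 sampling setup, a minimal vLLM sketch; `tensor_parallel_size=2` and `quantization="fp8"` mirror the documented configuration, everything else is an assumption and may vary by vLLM version:

```python
# Set VLLM_ATTENTION_BACKEND=FLASH_ATTN in the environment to match the
# documented backend; this script only mirrors the quant/TP settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="samscrack/Qwen3.6-Solidity-27B",
    tensor_parallel_size=2,   # TP=2 across the two GPUs
    quantization="fp8",       # FP8 dynamic quantization
)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=4096)
outputs = llm.generate(["<chat-templated prompt here>"], params)
print(outputs[0].outputs[0].text)
```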
## Citation
```bibtex
@misc{qwen3.6-solidity-27b,
author = {Sam Crack (samscrack)},
title = {Qwen 3.6 Solidity (27B): a 5-stage CPT/SFT/RFT recipe for
Foundry-compileable Solidity codegen},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/samscrack/Qwen3.6-Solidity-27B}
}
```
## License
Apache-2.0 (this checkpoint). Underlying training data is from CC-BY/MIT-tier
sources; teacher reasoning content (Stage 2 + Stage 3) was generated under
Anthropic API terms of use as of generation date (2026-05-04). Eval set
`samscrack/solidity-eval-2026` is NOT used at any training stage.