---
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
language:
- en
library_name: transformers
tags:
- solidity
- smart-contracts
- code-generation
- foundry
- blockchain
- ethereum
- security-audit
- rejection-fine-tuning
- qwen
datasets:
- ASSERT-KTH/DISL
- braindao/solidity-base-sft-v2
- samscrack/solidity-audit-cot
pipeline_tag: text-generation
---

# Qwen 3.6 Solidity (27B)

A five-stage Solidity-specialist fine-tune of `Qwen/Qwen3.6-27B`, trained to
produce Foundry-compilable Solidity contracts and matching test suites from
natural-language specs, and to reason about smart-contract security with
long-CoT audit traces.

This is the **final merged checkpoint**: all five stages (CPT → SFT instruction
→ SFT audit/CoT → SFT Opus distillation → RFT) folded into a single bf16 model.
It loads directly with `AutoModelForCausalLM.from_pretrained(...)`; there are no
adapters to apply.

## Solidity Eval (2026) — pass@1 leaderboard

This model tops the pass@1 leaderboard on [`samscrack/solidity-eval-2026`](https://huggingface.co/datasets/samscrack/solidity-eval-2026) (`lite` split, 200 real Etherscan contracts):

| | Agent / model | pass@1 | Wall-clock | |
| |---|---|---| |
| | **This model — Qwen 3.6 Solidity 27B** | **46.5%** (93/200) | ~27 min | |
| | Claude Code 2.1.128 (Claude Opus 4.7) | 39.0% (78/200, 1 timeout) | ~34 min | |

`pass@1` here is SolBench's `echidna()` rule: a single agentic attempt scores 1.0
only if Diffusc compiles the candidate AND Echidna's differential fuzzing finds no
behavioral divergence from the ground-truth body, with B3-canary and stub-residue
guards. No resampling. Conditions were identical across rows:

- 16-way concurrency, `max_agent_turns=40`, `agent_temperature=0.6`
- `fuzz_test_calls=50000`, `fuzz_seed=0xDEADBEEF`
- same sandbox image, same host

This model was served locally via vLLM (TP=2, FP8, `qwen3_xml` tool parser) on 2×
Blackwell GPUs through the in-process Hermes agent loop; Claude Code ran against
the Anthropic API through the CLI agent backend.

See the dataset card for the full reproduction recipe and harness-agnostic
scoring instructions.
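
For concreteness, here is a minimal sketch of that scoring predicate.
`has_b3_canary`, `has_stub_residue`, `diffusc_compile`, and
`echidna_differential_fuzz` are hypothetical stand-ins for the real
Diffusc/Echidna harness, not actual APIs:

```python
# Sketch only: every helper below is a hypothetical stand-in for the real
# Diffusc/Echidna tooling described in the dataset card.
def score_attempt(candidate_src: str, ground_truth_src: str) -> float:
    """Score one agentic attempt under the echidna() rule (no resampling)."""
    # Guards: reject leaked B3 canaries and leftover stub residue outright.
    if has_b3_canary(candidate_src) or has_stub_residue(candidate_src):
        return 0.0
    # Gate 1: Diffusc must compile the candidate against the ground truth.
    build = diffusc_compile(candidate_src, ground_truth_src)
    if not build.ok:
        return 0.0
    # Gate 2: differential fuzzing must find no behavioral divergence.
    fuzz = echidna_differential_fuzz(build, test_calls=50_000, seed=0xDEADBEEF)
    return 1.0 if fuzz.no_divergence else 0.0
```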

## Pipeline

| | # | Stage | Method | Adapter | Training data | |
| |---|---|---|---|---| |
| | 0 | Continued pretrain | LoRA r=64, ~500M Solidity tokens | folded in | `ASSERT-KTH/DISL` (514k deployed contracts, CC-BY 4.0) + ~80 curated blue-chip GitHub repos | |
| | 1B | Instruction SFT | LoRA r=64, 178 steps | folded in | `final.jsonl` (~315k rows: braindao/solidity-base-sft-v2 + andstor/smart_contract_code_comments + lohoz/Smart-Contract-MultiTask + slither-audited + Pyano-fun) + 4,240 unverified `foundry_tests.jsonl` rows | |
| | 2 | Audit / long-CoT | LoRA r=16, 2 epochs | folded in | `samscrack/solidity-audit-cot` (~6,140 Opus 4.7 long-form audit traces, all `confidence=high`, ≤30k chars to fit 8K ctx) | |
| | 3 | Opus distillation SFT | LoRA r=16, 2 epochs, lr=5e-5 | folded in | 4,000 of 4,919 forge-verified Opus pairs (`foundry_tests.verified.jsonl`); 919 held out from training | |
| | 4 | Rejection fine-tuning (RFT) | LoRA r=16, 2 epochs, lr=5e-5 | folded in (this checkpoint) | 926 model-generated contract+test pairs that passed `forge build && forge test` self-oracle, with non-triviality gate (≥3 test fns, ≥2 distinct asserts) | |

**Stages 0/1B/2** were the original recipe (specification + Opus-CoT distillation).
**Stages 3/4** are the addition: first, directly distill the highest-quality
forge-verified Opus pairs (Stage 3); then rejection-sample the model's own
forge-passing outputs to anchor self-consistent generation (Stage 4). The Stage 4
acceptance filter is sketched below.
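
A minimal sketch of that Stage 4 filter, assuming the candidate pair has already
been materialized into a Foundry project; the two regexes are one plausible
reading of the "≥3 test fns, ≥2 distinct asserts" gate, not the exact
implementation:

```python
import re
import subprocess

def passes_rft_gate(project_dir: str, test_src: str) -> bool:
    """Keep a model-generated contract+test pair only if it passes the forge
    self-oracle AND a non-triviality gate on its test suite."""
    # Self-oracle: the pair must build, and its own tests must pass.
    for cmd in (["forge", "build"], ["forge", "test"]):
        if subprocess.run(cmd, cwd=project_dir, capture_output=True).returncode != 0:
            return False
    # Non-triviality: at least 3 Foundry test functions ...
    test_fns = re.findall(r"function\s+test\w*\s*\(", test_src)
    # ... and at least 2 distinct assertion helpers (assertEq, assertTrue, ...).
    asserts = set(re.findall(r"\b(assert\w*)\s*\(", test_src))
    return len(test_fns) >= 3 and len(asserts) >= 2
```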

## Eval — Stage 3 → Stage 4 (RFT) comparison

200 prompts × N=4 candidates from a held-out slice (never trained on at any
stage). Each model-generated `(contract, test_file)` pair is dropped into a
fresh Foundry project and scored end-to-end with `forge build && forge test`:

| | Metric (200 prompts × N=4 candidates) | Post-Stage-3 | **Post-Stage-4** | Δ | |
| |---|---|---|---| |
| | extract success | 80.5% | **86.4%** | +5.9 pp | |
| | compile success | 46.8% | **50.6%** | +3.8 pp | |
| | test pass | 19.2% | **21.4%** | +2.2 pp | |
| | **prompts ≥1 pass** | 45.0% | **54.0%** | **+9.0 pp** | |

Stage 4 RFT lifted prompt-level yield by **+9 percentage points** (45% → 54%).
Per-candidate compile rate rose more than tenfold across the full pipeline
(4.5% pre-Stage-3 → 50.6% post-Stage-4): the model now produces a
Foundry-compilable contract with a matching test suite in over half of its
individual candidates.
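
The first three rows are per-candidate rates over all 800 generations; the last
row is prompt-level. A minimal sketch of the aggregation, where the record
fields (`prompt_id`, `extracted`, `compiled`, `tests_passed`) are illustrative
names, not the harness's actual schema:

```python
from collections import defaultdict

def summarize(records: list[dict]) -> dict[str, float]:
    """Aggregate per-candidate results (200 prompts x 4 candidates = 800 rows)."""
    n = len(records)
    by_prompt = defaultdict(bool)
    for r in records:
        # A prompt counts as ">=1 pass" if any of its 4 candidates passes.
        by_prompt[r["prompt_id"]] |= r["tests_passed"]
    return {
        "extract success": sum(r["extracted"] for r in records) / n,
        "compile success": sum(r["compiled"] for r in records) / n,
        "test pass": sum(r["tests_passed"] for r in records) / n,
        "prompts >=1 pass": sum(by_prompt.values()) / len(by_prompt),
    }
```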

## What this model is good at

- **Producing self-consistent, Foundry-compilable contract + test pairs from an NL spec.**
  The self-oracle test pass rate is 21.4% per candidate, and 54% of prompts get at
  least one passing candidate out of four.
- **Long-CoT audit reasoning.** Stage 2 was trained on ~6k Opus 4.7 audit traces with
  reasoning steps plus structured findings (severity / category / location / impact / fix).
- **Solidity-idiomatic generation.** Stage 0 CPT shifts the base distribution toward
  modern Solidity patterns (`mapping`, `msg.sender`, `pragma`, custom errors, etc.).

## Limitations

- **Synthetic-data lineage.** Stage 1B includes `braindao/solidity-base-sft-v2`,
  whose teacher model is undisclosed (likely a commodity GPT, not GPT-4-class).
  The quality ceiling is bounded by that teacher.
- **Audit-corpus legality.** The Stage 2 corpus (`samscrack/solidity-audit-cot`) was
  Opus-generated under Anthropic API terms over braindao seed contracts. Legal
  review is recommended before any commercial use of the audit-finding outputs.
- **Held-out eval.** This model has never seen `samscrack/solidity-eval-2026`
  (SolBench RACR-4k + differential fuzz) at any stage; that is the gold benchmark.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-Solidity-27B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tok = AutoTokenizer.from_pretrained("samscrack/Qwen3.6-Solidity-27B")

# Spec → contract + tests
spec = (
    "Implement a Solidity contract that holds a mapping from address to uint256 "
    "balance. Owner can mint to any address. Anyone can transfer their balance to "
    "another address. Include a Foundry test suite covering happy paths and the "
    "owner-only invariant.\n\nProduce both the Solidity contract and a Foundry "
    "test suite that exercises it."
)
msgs = [{"role": "user", "content": spec}]

# Render the chat template with thinking enabled (emits a <think> prefix).
inputs = tok.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=True,
)
toks = tok(inputs, return_tensors="pt").to(model.device)
out = model.generate(
    **toks, max_new_tokens=4096, temperature=0.7, top_p=0.9, do_sample=True,
)

# Decode only the newly generated tokens.
text = tok.decode(out[0][toks.input_ids.shape[-1]:], skip_special_tokens=True)
print(text)
```

The generated assistant turn has this shape:

````
<think>...short design rationale...</think>
```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.x;
contract MyContract { ... }
```

```solidity
// test/Contract.t.sol
import "forge-std/Test.sol";
import "../src/Contract.sol";
contract MyContractTest is Test { ... }
```
````
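
Continuing from the usage snippet, a minimal sketch for splitting that turn into
the two files. The regex and destination paths assume this exact two-block
envelope and a default Foundry layout:

```python
import re
from pathlib import Path

def extract_pair(turn: str) -> tuple[str, str]:
    """Return (contract_src, test_src): the first and second fenced Solidity blocks."""
    blocks = re.findall(r"```solidity\n(.*?)```", turn, flags=re.DOTALL)
    if len(blocks) < 2:
        raise ValueError("expected a contract block and a test block")
    return blocks[0].strip(), blocks[1].strip()

contract_src, test_src = extract_pair(text)  # `text` from the usage snippet
Path("src").mkdir(exist_ok=True)
Path("test").mkdir(exist_ok=True)
Path("src/Contract.sol").write_text(contract_src + "\n")
Path("test/Contract.t.sol").write_text(test_src + "\n")
```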

## Format envelope

The model was trained on the canonical ```` <think>...</think>\n```solidity\n{contract}\n```\n\n```solidity\n// test/Contract.t.sol\n{tests}\n``` ```` envelope (shown rendered above).
The most reliable way to reproduce it is to end the user prompt with:
*"Produce both the Solidity contract and a Foundry test suite that exercises it."*

## Training infrastructure

- 2× NVIDIA RTX PRO 6000 Blackwell Workstation GPUs (96 GB each)
- Trainer: TRL 0.22 + Unsloth 2026.4.7 + PyTorch 2.8.0 (CUDA 12.8)
- Inference (sampling for Stage 4 RFT): vLLM 0.19.1 with FP8 dynamic quantization,
  the FLASH_ATTN backend, and the Qwen3 reasoning parser

## Citation

```bibtex
@misc{qwen3.6-solidity-27b,
  author    = {Sam Crack (samscrack)},
  title     = {Qwen 3.6 Solidity (27B): a 5-stage CPT/SFT/RFT recipe for
               Foundry-compilable Solidity codegen},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/samscrack/Qwen3.6-Solidity-27B}
}
```

## License

Apache-2.0 (this checkpoint). The underlying training data comes from CC-BY/MIT-tier
sources; teacher reasoning content (Stage 2 + Stage 3) was generated under the
Anthropic API terms of use as of the generation date (2026-05-04). The eval set
`samscrack/solidity-eval-2026` is NOT used at any training stage.