# Qwopus3.6-27B-solidity-sft-stage1B

> ⚠️ **Intermediate checkpoint (Stage 1 of 5).** This is the spec→contract instruction-following LoRA, layered on top of the Stage 0 CPT adapter. It is not intended for direct production use: audit/reasoning capability and the last-mile pass-rate improvements come from Stages 2-4. Use the final-stage output for actual deployment.

A LoRA (r=64) adapter on top of Qwopus3.6-27B-solidity-cpt-merged (the merged Stage 0 CPT base) that teaches the model to translate natural-language specs into idiomatic Solidity contracts and, optionally, accompanying Foundry test suites.

## Pipeline context

| # | Stage | Status | Output |
|---|-------|--------|--------|
| 0 | Continued pretraining (DoRA on Solidity corpus) | ✅ done | `Qwopus3.6-27B-solidity-cpt-stageA` |
| 1 | SFT (instruction): spec → contract | ✅ done | this repo |
| 2 | SFT (audit / Long-CoT reasoning) | 🟡 in progress | TBD |
| 3 | RFT (rejection-sampling FT against `forge test`) | ⬜ planned | TBD |
| 4 | GSPO (sequence-level RL with executor reward) | ⬜ planned | TBD |

Stage 1B is the instruction-following head: it makes the Solidity-pretrained base actually respond to user prompts in the qwen3-thinking chat-template format. The `<think>...</think>` block is intentionally left empty at this stage; per-token reasoning supervision comes in Stage 2.
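Concretely, a Stage 1B training example rendered with that template looks roughly like the block below (illustrative only; exact whitespace depends on the tokenizer's template). Loss is computed only on the assistant tokens after the `<think>` prefix:

```text
<|im_start|>user
Write a minimal Ownable vault in Solidity 0.8.x.<|im_end|>
<|im_start|>assistant
<think>

</think>

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Vault { ... }<|im_end|>
```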

## Training data

After ruthless quality filtering (strict pragma ≥ 0.7, license-clean, no GPT-3.5 teachers, forge-verified Opus rows only):

| Source | Rows | Role |
|--------|------|------|
| `braindao/solidity-base-sft-v2` (filtered) + `lohoz/Smart-Contract-MultiTask-Dataset` [`requirement_fsm_code`] | 65,100 | spec → contract, pragma ≥ 0.7 |
| Opus 4.7-synthesized (contract, Foundry test) pairs that compile AND pass `forge test` | 4,919 | spec → contract + test suite |
| **Total pool** | **70,019** | sampled to 14,000 → 12,796 after ctx ≤ 8192 filter |
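The pragma gate is simple in spirit. A minimal sketch of that kind of filter (a hypothetical helper, not the actual filtering script):

```python
import re

# First version number after 'pragma solidity', skipping ^, >=, etc.
PRAGMA_RE = re.compile(r"pragma\s+solidity\s+\D*(\d+)\.(\d+)")

def passes_pragma_gate(source: str, min_minor: int = 7) -> bool:
    """Keep a row only if its first solidity pragma is 0.7 or newer."""
    m = PRAGMA_RE.search(source)
    if m is None:
        return False  # no pragma at all -> drop the row
    major, minor = int(m.group(1)), int(m.group(2))
    return (major, minor) >= (0, min_minor)

assert passes_pragma_gate("pragma solidity ^0.8.20;")
assert not passes_pragma_gate("pragma solidity 0.6.12;")
```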

The 4,919 Opus rows were re-verified after a multi-solc patch to `verify_synth.py`: the initial run verified only 384 rows because of a hard-coded ≥ 0.8.13 preflight check; the patch accepts any 0.8.x compiler and rewrites plain pragmas to `^0.8.0`.
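A sketch of what that pragma rewrite might look like (the actual `verify_synth.py` is not published, so the helper name here is an assumption):

```python
import re

# Matches exact-version pragmas such as 'pragma solidity 0.8.13;'
FIXED_08_PRAGMA = re.compile(r"pragma\s+solidity\s+0\.8\.\d+\s*;")

def normalize_pragma(source: str) -> str:
    """Rewrite a pinned 'pragma solidity 0.8.x;' to the caret range
    '^0.8.0' so any installed 0.8.x solc can compile the contract."""
    return FIXED_08_PRAGMA.sub("pragma solidity ^0.8.0;", source)

print(normalize_pragma("pragma solidity 0.8.13;"))
# pragma solidity ^0.8.0;
```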

Training set: 12,796 rows of natural-language Solidity instructions paired with verified contract outputs.

## Recipe (Jack's recipe, Stage 1B)

- LoRA: r=64, α=64, dropout=0
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `out_proj`
- Trainable parameters: 353,370,112 (~2.34% of the effective 15B param count)
- Quantization: base loaded in 4-bit (BnB NF4, QLoRA-style); adapter weights in bf16
- Effective batch size: 72 (4 per device × 9 grad accum × 2 GPUs)
- Sequence length: 8,192
- Optimizer: 8-bit AdamW (`adamw_8bit`), weight decay 0.001
- Learning rate: 2e-4, linear schedule, 5% warmup
- Epochs: 1
- Total steps: 178
- Chat template: qwen3-thinking (with `<think></think>` left empty for this stage)
- Loss masking: `train_on_responses_only` (loss only on assistant tokens after `<|im_start|>assistant\n<think>`); see the sketch below
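Put together, the recipe corresponds roughly to the Unsloth + TRL setup below. This is a sketch reconstructed from the hyperparameters above, not the actual training script; the merged-base repo id and the placeholder dataset are assumptions:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import train_on_responses_only

# Stage 0 CPT-merged base, loaded 4-bit (QLoRA-style NF4).
# Repo id assumed; substitute your own merge of the Stage 0 adapter.
model, tokenizer = FastLanguageModel.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-cpt-merged",
    max_seq_length=8192,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "out_proj"],
)

# Placeholder: the real run used 12,796 rows pre-rendered with the
# qwen3-thinking chat template into a 'text' field.
train_dataset = Dataset.from_list([{"text": "<|im_start|>user\n...<|im_end|>\n"}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=9,  # 4 x 9 x 2 GPUs = effective batch 72
        num_train_epochs=1,
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        warmup_ratio=0.05,
        optim="adamw_8bit",
        weight_decay=0.001,
        bf16=True,
    ),
)

# Mask the loss so only assistant-response tokens contribute.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)
trainer.train()
```

With 12,796 rows at an effective batch of 72, one epoch is ceil(12,796 / 72) = 178 optimizer steps, which matches the reported total.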

## Training metrics

- Wall time: 6h 48m
- Train loss: 0.367 → 0.289 (final); minimum 0.223
- Hardware: 2× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB each)
- Distributed: DDP via `torchrun --nproc-per-node=2`
- Framework: Unsloth 2026.4.7 with TRL 0.22.2

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stage 0 CPT base: if you only have the Stage 0 adapter, merge it into
# Qwen/Qwen3.6-27B first, then load the merged checkpoint here.
base = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-cpt-stageA",
    torch_dtype="bfloat16", device_map="auto",
)
model = PeftModel.from_pretrained(base, "samscrack/Qwopus3.6-27B-solidity-sft-stage1B")
tokenizer = AutoTokenizer.from_pretrained("samscrack/Qwopus3.6-27B-solidity-sft-stage1B")

messages = [{"role": "user", "content":
    "Implement an ERC-20 token with a 1% transfer tax that goes to a treasury address. "
    "Include events, ownership, and Solidity 0.8.20 syntax."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```
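Since the Opus rows were forge-verified, it's natural to round-trip the generation through Foundry as well. A minimal continuation of the snippet above (assumes an initialized Foundry project and `forge` on PATH; the fence-extraction helper and the output file name are hypothetical):

```python
import pathlib
import re
import subprocess

def extract_solidity(decoded: str) -> str:
    """Extract the first fenced Solidity code block from the model output,
    falling back to the raw text if the model emitted bare code."""
    m = re.search(r"```solidity\n(.*?)```", decoded, re.DOTALL)
    return m.group(1) if m else decoded

source = extract_solidity(tokenizer.decode(out[0], skip_special_tokens=True))
pathlib.Path("src/TaxedToken.sol").write_text(source)
subprocess.run(["forge", "build"], check=True)  # compile-check with Foundry
```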

## Limitations

- No reasoning depth. The `<think>` block is empty by design for Stage 1; expect direct contract emission with no chain-of-thought. Stage 2 adds Long-CoT.
- No security analysis. Stage 1 doesn't audit, doesn't flag vulnerabilities, and doesn't reason about MEV or reentrancy. Stage 2 does.
- No `forge test` validation in the loss. Stage 1 uses next-token cross-entropy only; RFT (Stage 3) introduces test-pass as a reward signal.
- Solidity ≥ 0.7 bias (the training data was strict-filtered), so the model will be weaker on 0.4/0.5/0.6 idioms.
- Audit / contest authorship not validated for this stage's data: sources are spec → contract instruction pairs, not audit findings.

## Related artifacts

- `samscrack/Qwopus3.6-27B-solidity-cpt-stageA`: Stage 0 CPT adapter (DoRA on the Solidity corpus)
- `Qwen/Qwen3.6-27B`: upstream base model
- `braindao/solidity-base-sft-v2` and `lohoz/Smart-Contract-MultiTask-Dataset`: source datasets for Stage 1B

## Citation

```bibtex
@misc{qwopus3-6-27b-solidity-sft-stage1b-2026,
  author       = {samscrack},
  title        = {Qwopus3.6-27B-solidity-sft-stage1B: Stage 1 instruction LoRA on the Solidity-pretrained base},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-sft-stage1B}},
}
```