Qwopus3.6-27B-solidity-audit-stage2

⚠️ Intermediate checkpoint — Stage 2 of 5. This is the audit / Long-CoT reasoning LoRA, layered on top of the Stage 1B SFT base. Not yet RFT-tuned — Stage 3 (rejection-sampling fine-tuning against forge test) and the optional Stage 4 (GSPO RL with executor reward) come next.

A LoRA r=16 adapter on top of Qwopus3.6-27B-solidity-sft1B-merged (Stage 1B merged into bf16 base) that teaches the model to:

  • read a Solidity contract end-to-end
  • emit a long chain-of-thought (8-15 substantive paragraphs) in <think>...</think>
  • output a structured finding (severity, category, location, description, impact, fix, confidence)
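
For concreteness, a hypothetical finding carrying those fields might look like this (all values invented for illustration; the model actually renders the finding as a markdown report, not JSON):

```python
# Illustrative schema only; the model emits these fields as markdown, not JSON.
finding = {
    "severity": "high",
    "category": "reentrancy",
    "location": "Vault.withdraw (lines 41-58)",
    "description": "External call to msg.sender before the balance is zeroed.",
    "impact": "An attacker can recursively drain the vault's ETH balance.",
    "fix": "Zero the balance before the external call, or add a nonReentrant guard.",
    "confidence": "high",
}
```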

Pipeline context

| # | Stage | Status | Output |
|---|-------|--------|--------|
| 0 | Continued pretrain (DoRA on Solidity corpus) | ✅ done | Qwopus3.6-27B-solidity-cpt-stageA |
| 1 | SFT (instruction): spec → contract | ✅ done | Qwopus3.6-27B-solidity-sft-stage1B |
| 2 | SFT (audit / Long-CoT reasoning) | ✅ done | this repo |
| 3 | RFT (rejection-sampling FT against forge test) | ⬜ planned | TBD |
| 4 | GSPO (sequence-level RL with executor reward) | ⬜ optional | TBD |

Training data

  • Corpus: samscrack/solidity-audit-cot — Long-CoT audit traces generated by Claude Opus 4.7 (adaptive thinking, xhigh effort) over real Solidity contracts.
  • Strict-quality cleaned: 7,707 rows. 8K-friendly subset for this stage: 6,140 rows (filtered to ≤30,000 chars total, so each example fits in the 8K context). The remaining 1,567 long-tail rows are preserved in audit_cot_clean_long_tailed.jsonl for a future longer-context Stage 2 re-run.
  • After ctx-filter at training time: 6,106 rows (34 marginal rows dropped).
  • Severity distribution (full clean corpus): 33% high · 25% medium · 14% low · 14% informational · 14% none. The none rows are deliberate — they teach the model not to fabricate vulnerabilities.

Each row's user turn is the contract code with an "Audit this contract" instruction. The assistant turn is the reasoning paragraphs (typically around a dozen) joined inside <think>...</think>, followed by a structured finding rendered as markdown.
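
A minimal sketch of that row-to-example mapping, including the 30,000-char cut (the field names contract_source, reasoning_paragraphs, and finding_markdown are assumptions, not the dataset's documented schema):

````python
# Sketch: turn one corpus row into a qwen3-thinking training example.
# Field names here are assumptions; check the dataset card for the real schema.
def row_to_messages(row: dict) -> list[dict] | None:
    user = f"Audit this contract\n\n```solidity\n{row['contract_source']}\n```"
    think = "\n\n".join(row["reasoning_paragraphs"])  # typically ~12 paragraphs
    assistant = f"<think>{think}</think>\n\n{row['finding_markdown']}"
    if len(user) + len(assistant) > 30_000:  # 8K-ctx friendliness filter
        return None
    return [{"role": "user", "content": user},
            {"role": "assistant", "content": assistant}]
````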

Recipe (Stage 2 specifics, per SOTA-plan §2)

  • LoRA: r=16, α=16, dropout=0 (smaller than Stage 1B's r=64 — layered on, not replacing)
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, out_proj
  • Trainable parameters: 88,342,528 (~0.60% of effective 14.8B param count)
  • Quantization: base loaded in 4-bit (BnB NF4); adapter weights bf16
  • Effective batch size: 72 (4 per device × 9 grad accum × 2 GPUs)
  • Sequence length: 8,192
  • Optimizer: 8-bit AdamW, weight decay 0.001
  • Learning rate: 1e-4 (half of Stage 1B's 2e-4 — finer adjustment on top of trained base)
  • Epochs: 1
  • Total steps: 85
  • Chat template: qwen3-thinking (with <think> filled by real Opus-derived reasoning)
  • Loss masking: train_on_responses_only (loss only on assistant tokens after <|im_start|>assistant\n<think>)
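
A sketch of how those settings map onto peft / bitsandbytes / TRL (the run itself used Unsloth's wrappers, so treat the names below as the closest standard equivalents rather than a verbatim copy of the training script):

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig

# Base model loads in 4-bit NF4; the adapter itself trains in bf16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

# Stage 2 adapter: deliberately smaller rank than Stage 1B's r=64.
lora = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "out_proj"],
    task_type="CAUSAL_LM",
)

# Effective batch 72 = 4 per device x 9 grad accum x 2 GPUs (DDP).
# Sequence length 8192 (the SFTConfig argument name for it varies by TRL version).
args = SFTConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=9,
    learning_rate=1e-4,
    weight_decay=0.001,
    optim="adamw_bnb_8bit",
    lr_scheduler_type="linear",
    num_train_epochs=1,
    bf16=True,
)

# Loss masking: Unsloth's train_on_responses_only wraps an SFTTrainer so only
# assistant tokens carry loss, e.g.
# trainer = train_on_responses_only(trainer,
#     instruction_part="<|im_start|>user\n",
#     response_part="<|im_start|>assistant\n")
```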

Training metrics

  • Wall time: 3h 48m
  • Train loss: 1.21 (start) → ~0.99 (min) → 1.026 (final under linear decay)
  • Hardware: 2× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB each)
  • Distributed: DDP via torchrun --nproc-per-node=2
  • Framework: Unsloth 2026.4.7 with TRL 0.22.2

Usage

````python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Stage-1B-merged base, then layer the Stage 2 LoRA on top.
base = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-sft-stage1B",  # merge with the CPT base first locally (see the sketch below)
    torch_dtype="bfloat16", device_map="auto",
)
model = PeftModel.from_pretrained(base, "samscrack/Qwopus3.6-27B-solidity-audit-stage2")
tokenizer = AutoTokenizer.from_pretrained("samscrack/Qwopus3.6-27B-solidity-audit-stage2")

with open("MyContract.sol") as f:
    contract = f.read()

messages = [{"role": "user", "content":
    "Audit the following Solidity contract for security issues. Identify the most "
    "impactful vulnerability (or 'none' if the contract is clean), reason step-by-step "
    "about your analysis, then output a structured finding with severity, category, "
    "location, description, impact, and a concrete fix.\n\n"
    f"```solidity\n{contract}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=False))
````
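
The from_pretrained call above assumes you have already merged Stage 1B into the CPT base. Assuming the Stage 1B repo ships as a LoRA adapter (which is what the inline comment implies), a one-time local merge might look like this; the output directory name is illustrative:

```python
# One-time local merge: bake the Stage 1B LoRA into the Stage 0 CPT base so
# the Stage 2 adapter sits on the bf16 base it was trained against.
# The output path is illustrative; point the snippet above at it afterwards.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

cpt = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-cpt-stageA",
    torch_dtype="bfloat16", device_map="auto",
)
sft = PeftModel.from_pretrained(cpt, "samscrack/Qwopus3.6-27B-solidity-sft-stage1B")
merged = sft.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("./Qwopus3.6-27B-solidity-sft1B-merged")
AutoTokenizer.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-sft-stage1B"
).save_pretrained("./Qwopus3.6-27B-solidity-sft1B-merged")
```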

The output will contain a <think> block with multi-paragraph reasoning followed by a markdown audit report.
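
If you want the report without the chain-of-thought, a small post-processing sketch (continuing from the Usage snippet; it assumes the think block appears exactly once, which matches the training format):

```python
import re

# Continues the Usage snippet: `out` and `tokenizer` are defined above.
text = tokenizer.decode(out[0], skip_special_tokens=False)
match = re.search(r"<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
reasoning, report = match.groups() if match else ("", text)
print(report)  # the markdown finding: severity, category, location, ...
```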

Limitations

  • Single-finding output by design. Each response identifies the one most-impactful issue. Multi-finding audits would require either prompting it to "find all issues" or sampling it several times at different temperatures and aggregating the results (see the sketch after this list).
  • Synthetic teacher. Reasoning style and breadth reflect Claude Opus 4.7's idiom, which the model was trained to imitate; output quality is bounded by Opus's audit ceiling.
  • No forge test validation in the loss. This stage uses next-token cross-entropy on Opus traces. Stage 3 RFT introduces test-pass as a reward signal — that's where functional correctness gets enforced.
  • Long contracts truncated. Training set was filtered to contracts that fit in 8K tokens. Contracts >~7,500 tokens may produce shorter or less coherent reasoning at inference time.
  • Solidity ≥ 0.7 bias (inherited from Stage 1B's strict pragma filter).
  • Findings can be wrong. Like any synthetic-teacher SFT, this model can produce confident-sounding false positives (flagging non-bugs) or miss real issues. Use as a triage assistant, not a substitute for human audit review.
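
As an example of the aggregation idea from the first limitation, a hypothetical triage loop (reusing model, inputs, and tokenizer from the Usage section; the temperatures and the dedup key are heuristics, not part of the training recipe):

```python
# Hypothetical multi-finding triage: sample at a few temperatures, dedup reports.
findings = {}
for temp in (0.3, 0.7, 1.0):
    out = model.generate(**inputs, max_new_tokens=4096,
                         do_sample=True, temperature=temp, top_p=0.95)
    text = tokenizer.decode(out[0], skip_special_tokens=False)
    report = text.split("</think>")[-1].strip()
    key = report.splitlines()[0] if report else ""  # crude key: report's first line
    findings.setdefault(key, report)

for report in findings.values():
    print(report)
    print("---")
```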

Related artifacts

  • samscrack/Qwopus3.6-27B-solidity-cpt-stageA (Stage 0 continued pretrain)
  • samscrack/Qwopus3.6-27B-solidity-sft-stage1B (Stage 1B instruction SFT)
  • samscrack/solidity-audit-cot (Stage 2 training corpus)

Citation

@misc{qwopus3-6-27b-solidity-audit-stage2-2026,
  author       = {samscrack},
  title        = {Qwopus3.6-27B-solidity-audit-stage2: Stage 2 audit/Long-CoT LoRA},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-audit-stage2}},
}