Qwopus3.6-27B-solidity-cpt-stageA

⚠️ Intermediate checkpoint (Stage 0 of 5). This is a continued-pretraining (CPT) LoRA adapter, the first stage of a multi-stage Solidity specialization pipeline. Not intended for direct production use. It learns Solidity-language priors but has not yet been instruction-tuned. Use the downstream stage outputs for actual deployment.

A LoRA adapter on top of Qwopus3.6-27B-stage1-merged (a 27 B Qwen3.6 fine-tune) that biases the model's distribution toward modern Solidity smart contracts.

Pipeline context

Stage                                  | Adapter                  | Status                                                  | Output
Stage 0 – CPT (this card)              | plain LoRA (r=64, α=64)  | ✅ complete                                             | LoRA adapter; biases the LM toward Solidity priors
Stage 1 – Instruction SFT              | plain LoRA (r=64, α=64)  | ✅ complete → Qwopus3.6-27B-solidity-sft-stage1B        | spec → contract + Foundry tests capability
Stage 2 – Audit / reasoning SFT        | plain LoRA (r=16, α=16)  | ✅ complete → Qwopus3.6-27B-solidity-audit-stage2       | vulnerability detection + Long-CoT audit reasoning, trained on solidity-audit-cot (6,140 Opus 4.7 traces fit at 8K ctx)
Stage 3 – RFT (rejection-sampling FT)  | plain LoRA (r=64, α=64)  | ⬜ pending                                              | functional-correctness boost via Foundry test execution
Stage 4 – GSPO (RL)                    | plain LoRA (r=64, α=64)  | ⬜ optional                                             | last-mile pass-rate improvement

ℹ️ None of the stages use DoRA. DoRA was empirically benchmarked at Stage 0 and dropped: use_dora=True measured ~280 s/step on this stack vs ~180 s/step for plain LoRA (plain LoRA cuts step time by ≈36 %) for an expected quality gain of under one point at this scale. The legacy directory name qwen_dora_cpt/ and some internal references are historical artefacts only; the saved weights are plain LoRA. DoRA may be revisited at Stage 4 (GSPO), where step time matters less.

This adapter is the foundation; subsequent stages stack on top after merge.

Model details

  • Base model: Qwopus3.6-27B-stage1-merged (51 GB bf16, derived from Qwen/Qwen3.6-27B)
  • Adapter type: plain LoRA (PEFT), not DoRA (use_dora=False). See the DoRA decision in Limitations for the benchmark that drove this choice.
  • LoRA configuration (a config sketch follows this list):
    • rank: 64
    • alpha: 64
    • target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, out_proj
    • dropout: 0.0
    • bias: none
  • Trainable parameters: 353,370,112 (~2.34% of effective 15B-equivalent param count)
  • Quantization: base loaded in 4-bit (BnB NF4) for QLoRA-style training; adapter weights are bf16
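
For orientation, the settings above correspond to a standard PEFT LoraConfig roughly as sketched below. This is illustrative only: the actual run used Unsloth's wrapper around the same values, and base_model is a stand-in for the 4-bit-loaded base.

from peft import LoraConfig, get_peft_model

# Stage 0 adapter settings expressed as a vanilla PEFT config
# (the run itself used Unsloth's wrapper around the same values).
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "out_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    use_dora=False,          # plain LoRA; see the DoRA note above
    task_type="CAUSAL_LM",
)

# Applied to the (already 4-bit loaded) base model:
# model = get_peft_model(base_model, lora_cfg)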

Training data

A composite Solidity corpus, quality-filtered from public sources:

Source | License | Approx. tokens (post-filter)
ASSERT-KTH/DISL (decomposed split) | CC BY 4.0 | ~76M (top 10 % by composite quality score)
GitHub blue-chip protocols (~30 repos): OpenZeppelin, Uniswap v2/v3/v4, Aave v3, Compound, Morpho, EigenLayer, Pendle, Seaport, Solady, LayerZero, ENS, Optimism, Arbitrum, Polygon L1, etc. | mostly MIT/Apache | included in the 76M

Quality filter (composite 0–115 score)

Heuristics applied to score every raw row before tokenization:

Signal                                    | Points
Pragma 0.8.20+ (modern compiler)          | +30
Pragma 0.8.13–19                          | +20
Pragma 0.8.x (any)                        | +10
Has SPDX license header                   | +10
Has @notice / @dev NatSpec                | +10
Has @param / @return NatSpec              | +10
Comment density 5–30 %                    | +10
Uses custom errors (error X();)           | +5
Uses unchecked blocks                     | +5
No SafeMath import (avoids old patterns)  | +5
Size 500–8000 chars                       | +10
Source: GitHub blue-chip                  | +20

Only rows scoring ≥ 55 were kept (top 10 %, ~23,487 unique source files / ~89.6M tokens). EIP-1167 minimal proxies and contracts whose relative imports could not be resolved were dropped before scoring.
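
A condensed Python sketch of the composite score is shown below. The exact regexes and the comment-density definition used in the real pipeline are not published, so the patterns here are approximations for illustration.

import re

def quality_score(src: str, from_bluechip: bool = False) -> int:
    """Approximate reconstruction of the 0-115 composite quality score."""
    score = 0
    # Pragma tiers are mutually exclusive; newer compilers score higher.
    m = re.search(r"pragma solidity\s*[\^>=]*\s*0\.8\.(\d+)", src)
    if m:
        patch = int(m.group(1))
        score += 30 if patch >= 20 else 20 if patch >= 13 else 10
    score += 10 if "SPDX-License-Identifier" in src else 0
    score += 10 if ("@notice" in src or "@dev" in src) else 0
    score += 10 if ("@param" in src or "@return" in src) else 0
    # Comment density: share of characters on comment-looking lines (approximation).
    comment_chars = sum(len(line) for line in src.splitlines()
                        if line.lstrip().startswith(("//", "/*", "*")))
    if 0.05 <= comment_chars / max(len(src), 1) <= 0.30:
        score += 10
    score += 5 if re.search(r"\berror\s+\w+\s*\(", src) else 0   # custom errors
    score += 5 if "unchecked" in src else 0
    score += 5 if "SafeMath" not in src else 0                   # avoids old patterns
    score += 10 if 500 <= len(src) <= 8000 else 0
    score += 20 if from_bluechip else 0
    return score

# Rows with quality_score(src) >= 55 survive the filter.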

Pre-processing

  • Cross-source SHA256 deduplication (whitespace-collapsed)
  • Tokenized with the Qwen3.6 tokenizer (vocab 152 k)
  • Megatron-style packing into fixed 8192-token sequences with EOS separators (sketched after this list)
  • 10,938 packed sequences total (~89.6M effective tokens)
  • 1,500 sequences (~12M tokens) used for this Stage A run
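
The dedup-and-pack step might look roughly like the sketch below. The tokenizer repo name and how the ragged tail is handled are assumptions; only the whitespace-collapsed SHA256 dedup, the EOS separators, and the fixed 8192-token blocks come from the list above.

import hashlib
from transformers import AutoTokenizer

SEQ_LEN = 8192
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")  # assumed tokenizer source

def dedup(sources):
    """Cross-source exact dedup on whitespace-collapsed SHA256."""
    seen, unique = set(), []
    for src in sources:
        key = hashlib.sha256(" ".join(src.split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(src)
    return unique

def pack(sources):
    """Megatron-style packing: EOS-separated token stream sliced into fixed 8192-token blocks."""
    stream = []
    for src in dedup(sources):
        stream += tok(src, add_special_tokens=False)["input_ids"] + [tok.eos_token_id]
    usable = (len(stream) // SEQ_LEN) * SEQ_LEN          # drop the ragged tail (assumption)
    return [stream[i:i + SEQ_LEN] for i in range(0, usable, SEQ_LEN)]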

Training procedure

  • Framework: Unsloth 2026.4.7 with UnslothTrainer (decoupled embedding LR support, 4-bit QLoRA kernels)
  • Hardware: 2Γ— NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB each)
  • Distributed: DDP via torchrun --nproc-per-node=2 (find_unused_parameters=True)
  • Precision: bf16 forward + adapter, NF4 base
  • Optimizer: 8-bit AdamW (adamw_8bit)
  • Gradient checkpointing: Unsloth custom

Hyperparameters

Hyperparameter                  | Value
Loss                            | plain causal LM (next-token prediction across all tokens)
max_seq_length                  | 8192
Packing                         | dense, EOS-separated, Megatron-style
per_device_train_batch_size     | 2
gradient_accumulation_steps     | 9
Effective batch                 | 36 sequences ≈ 295k tokens per optimizer step
Epochs                          | 1
Total training steps            | ~42 (Stage A)
Learning rate (adapters)        | 5 × 10⁻⁵
LR scheduler                    | cosine
Warmup ratio                    | 0.03
Weight decay                    | 0.01
Seed                            | 3407
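
Wired into Unsloth, these hyperparameters would look roughly like the sketch below, following the pattern of Unsloth's continued-pretraining examples. model, tok, and packed_ds (the 1,500 pre-packed sequences exposed under a "text" field) are assumed to be prepared already, and the script name is a placeholder.

# Launched with: torchrun --nproc-per-node=2 train_cpt_stageA.py   (DDP over both GPUs; script name is a placeholder)
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model=model,                        # 4-bit base with the r=64 LoRA attached
    tokenizer=tok,
    train_dataset=packed_ds,            # 1,500 pre-packed 8192-token sequences
    dataset_text_field="text",
    max_seq_length=8192,
    args=UnslothTrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=9,  # 2 seqs x 2 GPUs x 9 = 36-sequence effective batch
        num_train_epochs=1,
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        weight_decay=0.01,
        optim="adamw_8bit",
        bf16=True,
        seed=3407,
        logging_steps=1,                # every step logged, as in the loss curve below
        output_dir="qwen_dora_cpt",     # legacy directory name; weights are plain LoRA
    ),
)
trainer.train()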

Training results

Loss curve (every step logged):

step  | loss   | grad_norm | LR (e-5)
   1  | 0.5805 |   0.80    |  0.00 (warmup)
   2  | 0.6304 |   0.95    |  2.50
   3  | 0.6687 |   1.09    |  5.00 (peak)
   4  | 0.578  |   0.76    |  4.99
   5  | 0.4751 |   0.21    |  4.97
   6  | 0.5706 |   0.27    |  4.93
   7  | 0.4578 |   0.14    |  4.88
   8  | 0.4479 |   0.24    |  4.81
   9  | 0.4039 |   0.13    |  4.73
  10  | 0.3847 |   0.09    |  4.63  ← first checkpoint saved
  ...

Loss dropped from a peak of 0.6687 at step 3 (the end of LR warmup) to a sustained ~0.36–0.41 plateau by step 10, indicating successful adaptation to the Solidity distribution. The full curve is available in the wandb run.

Intended use

This adapter alone is NOT a deployment-ready model. It biases token-level prediction toward Solidity but is not yet instruction-tuned. Direct use of just this CPT adapter will produce continuation-style Solidity text but won't reliably follow specs or produce complete contracts on demand.

The intended downstream usage is:

  1. Merge this adapter onto the base model (a merge sketch follows at the end of this section)
  2. Run Stage 1 SFT (instruction tuning on spec → contract pairs)
  3. Run Stage 2 SFT (audit / reasoning)
  4. Optionally run Stage 3 RFT and Stage 4 GSPO with executor-based reward (Foundry test-pass rate)
  5. Evaluate on SolBench / HumanEval-Solidity / SB-Heist

Beyond that pipeline, this checkpoint is suitable for research, ablation studies, or as a starting point for your own Solidity fine-tune.
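
Step 1 (merging this CPT adapter into the base before Stage 1 SFT) can be done with PEFT's merge_and_unload. A minimal sketch follows; loading in bf16 rather than 4-bit for the merge, and the output path, are assumptions.

from unsloth import FastLanguageModel
from peft import PeftModel

base, tok = FastLanguageModel.from_pretrained(
    "samscrack/Qwopus3.6-27B-stage1-merged",
    max_seq_length=8192,
    load_in_4bit=False,                 # merge in full precision, not NF4
)
merged = PeftModel.from_pretrained(base, "samscrack/Qwopus3.6-27B-solidity-cpt-stageA")
merged = merged.merge_and_unload()      # folds the LoRA deltas into the base weights
merged.save_pretrained("Qwopus3.6-27B-solidity-cpt-merged")
tok.save_pretrained("Qwopus3.6-27B-solidity-cpt-merged")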

Limitations and caveats

  • No benchmark scores yet. Evaluation runs after the full pipeline completes.
  • Solidity 0.8.x bias. The quality filter excluded pre-0.8 code and patch versions above 0.8.26, so the model will be weaker on Solidity 0.4–0.7 idioms.
  • Embeddings + lm_head not adapted. To keep step time tractable on dual-Blackwell, the LoRA target_modules omit embed_tokens and lm_head (each ~780 M params on 27B). This means the model's output token distribution shifts less than a full-CPT would; it gains internal Solidity priors but doesn't relearn token-level emission probabilities.
  • DoRA decision (Stage 0): DoRA (use_dora=True) was benchmarked head-to-head and dropped in favour of plain LoRA. DoRA measured ~280 s/step on this stack vs ~180 s/step for plain LoRA (a ≈36 % step-time saving for plain LoRA) for an expected quality gain of <1 pt at the LoRA rank we use (r=64). Stages 0–3 all use plain LoRA. DoRA may be re-evaluated at Stage 4 (GSPO RL), where wall-clock cost is amortised over fewer optimiser steps.
  • License compatibility. The base inherits Apache-2.0 from Qwen. Training data: DISL is CC BY 4.0; the GitHub blue-chips are mostly MIT/Apache. Some Etherscan-derived source has an unspecified license; restrict use to research/personal contexts unless you have done your own license review.

How to use the adapter

from unsloth import FastLanguageModel  # import unsloth before peft/transformers so its patches apply
from peft import PeftModel

model, tok = FastLanguageModel.from_pretrained(
    "samscrack/Qwopus3.6-27B-stage1-merged",   # or your local merged base
    max_seq_length=8192,
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(
    model,
    "samscrack/Qwopus3.6-27B-solidity-cpt-stageA",
)
FastLanguageModel.for_inference(model)

# Note: this is the CPT adapter; it produces Solidity-flavoured text,
# but for instruction-following you want the Stage 1 SFT adapter.
inputs = tok(
    "// SPDX-License-Identifier: MIT\npragma solidity ^0.8.20;\n\n",
    return_tensors="pt"
).to("cuda")
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0]))

Citation

@misc{qwopus_solidity_cpt_stageA_2026,
  title  = {{Qwopus3.6-27B-solidity-cpt-stageA}: a continued-pretraining LoRA adapter for Solidity smart-contract LLM specialization},
  author = {samscrack},
  year   = {2026},
  url    = {https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-cpt-stageA},
  note   = {Stage 0 of a 5-stage Solidity specialization pipeline.}
}

Acknowledgments

  • Unsloth for the LoRA / QLoRA training kernels and UnslothTrainer
  • ASSERT-KTH for the DISL dataset
  • Jackrong-llm-finetuning-guide for the existing 27B fine-tune recipe this work builds on
  • The OpenZeppelin, Uniswap, Aave, Compound, Morpho, EigenLayer, Pendle, Solady, Seaport, LayerZero, ENS, Optimism, Arbitrum, and Polygon teams whose audited public source forms the GitHub blue-chip slice

Changelog

  • 2026-05-03: Initial Stage A adapter (1,500 packed sequences, 1 epoch, plain LoRA r=64). Loss curve shows healthy convergence from a warmup peak of 0.67 to a 0.36–0.41 plateau by step 10.