DeepSeek-R1-Distill-Qwen-7B — Smart Contract Vulnerability Detection

An RL fine-tuned version of DeepSeek-R1-Distill-Qwen-7B for detecting vulnerabilities in Solidity smart contracts. It was fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on the CGT (Consolidated Ground Truth) dataset.

Model Description

Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Fine-tuning method: GRPO + LoRA
Task: Smart contract vulnerability detection and classification
Developer: Nishant Pandav (@npanium)
Repository: https://github.com/npanium/smartcontracts-vulnerability-r1
License: MIT

What it does

Given a Solidity smart contract, the model:

1. Reasons through the code using chain-of-thought inside <think> tags
2. Determines whether the contract is vulnerable
3. Classifies the vulnerability by DASP category (1–9) and SWC ID (100–136)

Output format:

<think>
... reasoning about the contract ...
</think>
VULNERABLE: yes/no
DASP_CATEGORY: N
SWC_ID: NNN
EXPLANATION: ...
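
Because the answer follows a fixed line-based layout, it can be turned into structured data with a few regular expressions. The following is a minimal sketch, not the repository's own parser: the names `parse_response` and `Verdict` are hypothetical, and it assumes each field sits on its own line after the closing </think> tag.

```python
import re
from typing import Optional, TypedDict


class Verdict(TypedDict):
    # Hypothetical container for the four fields of the output format.
    vulnerable: Optional[bool]
    dasp_category: Optional[int]
    swc_id: Optional[int]
    explanation: Optional[str]


def parse_response(text: str) -> Verdict:
    """Extract the labelled fields; a missing field is returned as None."""
    # Keep only what follows the reasoning block, so labels quoted inside
    # the chain-of-thought are ignored. This also works when the opening
    # <think> tag is absent from the decoded text.
    answer = text.rsplit("</think>", 1)[-1]

    def field(name: str, pattern: str) -> Optional[str]:
        match = re.search(
            rf"^{name}:\s*({pattern})\s*$",
            answer,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        return match.group(1).strip() if match else None

    vulnerable = field("VULNERABLE", r"yes|no")
    dasp = field("DASP_CATEGORY", r"\d+")
    swc = field("SWC_ID", r"\d+")
    return {
        "vulnerable": None if vulnerable is None else vulnerable.lower() == "yes",
        "dasp_category": None if dasp is None else int(dasp),
        "swc_id": None if swc is None else int(swc),
        "explanation": field("EXPLANATION", r".+"),
    }


# Hand-written sample response for illustration, not real model output.
sample = (
    "<think>\nThe external call happens before the balance update.\n</think>\n"
    "VULNERABLE: yes\n"
    "DASP_CATEGORY: 1\n"
    "SWC_ID: 107\n"
    "EXPLANATION: The external call precedes the state update."
)
verdict = parse_response(sample)
```

Returning None for a missing field, rather than raising, makes it straightforward to count parse failures separately from wrong answers.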

Results

Evaluated on 1,478 held-out contracts from the CGT dataset:

Metric	Before (base)	After (fine-tuned)	Delta
Detection accuracy (Tier 1)	23.0%	74.3%	+51.3 pp
DASP category (Tier 2)	8.8%	11.3%	+2.5 pp
SWC ID (Tier 3)	3.0%	0.0%	-3.0 pp
Overall	11.6%	28.5%	+16.9 pp
Parse failure rate	40.2%	0.0%	-40.2 pp
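
The Overall row is numerically consistent with an unweighted mean of the three tier accuracies. That reading is inferred from the figures rather than stated in this card, so the check below is a sketch; the helper name `overall` is hypothetical.

```python
def overall(tier1: float, tier2: float, tier3: float) -> float:
    """Unweighted mean of the three tier accuracies, in percent."""
    return round((tier1 + tier2 + tier3) / 3, 1)


# Tier accuracies copied from the results table.
before = overall(23.0, 8.8, 3.0)
after = overall(74.3, 11.3, 0.0)
print(before, after)  # 11.6 28.5
```

Under this reading, the overall gain comes almost entirely from Tier 1, since Tier 2 moves by 2.5 points and Tier 3 falls to zero.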

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "npanium/deepseek-r1-qwen7b-smartcontract-grpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

contract = """
pragma solidity ^0.4.18;
contract Vulnerable {
    mapping(address => uint) public balances;

    function withdraw(uint _amount) public {
        if (balances[msg.sender] >= _amount) {
            msg.sender.call.value(_amount)();
            balances[msg.sender] -= _amount;
        }
    }
}
"""

prompt = f"""Analyze this Solidity smart contract for security vulnerabilities.
Think step by step inside <think> tags, then provide your assessment.

```solidity
{contract}```


Use this exact format:
VULNERABLE: yes/no
DASP_CATEGORY: [1-9]
SWC_ID: [100-136]
EXPLANATION: [one sentence]"""

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
)
if hasattr(inputs, "input_ids"):
    inputs = inputs.input_ids
inputs = inputs.to(model.device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=1024,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(
    outputs[0][inputs.shape[1]:],
    skip_special_tokens=True,
)
print(response)

Training Details

Dataset

CGT (Consolidated Ground Truth) — github.com/gsalzer/cgt

CGT consolidates 13 prior smart contract vulnerability datasets, with labels cross-validated across the source datasets.

Split	Examples
Train	5,910
Test (locked)	1,478

Training Hyperparameters

Parameter	Value
Training regime	bf16 mixed precision
Learning rate	5e-6
LoRA rank	16
LoRA alpha	32
LoRA target modules	q_proj, v_proj, k_proj, o_proj
Generations per prompt	8
Max completion length	1024
Gradient accumulation steps	8
Epochs	1
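
The "Generations per prompt" setting matters because GRPO scores each completion relative to the other completions sampled for the same prompt, rather than using a learned value function. A minimal sketch of that normalisation step follows. It assumes the population standard deviation and a small epsilon for numerical stability, details that vary between implementations and may differ from the training code used here; the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalise the rewards of one prompt's completions within their group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some libraries use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for the 8 completions of a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

# If every completion earns the same reward, all advantages are zero and
# the prompt contributes no learning signal.
flat = group_relative_advantages([1.0] * 8)
```

Completions that beat their group's mean get a positive advantage and the rest a negative one, so the update depends on the spread of rewards within each group.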

Hardware

GPU: NVIDIA A100 SXM4 80GB
Cloud provider: Fluence Network
Training duration: ~48 hours

Limitations

Outcome reward only. The reward function validates whether the final label is correct, not whether the reasoning is valid. The model may produce plausible-sounding analysis that doesn't actually justify the conclusion.

SWC ID regression. Post-training SWC ID accuracy dropped to zero. The model prioritised the higher-weighted binary detection reward at the expense of fine-grained weakness classification.
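
To make the two limitations above concrete, here is a sketch of what a tiered, outcome-only reward can look like. It is not the reward function used in training: the weights, the gating of the finer tiers on a correct detection, and the name `outcome_reward` are all hypothetical, chosen only to reflect the statement that binary detection carries the highest weight.

```python
from typing import Any, Dict

# Hypothetical weights; the card only says detection is weighted highest.
W_DETECT, W_DASP, W_SWC = 1.0, 0.5, 0.25


def outcome_reward(pred: Dict[str, Any], gold: Dict[str, Any]) -> float:
    """Score only the final labels; the reasoning text is never inspected."""
    if pred.get("vulnerable") is None:
        return 0.0  # unparseable output earns nothing
    if pred["vulnerable"] != gold["vulnerable"]:
        return 0.0
    reward = W_DETECT
    # Finer-grained tiers apply only to contracts that are actually vulnerable.
    if gold["vulnerable"]:
        if pred.get("dasp_category") == gold["dasp_category"]:
            reward += W_DASP
        if pred.get("swc_id") == gold["swc_id"]:
            reward += W_SWC
    return reward


gold = {"vulnerable": True, "dasp_category": 1, "swc_id": 107}
full = outcome_reward({"vulnerable": True, "dasp_category": 1, "swc_id": 107}, gold)
detect_only = outcome_reward({"vulnerable": True, "dasp_category": 3, "swc_id": 101}, gold)
```

With weights like these, a completion that gets only the yes/no label right still collects most of the available reward, which is one way a policy can improve at detection while its SWC accuracy stalls.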

Context window. Contracts exceeding ~4,000 characters were excluded from training. Performance on very large contracts is untested.
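
A simple guard can flag inputs that fall outside the length range seen during training. This sketch assumes the cutoff applies to the raw character count of the contract source; the constant and function name are hypothetical, and the threshold is the approximate figure quoted above.

```python
MAX_TRAINING_CHARS = 4_000  # approximate cutoff quoted in this card


def within_training_range(source: str, max_chars: int = MAX_TRAINING_CHARS) -> bool:
    """Return True if the contract is no longer than the training cutoff."""
    return len(source) <= max_chars


short_contract = "pragma solidity ^0.4.18;\ncontract Empty {}\n"
long_contract = "// padding\n" * 1_000
```

Contracts that fail this check can still be analysed, but the results fall outside what the evaluation covers.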

Citation

@misc{pandav2026scvulnrl,
  author = {Pandav, Nishant},
  title  = {Smart Contract Vulnerability Detection via RL Fine-Tuning},
  year   = {2026},
  url    = {https://github.com/npanium/smartcontracts-vulnerability-r1}
}

Model size: 8B params (Safetensors, BF16)
