# OffensiveSET: Smart Contract Security Auditor

A fine-tuned version of Qwen 2.5 Coder 7B Instruct specialized in smart contract security auditing. Trained on 5,000 multi-turn penetration testing conversations covering DeFi protocols, governance attacks, cross-chain bridges, token/NFT vulnerabilities, and core logic flaws.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen 2.5 Coder 7B Instruct |
| Training Method | LoRA (r=64, alpha=128) |
| Dataset Size | 5,000 conversations (~50M tokens) |
| Training Steps | 846 (3 epochs) |
| Final Loss | 0.215 |
| Context Length | 4,096 tokens |
| Precision | BF16 |
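
Since the fine-tune used a 4,096-token context, a prompt and its generated reply probably behave best when they fit inside that window together. The following is a rough planning sketch, not part of the model's tooling: `prompt_budget`, `estimate_tokens`, and `chunk_source` are hypothetical helper names, and the four-characters-per-token estimate is a crude assumption. The model's own tokenizer gives the real count.

```python
CONTEXT_LENGTH = 4096  # value taken from the Model Details table


def prompt_budget(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Tokens left for the prompt after reserving room for the reply."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def estimate_tokens(text, chars_per_token=4.0):
    """Crude length-based estimate (assumption); a real tokenizer is more accurate."""
    return int(len(text) / chars_per_token) + 1


def chunk_source(source, budget_tokens):
    """Greedily group whole lines into chunks whose estimate fits the budget.

    A single line that exceeds the budget on its own still becomes one chunk.
    """
    chunks, current, used = [], [], 0
    for line in source.splitlines():
        cost = estimate_tokens(line)
        if current and used + cost > budget_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


# With max_new_tokens=2048, half of the window remains for the prompt.
print(prompt_budget(2048))  # 2048
```

Splitting on line boundaries is the simplest choice; splitting on contract or function boundaries would likely give the model more coherent units to review.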

## Capabilities

This model is trained to perform as a senior smart contract security auditor, capable of:

- 🔍 Code Review: Systematic analysis of Solidity smart contracts
- 🛡️ Vulnerability Detection: Reentrancy, access control, oracle manipulation, integer overflow, and 24+ vulnerability categories
- 🧪 Proof of Concept: Writing Foundry/Forge test cases to demonstrate vulnerabilities
- 📊 Impact Analysis: Quantifying financial risk with CVSS scoring
- 📝 Audit Report Generation: Professional findings with severity, description, attack path, and remediation
- 🔄 Static Analysis Interpretation: Reading and triaging Slither, Mythril, and Semgrep outputs
- 🔗 Cross-Chain Security: Bridge replay attacks, signature replay, message spoofing

## Vulnerability Coverage

| Category | Scenarios |
|----------|-----------|
| DeFi Protocol Vulnerabilities | Staking exploits, oracle manipulation, MEV front-running, AMM slippage, bridge fees |
| Governance & Access Control | Vote weight manipulation, timelock bypass, proxy initialization, signature replay |
| Cross-Chain & Bridge | Replay attacks, nonce reuse, message spoofing, race conditions, finality assumptions |
| Token & NFT | ERC-20 inflation, ERC-721 bypass, storage collision, ERC-4626 attacks, vesting exploits |
| Core Logic & Math | Reentrancy, integer overflow, rounding precision, DoS/griefing, signature malleability |

## Training Data

The dataset was generated with a modified version of OffensiveSET, an MCP server that produces realistic multi-turn smart contract audit conversations. Each entry follows a real auditor workflow:

  1. Code Review & Architecture Analysis
  2. Static Analysis & Automated Scanning (Slither, Semgrep, Mythril)
  3. Proof of Concept Development (Foundry/Forge tests, Cast, Anvil)
  4. Impact Analysis (Tenderly simulation, financial quantification)
  5. Audit Report Writing (CVSS, SWC/CWE references, secure code fixes)
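
Step 5 attaches a CVSS vector to each finding. As an illustration of how such a vector maps to a number, the sketch below computes a CVSS v3.1 base score using the metric weights and formulas from the public specification. The function names are hypothetical, temporal and environmental metrics are ignored, and the vector is the one shown in the Example Output section.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/.../A:H'."""
    m = dict(item.split(":") for item in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to the qualitative rating scale."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


score = base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
print(score, severity(score))  # 8.8 High
```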

### Dataset Statistics

- Total entries: 5,000
- Average tokens per entry: ~10,000
- Thinking blocks: 61% of entries include chain-of-thought reasoning
- Failure cases: 34% include test failures, false positives, and pivot strategies
- Quality score: 0.83/1.00 (75% rated High Quality)
- Tool coverage: 14/18 tools used across the dataset

## Usage

### Quick Start

````python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbdelrehmanFouad/offensiveset-qwen25-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = """Review this Solidity smart contract for security vulnerabilities:

```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);
    (bool success, ) = msg.sender.call{value: amount}("");
    userBalances[msg.sender] -= amount;
}
```

Provide a detailed audit finding with severity, description, and fix."""

messages = [
    {"role": "system", "content": "You are a senior smart contract security auditor."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
````


### With vLLM (Faster Inference)

```python
from vllm import LLM, SamplingParams

llm = LLM(model="AbdelrehmanFouad/offensiveset-qwen25-coder-7b")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=2048)

prompt = "Review this Solidity contract for vulnerabilities..."
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```

### Ollama

```bash
ollama pull qwen2.5-coder:7b
# Then merge the LoRA adapter and serve via an Ollama Modelfile
```
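
The merge step mentioned above could look roughly like the sketch below, which uses the `peft` library's `merge_and_unload()`. It assumes that the Hugging Face repository holds LoRA adapter weights and that `Qwen/Qwen2.5-Coder-7B-Instruct` is the matching base checkpoint; neither is confirmed by this card. `merge_lora_adapter` and the output path are hypothetical names.

```python
def merge_lora_adapter(base_id, adapter_id, output_dir):
    """Fold LoRA weights into the base model and save a standalone checkpoint."""
    # Imported lazily because these dependencies are heavy.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # returns a plain transformers model
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir


# Example call (downloads the full 7B model, so it is left commented out):
# merge_lora_adapter(
#     base_id="Qwen/Qwen2.5-Coder-7B-Instruct",                     # assumed base checkpoint
#     adapter_id="AbdelrehmanFouad/offensiveset-qwen25-coder-7b",   # assumed to contain adapter weights
#     output_dir="./offensiveset-merged",                           # hypothetical path
# )
```

The merged directory would then typically be converted to GGUF (for example with llama.cpp's `convert_hf_to_gguf.py`) and referenced from the `FROM` line of a Modelfile before running `ollama create`.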

## Example Output

````markdown
## Audit Finding: Reentrancy Vulnerability in withdraw()

| Attribute | Value |
|-----------|-------|
| Severity | High |
| CVSS Vector | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H |
| SWC | SWC-107 |
| CWE | CWE-841 |
| Contract | Vault.sol |
| Function | withdraw(uint256 amount) |

### Description
The withdraw function performs an external call via msg.sender.call{value: amount}
before updating the user's balance. This allows a malicious contract to re-enter
the withdraw function and drain the entire vault before the balance is decremented.

### Attack Path
1. Attacker deploys malicious contract with a receive() function
2. Attacker calls withdraw(1 ether) from the malicious contract
3. Vault sends 1 ether to attacker's contract, triggering receive()
4. receive() calls withdraw(1 ether) again - balance hasn't been updated yet
5. Repeat until vault is drained

### Impact
Full vault drainage. Total funds at risk: entire contract balance.

### Remediation
Apply the Checks-Effects-Interactions pattern:

```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);  // Check
    userBalances[msg.sender] -= amount;            // Effect (BEFORE external call)
    (bool success, ) = msg.sender.call{value: amount}("");  // Interaction
    require(success);
}
```

Alternatively, use OpenZeppelin's ReentrancyGuard.
````
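
To make the attack path in this finding concrete, here is a toy Python model of the vault. It is a simplified sketch, not EVM-accurate: class and method names are hypothetical, gas and transaction-level reverts are ignored, and the balance subtraction wraps like unchecked `uint256` arithmetic (as in Solidity before 0.8 or inside an `unchecked` block). With checked arithmetic, the late subtraction in the outer call would revert instead.

```python
UINT256 = 2**256


class Vault:
    """Toy in-memory vault; cei=True applies Checks-Effects-Interactions."""

    def __init__(self, cei):
        self.cei = cei
        self.balances = {}  # credited balance per account
        self.held = 0       # ether actually held by the vault

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount
        self.held += amount

    def _debit(self, account, amount):
        # Wraps on underflow to mimic unchecked uint256 arithmetic (assumption).
        self.balances[account] = (self.balances[account] - amount) % UINT256

    def withdraw(self, account, amount):
        if self.balances.get(account, 0) < amount or self.held < amount:  # Check
            raise RuntimeError("withdraw rejected")
        if self.cei:
            self._debit(account, amount)   # Effect before the external call
        self.held -= amount
        account.receive(self, amount)      # Interaction: hands control to the caller
        if not self.cei:
            self._debit(account, amount)   # Effect after the call (the bug)


class Attacker:
    """Re-enters withdraw() from its receive hook while the vault has funds."""

    def __init__(self):
        self.received = 0

    def receive(self, vault, amount):
        self.received += amount
        if vault.held >= amount:
            try:
                vault.withdraw(self, amount)  # re-entrant call
            except RuntimeError:
                pass  # re-entry was rejected; stop here


def run(cei):
    vault, victim, attacker = Vault(cei), object(), Attacker()
    vault.deposit(victim, 9)
    vault.deposit(attacker, 1)
    vault.withdraw(attacker, 1)
    return attacker.received, vault.held


print(run(cei=False))  # (10, 0)
print(run(cei=True))   # (1, 9)
```

In the vulnerable ordering the attacker walks away with all 10 units after depositing 1; with the balance debited first, the re-entrant call fails the check and only the attacker's own deposit leaves the vault.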


## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 1e-4 (cosine decay) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Batch Size | 16 (effective: 2 per GPU × 8 accumulation) |
| Max Sequence Length | 4,096 |
| Epochs | 3 |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Optimizer | AdamW (fused) |
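
The batch-size row can be sanity-checked with a few lines of arithmetic. In the sketch below, `effective_batch` and `total_steps` are hypothetical helper names, and the last partial batch of each epoch is rounded up, which not every trainer does. The 4,500-example case is purely an assumption (for instance a 90/10 train/validation split, which this card does not state); it is included only because it happens to reproduce the 846 steps listed under Model Details.

```python
import math


def effective_batch(per_device_batch, grad_accum, num_gpus=1):
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum * num_gpus


def total_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Optimizer steps over all epochs, rounding each epoch's last batch up."""
    batch = effective_batch(per_device_batch, grad_accum, num_gpus)
    return math.ceil(num_examples / batch) * epochs


print(effective_batch(2, 8))        # 16
print(total_steps(5000, 2, 8, 3))   # 939 if all 5,000 entries were used for training
print(total_steps(4500, 2, 8, 3))   # 846 under the hypothetical 4,500-example split
```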

### Hardware

- **GPU:** 1× NVIDIA A100 SXM4 80GB
- **Precision:** BF16 (no quantization)
- **Training Time:** ~1.5 hours

## Limitations

- Trained on **synthetic audit conversations**, not real-world audit reports
- May produce overly verbose responses for simple questions
- Not tested on Rust (Solana), Move (Sui/Aptos), or other non-Solidity chains
- Best used as an **auditing assistant**, not a replacement for human review
- No guarantee of completeness; always use multiple auditors and formal verification

## Disclaimer

⚠️ **This model is for educational and research purposes only.** Smart contract audit findings generated by this model should always be verified by qualified human auditors. Do not rely on this model for production security decisions.

## License

MIT, same as the base model and the OffensiveSET dataset generator.

## Author

**Abdelrehman Fouad**
- Model: [AbdelrehmanFouad/offensiveset-qwen25-coder-7b](https://huggingface.co/AbdelrehmanFouad/offensiveset-qwen25-coder-7b)