# OffensiveSET: Smart Contract Security Auditor

A fine-tuned version of Qwen 2.5 Coder 7B Instruct specialized in smart contract security auditing. Trained on 5,000 multi-turn penetration testing conversations covering DeFi protocols, governance attacks, cross-chain bridges, token/NFT vulnerabilities, and core logic flaws.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen 2.5 Coder 7B Instruct |
| Training Method | LoRA (r=64, alpha=128) |
| Dataset Size | 5,000 conversations (~50M tokens) |
| Training Steps | 846 (3 epochs) |
| Final Loss | 0.215 |
| Context Length | 4,096 tokens |
| Precision | BF16 |
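If you are new to LoRA, the adapter learns a low-rank update that is scaled by alpha / r and added to each frozen weight matrix W: ΔW = (alpha / r) · B·A. The sketch below works through that arithmetic for the r=64, alpha=128 setting listed above. It assumes standard LoRA scaling rather than rsLoRA. The 4096 × 4096 layer shape in the usage lines is hypothetical and does not come from this model's configuration, and this card does not say which modules were adapted.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Multiplier applied to the low-rank product B @ A (standard LoRA: alpha / r)."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (r, d_in) and B has shape (d_out, r); the frozen W is not counted.
    """
    return r * d_in + d_out * r


# r=64, alpha=128 as in the Model Details table: each update is scaled by 2.0
print(lora_scaling(64, 128))        # 2.0
# Hypothetical 4096 x 4096 projection: 64 * (4096 + 4096) extra parameters
print(lora_params(4096, 4096, 64))  # 524288
```

For that hypothetical layer, the adapter trains 524,288 parameters. The full matrix has 16,777,216, so the adapter is 1/32 of its size.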

## Capabilities

This model is trained to act as a senior smart contract security auditor. It is capable of:

- 🔍 **Code Review:** Systematic analysis of Solidity smart contracts
- 🛡️ **Vulnerability Detection:** 24+ vulnerability categories, including reentrancy, access control, oracle manipulation, and integer overflow
- 🧪 **Proof of Concept:** Writing Foundry/Forge test cases that demonstrate vulnerabilities
- 📊 **Impact Analysis:** Quantifying financial risk and assigning CVSS severity scores
- 📝 **Audit Report Generation:** Professional findings with severity, description, attack path, and remediation
- 🔄 **Static Analysis Interpretation:** Reading and triaging Slither, Mythril, and Semgrep output
- 🔗 **Cross-Chain Security:** Bridge replay attacks, signature replay, message spoofing
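A CVSS score, such as the one in the example finding further down, is derived mechanically from its vector string. Below is a minimal sketch of the CVSS v3.1 base-score formula from the FIRST specification. It covers base metrics only and ignores temporal and environmental metrics. It is not part of the model or its tooling. It is just a way to check that a reported vector matches its score. The names `base_score` and `roundup` are illustrative.

```python
import math

# CVSS v3.1 base metric weights
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope: (unchanged, changed)
PR_WEIGHTS = {"N": (0.85, 0.85), "L": (0.62, 0.68), "H": (0.27, 0.5)}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:../...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score from Confidentiality, Integrity, Availability
    iss = 1 - math.prod(1 - WEIGHTS["CIA"][metrics[k]] for k in ("C", "I", "A"))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[metrics["PR"]][1 if changed else 0]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 8.8
```

The vector used in the example finding scores 8.8. That falls in the CVSS "High" band (7.0 to 8.9), which is consistent with the severity the finding reports.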

## Vulnerability Coverage

| Category | Scenarios |
|----------|-----------|
| DeFi Protocol Vulnerabilities | Staking exploits, oracle manipulation, MEV front-running, AMM slippage, bridge fees |
| Governance & Access Control | Vote weight manipulation, timelock bypass, proxy initialization, signature replay |
| Cross-Chain & Bridge | Replay attacks, nonce reuse, message spoofing, race conditions, finality assumptions |
| Token & NFT | ERC-20 inflation, ERC-721 bypass, storage collision, ERC-4626 attacks, vesting exploits |
| Core Logic & Math | Reentrancy, integer overflow, rounding precision, DoS/griefing, signature malleability |
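Several rows above, such as ERC-4626 attacks and rounding precision, come down to integer division rounding toward zero. The sketch below models the well-known ERC-4626 "first depositor" share-inflation attack with plain integers. An attacker mints one share and donates assets directly to the vault. A victim's deposit then rounds down to zero shares.

This is a simplified model of the share-conversion math, not code from the dataset or from any particular vault. It omits the virtual-shares offset that OpenZeppelin's ERC4626 implementation uses as a mitigation. `ToyVault` and its methods are hypothetical names.

```python
class ToyVault:
    """Integer model of ERC-4626 share accounting (no virtual shares/offset)."""

    def __init__(self):
        self.total_assets = 0  # assets held by the vault
        self.total_supply = 0  # shares outstanding
        self.shares = {}

    def convert_to_shares(self, assets):
        # Rounds down, like convertToShares: assets * totalSupply / totalAssets
        if self.total_supply == 0:
            return assets
        return assets * self.total_supply // self.total_assets

    def deposit(self, who, assets):
        minted = self.convert_to_shares(assets)
        self.total_assets += assets
        self.total_supply += minted
        self.shares[who] = self.shares.get(who, 0) + minted
        return minted

    def donate(self, assets):
        # A direct token transfer: raises totalAssets without minting shares
        self.total_assets += assets

    def redeemable(self, who):
        return self.shares.get(who, 0) * self.total_assets // self.total_supply


vault = ToyVault()
vault.deposit("attacker", 1)           # attacker holds the only share
vault.donate(10_000)                   # inflate the price of that share
victim_shares = vault.deposit("victim", 10_000)
print(victim_shares)                   # 0: the deposit rounds down to nothing
print(vault.redeemable("attacker"))    # 20001: one share now owns every asset
```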

## Training Data

The dataset was generated with a modified version of OffensiveSET. OffensiveSET is an MCP (Model Context Protocol) server that produces realistic multi-turn smart contract audit conversations. Each entry follows a real auditor workflow:

1. Code Review & Architecture Analysis
2. Static Analysis & Automated Scanning (Slither, Semgrep, Mythril)
3. Proof of Concept Development (Foundry/Forge tests, Cast, Anvil)
4. Impact Analysis (Tenderly simulation, financial quantification)
5. Audit Report Writing (CVSS, SWC/CWE references, secure code fixes)
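To illustrate the static-analysis step, the sketch below ranks findings from a Slither JSON report (for example, one written by `slither . --json report.json`) by impact and confidence.

The field names `results.detectors`, `check`, `impact` and `confidence` reflect Slither's JSON output format as far as we know. Check them against your Slither version. The inline sample report is hypothetical, and `triage` is an illustrative helper, not part of Slither.

```python
import json

IMPACT_RANK = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3, "Optimization": 4}
CONFIDENCE_RANK = {"High": 0, "Medium": 1, "Low": 2}


def triage(report_json: str, min_impact: str = "Medium"):
    """Return (impact, confidence, check) tuples at or above min_impact, most severe first."""
    report = json.loads(report_json)
    detectors = report.get("results", {}).get("detectors", [])
    cutoff = IMPACT_RANK[min_impact]
    kept = [d for d in detectors if IMPACT_RANK.get(d["impact"], 99) <= cutoff]
    # Sort by impact first, then by detector confidence
    kept.sort(key=lambda d: (IMPACT_RANK[d["impact"]], CONFIDENCE_RANK.get(d["confidence"], 99)))
    return [(d["impact"], d["confidence"], d["check"]) for d in kept]


# Hypothetical, trimmed-down report for illustration only
sample = json.dumps({
    "success": True,
    "results": {"detectors": [
        {"check": "naming-convention", "impact": "Informational", "confidence": "High"},
        {"check": "reentrancy-no-eth", "impact": "Medium", "confidence": "Medium"},
        {"check": "reentrancy-eth", "impact": "High", "confidence": "Medium"},
    ]},
})

for impact, confidence, check in triage(sample):
    print(f"[{impact}/{confidence}] {check}")
# [High/Medium] reentrancy-eth
# [Medium/Medium] reentrancy-no-eth
```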

### Dataset Statistics

- **Total entries:** 5,000
- **Average tokens per entry:** ~10,000
- **Thinking blocks:** 61% of entries include chain-of-thought reasoning
- **Failure cases:** 34% include test failures, false positives, and pivot strategies
- **Quality score:** 0.83/1.00 (75% of entries rated High Quality)
- **Tool coverage:** 14 of the 18 available tools appear across the dataset

## Usage

### Quick Start

````python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbdelrehmanFouad/offensiveset-qwen25-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" loads the weights in the dtype stored in the checkpoint config
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = """Review this Solidity smart contract for security vulnerabilities:

```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);
    (bool success, ) = msg.sender.call{value: amount}("");
    userBalances[msg.sender] -= amount;
}
```

Provide a detailed audit finding with severity, description, and fix."""

messages = [
    {"role": "system", "content": "You are a senior smart contract security auditor."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the chat template and open the assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
````


### With vLLM (Faster Inference)

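```python
from vllm import LLM, SamplingParams

llm = LLM(model="AbdelrehmanFouad/offensiveset-qwen25-coder-7b")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=2048)

messages = [
    {"role": "system", "content": "You are a senior smart contract security auditor."},
    {"role": "user", "content": "Review this Solidity contract for vulnerabilities..."},
]
# LLM.chat applies the model's chat template before generating
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```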

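### Ollama

```bash
# Pulls the base model only; the OffensiveSET LoRA adapter is not included
ollama pull qwen2.5-coder:7b
# Then merge the LoRA adapter and serve the result via an Ollama Modelfile
```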

## Example Output

````markdown
## Audit Finding: Reentrancy Vulnerability in withdraw()

| Attribute | Value |
|-----------|-------|
| Severity | High |
| CVSS Vector | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H |
| SWC | SWC-107 |
| CWE | CWE-841 |
| Contract | Vault.sol |
| Function | withdraw(uint256 amount) |

### Description
The withdraw function performs an external call via msg.sender.call{value: amount}
before updating the user's balance. This allows a malicious contract to re-enter
the withdraw function and drain the entire vault before the balance is decremented.

### Attack Path
1. Attacker deploys malicious contract with a receive() function
2. Attacker calls withdraw(1 ether) from the malicious contract
3. Vault sends 1 ether to attacker's contract, triggering receive()
4. receive() calls withdraw(1 ether) again; the balance hasn't been updated yet
5. Repeat until vault is drained

### Impact
Full vault drainage. Total funds at risk: entire contract balance.

### Remediation
Apply the Checks-Effects-Interactions pattern:

```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);  // Check
    userBalances[msg.sender] -= amount;            // Effect (BEFORE external call)
    (bool success, ) = msg.sender.call{value: amount}("");  // Interaction
    require(success);
}
```

Alternatively, use OpenZeppelin's ReentrancyGuard.
````
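The attack path in the finding above can be reproduced with a small Python toy simulation. It is not Solidity and not part of the model. `Revert` stands in for a failed `require`.

The vulnerable branch models pre-0.8 or `unchecked` arithmetic, where the late decrement cannot revert. Under Solidity 0.8+ checked arithmetic, the late `-=` in this exact code would underflow and revert the whole transaction. `Vault`, `Attacker` and `run` are illustrative names.

```python
class Revert(Exception):
    """Stands in for a failed Solidity `require`."""


class Vault:
    """Toy ledger: `balances` mirrors userBalances, `ether` is the contract balance."""

    def __init__(self, checks_effects_interactions):
        self.cei = checks_effects_interactions
        self.balances = {}
        self.ether = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.ether += amount

    def withdraw(self, who, amount):
        if self.balances.get(who, 0) < amount:
            raise Revert("insufficient balance")  # Check
        if self.cei:
            self.balances[who] -= amount          # Effect before the call
        self.ether -= amount
        who.receive(self, amount)                 # Interaction: caller regains control
        if not self.cei:
            self.balances[who] -= amount          # Effect too late (may go negative)


class Attacker:
    """Re-enters withdraw() from its receive hook while the vault can still pay."""

    def __init__(self):
        self.stolen = 0

    def receive(self, vault, amount):
        self.stolen += amount
        if vault.ether >= amount:
            try:
                vault.withdraw(self, amount)
            except Revert:
                pass  # re-entry blocked; keep what was already received


def run(cei):
    vault = Vault(checks_effects_interactions=cei)
    vault.deposit("alice", 9)   # honest depositor
    attacker = Attacker()
    vault.deposit(attacker, 1)  # attacker needs a small balance first
    vault.withdraw(attacker, 1)
    return attacker.stolen, vault.ether


print(run(cei=False))  # (10, 0): vault drained
print(run(cei=True))   # (1, 9): attacker only gets its own deposit back
```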


## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 1e-4 (cosine decay) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Effective Batch Size | 16 (2 per GPU × 8 gradient accumulation steps) |
| Max Sequence Length | 4,096 |
| Epochs | 3 |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Optimizer | AdamW (fused) |
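The effective batch size and the step count follow from simple arithmetic. The helper below is a sketch that assumes the following:

- one optimizer step per effective batch;
- the single GPU listed under Hardware;
- no sequence packing.

It counts a final partial batch as a step, though trainers differ on this. The function names are illustrative.

```python
import math


def effective_batch(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def total_steps(num_examples: int, batch: int, epochs: int) -> int:
    """Optimizer steps, counting a final partial batch in each epoch."""
    return math.ceil(num_examples / batch) * epochs


batch = effective_batch(per_device=2, grad_accum=8)
print(batch)                                # 16
print(total_steps(5_000, batch, epochs=3))  # 939
```

With all 5,000 conversations, this gives 939 steps. The 846 steps reported above correspond to roughly 4,500 examples per epoch (for example, if some conversations were held out). The card does not state the split.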

### Hardware

- **GPU:** 1× NVIDIA A100 SXM4 80GB
- **Precision:** BF16 (base weights not quantized)
- **Training Time:** ~1.5 hours

## Limitations

- Trained on **synthetic audit conversations**, not real-world audit reports
- May produce overly verbose responses for simple questions
- Not tested on Rust (Solana), Move (Sui/Aptos), or other non-Solidity smart contract languages
- Best used as an **auditing assistant**, not a replacement for human review
- No guarantee of completeness — always use multiple auditors and formal verification

## Disclaimer

⚠️ **This model is for educational and research purposes only.** Smart contract audit findings generated by this model should always be verified by qualified human auditors. Do not rely on this model for production security decisions.

## License

MIT, the same license as the OffensiveSET dataset generator. Note that the base model, Qwen 2.5 Coder 7B Instruct, is released under Apache 2.0.

## Author

**Abdelrehman Fouad**
- Model: [AbdelrehmanFouad/offensiveset-qwen25-coder-7b](https://huggingface.co/AbdelrehmanFouad/offensiveset-qwen25-coder-7b)