# OffensiveSET: Smart Contract Security Auditor
A fine-tuned version of Qwen 2.5 Coder 7B Instruct specialized in smart contract security auditing. Trained on 5,000 multi-turn penetration testing conversations covering DeFi protocols, governance attacks, cross-chain bridges, token/NFT vulnerabilities, and core logic flaws.
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen 2.5 Coder 7B Instruct |
| Training Method | LoRA (r=64, alpha=128) |
| Dataset Size | 5,000 conversations (~50M tokens) |
| Training Steps | 846 (3 epochs) |
| Final Loss | 0.215 |
| Context Length | 4,096 tokens |
| Precision | BF16 |
## Capabilities
The model is trained to act as a senior smart contract security auditor. Its capabilities include:
- **Code Review:** Systematic analysis of Solidity smart contracts
- **Vulnerability Detection:** Reentrancy, access control, oracle manipulation, integer overflow, and others across 24+ vulnerability categories
- **Proof of Concept:** Writing Foundry/Forge test cases that demonstrate vulnerabilities
- **Impact Analysis:** Quantifying financial risk with CVSS scoring
- **Audit Report Generation:** Professional findings with severity, description, attack path, and remediation
- **Static Analysis Interpretation:** Reading and triaging Slither, Mythril, and Semgrep output
- **Cross-Chain Security:** Bridge replay attacks, signature replay, message spoofing
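As an illustration of the static analysis triage capability, the sketch below condenses a Slither JSON report into a single prompt for the model. It assumes the report layout written by Slither's `--json` option (a `results.detectors` list whose items carry `check`, `impact`, `confidence`, and `description`). The helper name `build_triage_prompt` and the sample report are hypothetical and hand-written for this example.

```python
import json

# Order in which findings are shown to the model (most severe first).
IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3, "Optimization": 4}


def build_triage_prompt(report_json: str, max_findings: int = 10) -> str:
    """Turn a Slither JSON report into a compact triage prompt (hypothetical helper)."""
    report = json.loads(report_json)
    detectors = (report.get("results") or {}).get("detectors", [])
    # Unknown impact levels sort last; sorted() is stable, so ties keep report order.
    ranked = sorted(
        detectors,
        key=lambda d: IMPACT_ORDER.get(d.get("impact"), len(IMPACT_ORDER)),
    )
    lines = ["Triage these Slither findings. For each, say whether it is a true positive and why:"]
    for i, det in enumerate(ranked[:max_findings], start=1):
        description = " ".join(det.get("description", "").split())  # collapse whitespace
        lines.append(
            f"{i}. [{det.get('impact', '?')}/{det.get('confidence', '?')}] "
            f"{det.get('check', 'unknown')}: {description}"
        )
    return "\n".join(lines)


# Hand-written sample report, for illustration only (not real tool output).
sample_report = json.dumps({
    "success": True,
    "error": None,
    "results": {
        "detectors": [
            {"check": "solc-version", "impact": "Informational", "confidence": "High",
             "description": "Pragma version allows old compilers\n"},
            {"check": "reentrancy-eth", "impact": "High", "confidence": "Medium",
             "description": "Reentrancy in Vault.withdraw(uint256)\n"},
        ]
    },
})

triage_prompt = build_triage_prompt(sample_report)
print(triage_prompt)
```

The resulting string can be sent as the user message in any of the usage examples in this card.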
## Vulnerability Coverage
| Category | Scenarios |
|---|---|
| DeFi Protocol Vulnerabilities | Staking exploits, oracle manipulation, MEV front-running, AMM slippage, bridge fees |
| Governance & Access Control | Vote weight manipulation, timelock bypass, proxy initialization, signature replay |
| Cross-Chain & Bridge | Replay attacks, nonce reuse, message spoofing, race conditions, finality assumptions |
| Token & NFT | ERC-20 inflation, ERC-721 bypass, storage collision, ERC-4626 attacks, vesting exploits |
| Core Logic & Math | Reentrancy, integer overflow, rounding precision, DoS/griefing, signature malleability |
## Training Data
The dataset was generated with a modified version of OffensiveSET, an MCP server that produces realistic multi-turn smart contract audit conversations. Each entry follows the workflow of a real audit:
- Code Review & Architecture Analysis
- Static Analysis & Automated Scanning (Slither, Semgrep, Mythril)
- Proof of Concept Development (Foundry/Forge tests, Cast, Anvil)
- Impact Analysis (Tenderly simulation, financial quantification)
- Audit Report Writing (CVSS, SWC/CWE references, secure code fixes)
### Dataset Statistics
- Total entries: 5,000
- Average tokens per entry: ~10,000
- Thinking blocks: 61% of entries include chain-of-thought reasoning
- Failure cases: 34% include test failures, false positives, and pivot strategies
- Quality score: 0.83/1.00 (75% rated High Quality)
- Tool coverage: 14/18 tools used across the dataset
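For a rough sense of scale, the percentages above can be turned into approximate entry counts. This is plain arithmetic on the figures reported in this card. Because the reported percentages are rounded, the resulting counts are estimates and not exact dataset counts.

```python
# Figures copied from this model card.
total_entries = 5_000
avg_tokens_per_entry = 10_000  # reported as an approximate average

# Totals implied by those figures (integer arithmetic avoids float rounding).
total_tokens = total_entries * avg_tokens_per_entry
entries_with_thinking = total_entries * 61 // 100
entries_with_failures = total_entries * 34 // 100
high_quality_entries = total_entries * 75 // 100

print(f"Total tokens: ~{total_tokens // 1_000_000}M")
print(f"Entries with thinking blocks: ~{entries_with_thinking}")
print(f"Entries with failure cases: ~{entries_with_failures}")
print(f"Entries rated High Quality: ~{high_quality_entries}")
```

The total of roughly 50M tokens matches the dataset size listed under Model Details.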
## Usage
### Quick Start
````python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbdelrehmanFouad/offensiveset-qwen25-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# The contract below is intentionally vulnerable: the external call precedes the balance update.
prompt = """Review this Solidity smart contract for security vulnerabilities:
```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);
    (bool success, ) = msg.sender.call{value: amount}("");
    userBalances[msg.sender] -= amount;
}
```
Provide a detailed audit finding with severity, description, and fix."""

messages = [
    {"role": "system", "content": "You are a senior smart contract security auditor."},
    {"role": "user", "content": prompt},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True ensures temperature and top_p take effect whatever the default generation config is.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
````
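When auditing more than one contract, it can help to build the chat messages from a function instead of a hard-coded string. The sketch below reuses the prompt wording and system message from the Quick Start. The helper name `build_audit_messages` is hypothetical and not part of any library.

```python
# Built programmatically so the triple backticks do not clash with Markdown fences.
FENCE = "`" * 3

DEFAULT_SYSTEM_PROMPT = "You are a senior smart contract security auditor."


def build_audit_messages(source: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> list:
    """Wrap Solidity source in the Quick Start prompt and return chat messages (hypothetical helper)."""
    user_prompt = (
        "Review this Solidity smart contract for security vulnerabilities:\n"
        f"{FENCE}solidity\n{source.strip()}\n{FENCE}\n"
        "Provide a detailed audit finding with severity, description, and fix."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_audit_messages("contract Vault { }")
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as in the Quick Start.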
### With vLLM (Faster Inference)
```python
from vllm import LLM, SamplingParams
llm = LLM(model="AbdelrehmanFouad/offensiveset-qwen25-coder-7b")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=2048)
prompt = "Review this Solidity contract for vulnerabilities..."
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
### Ollama
```bash
ollama pull qwen2.5-coder:7b
# Then merge the LoRA adapter into the base model and serve the result via an Ollama Modelfile
```
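The merge step mentioned above could look roughly like the following sketch, which uses the PEFT library's `PeftModel.from_pretrained` and `merge_and_unload`. It assumes the Hugging Face repository hosts a LoRA adapter and not already-merged weights, which this card does not state explicitly. The base checkpoint ID `Qwen/Qwen2.5-Coder-7B-Instruct` is inferred from the Model Details table, and the output directory name is hypothetical.

```python
def merge_lora_adapter(
    adapter_id: str = "AbdelrehmanFouad/offensiveset-qwen25-coder-7b",
    base_id: str = "Qwen/Qwen2.5-Coder-7B-Instruct",  # assumed base checkpoint
    output_dir: str = "offensiveset-merged",  # hypothetical local path
) -> str:
    """Fold the LoRA weights into the base model and save a standalone checkpoint."""
    # Imported lazily so the heavy dependencies load only when a merge is actually run.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # applies the LoRA deltas to the base weights
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

The saved directory is a regular Transformers checkpoint. Converting it to a format that Ollama accepts is a separate step that is not shown here.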
## Example Output
````markdown
## Audit Finding: Reentrancy Vulnerability in withdraw()
| Attribute | Value |
|-----------|-------|
| Severity | High |
| CVSS Vector | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H |
| SWC | SWC-107 |
| CWE | CWE-841 |
| Contract | Vault.sol |
| Function | withdraw(uint256 amount) |
### Description
The withdraw function performs an external call via msg.sender.call{value: amount}
before updating the user's balance. This allows a malicious contract to re-enter
the withdraw function and drain the entire vault before the balance is decremented.
### Attack Path
1. Attacker deploys malicious contract with a receive() function
2. Attacker calls withdraw(1 ether) from the malicious contract
3. Vault sends 1 ether to attacker's contract, triggering receive()
4. receive() calls withdraw(1 ether) again; the balance hasn't been updated yet
5. Repeat until vault is drained
### Impact
Full vault drainage. Total funds at risk: entire contract balance.
### Remediation
Apply the Checks-Effects-Interactions pattern:
```solidity
function withdraw(uint256 amount) external {
    require(userBalances[msg.sender] >= amount);            // Check
    userBalances[msg.sender] -= amount;                     // Effect (BEFORE external call)
    (bool success, ) = msg.sender.call{value: amount}("");  // Interaction
    require(success);
}
```
Alternatively, use OpenZeppelin's ReentrancyGuard.
````
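Since model-generated findings should be verified, one cheap check is to recompute the CVSS base score from the reported vector and compare it with the stated severity. The sketch below implements the CVSS v3.1 base score formula with the metric weights from the specification. It covers base metrics only (no temporal or environmental metrics), and the function names are this example's own.

```python
import math

# Metric weights from the CVSS v3.1 base score specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: the smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def cvss31_base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts[1:])
    changed = m["S"] == "C"  # scope changed alters the PR weights and the formula
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


score = cvss31_base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
print(score)  # 8.8
```

A base score of 8.8 falls in the CVSS v3.1 "High" band (7.0 to 8.9), which agrees with the severity in the example finding.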
## Training Details
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Learning Rate | 1e-4 (cosine decay) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Batch Size | 16 (effective: 2 per GPU × 8 accumulation) |
| Max Sequence Length | 4,096 |
| Epochs | 3 |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Optimizer | AdamW (fused) |
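To make the schedule concrete, the sketch below evaluates the learning rate at a given optimizer step using the values from this card (peak 1e-4, warmup ratio 0.05, 846 total steps). The linear warmup shape, decay to exactly zero, and truncation of the warmup step count are assumptions. The card only states "cosine decay" and a warmup ratio, and training frameworks differ in how they round the warmup length.

```python
import math

# Values taken from this model card.
PEAK_LR = 1e-4
TOTAL_STEPS = 846
WARMUP_RATIO = 0.05

# Effective batch size: per-GPU batch x gradient accumulation steps x number of GPUs.
EFFECTIVE_BATCH = 2 * 8 * 1


def lr_at(step: int, peak_lr: float = PEAK_LR, total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a 0-indexed step: linear warmup (assumed), then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # assumed rounding; frameworks may differ
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 21, 42, 444, 846):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 42 and reaches half of the peak at step 444, the midpoint of the decay phase.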
### Hardware
- **GPU:** 1× NVIDIA A100 SXM4 80GB
- **Precision:** BF16 (no quantization)
- **Training Time:** ~1.5 hours
## Limitations
- Trained on **synthetic audit conversations**, not real-world audit reports
- May produce overly verbose responses for simple questions
- Not tested on Rust (Solana), Move (Sui/Aptos), or other non-Solidity smart contract languages
- Best used as an **auditing assistant**, not a replacement for human review
- No guarantee of completeness; always use multiple auditors and formal verification
## Disclaimer
⚠️ **This model is for educational and research purposes only.** Smart contract audit findings generated by this model should always be verified by qualified human auditors. Do not rely on this model for production security decisions.
## License
MIT for this model, matching the OffensiveSET dataset generator. The base model, Qwen 2.5 Coder 7B Instruct, is distributed under Apache 2.0.
## Author
**Abdelrehman Fouad**
- Model: [AbdelrehmanFouad/offensiveset-qwen25-coder-7b](https://huggingface.co/AbdelrehmanFouad/offensiveset-qwen25-coder-7b)