## 🧠 Model Summary
This is a QLoRA fine-tuned version of `unsloth/Llama-3.2-3B-Instruct`, specialized for detecting slopsquatting: a supply chain attack in which adversaries register Python package names that AI code assistants are known to hallucinate.
Given a user prompt and an AI-suggested code snippet, the model performs binary classification:
| Output | Meaning |
|---|---|
| 0 | ✅ Safe: packages appear legitimate |
| 1 | 🚫 Threat: hallucinated or suspicious package detected |
This model is the core classifier inside the CyberSID agentic pipeline, which pairs it with a live PyPI API verification tool to eliminate false positives before blocking execution.
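The verification tool itself is not part of this checkpoint. A minimal sketch of such a live PyPI lookup (helper names here are illustrative, not the released CyberSID code) can use the public JSON endpoint `https://pypi.org/pypi/<name>/json`, which returns HTTP 404 for nonexistent packages:

```python
import re
import urllib.error
import urllib.request


def extract_package_names(snippet: str) -> list[str]:
    """Pull candidate top-level package names out of import lines."""
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z0-9_.\-]+)", re.MULTILINE)
    return [match.split(".")[0] for match in pattern.findall(snippet)]


def exists_on_pypi(name: str) -> bool:
    """True if the name resolves on PyPI's public JSON API (HTTP 200)."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404: package does not exist


# A '1' (threat) verdict from the model is only acted on when a suggested
# package really is missing from the registry, which removes false positives.
print(extract_package_names("import pdf-parse-ultra\nimport numpy"))
# ['pdf-parse-ultra', 'numpy']
```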
## ⚡ The Problem It Solves
~20% of packages recommended by AI code assistants do not exist, and open-source LLMs hallucinate package names at an average rate of 21.7%. Attackers register these hallucinated names and load them with malicious payloads.
Traditional classifiers (Random Forest, Logistic Regression) fail at this task because they rely on lexical pattern matching: they are fooled by adversarially named packages like `pandas-data-helper`. This model analyzes the semantic intent of the code context, not just the package name string.
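As a toy illustration (not part of the CyberSID evaluation) of why lexical features mislead: an adversarial name can contain the entire trusted package name verbatim, so character n-gram or token-overlap features see it as "safe".

```python
from difflib import SequenceMatcher

legit = "pandas"
adversarial = "pandas-data-helper"

# Every character of the trusted name appears contiguously inside the
# adversarial one, so purely lexical matching sees maximal overlap.
match = SequenceMatcher(None, legit, adversarial).find_longest_match(
    0, len(legit), 0, len(adversarial)
)
print(adversarial[match.b : match.b + match.size])  # pandas
```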
## 🚀 How to Use

### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # requires bitsandbytes
    device_map="auto",
)


def check_threat(user_prompt: str, ai_suggestion: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a cybersecurity expert specializing in detecting slopsquatting attacks. "
                "Analyze the AI's code suggestion and determine if it contains hallucinated or "
                "non-existent Python packages. Output ONLY '0' (safe) or '1' (threat)."
            ),
        },
        {
            "role": "user",
            "content": f"User asked: {user_prompt}\n\nAI suggested:\n{ai_suggestion}",
        },
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=5, temperature=0.1)
    result = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()
    return "🚫 THREAT DETECTED" if result == "1" else "✅ SAFE"


# Example
prompt = "How do I parse a PDF in Python?"
suggestion = """
import pdf-parse-ultra
doc = pdf-parse-ultra.load('report.pdf')
"""
print(check_threat(prompt, suggestion))
# → 🚫 THREAT DETECTED
```
### With Unsloth (Faster Inference)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
```
## 📊 Evaluation Results

### In-Distribution Performance
| Split | Accuracy |
|---|---|
| Training Validation | 100% |
### Out-of-Distribution (OOD) Performance
Evaluated on 130 unseen samples spanning Unreal Engine scripting, AI agent frameworks, and financial data processing: domains the model was never trained on.
| Metric | Score |
|---|---|
| 🎯 OOD Accuracy | 73.8% |
| 🚨 Recall (Threat Detection Rate) | 92.3% |
| ✅ True Threats Caught | 60 / 65 |
| ⚠️ False Positives | 29 / 65 safe packages flagged |
Why 73.8% OOD is the honest result: the model is intentionally paranoid. In cybersecurity, false positives are acceptable; false negatives are not. A 92.3% recall means only 5 of 65 true threats slipped through. The agent layer (a live PyPI lookup) handles the false positives automatically.
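The headline numbers above are internally consistent; spelled out as confusion-matrix arithmetic over the 130 OOD samples (65 threats, 65 safe):

```python
# Counts taken from the OOD table above.
tp, fn = 60, 5    # threats caught / threats missed
fp, tn = 29, 36   # safe packages flagged / safe packages passed

recall = tp / (tp + fn)                       # threat detection rate
accuracy = (tp + tn) / (tp + fn + fp + tn)    # overall OOD accuracy
false_positive_rate = fp / (fp + tn)          # share of safe packages flagged

print(f"recall={recall:.1%} accuracy={accuracy:.1%} fpr={false_positive_rate:.1%}")
# recall=92.3% accuracy=73.8% fpr=44.6%
```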
### vs. Traditional ML Baselines
| Model | Accuracy | Recall | Adversarial Naming |
|---|---|---|---|
| Random Forest (TF-IDF) | 99.5% | 98.9% | ❌ Fails: lexical overfitting |
| Logistic Regression | 94.5% | 88.0% | ❌ Fails: keyword matching |
| 🛡️ This Model (CyberSID) | 73.8% OOD | 92.3% | ✅ Handles: semantic reasoning |

Note that the baseline accuracy and recall figures are in-distribution, while this model's figures are out-of-distribution.
## 🔧 Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-Instruct |
| Fine-tuning Method | QLoRA (4-bit NF4 quantization) |
| Hardware | Google Colab T4 GPU (free tier) |
| Quantization | 4-bit (NF4) |
| Framework | Unsloth + HuggingFace Transformers |
| Training Dataset | ShravSiddhpura/cybersec-slopsquatting-crag |
| Task | Binary sequence classification |
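The card does not publish the exact LoRA hyperparameters. A typical Unsloth QLoRA setup consistent with the table above might look like the following config fragment; the rank, alpha, dropout, and target modules are illustrative defaults, not the values used to train this checkpoint:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit NF4, as listed in the table above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (hyperparameters below are assumptions).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```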
## ⚠️ Intended Use & Limitations
Intended for:
- Intercepting AI-generated code before execution in developer pipelines
- CI/CD security layers that need lightweight LLM-based package validation
- Research into LLM hallucination detection and supply chain security
Not intended for:
- General-purpose code review or static analysis
- Standalone use without secondary verification (always pair with a live registry lookup)
- Non-Python package ecosystems (npm, cargo, etc.): the model was not trained on those
Known Limitations:
- The 26.2% gap between validation (100%) and OOD (73.8%) accuracy indicates the model can struggle with highly novel domain vocabulary
- The false positive rate is high (~44% of safe OOD packages flagged), so the model must be paired with a PyPI verification tool rather than used standalone
- Model was trained on synthetic LLM-generated data; real-world distribution may differ
## 📦 Related Resources
| Resource | Link |
|---|---|
| 🔗 Full Project (GitHub) | CyberSiddh |
| 📊 Training Dataset | [cybersec-slopsquatting-crag](https://huggingface.co/datasets/ShravSiddhpura/cybersec-slopsquatting-crag) |
## 📖 Citation
If you use this model in your research or projects:
```bibtex
@misc{siddhpura2026cybersid,
  author    = {Shrav Siddhpura},
  title     = {CyberSID: AI-Powered Slopsquatting Detection via Fine-tuned Llama-3.2-3B},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2}
}
```