## 🧠 Model Summary
This is a QLoRA fine-tuned version of `unsloth/Llama-3.2-3B-Instruct`, specialized for detecting slopsquatting: a supply chain attack in which adversaries register Python package names that AI code assistants are known to hallucinate.
Given a user prompt and an AI-suggested code snippet, the model performs binary classification:
| Output | Meaning |
|---|---|
| 0 | ✅ Safe: packages appear legitimate |
| 1 | 🚫 Threat: hallucinated or suspicious package detected |
This model is the core classifier inside the CyberSID agentic pipeline, which pairs it with a live PyPI API verification tool to eliminate false positives before blocking execution.
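The verification tool itself is not part of this checkpoint. A minimal sketch of such a live PyPI lookup (helper names here are illustrative, not the released CyberSID code) can use the public JSON endpoint `https://pypi.org/pypi/<name>/json`, which returns HTTP 404 for nonexistent packages:

```python
import re
import urllib.error
import urllib.request


def extract_package_names(snippet: str) -> list[str]:
    """Pull candidate top-level package names out of import lines."""
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z0-9_.\-]+)", re.MULTILINE)
    return [match.split(".")[0] for match in pattern.findall(snippet)]


def exists_on_pypi(name: str) -> bool:
    """True if the name resolves on PyPI's public JSON API (HTTP 200)."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404: package does not exist


# A '1' (threat) verdict from the model is only acted on when a suggested
# package really is missing from the registry, which removes false positives.
print(extract_package_names("import pdf-parse-ultra\nimport numpy"))
# ['pdf-parse-ultra', 'numpy']
```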
## ⚡ The Problem It Solves
~20% of packages recommended by AI code assistants do not exist, and open-source LLMs hallucinate package names at an average rate of 21.7%. Attackers register these hallucinated names and load them with malicious payloads.
Traditional classifiers (Random Forest, Logistic Regression) fail at this task because they rely on lexical pattern matching: they are fooled by adversarially named packages like `pandas-data-helper`. This model analyzes the semantic intent of the code context, not just the package name string.
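As a toy illustration (not part of the CyberSID evaluation) of why lexical features mislead: an adversarial name can contain the entire trusted package name verbatim, so character n-gram or token-overlap features see it as "safe".

```python
from difflib import SequenceMatcher

legit = "pandas"
adversarial = "pandas-data-helper"

# Every character of the trusted name appears contiguously inside the
# adversarial one, so purely lexical matching sees maximal overlap.
match = SequenceMatcher(None, legit, adversarial).find_longest_match(
    0, len(legit), 0, len(adversarial)
)
print(adversarial[match.b : match.b + match.size])  # pandas
```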
## 🚀 How to Use

### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # requires bitsandbytes
    device_map="auto",
)


def check_threat(user_prompt: str, ai_suggestion: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a cybersecurity expert specializing in detecting slopsquatting attacks. "
                "Analyze the AI's code suggestion and determine if it contains hallucinated or "
                "non-existent Python packages. Output ONLY '0' (safe) or '1' (threat)."
            ),
        },
        {
            "role": "user",
            "content": f"User asked: {user_prompt}\n\nAI suggested:\n{ai_suggestion}",
        },
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=5, temperature=0.1)
    result = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()
    return "🚫 THREAT DETECTED" if result == "1" else "✅ SAFE"


# Example
prompt = "How do I parse a PDF in Python?"
suggestion = """
import pdf-parse-ultra
doc = pdf-parse-ultra.load('report.pdf')
"""
print(check_threat(prompt, suggestion))
# → 🚫 THREAT DETECTED
```
### With Unsloth (Faster Inference)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
```
## 📊 Evaluation Results

### In-Distribution Performance
| Split | Accuracy |
|---|---|
| Training Validation | 100% |
### Out-of-Distribution (OOD) Performance
Evaluated on 130 unseen samples spanning Unreal Engine scripting, AI agent frameworks, and financial data processing: domains the model was never trained on.
| Metric | Score |
|---|---|
| 🎯 OOD Accuracy | 73.8% |
| 🚨 Recall (Threat Detection Rate) | 92.3% |
| ✅ True Threats Caught | 60 / 65 |
| ⚠️ False Positives | 29 / 65 safe packages flagged |
Why 73.8% OOD is the honest result: the model is intentionally paranoid. In cybersecurity, false positives are acceptable; false negatives are not. A 92.3% recall means only 5 of 65 true threats slipped through. The agent layer (a live PyPI lookup) handles the false positives automatically.
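The headline numbers above are internally consistent; spelled out as confusion-matrix arithmetic over the 130 OOD samples (65 threats, 65 safe):

```python
# Counts taken from the OOD table above.
tp, fn = 60, 5    # threats caught / threats missed
fp, tn = 29, 36   # safe packages flagged / safe packages passed

recall = tp / (tp + fn)                       # threat detection rate
accuracy = (tp + tn) / (tp + fn + fp + tn)    # overall OOD accuracy
false_positive_rate = fp / (fp + tn)          # share of safe packages flagged

print(f"recall={recall:.1%} accuracy={accuracy:.1%} fpr={false_positive_rate:.1%}")
# recall=92.3% accuracy=73.8% fpr=44.6%
```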
### vs. Traditional ML Baselines
| Model | Accuracy | Recall | Adversarial Naming |
|---|---|---|---|
| Random Forest (TF-IDF) | 99.5% | 98.9% | ❌ Fails: lexical overfitting |
| Logistic Regression | 94.5% | 88.0% | ❌ Fails: keyword matching |
| 🛡️ This Model (CyberSID) | 73.8% OOD | 92.3% | ✅ Handles: semantic reasoning |

Note that the baseline accuracy and recall figures are in-distribution, while this model's figures are out-of-distribution.
## 🔧 Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-Instruct |
| Fine-tuning Method | QLoRA (4-bit NF4 quantization) |
| Hardware | Google Colab T4 GPU (free tier) |
| Quantization | 4-bit (NF4) |
| Framework | Unsloth + HuggingFace Transformers |
| Training Dataset | ShravSiddhpura/cybersec-slopsquatting-crag |
| Task | Binary sequence classification |
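The card does not publish the exact LoRA hyperparameters. A typical Unsloth QLoRA setup consistent with the table above might look like the following config fragment; the rank, alpha, dropout, and target modules are illustrative defaults, not the values used to train this checkpoint:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit NF4, as listed in the table above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (hyperparameters below are assumptions).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```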
## ⚠️ Intended Use & Limitations
Intended for:
- Intercepting AI-generated code before execution in developer pipelines
- CI/CD security layers that need lightweight LLM-based package validation
- Research into LLM hallucination detection and supply chain security
Not intended for:
- General-purpose code review or static analysis
- Standalone use without secondary verification (always pair with a live registry lookup)
- Non-Python package ecosystems (npm, cargo, etc.): the model was not trained on those
Known Limitations:
- The 26.2% gap between validation (100%) and OOD (73.8%) accuracy indicates the model can struggle with highly novel domain vocabulary
- The false positive rate is high (~44% of safe OOD packages flagged), so the model must be paired with a PyPI verification tool rather than used standalone
- Model was trained on synthetic LLM-generated data; real-world distribution may differ
## 📦 Related Resources
| Resource | Link |
|---|---|
| 🔗 Full Project (GitHub) | CyberSiddh |
| 📊 Training Dataset | [cybersec-slopsquatting-crag](https://huggingface.co/datasets/ShravSiddhpura/cybersec-slopsquatting-crag) |
## 📖 Citation
If you use this model in your research or projects:
```bibtex
@misc{siddhpura2026cybersid,
  author    = {Shrav Siddhpura},
  title     = {CyberSID: AI-Powered Slopsquatting Detection via Fine-tuned Llama-3.2-3B},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2}
}
```