πŸ›‘οΈ Llama-3.2-3B-Cybersec-Slopsquatting-V2

Fine-tuned for AI Supply Chain Threat Detection



🧠 Model Summary

This is a QLoRA fine-tuned version of unsloth/Llama-3.2-3B-Instruct, specialized for detecting slopsquatting β€” a supply chain attack where adversaries register Python package names that AI code assistants are known to hallucinate.

Given a user prompt and an AI-suggested code snippet, the model performs binary classification:

| Output | Meaning |
|--------|---------|
| `0` | βœ… Safe β€” packages appear legitimate |
| `1` | 🚫 Threat β€” hallucinated or suspicious package detected |

This model is the core classifier inside the CyberSID agentic pipeline, which pairs it with a live PyPI API verification tool to eliminate false positives before blocking execution.
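The registry-lookup half of that pipeline is straightforward to reproduce: PyPI exposes a JSON endpoint at `https://pypi.org/pypi/<name>/json` that returns HTTP 404 for unregistered names. A minimal sketch β€” the `package_on_pypi` helper and its injectable `opener` parameter are illustrative, not part of the released pipeline:

```python
import urllib.error
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{name}/json"

def package_on_pypi(name: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if `name` is a registered PyPI project (a 404 means it is not)."""
    try:
        with opener(PYPI_URL.format(name=name), timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limits or outages should surface, not masquerade as "missing"
```

The `opener` parameter makes the lookup easy to stub out in tests or to swap for a cached mirror of the package index.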


⚑ The Problem It Solves

~20% of packages recommended by AI code assistants do not exist. Open-source LLMs hallucinate packages at an average rate of 21.7%. Attackers register these hallucinated names with malicious payloads.

Traditional classifiers (Random Forest, Logistic Regression) fail this task because they rely on lexical pattern matching β€” they get fooled by adversarially named packages like pandas-data-helper. This model analyzes semantic intent of the code context, not just the package name string.


πŸš€ How to Use

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # passing load_in_4bit directly is deprecated; use a BitsAndBytesConfig
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

def check_threat(user_prompt: str, ai_suggestion: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a cybersecurity expert specializing in detecting slopsquatting attacks. "
                "Analyze the AI's code suggestion and determine if it contains hallucinated or "
                "non-existent Python packages. Output ONLY '0' (safe) or '1' (threat)."
            )
        },
        {
            "role": "user",
            "content": f"User asked: {user_prompt}\n\nAI suggested:\n{ai_suggestion}"
        }
    ]

    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        # greedy decoding for a deterministic label (temperature has no effect without do_sample=True)
        output = model.generate(input_ids, max_new_tokens=5, do_sample=False)

    result = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
    return "🚫 THREAT DETECTED" if result == "1" else "βœ… SAFE"


# Example
prompt = "How do I parse a PDF in Python?"
suggestion = """
import pdf_parse_ultra  # hallucinated package β€” not registered on PyPI
doc = pdf_parse_ultra.load('report.pdf')
"""

print(check_threat(prompt, suggestion))
# β†’ 🚫 THREAT DETECTED
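Before the classifier verdict can be cross-checked against a registry, the package names have to be pulled out of the suggested snippet. A quick sketch of that step β€” `extract_packages` is a hypothetical helper; a regex is used rather than `ast.parse` because hallucinated suggestions are not always valid Python:

```python
import re

# matches the first dotted/hyphenated name after `import` or `from` at line start
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z0-9_.\-]+)", re.MULTILINE)

def extract_packages(code: str) -> list[str]:
    """Return the sorted top-level package names referenced by a code snippet."""
    return sorted({match.split(".")[0] for match in IMPORT_RE.findall(code)})
```

Note this maps `from os.path import join` to the top-level name `os`, which is what a registry lookup needs.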

With Unsloth (Faster Inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

πŸ“Š Evaluation Results

In-Distribution Performance

| Split | Accuracy |
|-------|----------|
| Training validation | 100% |

Out-of-Distribution (OOD) Performance

Evaluated on 130 unseen samples spanning Unreal Engine scripting, AI agent frameworks, and financial data processing β€” domains the model was never trained on.

| Metric | Score |
|--------|-------|
| 🎯 OOD Accuracy | 73.8% |
| 🚨 Recall (Threat Detection Rate) | 92.3% |
| βœ… True Threats Caught | 60 / 65 |
| ⚠️ False Positives | 29 / 65 safe packages flagged |
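The headline numbers follow directly from the confusion counts in the table above; a quick sanity check of the arithmetic:

```python
# Confusion counts from the 130-sample OOD evaluation (65 threats, 65 safe)
tp, fn = 60, 5   # threats caught / missed
fp = 29          # safe packages wrongly flagged
tn = 65 - fp     # 36 safe packages correctly passed

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.1%}, recall={recall:.1%}")
# β†’ accuracy=73.8%, recall=92.3%
```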

**Why 73.8% OOD is the honest result:** The model is intentionally paranoid. In cybersecurity, false positives are acceptable β€” false negatives are not. A 92.3% recall means only 5 zero-day threats slipped through out of 65. The agent layer (PyPI live lookup) handles the false positives automatically.
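Concretely, the agent-layer fusion rule can be as simple as: block only when the classifier flags a threat *and* at least one referenced package is genuinely absent from the registry. A hypothetical sketch β€” `final_verdict` and `registry_check` are illustrative names, not the released pipeline's API:

```python
from typing import Callable

def final_verdict(model_label: str, packages: list[str],
                  registry_check: Callable[[str], bool]) -> str:
    """Combine the classifier's label with a live registry lookup.

    The registry check only runs when the model says "1", so safe
    suggestions never pay the network round-trip.
    """
    if model_label != "1":
        return "SAFE"
    missing = [p for p in packages if not registry_check(p)]
    return "THREAT" if missing else "SAFE"
```

With this rule, a classifier false positive on a real package (e.g. `pandas`) is overturned by the registry lookup, which is how the ~44% FP rate is absorbed in practice.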

vs. Traditional ML Baselines

| Model | Accuracy | Recall | Adversarial Naming |
|-------|----------|--------|--------------------|
| Random Forest (TF-IDF) | 99.5% | 98.9% | ❌ Fails β€” lexical overfitting |
| Logistic Regression | 94.5% | 88.0% | ❌ Fails β€” keyword matching |
| πŸ›‘οΈ This Model (CyberSID) | 73.8% (OOD) | 92.3% | βœ… Handles β€” semantic reasoning |

πŸ”§ Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | unsloth/Llama-3.2-3B-Instruct |
| Fine-tuning Method | QLoRA (4-bit NF4 quantization) |
| Hardware | Google Colab T4 GPU (free tier) |
| Quantization | 4-bit (NF4) |
| Framework | Unsloth + Hugging Face Transformers |
| Training Dataset | ShravSiddhpura/cybersec-slopsquatting-crag |
| Task | Binary sequence classification |

⚠️ Intended Use & Limitations

Intended for:

  • Intercepting AI-generated code before execution in developer pipelines
  • CI/CD security layers that need lightweight LLM-based package validation
  • Research into LLM hallucination detection and supply chain security

Not intended for:

  • General-purpose code review or static analysis
  • Standalone use without secondary verification (always pair with a live registry lookup)
  • Non-Python package ecosystems (npm, cargo, etc.) β€” not trained on those

Known Limitations:

  • 26.2% OOD accuracy gap indicates the model can struggle with highly novel domain vocabulary
  • False positive rate is high (~44% of safe packages flagged) β€” must be used with a PyPI verification tool, not standalone
  • Model was trained on synthetic LLM-generated data; real-world distribution may differ

πŸ“¦ Related Resources

| Resource | Link |
|----------|------|
| πŸ“‚ Full Project (GitHub) | CyberSiddh |
| πŸ“Š Training Dataset | cybersec-slopsquatting-crag |

πŸ“œ Citation

If you use this model in your research or projects:

@misc{siddhpura2026cybersid,
  author    = {Shrav Siddhpura},
  title     = {CyberSID: AI-Powered Slopsquatting Detection via Fine-tuned Llama-3.2-3B},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ShravSiddhpura/Llama-3.2-3B-Cybersec-Slopsquatting-V2}
}

Built to make AI-assisted development safer, one pip install at a time.