Email Phishing Detection Model (RoBERTa + LoRA)

Classifies email content as Safe or Phishing.

Built on FacebookAI/roberta-base with LoRA fine-tuning.

Labels

  • LABEL_0 / Safe
  • LABEL_1 / Phishing

Model Details

  • Base model: FacebookAI/roberta-base
  • Task: Binary text classification
  • LoRA parameters: r=16, alpha=32, dropout=0.1
  • LoRA target modules: query, value
  • Max input length: 128 tokens
  • Weights file in this project: roberta_lora_phishing_detector.pt
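
For a sense of scale, the adapter size implied by these settings can be estimated with a little arithmetic. The sketch below assumes the standard roberta-base shape (12 encoder layers, hidden size 768) and that LoRA wraps only the query and value projections in every layer. It ignores the classification head, which is usually trained as well. The helper name is hypothetical and not part of this project.

```python
# Rough LoRA adapter size for the settings listed above.
# Assumptions: roberta-base has 12 encoder layers with hidden size 768, and
# LoRA is applied to the query and value projections of every layer.

def lora_param_count(hidden_size: int, rank: int, num_layers: int,
                     targets_per_layer: int) -> int:
    """Count parameters in the low-rank A (rank x in) and B (out x rank) matrices."""
    per_module = rank * hidden_size + hidden_size * rank
    return per_module * targets_per_layer * num_layers


RANK, ALPHA = 16, 32
adapter_params = lora_param_count(hidden_size=768, rank=RANK,
                                  num_layers=12, targets_per_layer=2)
scaling = ALPHA / RANK  # LoRA scales the low-rank update by alpha / r

print(f"adapter parameters: {adapter_params:,}")
print(f"update scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters add roughly 0.6M parameters, a small fraction of the base model.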

How Inference Works (Model-Level)

  1. Build the input text from the email subject and body.
  2. Tokenize with the RoBERTa tokenizer (padding="max_length", truncation=True, max_length=128).
  3. Run the sequence-classification forward pass.
  4. Apply softmax to the logits.
  5. Return the predicted label, its confidence, and the class probabilities.
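
Steps 1, 4, and 5 can be illustrated without loading the model. This is a minimal sketch, not the project's implementation: the function names and the newline separator between subject and body are assumptions, and the logits are made-up values standing in for the output of steps 2 and 3.

```python
import math

# Label order follows the "Labels" section: index 0 = Safe, index 1 = Phishing.
ID2LABEL = {0: "Safe", 1: "Phishing"}


def build_input_text(subject: str, body: str) -> str:
    """Step 1: join subject and body. The newline separator is an assumption."""
    return f"{subject}\n{body}".strip()


def softmax(logits):
    """Step 4: numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Step 5: return (label, confidence, per-class probabilities)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    by_label = {ID2LABEL[i]: p for i, p in enumerate(probs)}
    return ID2LABEL[best], probs[best], by_label


# Made-up logits standing in for the model's output on one email.
label, confidence, probs = postprocess([-1.0, 2.0])
print(label, round(confidence, 4), probs)
```

The confidence here is simply the largest softmax probability, so it always lies between 0.5 and 1.0 for a two-class model.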

How Aegis DLP Uses This Model (System-Level)

In Aegis DLP, this model is one signal in a multi-factor phishing decision pipeline:

  • AI body analysis: 40%
  • URL analysis against trusted domains: 25%
  • Attachment analysis (YARA): 15%
  • Content keyword heuristics: 10%
  • Sender trust / suspicious TLD signal: 10%
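
As a worked example of how these weights combine, the sketch below computes a weighted score from per-factor scores. It assumes each factor produces a score in [0, 1] and that a missing factor counts as 0.0; the dictionary keys and function name are hypothetical, not identifiers from the Aegis DLP codebase.

```python
# Factor weights from the list above; the dictionary keys are hypothetical names.
WEIGHTS = {
    "ai_body": 0.40,
    "url": 0.25,
    "attachment": 0.15,
    "keywords": 0.10,
    "sender": 0.10,
}


def weighted_score(factors: dict) -> float:
    """Combine per-factor scores (assumed to lie in [0, 1]) into one score.

    Missing factors count as 0.0, which is an assumption of this sketch.
    """
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


# Made-up factor scores for a single email.
example = {"ai_body": 0.8, "url": 0.6, "attachment": 0.0, "keywords": 0.5, "sender": 0.2}
score = weighted_score(example)
print(round(score, 2))  # 0.32 + 0.15 + 0.00 + 0.05 + 0.02 = 0.54
```

Because the weights sum to 1.0, the combined score stays in [0, 1] whenever every factor does.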

Decision policy used in the DLP app:

  • If any factor >= 0.85 -> force Phishing (single-factor escalation)
  • Else if weighted score >= 0.90 -> Phishing
  • Else if weighted score >= 0.35 -> Phishing
  • Else -> Safe with needs_review=true
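
The single-factor escalation rule in the first bullet can be sketched on its own. The function and key names below are hypothetical, and factor scores are assumed to lie in [0, 1]. With the made-up scores shown, the 40/25/15/10/10 weighting would give a weighted score of roughly 0.27, which falls in the lowest tier, yet the email is still forced to Phishing because one factor reaches 0.85.

```python
ESCALATION_THRESHOLD = 0.85  # from the first policy bullet


def is_escalated(factors: dict, threshold: float = ESCALATION_THRESHOLD) -> bool:
    """True when any single factor score reaches the escalation threshold."""
    return any(score >= threshold for score in factors.values())


# Made-up scores: one strong URL signal, everything else quiet.
factors = {"ai_body": 0.10, "url": 0.90, "attachment": 0.0, "keywords": 0.0, "sender": 0.0}
print(is_escalated(factors))
```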

Additional app rules around this model:

  • A trusted sender/domain whitelist can short-circuit the decision to Safe.
  • Non-English content is labeled Unknown and routed to review in the app flow.

Usage (transformers pipeline)

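from transformers import pipeline

# Replace the placeholder with this model's repository ID.
clf = pipeline("text-classification", model="YOUR_ORG/YOUR_MODEL")

# Returns a list with one dict per input, each holding "label" and "score".
print(clf("Urgent: Verify your account now to avoid suspension."))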

Usage in This DLP Codebase

from modules.body_classifier import predict_body_label

# Unpack the predicted label, its confidence, and the class probabilities.
label, confidence, probs = predict_body_label(
    "Urgent: Verify your account now to avoid suspension."
)
print(label, confidence, probs)

Intended Use

  • Email phishing risk triage
  • SOC/security analyst support workflows
  • Security-aware email filtering pipelines

Limitations

  • Primarily designed for English email content.
  • Final DLP outcome is multi-factor and not based on this model alone.
  • Confidence scores are model outputs and may not be calibrated risk probabilities.

Responsible Use

Use this model to assist review workflows, not as the sole basis for high-impact security or compliance decisions.
