Email Phishing Detection Model (RoBERTa + LoRA)

Classifies email content as Safe or Phishing.

Built on FacebookAI/roberta-base with LoRA fine-tuning.

Labels

LABEL_0 / Safe
LABEL_1 / Phishing

Model Details

Base model: FacebookAI/roberta-base
Task: Binary text classification
LoRA parameters: r=16, alpha=32, dropout=0.1
LoRA target modules: query, value
Max input length: 128 tokens
Weights file in this project: roberta_lora_phishing_detector.pt
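
For a sense of scale, the LoRA settings above can be turned into a rough adapter size. This is a minimal sketch, assuming the standard roberta-base shape (12 encoder layers, hidden size 768) and counting only the low-rank A and B matrices on the query and value projections. The classification head, which is typically trained as well, is not counted, and the function name is illustrative rather than part of this project.

```python
# Rough LoRA adapter size for roberta-base.
# Assumed architecture constants: 12 encoder layers, hidden size 768.
HIDDEN_SIZE = 768
NUM_LAYERS = 12
TARGET_MODULES = ("query", "value")
R = 16
ALPHA = 32


def lora_adapter_params(hidden, layers, n_targets, r):
    # Each adapted Linear(hidden, hidden) gains A (r x hidden) and B (hidden x r).
    per_module = r * hidden + hidden * r
    return per_module * n_targets * layers


adapter_params = lora_adapter_params(HIDDEN_SIZE, NUM_LAYERS, len(TARGET_MODULES), R)
scaling = ALPHA / R  # LoRA scales the low-rank update by alpha / r

print(adapter_params)  # 589824
print(scaling)         # 2.0
```

Under these assumptions the adapters add roughly 0.6M trainable parameters on top of the frozen base model.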

How Inference Works (Model-Level)

Build the input text from the email subject and body.
Tokenize with the RoBERTa tokenizer (padding="max_length", truncation=True, max_length=128).
Run the sequence classification forward pass.
Apply softmax to the logits.
Return the label, its confidence, and the class probabilities.
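
The first and last steps above can be sketched in plain Python, without loading the model. This is an illustration, not the project's code: the helper names `build_input_text` and `postprocess` are hypothetical, the newline used to join subject and body is an assumption, and the logits are hard-coded stand-ins for a real forward pass.

```python
import math

LABELS = ("Safe", "Phishing")  # index 0 -> LABEL_0, index 1 -> LABEL_1


def build_input_text(subject, body):
    # Assumption: subject and body are joined with a newline; the app may differ.
    return f"{subject}\n{body}".strip()


def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    # Map raw logits to (label, confidence, per-class probabilities).
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best], dict(zip(LABELS, probs))


# Stand-in logits in place of a real model output.
label, confidence, probs = postprocess([-1.0, 2.0])
print(label, round(confidence, 4))  # Phishing 0.9526
```

The confidence here is simply the largest softmax probability, which is why it should not be read as a calibrated risk estimate.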

How Aegis DLP Uses This Model (System-Level)

In Aegis DLP, this model is one signal in a multi-factor phishing decision pipeline:

AI body analysis: 40%
URL analysis against trusted domains: 25%
Attachment analysis (YARA): 15%
Content keyword heuristics: 10%
Sender trust / suspicious TLD signal: 10%
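
As a worked example of the weighting above, the sketch below combines five per-factor scores into one weighted score. It assumes each factor is reported as a score between 0 and 1. The dictionary keys and the `weighted_score` helper are illustrative names, not identifiers from the Aegis DLP codebase.

```python
# Weights taken from the list above; key names are hypothetical.
WEIGHTS = {
    "ai_body": 0.40,
    "url": 0.25,
    "attachment": 0.15,
    "keywords": 0.10,
    "sender": 0.10,
}


def weighted_score(factors):
    # Missing factors are treated as 0.0 (an assumption of this sketch).
    return sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())


example = {
    "ai_body": 0.8,
    "url": 0.2,
    "attachment": 0.0,
    "keywords": 0.5,
    "sender": 0.1,
}
score = weighted_score(example)
print(round(score, 2))  # 0.43
```

Because the weights sum to 1.0, the weighted score stays in the same 0 to 1 range as the individual factors.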

Decision policy used in the DLP app:

If any factor >= 0.85 -> force Phishing (single-factor escalation)
Else if weighted score >= 0.90 -> Phishing
Else if weighted score >= 0.35 -> Phishing
Else -> Safe with needs_review=true

Additional app rules around this model:

A trusted sender/domain whitelist can short-circuit the decision to Safe
Non-English content is labeled Unknown and routed to review in the app flow
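
The whitelist rule could look something like the following sketch. The sender and domain sets, and the `is_trusted_sender` helper, are hypothetical placeholders. How the app actually stores its whitelist, and where this check sits relative to the other rules, is not specified here.

```python
from email.utils import parseaddr

# Hypothetical whitelist entries for illustration only.
TRUSTED_SENDERS = {"alerts@partner.example"}
TRUSTED_DOMAINS = {"example.com"}


def is_trusted_sender(from_header):
    # parseaddr splits a header like "Name <user@host>" into (name, address).
    _, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]
    # Exact domain match, so "evil-example.com" does not pass as "example.com".
    return address in TRUSTED_SENDERS or domain in TRUSTED_DOMAINS


print(is_trusted_sender("IT Helpdesk <helpdesk@Example.com>"))  # True
print(is_trusted_sender("Support <help@evil-example.com>"))     # False
```

Matching on the exact domain rather than a substring or suffix is a deliberate choice in this sketch, since lookalike domains are a common phishing tactic.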

Usage (transformers pipeline)

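from transformers import pipeline

# Replace YOUR_ORG/YOUR_MODEL with this model's Hub repository ID.
clf = pipeline("text-classification", model="YOUR_ORG/YOUR_MODEL")

# Returns a list of dicts, each with a "label" (LABEL_0 or LABEL_1) and a "score".
print(clf("Urgent: Verify your account now to avoid suspension."))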

Usage in This DLP Codebase

from modules.body_classifier import predict_body_label

label, confidence, probs = predict_body_label(
    "Urgent: Verify your account now to avoid suspension."
)
print(label, confidence, probs)

Intended Use

Email phishing risk triage
SOC/security analyst support workflows
Security-aware email filtering pipelines

Limitations

Primarily designed for English email content.
Final DLP outcome is multi-factor and not based on this model alone.
Confidence scores are model outputs and may not be calibrated risk probabilities.

Responsible Use

Use this model to assist review workflows, not as the sole basis for high-impact security or compliance decisions.

Model size: 0.1B params (Safetensors, tensor type F32)