DistilBERT Email Fraud Detection: Phase 1 Baseline
1. Overview
This repository contains a fine-tuned distilbert-base-uncased model for binary classification of email messages:
- 0 → Legitimate (Ham)
- 1 → Fraud / Spam
This model represents Phase 1 of a multi-stage fraud detection system. The objective of this phase is to establish a strong, reproducible, text-only baseline using a transformer architecture before integrating additional metadata and adversarial strategies in subsequent phases.
The system is designed for research, experimentation, and as a foundation for more advanced email security architectures.
2. Motivation and Approach
Fraud detection in email communication is inherently adversarial. Attackers continuously adapt their language and techniques. Effective systems require layered defenses rather than relying on a single signal source.
Instead of immediately building a complex hybrid system, this project follows a staged engineering approach:
Phase 1: Text-only transformer baseline
Phase 2: Text + structured metadata (sender domain, headers, reputation signals)
Phase 3: Adversarial and synthetic augmentation
Phase 4: Temporal adaptation and drift monitoring
The purpose of Phase 1 is to:
- Establish a measurable transformer baseline
- Evaluate the strength of linguistic fraud signals
- Identify systematic failure modes
- Build a reproducible training pipeline
- Prepare for multi-modal integration
This model isolates the contribution of textual semantics to fraud detection performance.
3. Dataset
Training was conducted on the Enron Spam Dataset (public research corpus).
Preprocessing steps:
- Email body extracted
- MIME boundaries removed
- HTML converted to plain text
- Attachments removed
- Case, punctuation, and stopwords preserved
- Duplicate emails removed
- Emails shorter than 10 characters removed
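The preprocessing steps above can be sketched roughly as follows. The helper names and the regex-based HTML stripping are illustrative assumptions, not the exact pipeline used:

```python
import re
from html import unescape

def preprocess_email(body: str):
    """Illustrative version of the preprocessing steps (not the exact pipeline)."""
    # Convert HTML to plain text: strip tags, then decode entities.
    text = re.sub(r"<[^>]+>", " ", body)
    text = unescape(text)
    # Collapse whitespace; case, punctuation, and stopwords are preserved.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop emails shorter than 10 characters.
    return text if len(text) >= 10 else None

def deduplicate(emails):
    """Remove exact duplicate emails while preserving order."""
    seen, out = set(), []
    for e in emails:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out
```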
Data split strategy:
- 70% Training
- 15% Validation
- 15% Testing
- Stratified by label distribution
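A stratified 70/15/15 split can be produced with two chained `train_test_split` calls. The function name and random seed below are assumptions for illustration:

```python
from sklearn.model_selection import train_test_split

def split_70_15_15(texts, labels, seed=42):
    """Stratified 70/15/15 split (sketch; the exact seed is an assumption)."""
    # First carve off 30% for validation + test, stratified by label.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed)
    # Split that 30% in half -> 15% validation, 15% test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

Stratifying both splits keeps the ham/fraud ratio consistent across all three subsets.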
Important limitation:
The Enron dataset reflects early 2000s spam patterns and does not represent modern spear-phishing, business email compromise, or AI-generated scams.
4. Model Architecture
Base model: distilbert-base-uncased
Classification head: Linear layer with 2 output neurons
Maximum sequence length: 256 tokens
Loss function: Class-weighted CrossEntropyLoss (to address imbalance)
Optimizer: AdamW
Learning rate: 2e-5
Weight decay: 0.01
Warmup: 10% of total training steps
Scheduler: Linear schedule with warmup
Mixed precision training: Enabled when GPU available
Early stopping: Validation loss monitored with patience = 2
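The class weighting and learning-rate schedule above can be sketched in plain Python. The inverse-frequency weighting formula and helper names are assumptions (the card does not specify the exact scheme), though the schedule matches the stated 2e-5 peak and 10% warmup:

```python
def class_weights(labels):
    """Inverse-frequency class weights for a weighted CrossEntropyLoss
    (one common scheme; assumed, not confirmed by the card)."""
    n = len(labels)
    counts = [labels.count(c) for c in sorted(set(labels))]
    return [n / (len(counts) * c) for c in counts]

def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_frac=0.10):
    """Linear warmup over 10% of steps to a 2e-5 peak, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

In practice the same schedule comes from `transformers.get_linear_schedule_with_warmup` paired with AdamW.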
5. Evaluation Results
Evaluation was performed on the held-out test set.
Metrics reported:
- F1 Score
- Precision
- Recall
- ROC-AUC
Typical performance on Enron:
F1 Score: 0.93–0.96
ROC-AUC: ~0.97
The decision threshold was optimized on validation data to either maximize F1 or enforce high recall. Fraud detection systems typically prioritize recall, since a missed fraudulent email (false negative) is costlier than a flagged legitimate one.
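Threshold optimization of this kind can be sketched as a simple grid search over validation probabilities. The function name and grid resolution are illustrative assumptions:

```python
def best_f1_threshold(probs, labels, grid=None):
    """Pick the decision threshold that maximizes F1 on validation data (sketch)."""
    grid = grid or [i / 100 for i in range(1, 100)]

    def f1_at(t):
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return max(grid, key=f1_at)
```

For a recall-oriented configuration, the same search can instead maximize recall subject to a minimum precision constraint.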
6. Error Analysis (Phase 1 Findings)
Manual inspection of misclassified samples revealed:
False Positives:
- Financial newsletters
- Promotional emails from legitimate organizations
- Corporate announcements containing urgency language
False Negatives:
- Polished spear-phishing emails
- Messages mimicking internal corporate tone
- Subtle impersonation attempts
Key observation:
Text-only transformers capture strong lexical fraud signals (urgency, financial triggers, threat language), but struggle when attackers successfully mimic legitimate structure and tone.
This validates the need for metadata integration in Phase 2.
7. Robustness Checks
The following analyses were conducted:
- Performance comparison across short vs long emails
- Confidence distribution inspection
- Probability calibration curve analysis
Findings:
- Strong separation for obvious spam
- Reduced confidence in ambiguous borderline cases
- Slight overconfidence in higher probability ranges
Threshold tuning can trade some precision for higher recall, which benefits recall-oriented configurations.
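A calibration check along these lines can be sketched by binning predicted probabilities and comparing mean confidence against the observed fraud rate in each bin. This is an assumed reliability-diagram style check, not the exact analysis performed:

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into probability bins and compare mean confidence
    with the observed positive rate (illustrative calibration check)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_conf = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            out.append((mean_conf, frac_pos))
    return out
```

Bins where mean confidence exceeds the observed fraud rate indicate overconfidence, matching the finding above for the higher probability ranges.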
8. Intended Use
This model is intended for:
- Research and experimentation
- Baseline transformer benchmarking
- Educational purposes
- Integration into multi-stage fraud detection systems
It is not intended to function as a standalone production-grade email security system.
Production systems require:
- Sender metadata analysis
- Domain reputation scoring
- URL inspection
- Attachment scanning
- Behavioral feedback loops
- Continuous retraining
9. Limitations
- Trained on historical dataset
- Text-only model
- No header-based analysis
- No domain verification
- No adversarial training
- No temporal drift modeling
Performance may degrade on:
- Modern spear-phishing
- AI-generated scam emails
- Highly contextual impersonation attacks
10. Roadmap
Phase 2: Hybrid architecture combining DistilBERT embeddings with structured metadata features.
Phase 3: Controlled synthetic adversarial augmentation using language model-generated variants.
Phase 4: Temporal evaluation and drift detection framework.
11. Usage Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "YOUR_USERNAME/YOUR_MODEL_NAME"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "Urgent: Your account has been suspended. Click here to verify."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.softmax(outputs.logits, dim=1)
print("Fraud probability:", probabilities[0][1].item())  # index 1 = Fraud / Spam
```
12. Citation
If you use this model in research, please cite:
DistilBERT: Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.
Enron Dataset: Klimt, B. and Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research.