DistilBERT Email Fraud Detection – Phase 1 Baseline

1. Overview

This repository contains a fine-tuned distilbert-base-uncased model for binary classification of email messages:

  • 0 → Legitimate (Ham)
  • 1 → Fraud / Spam

This model represents Phase 1 of a multi-stage fraud detection system. The objective of this phase is to establish a strong, reproducible, text-only baseline using a transformer architecture before integrating additional metadata and adversarial strategies in subsequent phases.

The system is designed for research, experimentation, and as a foundation for more advanced email security architectures.


2. Motivation and Approach

Fraud detection in email communication is inherently adversarial. Attackers continuously adapt their language and techniques. Effective systems require layered defenses rather than relying on a single signal source.

Instead of immediately building a complex hybrid system, this project follows a staged engineering approach:

Phase 1 – Text-only transformer baseline
Phase 2 – Text + structured metadata (sender domain, headers, reputation signals)
Phase 3 – Adversarial and synthetic augmentation
Phase 4 – Temporal adaptation and drift monitoring

The purpose of Phase 1 is to:

  • Establish a measurable transformer baseline
  • Evaluate the strength of linguistic fraud signals
  • Identify systematic failure modes
  • Build a reproducible training pipeline
  • Prepare for multi-modal integration

This model isolates the contribution of textual semantics to fraud detection performance.


3. Dataset

Training was conducted on the Enron Spam Dataset (public research corpus).

Preprocessing steps:

  • Email body extracted
  • MIME boundaries removed
  • HTML converted to plain text
  • Attachments removed
  • Case, punctuation, and stopwords preserved
  • Duplicate emails removed
  • Emails shorter than 10 characters removed
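The cleaning steps above can be sketched in pure Python; the function names and the regex-based tag stripping are illustrative assumptions, not the exact pipeline used for training:

```python
import re
from html import unescape

def clean_email(body: str) -> str:
    """Convert an HTML email body to plain text, preserving case and punctuation."""
    text = unescape(body)                      # decode HTML entities (&amp; -> &)
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

def filter_emails(bodies):
    """Drop exact duplicates and emails shorter than 10 characters."""
    seen, kept = set(), []
    for body in bodies:
        text = clean_email(body)
        if len(text) < 10 or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

A production pipeline would typically use a real HTML parser instead of a regex, but the filtering logic is the same.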

Data split strategy:

  • 70% Training
  • 15% Validation
  • 15% Testing
  • Stratified by label distribution
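A 70/15/15 stratified split can be reproduced with two chained scikit-learn splits (the toy labels and the fixed seed below are illustrative assumptions):

```python
from sklearn.model_selection import train_test_split

# Toy imbalanced corpus for illustration: 70 ham, 30 fraud
labels = [0] * 70 + [1] * 30
texts = [f"email {i}" for i in range(100)]

# First split off 70% for training, stratified by label...
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42)

# ...then split the remaining 30% evenly into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```

Stratifying both splits keeps the ham/fraud ratio consistent across all three partitions.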

Important limitation:

The Enron dataset reflects early 2000s spam patterns and does not represent modern spear-phishing, business email compromise, or AI-generated scams.


4. Model Architecture

Base model: distilbert-base-uncased

Classification head: Linear layer with 2 output neurons

Maximum sequence length: 256 tokens

Loss function: Class-weighted CrossEntropyLoss (to address imbalance)

Optimizer: AdamW

Learning rate: 2e-5

Weight decay: 0.01

Warmup: 10% of total training steps

Scheduler: Linear schedule with warmup

Mixed precision training: Enabled when GPU available

Early stopping: Validation loss monitored with patience = 2
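The hyperparameters above map onto a Hugging Face Trainer configuration roughly as follows. This is a minimal sketch: the `WeightedTrainer` subclass, output path, and use of `Trainer` at all are assumptions; only the numeric values come from this card.

```python
import torch
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

class WeightedTrainer(Trainer):
    """Trainer variant applying class-weighted CrossEntropyLoss (hypothetical helper)."""
    def __init__(self, class_weights, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(model.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(
    output_dir="distilbert-fraud-baseline",  # illustrative path
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.10,                       # 10% of total steps
    lr_scheduler_type="linear",
    fp16=torch.cuda.is_available(),          # mixed precision when a GPU is present
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",       # monitored for early stopping
)
# Passing EarlyStoppingCallback(early_stopping_patience=2) to the trainer
# reproduces the patience = 2 setting.
```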


5. Evaluation Results

Evaluation was performed on the held-out test set.

Metrics reported:

  • F1 Score
  • Precision
  • Recall
  • ROC-AUC

Typical performance on Enron:

F1 Score: 0.93 – 0.96
ROC-AUC: ~0.97

The decision threshold was tuned on validation data either to maximize F1 or to enforce a high-recall operating point.

Fraud detection systems typically prioritize recall, since a missed fraudulent email (false negative) is costlier than a flagged legitimate one.
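Threshold selection on validation probabilities can be sketched as a simple scan (pure-Python illustration; the candidate grid is an assumption):

```python
def best_f1_threshold(probs, labels, candidates=None):
    """Scan candidate thresholds and return the one maximizing F1 on validation data."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(p >= t and y == 1 for p, y in zip(probs, labels))
        fp = sum(p >= t and y == 0 for p, y in zip(probs, labels))
        fn = sum(p < t and y == 1 for p, y in zip(probs, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

A recall-oriented configuration would instead scan for the lowest threshold meeting a recall floor.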


6. Error Analysis (Phase 1 Findings)

Manual inspection of misclassified samples revealed:

False Positives:

  • Financial newsletters
  • Promotional emails from legitimate organizations
  • Corporate announcements containing urgency language

False Negatives:

  • Polished spear-phishing emails
  • Messages mimicking internal corporate tone
  • Subtle impersonation attempts

Key observation:

Text-only transformers capture strong lexical fraud signals (urgency, financial triggers, threat language), but struggle when attackers successfully mimic legitimate structure and tone.

This validates the need for metadata integration in Phase 2.


7. Robustness Checks

The following analyses were conducted:

  • Performance comparison across short vs long emails
  • Confidence distribution inspection
  • Probability calibration curve analysis

Findings:

  • Strong separation for obvious spam
  • Reduced confidence in ambiguous borderline cases
  • Slight overconfidence in higher probability ranges

Threshold tuning further improves performance in recall-oriented configurations.
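The calibration check can be sketched as binned reliability statistics: in each probability bin, mean predicted confidence is compared to the observed fraud rate, and a gap between the two indicates miscalibration (e.g. overconfidence in the high-probability range). The binning scheme below is an illustrative assumption:

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins and report
    (mean confidence, observed fraud rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_conf = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            out.append((round(mean_conf, 3), round(frac_pos, 3), len(b)))
    return out
```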


8. Intended Use

This model is intended for:

  • Research and experimentation
  • Baseline transformer benchmarking
  • Educational purposes
  • Integration into multi-stage fraud detection systems

It is not intended to function as a standalone production-grade email security system.

Production systems require:

  • Sender metadata analysis
  • Domain reputation scoring
  • URL inspection
  • Attachment scanning
  • Behavioral feedback loops
  • Continuous retraining

9. Limitations

  • Trained on historical dataset
  • Text-only model
  • No header-based analysis
  • No domain verification
  • No adversarial training
  • No temporal drift modeling

Performance may degrade on:

  • Modern spear-phishing
  • AI-generated scam emails
  • Highly contextual impersonation attacks

10. Roadmap

Phase 2: Hybrid architecture combining DistilBERT embeddings with structured metadata features.

Phase 3: Controlled synthetic adversarial augmentation using language model-generated variants.

Phase 4: Temporal evaluation and drift detection framework.


11. Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "YOUR_USERNAME/YOUR_MODEL_NAME"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

text = "Urgent: Your account has been suspended. Click here to verify."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)

# Index 1 corresponds to the Fraud / Spam class
print("Fraud probability:", probabilities[0][1].item())

12. Citation

If you use this model in research, please cite:

DistilBERT: Sanh et al., 2019. DistilBERT: a distilled version of BERT.

Enron Dataset: Klimt & Yang, 2004. The Enron Corpus.
