DistilBERT Email Fraud Detection – Phase 1 Baseline

1. Overview

This repository contains a fine-tuned distilbert-base-uncased model for binary classification of email messages:

  • 0 → Legitimate (Ham)
  • 1 → Fraud / Spam

This model represents Phase 1 of a multi-stage fraud detection system. The objective of this phase is to establish a strong, reproducible, text-only baseline using a transformer architecture before integrating additional metadata and adversarial strategies in subsequent phases.

The system is designed for research, experimentation, and as a foundation for more advanced email security architectures.


2. Motivation and Approach

Fraud detection in email communication is inherently adversarial. Attackers continuously adapt their language and techniques. Effective systems require layered defenses rather than relying on a single signal source.

Instead of immediately building a complex hybrid system, this project follows a staged engineering approach:

Phase 1 – Text-only transformer baseline
Phase 2 – Text + structured metadata (sender domain, headers, reputation signals)
Phase 3 – Adversarial and synthetic augmentation
Phase 4 – Temporal adaptation and drift monitoring

The purpose of Phase 1 is to:

  • Establish a measurable transformer baseline
  • Evaluate the strength of linguistic fraud signals
  • Identify systematic failure modes
  • Build a reproducible training pipeline
  • Prepare for multi-modal integration

This model isolates the contribution of textual semantics to fraud detection performance.


3. Dataset

Training was conducted on the Enron Spam Dataset (public research corpus).

Preprocessing steps:

  • Email body extracted
  • MIME boundaries removed
  • HTML converted to plain text
  • Attachments removed
  • Case, punctuation, and stopwords preserved
  • Duplicate emails removed
  • Emails shorter than 10 characters removed
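The cleaning steps above can be sketched in pure Python; the function names and the regex-based tag stripping are illustrative assumptions, not the exact pipeline used for training:

```python
import re
from html import unescape

def clean_email(body: str) -> str:
    """Convert an HTML email body to plain text, preserving case and punctuation."""
    text = unescape(body)                      # decode HTML entities (&amp; -> &)
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

def filter_emails(bodies):
    """Drop exact duplicates and emails shorter than 10 characters."""
    seen, kept = set(), []
    for body in bodies:
        text = clean_email(body)
        if len(text) < 10 or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

A production pipeline would typically use a real HTML parser instead of a regex, but the filtering logic is the same.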

Data split strategy:

  • 70% Training
  • 15% Validation
  • 15% Testing
  • Stratified by label distribution
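A 70/15/15 stratified split can be reproduced with two chained scikit-learn splits (the toy labels and the fixed seed below are illustrative assumptions):

```python
from sklearn.model_selection import train_test_split

# Toy imbalanced corpus for illustration: 70 ham, 30 fraud
labels = [0] * 70 + [1] * 30
texts = [f"email {i}" for i in range(100)]

# First split off 70% for training, stratified by label...
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42)

# ...then split the remaining 30% evenly into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```

Stratifying both splits keeps the ham/fraud ratio consistent across all three partitions.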

Important limitation:

The Enron dataset reflects early 2000s spam patterns and does not represent modern spear-phishing, business email compromise, or AI-generated scams.


4. Model Architecture

Base model: distilbert-base-uncased

Classification head: Linear layer with 2 output neurons

Maximum sequence length: 256 tokens

Loss function: Class-weighted CrossEntropyLoss (to address imbalance)

Optimizer: AdamW

Learning rate: 2e-5

Weight decay: 0.01

Warmup: 10% of total training steps

Scheduler: Linear schedule with warmup

Mixed precision training: Enabled when GPU available

Early stopping: Validation loss monitored with patience = 2
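The hyperparameters above map onto a Hugging Face Trainer configuration roughly as follows. This is a minimal sketch: the `WeightedTrainer` subclass, output path, and use of `Trainer` at all are assumptions; only the numeric values come from this card.

```python
import torch
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

class WeightedTrainer(Trainer):
    """Trainer variant applying class-weighted CrossEntropyLoss (hypothetical helper)."""
    def __init__(self, class_weights, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(model.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(
    output_dir="distilbert-fraud-baseline",  # illustrative path
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.10,                       # 10% of total steps
    lr_scheduler_type="linear",
    fp16=torch.cuda.is_available(),          # mixed precision when a GPU is present
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",       # monitored for early stopping
)
# Passing EarlyStoppingCallback(early_stopping_patience=2) to the trainer
# reproduces the patience = 2 setting.
```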


5. Evaluation Results

Evaluation was performed on the held-out test set.

Metrics reported:

  • F1 Score
  • Precision
  • Recall
  • ROC-AUC

Typical performance on Enron:

F1 Score: 0.93 – 0.96
ROC-AUC: ~0.97

The decision threshold was tuned on validation data either to maximize F1 or to enforce a high-recall operating point.

Fraud detection systems typically prioritize recall, since a missed fraudulent email (false negative) is costlier than a flagged legitimate one.
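Threshold selection on validation probabilities can be sketched as a simple scan (pure-Python illustration; the candidate grid is an assumption):

```python
def best_f1_threshold(probs, labels, candidates=None):
    """Scan candidate thresholds and return the one maximizing F1 on validation data."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(p >= t and y == 1 for p, y in zip(probs, labels))
        fp = sum(p >= t and y == 0 for p, y in zip(probs, labels))
        fn = sum(p < t and y == 1 for p, y in zip(probs, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

A recall-oriented configuration would instead scan for the lowest threshold meeting a recall floor.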


6. Error Analysis (Phase 1 Findings)

Manual inspection of misclassified samples revealed:

False Positives:

  • Financial newsletters
  • Promotional emails from legitimate organizations
  • Corporate announcements containing urgency language

False Negatives:

  • Polished spear-phishing emails
  • Messages mimicking internal corporate tone
  • Subtle impersonation attempts

Key observation:

Text-only transformers capture strong lexical fraud signals (urgency, financial triggers, threat language), but struggle when attackers successfully mimic legitimate structure and tone.

This validates the need for metadata integration in Phase 2.


7. Robustness Checks

The following analyses were conducted:

  • Performance comparison across short vs long emails
  • Confidence distribution inspection
  • Probability calibration curve analysis

Findings:

  • Strong separation for obvious spam
  • Reduced confidence in ambiguous borderline cases
  • Slight overconfidence in higher probability ranges

Threshold tuning further improves performance in recall-oriented configurations.
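The calibration check can be sketched as binned reliability statistics: in each probability bin, mean predicted confidence is compared to the observed fraud rate, and a gap between the two indicates miscalibration (e.g. overconfidence in the high-probability range). The binning scheme below is an illustrative assumption:

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins and report
    (mean confidence, observed fraud rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_conf = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            out.append((round(mean_conf, 3), round(frac_pos, 3), len(b)))
    return out
```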


8. Intended Use

This model is intended for:

  • Research and experimentation
  • Baseline transformer benchmarking
  • Educational purposes
  • Integration into multi-stage fraud detection systems

It is not intended to function as a standalone production-grade email security system.

Production systems require:

  • Sender metadata analysis
  • Domain reputation scoring
  • URL inspection
  • Attachment scanning
  • Behavioral feedback loops
  • Continuous retraining

9. Limitations

  • Trained on historical dataset
  • Text-only model
  • No header-based analysis
  • No domain verification
  • No adversarial training
  • No temporal drift modeling

Performance may degrade on:

  • Modern spear-phishing
  • AI-generated scam emails
  • Highly contextual impersonation attacks

10. Roadmap

Phase 2: Hybrid architecture combining DistilBERT embeddings with structured metadata features.

Phase 3: Controlled synthetic adversarial augmentation using language model-generated variants.

Phase 4: Temporal evaluation and drift detection framework.


11. Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "YOUR_USERNAME/YOUR_MODEL_NAME"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

text = "Urgent: Your account has been suspended. Click here to verify."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)

# Index 1 corresponds to the Fraud / Spam class
print("Fraud probability:", probabilities[0][1].item())

12. Citation

If you use this model in research, please cite:

DistilBERT: Sanh et al., 2019. DistilBERT: a distilled version of BERT.

Enron Dataset: Klimt & Yang, 2004. The Enron Corpus.
