# DeBERTa-v3 Smishing & Spam Detector (v0.2.1)
A fine-tuned DeBERTa-v3-base model for detecting smishing (SMS phishing) and spam messages. v0.2.1 is a ground-up revision of the architecture and training pipeline, driven by systematic error analysis of 624 misclassified messages from v0.1.
## Model Performance
| Metric | v0.1 | v0.2.1 (@ 0.50) | v0.2.1 (@ 0.67) |
|---|---|---|---|
| F1 Score | 0.9299 | 0.8781 | 0.9022 |
| Precision | 0.8986 | 0.8226 | 0.9058 |
| Recall | 0.9634 | 0.9417 | 0.8986 |
| AUC-ROC | 0.9906 | 0.9857 | 0.9857 |
| Best Epoch | 6 | 8 | 8 |
| Optimal Threshold | 0.6993 | n/a | 0.6700 |
### Confusion Matrix (test set @ optimised threshold)

| | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 32,136 | 612 |
| True Spam | 664 | 5,886 |
### What the numbers don't show

Headline F1 dipped (0.9022 vs 0.9299), but precision surpassed v0.1 (0.9058 vs 0.8986) and the FP rate dropped from 2.39% to 1.87%. More importantly, the quality of errors changed completely; see the Error Analysis Comparison below.
## What changed from v0.1
v0.1 achieved 0.93 F1 / 0.99 AUC-ROC but had two systematic problems revealed through error analysis of 624 misclassified test samples:
**False positives (341):** The model was confidently wrong on legitimate messages: 49 FPs had ≥0.99 spam probability. Bank transaction alerts, retail sale announcements, telecom notifications, and gaming discussions all triggered false alarms because they shared surface-level features (URLs, urgency words, phone numbers) with real spam.

**False negatives (283):** Short, featureless, or obfuscated spam slipped through. These messages had almost zero signal from the original 15 engineered features: only 1 of 283 FNs had a detected URL, zero had detected phone numbers, and the average urgency score was 0.117. Many were truncated, used unicode evasion ("Y ou've got mail: new messa ge"), or were conversational-style scams.
Every architectural and training change in v0.2.1 targets one of these specific failure modes.
## Architecture
```
Input Text
 └─► DeBERTa-v3-base encoder (gradient checkpointing enabled)
      ├─► [CLS] embedding (768d)
      └─► Attention-weighted pooling (768d)   ← NEW: learned attention over all tokens
                               │
23 Engineered Features        │
 └─► Linear(23→128) + LayerNorm + GELU ──► concat (1664d) ─► residual classifier ─► logits
```
### What's new in the architecture

**Dual pooling: [CLS] + attention-weighted pooling.** v0.1 relied solely on the [CLS] token embedding, which can miss signal in short messages where one suspicious word doesn't dominate the representation. v0.2.1 adds a learned attention pooling layer that computes a weighted sum across all token embeddings. The two representations are concatenated (2×768 = 1536d), giving the classifier both a global summary and a signal-focused view.
**8 new engineered features targeting FN blind spots.** The original 15 features (`char_count`, `has_url`, `urgency_score`, etc.) produced near-zero signal on false negatives. The 8 new features are designed to catch the specific evasion patterns those FNs used:

| Feature | Targets | Description |
|---|---|---|
| `unicode_ratio` | Unicode substitution ("Vérífy yøur àccount") | % of non-ASCII characters |
| `char_entropy` | Short/repetitive spam | Shannon entropy over character distribution |
| `suspicious_spacing` | Spaced-out evasion ("m e s s a g e") | Count of space-separated character sequences |
| `leet_ratio` | Character substitution (l33t speak) | % of characters that map to leet translations |
| `max_digit_run` | Phone numbers, OTPs, account numbers | Longest consecutive digit sequence |
| `repeated_char_ratio` | "!!!!" or "aaaaaa" patterns | Ratio of consecutive repeated characters |
| `vocab_richness` | Template spam (low diversity) | Unique words / total words |
| `has_obfuscated_url` | Broken URLs ("httpscluesjdko") | Regex detection of evasive URL patterns |
**Wider feature projection.** 23 features → 128d (was 15 → 64d), with LayerNorm and GELU activation.

**Residual classifier head.** The 1664d combined representation passes through a bottleneck (→256d) with a residual block, improving gradient flow through the deeper head.

**Multi-sample dropout.** During training, 3 stochastic forward passes through the dropout layer are averaged, acting as a cheap ensemble. This improves probability calibration: v0.1 had 49 FPs at ≥0.99 confidence; v0.2.1 has zero.
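In isolation, multi-sample dropout can be sketched as a small module. This is a hypothetical minimal version; the real v0.2.1 head also includes the residual bottleneck described above:

```python
import torch
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    """Averages logits over several independent dropout masks: a cheap
    ensemble that tends to soften over-confident predictions."""
    def __init__(self, in_dim, num_labels, p=0.1, n_samples=3):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for _ in range(n_samples))
        self.fc = nn.Linear(in_dim, num_labels)

    def forward(self, x):
        # Each pass sees a different dropout mask during training;
        # at eval time dropout is a no-op, so all passes agree.
        return torch.stack([self.fc(d(x)) for d in self.dropouts]).mean(dim=0)
```

Because dropout is disabled at inference, the averaging costs nothing at serving time; it only regularises training.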
**Sequence length 256.** Doubled from 128 to catch longer MMS messages and delivery scam templates that were being truncated.
## Training changes
| Change | What | Why (error analysis) |
|---|---|---|
| Focal loss (γ=1) | Replaces CrossEntropyLoss | v0.1's FPs clustered at 0.85–1.0 confidence. Focal loss applies (1−p)^γ modulation that down-weights easy predictions, forcing the model to learn the hard boundary between legitimate promos and real spam. |
| Label smoothing (ε=0.05) | Soft targets (0.025/0.975) | Error analysis found mislabeled examples: phishing messages (SBI YONO, AnPost customs, NHS COVID, Apple Pay) incorrectly labeled as benign. Smoothing prevents the model from memorising noisy labels. |
| Cosine warm restarts | LR restarts every 2 epochs | Gives the model multiple chances to escape local minima during 8 epochs. |
| Threshold optimisation | Sweep 0.30–0.85 on val set | v0.1 used a static 0.6993 threshold. v0.2.1 finds the optimal F1 threshold each epoch. |
| Label audit | 7 high-confidence corrections | Phishing messages confirmed mislabeled as benign were corrected before training. |
| 5:1 undersampling | ~235k messages (was 3:1 / ~150k) | Retains more training data while keeping manageable class imbalance. |
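The loss in the table above can be sketched as follows. This is a minimal reimplementation matching the stated settings (γ=1, ε=0.05 → soft targets 0.025/0.975 for two classes); the actual training code's reduction and class-weighting details are not published here:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.0, smoothing=0.05):
    """Focal loss with label smoothing (hedged sketch, not the exact
    training implementation)."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Smoothed targets: (1 - eps) + eps/K on the true class, eps/K elsewhere
    soft = torch.full_like(log_probs, smoothing / num_classes)
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing + smoothing / num_classes)
    # (1 - p)^gamma down-weights easy, confident predictions
    return -(soft * (1.0 - probs) ** gamma * log_probs).sum(dim=-1).mean()
```

With γ=0 and ε=0 this reduces to standard cross-entropy, which makes the modulation easy to sanity-check.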
## Error Analysis Comparison
v0.2.1 was evaluated against the same error categories from v0.1's analysis. The overlap analysis tracks the exact same messages across both versions.
### Targeted failure modes (all fixed)
| Failure Mode | v0.1 | v0.2.1 | Status |
|---|---|---|---|
| High-confidence FPs (prob ≥ 0.99) | 49 | 0 | ✓ Eliminated |
| High-confidence FNs (prob < 0.10) | 68 | 0 | ✓ Eliminated |
| Conversational-style scam FNs | 61 | 0 | ✓ Eliminated |
| Obfuscated text FNs | 14 | 0 | ✓ Eliminated |
| Mislabeled phishing FPs | 3 | 1 | ✓ Nearly eliminated |
### Overlap analysis: how many v0.1 errors are actually fixed?

| | False Positives | False Negatives |
|---|---|---|
| v0.1 errors | 341 | 283 |
| Fixed in v0.2.1 | 330 (96.8%) | 240 (84.8%) |
| Still broken | 11 | 43 |
| New in v0.2.1 | 596 | 491 |
96.8% of v0.1's false positives and 84.8% of v0.1's false negatives are resolved. The remaining v0.2.1 errors are predominantly near-threshold cases (274 FNs scoring 0.40–0.67) and short/truncated messages (288 FNs under 40 characters): diffuse, hard cases rather than systematic failures.
### Calibration improvement
| Metric | v0.1 | v0.2.1 |
|---|---|---|
| FP mean probability | 0.8963 | 0.8009 (less confidently wrong) |
| FN mean probability | 0.3469 | 0.4829 (closer to the decision boundary) |
The model is no longer confidently wrong in either direction. Errors are concentrated near the decision threshold, which is the expected behavior of a well-calibrated classifier.
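The per-epoch threshold optimisation described under Training changes can be sketched as a simple sweep over validation-set probabilities. A minimal version; the 0.01 grid step is an assumption:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_f1_threshold(y_true, probs, lo=0.30, hi=0.85, step=0.01):
    """Pick the threshold in [lo, hi] maximising F1 on a validation set
    (hypothetical helper mirroring the 0.30-0.85 sweep described above)."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.arange(lo, hi + step / 2, step):
        f1 = float(f1_score(y_true, (probs >= t).astype(int)))
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1
```

Selecting the threshold on the validation split (not the test set) keeps the reported test metrics honest.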
## Training details
| Setting | v0.1 | v0.2.1 |
|---|---|---|
| Base model | deberta-v3-base | deberta-v3-base |
| Max sequence length | 128 | 256 |
| Batch size (per GPU) | 16 | 8 |
| Gradient accumulation | 8 | 16 |
| Effective batch size | 512 | 512 |
| Epochs | 6 | 8 |
| Learning rate (encoder) | 2e-5 | 2e-5 |
| Learning rate (head) | 1e-3 | 1e-3 |
| LR schedule | Cosine + warmup | Cosine warm restarts (T₀=2 epochs) |
| Warmup ratio | 0.1 | 0.1 |
| Loss function | CrossEntropyLoss | FocalLoss(γ=1.0, smoothing=0.05) |
| Class weighting | Balanced | Balanced |
| Precision | bfloat16 | bfloat16 |
| Gradient checkpointing | No | Yes |
| Multi-sample dropout | No | 3 passes |
| Engineered features | 15 | 23 |
| Training time | ~45 min | ~189 min |
| Hardware | 4Γ RTX 3090 | 4Γ RTX 3090 |
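The effective batch size of 512 comes from gradient accumulation: 8 per-GPU × 16 accumulation steps × 4 GPUs. A minimal single-device sketch of why accumulation is equivalent to one large batch (each micro-batch loss is scaled by 1/k so the summed gradients match):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
data, target = torch.randn(16, 4), torch.randn(16, 1)

# One large-batch backward pass
model.zero_grad()
loss_fn(model(data), target).backward()
big_grad = model.weight.grad.clone()

# The same 16 examples as 4 accumulated micro-batches of 4,
# each mean loss scaled by 1/4 before backward()
model.zero_grad()
for x, y in zip(data.chunk(4), target.chunk(4)):
    (loss_fn(model(x), y) / 4).backward()

assert torch.allclose(big_grad, model.weight.grad, atol=1e-5)
```

This is why v0.2.1 could halve the per-GPU batch (to fit 256-token sequences) while keeping the effective batch at 512.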
## Dataset
Trained on a curated English-only dataset compiled from 4+ source datasets (SpamDam, Discord, SmishTank, and others) covering SMS spam, smishing, and phishing messages. The raw dataset contains 700k messages, undersampled to a 5:1 benign-to-spam ratio (235k training samples).
v0.2.1 includes a label audit pass where 7 high-confidence mislabeled examples (phishing messages incorrectly labeled as benign) were corrected before training.
Label mapping: 0 = benign, 1 = spam/smishing
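The 5:1 undersampling step can be sketched as below. This is a hypothetical helper assuming a `label` column with 0 = benign and 1 = spam; the actual pipeline's sampling details may differ:

```python
import pandas as pd

def undersample_benign(df, ratio=5, seed=42):
    """Keep all spam and at most `ratio` benign messages per spam message."""
    spam = df[df["label"] == 1]
    benign = df[df["label"] == 0]
    keep = min(len(benign), ratio * len(spam))
    sampled = benign.sample(n=keep, random_state=seed)
    # Shuffle so classes are interleaved for training
    return pd.concat([spam, sampled]).sample(frac=1, random_state=seed)
```

Relative to v0.1's 3:1 ratio, the looser 5:1 cap discards fewer benign messages while keeping the imbalance manageable.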
## Usage

### Installation

```bash
pip install torch transformers scikit-learn joblib sentencepiece huggingface_hub
```
### Inference

```python
import re
import math
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib

# ── Feature extraction (must match training) ─────────────────────────────────
URGENCY_WORDS = {
    "urgent", "immediately", "expires", "verify", "confirm", "suspended",
    "locked", "alert", "action required", "limited time", "click here",
    "act now", "final notice", "winner", "prize", "claim", "free",
    "blocked", "deactivated", "unusual activity",
}
URL_PATTERN = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED = {"bit.ly", "tinyurl.com", "goo.gl", "t.co", "ow.ly", "smsg.io", "rb.gy"}
PHONE_PATTERN = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$£€₹¥]|(usd|gbp|eur|inr)', re.I)
LEET_MAP = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL = re.compile(
    r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
    r"|(?:h\s*t\s*t\s*p)"
    r"|(?:www\s*\.\s*\w)"
    r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD = re.compile(r"\b(?:\w\s){3,}\w\b")

def extract_features(text):
    """Extract all 23 features for a single message."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    chars = list(text)
    n = len(chars)

    # Original 15 features
    original = [
        len(text),
        len(words),
        sum(len(w) for w in words) / max(len(words), 1),
        sum(1 for c in letters if c.isupper()) / max(len(letters), 1),
        sum(1 for c in text if c.isdigit()) / max(len(text), 1),
        sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),
        text.count('!'),
        text.count('?'),
        int(bool(URL_PATTERN.search(text))),
        len(URL_PATTERN.findall(text)),
        int(any(d in text.lower() for d in SHORTENED)),
        int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D', '', m)) >= 7])),
        int(bool(EMAIL_PATTERN.search(text))),
        int(bool(CURRENCY_PATTERN.search(text))),
        sum(1 for w in URGENCY_WORDS if w in text.lower()),
    ]

    # 8 new features (v0.2.1)
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    counts = Counter(text.lower())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0) if n > 0 else 0.0
    translated = text.translate(LEET_MAP)
    leet_changes = sum(1 for a, b in zip(text, translated) if a != b)
    max_drun, cur = 0, 0
    for c in chars:
        if c.isdigit():
            cur += 1
            max_drun = max(max_drun, cur)
        else:
            cur = 0
    repeats = sum(1 for i in range(1, n) if chars[i] == chars[i - 1]) if n > 1 else 0
    new = [
        non_ascii / max(n, 1),                                    # unicode_ratio
        entropy,                                                  # char_entropy
        len(SPACED_WORD.findall(text)),                           # suspicious_spacing
        leet_changes / max(n, 1),                                 # leet_ratio
        max_drun,                                                 # max_digit_run
        repeats / max(n - 1, 1),                                  # repeated_char_ratio
        len(set(w.lower() for w in words)) / max(len(words), 1),  # vocab_richness
        int(bool(OBFUSCATED_URL.search(text))),                   # has_obfuscated_url
    ]
    return original + new

# ── Model definition ─────────────────────────────────────────────────────────
class AttentionPooling(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, hidden_states, attention_mask):
        scores = self.attention(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (hidden_states * weights).sum(dim=1)

class DeBERTaWithFeaturesV2(nn.Module):
    def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
        super().__init__()
        self.deberta = AutoModel.from_pretrained(model_name)
        H = self.deberta.config.hidden_size
        self.attn_pool = AttentionPooling(H)
        feat_dim = 128
        self.feature_proj = nn.Sequential(
            nn.Linear(num_extra_features, feat_dim),
            nn.LayerNorm(feat_dim), nn.GELU(), nn.Dropout(dropout),
        )
        combined_dim = 2 * H + feat_dim
        bottleneck = 256
        self.fc1 = nn.Linear(combined_dim, bottleneck)
        self.ln1 = nn.LayerNorm(bottleneck)
        self.residual_block = nn.Sequential(
            nn.Linear(bottleneck, bottleneck), nn.LayerNorm(bottleneck),
            nn.GELU(), nn.Dropout(dropout),
            nn.Linear(bottleneck, bottleneck), nn.LayerNorm(bottleneck),
        )
        self.dropout = nn.Dropout(dropout)
        self.output_head = nn.Linear(bottleneck, num_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state
        cls_emb = hidden[:, 0, :]
        attn_emb = self.attn_pool(hidden, attention_mask)
        feat = self.feature_proj(extra_features)
        combined = torch.cat([cls_emb, attn_emb, feat], dim=1)
        x = F.gelu(self.ln1(self.fc1(combined)))
        x = x + self.residual_block(x)
        return self.output_head(self.dropout(x))

# ── Load model ───────────────────────────────────────────────────────────────
model_id = "notd5a/deberta-v3-malicious-sms-mms-detector"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler = joblib.load(hf_hub_download(model_id, "scaler.pkl"))

model = DeBERTaWithFeaturesV2(model_id)
state = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state)
model.to(device).eval()

# Load optimised threshold
with open(hf_hub_download(model_id, "threshold.json")) as f:
    THRESHOLD = json.load(f)["threshold"]

# ── Predict ──────────────────────────────────────────────────────────────────
def predict(texts):
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, max_length=256, padding="max_length",
                    truncation=True, return_tensors="pt")
    raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
    scaled = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)
    with torch.no_grad():
        logits = model(enc["input_ids"].to(device), enc["attention_mask"].to(device), scaled)
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return [{"text": t, "label": int(p >= THRESHOLD),
             "prob_spam": round(float(p), 4),
             "prediction": "spam/smishing" if p >= THRESHOLD else "benign"}
            for t, p in zip(texts, probs)]

# ── Example ──────────────────────────────────────────────────────────────────
results = predict([
    "Your account has been suspended. Verify immediately: http://bit.ly/abc123",
    "Hey, are you free for lunch tomorrow?",
    "Y ou've got mail: new messa ge w7",
    "Flat 30% OFF on all ethnic wear! Shop now at bit.ly/sale2026",
    "click httpscluesjdko to download app",
])
for r in results:
    print(r)
```
## Engineered features reference
All 23 features, computed at inference time from the raw message text:
### Original 15 features (v0.1)

| # | Feature | Type | Description |
|---|---|---|---|
| 1 | `char_count` | int | Total character count |
| 2 | `word_count` | int | Total word count (whitespace split) |
| 3 | `avg_word_length` | float | Mean word length |
| 4 | `uppercase_ratio` | float | Uppercase letters / all letters |
| 5 | `digit_ratio` | float | Digits / total characters |
| 6 | `special_char_ratio` | float | Non-alphanumeric, non-space / total characters |
| 7 | `exclamation_count` | int | Count of `!` |
| 8 | `question_mark_count` | int | Count of `?` |
| 9 | `has_url` | binary | Contains URL pattern |
| 10 | `url_count` | int | Number of URLs detected |
| 11 | `has_shortened_url` | binary | Contains bit.ly, t.co, etc. |
| 12 | `has_phone_number` | binary | Contains phone number (≥7 digits) |
| 13 | `has_email` | binary | Contains email address |
| 14 | `has_currency` | binary | Contains currency symbol or code |
| 15 | `urgency_score` | int | Count of urgency keywords matched |
### New 8 features (v0.2.1)

| # | Feature | Type | Targets | Description |
|---|---|---|---|---|
| 16 | `unicode_ratio` | float | Unicode evasion | Non-ASCII characters / total characters |
| 17 | `char_entropy` | float | Template spam | Shannon entropy over character distribution |
| 18 | `suspicious_spacing` | int | Spaced-out evasion | Count of "m e s s a g e" style patterns |
| 19 | `leet_ratio` | float | L33t speak | Characters that map to leet translations / total |
| 20 | `max_digit_run` | int | Embedded numbers | Longest consecutive digit sequence |
| 21 | `repeated_char_ratio` | float | Exclamation spam | Consecutive repeated chars / (length − 1) |
| 22 | `vocab_richness` | float | Low-diversity spam | Unique words / total words |
| 23 | `has_obfuscated_url` | binary | Broken URLs | Detects evasive URL patterns |
## v0.1 error analysis methodology

The v0.1 → v0.2.1 changes were motivated by a structured error analysis on v0.1's 18,995-sample test set:
| v0.1 test set (18,995) | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 13,905 (TN) | 341 (FP) |
| True Spam | 283 (FN) | 4,466 (TP) |
### False positive breakdown (341 benign messages flagged as spam)

The FPs clustered at high confidence (49 with prob ≥ 0.99) and fell into distinct categories:
- Retail/brand promotions (~60): Legitimate sale announcements sharing surface features with spam (URLs, urgency words, "FLAT 30% OFF")
- Gaming/internet discussion (~30): Discord messages about GTA, Battlefield, CoD; high digit ratios and uppercase triggered false alarms
- Bank transaction alerts (~20): Nigerian bank debit notifications with account numbers, amounts, uppercase text
- Telecom notifications (~15): Carrier data plan alerts, customer service messages
- Mislabeled phishing (~3): Actual phishing (SBI YONO, NHS COVID, Apple Pay) incorrectly labeled benign; the model got these right
### False negative breakdown (283 spam messages that slipped through)

- Near-threshold (120): Spam scoring 0.40–0.70, just below the decision boundary
- Conversational scams (61): Normal-sounding text hiding malicious intent
- Truncated/tiny (23): Messages too short for any model to classify
- Low-signal promo spam (14): Marketing-style spam lacking typical spam indicators
- Obfuscated text (14): "Y ou've got mail: new messa ge w7", "httpscluesjdko"
## Files

| File | Description |
|---|---|
| `pytorch_model.pt` | Model weights (`DeBERTaWithFeaturesV2`) |
| `tokenizer/` | Saved DeBERTa tokenizer |
| `scaler.pkl` | `StandardScaler` fitted on 23 training features |
| `threshold.json` | Optimised classification threshold (0.67) |
| `config.json` | DeBERTa base config |
| `training_history.csv` | Per-epoch metrics for all 8 epochs |
## Limitations
- English-only. Non-English messages in the training data were filtered; the model may misclassify non-English spam or flag non-English legitimate messages.
- Optimised for SMS/MMS (≤256 tokens). Longer content like emails will be truncated.
- Short message weakness. Messages under 40 characters are the dominant remaining failure mode (288 FNs). There is insufficient text for either DeBERTa or the engineered features to provide signal.
- Promotional boundary. Legitimate marketing with aggressive language (flash sales, urgency CTAs) can still trigger false positives, though at lower confidence than v0.1.
- Evasion arms race. Novel obfuscation techniques not in training data will reduce recall over time.
- No sender metadata. The model operates on message text only; sender reputation, short codes, and carrier signals are not available.
## Version history
| Version | Date | F1 | AUC | Key changes |
|---|---|---|---|---|
| v0.1 | 2026-03 | 0.9299 | 0.9906 | Initial release, CLS pooling, 15 features, CrossEntropyLoss |
| v0.2.1 | 2026-03 | 0.9022 | 0.9857 | Attention pooling, 23 features, focal loss (γ=1), label audit, 5:1 undersampling. Eliminated all high-confidence errors and obfuscation/conversational blind spots. |
## License

CC BY-NC 4.0: free for research and non-commercial use. Commercial use requires explicit permission.