De-Scam-BERTa-v3 - A Smishing & Spam Hybrid Ensemble Detector (v0.3)
A fine-tuned DeBERTa-v3-base model for detecting smishing and spam SMS/MMS messages. It is designed to work as part of a hybrid routing system alongside a CharCNN short-message specialist; the hybrid achieves 0.9666 F1 on the combined test set.
The model combines DeBERTa's contextual understanding with 23 handcrafted text features and includes built-in explainability: attention-based token importance and feature contribution analysis for every prediction.
Model performance
Hybrid system (recommended)
The hybrid router sends short messages (<=60 chars) to CharCNN and long messages (>=120 chars) to DeBERTa, with a sigmoid-blended ensemble for messages in between.
| Metric | Hybrid (CharCNN + DeBERTa) | DeBERTa-only |
|---|---|---|
| F1 | 0.9666 | 0.8934 |
| Precision | 0.9675 | 0.8833 |
| Recall | 0.9657 | 0.9037 |
| AUC-ROC | 0.9969 | 0.9794 |
Routing breakdown (40,251 test messages)
| Route | Messages | F1 |
|---|---|---|
| CharCNN (<=160 chars) | 35,959 (89.3%) | 0.9779 |
| DeBERTa (>160 chars) | 4,292 (10.7%) | 0.9231 |
| Combined | 40,251 | 0.9666 |
Confusion matrices
Hybrid system:
| | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 28,817 | 359 |
| True Spam | 380 | 10,695 |
DeBERTa-only (@ optimised threshold 0.56):
| | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 27,854 | 1,322 |
| True Spam | 1,066 | 10,009 |
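The headline hybrid metrics can be re-derived from the hybrid confusion matrix as a quick sanity check:

```python
# Hybrid confusion matrix cells from the table above
tp, fp, fn = 10_695, 359, 380

precision = tp / (tp + fp)                          # 0.9675
recall = tp / (tp + fn)                             # 0.9657
f1 = 2 * precision * recall / (precision + recall)  # 0.9666

print(round(precision, 4), round(recall, 4), round(f1, 4))
```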
Error analysis (XAI)
Explainability analysis was run on all 739 misclassified messages (359 false positives + 380 false negatives) to identify systematic failure patterns.
Summary
| | False Positives | False Negatives |
|---|---|---|
| Count | 359 | 380 |
| Mean spam prob | 0.7390 | 0.3380 |
| Confident errors | 305 (85%) | 193 (51%) |
| Primary route | DeBERTa (60%) | CharCNN (65%) |
Error concentration by model
| Model | Messages | FP | FN | Total Errors | Error Rate |
|---|---|---|---|---|---|
| CharCNN | 35,959 | 142 | 247 | 389 | 1.08% |
| DeBERTa | 4,292 | 217 | 133 | 350 | 8.15% |
DeBERTa has a 7.5× higher error rate than CharCNN despite handling far fewer messages.
False positives: benign messages flagged as spam
FPs are overwhelmingly legitimate service messages that share surface features with spam. 85% are high-confidence errors (probability > 0.6).
Top misleading features (by mean |z-score| across FP errors):
| Feature | Mean z-score | % Notable |
|---|---|---|
| `has_email` | +19.35 | 100% |
| `has_shortened_url` | +6.84 | 100% |
| `has_currency` | +5.22 | 100% |
| `has_phone_number` | +4.61 | 100% |
| `has_url` / `url_count` | +3.08 | 100% |
| `digit_ratio` | +2.89 | 75% |
| `has_obfuscated_url` | +2.70 | 100% |
| `urgency_score` | +1.62 | 100% |
Typical FP categories: bank transaction alerts with account numbers and currency symbols, delivery notifications with tracking URLs, promotional service messages with opt-out codes, and appointment/notification SMS from legitimate services.
False negatives: spam messages missed as benign
FNs are split between borderline (49%) and confident (51%) errors. 65% are routed to CharCNN, making short-message spam the primary gap.
Key patterns in missed spam:
| Pattern | Description |
|---|---|
| Conversational spam | Reads like casual chat with no traditional spam signals: no URLs, no urgency words |
| Non-English spam | Spanish and other languages with currency symbols slip through the English-trained model |
| Social engineering | Sophisticated scams disguised as friendly messages or legitimate requests |
| Truncated/ambiguous | Short spam fragments that lack enough context to classify |
Feature analysis: FN messages have weaker spam signals overall. Features like `has_url` and `urgency_score`, both strong spam indicators, appear at much lower rates in FN errors than in correctly caught spam, confirming that these messages are structurally different from typical spam.
Key takeaways
- DeBERTa is the weak link. Its 8.15% error rate on longer messages (vs CharCNN's 1.08%) is the primary area for improvement.
- Feature overlap is the core FP problem. Legitimate service messages (bank alerts, delivery notifications) use the same features as spam (URLs, phone numbers, urgency language, currency symbols). The model cannot distinguish intent from surface signals alone.
- CharCNN misses subtle spam. Short conversational-style spam without traditional indicators is the main FN source. Character-level features alone lack the semantic understanding needed for these cases.
- Non-English content is a blind spot. The English-only training set causes systematic FNs on multilingual spam.
Architecture
Hybrid routing system
Input Message
      │
      ├── len <= 60 chars ──────────► CharCNN (100%)
      │
      ├── 60 < len < 120 chars ─────► Sigmoid ensemble blend
      │         prob = (1-w)*cnn + w*deberta
      │         w = sigmoid((len - 90) / 10)
      │
      └── len >= 120 chars ─────────► DeBERTa (100%)
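The routing policy above can be sketched in a few lines of Python (a minimal sketch; the function and argument names are illustrative, and the real logic lives in hybrid_router_inference.py):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route(text, cnn_prob, deberta_prob):
    """Blend the two model probabilities according to the routing rules above."""
    n = len(text)
    if n <= 60:
        return cnn_prob, "charcnn"
    if n >= 120:
        return deberta_prob, "deberta"
    # In the 60-120 char zone, weight shifts smoothly toward DeBERTa:
    # w -> ~0 near 60 chars, 0.5 at 90 chars, ~1 near 120 chars.
    w = sigmoid((n - 90) / 10)
    return (1 - w) * cnn_prob + w * deberta_prob, "ensemble"
```

At exactly 90 characters the two models contribute equally; the ±30-char window around that midpoint keeps the handover from being a hard cliff.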
DeBERTa single-head classifier
Input Text
  └──► DeBERTa-v3-base encoder (gradient checkpointed)
        ├──► [CLS] embedding (768d) ─────────────┐
        └──► Attention-weighted pooling (768d) ──┤
                                                 │
23 Engineered Features                           │
  └──► Linear(23→128) + LayerNorm + GELU ────────┴──► concat (1664d)
                                                        ├──► Linear(1664→256) + LayerNorm + GELU
                                                        ├──► Residual block (256d → 256d)
                                                        └──► Linear(256→2) → spam logits
CharCNN short-message specialist
Character IDs (max 160 chars)
  └──► Embedding(92, 64)
        ├──► Conv1d(64, 128, kernel=2) + BN + ReLU → MaxPool ─┐
        ├──► Conv1d(64, 128, kernel=3) + BN + ReLU → MaxPool ─┤
        └──► Conv1d(64, 128, kernel=5) + BN + ReLU → MaxPool ─┤
                                                              │
                                            Concat (384d) ◄───┘
                                                 │
23 Features ──► Linear(23→64) + LayerNorm + GELU (64d)
                                                 │
                                   concat (448d) ◄── (384d + 64d)
                                      ├──► Linear(448→128) + LayerNorm + GELU + Dropout
                                      └──► Linear(128→2) → spam logits
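Reading the diagram as three parallel convolution branches with global max pooling, a minimal PyTorch sketch of the specialist might look like this (illustrative, not the repo's exact implementation; the dropout rate is a guess):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Sketch of the CharCNN short-message specialist as diagrammed above."""

    def __init__(self, vocab_size=92, embed_dim=64, num_filters=128,
                 num_extra_features=23, num_labels=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Three parallel conv branches (kernel sizes 2, 3, 5), 128 filters each
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv1d(embed_dim, num_filters, k),
                          nn.BatchNorm1d(num_filters), nn.ReLU())
            for k in (2, 3, 5)
        ])
        self.feature_proj = nn.Sequential(
            nn.Linear(num_extra_features, 64), nn.LayerNorm(64), nn.GELU())
        self.fc1 = nn.Sequential(nn.Linear(384 + 64, 128), nn.LayerNorm(128),
                                 nn.GELU(), nn.Dropout(dropout))
        self.out = nn.Linear(128, num_labels)

    def forward(self, char_ids, extra_features):
        x = self.embedding(char_ids).transpose(1, 2)                 # (B, 64, L)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]  # 3 x (B, 128)
        feats = self.feature_proj(extra_features)                    # (B, 64)
        h = self.fc1(torch.cat(pooled + [feats], dim=1))             # (B, 128)
        return self.out(h)                                           # (B, 2)
```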
Usage
Quick start (DeBERTa-only from HuggingFace)
import re
import math
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib
# ── Feature extraction (must match training) ───────────────────────────────
URGENCY_WORDS = {
"urgent", "immediately", "expires", "verify", "confirm", "suspended",
"locked", "alert", "action required", "limited time", "click here",
"act now", "final notice", "winner", "prize", "claim", "free",
"blocked", "deactivated", "unusual activity",
}
URL_PATTERN = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED = {"bit.ly","tinyurl.com","goo.gl","t.co","ow.ly","smsg.io","rb.gy"}
PHONE_PATTERN = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$\xa3\u20ac\u20b9\xa5]|(usd|gbp|eur|inr)', re.I)
LEET_MAP = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL = re.compile(
r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
r"|(?:h\s*t\s*t\s*p)"
r"|(?:www\s*\.\s*\w)"
r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD = re.compile(r"\b(?:\w\s){3,}\w\b")
def extract_features(text):
"""Extract all 23 features for a single message."""
words = text.split()
letters = [c for c in text if c.isalpha()]
chars = list(text)
n = len(chars)
original = [
len(text), len(words),
sum(len(w) for w in words) / max(len(words), 1),
sum(1 for c in letters if c.isupper()) / max(len(letters), 1),
sum(1 for c in text if c.isdigit()) / max(len(text), 1),
sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),
text.count('!'), text.count('?'),
int(bool(URL_PATTERN.search(text))), len(URL_PATTERN.findall(text)),
int(any(d in text.lower() for d in SHORTENED)),
int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D','',m)) >= 7])),
int(bool(EMAIL_PATTERN.search(text))), int(bool(CURRENCY_PATTERN.search(text))),
sum(1 for w in URGENCY_WORDS if w in text.lower()),
]
non_ascii = sum(1 for c in chars if ord(c) > 127)
counts = Counter(text.lower())
entropy = -sum((c/n)*math.log2(c/n) for c in counts.values() if c > 0) if n > 0 else 0.0
translated = text.translate(LEET_MAP)
leet = sum(1 for a, b in zip(text, translated) if a != b)
mdr, cr = 0, 0
for c in chars:
if c.isdigit(): cr += 1; mdr = max(mdr, cr)
else: cr = 0
reps = sum(1 for i in range(1, n) if chars[i] == chars[i-1]) if n > 1 else 0
new = [
non_ascii / max(n, 1), entropy,
len(SPACED_WORD.findall(text)), leet / max(n, 1), mdr,
reps / max(n-1, 1),
len(set(w.lower() for w in words)) / max(len(words), 1),
int(bool(OBFUSCATED_URL.search(text))),
]
return original + new
# ── Model definition ────────────────────────────────────────────────────────
class AttentionPooling(nn.Module):
def __init__(self, hidden_size):
super().__init__()
self.attention = nn.Sequential(
nn.Linear(hidden_size, hidden_size), nn.Tanh(),
nn.Linear(hidden_size, 1, bias=False),
)
def forward(self, hidden_states, attention_mask):
scores = self.attention(hidden_states).squeeze(-1)
scores = scores.masked_fill(attention_mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
return (hidden_states * weights).sum(dim=1)
class DeBERTaSingleHead(nn.Module):
def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
super().__init__()
self.deberta = AutoModel.from_pretrained(model_name)
H = self.deberta.config.hidden_size
self.attn_pool = AttentionPooling(H)
feat_dim = 128
self.feature_proj = nn.Sequential(
nn.Linear(num_extra_features, feat_dim), nn.LayerNorm(feat_dim),
nn.GELU(), nn.Dropout(dropout),
)
combined_dim = 2 * H + feat_dim # 768*2 + 128 = 1664
self.fc1 = nn.Linear(combined_dim, 256)
self.ln1 = nn.LayerNorm(256)
self.residual_block = nn.Sequential(
nn.Linear(256, 256), nn.LayerNorm(256),
nn.GELU(), nn.Dropout(dropout),
nn.Linear(256, 256), nn.LayerNorm(256),
)
self.dropout = nn.Dropout(dropout)
self.output_head = nn.Linear(256, num_labels)
def forward(self, input_ids, attention_mask, extra_features):
out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
hidden = out.last_hidden_state
cls_emb = hidden[:, 0, :]
attn_emb = self.attn_pool(hidden, attention_mask)
feat = self.feature_proj(extra_features)
x = torch.cat([cls_emb, attn_emb, feat], dim=1)
x = F.gelu(self.ln1(self.fc1(x)))
x = x + self.residual_block(x)
return self.output_head(self.dropout(x))
# ── Load model ──────────────────────────────────────────────────────────────
model_id = "notd5a/deberta-v3-malicious-sms-mms-detector"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler = joblib.load(hf_hub_download(model_id, "scaler.pkl"))
model = DeBERTaSingleHead(model_id)
state = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state)
model.float().to(device).eval()
with open(hf_hub_download(model_id, "threshold.json")) as f:
thresholds = json.load(f)
SPAM_THRESHOLD = thresholds["optimal_threshold"]
# ── Predict ─────────────────────────────────────────────────────────────────
def predict(texts):
if isinstance(texts, str):
texts = [texts]
enc = tokenizer(texts, max_length=128, padding="max_length",
truncation=True, return_tensors="pt")
raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
scaled = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)
with torch.no_grad():
logits = model(
enc["input_ids"].to(device),
enc["attention_mask"].to(device),
scaled,
)
spam_probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
return [{
"text": t,
"prediction": "spam" if sp >= SPAM_THRESHOLD else "benign",
"is_spam": bool(sp >= SPAM_THRESHOLD),
"spam_probability": round(float(sp), 4),
} for t, sp in zip(texts, spam_probs)]
# ── Example ─────────────────────────────────────────────────────────────────
results = predict([
"Your account has been suspended. Verify immediately: http://bit.ly/abc123",
"Hey, are you free for lunch tomorrow?",
"Flat 30% OFF on all ethnic wear! Shop now at bit.ly/sale2026",
])
for r in results:
flag = "SPAM" if r["is_spam"] else "benign"
print(f" [{flag}] (spam: {r['spam_probability']:.3f}) {r['text'][:80]}")
Hybrid inference from HuggingFace (recommended)
Download the full repo and run the hybrid router for best performance:
# Clone the repo
git lfs install
git clone https://huggingface.co/notd5a/deberta-v3-malicious-sms-mms-detector
cd deberta-v3-malicious-sms-mms-detector
# Install dependencies
pip install torch transformers scikit-learn joblib sentencepiece
# Run hybrid inference (repo root = DeBERTa dir, charcnn/ = CharCNN dir)
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended. Verify at bit.ly/xyz"
# With JSON output
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended" \
--json
# With explainability (token importance + feature contributions)
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended. Verify at bit.ly/xyz" \
--explain
# Batch inference on CSV
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--input test_messages.csv \
--output predictions.csv
Programmatic API
from hybrid_router_inference import HybridDetector
# From a cloned HuggingFace repo:
detector = HybridDetector.load(deberta_dir=".", short_dir="charcnn")
# Or from local training directories:
detector = HybridDetector.load(
deberta_dir="model_output_v2.4",
short_dir="cnn_model_v3",
)
# Single classification
result = detector.classify("Win a free iPhone! Click here now!")
# {
# "text": "Win a free iPhone! Click here now!",
# "prediction": "spam",
# "is_spam": True,
# "spam_probability": 0.9812,
# "model_used": "charcnn",
# "routing_reason": "Routed to CharCNN (length 34 <= 60)"
# }
# With explainability
result = detector.classify(
"Your Chase account has been locked. Verify: chase-secure.com/verify",
explain=True,
)
# Adds: token_importance, feature_contributions, explanation
# Batch prediction
results = detector.predict([
"Hey, are you coming to dinner tonight?",
"URGENT: Your bank account has been compromised. Act now!",
"Your package is on the way! Track: amzn.to/3xK9",
])
Explainability (XAI)
Every prediction can include an explanation showing why the model made its decision:
- Token importance: attention weights mapped back to input tokens, showing which words the model focused on
- Feature contributions: z-scores for each engineered feature, highlighting which are unusual compared to the training population
- Explanation string: human-readable summary combining both signals
Features with |z-score| > 1.5 are flagged as notable contributors.
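A minimal sketch of that feature-contribution step, assuming the fitted StandardScaler from scaler.pkl supplies the training-population statistics (the helper name is illustrative, not the repo's API):

```python
import numpy as np

def feature_z_scores(raw_feats, scaler, names, threshold=1.5):
    """Z-scores of one message's features relative to the training population.

    `scaler` is assumed to be a fitted sklearn StandardScaler, so
    (x - mean_) / scale_ is exactly the standardisation applied at inference.
    Only features beyond `threshold` standard deviations are returned.
    """
    z = (np.asarray(raw_feats, dtype=float) - scaler.mean_) / scaler.scale_
    return {n: float(v) for n, v in zip(names, z) if abs(v) > threshold}
```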
Training details
| Setting | Value |
|---|---|
| Base model | microsoft/deberta-v3-base |
| Max sequence length | 128 |
| Batch size (per GPU) | 8 |
| Gradient accumulation | 4 |
| Effective batch size | 128 |
| Epochs | 10 (best @ epoch 9) |
| LR (encoder) | 2e-5 |
| LR (head) | 1e-3 |
| LR schedule | CosineAnnealingLR |
| Warmup ratio | 0.1 |
| Loss | FocalLoss(gamma=1.5, smoothing=0.05) |
| R-Drop alpha | 0.3 |
| FGM epsilon | 0.5 |
| EMA decay | 0.995 |
| Multi-sample dropout | 3 passes |
| Precision | bfloat16 |
| Gradient checkpointing | Yes |
| Engineered features | 23 |
| Hardware | 4x NVIDIA H200 SXM |
| Training time | 194 minutes |
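The loss listed above, FocalLoss(gamma=1.5, smoothing=0.05), admits several formulations; one common sketch combining label smoothing with the focal down-weighting term is (the exact variant used in training may differ in detail):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.5, smoothing=0.05, num_classes=2):
    """Focal loss with label smoothing (one common formulation)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Build smoothed one-hot targets: 1 - smoothing on the true class,
    # smoothing spread over the remaining classes.
    with torch.no_grad():
        true_dist = torch.full_like(log_probs, smoothing / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    # Focal term: down-weight well-classified examples by (1 - p)^gamma
    focal_weight = (1.0 - probs) ** gamma
    return -(true_dist * focal_weight * log_probs).sum(dim=-1).mean()
```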
Training progression
| Epoch | Train Loss | Val F1 (opt) | Val AUC | Threshold |
|---|---|---|---|---|
| 1 | 0.1488 | 0.8799 | 0.9749 | 0.59 |
| 2 | 0.1377 | 0.8855 | 0.9772 | 0.555 |
| 3 | 0.1341 | 0.8918 | 0.9785 | 0.57 |
| 4 | 0.1342 | 0.8924 | 0.9792 | 0.585 |
| 5 | 0.1319 | 0.8932 | 0.9797 | 0.54 |
| 6 | 0.1325 | 0.8952 | 0.9797 | 0.57 |
| 7 | 0.1305 | 0.8960 | 0.9800 | 0.57 |
| 8 | 0.1307 | 0.8961 | 0.9803 | 0.575 |
| 9 | 0.1301 | 0.8969 | 0.9803 | 0.56 |
| 10 | 0.1290 | 0.8965 | 0.9803 | 0.555 |
Dataset
Trained on 268,340 English SMS/MMS messages:
| Category | Count | Share |
|---|---|---|
| Benign | 194,504 | 72.5% |
| Spam/Smishing | 73,836 | 27.5% |
| Total | 268,340 | 100% |
Benign-to-spam ratio: 2.6:1
Label mapping: label = 0 (benign), 1 (spam/smishing)
Engineered features reference
All 23 features are computed at inference time from the raw message text and standardised using a fitted StandardScaler (scaler.pkl).
Original 15 features
| # | Feature | Type | Description |
|---|---|---|---|
| 1 | `char_count` | int | Total character count |
| 2 | `word_count` | int | Total word count |
| 3 | `avg_word_length` | float | Mean word length |
| 4 | `uppercase_ratio` | float | Uppercase letters / all letters |
| 5 | `digit_ratio` | float | Digits / total characters |
| 6 | `special_char_ratio` | float | Non-alphanumeric, non-space / total characters |
| 7 | `exclamation_count` | int | Count of `!` |
| 8 | `question_mark_count` | int | Count of `?` |
| 9 | `has_url` | binary | Contains URL pattern |
| 10 | `url_count` | int | Number of URLs detected |
| 11 | `has_shortened_url` | binary | Contains bit.ly, t.co, etc. |
| 12 | `has_phone_number` | binary | Contains phone number (>=7 digits) |
| 13 | `has_email` | binary | Contains email address |
| 14 | `has_currency` | binary | Contains currency symbol or code |
| 15 | `urgency_score` | int | Count of urgency keywords matched |
Evasion detection features (v0.2+)
| # | Feature | Type | Description |
|---|---|---|---|
| 16 | `unicode_ratio` | float | Non-ASCII characters / total characters |
| 17 | `char_entropy` | float | Shannon entropy over character distribution |
| 18 | `suspicious_spacing` | int | Count of spaced-out word patterns (e.g. "w o r d") |
| 19 | `leet_ratio` | float | Characters that map to leet translations / total |
| 20 | `max_digit_run` | int | Longest consecutive digit sequence |
| 21 | `repeated_char_ratio` | float | Consecutive repeated chars / (length - 1) |
| 22 | `vocab_richness` | float | Unique words / total words |
| 23 | `has_obfuscated_url` | binary | Detects evasive URL patterns |
Model evolution
| Version | Architecture | Spam F1 | AUC | Key change |
|---|---|---|---|---|
| v0.1 | DeBERTa-base + CLS + 15 features | 0.9299 | 0.9906 | Initial release |
| v0.2 | + attention pool + 8 features + focal loss | 0.8456 | 0.9867 | Architecture overhaul |
| v0.2.1 | + focal gamma=1.0 + 5:1 undersample | 0.9022 | 0.9857 | Loss/data tuning |
| v0.2.2 | + dataset cleanup + diverse benign | 0.9096 | 0.9883 | Data quality |
| v0.3 | Single head + CharCNN hybrid | 0.9666 | 0.9969 | Hybrid routing system |
Files
| File | Description |
|---|---|
| `hybrid_router_inference.py` | Full hybrid inference pipeline (CharCNN + DeBERTa + routing + XAI) |
| `pytorch_model.pt` | DeBERTa model weights (DeBERTaSingleHead, epoch 9) |
| `tokenizer/` | Saved DeBERTa-v3-base tokenizer |
| `scaler.pkl` | DeBERTa feature scaler (StandardScaler, 23 features) |
| `threshold.json` | Optimised DeBERTa classification threshold (0.56) |
| `charcnn/charcnn_best.pt` | CharCNN model weights (147K params) |
| `charcnn/charcnn_config.json` | CharCNN architecture config + threshold (0.57) |
| `charcnn/charcnn_scaler.pkl` | CharCNN feature scaler |
| `training_history.csv` | Per-epoch DeBERTa training metrics |
Limitations
- English-only. Non-English messages may be misclassified.
- Optimised for SMS/MMS. Messages are truncated to 128 tokens for DeBERTa, 160 characters for CharCNN.
- Promotional boundary. Legitimate marketing with aggressive urgency language remains the hardest category to classify correctly.
- Evasion arms race. Novel obfuscation techniques not represented in training data will reduce performance over time.
- No sender metadata. Classification is based on message text only: no phone number, carrier, or frequency signals.
- DeBERTa-only performance is lower. The 0.8934 F1 for DeBERTa alone reflects that many spam messages are short and better suited to CharCNN.
License
CC BY-NC 4.0: free for research and non-commercial use. Commercial use requires explicit permission.