DeBERTa-v3 Smishing & Spam Detector (v0.2.1)

A fine-tuned DeBERTa-v3-base model for detecting smishing (SMS phishing) and spam messages. v0.2.1 is a ground-up revision of the architecture and training pipeline, driven by systematic error analysis of 624 misclassified messages from v0.1.

Model Performance

Metric              v0.1      v0.2.1 (@ 0.50)   v0.2.1 (@ 0.67)
F1 Score            0.9299    0.8781            0.9022
Precision           0.8986    0.8226            0.9058
Recall              0.9634    0.9417            0.8986
AUC-ROC             0.9906    0.9857            0.9857
Best Epoch          6         8                 8
Optimal Threshold   0.6993    -                 0.6700

Confusion Matrix (test set @ optimised threshold)

                Predicted Benign    Predicted Spam
True Benign             32,136               612
True Spam                  664             5,886

What the numbers don't show

Precision surpassed v0.1 (0.9058 vs 0.8986), and the false-positive rate dropped from 2.39% (341 of 14,246 benign) to 1.87% (612 of 32,748). More importantly, the quality of the errors changed completely; see Error Analysis Comparison below.

What changed from v0.1

v0.1 achieved 0.93 F1 / 0.99 AUC-ROC but had two systematic problems revealed through error analysis of 624 misclassified test samples:

False positives (341): The model was confidently wrong on legitimate messages; 49 FPs had ≥0.99 spam probability. Bank transaction alerts, retail sale announcements, telecom notifications, and gaming discussions all triggered false alarms because they shared surface-level features (URLs, urgency words, phone numbers) with real spam.

False negatives (283): Short, featureless, or obfuscated spam slipped through. These messages had almost zero signal from the original 15 engineered features: only 1 of 283 FNs had a detected URL, zero had detected phone numbers, and the average urgency score was 0.117. Many were truncated, used obfuscation ("Y ou've got mail: new messa ge"), or were conversational-style scams.

Every architectural and training change in v0.2.1 targets one of these specific failure modes.

Architecture

Input Text
    └─► DeBERTa-v3-base encoder (gradient checkpointing enabled)
            ├─► [CLS] embedding (768d)                ┐
            └─► Attention-weighted pooling (768d)     │  ← NEW: learned attention over all tokens
                                                      ├─► concat (1664d) → Residual classifier → logits
23 Engineered Features                                │
    └─► Linear(23→128) + LayerNorm + GELU             ┘

What's new in the architecture

Dual pooling: [CLS] + attention-weighted pooling. v0.1 relied solely on the [CLS] token embedding, which can miss signal in short messages where one suspicious word doesn't dominate the representation. v0.2.1 adds a learned attention pooling layer that computes a weighted sum across all token embeddings. The two representations are concatenated (2 × 768 = 1536d), giving the classifier both a global summary and a signal-focused view.

8 new engineered features targeting FN blind spots. The original 15 features (char_count, has_url, urgency_score, etc.) produced near-zero signal on false negatives. The 8 new features are designed to catch the specific evasion patterns those FNs used:

Feature              Targets                                        Description
unicode_ratio        Unicode substitution ("Vérífy yøur àccount")   % of non-ASCII characters
char_entropy         Short/repetitive spam                          Shannon entropy over character distribution
suspicious_spacing   Spaced-out evasion ("m e s s a g e")           Count of space-separated character sequences
leet_ratio           Character substitution (l33t speak)            % of characters that map to leet translations
max_digit_run        Phone numbers, OTPs, account numbers           Longest consecutive digit sequence
repeated_char_ratio  "!!!!" or "aaaaaa" patterns                    Ratio of consecutive repeated characters
vocab_richness       Template spam (low diversity)                  Unique words / total words
has_obfuscated_url   Broken URLs ("httpscluesjdko")                 Regex detection of evasive URL patterns
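For intuition on char_entropy: a repetitive template string concentrates its character distribution and scores near zero, while natural text spreads probability mass over many characters. A self-contained sketch of the same computation used in the full extractor further down this card:

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy (bits) of the lower-cased character distribution."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text.lower())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(char_entropy("aaaaaaaaaa"))  # 0.0 -- one repeated character carries no information
print(char_entropy("abcd"))        # 2.0 -- four equally likely characters = 2 bits
```

Higher values indicate varied, natural-looking text; very low values flag the repetitive filler that v0.1 missed.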

Wider feature projection. 23 features → 128d (was 15 → 64d), with LayerNorm and GELU activation.

Residual classifier head. The 1664d combined representation passes through a bottleneck (→256d) with a residual block, improving gradient flow through the deeper head.

Multi-sample dropout. During training, 3 stochastic forward passes through the dropout layer are averaged, acting as a cheap ensemble. This improves probability calibration: v0.1 had 49 FPs at ≥0.99 confidence; v0.2.1 has zero.
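The inference code later in this card takes a single pass, since dropout is inert in eval mode. The training-time averaging can be sketched as follows; the module name and dimensions are illustrative, not the released classifier head:

```python
import torch
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    """Average logits over several independent dropout masks (a cheap train-time ensemble)."""
    def __init__(self, in_dim=256, num_labels=2, p=0.1, num_samples=3):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for _ in range(num_samples))
        self.fc = nn.Linear(in_dim, num_labels)

    def forward(self, x):
        # In eval mode dropout is the identity, so all passes agree and this
        # collapses to a single ordinary forward pass.
        return torch.stack([self.fc(d(x)) for d in self.dropouts]).mean(dim=0)

head = MultiSampleDropoutHead()
print(head(torch.randn(4, 256)).shape)  # torch.Size([4, 2])
```

Averaging over masks smooths the loss surface the head sees, which is why it tends to improve calibration at negligible cost.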

Sequence length 256. Doubled from 128 to catch longer MMS messages and delivery scam templates that were being truncated.

Training changes

  • Focal loss (γ=1), replacing CrossEntropyLoss: v0.1's FPs clustered at 0.85–1.0 confidence. Focal loss applies a (1−p)^γ modulation that down-weights easy predictions, forcing the model to learn the hard boundary between legitimate promos and real spam.
  • Label smoothing (ε=0.05), giving soft targets of 0.025/0.975: error analysis found mislabeled examples, i.e. phishing messages (SBI YONO, AnPost customs, NHS COVID, Apple Pay) incorrectly labeled as benign. Smoothing prevents the model from memorising noisy labels.
  • Cosine warm restarts, with the LR restarting every 2 epochs: gives the model multiple chances to escape local minima during 8 epochs.
  • Threshold optimisation, sweeping 0.30–0.85 on the validation set: v0.1 used a static 0.6993 threshold; v0.2.1 finds the optimal-F1 threshold each epoch.
  • Label audit, with 7 high-confidence corrections: phishing messages confirmed mislabeled as benign were corrected before training.
  • 5:1 undersampling to ~235k messages (was 3:1 / ~150k): retains more training data while keeping the class imbalance manageable.
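The focal loss described above is not included in the released code. One plausible formulation combining the (1−p)^γ modulation with ε = 0.05 label smoothing (soft targets of 0.975/0.025 for two classes) is sketched below; treat it as an illustration, not the exact training implementation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.0, smoothing=0.05):
    """Focal loss over label-smoothed soft targets."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Label smoothing: with eps = 0.05 and 2 classes, targets become 0.975 / 0.025.
    soft = torch.full_like(log_probs, smoothing / num_classes)
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing + smoothing / num_classes)
    # (1 - p)^gamma down-weights classes the model already predicts confidently.
    focal_weight = (1.0 - probs) ** gamma
    return -(soft * focal_weight * log_probs).sum(dim=-1).mean()

loss = focal_loss(torch.tensor([[2.0, -1.0], [0.3, 0.8]]), torch.tensor([0, 1]))
```

With γ=1 the modulation is gentle: confident correct predictions contribute little, so gradient budget shifts to the ambiguous promo-vs-spam boundary.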

Error Analysis Comparison

v0.2.1 was evaluated against the same error categories from v0.1's analysis. The overlap analysis tracks the exact same messages across both versions.

Targeted failure modes: all fixed

Failure Mode                         v0.1   v0.2.1   Status
High-confidence FPs (prob ≥ 0.99)      49        0   ✅ Eliminated
High-confidence FNs (prob < 0.10)      68        0   ✅ Eliminated
Conversational-style scam FNs          61        0   ✅ Eliminated
Obfuscated text FNs                    14        0   ✅ Eliminated
Mislabeled phishing FPs                 3        1   ✅ Nearly eliminated

Overlap analysis: how many v0.1 errors are actually fixed?

                   False Positives   False Negatives
v0.1 errors                    341               283
Fixed in v0.2.1        330 (96.8%)       240 (84.8%)
Still broken                    11                43
New in v0.2.1                  596               491

96.8% of v0.1's false positives and 84.8% of its false negatives are resolved. The remaining v0.2.1 errors are predominantly near-threshold cases (274 FNs scoring 0.40–0.67) and short or truncated messages (288 FNs under 40 characters): diffuse, hard cases rather than systematic failures.

Calibration improvement

Metric                v0.1     v0.2.1
FP mean probability   0.8963   0.8009 (↓ less confident)
FN mean probability   0.3469   0.4829 (↑ closer to boundary)

The model is no longer confidently wrong in either direction. Errors are concentrated near the decision threshold, which is the expected behavior of a well-calibrated classifier.

Training details

Setting                  v0.1               v0.2.1
Base model               deberta-v3-base    deberta-v3-base
Max sequence length      128                256
Batch size (per GPU)     16                 8
Gradient accumulation    8                  16
Effective batch size     512                512
Epochs                   6                  8
Learning rate (encoder)  2e-5               2e-5
Learning rate (head)     1e-3               1e-3
LR schedule              Cosine + warmup    Cosine warm restarts (T₀=2 epochs)
Warmup ratio             0.1                0.1
Loss function            CrossEntropyLoss   FocalLoss(γ=1.0, smoothing=0.05)
Class weighting          Balanced           Balanced
Precision                bfloat16           bfloat16
Gradient checkpointing   No                 Yes
Multi-sample dropout     -                  3 passes
Engineered features      15                 23
Training time            ~45 min            ~189 min
Hardware                 4× RTX 3090        4× RTX 3090
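The LR schedule row can be approximated with PyTorch's built-in CosineAnnealingWarmRestarts. In the sketch below the per-epoch step count (~235k samples / effective batch 512 ≈ 459) is a derived estimate and warmup handling is omitted, so treat it as illustrative rather than the actual training loop:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

steps_per_epoch = 459  # ~235k training samples / effective batch 512 (derived estimate)

# Two parameter groups mirroring the table: encoder at 2e-5, head at 1e-3.
encoder_p = torch.nn.Parameter(torch.zeros(1))
head_p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([
    {"params": [encoder_p], "lr": 2e-5},
    {"params": [head_p], "lr": 1e-3},
])
# T_0 = 2 epochs: the cosine decays over two epochs, then the LR jumps back up.
sched = CosineAnnealingWarmRestarts(opt, T_0=2 * steps_per_epoch)

lrs = []
for _ in range(4 * steps_per_epoch):  # simulate 4 epochs
    opt.step()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])
# The encoder LR decays from 2e-5 toward 0 across epochs 1-2, then restarts.
```

Each restart returns the LR to its base value, which is what gives the model its "multiple chances to escape local minima" over the 8 epochs.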

Dataset

Trained on a curated English-only dataset compiled from 4+ source datasets (SpamDam, Discord, SmishTank, and others) covering SMS spam, smishing, and phishing messages. The raw dataset contains 700k messages, undersampled to a 5:1 benign-to-spam ratio (235k training samples).

v0.2.1 includes a label audit pass where 7 high-confidence mislabeled examples (phishing messages incorrectly labeled as benign) were corrected before training.
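A minimal sketch of the 5:1 undersampling step; the function, seed, and toy data are illustrative rather than the actual pipeline:

```python
import random

def undersample(messages, labels, ratio=5, seed=42):
    """Keep every spam message and at most `ratio` benign messages per spam message."""
    rng = random.Random(seed)
    spam = [(m, y) for m, y in zip(messages, labels) if y == 1]
    benign = [(m, y) for m, y in zip(messages, labels) if y == 0]
    kept = rng.sample(benign, min(len(benign), ratio * len(spam)))
    data = spam + kept
    rng.shuffle(data)
    return data

data = undersample([f"msg{i}" for i in range(110)], [1] * 10 + [0] * 100)
print(len(data))  # 60: all 10 spam plus 50 benign, a 5:1 ratio
```

Loosening the ratio from 3:1 to 5:1 keeps more benign diversity in training while the balanced class weighting compensates for the residual imbalance.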

Label mapping: 0 = benign, 1 = spam/smishing

Usage

Installation

pip install torch transformers huggingface_hub scikit-learn joblib sentencepiece numpy

Inference

import re
import math
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib

# ── Feature extraction (must match training) ────────────────────────────────

URGENCY_WORDS = {
    "urgent", "immediately", "expires", "verify", "confirm", "suspended",
    "locked", "alert", "action required", "limited time", "click here",
    "act now", "final notice", "winner", "prize", "claim", "free",
    "blocked", "deactivated", "unusual activity",
}
URL_PATTERN      = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED        = {"bit.ly","tinyurl.com","goo.gl","t.co","ow.ly","smsg.io","rb.gy"}
PHONE_PATTERN    = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN    = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$£€₹Β₯]|(usd|gbp|eur|inr)', re.I)
LEET_MAP         = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL   = re.compile(
    r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
    r"|(?:h\s*t\s*t\s*p)"
    r"|(?:www\s*\.\s*\w)"
    r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD      = re.compile(r"\b(?:\w\s){3,}\w\b")


def extract_features(text):
    """Extract all 23 features for a single message."""
    words   = text.split()
    letters = [c for c in text if c.isalpha()]
    chars   = list(text)
    n       = len(chars)

    # Original 15 features
    original = [
        len(text),
        len(words),
        sum(len(w) for w in words) / max(len(words), 1),
        sum(1 for c in letters if c.isupper()) / max(len(letters), 1),
        sum(1 for c in text if c.isdigit()) / max(len(text), 1),
        sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),
        text.count('!'),
        text.count('?'),
        int(bool(URL_PATTERN.search(text))),
        len(URL_PATTERN.findall(text)),
        int(any(d in text.lower() for d in SHORTENED)),
        int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D','',m)) >= 7])),
        int(bool(EMAIL_PATTERN.search(text))),
        int(bool(CURRENCY_PATTERN.search(text))),
        sum(1 for w in URGENCY_WORDS if w in text.lower()),
    ]

    # 8 new features (v0.2.1)
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    counts = Counter(text.lower())
    entropy = -sum((c/n) * math.log2(c/n) for c in counts.values() if c > 0) if n > 0 else 0.0
    translated = text.translate(LEET_MAP)
    leet_changes = sum(1 for a, b in zip(text, translated) if a != b)
    max_drun, cur = 0, 0
    for c in chars:
        if c.isdigit(): cur += 1; max_drun = max(max_drun, cur)
        else: cur = 0
    repeats = sum(1 for i in range(1, n) if chars[i] == chars[i-1]) if n > 1 else 0

    new = [
        non_ascii / max(n, 1),                                       # unicode_ratio
        entropy,                                                     # char_entropy
        len(SPACED_WORD.findall(text)),                              # suspicious_spacing
        leet_changes / max(n, 1),                                    # leet_ratio
        max_drun,                                                    # max_digit_run
        repeats / max(n - 1, 1),                                     # repeated_char_ratio
        len(set(w.lower() for w in words)) / max(len(words), 1),     # vocab_richness
        int(bool(OBFUSCATED_URL.search(text))),                      # has_obfuscated_url
    ]

    return original + new


# ── Model definition ─────────────────────────────────────────────────────────

class AttentionPooling(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, hidden_states, attention_mask):
        scores = self.attention(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (hidden_states * weights).sum(dim=1)


class DeBERTaWithFeaturesV2(nn.Module):
    def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
        super().__init__()
        self.deberta = AutoModel.from_pretrained(model_name)
        H = self.deberta.config.hidden_size
        self.attn_pool = AttentionPooling(H)
        feat_dim = 128
        self.feature_proj = nn.Sequential(
            nn.Linear(num_extra_features, feat_dim),
            nn.LayerNorm(feat_dim), nn.GELU(), nn.Dropout(dropout),
        )
        combined_dim = 2 * H + feat_dim
        bottleneck = 256
        self.fc1 = nn.Linear(combined_dim, bottleneck)
        self.ln1 = nn.LayerNorm(bottleneck)
        self.residual_block = nn.Sequential(
            nn.Linear(bottleneck, bottleneck), nn.LayerNorm(bottleneck),
            nn.GELU(), nn.Dropout(dropout),
            nn.Linear(bottleneck, bottleneck), nn.LayerNorm(bottleneck),
        )
        self.dropout = nn.Dropout(dropout)
        self.output_head = nn.Linear(bottleneck, num_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state
        cls_emb  = hidden[:, 0, :]
        attn_emb = self.attn_pool(hidden, attention_mask)
        feat = self.feature_proj(extra_features)
        combined = torch.cat([cls_emb, attn_emb, feat], dim=1)
        x = F.gelu(self.ln1(self.fc1(combined)))
        x = x + self.residual_block(x)
        return self.output_head(self.dropout(x))


# ── Load model ───────────────────────────────────────────────────────────────

model_id  = "notd5a/deberta-v3-malicious-sms-mms-detector"
device    = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler    = joblib.load(hf_hub_download(model_id, "scaler.pkl"))

model = DeBERTaWithFeaturesV2(model_id)
state = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state)
model.to(device).eval()

# Load optimised threshold
with open(hf_hub_download(model_id, "threshold.json")) as f:
    THRESHOLD = json.load(f)["threshold"]


# ── Predict ──────────────────────────────────────────────────────────────────

def predict(texts):
    if isinstance(texts, str):
        texts = [texts]

    enc = tokenizer(texts, max_length=256, padding="max_length",
                    truncation=True, return_tensors="pt")
    raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
    scaled    = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)

    with torch.no_grad():
        logits = model(enc["input_ids"].to(device), enc["attention_mask"].to(device), scaled)
        probs  = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()

    return [{"text": t, "label": int(p >= THRESHOLD),
             "prob_spam": round(float(p), 4),
             "prediction": "spam/smishing" if p >= THRESHOLD else "benign"}
            for t, p in zip(texts, probs)]


# ── Example ──────────────────────────────────────────────────────────────────

results = predict([
    "Your account has been suspended. Verify immediately: http://bit.ly/abc123",
    "Hey, are you free for lunch tomorrow?",
    "Y ou've got mail: new messa ge w7",
    "Flat 30% OFF on all ethnic wear! Shop now at bit.ly/sale2026",
    "click httpscluesjdko to download app",
])
for r in results:
    print(r)

Engineered features reference

All 23 features, computed at inference time from the raw message text:

Original 15 features (v0.1)

 #  Feature              Type     Description
 1  char_count           int      Total character count
 2  word_count           int      Total word count (whitespace split)
 3  avg_word_length      float    Mean word length
 4  uppercase_ratio      float    Uppercase letters / all letters
 5  digit_ratio          float    Digits / total characters
 6  special_char_ratio   float    Non-alphanumeric, non-space / total characters
 7  exclamation_count    int      Count of !
 8  question_mark_count  int      Count of ?
 9  has_url              binary   Contains URL pattern
10  url_count            int      Number of URLs detected
11  has_shortened_url    binary   Contains bit.ly, t.co, etc.
12  has_phone_number     binary   Contains phone number (≥7 digits)
13  has_email            binary   Contains email address
14  has_currency         binary   Contains currency symbol or code
15  urgency_score        int      Count of urgency keywords matched

New 8 features (v0.2.1)

 #  Feature              Type     Targets             Description
16  unicode_ratio        float    Unicode evasion     Non-ASCII characters / total characters
17  char_entropy         float    Template spam       Shannon entropy over character distribution
18  suspicious_spacing   int      Spaced-out evasion  Count of "m e s s a g e" style patterns
19  leet_ratio           float    L33t speak          Characters that map to leet translations / total
20  max_digit_run        int      Embedded numbers    Longest consecutive digit sequence
21  repeated_char_ratio  float    Exclamation spam    Consecutive repeated chars / (length − 1)
22  vocab_richness       float    Low-diversity spam  Unique words / total words
23  has_obfuscated_url   binary   Broken URLs         Detects evasive URL patterns

v0.1 error analysis methodology

The v0.1 → v0.2.1 changes were motivated by a structured error analysis of v0.1's 18,995-sample test set:

                v0.1 Test Set (18,995 samples)

                Predicted Benign    Predicted Spam
True Benign          13,905 (TN)          341 (FP)
True Spam               283 (FN)        4,466 (TP)

False positive breakdown (341 benign messages flagged as spam)

The FPs clustered at high confidence (49 with prob ≥ 0.99) and fell into distinct categories:

  • Retail/brand promotions (~60): Legitimate sale announcements sharing surface features with spam (URLs, urgency words, "FLAT 30% OFF")
  • Gaming/internet discussion (~30): Discord messages about GTA, Battlefield, CoD, where high digit ratios and uppercase text triggered false alarms
  • Bank transaction alerts (~20): Nigerian bank debit notifications with account numbers, amounts, uppercase text
  • Telecom notifications (~15): Carrier data plan alerts, customer service messages
  • Mislabeled phishing (~3): Actual phishing (SBI YONO, NHS COVID, Apple Pay) incorrectly labeled benign; the model got these right

False negative breakdown (283 spam messages that slipped through)

  • Near-threshold (120): Spam scoring 0.40–0.70, just below the decision boundary
  • Conversational scams (61): Normal-sounding text hiding malicious intent
  • Truncated/tiny (23): Messages too short for any model to classify
  • Low-signal promo spam (14): Marketing-style spam lacking typical spam indicators
  • Obfuscated text (14): "Y ou've got mail: new messa ge w7", "httpscluesjdko"

Files

File                  Description
pytorch_model.pt      Model weights (DeBERTaWithFeaturesV2)
tokenizer/            Saved DeBERTa tokenizer
scaler.pkl            StandardScaler fitted on 23 training features
threshold.json        Optimised classification threshold (0.67)
config.json           DeBERTa base config
training_history.csv  Per-epoch metrics for all 8 epochs

Limitations

  • English-only. Non-English messages in the training data were filtered; the model may misclassify non-English spam or flag non-English legitimate messages.
  • Optimised for SMS/MMS (≤256 tokens). Longer content like emails will be truncated.
  • Short message weakness. Messages under 40 characters are the dominant remaining failure mode (288 FNs). There is insufficient text for either DeBERTa or the engineered features to provide signal.
  • Promotional boundary. Legitimate marketing with aggressive language (flash sales, urgency CTAs) can still trigger false positives, though at lower confidence than v0.1.
  • Evasion arms race. Novel obfuscation techniques not in training data will reduce recall over time.
  • No sender metadata. The model operates on message text only β€” sender reputation, short codes, and carrier signals are not available.

Version history

Version  Date     F1      AUC     Key changes
v0.1     2026-03  0.9299  0.9906  Initial release, CLS pooling, 15 features, CrossEntropyLoss
v0.2.1   2026-03  0.9022  0.9857  Attention pooling, 23 features, focal loss (γ=1), label audit, 5:1 undersampling. Eliminated all high-confidence errors and obfuscation/conversational blind spots.

License

CC BY-NC 4.0: free for research and non-commercial use. Commercial use requires explicit permission.
