De-Scam-BERTa-v3 - A Smishing & Spam Hybrid Ensemble Detector (v0.3)

A fine-tuned DeBERTa-v3-base model for detecting smishing and spam SMS/MMS messages. Designed to work as part of a hybrid routing system alongside a CharCNN short-message specialist, achieving 0.9666 F1 on the combined test set.

The model combines DeBERTa's contextual understanding with 23 handcrafted text features and includes built-in explainability: attention-based token importance and feature contribution analysis for every prediction.

Model performance

Hybrid system (recommended)

The hybrid router sends short messages (<=60 chars) to CharCNN and long messages (>=120 chars) to DeBERTa, with a sigmoid-blended ensemble for messages in between.

Metric Hybrid (CharCNN + DeBERTa) DeBERTa-only
F1 0.9666 0.8934
Precision 0.9675 0.8833
Recall 0.9657 0.9037
AUC-ROC 0.9969 0.9794

Routing breakdown (40,251 test messages)

Route Messages F1
CharCNN (<=160 chars) 35,959 (89.3%) 0.9779
DeBERTa (>160 chars) 4,292 (10.7%) 0.9231
Combined 40,251 0.9666

Confusion matrices

Hybrid system:

                Predicted Benign    Predicted Spam
True Benign             28,817               359
True Spam                  380            10,695

DeBERTa-only (@ optimised threshold 0.56):

                Predicted Benign    Predicted Spam
True Benign             27,854             1,322
True Spam                1,066            10,009
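The headline hybrid metrics can be reproduced directly from the confusion matrix above:

```python
# Hybrid confusion matrix (spam = positive class).
tn, fp = 28_817, 359      # true-benign row
fn, tp = 380, 10_695      # true-spam row

precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * tp / (2 * tp + fp + fn)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# precision=0.9675 recall=0.9657 f1=0.9666
```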

Error analysis (XAI)

The explainability pipeline was run on all 739 misclassified messages to identify systematic failure patterns.

Summary

False Positives False Negatives
Count 359 380
Mean spam prob 0.7390 0.3380
Confident errors 305 (85%) 193 (51%)
Primary route DeBERTa (60%) CharCNN (65%)

Error concentration by model

Model Messages FP FN Total Errors Error Rate
CharCNN 35,959 142 247 389 1.08%
DeBERTa 4,292 217 133 350 8.15%

DeBERTa's error rate is roughly 7.5× CharCNN's, despite it handling far fewer messages.

False positives: benign messages flagged as spam

FPs are overwhelmingly legitimate service messages that share surface features with spam. 85% are high-confidence errors (probability > 0.6).

Top misleading features (by mean |z-score| across FP errors):

Feature Mean z-score % Notable
has_email +19.35 100%
has_shortened_url +6.84 100%
has_currency +5.22 100%
has_phone_number +4.61 100%
has_url / url_count +3.08 100%
digit_ratio +2.89 75%
has_obfuscated_url +2.70 100%
urgency_score +1.62 100%

Typical FP categories: bank transaction alerts with account numbers and currency symbols, delivery notifications with tracking URLs, promotional service messages with opt-out codes, and appointment/notification SMS from legitimate services.

False negatives: spam messages misclassified as benign

FNs are split between borderline (49%) and confident (51%) errors. 65% are routed to CharCNN, making short-message spam the primary gap.

Key patterns in missed spam:

Pattern Description
Conversational spam Reads like casual chat with none of the traditional spam signals (no URLs, no urgency words)
Non-English spam Spanish and other languages with currency symbols slip through the English-trained model
Social engineering Sophisticated scams disguised as friendly messages or legitimate requests
Truncated/ambiguous Short spam fragments that lack enough context to classify

Feature analysis: FN messages have weaker spam signals overall. Features like has_url and urgency_score, which are strong spam indicators, appear at much lower rates in FN errors than in correctly caught spam, confirming that missed messages are structurally different from typical spam.

Key takeaways

  1. DeBERTa is the weak link. Its 8.15% error rate on longer messages (vs CharCNN's 1.08%) is the primary area for improvement.
  2. Feature overlap is the core FP problem. Legitimate service messages (bank alerts, delivery notifications) use the same features as spam (URLs, phone numbers, urgency language, currency symbols). The model cannot distinguish intent from surface signals alone.
  3. CharCNN misses subtle spam. Short conversational-style spam without traditional indicators is the main FN source. Character-level features alone lack the semantic understanding needed for these cases.
  4. Non-English content is a blind spot. The English-only training set causes systematic FNs on multilingual spam.

Architecture

Hybrid routing system

Input Message
    |
    ├── len <= 60 chars ──────────► CharCNN (100%)
    |
    ├── 60 < len < 120 chars ────► Sigmoid ensemble blend
    |                                 prob = (1-w)*cnn + w*deberta
    |                                 w = sigmoid((len - 90) / 10)
    |
    └── len >= 120 chars ─────────► DeBERTa (100%)
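As a sketch, the routing rule in the diagram can be expressed as a small function; `cnn_prob` and `deberta_prob` stand in for the spam probabilities the two models would return:

```python
import math

def route(length, cnn_prob, deberta_prob,
          short_max=60, long_min=120, mid=90, scale=10):
    """Length-based router: CharCNN for short messages, DeBERTa for
    long ones, and a sigmoid-weighted blend in between."""
    if length <= short_max:
        return cnn_prob                      # CharCNN only
    if length >= long_min:
        return deberta_prob                  # DeBERTa only
    w = 1.0 / (1.0 + math.exp(-(length - mid) / scale))  # weight on DeBERTa
    return (1 - w) * cnn_prob + w * deberta_prob
```

At 90 characters the blend weight is exactly 0.5; by 110 characters DeBERTa already carries about 88% of the weight.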

DeBERTa single-head classifier

Input Text
    └─► DeBERTa-v3-base encoder (gradient checkpointed)
            ├─► [CLS] embedding (768d)
            └─► Attention-weighted pooling (768d)
                                                    ┐
23 Engineered Features                              │
    └─► Linear(23→128) + LayerNorm + GELU           ├─► concat (1664d)
                                                    ┘
        └─► Linear(1664→256) + LayerNorm + GELU
            └─► Residual block (256d → 256d)
                └─► Linear(256→2) → spam logits

CharCNN short-message specialist

Character IDs (max 160 chars)
    └─► Embedding(92, 64)
        ├─► Conv1d(64, 128, kernel=2) + BN + ReLU → MaxPool
        ├─► Conv1d(64, 128, kernel=3) + BN + ReLU → MaxPool
        └─► Conv1d(64, 128, kernel=5) + BN + ReLU → MaxPool
                                                            ┐
            Concat (384d)                                   │
                                                            ├─► concat (448d)
23 Features → Linear(23→64) + LayerNorm + GELU (64d)        │
                                                            ┘
    └─► Linear(448→128) + LayerNorm + GELU + Dropout
        └─► Linear(128→2) → spam logits
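For reference, a PyTorch module matching this diagram might look like the sketch below. The released checkpoint ships weights and a config file rather than a class definition, so the layer names and the dropout rate here are assumptions:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN with three parallel kernel widths, mirroring
    the diagram above. Dropout rate and layer names are guesses."""
    def __init__(self, vocab_size=92, emb_dim=64, n_filters=128,
                 num_features=23, num_labels=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv1d(emb_dim, n_filters, k),
                          nn.BatchNorm1d(n_filters), nn.ReLU())
            for k in (2, 3, 5)
        ])
        self.feature_proj = nn.Sequential(
            nn.Linear(num_features, 64), nn.LayerNorm(64), nn.GELU())
        self.fc = nn.Sequential(
            nn.Linear(3 * n_filters + 64, 128), nn.LayerNorm(128),
            nn.GELU(), nn.Dropout(dropout))
        self.out = nn.Linear(128, num_labels)

    def forward(self, char_ids, extra_features):
        x = self.embedding(char_ids).transpose(1, 2)           # (B, emb, seq)
        pooled = [c(x).max(dim=2).values for c in self.convs]  # 3 x (B, 128)
        feat = self.feature_proj(extra_features)               # (B, 64)
        h = self.fc(torch.cat(pooled + [feat], dim=1))         # (B, 448) -> (B, 128)
        return self.out(h)
```

With these dimensions the module comes out at roughly 147K parameters, consistent with the checkpoint size listed under Files.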

Usage

Quick start (DeBERTa-only from HuggingFace)

import re
import math
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib

# ── Feature extraction (must match training) ────────────────────────────────

URGENCY_WORDS = {
    "urgent", "immediately", "expires", "verify", "confirm", "suspended",
    "locked", "alert", "action required", "limited time", "click here",
    "act now", "final notice", "winner", "prize", "claim", "free",
    "blocked", "deactivated", "unusual activity",
}
URL_PATTERN      = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED        = {"bit.ly","tinyurl.com","goo.gl","t.co","ow.ly","smsg.io","rb.gy"}
PHONE_PATTERN    = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN    = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$\xa3\u20ac\u20b9\xa5]|(usd|gbp|eur|inr)', re.I)
LEET_MAP         = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL   = re.compile(
    r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
    r"|(?:h\s*t\s*t\s*p)"
    r"|(?:www\s*\.\s*\w)"
    r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD      = re.compile(r"\b(?:\w\s){3,}\w\b")


def extract_features(text):
    """Extract all 23 features for a single message."""
    words   = text.split()
    letters = [c for c in text if c.isalpha()]
    chars   = list(text)
    n       = len(chars)

    original = [
        len(text), len(words),
        sum(len(w) for w in words) / max(len(words), 1),
        sum(1 for c in letters if c.isupper()) / max(len(letters), 1),
        sum(1 for c in text if c.isdigit()) / max(len(text), 1),
        sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),
        text.count('!'), text.count('?'),
        int(bool(URL_PATTERN.search(text))), len(URL_PATTERN.findall(text)),
        int(any(d in text.lower() for d in SHORTENED)),
        int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D','',m)) >= 7])),
        int(bool(EMAIL_PATTERN.search(text))), int(bool(CURRENCY_PATTERN.search(text))),
        sum(1 for w in URGENCY_WORDS if w in text.lower()),
    ]

    non_ascii = sum(1 for c in chars if ord(c) > 127)
    counts = Counter(text.lower())
    entropy = -sum((c/n)*math.log2(c/n) for c in counts.values() if c > 0) if n > 0 else 0.0
    translated = text.translate(LEET_MAP)
    leet = sum(1 for a, b in zip(text, translated) if a != b)
    mdr, cr = 0, 0
    for c in chars:
        if c.isdigit(): cr += 1; mdr = max(mdr, cr)
        else: cr = 0
    reps = sum(1 for i in range(1, n) if chars[i] == chars[i-1]) if n > 1 else 0

    new = [
        non_ascii / max(n, 1), entropy,
        len(SPACED_WORD.findall(text)), leet / max(n, 1), mdr,
        reps / max(n-1, 1),
        len(set(w.lower() for w in words)) / max(len(words), 1),
        int(bool(OBFUSCATED_URL.search(text))),
    ]
    return original + new


# ── Model definition ─────────────────────────────────────────────────────────

class AttentionPooling(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, hidden_states, attention_mask):
        scores = self.attention(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (hidden_states * weights).sum(dim=1)


class DeBERTaSingleHead(nn.Module):
    def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
        super().__init__()
        self.deberta = AutoModel.from_pretrained(model_name)
        H = self.deberta.config.hidden_size
        self.attn_pool = AttentionPooling(H)
        feat_dim = 128
        self.feature_proj = nn.Sequential(
            nn.Linear(num_extra_features, feat_dim), nn.LayerNorm(feat_dim),
            nn.GELU(), nn.Dropout(dropout),
        )
        combined_dim = 2 * H + feat_dim  # 768*2 + 128 = 1664
        self.fc1 = nn.Linear(combined_dim, 256)
        self.ln1 = nn.LayerNorm(256)
        self.residual_block = nn.Sequential(
            nn.Linear(256, 256), nn.LayerNorm(256),
            nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 256), nn.LayerNorm(256),
        )
        self.dropout = nn.Dropout(dropout)
        self.output_head = nn.Linear(256, num_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state
        cls_emb  = hidden[:, 0, :]
        attn_emb = self.attn_pool(hidden, attention_mask)
        feat     = self.feature_proj(extra_features)
        x = torch.cat([cls_emb, attn_emb, feat], dim=1)
        x = F.gelu(self.ln1(self.fc1(x)))
        x = x + self.residual_block(x)
        return self.output_head(self.dropout(x))


# ── Load model ───────────────────────────────────────────────────────────────

model_id  = "notd5a/deberta-v3-malicious-sms-mms-detector"
device    = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler    = joblib.load(hf_hub_download(model_id, "scaler.pkl"))

model = DeBERTaSingleHead(model_id)
state = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state)
model.float().to(device).eval()

with open(hf_hub_download(model_id, "threshold.json")) as f:
    thresholds = json.load(f)
SPAM_THRESHOLD = thresholds["optimal_threshold"]


# ── Predict ──────────────────────────────────────────────────────────────────

def predict(texts):
    if isinstance(texts, str):
        texts = [texts]

    enc = tokenizer(texts, max_length=128, padding="max_length",
                    truncation=True, return_tensors="pt")
    raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
    scaled = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)

    with torch.no_grad():
        logits = model(
            enc["input_ids"].to(device),
            enc["attention_mask"].to(device),
            scaled,
        )
        spam_probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()

    return [{
        "text":             t,
        "prediction":       "spam" if sp >= SPAM_THRESHOLD else "benign",
        "is_spam":          bool(sp >= SPAM_THRESHOLD),
        "spam_probability": round(float(sp), 4),
    } for t, sp in zip(texts, spam_probs)]


# ── Example ──────────────────────────────────────────────────────────────────

results = predict([
    "Your account has been suspended. Verify immediately: http://bit.ly/abc123",
    "Hey, are you free for lunch tomorrow?",
    "Flat 30% OFF on all ethnic wear! Shop now at bit.ly/sale2026",
])
for r in results:
    flag = "SPAM" if r["is_spam"] else "benign"
    print(f"  [{flag}] (spam: {r['spam_probability']:.3f}) {r['text'][:80]}")

Hybrid inference from HuggingFace (recommended)

Download the full repo and run the hybrid router for best performance:

# Clone the repo
git lfs install
git clone https://huggingface.co/notd5a/deberta-v3-malicious-sms-mms-detector
cd deberta-v3-malicious-sms-mms-detector

# Install dependencies
pip install torch transformers scikit-learn joblib sentencepiece

# Run hybrid inference (repo root = DeBERTa dir, charcnn/ = CharCNN dir)
python hybrid_router_inference.py \
    --deberta_dir . \
    --short_dir charcnn \
    --text "Your account has been suspended. Verify at bit.ly/xyz"

# With JSON output
python hybrid_router_inference.py \
    --deberta_dir . \
    --short_dir charcnn \
    --text "Your account has been suspended" \
    --json

# With explainability (token importance + feature contributions)
python hybrid_router_inference.py \
    --deberta_dir . \
    --short_dir charcnn \
    --text "Your account has been suspended. Verify at bit.ly/xyz" \
    --explain

# Batch inference on CSV
python hybrid_router_inference.py \
    --deberta_dir . \
    --short_dir charcnn \
    --input test_messages.csv \
    --output predictions.csv

Programmatic API

from hybrid_router_inference import HybridDetector

# From a cloned HuggingFace repo:
detector = HybridDetector.load(deberta_dir=".", short_dir="charcnn")

# Or from local training directories:
detector = HybridDetector.load(
    deberta_dir="model_output_v2.4",
    short_dir="cnn_model_v3",
)

# Single classification
result = detector.classify("Win a free iPhone! Click here now!")
# {
#     "text": "Win a free iPhone! Click here now!",
#     "prediction": "spam",
#     "is_spam": True,
#     "spam_probability": 0.9812,
#     "model_used": "charcnn",
#     "routing_reason": "Routed to CharCNN (length 34 <= 60)"
# }

# With explainability
result = detector.classify(
    "Your Chase account has been locked. Verify: chase-secure.com/verify",
    explain=True,
)
# Adds: token_importance, feature_contributions, explanation

# Batch prediction
results = detector.predict([
    "Hey, are you coming to dinner tonight?",
    "URGENT: Your bank account has been compromised. Act now!",
    "Your package is on the way! Track: amzn.to/3xK9",
])

Explainability (XAI)

Every prediction can include an explanation showing why the model made its decision:

  • Token importance: attention weights mapped back to input tokens, showing which words the model focused on
  • Feature contributions: z-scores for each engineered feature, highlighting which are unusual compared to the training population
  • Explanation string: a human-readable summary combining both signals

Features with |z-score| > 1.5 are flagged as notable contributors.
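In practice the feature-contribution step amounts to standardising the 23 raw features with the fitted scaler and keeping the large deviations. A minimal sketch, using made-up means and scales in place of the values stored in scaler.pkl:

```python
import numpy as np

def notable_features(raw_feats, means, scales, names, z_cut=1.5):
    """Standardise raw features and return (name, z) pairs whose
    |z-score| exceeds z_cut, sorted by magnitude."""
    z = (np.asarray(raw_feats, dtype=float) - means) / scales
    flagged = [(n, round(float(s), 2)) for n, s in zip(names, z)
               if abs(s) > z_cut]
    return sorted(flagged, key=lambda t: -abs(t[1]))

# Toy population statistics for three features (illustrative numbers only):
names  = ["char_count", "urgency_score", "has_shortened_url"]
means  = np.array([80.0, 0.3, 0.02])
scales = np.array([40.0, 0.7, 0.14])
```

On a real model the means and scales come from the scaler's `mean_` and `scale_` attributes, so z-scores are measured against the training population.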

Training details

Setting Value
Base model microsoft/deberta-v3-base
Max sequence length 128
Batch size (per GPU) 8
Gradient accumulation 4
Effective batch size 128
Epochs 10 (best @ epoch 9)
LR (encoder) 2e-5
LR (head) 1e-3
LR schedule CosineAnnealingLR
Warmup ratio 0.1
Loss FocalLoss(gamma=1.5, smoothing=0.05)
R-Drop alpha 0.3
FGM epsilon 0.5
EMA decay 0.995
Multi-sample dropout 3 passes
Precision bfloat16
Gradient checkpointing Yes
Engineered features 23
Hardware 4x NVIDIA H200 SXM
Training time 194 minutes
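The loss row combines focal down-weighting with label smoothing. A minimal sketch of that combination (the training script's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.5, smoothing=0.05):
    """Label-smoothed cross entropy, down-weighted by (1 - p_t)**gamma
    so that easy, high-confidence examples contribute less."""
    ce = F.cross_entropy(logits, targets, label_smoothing=smoothing,
                         reduction="none")
    p_t = torch.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_t) ** gamma * ce).mean()
```

With gamma=0 and smoothing=0 this reduces to plain cross entropy; gamma=1.5 progressively mutes well-classified examples so training focuses on the hard ones.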

Training progression

Epoch Train Loss Val F1 (opt) Val AUC Threshold
1 0.1488 0.8799 0.9749 0.59
2 0.1377 0.8855 0.9772 0.555
3 0.1341 0.8918 0.9785 0.57
4 0.1342 0.8924 0.9792 0.585
5 0.1319 0.8932 0.9797 0.54
6 0.1325 0.8952 0.9797 0.57
7 0.1305 0.8960 0.9800 0.57
8 0.1307 0.8961 0.9803 0.575
9 0.1301 0.8969 0.9803 0.56
10 0.1290 0.8965 0.9803 0.555
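The Threshold column is found by sweeping candidate cut-offs on the validation set and keeping the one that maximises F1, roughly:

```python
import numpy as np

def best_threshold(probs, labels, grid=None):
    """Return (threshold, F1) maximising F1 over a candidate grid."""
    if grid is None:
        grid = np.arange(0.05, 0.96, 0.005)
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```

The winning cut-off for the best epoch (0.56) is what ships in threshold.json.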

Dataset

Trained on 268,340 English SMS/MMS messages:

Category Count Share
Benign 194,504 72.5%
Spam/Smishing 73,836 27.5%
Total 268,340

Benign-to-spam ratio: 2.6:1

Label mapping: label = 0 (benign), 1 (spam/smishing)

Engineered features reference

All 23 features are computed at inference time from the raw message text and standardised using a fitted StandardScaler (scaler.pkl).

Original 15 features

# Feature Type Description
1 char_count int Total character count
2 word_count int Total word count
3 avg_word_length float Mean word length
4 uppercase_ratio float Uppercase letters / all letters
5 digit_ratio float Digits / total characters
6 special_char_ratio float Non-alphanumeric, non-space / total characters
7 exclamation_count int Count of !
8 question_mark_count int Count of ?
9 has_url binary Contains URL pattern
10 url_count int Number of URLs detected
11 has_shortened_url binary Contains bit.ly, t.co, etc.
12 has_phone_number binary Contains phone number (>=7 digits)
13 has_email binary Contains email address
14 has_currency binary Contains currency symbol or code
15 urgency_score int Count of urgency keywords matched

Evasion detection features (v0.2+)

# Feature Type Description
16 unicode_ratio float Non-ASCII characters / total characters
17 char_entropy float Shannon entropy over character distribution
18 suspicious_spacing int Count of spaced-out word patterns (e.g. "w o r d")
19 leet_ratio float Characters that map to leet translations / total
20 max_digit_run int Longest consecutive digit sequence
21 repeated_char_ratio float Consecutive repeated chars / (length - 1)
22 vocab_richness float Unique words / total words
23 has_obfuscated_url binary Detects evasive URL patterns

Model evolution

Version Architecture Spam F1 AUC Key change
v0.1 DeBERTa-base + CLS + 15 features 0.9299 0.9906 Initial release
v0.2 + attention pool + 8 features + focal loss 0.8456 0.9867 Architecture overhaul
v0.2.1 + focal gamma=1.0 + 5:1 undersample 0.9022 0.9857 Loss/data tuning
v0.2.2 + dataset cleanup + diverse benign 0.9096 0.9883 Data quality
v0.3 Single head + CharCNN hybrid 0.9666 0.9969 Hybrid routing system

Files

File Description
hybrid_router_inference.py Full hybrid inference pipeline (CharCNN + DeBERTa + routing + XAI)
pytorch_model.pt DeBERTa model weights (DeBERTaSingleHead, epoch 9)
tokenizer/ Saved DeBERTa-v3-base tokenizer
scaler.pkl DeBERTa feature scaler (StandardScaler, 23 features)
threshold.json Optimised DeBERTa classification threshold (0.56)
charcnn/charcnn_best.pt CharCNN model weights (147K params)
charcnn/charcnn_config.json CharCNN architecture config + threshold (0.57)
charcnn/charcnn_scaler.pkl CharCNN feature scaler
training_history.csv Per-epoch DeBERTa training metrics

Limitations

  • English-only. Non-English messages may be misclassified.
  • Optimised for SMS/MMS. Messages are truncated to 128 tokens for DeBERTa, 160 characters for CharCNN.
  • Promotional boundary. Legitimate marketing with aggressive urgency language remains the hardest category to classify correctly.
  • Evasion arms race. Novel obfuscation techniques not represented in training data will reduce performance over time.
  • No sender metadata. Classification is based on message text only; no phone number, carrier, or frequency signals are used.
  • DeBERTa-only performance is lower. The 0.8934 F1 for DeBERTa alone reflects that many spam messages are short and better suited to CharCNN.

License

CC BY-NC 4.0: free for research and non-commercial use. Commercial use requires explicit permission.
