De-Scam-BERTa-v3 - A Smishing & Spam Hybrid Ensemble Detector (v0.3)
A fine-tuned DeBERTa-v3-base model for detecting smishing and spam SMS/MMS messages. It is designed to work as part of a hybrid routing system alongside a CharCNN short-message specialist; the hybrid achieves 0.9666 F1 on the combined test set.
The model combines DeBERTa's contextual understanding with 23 handcrafted text features and includes built-in explainability: attention-based token importance and feature contribution analysis for every prediction.
Model performance
Hybrid system (recommended)
The hybrid router sends short messages (<=60 chars) to CharCNN and long messages (>=120 chars) to DeBERTa, with a sigmoid-blended ensemble for messages in between.
| Metric | Hybrid (CharCNN + DeBERTa) | DeBERTa-only |
|---|---|---|
| F1 | 0.9666 | 0.8934 |
| Precision | 0.9675 | 0.8833 |
| Recall | 0.9657 | 0.9037 |
| AUC-ROC | 0.9969 | 0.9794 |
Routing breakdown (40,251 test messages)
| Route | Messages | F1 |
|---|---|---|
| CharCNN (<=160 chars) | 35,959 (89.3%) | 0.9779 |
| DeBERTa (>160 chars) | 4,292 (10.7%) | 0.9231 |
| Combined | 40,251 | 0.9666 |
Confusion matrices
Hybrid system:
| | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 28,817 | 359 |
| True Spam | 380 | 10,695 |
DeBERTa-only (@ optimised threshold 0.56):
| | Predicted Benign | Predicted Spam |
|---|---|---|
| True Benign | 27,854 | 1,322 |
| True Spam | 1,066 | 10,009 |
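The headline hybrid metrics can be re-derived from the hybrid confusion matrix as a quick sanity check:

```python
# Hybrid confusion matrix cells from the table above
tp, fp, fn = 10_695, 359, 380

precision = tp / (tp + fp)                          # 0.9675
recall = tp / (tp + fn)                             # 0.9657
f1 = 2 * precision * recall / (precision + recall)  # 0.9666

print(round(precision, 4), round(recall, 4), round(f1, 4))
```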
Error analysis (XAI)
Explainability analysis was run on all 739 misclassified messages (359 false positives + 380 false negatives) to identify systematic failure patterns.
Summary
| | False Positives | False Negatives |
|---|---|---|
| Count | 359 | 380 |
| Mean spam prob | 0.7390 | 0.3380 |
| Confident errors | 305 (85%) | 193 (51%) |
| Primary route | DeBERTa (60%) | CharCNN (65%) |
Error concentration by model
| Model | Messages | FP | FN | Total Errors | Error Rate |
|---|---|---|---|---|---|
| CharCNN | 35,959 | 142 | 247 | 389 | 1.08% |
| DeBERTa | 4,292 | 217 | 133 | 350 | 8.15% |
DeBERTa has a 7.5× higher error rate than CharCNN despite handling far fewer messages.
False positives: benign messages flagged as spam
FPs are overwhelmingly legitimate service messages that share surface features with spam. 85% are high-confidence errors (probability > 0.6).
Top misleading features (by mean |z-score| across FP errors):
| Feature | Mean z-score | % Notable |
|---|---|---|
| `has_email` | +19.35 | 100% |
| `has_shortened_url` | +6.84 | 100% |
| `has_currency` | +5.22 | 100% |
| `has_phone_number` | +4.61 | 100% |
| `has_url` / `url_count` | +3.08 | 100% |
| `digit_ratio` | +2.89 | 75% |
| `has_obfuscated_url` | +2.70 | 100% |
| `urgency_score` | +1.62 | 100% |
Typical FP categories: bank transaction alerts with account numbers and currency symbols, delivery notifications with tracking URLs, promotional service messages with opt-out codes, and appointment/notification SMS from legitimate services.
False negatives: spam messages missed as benign
FNs are split between borderline (49%) and confident (51%) errors. 65% are routed to CharCNN, making short-message spam the primary gap.
Key patterns in missed spam:
| Pattern | Description |
|---|---|
| Conversational spam | Reads like casual chat with no traditional spam signals: no URLs, no urgency words |
| Non-English spam | Spanish and other languages with currency symbols slip through the English-trained model |
| Social engineering | Sophisticated scams disguised as friendly messages or legitimate requests |
| Truncated/ambiguous | Short spam fragments that lack enough context to classify |
Feature analysis: FN messages have weaker spam signals overall. Features like `has_url` and `urgency_score`, both strong spam indicators, appear at much lower rates in FN errors than in correctly caught spam, confirming that these messages are structurally different from typical spam.
Key takeaways
- DeBERTa is the weak link. Its 8.15% error rate on longer messages (vs CharCNN's 1.08%) is the primary area for improvement.
- Feature overlap is the core FP problem. Legitimate service messages (bank alerts, delivery notifications) use the same features as spam (URLs, phone numbers, urgency language, currency symbols). The model cannot distinguish intent from surface signals alone.
- CharCNN misses subtle spam. Short conversational-style spam without traditional indicators is the main FN source. Character-level features alone lack the semantic understanding needed for these cases.
- Non-English content is a blind spot. The English-only training set causes systematic FNs on multilingual spam.
Architecture
Hybrid routing system
Input Message
      │
      ├── len <= 60 chars ──────────► CharCNN (100%)
      │
      ├── 60 < len < 120 chars ─────► Sigmoid ensemble blend
      │         prob = (1-w)*cnn + w*deberta
      │         w = sigmoid((len - 90) / 10)
      │
      └── len >= 120 chars ─────────► DeBERTa (100%)
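The routing policy above can be sketched in a few lines of Python (a minimal sketch; the function and argument names are illustrative, and the real logic lives in hybrid_router_inference.py):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route(text, cnn_prob, deberta_prob):
    """Blend the two model probabilities according to the routing rules above."""
    n = len(text)
    if n <= 60:
        return cnn_prob, "charcnn"
    if n >= 120:
        return deberta_prob, "deberta"
    # In the 60-120 char zone, weight shifts smoothly toward DeBERTa:
    # w -> ~0 near 60 chars, 0.5 at 90 chars, ~1 near 120 chars.
    w = sigmoid((n - 90) / 10)
    return (1 - w) * cnn_prob + w * deberta_prob, "ensemble"
```

At exactly 90 characters the two models contribute equally; the ±30-char window around that midpoint keeps the handover from being a hard cliff.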
DeBERTa single-head classifier
Input Text
  └──► DeBERTa-v3-base encoder (gradient checkpointed)
        ├──► [CLS] embedding (768d) ─────────────┐
        └──► Attention-weighted pooling (768d) ──┤
                                                 │
23 Engineered Features                           │
  └──► Linear(23→128) + LayerNorm + GELU ────────┴──► concat (1664d)
                                                        ├──► Linear(1664→256) + LayerNorm + GELU
                                                        ├──► Residual block (256d → 256d)
                                                        └──► Linear(256→2) → spam logits
CharCNN short-message specialist
Character IDs (max 160 chars)
  └──► Embedding(92, 64)
        ├──► Conv1d(64, 128, kernel=2) + BN + ReLU → MaxPool ─┐
        ├──► Conv1d(64, 128, kernel=3) + BN + ReLU → MaxPool ─┤
        └──► Conv1d(64, 128, kernel=5) + BN + ReLU → MaxPool ─┤
                                                              │
                                            Concat (384d) ◄───┘
                                                 │
23 Features ──► Linear(23→64) + LayerNorm + GELU (64d)
                                                 │
                                   concat (448d) ◄── (384d + 64d)
                                      ├──► Linear(448→128) + LayerNorm + GELU + Dropout
                                      └──► Linear(128→2) → spam logits
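Reading the diagram as three parallel convolution branches with global max pooling, a minimal PyTorch sketch of the specialist might look like this (illustrative, not the repo's exact implementation; the dropout rate is a guess):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Sketch of the CharCNN short-message specialist as diagrammed above."""

    def __init__(self, vocab_size=92, embed_dim=64, num_filters=128,
                 num_extra_features=23, num_labels=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Three parallel conv branches (kernel sizes 2, 3, 5), 128 filters each
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv1d(embed_dim, num_filters, k),
                          nn.BatchNorm1d(num_filters), nn.ReLU())
            for k in (2, 3, 5)
        ])
        self.feature_proj = nn.Sequential(
            nn.Linear(num_extra_features, 64), nn.LayerNorm(64), nn.GELU())
        self.fc1 = nn.Sequential(nn.Linear(384 + 64, 128), nn.LayerNorm(128),
                                 nn.GELU(), nn.Dropout(dropout))
        self.out = nn.Linear(128, num_labels)

    def forward(self, char_ids, extra_features):
        x = self.embedding(char_ids).transpose(1, 2)                 # (B, 64, L)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]  # 3 x (B, 128)
        feats = self.feature_proj(extra_features)                    # (B, 64)
        h = self.fc1(torch.cat(pooled + [feats], dim=1))             # (B, 128)
        return self.out(h)                                           # (B, 2)
```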
Usage
Quick start (DeBERTa-only from HuggingFace)
import re
import math
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib
# ── Feature extraction (must match training) ───────────────────────────────
URGENCY_WORDS = {
"urgent", "immediately", "expires", "verify", "confirm", "suspended",
"locked", "alert", "action required", "limited time", "click here",
"act now", "final notice", "winner", "prize", "claim", "free",
"blocked", "deactivated", "unusual activity",
}
URL_PATTERN = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED = {"bit.ly","tinyurl.com","goo.gl","t.co","ow.ly","smsg.io","rb.gy"}
PHONE_PATTERN = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$\xa3\u20ac\u20b9\xa5]|(usd|gbp|eur|inr)', re.I)
LEET_MAP = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL = re.compile(
r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
r"|(?:h\s*t\s*t\s*p)"
r"|(?:www\s*\.\s*\w)"
r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD = re.compile(r"\b(?:\w\s){3,}\w\b")
def extract_features(text):
"""Extract all 23 features for a single message."""
words = text.split()
letters = [c for c in text if c.isalpha()]
chars = list(text)
n = len(chars)
original = [
len(text), len(words),
sum(len(w) for w in words) / max(len(words), 1),
sum(1 for c in letters if c.isupper()) / max(len(letters), 1),
sum(1 for c in text if c.isdigit()) / max(len(text), 1),
sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),
text.count('!'), text.count('?'),
int(bool(URL_PATTERN.search(text))), len(URL_PATTERN.findall(text)),
int(any(d in text.lower() for d in SHORTENED)),
int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D','',m)) >= 7])),
int(bool(EMAIL_PATTERN.search(text))), int(bool(CURRENCY_PATTERN.search(text))),
sum(1 for w in URGENCY_WORDS if w in text.lower()),
]
non_ascii = sum(1 for c in chars if ord(c) > 127)
counts = Counter(text.lower())
entropy = -sum((c/n)*math.log2(c/n) for c in counts.values() if c > 0) if n > 0 else 0.0
translated = text.translate(LEET_MAP)
leet = sum(1 for a, b in zip(text, translated) if a != b)
mdr, cr = 0, 0
for c in chars:
if c.isdigit(): cr += 1; mdr = max(mdr, cr)
else: cr = 0
reps = sum(1 for i in range(1, n) if chars[i] == chars[i-1]) if n > 1 else 0
new = [
non_ascii / max(n, 1), entropy,
len(SPACED_WORD.findall(text)), leet / max(n, 1), mdr,
reps / max(n-1, 1),
len(set(w.lower() for w in words)) / max(len(words), 1),
int(bool(OBFUSCATED_URL.search(text))),
]
return original + new
# ── Model definition ────────────────────────────────────────────────────────
class AttentionPooling(nn.Module):
def __init__(self, hidden_size):
super().__init__()
self.attention = nn.Sequential(
nn.Linear(hidden_size, hidden_size), nn.Tanh(),
nn.Linear(hidden_size, 1, bias=False),
)
def forward(self, hidden_states, attention_mask):
scores = self.attention(hidden_states).squeeze(-1)
scores = scores.masked_fill(attention_mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
return (hidden_states * weights).sum(dim=1)
class DeBERTaSingleHead(nn.Module):
def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
super().__init__()
self.deberta = AutoModel.from_pretrained(model_name)
H = self.deberta.config.hidden_size
self.attn_pool = AttentionPooling(H)
feat_dim = 128
self.feature_proj = nn.Sequential(
nn.Linear(num_extra_features, feat_dim), nn.LayerNorm(feat_dim),
nn.GELU(), nn.Dropout(dropout),
)
combined_dim = 2 * H + feat_dim # 768*2 + 128 = 1664
self.fc1 = nn.Linear(combined_dim, 256)
self.ln1 = nn.LayerNorm(256)
self.residual_block = nn.Sequential(
nn.Linear(256, 256), nn.LayerNorm(256),
nn.GELU(), nn.Dropout(dropout),
nn.Linear(256, 256), nn.LayerNorm(256),
)
self.dropout = nn.Dropout(dropout)
self.output_head = nn.Linear(256, num_labels)
def forward(self, input_ids, attention_mask, extra_features):
out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
hidden = out.last_hidden_state
cls_emb = hidden[:, 0, :]
attn_emb = self.attn_pool(hidden, attention_mask)
feat = self.feature_proj(extra_features)
x = torch.cat([cls_emb, attn_emb, feat], dim=1)
x = F.gelu(self.ln1(self.fc1(x)))
x = x + self.residual_block(x)
return self.output_head(self.dropout(x))
# ── Load model ──────────────────────────────────────────────────────────────
model_id = "notd5a/deberta-v3-malicious-sms-mms-detector"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler = joblib.load(hf_hub_download(model_id, "scaler.pkl"))
model = DeBERTaSingleHead(model_id)
state = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state)
model.float().to(device).eval()
with open(hf_hub_download(model_id, "threshold.json")) as f:
thresholds = json.load(f)
SPAM_THRESHOLD = thresholds["optimal_threshold"]
# ── Predict ─────────────────────────────────────────────────────────────────
def predict(texts):
if isinstance(texts, str):
texts = [texts]
enc = tokenizer(texts, max_length=128, padding="max_length",
truncation=True, return_tensors="pt")
raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
scaled = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)
with torch.no_grad():
logits = model(
enc["input_ids"].to(device),
enc["attention_mask"].to(device),
scaled,
)
spam_probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
return [{
"text": t,
"prediction": "spam" if sp >= SPAM_THRESHOLD else "benign",
"is_spam": bool(sp >= SPAM_THRESHOLD),
"spam_probability": round(float(sp), 4),
} for t, sp in zip(texts, spam_probs)]
# ── Example ─────────────────────────────────────────────────────────────────
results = predict([
"Your account has been suspended. Verify immediately: http://bit.ly/abc123",
"Hey, are you free for lunch tomorrow?",
"Flat 30% OFF on all ethnic wear! Shop now at bit.ly/sale2026",
])
for r in results:
flag = "SPAM" if r["is_spam"] else "benign"
print(f" [{flag}] (spam: {r['spam_probability']:.3f}) {r['text'][:80]}")
Hybrid inference from HuggingFace (recommended)
Download the full repo and run the hybrid router for best performance:
# Clone the repo
git lfs install
git clone https://huggingface.co/notd5a/deberta-v3-malicious-sms-mms-detector
cd deberta-v3-malicious-sms-mms-detector
# Install dependencies
pip install torch transformers scikit-learn joblib sentencepiece
# Run hybrid inference (repo root = DeBERTa dir, charcnn/ = CharCNN dir)
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended. Verify at bit.ly/xyz"
# With JSON output
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended" \
--json
# With explainability (token importance + feature contributions)
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--text "Your account has been suspended. Verify at bit.ly/xyz" \
--explain
# Batch inference on CSV
python hybrid_router_inference.py \
--deberta_dir . \
--short_dir charcnn \
--input test_messages.csv \
--output predictions.csv
Programmatic API
from hybrid_router_inference import HybridDetector
# From a cloned HuggingFace repo:
detector = HybridDetector.load(deberta_dir=".", short_dir="charcnn")
# Or from local training directories:
detector = HybridDetector.load(
deberta_dir="model_output_v2.4",
short_dir="cnn_model_v3",
)
# Single classification
result = detector.classify("Win a free iPhone! Click here now!")
# {
# "text": "Win a free iPhone! Click here now!",
# "prediction": "spam",
# "is_spam": True,
# "spam_probability": 0.9812,
# "model_used": "charcnn",
# "routing_reason": "Routed to CharCNN (length 34 <= 60)"
# }
# With explainability
result = detector.classify(
"Your Chase account has been locked. Verify: chase-secure.com/verify",
explain=True,
)
# Adds: token_importance, feature_contributions, explanation
# Batch prediction
results = detector.predict([
"Hey, are you coming to dinner tonight?",
"URGENT: Your bank account has been compromised. Act now!",
"Your package is on the way! Track: amzn.to/3xK9",
])
Explainability (XAI)
Every prediction can include an explanation showing why the model made its decision:
- Token importance: attention weights mapped back to input tokens, showing which words the model focused on
- Feature contributions: z-scores for each engineered feature, highlighting which are unusual compared to the training population
- Explanation string: human-readable summary combining both signals
Features with |z-score| > 1.5 are flagged as notable contributors.
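A minimal sketch of that feature-contribution step, assuming the fitted StandardScaler from scaler.pkl supplies the training-population statistics (the helper name is illustrative, not the repo's API):

```python
import numpy as np

def feature_z_scores(raw_feats, scaler, names, threshold=1.5):
    """Z-scores of one message's features relative to the training population.

    `scaler` is assumed to be a fitted sklearn StandardScaler, so
    (x - mean_) / scale_ is exactly the standardisation applied at inference.
    Only features beyond `threshold` standard deviations are returned.
    """
    z = (np.asarray(raw_feats, dtype=float) - scaler.mean_) / scaler.scale_
    return {n: float(v) for n, v in zip(names, z) if abs(v) > threshold}
```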
Training details
| Setting | Value |
|---|---|
| Base model | microsoft/deberta-v3-base |
| Max sequence length | 128 |
| Batch size (per GPU) | 8 |
| Gradient accumulation | 4 |
| Effective batch size | 128 |
| Epochs | 10 (best @ epoch 9) |
| LR (encoder) | 2e-5 |
| LR (head) | 1e-3 |
| LR schedule | CosineAnnealingLR |
| Warmup ratio | 0.1 |
| Loss | FocalLoss(gamma=1.5, smoothing=0.05) |
| R-Drop alpha | 0.3 |
| FGM epsilon | 0.5 |
| EMA decay | 0.995 |
| Multi-sample dropout | 3 passes |
| Precision | bfloat16 |
| Gradient checkpointing | Yes |
| Engineered features | 23 |
| Hardware | 4x NVIDIA H200 SXM |
| Training time | 194 minutes |
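The loss listed above, FocalLoss(gamma=1.5, smoothing=0.05), admits several formulations; one common sketch combining label smoothing with the focal down-weighting term is (the exact variant used in training may differ in detail):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.5, smoothing=0.05, num_classes=2):
    """Focal loss with label smoothing (one common formulation)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Build smoothed one-hot targets: 1 - smoothing on the true class,
    # smoothing spread over the remaining classes.
    with torch.no_grad():
        true_dist = torch.full_like(log_probs, smoothing / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    # Focal term: down-weight well-classified examples by (1 - p)^gamma
    focal_weight = (1.0 - probs) ** gamma
    return -(true_dist * focal_weight * log_probs).sum(dim=-1).mean()
```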
Training progression
| Epoch | Train Loss | Val F1 (opt) | Val AUC | Threshold |
|---|---|---|---|---|
| 1 | 0.1488 | 0.8799 | 0.9749 | 0.59 |
| 2 | 0.1377 | 0.8855 | 0.9772 | 0.555 |
| 3 | 0.1341 | 0.8918 | 0.9785 | 0.57 |
| 4 | 0.1342 | 0.8924 | 0.9792 | 0.585 |
| 5 | 0.1319 | 0.8932 | 0.9797 | 0.54 |
| 6 | 0.1325 | 0.8952 | 0.9797 | 0.57 |
| 7 | 0.1305 | 0.8960 | 0.9800 | 0.57 |
| 8 | 0.1307 | 0.8961 | 0.9803 | 0.575 |
| 9 | 0.1301 | 0.8969 | 0.9803 | 0.56 |
| 10 | 0.1290 | 0.8965 | 0.9803 | 0.555 |
Dataset
Trained on 268,340 English SMS/MMS messages:
| Category | Count | Share |
|---|---|---|
| Benign | 194,504 | 72.5% |
| Spam/Smishing | 73,836 | 27.5% |
| Total | 268,340 | 100% |
Benign-to-spam ratio: 2.6:1
Label mapping: label = 0 (benign), 1 (spam/smishing)
Engineered features reference
All 23 features are computed at inference time from the raw message text and standardised using a fitted StandardScaler (scaler.pkl).
Original 15 features
| # | Feature | Type | Description |
|---|---|---|---|
| 1 | `char_count` | int | Total character count |
| 2 | `word_count` | int | Total word count |
| 3 | `avg_word_length` | float | Mean word length |
| 4 | `uppercase_ratio` | float | Uppercase letters / all letters |
| 5 | `digit_ratio` | float | Digits / total characters |
| 6 | `special_char_ratio` | float | Non-alphanumeric, non-space / total characters |
| 7 | `exclamation_count` | int | Count of `!` |
| 8 | `question_mark_count` | int | Count of `?` |
| 9 | `has_url` | binary | Contains URL pattern |
| 10 | `url_count` | int | Number of URLs detected |
| 11 | `has_shortened_url` | binary | Contains bit.ly, t.co, etc. |
| 12 | `has_phone_number` | binary | Contains phone number (>=7 digits) |
| 13 | `has_email` | binary | Contains email address |
| 14 | `has_currency` | binary | Contains currency symbol or code |
| 15 | `urgency_score` | int | Count of urgency keywords matched |
Evasion detection features (v0.2+)
| # | Feature | Type | Description |
|---|---|---|---|
| 16 | `unicode_ratio` | float | Non-ASCII characters / total characters |
| 17 | `char_entropy` | float | Shannon entropy over character distribution |
| 18 | `suspicious_spacing` | int | Count of spaced-out word patterns (e.g. "w o r d") |
| 19 | `leet_ratio` | float | Characters that map to leet translations / total |
| 20 | `max_digit_run` | int | Longest consecutive digit sequence |
| 21 | `repeated_char_ratio` | float | Consecutive repeated chars / (length - 1) |
| 22 | `vocab_richness` | float | Unique words / total words |
| 23 | `has_obfuscated_url` | binary | Detects evasive URL patterns |
Model evolution
| Version | Architecture | Spam F1 | AUC | Key change |
|---|---|---|---|---|
| v0.1 | DeBERTa-base + CLS + 15 features | 0.9299 | 0.9906 | Initial release |
| v0.2 | + attention pool + 8 features + focal loss | 0.8456 | 0.9867 | Architecture overhaul |
| v0.2.1 | + focal gamma=1.0 + 5:1 undersample | 0.9022 | 0.9857 | Loss/data tuning |
| v0.2.2 | + dataset cleanup + diverse benign | 0.9096 | 0.9883 | Data quality |
| v0.3 | Single head + CharCNN hybrid | 0.9666 | 0.9969 | Hybrid routing system |
Files
| File | Description |
|---|---|
| `hybrid_router_inference.py` | Full hybrid inference pipeline (CharCNN + DeBERTa + routing + XAI) |
| `pytorch_model.pt` | DeBERTa model weights (DeBERTaSingleHead, epoch 9) |
| `tokenizer/` | Saved DeBERTa-v3-base tokenizer |
| `scaler.pkl` | DeBERTa feature scaler (StandardScaler, 23 features) |
| `threshold.json` | Optimised DeBERTa classification threshold (0.56) |
| `charcnn/charcnn_best.pt` | CharCNN model weights (147K params) |
| `charcnn/charcnn_config.json` | CharCNN architecture config + threshold (0.57) |
| `charcnn/charcnn_scaler.pkl` | CharCNN feature scaler |
| `training_history.csv` | Per-epoch DeBERTa training metrics |
Limitations
- English-only. Non-English messages may be misclassified.
- Optimised for SMS/MMS. Messages are truncated to 128 tokens for DeBERTa, 160 characters for CharCNN.
- Promotional boundary. Legitimate marketing with aggressive urgency language remains the hardest category to classify correctly.
- Evasion arms race. Novel obfuscation techniques not represented in training data will reduce performance over time.
- No sender metadata. Classification is based on message text only: no phone number, carrier, or frequency signals.
- DeBERTa-only performance is lower. The 0.8934 F1 for DeBERTa alone reflects that many spam messages are short and better suited to CharCNN.
License
CC BY-NC 4.0: free for research and non-commercial use. Commercial use requires explicit permission.