// Conference Visualization — DSMOTE

Dynamic SMOTE: Interactive Visual Explainer

A Hybrid Oversampling Framework for NIDS Class Imbalance

SMOTE is Blind
// Standard SMOTE interpolates between any two minority samples — crossing cluster boundaries
⚠ Standard SMOTE — Noise Generation
✓ DSMOTE — Cluster-Aware Sampling
Cluster A — Minority
Cluster B — Minority
SMOTE — Noise points (wrong region)
DSMOTE — Safe synthetic points
// Key Message

SMOTE selects two minority samples at random — regardless of which cluster they belong to — and interpolates between them. This creates points in empty space between clusters, introducing noise and confusion for the classifier.
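The blind interpolation described above can be sketched in a few lines of NumPy. This is a simplified illustration of the mechanism (real SMOTE restricts the second point to one of the first point's k nearest neighbors, but with neighbors spanning clusters the same cross-cluster problem appears); `smote_point` and the toy data are illustrative, not from the paper.

```python
import numpy as np

def smote_point(minority, rng):
    """Vanilla SMOTE's core move (simplified): pick two minority samples
    and interpolate between them — blind to any cluster structure."""
    i, j = rng.choice(len(minority), size=2, replace=False)
    gap = rng.random()                       # uniform in [0, 1)
    return minority[i] + gap * (minority[j] - minority[i])

rng = np.random.default_rng(0)
# two tight, well-separated minority clusters around (0, 0) and (10, 10)
minority = np.vstack([rng.normal(0, 0.1, (20, 2)),
                      rng.normal(10, 0.1, (20, 2))])
x_new = smote_point(minority, rng)           # may land in the empty gap
```

When the two chosen parents sit in different clusters, `x_new` falls on the segment crossing the empty region between them — exactly the noise points shown in the figure.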

DSMOTE Algorithm Pipeline
// Click each step to explore the details
✂️
Majority Reduction
📉
PCA Reduction
🔵
KMeans Clustering
✨
Smart Sampling
🎯
Density Filter
⚖️
Class Weights

👆 Click a Step Above

Each step in DSMOTE solves a specific problem. Click any step icon to learn what it does and why it matters.

Pipeline Data Flow
// How raw data transforms through each stage
Step 3 — KMeans Clustering of Minority Class
// DSMOTE understands the internal structure of minority classes
Cluster 1
Cluster 2
Cluster 3
Cluster 4
Uncolored (before clustering)
// Key Message

Instead of treating all minority samples as one blob, DSMOTE uses KMeans to discover sub-groups. Synthetic samples are then generated within each cluster — not across them.
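A minimal sketch of this cluster-aware step, assuming a tiny Lloyd's-algorithm KMeans with farthest-point initialization (illustrative code, not the authors' implementation): parents for interpolation are drawn only from within a single cluster.

```python
import numpy as np

def within_cluster_synthetics(X_min, k=2, n_new=10, rng=None):
    """Cluster the minority class, then interpolate only between points
    that share a cluster — never across clusters."""
    if rng is None:
        rng = np.random.default_rng(0)

    # farthest-point initialisation, then a few Lloyd iterations
    centers = [X_min[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X_min - c, axis=1) for c in centers], axis=0)
        centers.append(X_min[d.argmax()])
    centers = np.array(centers)
    for _ in range(25):
        labels = np.linalg.norm(X_min[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([X_min[labels == c].mean(axis=0) for c in range(k)])

    # generate synthetics strictly inside one cluster at a time
    synth = []
    for _ in range(n_new):
        c = int(rng.integers(k))
        pts = X_min[labels == c]
        i, j = rng.choice(len(pts), size=2, replace=False)
        synth.append(pts[i] + rng.random() * (pts[j] - pts[i]))
    return np.array(synth), labels

rng = np.random.default_rng(0)
X_min = np.vstack([rng.normal(0, 0.1, (15, 2)),
                   rng.normal(10, 0.1, (15, 2))])
synth, labels = within_cluster_synthetics(X_min, k=2, n_new=20, rng=rng)
```

On the two-blob toy data, every synthetic point lands inside one of the blobs; none falls in the gap that vanilla SMOTE would populate.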

Step 5 — Density Constraint Filtering
// Only synthetic points within the mean intra-cluster distance are accepted
Original minority points
✓ Accepted synthetic points
✗ Rejected (outside d_mean)
d_mean radius boundary
// The Innovation

A synthetic sample x_new is accepted only if ‖x_new − xᵢ‖₂ ≤ d_mean. This density gate keeps new points inside the safe zone — preventing noise, overfitting, and cluster bleeding.

if ‖x_new − xᵢ‖₂ ≤ d_mean → ACCEPT ✓ else → REJECT ✗
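The acceptance rule can be sketched directly. One assumption here: d_mean is computed as the mean pairwise distance within the cluster — the paper's exact definition of the mean intra-cluster distance may differ.

```python
import numpy as np

def density_gate(x_i, x_new, cluster_pts):
    """Accept x_new only if ||x_new - x_i||_2 <= d_mean, where d_mean is
    taken here as the mean pairwise intra-cluster distance (assumption)."""
    pair = np.linalg.norm(cluster_pts[:, None] - cluster_pts[None], axis=2)
    n = len(cluster_pts)
    d_mean = pair.sum() / (n * (n - 1))      # mean over off-diagonal pairs
    return bool(np.linalg.norm(x_new - x_i) <= d_mean)

cluster = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
x_i = cluster[0]
accepted = density_gate(x_i, np.array([0.5, 0.5]), cluster)   # inside d_mean
rejected = density_gate(x_i, np.array([3.0, 3.0]), cluster)   # far outside
```

For this unit-square cluster, d_mean ≈ 1.14, so a point 0.71 away from xᵢ passes the gate while one 4.24 away is rejected.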
Before vs After DSMOTE — KDD Cup 99
// Class distribution transformation after applying DSMOTE

⚠ Before — Severe Imbalance

✓ After DSMOTE — Balanced

// Impact

Minority classes like pod (264 samples) and warezclient (1,020 samples) are boosted to ~256K–264K samples, achieving near-parity with the majority class after controlled reduction.

Macro-F1 Score Comparison — UNSW-NF Dataset
// DSMOTE vs conventional oversampling methods
ROS
SMOTE
Borderline-SMOTE
ADASYN
DSMOTE (Proposed)
DSMOTE BEST F1
0.588
RF Model
SMOTE BEST F1
0.096
RF / DT Model
IMPROVEMENT
6.1×
vs SMOTE RF
BAL. ACCURACY
0.573
DSMOTE RF
Macro-F1 by Model — All Methods (UNSW-NF)
// Real experimental results — DSMOTE is the ONLY method that meaningfully works on UNSW
RAW
ROS
SMOTE
BSMOTE
ADASYN
★ DSMOTE
Confusion Matrix Collapse — SMOTE vs DSMOTE (UNSW-NF, RF)
// SMOTE predicts EVERYTHING as "Benign" — DSMOTE correctly distributes predictions
⚠ SMOTE — Total Collapse (F1: 0.096)
✓ DSMOTE — Proper Distribution (F1: 0.588)
// The Collapse Problem

Under SMOTE, RF learns to predict every single sample as "Benign" (91.9% accuracy by doing nothing). DSMOTE forces the model to actually learn minority attack classes — Exploits, Fuzzers, Backdoor, Shellcode — that matter for security.
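A toy calculation (synthetic numbers, not the paper's data) makes the collapse concrete: with roughly 92% benign traffic, an all-"Benign" predictor scores high plain accuracy yet chance-level balanced accuracy, which averages per-class recalls.

```python
import numpy as np

# ~92% benign traffic; 0 = Benign, 1 and 2 = minority attack classes
y_true = np.array([0] * 92 + [1] * 5 + [2] * 3)
y_pred = np.zeros_like(y_true)            # collapsed model: always "Benign"

accuracy = float((y_pred == y_true).mean())                     # 0.92
recalls = [float((y_pred[y_true == c] == c).mean())
           for c in np.unique(y_true)]                          # [1.0, 0.0, 0.0]
balanced_accuracy = sum(recalls) / len(recalls)                 # ≈ 0.333
```

Plain accuracy rewards the do-nothing model; balanced accuracy exposes that two of three classes are never detected.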

Balanced Accuracy — DSMOTE vs All Methods (UNSW-NF)
// Balanced accuracy weights each class equally — exposes true minority class performance
Macro-F1 by Model — All Methods (KDD Cup 99)
// KDD is harder to fail on — DSMOTE matches top performance while improving minority class stability
RAW
ROS
SMOTE
BSMOTE
★ DSMOTE
DSMOTE RF F1
0.9954
Matches RAW best
DSMOTE ANN F1
0.9285
+0.013 vs SMOTE ANN
BAL. ACCURACY
0.9940
DSMOTE RF
G-MEAN
0.9939
DSMOTE RF
Multi-Metric Radar — Best Models Comparison (KDD)
// DSMOTE (RF) vs RAW (RF): Accuracy · Balanced Acc · F1 · G-Mean · Precision · Recall
RAW RF
DSMOTE RF
SMOTE RF
KDD — G-Mean Comparison Across Models
// G-Mean = geometric mean of per-class recalls — punishes models that ignore minority classes
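The G-Mean definition in the comment above can be written out directly (a minimal sketch; the helper name and toy labels are illustrative):

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls: a single class with zero
    recall drags the whole score to zero, unlike plain accuracy."""
    classes = np.unique(y_true)
    recalls = np.array([(y_pred[y_true == c] == c).mean() for c in classes])
    return float(recalls.prod() ** (1.0 / len(classes)))

y_true = np.array([0, 0, 0, 0, 1, 2])
collapse = g_mean(y_true, np.zeros_like(y_true))   # all-majority predictor
perfect = g_mean(y_true, y_true)
```

An all-majority predictor scores 0.0 (minority recalls are zero), while a perfect predictor scores 1.0 — which is why G-Mean punishes models that ignore minority classes.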