// Conference Visualization — DSMOTE

Dynamic SMOTE: Interactive Visual Explainer

A Hybrid Oversampling Framework for NIDS Class Imbalance

SMOTE is Blind
// Standard SMOTE interpolates between any two minority samples — crossing cluster boundaries
⚠ Standard SMOTE — Noise Generation
✓ DSMOTE — Cluster-Aware Sampling
Cluster A — Minority
Cluster B — Minority
SMOTE — Noise points (wrong region)
DSMOTE — Safe synthetic points
// Key Message

SMOTE selects two minority samples at random — regardless of which cluster they belong to — and interpolates between them. This creates points in empty space between clusters, introducing noise and confusion for the classifier.
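The blind interpolation described above can be sketched in a few lines of NumPy. This is a simplified illustration of the mechanism (real SMOTE restricts the second point to one of the first point's k nearest neighbors, but with neighbors spanning clusters the same cross-cluster problem appears); `smote_point` and the toy data are illustrative, not from the paper.

```python
import numpy as np

def smote_point(minority, rng):
    """Vanilla SMOTE's core move (simplified): pick two minority samples
    and interpolate between them — blind to any cluster structure."""
    i, j = rng.choice(len(minority), size=2, replace=False)
    gap = rng.random()                       # uniform in [0, 1)
    return minority[i] + gap * (minority[j] - minority[i])

rng = np.random.default_rng(0)
# two tight, well-separated minority clusters around (0, 0) and (10, 10)
minority = np.vstack([rng.normal(0, 0.1, (20, 2)),
                      rng.normal(10, 0.1, (20, 2))])
x_new = smote_point(minority, rng)           # may land in the empty gap
```

When the two chosen parents sit in different clusters, `x_new` falls on the segment crossing the empty region between them — exactly the noise points shown in the figure.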

DSMOTE Algorithm Pipeline
// Click each step to explore the details
✂️
Majority Reduction
📉
PCA Reduction
🔵
KMeans Clustering
✨
Smart Sampling
🎯
Density Filter
⚖️
Class Weights

👆 Click a Step Above

Each step in DSMOTE solves a specific problem. Click any step icon to learn what it does and why it matters.

Pipeline Data Flow
// How raw data transforms through each stage
Step 3 — KMeans Clustering of Minority Class
// DSMOTE understands the internal structure of minority classes
Cluster 1
Cluster 2
Cluster 3
Cluster 4
Uncolored (before clustering)
// Key Message

Instead of treating all minority samples as one blob, DSMOTE uses KMeans to discover sub-groups. Synthetic samples are then generated within each cluster — not across them.
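A minimal sketch of this cluster-aware step, assuming a tiny Lloyd's-algorithm KMeans with farthest-point initialization (illustrative code, not the authors' implementation): parents for interpolation are drawn only from within a single cluster.

```python
import numpy as np

def within_cluster_synthetics(X_min, k=2, n_new=10, rng=None):
    """Cluster the minority class, then interpolate only between points
    that share a cluster — never across clusters."""
    if rng is None:
        rng = np.random.default_rng(0)

    # farthest-point initialisation, then a few Lloyd iterations
    centers = [X_min[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X_min - c, axis=1) for c in centers], axis=0)
        centers.append(X_min[d.argmax()])
    centers = np.array(centers)
    for _ in range(25):
        labels = np.linalg.norm(X_min[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([X_min[labels == c].mean(axis=0) for c in range(k)])

    # generate synthetics strictly inside one cluster at a time
    synth = []
    for _ in range(n_new):
        c = int(rng.integers(k))
        pts = X_min[labels == c]
        i, j = rng.choice(len(pts), size=2, replace=False)
        synth.append(pts[i] + rng.random() * (pts[j] - pts[i]))
    return np.array(synth), labels

rng = np.random.default_rng(0)
X_min = np.vstack([rng.normal(0, 0.1, (15, 2)),
                   rng.normal(10, 0.1, (15, 2))])
synth, labels = within_cluster_synthetics(X_min, k=2, n_new=20, rng=rng)
```

On the two-blob toy data, every synthetic point lands inside one of the blobs; none falls in the gap that vanilla SMOTE would populate.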

Step 5 — Density Constraint Filtering
// Only synthetic points within the mean intra-cluster distance are accepted
Original minority points
✓ Accepted synthetic points
✗ Rejected (outside d_mean)
d_mean radius boundary
// The Innovation

A synthetic sample x_new is accepted only if ‖x_new − xᵢ‖₂ ≤ d_mean. This density gate keeps new points inside the safe zone — preventing noise, overfitting, and cluster bleeding.

if ‖x_new − xᵢ‖₂ ≤ d_mean → ACCEPT ✓ else → REJECT ✗
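The acceptance rule can be sketched directly. One assumption here: d_mean is computed as the mean pairwise distance within the cluster — the paper's exact definition of the mean intra-cluster distance may differ.

```python
import numpy as np

def density_gate(x_i, x_new, cluster_pts):
    """Accept x_new only if ||x_new - x_i||_2 <= d_mean, where d_mean is
    taken here as the mean pairwise intra-cluster distance (assumption)."""
    pair = np.linalg.norm(cluster_pts[:, None] - cluster_pts[None], axis=2)
    n = len(cluster_pts)
    d_mean = pair.sum() / (n * (n - 1))      # mean over off-diagonal pairs
    return bool(np.linalg.norm(x_new - x_i) <= d_mean)

cluster = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
x_i = cluster[0]
accepted = density_gate(x_i, np.array([0.5, 0.5]), cluster)   # inside d_mean
rejected = density_gate(x_i, np.array([3.0, 3.0]), cluster)   # far outside
```

For this unit-square cluster, d_mean ≈ 1.14, so a point 0.71 away from xᵢ passes the gate while one 4.24 away is rejected.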
Before vs After DSMOTE — KDD Cup 99
// Class distribution transformation after applying DSMOTE

⚠ Before — Severe Imbalance

✓ After DSMOTE — Balanced

// Impact

Minority classes like pod (264 samples) and warezclient (1,020 samples) are boosted to ~256K–264K samples, achieving near-parity with the majority class after controlled reduction.

Macro-F1 Score Comparison — UNSW-NF Dataset
// DSMOTE vs conventional oversampling methods
ROS
SMOTE
Borderline-SMOTE
ADASYN
DSMOTE (Proposed)
DSMOTE BEST F1
0.588
RF Model
SMOTE BEST F1
0.096
RF / DT Model
IMPROVEMENT
6.1×
vs SMOTE RF
BAL. ACCURACY
0.573
DSMOTE RF
Macro-F1 by Model — All Methods (UNSW-NF)
// Real experimental results — DSMOTE is the ONLY method that meaningfully works on UNSW
RAW
ROS
SMOTE
BSMOTE
ADASYN
★ DSMOTE
Confusion Matrix Collapse — SMOTE vs DSMOTE (UNSW-NF, RF)
// SMOTE predicts EVERYTHING as "Benign" — DSMOTE correctly distributes predictions
⚠ SMOTE — Total Collapse (F1: 0.096)
✓ DSMOTE — Proper Distribution (F1: 0.588)
// The Collapse Problem

Under SMOTE, RF learns to predict every single sample as "Benign" (91.9% accuracy by doing nothing). DSMOTE forces the model to actually learn minority attack classes — Exploits, Fuzzers, Backdoor, Shellcode — that matter for security.
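A toy calculation (synthetic numbers, not the paper's data) makes the collapse concrete: with roughly 92% benign traffic, an all-"Benign" predictor scores high plain accuracy yet chance-level balanced accuracy, which averages per-class recalls.

```python
import numpy as np

# ~92% benign traffic; 0 = Benign, 1 and 2 = minority attack classes
y_true = np.array([0] * 92 + [1] * 5 + [2] * 3)
y_pred = np.zeros_like(y_true)            # collapsed model: always "Benign"

accuracy = float((y_pred == y_true).mean())                     # 0.92
recalls = [float((y_pred[y_true == c] == c).mean())
           for c in np.unique(y_true)]                          # [1.0, 0.0, 0.0]
balanced_accuracy = sum(recalls) / len(recalls)                 # ≈ 0.333
```

Plain accuracy rewards the do-nothing model; balanced accuracy exposes that two of three classes are never detected.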

Balanced Accuracy — DSMOTE vs All Methods (UNSW-NF)
// Balanced accuracy weights each class equally — exposes true minority class performance
Macro-F1 by Model — All Methods (KDD Cup 99)
// KDD is harder to fail on — DSMOTE matches top performance while improving minority class stability
RAW
ROS
SMOTE
BSMOTE
★ DSMOTE
DSMOTE RF F1
0.9954
Matches RAW best
DSMOTE ANN F1
0.9285
+0.013 vs SMOTE ANN
BAL. ACCURACY
0.9940
DSMOTE RF
G-MEAN
0.9939
DSMOTE RF
Multi-Metric Radar — Best Models Comparison (KDD)
// DSMOTE (RF) vs RAW (RF): Accuracy · Balanced Acc · F1 · G-Mean · Precision · Recall
RAW RF
DSMOTE RF
SMOTE RF
KDD — G-Mean Comparison Across Models
// G-Mean = geometric mean of per-class recalls — punishes models that ignore minority classes
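The G-Mean definition in the comment above can be written out directly (a minimal sketch; the helper name and toy labels are illustrative):

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls: a single class with zero
    recall drags the whole score to zero, unlike plain accuracy."""
    classes = np.unique(y_true)
    recalls = np.array([(y_pred[y_true == c] == c).mean() for c in classes])
    return float(recalls.prod() ** (1.0 / len(classes)))

y_true = np.array([0, 0, 0, 0, 1, 2])
collapse = g_mean(y_true, np.zeros_like(y_true))   # all-majority predictor
perfect = g_mean(y_true, y_true)
```

An all-majority predictor scores 0.0 (minority recalls are zero), while a perfect predictor scores 1.0 — which is why G-Mean punishes models that ignore minority classes.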