Koda-NIDS: XGBoost Intrusion Detection System
Koda-NIDS is a gradient-boosted decision tree (XGBoost) classifier that flags malicious network activity. Trained on the UNSW-NB15 dataset, it classifies each network flow as normal or attack using flow-level behavioral features.
Model Details
| Field | Value |
|---|---|
| Model Name | Koda-NIDS |
| Model Type | XGBoost Classifier |
| Task | Binary Network Intrusion Detection (NIDS) |
| Input Features | 42 network flow features (duration, protocols, TTL, etc.) |
| Target | `label` (0: Normal, 1: Attack) |
Performance Summary
Evaluated on UNSW_NB15_testing-set.csv (175,341 samples), Koda-NIDS reaches 89.54% accuracy. Precision on the attack class is high (0.99), while attack recall is lower (0.86), so a share of attack flows is classified as normal.
Overall
| Metric | Value |
|---|---|
| Test Accuracy | 89.54% |
| Macro Avg Precision | 0.88 |
| Macro Avg Recall | 0.92 |
| Macro Avg F1-Score | 0.89 |
| Weighted Avg F1-Score | 0.90 |
Per-class breakdown
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 Normal | 0.76 | 0.98 | 0.86 | 56,000 |
| 1 Attack | 0.99 | 0.86 | 0.92 | 119,341 |
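As a consistency check, the macro and weighted averages reported in the Overall table can be recomputed from the per-class rows. The following is a minimal sketch that uses the rounded per-class values, so the results should agree with the reported figures only to about two decimal places. The function names are illustrative and not part of the Koda-NIDS code.

```python
# Per-class figures copied from the per-class table:
# (precision, recall, f1, support).
per_class = {
    "Normal": (0.76, 0.98, 0.86, 56_000),
    "Attack": (0.99, 0.86, 0.92, 119_341),
}


def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Support-weighted mean: classes with more samples count more."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


precisions, recalls, f1s, supports = zip(*per_class.values())

macro_precision = macro_average(precisions)
macro_recall = macro_average(recalls)
macro_f1 = macro_average(f1s)
weighted_f1 = weighted_average(f1s, supports)
total_support = sum(supports)

print(f"samples            = {total_support}")
print(f"macro precision    = {macro_precision:.3f}")
print(f"macro recall       = {macro_recall:.3f}")
print(f"macro F1           = {macro_f1:.3f}")
print(f"weighted F1        = {weighted_f1:.3f}")
```

The weighted F1 sits above the macro F1 because the attack class, which has the higher F1, makes up roughly two thirds of the test samples.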
Top 5 Predictive Features
| Rank | Feature | Importance | Description |
|---|---|---|---|
| 1 | `sttl` | 34.2% | Source Time-to-Live: high importance in identifying packet origin anomalies |
| 2 | `ct_dst_sport_ltm` | 10.7% | Concentration of connections to the same destination and source port |
| 3 | `tcprtt` | 9.1% | TCP round-trip time metrics |
| 4 | `ct_dst_src_ltm` | 5.5% | Density of connections between specific host pairs |
| 5 | `synack` | 5.3% | Timing of the TCP three-way handshake |
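A ranking like the one above can be produced by normalizing raw importance scores to percentage shares and keeping the largest few. The sketch below assumes the scores are available as a plain dict. This model card does not state which importance type (gain, weight, or cover) was used, so the `get_score` call in the comment is only one possible source, and the example scores are made up for illustration.

```python
def top_features(scores, k=5):
    """Return the k highest-scoring features as (name, share_percent) pairs.

    `scores` maps feature name -> raw importance score. Shares are normalized
    so that all features together sum to 100%.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * score / total) for name, score in ranked[:k]]


# Hypothetical raw scores, for illustration only (not the model's real values).
# With a loaded model, one possible source (an assumption, see above) is:
#   scores = model.get_booster().get_score(importance_type="gain")
example_scores = {"sttl": 6.0, "tcprtt": 2.0, "synack": 1.5, "dur": 0.5}

ranking = top_features(example_scores, k=3)
for rank, (name, share) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {share:.1f}%")
```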
Usage & Deployment
To deploy Koda-NIDS, ensure you have both the model weights (xgb_intrusion_model.json) and the associated label encoders (label_encoders.pkl).
Python Implementation
```python
import pickle

import pandas as pd
import xgboost as xgb

# 1. Load the Koda-NIDS model
model = xgb.XGBClassifier()
model.load_model("xgb_intrusion_model.json")

# 2. Load the encoders for proto, service, and state
with open("label_encoders.pkl", "rb") as f:
    encoders = pickle.load(f)

# 3. Predict
# Ensure your input data is preprocessed using the exact same logic
# found in the training pipeline.
```
Preprocessing Requirements
The model expects raw network features to be cleaned as follows:
- String Normalization: Categorical features (`proto`, `service`, `state`) must be lowercased and stripped of whitespace.
- Label Encoding: Apply the mappings from `label_encoders.pkl`.
- Imputation: Missing numerical values should be filled with `0`. Unseen categorical labels should be mapped to the missing category defined during training.
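The three steps above can be sketched for a single flow record as follows. This is a minimal illustration under stated assumptions, not the training pipeline itself: it assumes `label_encoders.pkl` holds a dict keyed by column name whose values expose a `classes_` list (as scikit-learn's `LabelEncoder` does), and it uses `"missing"` as a hypothetical name for the missing category. The example encoders below are stand-ins built with `SimpleNamespace`, not the real mappings.

```python
import math
from types import SimpleNamespace

CATEGORICAL = ("proto", "service", "state")
MISSING_TOKEN = "missing"  # hypothetical: use the token defined during training


def preprocess_record(record, encoders):
    """Return a cleaned copy of one flow record (dict: feature name -> value)."""
    cleaned = {}
    for name, value in record.items():
        if name in CATEGORICAL:
            # 1. String normalization: lowercase and strip whitespace.
            text = MISSING_TOKEN if value is None else str(value).strip().lower()
            # 2. Label encoding: the code is the position in the class list.
            classes = list(encoders[name].classes_)
            if text not in classes:
                text = MISSING_TOKEN  # unseen label -> missing category
            cleaned[name] = classes.index(text)
        else:
            # 3. Imputation: missing numeric values become 0.
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            cleaned[name] = 0 if is_missing else value
    return cleaned


# Stand-in encoders for illustration only; the real ones come from the pickle.
encoders = {
    "proto": SimpleNamespace(classes_=["missing", "tcp", "udp"]),
    "service": SimpleNamespace(classes_=["-", "dns", "http", "missing"]),
    "state": SimpleNamespace(classes_=["con", "fin", "int", "missing"]),
}

record = {"proto": " TCP ", "service": "quic", "state": "FIN", "dur": None, "sttl": 254}
cleaned = preprocess_record(record, encoders)
print(cleaned)
```

If the encoder for a column has no entry for the missing token, `classes.index` raises a `ValueError`, which makes a mismatch with the training pipeline visible instead of silently producing a wrong code.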
Limitations & Scope
- Extraction Dependency: Koda-NIDS requires pre-extracted flow features. It does not ingest raw `.pcap` files directly; a feature extraction layer must run first.
- Protocol Drift: Trained on 2015 network behaviors; effectiveness against newer attack and evasion techniques may vary.
Citation
If you use Koda-NIDS in your research, please cite the underlying dataset:
Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015.