---
tags:
- anomaly-detection
- intrusion-detection
- vehicle-security
- automotive
- CAN-bus
- cybersecurity
- sklearn
- xgboost
- tabular-classification
license: apache-2.0
library_name: sklearn
metrics:
- accuracy
- f1
- precision
- recall
---
# Vehicle Intrusion Detection System (IDS) — Anomaly Detector
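**Multi-tiered hybrid IDS for detecting intrusion attempts in vehicle CAN bus traffic.**
Based on the [MTH-IDS architecture](https://arxiv.org/abs/2105.13289) (340 citations). The MTH-IDS paper reports 99.99% accuracy on its own benchmark; the measured results for this model are listed under Performance.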
## Architecture
### Tier 1: Signature-Based IDS (Multi-Class Classification)
- **Stacking Ensemble**: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
- Detects 4 known attack types: **DoS, Fuzzy, RPM Spoofing, Gear Spoofing**
### Tier 2: Anomaly-Based IDS (Zero-Day Detection)
- **Isolation Forest** trained on normal traffic only
- Detects unknown/novel attacks not seen during training
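The idea of fitting on normal traffic only can be illustrated with scikit-learn's `IsolationForest`. This is a sketch on synthetic Gaussian data, not the shipped detector. The feature dimension, the `contamination` value, and the shifted "attack" cluster are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data: normal traffic around 0, a novel attack far away from it.
normal_train = rng.normal(0.0, 1.0, size=(1000, 4))
normal_test = rng.normal(0.0, 1.0, size=(200, 4))
novel_attack = rng.normal(6.0, 1.0, size=(200, 4))

# Fit on normal samples only; no attack labels are needed.
detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(normal_train)

# predict() returns +1 for inliers and -1 for outliers.
flagged_normal = detector.predict(normal_test) == -1
flagged_attack = detector.predict(novel_attack) == -1

# score_samples() is lower for more abnormal points.
print("flagged normal:", flagged_normal.mean())
print("flagged attack:", flagged_attack.mean())
```

Real attacks are rarely this well separated from normal traffic, which is one plausible reason for the much lower Tier 2 recall reported in the Performance table.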
### Combined Detection
- If Tier 1 classifies as Normal but Tier 2 flags as anomaly → **UNKNOWN_ATTACK** alert
- Catches zero-day attacks that evade signature-based detection
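The combination rule above can be written as a small function. The alert names come from the Usage section; the `"Normal"` class label and the function name `combine_alert` are hypothetical and may differ from what `inference.py` uses.

```python
def combine_alert(tier1_label: str, tier2_is_anomaly: bool) -> str:
    """Merge the Tier 1 class label and the Tier 2 anomaly flag into one alert."""
    if tier1_label != "Normal":
        # Tier 1 recognized a known attack signature.
        return "KNOWN_ATTACK"
    if tier2_is_anomaly:
        # Tier 1 saw nothing known, but the traffic looks abnormal.
        return "UNKNOWN_ATTACK"
    return "NORMAL"


for label, anomaly in [("DoS", False), ("Normal", True), ("Normal", False)]:
    print(label, anomaly, "->", combine_alert(label, anomaly))
```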
## Performance
| Metric | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
|--------|---------------------|-------------------|
| Accuracy | 0.9586 | 0.6103 |
| F1 (weighted) | 0.9584 | 0.7035 |
| Precision | 0.9597 | 0.9243 |
| Recall | 0.9586 | 0.5678 |
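As a quick consistency check on the Tier 2 column, F1 is the harmonic mean of precision and recall. The short calculation below reproduces the reported 0.7035 from the listed precision and recall, which suggests the Tier 2 figure is a plain binary F1 rather than a weighted average.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tier2_f1 = f1_score_from(precision=0.9243, recall=0.5678)
print(round(tier2_f1, 4))
```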
### Base Learner Validation Accuracies
| Model | Accuracy |
|-------|----------|
| Decision Tree | 0.9664 |
| Random Forest | 0.9690 |
| Extra Trees | 0.9690 |
| XGBoost | 0.9689 |
## Attack Types Detected
| Attack | Description | Detection |
|--------|-------------|-----------|
| **DoS** | Flood the CAN bus with the dominant ID (0x0000) every 0.3 ms | Signature (Tier 1) |
| **Fuzzy** | Inject random CAN IDs and data every 0.5 ms | Signature (Tier 1) |
| **RPM Spoofing** | Inject fake RPM gauge values every 1 ms | Signature (Tier 1) |
| **Gear Spoofing** | Inject fake drive gear values every 1 ms | Signature (Tier 1) |
| **Unknown/Zero-Day** | Any novel attack pattern | Anomaly (Tier 2) |
## Usage
```python
import pandas as pd
from inference import load_model, preprocess, predict
# Load model
model = load_model('vehicle_ids_model.pkl')
# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')
# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)
# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())
```
## Input Format
CAN bus message CSV with columns:
- `timestamp`: Recording time (seconds)
- `can_id`: CAN identifier in HEX (e.g., "043F")
- `dlc`: Data Length Code (0-8)
- `d0`-`d7`: Data bytes in HEX (e.g., "FF")
- `flag`: R (normal) or T (injected/attack)
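A sketch of how one such row could be decoded with the standard library. It assumes the CSV has a header row using the column names above; the two sample rows are made up for illustration, and `parse_row` is a hypothetical helper, not part of `inference.py`.

```python
import csv
import io

# Illustrative rows only, not taken from the training data.
SAMPLE = """timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag
1478198376.389427,0316,8,05,21,68,09,21,21,00,6f,R
1478198376.389636,0000,8,00,00,00,00,00,00,00,00,T
"""


def parse_row(row: dict) -> dict:
    """Convert hex strings to integers and the R/T flag to a boolean."""
    return {
        "timestamp": float(row["timestamp"]),
        "can_id": int(row["can_id"], 16),
        "dlc": int(row["dlc"]),
        # Assumption: an empty data byte (DLC < 8) is treated as 0.
        "data": [int(row[f"d{i}"] or "0", 16) for i in range(8)],
        "is_attack": row["flag"].strip().upper() == "T",
    }


messages = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for m in messages:
    print(m["can_id"], m["data"], m["is_attack"])
```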
## Feature Engineering
Features extracted from raw CAN messages (19 in total, as itemized below):
- CAN ID (decimal), DLC
- 8 data bytes (decimal)
- Statistical features: mean, std, min, max, range, sum of data bytes
- Temporal: inter-arrival time (IAT)
- Frequency: CAN ID frequency in traffic
- Information-theoretic: data byte entropy
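The derived features can be sketched in plain Python as follows. The exact definitions used by this model are not stated here, so the choices below are assumptions: population standard deviation, Shannon entropy in bits over the 8 byte values of one message, inter-arrival time measured against the previous message on the bus (not per CAN ID), and frequency as the share of messages carrying the same ID.

```python
import math
import statistics
from collections import Counter


def byte_features(data: list[int]) -> dict:
    """Statistical and entropy features over the data bytes of one message."""
    n = len(data)
    counts = Counter(data)
    # Shannon entropy in bits: sum of p * log2(1 / p) over distinct byte values.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {
        "mean": statistics.fmean(data),
        "std": statistics.pstdev(data),
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "sum": sum(data),
        "entropy": entropy,
    }


def traffic_features(timestamps: list[float], can_ids: list[int]):
    """Inter-arrival times and relative CAN ID frequencies for a message sequence."""
    iats = [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    counts = Counter(can_ids)
    freqs = [counts[i] / len(can_ids) for i in can_ids]
    return iats, freqs


print(byte_features([0, 1, 2, 3, 4, 5, 6, 7]))
print(traffic_features([0.0, 0.001, 0.003], [0x316, 0x000, 0x316]))
```

An all-zero payload, as in the DoS attack, has entropy 0, while a payload of 8 distinct bytes reaches the maximum of 3 bits, so the entropy feature separates flooding from random injection reasonably well.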
## Training Details
- Dataset: 1,077,264 CAN bus messages (HCRL Car-Hacking format)
- Train/Val/Test split: 70/15/15
- SMOTE for class imbalance handling
- Feature selection via Mutual Information
- Z-score normalization
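A sketch of the split, normalization, and feature-selection steps with scikit-learn, run on synthetic data. The ordering of the steps, `k=10`, and the random seeds are assumptions, not the settings used for this model. SMOTE is left out because it needs the separate `imbalanced-learn` package; if used, it should be applied to the training split only.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 19 candidate features, 5 classes.
X, y = make_classification(
    n_samples=1000, n_features=19, n_informative=8,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# 70/15/15 split: hold out 30%, then halve the held-out part.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=0
)

# Z-score normalization: statistics come from the training split only.
scaler = StandardScaler().fit(X_train)

# Keep the k features with the highest mutual information with the label.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func, k=10).fit(scaler.transform(X_train), y_train)


def transform(X_part: np.ndarray) -> np.ndarray:
    """Apply the fitted scaler and feature selector to any split."""
    return selector.transform(scaler.transform(X_part))


print(transform(X_train).shape, transform(X_val).shape, transform(X_test).shape)
```

Fitting the scaler and selector on the training split alone keeps validation and test statistics from leaking into the model.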
## References
1. Yang et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE Internet of Things Journal, 2021. [arXiv:2105.13289](https://arxiv.org/abs/2105.13289)
2. Seo et al., "GIDS: GAN based IDS for In-Vehicle Network", PST 2018. [arXiv:1907.07377](https://arxiv.org/abs/1907.07377)
3. Song et al., "In-vehicle network intrusion detection using deep convolutional neural network", Vehicular Communications, 2020.