---
tags:
  - anomaly-detection
  - intrusion-detection
  - vehicle-security
  - automotive
  - CAN-bus
  - cybersecurity
  - sklearn
  - xgboost
  - tabular-classification
license: apache-2.0
library_name: sklearn
metrics:
  - accuracy
  - f1
  - precision
  - recall
---

# Vehicle Intrusion Detection System (IDS) — Anomaly Detector

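**Multi-tiered hybrid IDS for detecting intrusion attempts in vehicle CAN bus traffic.**

Based on the [MTH-IDS architecture](https://arxiv.org/abs/2105.13289) (340 citations), for which the original paper reports 99.99% accuracy. The measured results for this model are listed under Performance.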

## Architecture

### Tier 1: Signature-Based IDS (Multi-Class Classification)
- **Stacking Ensemble**: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
- Detects 4 known attack types: **DoS, Fuzzy, RPM Spoofing, Gear Spoofing**
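
The stacking layout above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the trained model: the synthetic data, the hyperparameters, and the use of `StackingClassifier` itself are assumptions, and the sketch falls back to `GradientBoostingClassifier` as a stand-in when XGBoost is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

try:
    from xgboost import XGBClassifier
    booster = XGBClassifier(n_estimators=50, max_depth=4)
except ImportError:
    # Stand-in only; the model card lists XGBoost as the boosted learner.
    booster = GradientBoostingClassifier(n_estimators=50, max_depth=3)

# Synthetic stand-in for CAN features: 5 classes (Normal + 4 attack types).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("xgb", booster),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    # The meta-learner is trained on cross-validated base-learner outputs.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X, y)
pred = stack.predict(X[:5])
```

Using cross-validated predictions (`cv=3` here) keeps the meta-learner from being trained on outputs the base learners produced for samples they had already seen.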

### Tier 2: Anomaly-Based IDS (Zero-Day Detection)
- **Isolation Forest** trained on normal traffic only
- Detects unknown/novel attacks not seen during training
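
As a rough sketch of the idea, the snippet below fits scikit-learn's `IsolationForest` on normal samples only and then scores samples from an unseen distribution. The synthetic features, `n_estimators`, and `contamination` values are assumptions chosen for illustration, not the settings of the released model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 4))  # hypothetical normal-traffic features
novel = rng.normal(8.0, 1.0, size=(20, 4))     # hypothetical unseen attack, far from normal

iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
iso.fit(normal)  # fit on normal traffic only; no attack labels are needed

# predict() returns -1 for outliers and +1 for inliers.
is_anomaly = iso.predict(novel) == -1
# score_samples() is lower for more abnormal points, so negate it.
anomaly_score = -iso.score_samples(novel)
normal_score = -iso.score_samples(normal)
```

Because the forest never sees attack samples, it can flag traffic that differs from normal without knowing what kind of attack it is.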

### Combined Detection
- If Tier 1 classifies a message as Normal but Tier 2 flags it as an anomaly, an **UNKNOWN_ATTACK** alert is raised
- This catches zero-day attacks that evade signature-based detection
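
The combination rule can be written as a small function. This is a sketch under assumptions: the function name, the `"Normal"` label string, and the choice to report `KNOWN_ATTACK` whenever Tier 1 predicts an attack class (regardless of Tier 2) are illustrative and may differ from the shipped `inference` code.

```python
import numpy as np

def combine_alerts(tier1_labels, tier2_is_anomaly, normal_label="Normal"):
    """Merge Tier 1 class labels and Tier 2 anomaly flags into one alert per message."""
    tier1_labels = np.asarray(tier1_labels)
    tier2_is_anomaly = np.asarray(tier2_is_anomaly, dtype=bool)
    known = tier1_labels != normal_label
    # Tier 1 attack -> KNOWN_ATTACK; Normal + anomaly -> UNKNOWN_ATTACK; else NORMAL.
    return np.where(known, "KNOWN_ATTACK",
                    np.where(tier2_is_anomaly, "UNKNOWN_ATTACK", "NORMAL"))

alerts = combine_alerts(["Normal", "DoS", "Normal", "Fuzzy"],
                        [False, True, True, False])
# alerts -> ['NORMAL', 'KNOWN_ATTACK', 'UNKNOWN_ATTACK', 'KNOWN_ATTACK']
```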

## Performance

| Metric | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
|--------|---------------------|-------------------|
| Accuracy | 0.9586 | 0.6103 |
| F1 (weighted) | 0.9584 | 0.7035 |
| Precision | 0.9597 | 0.9243 |
| Recall | 0.9586 | 0.5678 |

The Tier 2 F1 equals the harmonic mean of the Tier 2 precision and recall shown above (2 × 0.9243 × 0.5678 / (0.9243 + 0.5678) ≈ 0.7035), so it appears to be the binary F1 for the anomaly class rather than a weighted average.

### Base Learner Validation Accuracies
| Model | Accuracy |
|-------|----------|
| Decision Tree | 0.9664 |
| Random Forest | 0.9690 |
| Extra Trees | 0.9690 |
| XGBoost | 0.9689 |

## Attack Types Detected

| Attack | Description | Detection |
|--------|-------------|-----------|
| **DoS** | Flood CAN bus with dominant ID (0x0000) every 0.3 ms | Signature (Tier 1) |
| **Fuzzy** | Random CAN ID and data injection every 0.5 ms | Signature (Tier 1) |
| **RPM Spoofing** | Inject fake RPM gauge values every 1 ms | Signature (Tier 1) |
| **Gear Spoofing** | Inject fake drive gear values every 1 ms | Signature (Tier 1) |
| **Unknown/Zero-Day** | Any novel attack pattern | Anomaly (Tier 2) |

## Usage

```python
import pandas as pd
from inference import load_model, preprocess, predict

# Load model
model = load_model('vehicle_ids_model.pkl')

# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')

# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)

# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())
```

## Input Format

CAN bus message CSV with columns:
- `timestamp`: Recording time (seconds)
- `can_id`: CAN identifier in HEX (e.g., "043F")
- `dlc`: Data Length Code (0-8)
- `d0`-`d7`: Data bytes in HEX (e.g., "FF")
- `flag`: R (normal) or T (injected/attack)
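
A minimal parsing sketch for this layout is shown below. It assumes the CSV has a header row with exactly these column names and that all eight data bytes are present on every row; the two sample rows and the `is_attack` column name are made up for illustration. The hex columns are read as strings first, because a value such as `21` would otherwise be parsed as decimal 21 instead of 0x21.

```python
import io
import pandas as pd

csv_text = """timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag
1478198376.389427,0316,8,05,21,68,09,21,21,00,6F,R
1478198376.389636,0000,8,00,00,00,00,00,00,00,00,T
"""

byte_cols = [f"d{i}" for i in range(8)]
hex_cols = ["can_id"] + byte_cols

# Keep hex fields as strings so pandas does not reinterpret them as decimals.
df = pd.read_csv(io.StringIO(csv_text), dtype={c: str for c in hex_cols})
for col in hex_cols:
    df[col] = df[col].apply(lambda h: int(h, 16))

# Binary label derived from the flag column: T = injected, R = normal.
df["is_attack"] = (df["flag"] == "T").astype(int)
```

After conversion, `can_id` "0316" becomes 790 and data byte "6F" becomes 111.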

## Feature Engineering

Features extracted from raw CAN messages (19 in total as listed):
- CAN ID (decimal), DLC
- 8 data bytes (decimal)
- Statistical features: mean, std, min, max, range, sum of data bytes
- Temporal: inter-arrival time (IAT)
- Frequency: CAN ID frequency in traffic
- Information-theoretic: data byte entropy
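
The derived features listed above could be computed roughly as follows. This is a sketch, not the repository's `preprocess` function: the column names, the population standard deviation, the per-CAN-ID definition of inter-arrival time, and the per-message Shannon entropy over the eight bytes are all assumptions.

```python
import numpy as np
import pandas as pd

BYTE_COLS = [f"d{i}" for i in range(8)]

def byte_entropy(values):
    """Shannon entropy (bits) of the byte values within one message."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float((p * np.log2(1.0 / p)).sum())

def add_features(df):
    out = df.copy()
    b = out[BYTE_COLS].to_numpy(dtype=float)
    out["mean"] = b.mean(axis=1)
    out["std"] = b.std(axis=1)  # population std (ddof=0)
    out["min"] = b.min(axis=1)
    out["max"] = b.max(axis=1)
    out["range"] = out["max"] - out["min"]
    out["sum"] = b.sum(axis=1)
    # Time since the previous message with the same CAN ID (0 for the first one).
    out["iat"] = out.groupby("can_id")["timestamp"].diff().fillna(0.0)
    # Share of all messages in the capture that carry this CAN ID.
    out["id_freq"] = out["can_id"].map(out["can_id"].value_counts(normalize=True))
    out["entropy"] = [byte_entropy(row) for row in b]
    return out

demo = pd.DataFrame(
    [[0.000, 790, 8, 0, 0, 0, 0, 255, 255, 255, 255],
     [0.001, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0],
     [0.003, 790, 8, 0, 1, 2, 3, 4, 5, 6, 7]],
    columns=["timestamp", "can_id", "dlc"] + BYTE_COLS,
)
features = add_features(demo)
```

In the demo frame, the all-zero DoS-style payload has entropy 0 bits, while the payload with eight distinct bytes reaches the maximum of 3 bits.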

## Training Details

- Dataset: 1,077,264 CAN bus messages (HCRL Car-Hacking format)
- Train/Val/Test split: 70/15/15
- SMOTE for class imbalance handling
- Feature selection via Mutual Information
- Z-score normalization
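
The split, feature selection, and normalization steps can be sketched with scikit-learn as below. The order of the steps, the synthetic data, and `k=10` are assumptions made for illustration. SMOTE is left out because it lives in the separate imbalanced-learn package; if added, it would normally be applied to the training split only.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered CAN features.
X, y = make_classification(n_samples=1000, n_features=19, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# 70/15/15: hold out 30%, then split the holdout evenly into val and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

# Rank features by mutual information with the label; keep the top k (hypothetical k).
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func, k=10).fit(X_train, y_train)

# Z-score statistics come from the training split only, to avoid leakage.
scaler = StandardScaler().fit(selector.transform(X_train))

def prepare(features):
    """Apply the fitted selection and scaling to any split."""
    return scaler.transform(selector.transform(features))

X_train_ready = prepare(X_train)
X_val_ready = prepare(X_val)
```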

## References

1. Yang et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE Internet of Things Journal, 2021. [arXiv:2105.13289](https://arxiv.org/abs/2105.13289)
2. Seo et al., "GIDS: GAN based Intrusion Detection System for In-Vehicle Network", PST 2018. [arXiv:1907.07377](https://arxiv.org/abs/1907.07377)
3. Song et al., "In-vehicle network intrusion detection using deep convolutional neural network", Vehicular Communications, 2020.