---
tags:
- anomaly-detection
- intrusion-detection
- vehicle-security
- automotive
- CAN-bus
- cybersecurity
- sklearn
- xgboost
- tabular-classification
license: apache-2.0
library_name: sklearn
metrics:
- accuracy
- f1
- precision
- recall
---

# Vehicle Intrusion Detection System (IDS): Anomaly Detector

**Multi-Tiered Hybrid IDS for detecting hacking attempts in vehicle CAN bus telecom data.**

Based on the [MTH-IDS architecture](https://arxiv.org/abs/2105.13289) (340 citations; 99.99% accuracy reported in the paper).

## Architecture

### Tier 1: Signature-Based IDS (Multi-Class Classification)

- **Stacking Ensemble**: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
- Detects 4 known attack types: **DoS, Fuzzy, RPM Spoofing, Gear Spoofing**

### Tier 2: Anomaly-Based IDS (Zero-Day Detection)

- **Isolation Forest** trained on normal traffic only
- Detects unknown/novel attacks not seen during training

### Combined Detection

- If Tier 1 classifies a message as Normal but Tier 2 flags it as an anomaly → **UNKNOWN_ATTACK** alert
- Catches zero-day attacks that evade signature-based detection

## Performance

| Metric | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
|--------|---------------------|-------------------|
| Accuracy | 0.9586 | 0.6103 |
| F1 (weighted) | 0.9584 | 0.7035 |
| Precision | 0.9597 | 0.9243 |
| Recall | 0.9586 | 0.5678 |

### Base Learner Validation Accuracies

| Model | Accuracy |
|-------|----------|
| Decision Tree | 0.9664 |
| Random Forest | 0.9690 |
| Extra Trees | 0.9690 |
| XGBoost | 0.9689 |

## Attack Types Detected

| Attack | Description | Detection |
|--------|-------------|-----------|
| **DoS** | Flood CAN bus with dominant ID (0x0000) every 0.3ms | Signature (Tier 1) |
| **Fuzzy** | Random CAN ID and data injection every 0.5ms | Signature (Tier 1) |
| **RPM Spoofing** | Inject fake RPM gauge values every 1ms | Signature (Tier 1) |
| **Gear Spoofing** | Inject fake drive gear values every 1ms | Signature (Tier 1) |
| **Unknown/Zero-Day** | Any novel attack pattern | Anomaly (Tier 2) |

## Usage

```python
import pandas as pd
from inference import load_model, preprocess, predict

# Load model
model = load_model('vehicle_ids_model.pkl')

# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')

# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)

# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())
```

## Input Format

CAN bus message CSV with columns:

- `timestamp`: Recording time (seconds)
- `can_id`: CAN identifier in HEX (e.g., "043F")
- `dlc`: Data Length Code (0-8)
- `d0`-`d7`: Data bytes in HEX (e.g., "FF")
- `flag`: R (normal) or T (injected/attack)

## Feature Engineering

10 features extracted from raw CAN messages:

- CAN ID (decimal), DLC
- 8 data bytes (decimal)
- Statistical features: mean, std, min, max, range, sum of data bytes
- Temporal: inter-arrival time (IAT)
- Frequency: CAN ID frequency in traffic
- Information-theoretic: data byte entropy

## Training Details

- Dataset: ~1,077,264 CAN bus messages (HCRL Car-Hacking format)
- Train/Val/Test split: 70/15/15
- SMOTE for class imbalance handling
- Feature selection via Mutual Information
- Z-score normalization

## References

1. Li et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for IoV", IEEE IoT Journal, 2021. [arXiv:2105.13289](https://arxiv.org/abs/2105.13289)
2. Seo et al., "GIDS: GAN based IDS for In-Vehicle Network", PST 2018. [arXiv:1907.07377](https://arxiv.org/abs/1907.07377)
3. Song et al., "In-vehicle network intrusion detection using deep CNN", Vehicular Communications, 2020.
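
## Example: Combined Alert Logic

The combined detection rule from the Architecture section can be sketched in a few lines. This is a minimal illustration, not the code shipped in `inference.py`. The function name `combine_alerts` and the `"Normal"` label string are hypothetical, and the choice to report `KNOWN_ATTACK` whenever Tier 1 predicts an attack class (regardless of the Tier 2 flag) is an assumption, since the card only specifies the Normal-plus-anomaly case.

```python
from typing import List

# Assumed Tier 1 label for benign traffic (hypothetical; check the real label encoder).
NORMAL_LABEL = "Normal"


def combine_alerts(tier1_labels: List[str], tier2_is_anomaly: List[bool]) -> List[str]:
    """Merge per-message Tier 1 labels and Tier 2 anomaly flags into alert strings."""
    if len(tier1_labels) != len(tier2_is_anomaly):
        raise ValueError("both tiers must score the same messages")

    alerts = []
    for label, is_anomaly in zip(tier1_labels, tier2_is_anomaly):
        if label != NORMAL_LABEL:
            # Tier 1 matched a known signature (assumed to take precedence).
            alerts.append("KNOWN_ATTACK")
        elif is_anomaly:
            # Tier 1 says Normal, but Tier 2 flags the message as an anomaly.
            alerts.append("UNKNOWN_ATTACK")
        else:
            alerts.append("NORMAL")
    return alerts


tier1 = ["Normal", "DoS", "Normal", "Fuzzy"]
tier2 = [False, True, True, False]
print(combine_alerts(tier1, tier2))
# ['NORMAL', 'KNOWN_ATTACK', 'UNKNOWN_ATTACK', 'KNOWN_ATTACK']
```

Giving Tier 1 precedence keeps the specific attack class available for known attacks, while Tier 2 only changes the outcome for traffic that Tier 1 would otherwise pass as normal.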