Vehicle Intrusion Detection System (IDS): Anomaly Detector

Multi-Tiered Hybrid IDS for detecting hacking attempts in vehicle CAN bus telecom data.

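Based on the MTH-IDS architecture. The original MTH-IDS paper (340 citations) reports 99.99% accuracy; the results measured for this model are listed under Performance.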

Architecture

Tier 1: Signature-Based IDS (Multi-Class Classification)

  • Stacking Ensemble: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
  • Detects 4 known attack types: DoS, Fuzzy, RPM Spoofing, Gear Spoofing
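
A minimal sketch of how a stacking ensemble of this shape could be assembled with scikit-learn's StackingClassifier. The synthetic data, the hyperparameters, and the optional XGBoost import are assumptions for illustration; they are not the configuration used to train this model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for CAN features: 5 classes (Normal + 4 attack types).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Tree-based base learners; hyperparameters here are illustrative only.
base_learners = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]
try:
    # XGBoost is added only if the package is installed.
    from xgboost import XGBClassifier
    base_learners.append(("xgb", XGBClassifier(n_estimators=50, random_state=0)))
except ImportError:
    pass

# The meta-learner is fit on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=3)
stack.fit(X_train, y_train)
tier1_pred = stack.predict(X_test)
print("held-out accuracy:", (tier1_pred == y_test).mean())
```

The internal cross-validation (cv=3 here) keeps the Logistic Regression meta-learner from being trained on predictions the base learners made for samples they had already seen.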

Tier 2: Anomaly-Based IDS (Zero-Day Detection)

  • Isolation Forest trained on normal traffic only
  • Detects unknown/novel attacks not seen during training
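
The idea can be sketched with scikit-learn's IsolationForest, fit on attack-free samples only. The Gaussian data below is an assumed stand-in for normalized CAN features, and the hyperparameters are illustrative rather than the ones used for this model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Assumed stand-in for normalized features of attack-free traffic.
normal_traffic = rng.normal(loc=0.0, scale=1.0, size=(2000, 4))

iso = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
iso.fit(normal_traffic)  # trained on normal traffic only

samples = np.array([
    [0.0, 0.0, 0.0, 0.0],  # typical point, close to the training data
    [9.0, 9.0, 9.0, 9.0],  # far from anything seen during training
])
is_anomaly = iso.predict(samples) == -1      # predict() returns -1 for anomalies
anomaly_score = -iso.score_samples(samples)  # negated so higher = more anomalous
print(is_anomaly, anomaly_score)
```

Because no attack samples are needed for fitting, the detector can flag traffic that differs from the normal profile even when the attack type was never labeled.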

Combined Detection

  • If Tier 1 classifies as Normal but Tier 2 flags as anomaly → UNKNOWN_ATTACK alert
  • Catches zero-day attacks that evade signature-based detection
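
The decision rule above can be sketched as a small function. The function name, the "Normal" label string, and the choice to report KNOWN_ATTACK whenever Tier 1 predicts an attack class (regardless of Tier 2) are assumptions; only the three alert values come from this model card.

```python
def combine_alert(tier1_label: str, tier2_is_anomaly: bool) -> str:
    """Merge the Tier 1 class label and the Tier 2 anomaly flag into one alert."""
    if tier1_label != "Normal":
        # Tier 1 matched a known attack signature.
        return "KNOWN_ATTACK"
    if tier2_is_anomaly:
        # Tier 1 saw nothing, but the traffic deviates from the normal profile.
        return "UNKNOWN_ATTACK"
    return "NORMAL"


print(combine_alert("DoS", False))     # KNOWN_ATTACK
print(combine_alert("Normal", True))   # UNKNOWN_ATTACK
print(combine_alert("Normal", False))  # NORMAL
```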

Performance

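| Metric        | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
|---------------|----------------------|------------------|
| Accuracy      | 0.9586               | 0.6103           |
| F1 (weighted) | 0.9584               | 0.7035           |
| Precision     | 0.9597               | 0.9243           |
| Recall        | 0.9586               | 0.5678           |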
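
As a quick consistency check, the Tier 2 F1 value equals the harmonic mean of the Tier 2 precision and recall reported above. The helper name below is hypothetical. The same shortcut does not reproduce the Tier 1 figure exactly, which is expected for a weighted average over several classes.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tier2_f1 = f1_from_precision_recall(0.9243, 0.5678)
print(round(tier2_f1, 4))  # 0.7035
```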

Base Learner Validation Accuracies

| Model         | Accuracy |
|---------------|----------|
| Decision Tree | 0.9664   |
| Random Forest | 0.9690   |
| Extra Trees   | 0.9690   |
| XGBoost       | 0.9689   |

Attack Types Detected

| Attack           | Description                                           | Detection          |
|------------------|-------------------------------------------------------|--------------------|
| DoS              | Flood CAN bus with dominant ID (0x0000) every 0.3 ms  | Signature (Tier 1) |
| Fuzzy            | Random CAN ID and data injection every 0.5 ms         | Signature (Tier 1) |
| RPM Spoofing     | Inject fake RPM gauge values every 1 ms               | Signature (Tier 1) |
| Gear Spoofing    | Inject fake drive gear values every 1 ms              | Signature (Tier 1) |
| Unknown/Zero-Day | Any novel attack pattern                              | Anomaly (Tier 2)   |

Usage

import pickle
import pandas as pd
from inference import load_model, preprocess, predict

# Load model
model = load_model('vehicle_ids_model.pkl')

# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')

# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)

# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())

Input Format

CAN bus message CSV with columns:

  • timestamp: Recording time (seconds)
  • can_id: CAN identifier in HEX (e.g., "043F")
  • dlc: Data Length Code (0-8)
  • d0-d7: Data bytes in HEX (e.g., "FF")
  • flag: R (normal) or T (injected/attack)
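
A minimal sketch of reading rows in this format with the standard library and converting the HEX fields to integers. The two sample rows are made up for illustration, parse_row is a hypothetical helper, and treating missing data bytes (when dlc is below 8) as 0 is an assumption; the preprocess function shipped with the model may handle these cases differently.

```python
import csv
import io

# Made-up sample rows in the documented column layout.
sample_csv = """timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag
1478198376.389427,0316,8,05,21,68,09,21,21,00,6F,R
1478198376.389636,0000,8,00,00,00,00,00,00,00,00,T
"""


def parse_row(row):
    """Convert one CSV row (dict of strings) into typed values."""
    return {
        "timestamp": float(row["timestamp"]),
        "can_id": int(row["can_id"], 16),  # HEX string -> decimal
        "dlc": int(row["dlc"]),
        # Assumption: empty or missing data bytes are treated as 0.
        "data": [int(row[f"d{i}"] or "0", 16) for i in range(8)],
        "is_attack": row["flag"].strip().upper() == "T",
    }


messages = [parse_row(r) for r in csv.DictReader(io.StringIO(sample_csv))]
print(messages[0]["can_id"], messages[0]["data"])
```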

Feature Engineering

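Features extracted from raw CAN messages (the list below enumerates 19 candidates; the stated count of 10 may refer to the subset kept after feature selection):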

  • CAN ID (decimal), DLC
  • 8 data bytes (decimal)
  • Statistical features: mean, std, min, max, range, sum of data bytes
  • Temporal: inter-arrival time (IAT)
  • Frequency: CAN ID frequency in traffic
  • Information-theoretic: data byte entropy

Training Details

  • Dataset: about 1.08 million CAN bus messages (1,077,264; HCRL Car-Hacking format)
  • Train/Val/Test split: 70/15/15
  • SMOTE for class imbalance handling
  • Feature selection via Mutual Information
  • Z-score normalization
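
Two of the steps above, the 70/15/15 split and Z-score normalization, can be sketched with the standard library. The helper names are hypothetical, and fitting the mean and standard deviation on the training portion only is an assumption about good practice, not a documented detail of this model's pipeline. SMOTE and mutual-information feature selection are not shown.

```python
import random
import statistics


def split_70_15_15(rows, seed=0):
    """Shuffle and split rows into train/validation/test portions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 70 // 100
    n_val = len(rows) * 15 // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_zscore(train):
    """Per-column mean and std, computed from the training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def apply_zscore(rows, means, stds):
    """Normalize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy numeric data standing in for extracted feature rows.
data = [[float(i), float(i % 7)] for i in range(100)]
train, val, test = split_70_15_15(data)
means, stds = fit_zscore(train)
train_z = apply_zscore(train, means, stds)
val_z = apply_zscore(val, means, stds)  # reuse training statistics
print(len(train), len(val), len(test))
```

For an imbalanced classification dataset, a stratified split that preserves class proportions in each portion would usually be preferable to the plain shuffle shown here.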

References

  1. Yang et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE IoT Journal, 2021. arXiv:2105.13289
  2. Seo et al., "GIDS: GAN based IDS for In-Vehicle Network", PST 2018. arXiv:1907.07377
  3. Song et al., "In-vehicle network intrusion detection using deep CNN", Vehicular Communications, 2020.