tags:
  - anomaly-detection
  - intrusion-detection
  - vehicle-security
  - automotive
  - CAN-bus
  - cybersecurity
  - sklearn
  - xgboost
  - tabular-classification
license: apache-2.0
library_name: sklearn
metrics:
  - accuracy
  - f1
  - precision
  - recall

Vehicle Intrusion Detection System (IDS) — Anomaly Detector

Multi-tiered hybrid IDS for detecting hacking attempts in vehicle CAN bus traffic.

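Based on the MTH-IDS architecture of Li et al. (2021), cited 340 times at the time of writing. The 99.99% accuracy is the figure reported in that paper; this model's own results are listed under Performance.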

Architecture

Tier 1: Signature-Based IDS (Multi-Class Classification)

Stacking Ensemble: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
Detects 4 known attack types: DoS, Fuzzy, RPM Spoofing, Gear Spoofing

Tier 2: Anomaly-Based IDS (Zero-Day Detection)

Isolation Forest trained on normal traffic only
Detects unknown/novel attacks not seen during training
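A minimal sketch of the Tier 2 idea, assuming scikit-learn's `IsolationForest` and synthetic feature vectors: the detector is fitted on normal rows only, so anything far from that distribution is scored as an outlier even though no attack examples were ever seen. The Gaussian data and the `n_estimators`/`contamination` settings are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(2000, 10))  # synthetic "normal traffic" features
X_novel = rng.normal(6.0, 1.0, size=(50, 10))     # synthetic, far-off "novel attack" features

tier2 = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
tier2.fit(X_normal)  # trained on normal traffic only

pred_normal = tier2.predict(X_normal[:200])  # +1 = inlier, -1 = outlier
pred_novel = tier2.predict(X_novel)
scores = tier2.decision_function(X_novel)    # negative values = more anomalous
print((pred_novel == -1).mean(), (pred_normal == 1).mean())
```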

Combined Detection

If Tier 1 classifies a message as Normal but Tier 2 flags it as an anomaly, an UNKNOWN_ATTACK alert is raised
Catches zero-day attacks that evade signature-based detection
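The fusion rule can be written as a small function. This is a hypothetical helper (`fuse_alerts` is not part of the repository), and it assumes that any non-Normal Tier 1 label yields KNOWN_ATTACK regardless of Tier 2, which matches the alert values listed under Usage.

```python
from typing import Iterable, List


def fuse_alerts(attack_types: Iterable[str], is_anomaly: Iterable[bool]) -> List[str]:
    """Combine Tier 1 labels and Tier 2 anomaly flags into one alert per message."""
    alerts = []
    for label, anomalous in zip(attack_types, is_anomaly):
        if label != "Normal":
            alerts.append("KNOWN_ATTACK")    # signature match takes precedence
        elif anomalous:
            alerts.append("UNKNOWN_ATTACK")  # normal to Tier 1, anomalous to Tier 2
        else:
            alerts.append("NORMAL")
    return alerts


print(fuse_alerts(["Normal", "DoS", "Normal"], [False, True, True]))
# ['NORMAL', 'KNOWN_ATTACK', 'UNKNOWN_ATTACK']
```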

Performance

Metric	Tier 1 (Multi-Class)	Tier 2 (Anomaly)
Accuracy	0.9586	0.6103
F1 (weighted)	0.9584	0.7035
Precision	0.9597	0.9243
Recall	0.9586	0.5678
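Tier 1's accuracy and weighted recall are identical (0.9586). That is expected: weighted recall sums each class's true positives divided by its support, weighted by that support, which reduces to total correct over total samples. The sketch below checks this with scikit-learn on made-up labels. By the same reasoning, the Tier 2 column, where recall and accuracy differ, probably reports binary attack-vs-normal scores rather than weighted averages, though the card does not say.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up labels for illustration only.
y_true = ["Normal", "DoS", "Fuzzy", "Normal", "RPM", "Gear", "Normal", "DoS"]
y_pred = ["Normal", "DoS", "Normal", "Normal", "RPM", "Gear", "Fuzzy", "DoS"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={acc:.4f} weighted recall={rec:.4f}")
# accuracy=0.7500 weighted recall=0.7500
```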

Base Learner Validation Accuracies

Model	Accuracy
Decision Tree	0.9664
Random Forest	0.9690
Extra Trees	0.9690
XGBoost	0.9689

Attack Types Detected

Attack	Description	Detection
DoS	Flood CAN bus with dominant ID (0x0000) every 0.3 ms	Signature (Tier 1)
Fuzzy	Random CAN ID and data injection every 0.5 ms	Signature (Tier 1)
RPM Spoofing	Inject fake RPM gauge values every 1 ms	Signature (Tier 1)
Gear Spoofing	Inject fake drive gear values every 1 ms	Signature (Tier 1)
Unknown/Zero-Day	Any novel attack pattern	Anomaly (Tier 2)
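To make the DoS and Fuzzy rows concrete, here is a sketch that synthesises injected frames in the CSV schema described under Input Format. `inject_frames` is a hypothetical helper for illustration; the all-zero DoS payload and the 11-bit random ID range for Fuzzy are assumptions, and RPM/Gear spoofing are omitted because they depend on vehicle-specific IDs.

```python
import random
from typing import Dict, List


def inject_frames(kind: str, start: float, duration: float, seed: int = 0) -> List[Dict]:
    """Synthetic injected frames (flag='T') using the card's CSV columns."""
    rng = random.Random(seed)
    period = {"dos": 0.0003, "fuzzy": 0.0005}[kind]  # 0.3 ms and 0.5 ms
    n = int(round(duration / period))
    frames = []
    for i in range(n):
        if kind == "dos":
            can_id, payload = 0x000, [0] * 8  # lowest ID wins bus arbitration
        else:
            can_id = rng.randrange(0x800)      # random 11-bit standard ID
            payload = [rng.randrange(256) for _ in range(8)]
        frame = {"timestamp": round(start + i * period, 6), "can_id": f"{can_id:04X}", "dlc": 8}
        frame.update({f"d{j}": f"{b:02X}" for j, b in enumerate(payload)})
        frame["flag"] = "T"
        frames.append(frame)
    return frames


dos = inject_frames("dos", start=0.0, duration=0.003)      # 10 frames in 3 ms
fuzzy = inject_frames("fuzzy", start=0.0, duration=0.003)  # 6 frames in 3 ms
```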

Usage

import pandas as pd
from inference import load_model, preprocess, predict

# Load model
model = load_model('vehicle_ids_model.pkl')

# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')

# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)

# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())

Input Format

CAN bus message CSV with columns:

timestamp: Recording time (seconds)
can_id: CAN identifier in HEX (e.g., "043F")
dlc: Data Length Code (0-8)
d0-d7: Data bytes in HEX (e.g., "FF")
flag: R (normal) or T (injected/attack)
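A minimal sketch of decoding this format with pandas, assuming a header row with exactly these column names and treating missing data bytes (DLC < 8) as 0x00. `parse_can_csv` is a hypothetical helper, not the repository's `preprocess` function.

```python
import io

import pandas as pd

BYTE_COLS = [f"d{i}" for i in range(8)]


def parse_can_csv(source) -> pd.DataFrame:
    """Decode hex CAN ID and data bytes to integers; map flag to a boolean label."""
    raw = pd.read_csv(source, dtype=str)  # read as strings so "043F" keeps its digits
    out = pd.DataFrame()
    out["timestamp"] = raw["timestamp"].astype(float)
    out["can_id"] = raw["can_id"].map(lambda h: int(h, 16))
    out["dlc"] = raw["dlc"].astype(int)
    for col in BYTE_COLS:
        out[col] = raw[col].fillna("00").map(lambda h: int(h, 16))  # absent byte -> 0 (assumption)
    out["is_attack"] = raw["flag"].str.strip().eq("T")
    return out


csv_text = """timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag
0.000000,0316,8,05,21,68,09,21,21,00,6f,R
0.000300,0000,8,00,00,00,00,00,00,00,00,T
"""
parsed = parse_can_csv(io.StringIO(csv_text))
```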

Feature Engineering

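19 candidate features are extracted from each raw CAN message; a subset is kept by the Mutual Information feature selection listed under Training Details: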

CAN ID (decimal), DLC
8 data bytes (decimal)
Statistical features: mean, std, min, max, range, sum of data bytes
Temporal: inter-arrival time (IAT)
Frequency: CAN ID frequency in traffic
Information-theoretic: data byte entropy
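The feature list above can be sketched in pandas/NumPy as follows. Several definitions are assumptions the card does not pin down: IAT is measured per CAN ID (first occurrence gets 0), frequency is the share of all frames carrying that ID, std is the population standard deviation, and entropy is the Shannon entropy in bits of the byte values within one payload. The input is assumed to be sorted by timestamp with integer-decoded columns.

```python
import numpy as np
import pandas as pd

BYTE_COLS = [f"d{i}" for i in range(8)]


def byte_entropy(row: np.ndarray) -> float:
    """Shannon entropy (bits) of the byte-value distribution in one payload."""
    _, counts = np.unique(row, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    feats = df[["can_id", "dlc"] + BYTE_COLS].copy()        # 10 raw features
    data = df[BYTE_COLS].to_numpy(dtype=float)
    feats["byte_mean"] = data.mean(axis=1)
    feats["byte_std"] = data.std(axis=1)                     # ddof=0
    feats["byte_min"] = data.min(axis=1)
    feats["byte_max"] = data.max(axis=1)
    feats["byte_range"] = feats["byte_max"] - feats["byte_min"]
    feats["byte_sum"] = data.sum(axis=1)
    # Inter-arrival time since the previous frame with the same CAN ID.
    feats["iat"] = df.groupby("can_id")["timestamp"].diff().fillna(0.0)
    # Fraction of all frames that carry this CAN ID.
    feats["id_freq"] = df["can_id"].map(df["can_id"].value_counts(normalize=True))
    feats["byte_entropy"] = np.apply_along_axis(byte_entropy, 1, data)
    return feats


frames = pd.DataFrame({
    "timestamp": [0.0, 0.0003, 0.0006],
    "can_id": [0x000, 0x000, 0x316],
    "dlc": [8, 8, 8],
    **{f"d{i}": [0, 0, b] for i, b in enumerate([5, 33, 104, 9, 33, 33, 0, 111])},
})
feats = engineer_features(frames)
print(feats.shape)  # (3, 19)
```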

Training Details

Dataset: ~1,077,264 CAN bus messages (HCRL Car-Hacking format)
Train/Val/Test split: 70/15/15
SMOTE oversampling to handle class imbalance
Feature selection via Mutual Information
Z-score normalization
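The training steps can be strung together as in the sketch below, on synthetic data. The ordering (split, then fit the scaler and selector on the training split only, then oversample only the training split) is an assumption chosen to avoid leaking validation/test information; `k=10` is illustrative. SMOTE comes from the separate imbalanced-learn package and is skipped if it is not installed.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:
    from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn
except ImportError:
    SMOTE = None

# Imbalanced synthetic data: class 0 dominates, like normal traffic does.
X, y = make_classification(n_samples=2000, n_features=19, n_informative=8, n_classes=5,
                           weights=[0.6, 0.1, 0.1, 0.1, 0.1], random_state=0)

# 70/15/15: hold out 30%, then split that half-and-half into validation and test.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.30, stratify=y, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.50, stratify=y_tmp,
                                          random_state=0)

# Z-score normalisation fitted on the training split only.
scaler = StandardScaler().fit(X_tr)
X_tr, X_va, X_te = (scaler.transform(a) for a in (X_tr, X_va, X_te))

# Keep the k features with the highest mutual information with the label.
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=10).fit(X_tr, y_tr)
X_tr, X_va, X_te = (selector.transform(a) for a in (X_tr, X_va, X_te))

# Oversample minority classes in the training split only.
if SMOTE is not None:
    X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
```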

References

Li et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for IoV", IEEE IoT Journal, 2021. arXiv:2105.13289
Seo et al., "GIDS: GAN based IDS for In-Vehicle Network", PST 2018. arXiv:1907.07377
Song et al., "In-vehicle network intrusion detection using deep CNN", Vehicular Communications, 2020.