metadata
tags:
- anomaly-detection
- intrusion-detection
- vehicle-security
- automotive
- CAN-bus
- cybersecurity
- sklearn
- xgboost
- tabular-classification
license: apache-2.0
library_name: sklearn
metrics:
- accuracy
- f1
- precision
- recall
Vehicle Intrusion Detection System (IDS) — Anomaly Detector
Multi-Tiered Hybrid IDS for detecting hacking attempts in vehicle CAN bus traffic.
Based on the MTH-IDS architecture (340 citations; the 99.99% accuracy figure is the one reported in the original paper, not a result for this model).
Architecture
Tier 1: Signature-Based IDS (Multi-Class Classification)
- Stacking Ensemble: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
- Detects 4 known attack types: DoS, Fuzzy, RPM Spoofing, Gear Spoofing
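A stacking ensemble of this shape can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the training script behind this model: the data is synthetic, the hyperparameters are arbitrary, and `GradientBoostingClassifier` stands in for XGBoost so that the sketch depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for CAN features: 5 classes (Normal + 4 attack types).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_learners = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    # Assumption: gradient boosting from scikit-learn replaces XGBoost here.
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here says nothing about real CAN traffic; it only confirms that the pipeline fits and predicts.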
Tier 2: Anomaly-Based IDS (Zero-Day Detection)
- Isolation Forest trained on normal traffic only
- Detects unknown/novel attacks not seen during training
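The idea of fitting on normal traffic only can be sketched with scikit-learn's `IsolationForest`. The data below is synthetic and the `contamination` value is an arbitrary assumption, not the setting used for this model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" traffic features, plus a cluster far away from them
# that plays the role of an attack type never seen during training.
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
normal_test = rng.normal(loc=0.0, scale=1.0, size=(100, 4))
novel_attack = rng.normal(loc=8.0, scale=1.0, size=(20, 4))

# Fit on normal samples only; no attack labels are needed.
detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
detector.fit(normal_train)

# predict() returns +1 for inliers and -1 for outliers.
attack_flags = detector.predict(novel_attack) == -1
normal_flags = detector.predict(normal_test) == -1

# score_samples() is lower for more abnormal points, so negate it
# to get a score where higher means more anomalous.
anomaly_scores = -detector.score_samples(novel_attack)

print("flagged attack fraction:", attack_flags.mean())
print("flagged normal fraction:", normal_flags.mean())
```

Raising `contamination` flags more traffic as anomalous, which trades precision for recall.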
Combined Detection
- If Tier 1 classifies as Normal but Tier 2 flags as anomaly → UNKNOWN_ATTACK alert
- Catches zero-day attacks that evade signature-based detection
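The combination rule above can be written as a small function. The alert names follow the values listed under Usage; the function name `combine_alerts` and the `"Normal"` class label are assumptions made for this sketch.

```python
def combine_alerts(tier1_label: str, tier2_is_anomaly: bool) -> str:
    """Merge the Tier 1 class label and the Tier 2 anomaly flag into one alert."""
    if tier1_label != "Normal":
        # Tier 1 recognised one of the known attack signatures.
        return "KNOWN_ATTACK"
    if tier2_is_anomaly:
        # Tier 1 saw nothing suspicious, but the traffic deviates from normal.
        return "UNKNOWN_ATTACK"
    return "NORMAL"


print(combine_alerts("DoS", False))     # KNOWN_ATTACK
print(combine_alerts("Normal", True))   # UNKNOWN_ATTACK
print(combine_alerts("Normal", False))  # NORMAL
```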
Performance
| Metric | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
|---|---|---|
| Accuracy | 0.9586 | 0.6103 |
| F1 (weighted) | 0.9584 | 0.7035 |
| Precision | 0.9597 | 0.9243 |
| Recall | 0.9586 | 0.5678 |
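As a sanity check on the table, F1 can be recomputed as the harmonic mean of precision and recall. The Tier 2 value matches this formula, which suggests it is a plain binary F1. The Tier 1 value does not have to match, because a weighted F1 averages per-class scores instead of combining the averaged precision and recall.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tier2_f1 = f1_from(0.9243, 0.5678)
tier1_f1 = f1_from(0.9597, 0.9586)

print(round(tier2_f1, 4))  # 0.7035, equal to the Tier 2 F1 in the table
print(round(tier1_f1, 4))  # 0.9591, close to but not equal to the reported 0.9584
```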
Base Learner Validation Accuracies
| Model | Accuracy |
|---|---|
| Decision Tree | 0.9664 |
| Random Forest | 0.9690 |
| Extra Trees | 0.9690 |
| XGBoost | 0.9689 |
Attack Types Detected
| Attack | Description | Detection |
|---|---|---|
| DoS | Flood CAN bus with dominant ID (0x0000) every 0.3 ms | Signature (Tier 1) |
| Fuzzy | Random CAN ID and data injection every 0.5 ms | Signature (Tier 1) |
| RPM Spoofing | Inject fake RPM gauge values every 1 ms | Signature (Tier 1) |
| Gear Spoofing | Inject fake drive gear values every 1 ms | Signature (Tier 1) |
| Unknown/Zero-Day | Any novel attack pattern | Anomaly (Tier 2) |
Usage
```python
import pickle
import pandas as pd
from inference import load_model, preprocess, predict

# Load model
model = load_model('vehicle_ids_model.pkl')

# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
df = pd.read_csv('can_traffic.csv')

# Preprocess and predict
X = preprocess(df, model)
results = predict(X, model)

# Results contain: attack_type, anomaly_score, is_anomaly, alert
# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
print(results['alert'].value_counts())
```
Input Format
CAN bus message CSV with columns:
- timestamp: Recording time (seconds)
- can_id: CAN identifier in HEX (e.g., "043F")
- dlc: Data Length Code (0-8)
- d0-d7: Data bytes in HEX (e.g., "FF")
- flag: R (normal) or T (injected/attack)
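A minimal sketch of reading this layout with the standard library follows. The sample rows are made up for illustration. The header row, the zero-padding of absent data bytes when `dlc` is below 8, and the helper name `parse_message` are assumptions, not the behaviour of the repository's own preprocessing.

```python
import csv
import io

# Illustrative rows only; the values are invented to match the documented columns.
SAMPLE_CSV = """timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag
0.000000,0316,8,05,21,68,09,21,21,00,6F,R
0.000300,0000,8,00,00,00,00,00,00,00,00,T
0.000900,043F,2,FF,01,,,,,,,R
"""


def parse_message(row: dict) -> dict:
    """Convert one CSV row into numeric fields (HEX strings become integers)."""
    data = [
        # Assumption: bytes beyond the DLC are left empty and padded with 0.
        int(row[f"d{i}"], 16) if row[f"d{i}"] else 0
        for i in range(8)
    ]
    return {
        "timestamp": float(row["timestamp"]),
        "can_id": int(row["can_id"], 16),
        "dlc": int(row["dlc"]),
        "data": data,
        "is_attack": row["flag"].strip().upper() == "T",
    }


messages = [parse_message(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for message in messages:
    print(message)
```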
Feature Engineering
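10 features extracted from raw CAN messages (the groups below list 19 candidates; the count of 10 presumably refers to the subset kept after feature selection):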
- CAN ID (decimal), DLC
- 8 data bytes (decimal)
- Statistical features: mean, std, min, max, range, sum of data bytes
- Temporal: inter-arrival time (IAT)
- Frequency: CAN ID frequency in traffic
- Information-theoretic: data byte entropy
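The feature groups above could be computed along these lines. This is a sketch under stated assumptions: the standard deviation is the population form, IAT is the time since the previous message on the bus regardless of ID, frequency is the share of messages in the capture carrying the same CAN ID, and entropy is the Shannon entropy of the 8 byte values. The repository's `preprocess` function may define any of these differently.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def byte_entropy(data: list) -> float:
    """Shannon entropy (in bits) of the byte values within one message."""
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def extract_features(messages: list) -> list:
    """Build one feature dict per message; messages must be in time order."""
    id_counts = Counter(m["can_id"] for m in messages)
    total = len(messages)
    rows, prev_ts = [], None
    for m in messages:
        data = m["data"]
        rows.append({
            "can_id": m["can_id"],
            "dlc": m["dlc"],
            **{f"d{i}": b for i, b in enumerate(data)},
            "mean": mean(data),
            "std": pstdev(data),
            "min": min(data),
            "max": max(data),
            "range": max(data) - min(data),
            "sum": sum(data),
            "iat": 0.0 if prev_ts is None else m["timestamp"] - prev_ts,
            "id_freq": id_counts[m["can_id"]] / total,
            "entropy": byte_entropy(data),
        })
        prev_ts = m["timestamp"]
    return rows


# Hypothetical messages: one normal frame followed by two DoS-style frames.
example_messages = [
    {"timestamp": 0.0000, "can_id": 0x0316, "dlc": 8,
     "data": [5, 33, 104, 9, 33, 33, 0, 111]},
    {"timestamp": 0.0003, "can_id": 0x0000, "dlc": 8, "data": [0] * 8},
    {"timestamp": 0.0006, "can_id": 0x0000, "dlc": 8, "data": [0] * 8},
]
features = extract_features(example_messages)
print(features[1])
```

An all-zero payload has zero entropy and zero range, which is part of why flooding frames stand out in this feature space.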
Training Details
- Dataset: ~1,077,264 CAN bus messages (HCRL Car-Hacking format)
- Train/Val/Test split: 70/15/15
- SMOTE for class imbalance handling
- Feature selection via Mutual Information
- Z-score normalization
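Two of these steps, the 70/15/15 split and z-score normalization, can be sketched with the standard library alone. The helper names are hypothetical, and the sketch fits the normalization statistics on the training portion only, which is common practice but not confirmed for this model. SMOTE and mutual-information feature selection are left out because they need imbalanced-learn and scikit-learn.

```python
import random
from statistics import mean, pstdev


def split_70_15_15(rows: list, seed: int = 0):
    """Shuffle and split into train/validation/test using integer arithmetic."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def fit_zscore(train_column: list):
    """Return (mean, std) of a training column; std falls back to 1.0 if zero."""
    return mean(train_column), (pstdev(train_column) or 1.0)


def apply_zscore(column: list, mu: float, sigma: float) -> list:
    """Scale a column with statistics that were fitted on the training set."""
    return [(x - mu) / sigma for x in column]


train, val, test = split_70_15_15(list(range(100)))
mu, sigma = fit_zscore([float(x) for x in train])
val_scaled = apply_zscore([float(x) for x in val], mu, sigma)
print(len(train), len(val), len(test))
```

Reusing the training mean and standard deviation for validation and test data keeps information from those sets out of the fitted model.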
References
- Yang et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE IoT Journal, 2021. arXiv:2105.13289
- Seo et al., "GIDS: GAN based IDS for In-Vehicle Network", PST 2018. arXiv:1907.07377
- Song et al., "In-vehicle network intrusion detection using deep CNN", Vehicular Communications, 2020.