	---
	tags:
	- anomaly-detection
	- intrusion-detection
	- vehicle-security
	- automotive
	- CAN-bus
	- cybersecurity
	- sklearn
	- xgboost
	- tabular-classification
	license: apache-2.0
	library_name: sklearn
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# Vehicle Intrusion Detection System (IDS) — Anomaly Detector

	Multi-Tiered Hybrid IDS for detecting hacking attempts in vehicle CAN bus traffic.

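	Based on the [MTH-IDS architecture](https://arxiv.org/abs/2105.13289) (340 citations at the time of writing). The 99.99% accuracy reported in that paper refers to the original system on its own benchmark; results for this model are listed under Performance.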

	## Architecture

	### Tier 1: Signature-Based IDS (Multi-Class Classification)
	- Stacking Ensemble: XGBoost + RandomForest + ExtraTrees + DecisionTree → Logistic Regression meta-learner
	- Detects 4 known attack types: DoS, Fuzzy, RPM Spoofing, Gear Spoofing
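
The stacking arrangement above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the released training code: the helper `build_tier1`, the label order in `CLASS_NAMES`, the hyperparameters, and the synthetic data are all assumptions. `GradientBoostingClassifier` stands in for XGBoost so the sketch needs only scikit-learn; an `xgboost.XGBClassifier` instance could be passed to `build_tier1` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical label order: index 0 is normal traffic, 1-4 are the known attacks.
CLASS_NAMES = ["Normal", "DoS", "Fuzzy", "RPM Spoofing", "Gear Spoofing"]


def build_tier1(boosted_model):
    """Stack four tree-based learners under a logistic regression meta-learner."""
    base_learners = [
        ("boosted", boosted_model),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,  # out-of-fold base predictions are used to train the meta-learner
    )


# Synthetic stand-in for the engineered CAN features (not real traffic).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# GradientBoostingClassifier is a stand-in for XGBoost in this sketch.
tier1 = build_tier1(GradientBoostingClassifier(n_estimators=20, random_state=0))
tier1.fit(X, y)
predicted_names = [CLASS_NAMES[i] for i in tier1.predict(X[:5])]
```

The meta-learner sees only the base learners' cross-validated class probabilities, which is what lets it weight the four models against each other.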

	### Tier 2: Anomaly-Based IDS (Zero-Day Detection)
	- Isolation Forest trained on normal traffic only
	- Detects unknown/novel attacks not seen during training
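
Training on normal traffic only might look like the following sketch with scikit-learn's `IsolationForest`. The random data, the feature count, and the settings (`n_estimators`, `contamination`) are assumptions for illustration, not the values used for this model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for normal-traffic feature vectors (10 features each).
X_normal = rng.normal(0.0, 1.0, size=(1000, 10))

# Five normal-looking samples followed by five samples far from the training data.
X_new = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 10)),
    np.full((5, 10), 8.0),
])

# Fit on normal traffic only; no attack labels are involved.
tier2 = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
tier2.fit(X_normal)

# predict() returns -1 for anomalies and +1 for inliers.
is_anomaly = tier2.predict(X_new) == -1

# score_samples() is higher for normal points, so negate it to get
# a score that grows with abnormality.
anomaly_score = -tier2.score_samples(X_new)
```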

	### Combined Detection
	- If Tier 1 classifies a message as Normal but Tier 2 flags it as an anomaly → `UNKNOWN_ATTACK` alert
	- Catches zero-day attacks that evade signature-based detection
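
The combination rule can be written as a small function. This is a sketch: `combine_alert` is a hypothetical helper, and it assumes that any non-Normal Tier 1 label maps to `KNOWN_ATTACK` regardless of the Tier 2 result, which the rule above does not state explicitly.

```python
def combine_alert(tier1_label: str, tier2_is_anomaly: bool) -> str:
    """Merge the Tier 1 class label and the Tier 2 anomaly flag into one alert."""
    if tier1_label != "Normal":
        # Tier 1 recognised a known attack signature (assumed to take priority).
        return "KNOWN_ATTACK"
    if tier2_is_anomaly:
        # Looks normal to the classifier but abnormal to the anomaly detector.
        return "UNKNOWN_ATTACK"
    return "NORMAL"


alerts = [
    combine_alert("DoS", False),
    combine_alert("Normal", True),
    combine_alert("Normal", False),
]
```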

	## Performance

	| Metric | Tier 1 (Multi-Class) | Tier 2 (Anomaly) |
	|--------|---------------------|-------------------|
	| Accuracy | 0.9586 | 0.6103 |
	| F1 (weighted) | 0.9584 | 0.7035 |
	| Precision | 0.9597 | 0.9243 |
	| Recall | 0.9586 | 0.5678 |

	### Base Learner Validation Accuracies
	| Model | Accuracy |
	|-------|----------|
	| Decision Tree | 0.9664 |
	| Random Forest | 0.9690 |
	| Extra Trees | 0.9690 |
	| XGBoost | 0.9689 |

	## Attack Types Detected

	| Attack | Description | Detection |
	|--------|-------------|-----------|
	| DoS | Flood CAN bus with dominant ID (0x0000) every 0.3 ms | Signature (Tier 1) |
	| Fuzzy | Random CAN ID and data injection every 0.5 ms | Signature (Tier 1) |
	| RPM Spoofing | Inject fake RPM gauge values every 1 ms | Signature (Tier 1) |
	| Gear Spoofing | Inject fake drive gear values every 1 ms | Signature (Tier 1) |
	| Unknown/Zero-Day | Any novel attack pattern | Anomaly (Tier 2) |

	## Usage

	```python
	import pandas as pd
	from inference import load_model, preprocess, predict

	# Load model
	model = load_model('vehicle_ids_model.pkl')

	# Load CAN bus data (CSV format: timestamp, can_id, dlc, d0-d7, flag)
	df = pd.read_csv('can_traffic.csv')

	# Preprocess and predict
	X = preprocess(df, model)
	results = predict(X, model)

	# Results contain: attack_type, anomaly_score, is_anomaly, alert
	# alert values: NORMAL, KNOWN_ATTACK, UNKNOWN_ATTACK
	print(results['alert'].value_counts())
	```

	## Input Format

	CAN bus message CSV with columns:
	- `timestamp`: Recording time (seconds)
	- `can_id`: CAN identifier as a hexadecimal string (e.g., "043F")
	- `dlc`: Data Length Code (0-8)
	- `d0`-`d7`: Data bytes as hexadecimal strings (e.g., "FF")
	- `flag`: R (normal) or T (injected/attack)
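
Because the ID and data bytes are hexadecimal strings, they need to be read as text and converted explicitly; otherwise pandas may parse values such as "0316" as decimal integers. The sketch below shows one way to do this. The two sample rows are made up, and filling missing data bytes with "00" for short messages is an assumption, not necessarily what `preprocess` does.

```python
import io

import pandas as pd

# Two made-up rows in the documented column layout.
sample_csv = io.StringIO(
    "timestamp,can_id,dlc,d0,d1,d2,d3,d4,d5,d6,d7,flag\n"
    "1478198376.389427,0316,8,05,21,68,09,21,21,00,6F,R\n"
    "1478198376.389636,0000,8,00,00,00,00,00,00,00,00,T\n"
)

BYTE_COLS = [f"d{i}" for i in range(8)]
HEX_COLS = ["can_id"] + BYTE_COLS

# Read the hex columns as strings so leading zeros and letters survive.
df = pd.read_csv(sample_csv, dtype={col: str for col in HEX_COLS})

for col in HEX_COLS:
    # Assumption: missing bytes (messages with dlc < 8) are treated as 0x00.
    df[col] = df[col].fillna("00").apply(lambda h: int(h, 16))

# Binary label derived from the flag column: T = injected, R = normal.
df["is_attack"] = (df["flag"] == "T").astype(int)
```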

	## Feature Engineering

	The model uses 10 features, drawn from the following candidates extracted from raw CAN messages:
	- CAN ID (decimal), DLC
	- 8 data bytes (decimal)
	- Statistical features: mean, std, min, max, range, sum of data bytes
	- Temporal: inter-arrival time (IAT)
	- Frequency: CAN ID frequency in traffic
	- Information-theoretic: data byte entropy
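
The statistical, temporal, frequency, and entropy features listed above could be computed along these lines. This is a sketch under stated assumptions: the function names are hypothetical, entropy is taken over the byte values within a single payload, the standard deviation is the population form, IAT is measured against the previous message on the bus (not per CAN ID), and frequency is the share of messages carrying the same ID. The released `preprocess` function may define them differently.

```python
import math
import statistics
from collections import Counter


def payload_features(data_bytes):
    """Statistical and entropy features for one message's data bytes (ints 0-255)."""
    n = len(data_bytes)
    counts = Counter(data_bytes)
    # Shannon entropy (bits) of the byte-value distribution within this payload.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": statistics.fmean(data_bytes),
        "std": statistics.pstdev(data_bytes),  # population std; an assumption
        "min": min(data_bytes),
        "max": max(data_bytes),
        "range": max(data_bytes) - min(data_bytes),
        "sum": sum(data_bytes),
        "entropy": entropy,
    }


def traffic_features(messages):
    """IAT and CAN ID frequency for (timestamp, can_id) pairs sorted by time."""
    id_counts = Counter(can_id for _, can_id in messages)
    rows, prev_ts = [], None
    for ts, can_id in messages:
        rows.append({
            # Time since the previous message; the first message gets 0.0.
            "iat": 0.0 if prev_ts is None else ts - prev_ts,
            # Fraction of all messages that carry this CAN ID.
            "id_freq": id_counts[can_id] / len(messages),
        })
        prev_ts = ts
    return rows
```

A constant payload such as a DoS frame of all zeros has zero entropy under this definition, while a payload of eight distinct byte values reaches the 3-bit maximum.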

	## Training Details

	- Dataset: 1,077,264 CAN bus messages (about 1.08 million; HCRL Car-Hacking format)
	- Train/Val/Test split: 70/15/15
	- SMOTE for class imbalance handling
	- Feature selection via Mutual Information
	- Z-score normalization
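
A 70/15/15 split followed by z-score normalization and mutual-information feature selection might be arranged as below. This is a sketch on synthetic data: the ordering of the steps, the stratified split, and `k=10` are assumptions. The SMOTE step is not shown because it needs the separate imbalanced-learn package; oversampling is normally applied to the training split only, so that validation and test data keep their original class balance.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 19 candidate features, 5 classes (not real CAN data).
X, y = make_classification(
    n_samples=1000, n_features=19, n_informative=8,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# 70/15/15: hold out 30% first, then split the held-out part in half.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0
)

# Z-score normalization, fitted on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train_z = scaler.transform(X_train)

# Keep the 10 features with the highest mutual information with the label.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func, k=10).fit(X_train_z, y_train)

X_train_sel = selector.transform(X_train_z)
X_val_sel = selector.transform(scaler.transform(X_val))
X_test_sel = selector.transform(scaler.transform(X_test))
```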

	## References

	1. Yang et al., "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE Internet of Things Journal, 2021. [arXiv:2105.13289](https://arxiv.org/abs/2105.13289)
	2. Seo et al., "GIDS: GAN based Intrusion Detection System for In-Vehicle Network", PST 2018. [arXiv:1907.07377](https://arxiv.org/abs/1907.07377)
	3. Song et al., "In-vehicle network intrusion detection using deep convolutional neural network", Vehicular Communications, 2020.