# Multimodal PC Fault Detection Using Audio-Visual Evidence Fusion
Two-branch architecture (ViT visual + AST audio) with late fusion for 5 PC fault classes.
## Fault Classes
| ID | Class | Audio Signal | Visual Signal |
|----|-------|-------------|---------------|
| 0 | `normal_operation` | Quiet fan hum | Clean desktop |
| 1 | `boot_failure` | BIOS beep codes | POST error screen |
| 2 | `overheating_fan` | Loud/grinding fan | Thermal warning UI |
| 3 | `storage_failure` | HDD clicking | SMART/CHKDSK errors |
| 4 | `system_crash` | Audio glitch/silence | BSOD |
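The ID-to-name mapping above is what downstream code needs when decoding predictions. A minimal sketch (the names `ID2LABEL` and `LABEL2ID` are illustrative, not identifiers from this repo):

```python
# Class-ID <-> name mapping taken from the table above.
# ID2LABEL / LABEL2ID are illustrative names, not repo identifiers.
ID2LABEL = {
    0: "normal_operation",
    1: "boot_failure",
    2: "overheating_fan",
    3: "storage_failure",
    4: "system_crash",
}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

print(ID2LABEL[4])               # system_crash
print(LABEL2ID["boot_failure"])  # 1
```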
## Quick Start
```bash
# Clone
git clone https://huggingface.co/Ellaft/multimodal-pc-fault-detector
cd multimodal-pc-fault-detector
# Install
pip install -r requirements.txt
# Train (downloads dataset automatically from Hub)
cd src
python train.py --quick_test --no_push
# Full training (15 epochs, ~1 hr on A100)
python train.py --eval_robustness
# All 6 ablation experiments
python run_ablations.py --quick_test
```
## Dataset
**[Ellaft/pc-fault-real-dataset](https://huggingface.co/datasets/Ellaft/pc-fault-real-dataset)** — 1,500 audio-visual pairs, auto-downloaded when you run `train.py`.
| Source | Content |
|--------|---------|
| Real fan recordings | [HenriqueFrancaa/cooling-fans-db0](https://huggingface.co/datasets/HenriqueFrancaa/cooling-fans-db0) — normal vs abnormal PC cooling fans |
| Synthetic beep codes | 12 real AMI/Award/Phoenix BIOS beep patterns with timing jitter |
| Synthetic HDD clicks | Repetitive clicking, motor hum, head crash grinding |
| Synthetic crash audio | Noise bursts, buffer glitches, feedback loops, system hangs |
| Synthetic BSOD images | Windows 10/11/7/XP styles with real stop codes |
| Synthetic POST screens | BIOS vendor screens with real error messages |
| Synthetic thermal UIs | HWMonitor, BIOS warning, notification popup styles |
| Synthetic disk errors | SMART warnings, CHKDSK, CrystalDiskInfo displays |
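To illustrate how a synthetic beep-code clip with timing jitter might be assembled, here is a minimal standard-library sketch. The frequencies, durations, and jitter range are assumptions for illustration, not values from the repo's actual generator:

```python
import math
import random

SAMPLE_RATE = 16_000  # Hz; assumed, not necessarily the repo's rate

def tone(freq_hz, dur_s):
    """Sine beep as a list of float samples in [-1, 1]."""
    n = int(SAMPLE_RATE * dur_s)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def silence(dur_s):
    return [0.0] * int(SAMPLE_RATE * dur_s)

def beep_pattern(beeps, jitter_s=0.05, rng=random):
    """Concatenate short tones separated by jittered gaps, mimicking
    e.g. a '3 beeps' AMI memory-error code."""
    samples = []
    for _ in range(beeps):
        samples += tone(1000, 0.2)  # ~1 kHz beep, 200 ms
        samples += silence(0.3 + rng.uniform(-jitter_s, jitter_s))
    return samples

clip = beep_pattern(3, rng=random.Random(0))
```

The jitter keeps repeated patterns from being bit-identical, which matters when the model could otherwise memorize exact inter-beep spacing.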
To rebuild or extend the dataset (add YouTube scraping, etc.):
```bash
cd data
pip install -r requirements_data.txt
python build_dataset.py --max_per_class 500 --upload
```
## Architecture
```
Audio (WAV) ──→ AST (AudioSet) + LoRA ──→ [CLS] 768d ──→ audio_head ──→ L_audio
                                              │
                                              ├──→ concat ──→ fusion_classifier ──→ L_fusion
                                              │
Visual (JPG) ─→ ViT-B/16 (IN-21k) + LoRA ─→ [CLS] 768d ──→ visual_head ──→ L_visual
```
**Loss** = L_fusion + 1.5 × L_visual + 0.5 × L_audio
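The weighted sum above can be written out directly. A minimal sketch in plain Python; the function name is illustrative, and only the weights come from this card:

```python
# Weights from the loss formula above; the heavier visual term (1.5)
# pushes back against audio dominance during training.
LAMBDA_VISUAL = 1.5
LAMBDA_AUDIO = 0.5

def total_loss(l_fusion, l_visual, l_audio,
               lambda_visual=LAMBDA_VISUAL, lambda_audio=LAMBDA_AUDIO):
    """L = L_fusion + 1.5 * L_visual + 0.5 * L_audio (per this card)."""
    return l_fusion + lambda_visual * l_visual + lambda_audio * l_audio
```

In training, each argument would be a cross-entropy value from the corresponding classifier head.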
## Anti-Modality-Collapse
Three techniques prevent the visual branch from being ignored:
1. **Auxiliary unimodal heads** — force each branch to classify independently
2. **OGM-GE** ([Peng et al., CVPR 2022](https://arxiv.org/abs/2203.15332)) — suppress the dominant modality's gradients at each step
3. **Asymmetric learning rates** — visual branch gets 3× the base LR, audio gets 0.5×
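The gradient-modulation step of OGM-GE can be sketched in plain Python. This illustrates the coefficient computation from Peng et al., not this repo's implementation; `alpha` corresponds conceptually to the `--ogm_alpha` flag:

```python
import math

def ogm_coefficients(score_audio, score_visual, alpha=0.1):
    """Per-step gradient-scaling coefficients for each branch.

    score_* are how confidently each unimodal branch predicts the
    correct classes over a batch (e.g. summed softmax probability of
    the ground-truth class). The dominant branch gets its gradients
    scaled by 1 - tanh(alpha * ratio); the weaker branch keeps 1.0.
    Sketch of OGM-GE (Peng et al., CVPR 2022), not the repo's code.
    """
    ratio = score_audio / score_visual
    k_audio = k_visual = 1.0
    if ratio > 1.0:        # audio dominates -> damp audio gradients
        k_audio = 1.0 - math.tanh(alpha * ratio)
    elif ratio < 1.0:      # visual dominates -> damp visual gradients
        k_visual = 1.0 - math.tanh(alpha / ratio)
    return k_audio, k_visual
```

The "GE" half of OGM-GE additionally adds zero-mean Gaussian noise to the damped gradients to preserve generalization; that step is omitted from this sketch.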
## Files
```
src/
  config.py        — All hyperparameters
  models.py        — ViT + AST + LateFusion + OGM-GE + auxiliary heads
  dataset_v2.py    — Loads from Ellaft/pc-fault-real-dataset
  train.py         — Training loop with OGM-GE
  run_ablations.py — 6-experiment ablation runner
data/
  build_dataset.py — Dataset builder (YouTube + HF + synthetic)
```
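As a rough picture of what `config.py` might hold, a hedged sketch: only the values stated elsewhere in this card are sourced (5 classes, 15 epochs, loss weights 1.5/0.5, LR multipliers 3×/0.5×); every field name and the remaining values are placeholder assumptions:

```python
# Sketch of plausible config.py contents; field names are assumptions.
NUM_CLASSES = 5
EPOCHS = 15            # full training run per this card
BASE_LR = 2e-5         # assumption; 2e-5 appears only for full fine-tuning
VISUAL_LR_MULT = 3.0   # visual branch: 3x base LR
AUDIO_LR_MULT = 0.5    # audio branch: 0.5x base LR
LAMBDA_VISUAL = 1.5    # auxiliary visual loss weight
LAMBDA_AUDIO = 0.5     # auxiliary audio loss weight
OGM_ALPHA = 0.1        # assumption; tunable via --ogm_alpha
```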
## CLI Options
```bash
python train.py --mode multimodal # default
python train.py --mode visual_only # unimodal ablation
python train.py --mode audio_only # unimodal ablation
python train.py --finetune full --lr 2e-5 # full fine-tuning
python train.py --no_ogm # disable OGM-GE
python train.py --ogm_alpha 0.5 # more aggressive modulation
python train.py --lambda_visual 2.0 # stronger visual auxiliary loss
python train.py --visual_lr_mult 5.0     # 5× LR for visual branch
```