| # Multimodal PC Fault Detection Using Audio-Visual Evidence Fusion |
|
|
| Two-branch architecture (ViT visual + AST audio) with late fusion for 5 PC fault classes. |
|
|
| ## Fault Classes |
| | ID | Class | Audio Signal | Visual Signal | |
| |----|-------|-------------|---------------| |
| | 0 | `normal_operation` | Quiet fan hum | Clean desktop | |
| | 1 | `boot_failure` | BIOS beep codes | POST error screen | |
| | 2 | `overheating_fan` | Loud/grinding fan | Thermal warning UI | |
| | 3 | `storage_failure` | HDD clicking | SMART/CHKDSK errors | |
| | 4 | `system_crash` | Audio glitch/silence | BSOD | |
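
For use in code, the table above can be mirrored as a simple lookup. This is a minimal sketch: the names `ID2LABEL` and `LABEL2ID` are illustrative and are not taken from the repository's `config.py`, which may define its own equivalent.

```python
# Hypothetical label map mirroring the Fault Classes table.
# The IDs and class names come from the table; the dict names are assumptions.
ID2LABEL = {
    0: "normal_operation",
    1: "boot_failure",
    2: "overheating_fan",
    3: "storage_failure",
    4: "system_crash",
}

# Reverse lookup, e.g. for converting dataset labels to integer targets.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
```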
|
|
| ## Quick Start |
|
|
| ```bash |
| # Clone |
| git clone https://huggingface.co/Ellaft/multimodal-pc-fault-detector |
| cd multimodal-pc-fault-detector |
| |
| # Install |
| pip install -r requirements.txt |
| |
| # Quick test run (downloads the dataset automatically from the Hub) |
| cd src |
| python train.py --quick_test --no_push |
| |
| # Full training (15 epochs, ~1hr on A100) |
| python train.py --eval_robustness |
| |
| # All 6 ablation experiments |
| python run_ablations.py --quick_test |
| ``` |
|
|
| ## Dataset |
|
|
| **[Ellaft/pc-fault-real-dataset](https://huggingface.co/datasets/Ellaft/pc-fault-real-dataset)**: 1,500 audio-visual pairs, downloaded automatically when you run `train.py`. |
|
|
| | Source | Content | |
| |--------|---------| |
| | Real fan recordings | [HenriqueFrancaa/cooling-fans-db0](https://huggingface.co/datasets/HenriqueFrancaa/cooling-fans-db0): normal vs. abnormal PC cooling fans | |
| | Synthetic beep codes | 12 real AMI/Award/Phoenix BIOS beep patterns with timing jitter | |
| | Synthetic HDD clicks | Repetitive clicking, motor hum, head crash grinding | |
| | Synthetic crash audio | Noise bursts, buffer glitches, feedback loops, system hangs | |
| | Synthetic BSOD images | Windows 10/11/7/XP styles with real stop codes | |
| | Synthetic POST screens | BIOS vendor screens with real error messages | |
| | Synthetic thermal UIs | HWMonitor, BIOS warning, notification popup styles | |
| | Synthetic disk errors | SMART warnings, CHKDSK, CrystalDiskInfo displays | |
|
|
| To rebuild or extend the dataset (add YouTube scraping, etc.): |
| ```bash |
| cd data |
| pip install -r requirements_data.txt |
| python build_dataset.py --max_per_class 500 --upload |
| ``` |
|
|
| ## Architecture |
|
|
| ``` |
| Audio (WAV)  ──→ AST (AudioSet) + LoRA    ──→ [CLS] 768d ──→ audio_head  ──→ L_audio |
|                                                   │ |
|                                                   ├──→ concat ──→ fusion_classifier ──→ L_fusion |
|                                                   │ |
| Visual (JPG) ──→ ViT-B/16 (IN-21k) + LoRA ──→ [CLS] 768d ──→ visual_head ──→ L_visual |
| ``` |
|
|
| **Loss** = L_fusion + 1.5 × L_visual + 0.5 × L_audio |
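
The weighted sum above can be sketched in plain Python. This is an illustration only: it assumes each term is a softmax cross-entropy over the 5 class logits (the README does not state the loss type), and the parameter name `lambda_audio` is hypothetical, chosen to mirror the `--lambda_visual` flag.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(fusion_logits, visual_logits, audio_logits, target,
               lambda_visual=1.5, lambda_audio=0.5):
    """Combine the three heads: L_fusion + 1.5 * L_visual + 0.5 * L_audio.

    The cross-entropy choice and the name `lambda_audio` are assumptions.
    """
    l_fusion = cross_entropy(fusion_logits, target)
    l_visual = cross_entropy(visual_logits, target)
    l_audio = cross_entropy(audio_logits, target)
    return l_fusion + lambda_visual * l_visual + lambda_audio * l_audio
```

The larger weight on the visual term means that, for equal per-head losses, the visual head contributes three times as much to the total as the audio head.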
| |
| ## Anti-Modality-Collapse |
| |
| Three techniques prevent the visual branch from being ignored: |
| |
| 1. **Auxiliary unimodal heads**: force each branch to classify independently |
| 2. **OGM-GE** ([Peng et al., CVPR 2022](https://arxiv.org/abs/2203.15332)): suppresses the dominant modality's gradients at each step |
| 3. **Asymmetric learning rates**: the visual branch gets 3× the base LR, the audio branch gets 0.5× |
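
Techniques 2 and 3 can be sketched as follows. The modulation coefficient follows the formula in the OGM-GE paper (scale a branch's gradients by `1 - tanh(alpha * rho)` when its confidence ratio `rho` exceeds 1); feeding it the auxiliary heads' logits is an assumption, and the Gaussian noise of the GE step is omitted. The function names and the `audio_mult` parameter are hypothetical and may not match `models.py` or `train.py`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ogm_coefficients(audio_logits, visual_logits, targets, alpha):
    """Return (k_audio, k_visual) gradient scales for one mini-batch."""
    # Batch confidence of each branch: summed probability on the true class.
    s_audio = sum(softmax(l)[y] for l, y in zip(audio_logits, targets))
    s_visual = sum(softmax(l)[y] for l, y in zip(visual_logits, targets))
    rho_audio = s_audio / s_visual
    rho_visual = s_visual / s_audio
    # Only the currently dominant branch (rho > 1) is damped.
    k_audio = 1.0 - math.tanh(alpha * rho_audio) if rho_audio > 1 else 1.0
    k_visual = 1.0 - math.tanh(alpha * rho_visual) if rho_visual > 1 else 1.0
    return k_audio, k_visual


def branch_learning_rates(base_lr, visual_mult=3.0, audio_mult=0.5):
    """Asymmetric LRs: 3x base for the visual branch, 0.5x for audio."""
    return {"visual": base_lr * visual_mult, "audio": base_lr * audio_mult}
```

In a training loop, each coefficient would multiply the gradients of the corresponding encoder after the backward pass and before the optimizer step. A larger `alpha` damps the dominant branch more strongly.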
| |
| ## Files |
| |
| ``` |
| src/ |
|   config.py          # All hyperparameters |
|   models.py          # ViT + AST + LateFusion + OGM-GE + auxiliary heads |
|   dataset_v2.py      # Loads from Ellaft/pc-fault-real-dataset |
|   train.py           # Training loop with OGM-GE |
|   run_ablations.py   # 6-experiment ablation runner |
| 
| data/ |
|   build_dataset.py   # Dataset builder (YouTube + HF + synthetic) |
| ``` |
| |
| ## CLI Options |
| |
| ```bash |
| python train.py --mode multimodal # default |
| python train.py --mode visual_only # unimodal ablation |
| python train.py --mode audio_only # unimodal ablation |
| python train.py --finetune full --lr 2e-5 # full fine-tuning |
| python train.py --no_ogm # disable OGM-GE |
| python train.py --ogm_alpha 0.5 # more aggressive modulation |
| python train.py --lambda_visual 2.0 # stronger visual auxiliary loss |
| python train.py --visual_lr_mult 5.0 # 5× LR for visual branch |
| ``` |
| |