# Multimodal PC Fault Detection Using Audio-Visual Evidence Fusion

Two-branch architecture (ViT visual + AST audio) with late fusion for 5 PC fault classes.

## Fault Classes

| ID | Class | Audio Signal | Visual Signal |
|----|-------|--------------|---------------|
| 0 | `normal_operation` | Quiet fan hum | Clean desktop |
| 1 | `boot_failure` | BIOS beep codes | POST error screen |
| 2 | `overheating_fan` | Loud/grinding fan | Thermal warning UI |
| 3 | `storage_failure` | HDD clicking | SMART/CHKDSK errors |
| 4 | `system_crash` | Audio glitch/silence | BSOD |

## Quick Start

```bash
# Clone
git clone https://huggingface.co/Ellaft/multimodal-pc-fault-detector
cd multimodal-pc-fault-detector

# Install
pip install -r requirements.txt

# Train (downloads the dataset automatically from the Hub)
cd src
python train.py --quick_test --no_push

# Full training (15 epochs, ~1 hr on an A100)
python train.py --eval_robustness

# All 6 ablation experiments
python run_ablations.py --quick_test
```

## Dataset

**[Ellaft/pc-fault-real-dataset](https://huggingface.co/datasets/Ellaft/pc-fault-real-dataset)** — 1,500 audio-visual pairs, auto-downloaded when you run `train.py`.

| Source | Content |
|--------|---------|
| Real fan recordings | [HenriqueFrancaa/cooling-fans-db0](https://huggingface.co/datasets/HenriqueFrancaa/cooling-fans-db0) — normal vs. abnormal PC cooling fans |
| Synthetic beep codes | 12 real AMI/Award/Phoenix BIOS beep patterns with timing jitter |
| Synthetic HDD clicks | Repetitive clicking, motor hum, head-crash grinding |
| Synthetic crash audio | Noise bursts, buffer glitches, feedback loops, system hangs |
| Synthetic BSOD images | Windows 10/11/7/XP styles with real stop codes |
| Synthetic POST screens | BIOS vendor screens with real error messages |
| Synthetic thermal UIs | HWMonitor, BIOS warning, notification popup styles |
| Synthetic disk errors | SMART warnings, CHKDSK, CrystalDiskInfo displays |

To rebuild or extend the dataset (add YouTube scraping, etc.):

```bash
cd data
pip install -r requirements_data.txt
python build_dataset.py --max_per_class 500 --upload
```

## Architecture

```
Audio (WAV) ──→ AST (AudioSet) + LoRA ──→ [CLS] 768d ──→ audio_head ──→ L_audio
                                               │
                                               ├──→ concat ──→ fusion_classifier ──→ L_fusion
                                               │
Visual (JPG) ─→ ViT-B/16 (IN-21k) + LoRA ─→ [CLS] 768d ──→ visual_head ──→ L_visual
```

**Loss** = L_fusion + 1.5 × L_visual + 0.5 × L_audio
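The weighted sum above reads directly as code. A minimal PyTorch sketch, assuming three heads that each emit class logits; the names `fusion_logits`, `visual_logits`, and `audio_logits` are illustrative, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def combined_loss(fusion_logits: torch.Tensor,
                  visual_logits: torch.Tensor,
                  audio_logits: torch.Tensor,
                  labels: torch.Tensor,
                  lambda_visual: float = 1.5,
                  lambda_audio: float = 0.5) -> torch.Tensor:
    """Fusion loss plus weighted auxiliary unimodal losses."""
    l_fusion = F.cross_entropy(fusion_logits, labels)
    l_visual = F.cross_entropy(visual_logits, labels)
    l_audio = F.cross_entropy(audio_logits, labels)
    return l_fusion + lambda_visual * l_visual + lambda_audio * l_audio
```

The visual weight is larger than the audio weight for the same reason as the anti-collapse measures below: the visual branch tends to be underused, so it gets the stronger training signal.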
## Anti-Modality-Collapse

Three techniques prevent the visual branch from being ignored:

1. **Auxiliary unimodal heads** — force each branch to classify independently
2. **OGM-GE** ([Peng et al., CVPR 2022](https://arxiv.org/abs/2203.15332)) — suppress the dominant modality's gradients at each step (see the gradient-modulation sketch after the CLI options)
3. **Asymmetric learning rates** — visual branch gets 3× base LR, audio gets 0.5× (see the parameter-group sketch below)

## Files

```
src/
  config.py        — All hyperparameters
  models.py        — ViT + AST + LateFusion + OGM-GE + auxiliary heads
  dataset_v2.py    — Loads from Ellaft/pc-fault-real-dataset
  train.py         — Training loop with OGM-GE
  run_ablations.py — 6-experiment ablation runner
data/
  build_dataset.py — Dataset builder (YouTube + HF + synthetic)
```

## CLI Options

```bash
python train.py --mode multimodal          # default
python train.py --mode visual_only         # unimodal ablation
python train.py --mode audio_only          # unimodal ablation
python train.py --finetune full --lr 2e-5  # full fine-tuning
python train.py --no_ogm                   # disable OGM-GE
python train.py --ogm_alpha 0.5            # more aggressive modulation
python train.py --lambda_visual 2.0        # stronger visual auxiliary loss
python train.py --visual_lr_mult 5.0       # 5× LR for visual branch
```
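For orientation, OGM-GE rescales the dominant branch's gradients between `loss.backward()` and `optimizer.step()`. A minimal sketch of the idea, assuming a hypothetical `model.audio_branch` attribute and per-batch unimodal confidence scores; see `models.py` and `train.py` for the actual implementation:

```python
import math
import torch

def ogm_ge_modulate(model, score_audio: float, score_visual: float,
                    alpha: float = 0.3) -> None:
    """Damp the dominant modality's gradients; call between backward() and step().

    score_audio / score_visual: per-batch confidence of each unimodal head
    on the ground-truth class. alpha corresponds to --ogm_alpha.
    """
    ratio = score_audio / (score_visual + 1e-8)
    if ratio <= 1.0:
        return  # audio is not dominating; the symmetric case is omitted for brevity
    coeff = 1.0 - math.tanh(alpha * (ratio - 1.0))  # OGM coefficient in (0, 1]
    for p in model.audio_branch.parameters():       # hypothetical attribute name
        if p.grad is None:
            continue
        p.grad *= coeff  # OGM: suppress the dominant branch's update
        if p.grad.numel() > 1:
            # GE: re-inject Gaussian noise at the gradient's own scale so the
            # damped branch keeps receiving a nonzero training signal
            p.grad += torch.randn_like(p.grad) * p.grad.std() * coeff
```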
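Likewise, `--visual_lr_mult` corresponds to per-branch optimizer parameter groups. A sketch under the same assumptions (branch attribute names illustrative, base LR not taken from the repo):

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module, base_lr: float = 1e-4) -> torch.optim.AdamW:
    """Asymmetric per-branch LRs mirroring the README defaults (3x visual, 0.5x audio)."""
    return torch.optim.AdamW([
        {"params": model.visual_branch.parameters(), "lr": base_lr * 3.0},  # boost the weak branch
        {"params": model.audio_branch.parameters(), "lr": base_lr * 0.5},   # slow the dominant branch
        {"params": model.fusion_classifier.parameters(), "lr": base_lr},
    ])
```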