# Multimodal PC Fault Detection Using Audio-Visual Evidence Fusion

A two-branch architecture (a ViT visual encoder and an AST audio encoder) with late fusion that classifies PC faults into 5 classes.

## Fault Classes
| ID | Class | Audio Signal | Visual Signal |
|----|-------|-------------|---------------|
| 0 | `normal_operation` | Quiet fan hum | Clean desktop |
| 1 | `boot_failure` | BIOS beep codes | POST error screen |
| 2 | `overheating_fan` | Loud/grinding fan | Thermal warning UI |
| 3 | `storage_failure` | HDD clicking | SMART/CHKDSK errors |
| 4 | `system_crash` | Audio glitch/silence | BSOD |
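
For reference, the table above can be written as a label map. This is a minimal sketch: the IDs and class names come from the table, while the variable names `ID2LABEL` and `LABEL2ID` are illustrative and may not match what `config.py` defines.

```python
# Class IDs and names copied from the Fault Classes table.
# The variable names are illustrative, not taken from the repository.
ID2LABEL = {
    0: "normal_operation",
    1: "boot_failure",
    2: "overheating_fan",
    3: "storage_failure",
    4: "system_crash",
}

# Reverse lookup, built from the forward map so the two cannot drift apart.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

print(LABEL2ID["storage_failure"])
```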

## Quick Start

```bash
# Clone
git clone https://huggingface.co/Ellaft/multimodal-pc-fault-detector
cd multimodal-pc-fault-detector

# Install
pip install -r requirements.txt

# Quick test run (the dataset is downloaded automatically from the Hub)
cd src
python train.py --quick_test --no_push

# Full training (15 epochs, ~1hr on A100)
python train.py --eval_robustness

# All 6 ablation experiments
python run_ablations.py --quick_test
```

## Dataset

**[Ellaft/pc-fault-real-dataset](https://huggingface.co/datasets/Ellaft/pc-fault-real-dataset)** — 1,500 audio-visual pairs, auto-downloaded when you run `train.py`.

| Source | Content |
|--------|---------|
| Real fan recordings | [HenriqueFrancaa/cooling-fans-db0](https://huggingface.co/datasets/HenriqueFrancaa/cooling-fans-db0) — normal vs abnormal PC cooling fans |
| Synthetic beep codes | 12 real AMI/Award/Phoenix BIOS beep patterns with timing jitter |
| Synthetic HDD clicks | Repetitive clicking, motor hum, head crash grinding |
| Synthetic crash audio | Noise bursts, buffer glitches, feedback loops, system hangs |
| Synthetic BSOD images | Windows 10/11/7/XP styles with real stop codes |
| Synthetic POST screens | BIOS vendor screens with real error messages |
| Synthetic thermal UIs | HWMonitor, BIOS warning, notification popup styles |
| Synthetic disk errors | SMART warnings, CHKDSK, CrystalDiskInfo displays |

To rebuild or extend the dataset (add YouTube scraping, etc.):
```bash
cd data
pip install -r requirements_data.txt
python build_dataset.py --max_per_class 500 --upload
```

## Architecture

```
Audio (WAV) ──→ AST (AudioSet) + LoRA ──→ [CLS] 768d ──→ audio_head ──→ L_audio
                                              │
                                              ├──→ concat ──→ fusion_classifier ──→ L_fusion
                                              │
Visual (JPG) ─→ ViT-B/16 (IN-21k) + LoRA ─→ [CLS] 768d ──→ visual_head ──→ L_visual
```
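
The data flow in the diagram can be traced with a dependency-free sketch. It assumes each head is a single linear layer and uses random vectors in place of the real AST and ViT `[CLS]` embeddings; the helper names (`linear`, `make_layer`, `forward`) are hypothetical, and the actual heads in `models.py` may be deeper.

```python
import random

EMBED_DIM = 768    # size of each branch's [CLS] embedding (from the diagram)
NUM_CLASSES = 5    # number of fault classes


def linear(inputs, weights, bias):
    """Plain dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def make_layer(in_dim, out_dim, rng):
    """Small random weights and zero bias (illustrative initialisation)."""
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def forward(audio_cls, visual_cls, heads):
    """Return logits from both unimodal heads and from the fusion classifier."""
    fused = audio_cls + visual_cls  # list concatenation: 768 + 768 = 1536 values
    return {
        "audio": linear(audio_cls, *heads["audio"]),
        "visual": linear(visual_cls, *heads["visual"]),
        "fusion": linear(fused, *heads["fusion"]),
    }


rng = random.Random(0)
heads = {
    "audio": make_layer(EMBED_DIM, NUM_CLASSES, rng),
    "visual": make_layer(EMBED_DIM, NUM_CLASSES, rng),
    "fusion": make_layer(2 * EMBED_DIM, NUM_CLASSES, rng),
}
# Stand-ins for the encoder outputs.
audio_cls = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
visual_cls = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

logits = forward(audio_cls, visual_cls, heads)
print({name: len(values) for name, values in logits.items()})
```

Each of the three outputs has one logit per fault class, and each feeds its own loss term.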

**Loss** = L_fusion + 1.5 × L_visual + 0.5 × L_audio
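
As a worked example of the weighted sum, the sketch below combines three per-head losses with the 1.5 and 0.5 weights stated above. The `lambda_visual` name mirrors the `--lambda_visual` CLI flag; `lambda_audio`, `cross_entropy`, and `total_loss` are hypothetical names used only for illustration, and cross-entropy is assumed as the per-head loss.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: log-sum-exp of the logits minus the target logit."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(l_fusion, l_visual, l_audio, lambda_visual=1.5, lambda_audio=0.5):
    """Loss = L_fusion + lambda_visual * L_visual + lambda_audio * L_audio."""
    return l_fusion + lambda_visual * l_visual + lambda_audio * l_audio


# With L_fusion = 1.0, L_visual = 2.0, L_audio = 4.0:
# 1.0 + 1.5 * 2.0 + 0.5 * 4.0 = 6.0
print(total_loss(1.0, 2.0, 4.0))
```

The larger visual weight means an error in the visual head costs three times as much as the same error in the audio head.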

## Anti-Modality-Collapse

Three techniques prevent the visual branch from being ignored:

1. **Auxiliary unimodal heads** — force each branch to independently classify
2. **OGM-GE** ([Peng et al., CVPR 2022](https://arxiv.org/abs/2203.15332)) — suppress dominant modality gradients at each step
3. **Asymmetric learning rates** — visual branch gets 3× base LR, audio gets 0.5×
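
OGM-GE compares how confident each branch is on the correct class and scales down the gradients of the more confident (dominant) branch. The sketch below follows the coefficient rule from the cited paper: k = 1 - tanh(alpha × ratio) when a branch's confidence ratio exceeds 1, and k = 1 otherwise. It leaves out the Gaussian noise term (the "GE" part) and is not taken from `models.py`; the function name and arguments are hypothetical.

```python
import math


def ogm_coefficients(score_audio, score_visual, alpha):
    """Return (coeff_audio, coeff_visual) gradient scaling factors.

    Each score is the summed softmax probability that a branch's unimodal
    output assigns to the correct class over a batch, so it must be positive.
    """
    if score_audio <= 0 or score_visual <= 0:
        raise ValueError("scores must be positive")
    ratio_audio = score_audio / score_visual
    ratio_visual = score_visual / score_audio
    if ratio_audio > 1.0:
        # Audio dominates: shrink audio gradients, leave visual untouched.
        return 1.0 - math.tanh(alpha * ratio_audio), 1.0
    if ratio_visual > 1.0:
        # Visual dominates: shrink visual gradients, leave audio untouched.
        return 1.0, 1.0 - math.tanh(alpha * ratio_visual)
    return 1.0, 1.0  # balanced: no modulation


# Audio is twice as confident as visual, so only audio is suppressed.
coeff_audio, coeff_visual = ogm_coefficients(2.0, 1.0, alpha=0.5)
print(coeff_audio, coeff_visual)
```

In training, each coefficient would multiply the gradients of the corresponding encoder before the optimizer step. A larger `alpha` pushes the dominant branch's coefficient toward zero faster.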

## Files

```
src/
  config.py          — All hyperparameters
  models.py          — ViT + AST + LateFusion + OGM-GE + auxiliary heads
  dataset_v2.py      — Loads from Ellaft/pc-fault-real-dataset
  train.py           — Training loop with OGM-GE
  run_ablations.py   — 6-experiment ablation runner

data/
  build_dataset.py   — Dataset builder (YouTube + HF + synthetic)
```

## CLI Options

```bash
python train.py --mode multimodal      # default
python train.py --mode visual_only     # unimodal ablation
python train.py --mode audio_only      # unimodal ablation
python train.py --finetune full --lr 2e-5  # full fine-tuning
python train.py --no_ogm              # disable OGM-GE
python train.py --ogm_alpha 0.5       # more aggressive modulation
python train.py --lambda_visual 2.0   # stronger visual auxiliary loss
python train.py --visual_lr_mult 5.0  # 5× LR for visual branch
```
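
The asymmetric learning rates, including the effect of `--visual_lr_mult`, can be pictured as per-branch optimizer groups. This is a sketch under stated assumptions: the 3× and 0.5× defaults come from the Anti-Modality-Collapse section, while `build_param_groups`, `audio_lr_mult`, the group names, the example base LR, and the choice to train the fusion classifier at the base LR are all hypothetical.

```python
def build_param_groups(base_lr, visual_lr_mult=3.0, audio_lr_mult=0.5):
    """Return one learning-rate group per model part.

    PyTorch optimizers accept this per-group "lr" layout once a "params"
    entry listing each part's parameters is added to every group.
    """
    return [
        {"name": "visual_branch", "lr": base_lr * visual_lr_mult},
        {"name": "audio_branch", "lr": base_lr * audio_lr_mult},
        # Assumption: the fusion classifier trains at the unscaled base LR.
        {"name": "fusion_classifier", "lr": base_lr},
    ]


# Hypothetical base LR of 1e-4 with the visual multiplier raised to 5.0.
groups = build_param_groups(1e-4, visual_lr_mult=5.0)
for group in groups:
    print(group["name"], group["lr"])
```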