# Multimodal PC Fault Detection Using Audio-Visual Evidence Fusion

Two-branch architecture (ViT visual + AST audio) with late fusion for 5 PC fault classes.

## Fault Classes

| ID | Class | Audio Signal | Visual Signal |
|----|-------|--------------|---------------|
| 0 | normal_operation | Quiet fan hum | Clean desktop |
| 1 | boot_failure | BIOS beep codes | POST error screen |
| 2 | overheating_fan | Loud/grinding fan | Thermal warning UI |
| 3 | storage_failure | HDD clicking | SMART/CHKDSK errors |
| 4 | system_crash | Audio glitch/silence | BSOD |
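
For scripting against the model's outputs, the class IDs above correspond to labels roughly as follows; this mapping is a minimal sketch mirroring the table, and the authoritative values live in `src/config.py`:

```python
# Assumed id-to-label mapping, taken from the table above; check src/config.py
# for the mapping actually used at training time.
ID2LABEL = {
    0: "normal_operation",
    1: "boot_failure",
    2: "overheating_fan",
    3: "storage_failure",
    4: "system_crash",
}
LABEL2ID = {v: k for k, v in ID2LABEL.items()}
```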

## Quick Start

```bash
# Clone
git clone https://huggingface.co/Ellaft/multimodal-pc-fault-detector
cd multimodal-pc-fault-detector

# Install
pip install -r requirements.txt

# Train (downloads the dataset automatically from the Hub)
cd src
python train.py --quick_test --no_push

# Full training (15 epochs, ~1 hr on A100)
python train.py --eval_robustness

# All 6 ablation experiments
python run_ablations.py --quick_test
```

## Dataset

`Ellaft/pc-fault-real-dataset`: 1,500 audio-visual pairs, auto-downloaded when you run `train.py`.

| Source | Content |
|--------|---------|
| Real fan recordings | HenriqueFrancaa/cooling-fans-db0 (normal vs. abnormal PC cooling fans) |
| Synthetic beep codes | 12 real AMI/Award/Phoenix BIOS beep patterns with timing jitter |
| Synthetic HDD clicks | Repetitive clicking, motor hum, head-crash grinding |
| Synthetic crash audio | Noise bursts, buffer glitches, feedback loops, system hangs |
| Synthetic BSOD images | Windows 10/11/7/XP styles with real stop codes |
| Synthetic POST screens | BIOS vendor screens with real error messages |
| Synthetic thermal UIs | HWMonitor, BIOS warning, notification popup styles |
| Synthetic disk errors | SMART warnings, CHKDSK, CrystalDiskInfo displays |
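
If you want to inspect the data outside of `train.py`, it can presumably be pulled with the `datasets` library. The snippet below is a sketch: the repo name comes from this card, but the split and column names are assumptions, so check `ds.features` for what the builder actually provides.

```python
from datasets import load_dataset

# Assumed usage: "train" split and audio/image/label columns are guesses,
# not guaranteed by this card -- inspect ds.features to confirm.
ds = load_dataset("Ellaft/pc-fault-real-dataset", split="train")
print(ds)           # number of rows and column names
print(ds.features)  # feature schema (audio/image/label types)
```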

To rebuild or extend the dataset (add YouTube scraping, etc.):

```bash
cd data
pip install -r requirements_data.txt
python build_dataset.py --max_per_class 500 --upload
```

## Architecture

```
Audio (WAV) ──→ AST (AudioSet) + LoRA ──→ [CLS] 768d ──→ audio_head ──→ L_audio
                                              │
                                              ├──→ concat ──→ fusion_classifier ──→ L_fusion
                                              │
Visual (JPG) ─→ ViT-B/16 (IN-21k) + LoRA ─→ [CLS] 768d ──→ visual_head ──→ L_visual
```

`Loss = L_fusion + 1.5 × L_visual + 0.5 × L_audio`
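
A minimal sketch of the late-fusion head and the weighted loss above, assuming both backbones expose a 768-d [CLS] embedding; names like `FusionHead` are illustrative, and the real implementation is in `src/models.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 5

class FusionHead(nn.Module):
    """Illustrative late-fusion head: two auxiliary unimodal heads plus a fused classifier."""
    def __init__(self, dim: int = 768, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.audio_head = nn.Linear(dim, num_classes)
        self.visual_head = nn.Linear(dim, num_classes)
        self.fusion_classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio_cls: torch.Tensor, visual_cls: torch.Tensor):
        logits_audio = self.audio_head(audio_cls)
        logits_visual = self.visual_head(visual_cls)
        logits_fusion = self.fusion_classifier(torch.cat([audio_cls, visual_cls], dim=-1))
        return logits_fusion, logits_visual, logits_audio

def total_loss(logits_fusion, logits_visual, logits_audio, labels,
               lambda_visual: float = 1.5, lambda_audio: float = 0.5):
    # Loss = L_fusion + 1.5 * L_visual + 0.5 * L_audio
    return (F.cross_entropy(logits_fusion, labels)
            + lambda_visual * F.cross_entropy(logits_visual, labels)
            + lambda_audio * F.cross_entropy(logits_audio, labels))
```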

## Anti-Modality-Collapse

Three techniques prevent the visual branch from being ignored:

  1. Auxiliary unimodal heads – force each branch to classify independently
  2. OGM-GE (Peng et al., CVPR 2022) – suppress the dominant modality's gradients at each step (a minimal sketch follows below)
  3. Asymmetric learning rates – the visual branch gets 3× the base LR, the audio branch 0.5×
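
The sketch below shows the OGM idea in isolation (gradient scaling driven by per-modality confidence on the true class); the exact coefficient shape and where it is applied are assumptions, so treat it as illustrative and see the paper plus `src/models.py` / `src/train.py` for the real version.

```python
import torch

def ogm_coefficients(logits_audio, logits_visual, labels, alpha: float = 0.3):
    """Illustrative OGM coefficient computation (gradient scaling only).

    The modality that scores higher on the ground-truth class is treated as
    dominant and has its gradients attenuated; full OGM-GE additionally adds
    Gaussian noise to the attenuated gradients (omitted here for brevity).
    """
    with torch.no_grad():
        score_a = torch.softmax(logits_audio, dim=-1).gather(1, labels[:, None]).sum()
        score_v = torch.softmax(logits_visual, dim=-1).gather(1, labels[:, None]).sum()
        ratio_a = score_a / (score_v + 1e-8)   # > 1 means audio dominates
        ratio_v = score_v / (score_a + 1e-8)
        coeff_a = 1.0 - torch.tanh(alpha * ratio_a) if ratio_a > 1 else torch.tensor(1.0)
        coeff_v = 1.0 - torch.tanh(alpha * ratio_v) if ratio_v > 1 else torch.tensor(1.0)
    return float(coeff_a), float(coeff_v)

# After loss.backward(), the coefficients scale each branch's gradients, e.g.:
#   for p in audio_branch.parameters():
#       if p.grad is not None:
#           p.grad.mul_(coeff_a)
```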

## Files

```
src/
  config.py          – All hyperparameters
  models.py          – ViT + AST + LateFusion + OGM-GE + auxiliary heads
  dataset_v2.py      – Loads from Ellaft/pc-fault-real-dataset
  train.py           – Training loop with OGM-GE
  run_ablations.py   – 6-experiment ablation runner

data/
  build_dataset.py   – Dataset builder (YouTube + HF + synthetic)
```

## CLI Options

```bash
python train.py --mode multimodal          # default
python train.py --mode visual_only         # unimodal ablation
python train.py --mode audio_only          # unimodal ablation
python train.py --finetune full --lr 2e-5  # full fine-tuning
python train.py --no_ogm                   # disable OGM-GE
python train.py --ogm_alpha 0.5            # more aggressive modulation
python train.py --lambda_visual 2.0        # stronger visual auxiliary loss
python train.py --visual_lr_mult 5.0       # 5x LR for visual branch
```
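
As one illustration of how `--visual_lr_mult` and the asymmetric learning rates could translate into optimizer settings, the sketch below builds hypothetical parameter groups; the actual grouping and base LR are defined in `src/train.py`, and the modules here are dummies purely to make the snippet runnable.

```python
import torch
import torch.nn as nn

# Dummy stand-ins for the real branches, only so the snippet runs end to end.
visual_branch = nn.Linear(768, 5)
audio_branch = nn.Linear(768, 5)
fusion_head = nn.Linear(1536, 5)

base_lr = 1e-4  # hypothetical base learning rate
optimizer = torch.optim.AdamW([
    {"params": visual_branch.parameters(), "lr": base_lr * 3.0},  # e.g. --visual_lr_mult 3.0
    {"params": audio_branch.parameters(),  "lr": base_lr * 0.5},
    {"params": fusion_head.parameters(),   "lr": base_lr},
])
```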