
Clip 02 Misclassification Case Study

Issue Summary

  • Symptom: python predict.py data/iphone/*.wav classified data/iphone/clip_02.wav as “MacBook built-in microphone” (≈53%) instead of “Local iPhone recordings” (≈47%).
  • Impact: Undermined trust in the classifier for quiet iPhone speech, indicating poor separation between the iPhone and AirPods/Mac classes.

Investigation

  • Confirmed that the misclassification reproduced after the first training run with the new TAU batches.
  • Compared class distributions via train.py --dry-run; highlighted severe imbalance: TAU devices (≈295 clips each) vs. iPhone (15 wav + 47 m4a) vs. AirPods/Mac (15 wav + 14 m4a).
  • Noted identical feature extraction between training and inference (features.extract_features), driving suspicion toward data coverage rather than pipeline drift.
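
The imbalance check above can be sketched as a quick per-folder count. This is an illustrative helper, not this repo's actual `train.py --dry-run` code; `class_counts` and the flat `data/<device>/` layout are assumptions:

```python
from collections import Counter
from pathlib import Path

def class_counts(data_root="data"):
    """Count .wav/.m4a clips per device folder under data_root."""
    counts = Counter()
    for clip in Path(data_root).rglob("*"):
        if clip.suffix.lower() in {".wav", ".m4a"}:
            counts[clip.parent.name] += 1
    return counts

print(class_counts())
```

A skew like 295 clips per TAU bucket against 62 iPhone and 29 AirPods/Mac clips shows up immediately in this kind of tally.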

Actions Taken

  1. Data Organisation
    • Split the TAU Mobile archive into data/audio, data/audio2, and data/audio9 based on filename suffixes (-a/-b/-c).
    • Normalised provenance defaults in configs/base.yaml for the new device buckets.
  2. Metadata Refresh
    • Ran python3 scripts/refresh_metadata.py --config configs/base.yaml to register hashes and sources for all clips (including new iPhone/AirPods captures).
    • Repeated after each data ingest to keep data/metadata.csv consistent.
  3. Model Retraining
    • Executed python train.py to rebuild models/model.pkl and models/label_encoder.pkl with the expanded dataset (990 clips total).
  4. Inference UX Improvements
    • Allowed directory inputs in predict.py so python predict.py data/iphone expands automatically.
    • Updated the “laptop” friendly name to “AirPods Pro / MacBook built-in microphone” to reflect the mixed capture source.
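
The suffix-based split from step 1 can be sketched as follows. The `-a`/`-b`/`-c` → `data/audio`/`data/audio2`/`data/audio9` mapping follows the text above, but `sort_tau_clips` itself is a hypothetical helper, not the script actually used:

```python
import shutil
from pathlib import Path

# -a/-b/-c microphone suffixes mapped to the device folders described above
SUFFIX_DIRS = {"-a": "data/audio", "-b": "data/audio2", "-c": "data/audio9"}

def sort_tau_clips(archive_dir, suffix_dirs=SUFFIX_DIRS):
    """Move TAU archive clips into per-device folders by filename suffix."""
    for clip in Path(archive_dir).glob("*.wav"):
        for suffix, target in suffix_dirs.items():
            if clip.stem.endswith(suffix):
                dest = Path(target)
                dest.mkdir(parents=True, exist_ok=True)
                shutil.move(str(clip), str(dest / clip.name))
                break
```

Keeping the move keyed on the filename stem preserves the original TAU microphone labeling even after the clips leave the archive layout.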

Verification

  • Post-retrain prediction:
    File: data/iphone/clip_02.wav
    RMS loudness: -40.8 dBFS
      1. Local iPhone recordings — 96.1%
      2. AirPods Pro / MacBook built-in microphone — 3.9%
      3. Samsung Galaxy S7 (TAU device B) — 0.0%
    
  • The confidence inversion (≈96% iPhone) confirms the classifier now separates the classes even for low-level speech content.
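
The RMS loudness figure reported above is conventionally computed as shown below; `rms_dbfs` is a hypothetical helper for illustration, not necessarily what `predict.py` uses:

```python
import numpy as np

def rms_dbfs(samples):
    """RMS loudness in dBFS for float samples normalised to [-1.0, 1.0]."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    if rms == 0:
        return float("-inf")
    return 20.0 * float(np.log10(rms))
```

At -40.8 dBFS, clip_02.wav is genuinely quiet (a full-scale sine sits at about -3 dBFS), which is why it made a good stress test for the retrained model.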

Changes That Improved Results

  • configs/base.yaml: added TAU device folders to include_devices and defined CC-BY provenance defaults.
  • data/metadata.csv: regenerated with 990 entries to incorporate the new recordings (62 iPhone, 29 AirPods/Mac).
  • devices.py: renamed the “laptop” label to “AirPods Pro / MacBook built-in microphone” for accurate reporting.
  • predict.py: added directory expansion and broader audio-extension support to streamline batch evaluation.
  • Dataset restructuring: migrated TAU archive clips into data/audio, data/audio2, data/audio9 directories, preserving the -a/-b/-c microphone mapping.
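
The directory expansion added to predict.py might look roughly like this; `expand_inputs` and the `AUDIO_EXTS` set are illustrative assumptions, not the actual implementation:

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".m4a", ".mp3", ".flac"}  # assumed extension set

def expand_inputs(paths):
    """Expand directory arguments into sorted lists of audio files."""
    files = []
    for p in map(Path, paths):
        if p.is_dir():
            files.extend(sorted(q for q in p.rglob("*")
                                if q.suffix.lower() in AUDIO_EXTS))
        else:
            files.append(p)
    return files
```

With this, `python predict.py data/iphone` behaves like passing every supported audio file under that directory, without relying on shell globbing.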

Follow-Up Recommendations

  • Continue collecting parallel iPhone vs. AirPods recordings, especially in quiet environments, until class counts approach parity with TAU devices.
  • Maintain a held-out validation set (not yet captured) to quantify gains objectively beyond spot checks.
  • Document future ingestion runs by appending to this case study or a dedicated experiment log under docs/.
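
The held-out validation set recommended above can be carved out with a stratified split so every device class is represented. A minimal stdlib-only sketch (`stratified_holdout` is a hypothetical name, not repo code):

```python
import random
from collections import defaultdict

def stratified_holdout(paths, labels, test_frac=0.2, seed=42):
    """Reserve test_frac of each class as a fixed, reproducible hold-out set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)
    train, test = [], []
    for label, items in sorted(by_class.items()):
        items = items[:]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_frac))
        test.extend((p, label) for p in items[:n_test])
        train.extend((p, label) for p in items[n_test:])
    return train, test
```

Fixing the seed keeps the hold-out set stable across ingestion runs, so spot checks like clip_02.wav can be compared against a consistent baseline.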