# Clip 02 Misclassification Case Study
## Issue Summary
- **Symptom**: `python predict.py data/iphone/*.wav` classified `data/iphone/clip_02.wav` as “MacBook built-in microphone” (≈53%) instead of “Local iPhone recordings” (≈47%).
- **Impact**: Undermined trust in the classifier for quiet iPhone speech, indicating poor separation between the iPhone and AirPods/Mac classes.
## Investigation
- Confirmed the mismatch reproduced after the first training run with the new TAU batches.
- Compared class distributions via `train.py --dry-run`; highlighted severe imbalance: TAU devices (≈295 clips each) vs. iPhone (15 wav + 47 m4a) vs. AirPods/Mac (15 wav + 14 m4a).
- Noted identical feature extraction between training and inference (`features.extract_features`), driving suspicion toward data coverage rather than pipeline drift.
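The imbalance check above can be sketched as a quick tally over the metadata file. This is a hedged illustration, not the actual `train.py --dry-run` implementation: the `device` column name is an assumption about the `data/metadata.csv` schema and should be adjusted to the real one.

```python
# Sketch of a per-class clip count over data/metadata.csv.
# Assumption: the CSV has a "device" column holding the class label.
import csv
from collections import Counter

def class_counts(metadata_path):
    """Count clips per device label in a metadata CSV."""
    with open(metadata_path, newline="") as fh:
        return Counter(row["device"] for row in csv.DictReader(fh))

# Usage: for device, n in class_counts("data/metadata.csv").most_common():
#            print(f"{device:40s} {n:4d}")
```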
## Actions Taken
1. **Data Organisation**
- Split the TAU Mobile archive into `data/audio`, `data/audio2`, and `data/audio9` based on filename suffixes (`-a/-b/-c`).
- Normalised provenance defaults in `configs/base.yaml` for the new device buckets.
2. **Metadata Refresh**
- Ran `python3 scripts/refresh_metadata.py --config configs/base.yaml` to register hashes and sources for all clips (including new iPhone/AirPods captures).
- Repeated after each data ingest to keep `data/metadata.csv` consistent.
3. **Model Retraining**
- Executed `python train.py` to rebuild `models/model.pkl` and `models/label_encoder.pkl` with the expanded dataset (990 clips total).
4. **Inference UX Improvements**
- Allowed directory inputs in `predict.py` so `python predict.py data/iphone` expands automatically.
- Updated the “laptop” friendly name to “AirPods Pro / MacBook built-in microphone” to reflect the mixed capture source.
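The TAU split in step 1 can be sketched as a small sorting script. This is a hypothetical sketch: the `-a/-b/-c` suffix convention follows the description above, but the exact TAU filename format and the `tau_raw` source directory name are assumptions.

```python
# Sketch: sort TAU clips into per-microphone folders by filename suffix.
# Assumption: clips are named like "scene-location-1-a.wav" where the
# trailing -a/-b/-c marks the recording device.
import shutil
from pathlib import Path

DEST_BY_SUFFIX = {
    "-a": Path("data/audio"),
    "-b": Path("data/audio2"),
    "-c": Path("data/audio9"),
}

def split_tau_archive(src_dir):
    """Move each *.wav under src_dir into its device bucket."""
    for wav in Path(src_dir).glob("*.wav"):
        for suffix, dest in DEST_BY_SUFFIX.items():
            if wav.stem.endswith(suffix):
                dest.mkdir(parents=True, exist_ok=True)
                shutil.move(str(wav), dest / wav.name)
                break
```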
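The hash registration in step 2 amounts to a content digest per clip. A minimal sketch, assuming SHA-256 is the algorithm; the real `scripts/refresh_metadata.py` may hash differently or record additional fields.

```python
# Sketch of per-clip content hashing as done during metadata refresh.
# Assumption: SHA-256 over raw bytes; the actual script may differ.
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Content hash used to detect duplicate or changed clips."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```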
## Verification
- Post-retrain prediction:
```
File: data/iphone/clip_02.wav
RMS loudness: -40.8 dBFS
1. Local iPhone recordings — 96.1%
2. AirPods Pro / MacBook built-in microphone — 3.9%
3. Samsung Galaxy S7 (TAU device B) — 0.0%
```
- The confidence inversion (≈96% iPhone) confirms that the classifier now separates the classes, even for low-level speech content.
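The RMS loudness figure in the output above can be reproduced with a short stdlib-only computation. This is a sketch under the assumption of 16-bit PCM WAV input; the pipeline's actual implementation in `features.extract_features` may differ.

```python
# Sketch: RMS level of a WAV file in dB relative to full scale (dBFS).
# Assumption: 16-bit PCM input, as produced by typical voice recorders.
import array
import math
import wave

def rms_dbfs(wav_path):
    """Return RMS loudness of a 16-bit PCM WAV in dBFS."""
    with wave.open(wav_path, "rb") as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms else float("-inf")
```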
## Feature Changes for Improved Results
- `configs/base.yaml`: added TAU device folders to `include_devices` and defined CC-BY provenance defaults.
- `data/metadata.csv`: regenerated with 990 entries to incorporate the new recordings (62 iPhone, 29 AirPods/Mac).
- `devices.py`: renamed the “laptop” label to “AirPods Pro / MacBook built-in microphone” for accurate reporting.
- `predict.py`: added directory expansion and broader audio-extension support to streamline batch evaluation.
- Dataset restructuring: migrated TAU archive clips into `data/audio`, `data/audio2`, `data/audio9` directories, preserving the `-a/-b/-c` microphone mapping.
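The directory expansion added to `predict.py` can be sketched as follows. The extension set is an assumption based on the formats this document mentions (`.wav`, `.m4a`) plus common extras; the shipped code may accept a different list.

```python
# Sketch of the directory expansion behind `python predict.py data/iphone`.
# Assumption: this extension set; the real predict.py may differ.
from pathlib import Path

AUDIO_EXTS = {".wav", ".m4a", ".flac", ".mp3"}

def expand_inputs(paths):
    """Expand any directories among paths into sorted audio file lists."""
    files = []
    for p in map(Path, paths):
        if p.is_dir():
            files.extend(sorted(
                q for q in p.rglob("*") if q.suffix.lower() in AUDIO_EXTS
            ))
        else:
            files.append(p)
    return files
```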
## Follow-Up Recommendations
- Continue collecting parallel iPhone vs. AirPods recordings, especially in quiet environments, until class counts approach parity with TAU devices.
- Maintain a held-out validation set (not yet captured) to quantify gains objectively beyond spot checks.
- Document future ingestion runs by appending to this case study or a dedicated experiment log under `docs/`.
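The recommended held-out set would need to be stratified per device, since the iPhone and AirPods classes are so much smaller than the TAU buckets. A minimal sketch; the 20% fraction and the seed are placeholders, not a committed project choice.

```python
# Sketch of a stratified train/validation split for the follow-up work.
# Assumptions: 20% held out per label, deterministic via a fixed seed.
import random
from collections import defaultdict

def stratified_split(items, label_of, val_frac=0.2, seed=0):
    """Split items into (train, val), holding out val_frac of each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)
    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        k = max(1, int(len(group) * val_frac))  # at least one clip per label
        val.extend(group[:k])
        train.extend(group[k:])
    return train, val
```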