Clip 02 Misclassification Case Study
Issue Summary
- Symptom: `python predict.py data/iphone/*.wav` classified `data/iphone/clip_02.wav` as “MacBook built-in microphone” (≈53%) instead of “Local iPhone recordings” (≈47%).
- Impact: Undermined trust in the classifier for quiet iPhone speech, indicating poor separation between the iPhone and AirPods/Mac classes.
Investigation
- Confirmed the mismatch reproduced after the first training run with the new TAU batches.
- Compared class distributions via `train.py --dry-run`; this highlighted severe imbalance: TAU devices (≈295 clips each) vs. iPhone (15 wav + 47 m4a) vs. AirPods/Mac (15 wav + 14 m4a).
- Noted identical feature extraction between training and inference (`features.extract_features`), driving suspicion toward data coverage rather than pipeline drift.
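The imbalance surfaced by `--dry-run` can also be reproduced with a short standalone check. The sketch below is hypothetical: it assumes each row of `data/metadata.csv` carries a `device` column, which may not match the project's actual schema.

```python
import csv
from collections import Counter

def class_distribution(metadata_path):
    """Count clips per device label in a metadata CSV.

    Assumes one row per clip with a 'device' column; adjust the
    column name to match the real schema.
    """
    with open(metadata_path, newline="") as fh:
        counts = Counter(row["device"] for row in csv.DictReader(fh))
    total = sum(counts.values())
    for device, n in counts.most_common():
        # Print absolute count and share of the dataset per class.
        print(f"{device:40s} {n:4d} ({n / total:5.1%})")
    return counts
```

Running this after every ingest makes the TAU-vs-iPhone skew visible at a glance, before any retraining happens.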
Actions Taken
- Data Organisation
  - Split the TAU Mobile archive into `data/audio`, `data/audio2`, and `data/audio9` based on filename suffixes (`-a`/`-b`/`-c`).
  - Normalised provenance defaults in `configs/base.yaml` for the new device buckets.
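The suffix-based split can be expressed as a small helper. This is a sketch only: it assumes TAU filenames end in `-a`, `-b`, or `-c` immediately before the extension, and the directory mapping is taken from the split described above; the real ingest step may have worked differently.

```python
import shutil
from pathlib import Path

# Suffix-to-directory mapping from the data reorganisation step.
SUFFIX_DIRS = {"-a": "data/audio", "-b": "data/audio2", "-c": "data/audio9"}

def split_by_suffix(archive_dir, dry_run=False):
    """Move TAU clips into per-microphone directories based on the
    -a/-b/-c suffix preceding the file extension."""
    moved = []
    for clip in Path(archive_dir).glob("*.wav"):
        suffix = clip.stem[-2:]          # e.g. "airport-09-a" -> "-a"
        target = SUFFIX_DIRS.get(suffix)
        if target is None:
            continue                     # skip files without a known suffix
        dest = Path(target) / clip.name
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(clip), dest)
        moved.append((clip.name, target))
    return moved
```

A `dry_run=True` pass prints nothing destructive and lets the mapping be reviewed before any files move.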
- Metadata Refresh
  - Ran `python3 scripts/refresh_metadata.py --config configs/base.yaml` to register hashes and sources for all clips (including the new iPhone/AirPods captures).
  - Repeated after each data ingest to keep `data/metadata.csv` consistent.
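The hash-registration part of a refresh like this can be sketched with the standard library alone. The function names and the SHA-256 choice here are assumptions, not what `scripts/refresh_metadata.py` necessarily does.

```python
import hashlib
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Hash a clip in 1 MiB chunks so large recordings never load whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def register_clips(root, extensions=(".wav", ".m4a")):
    """Yield (relative path, sha256) rows for every audio clip under root,
    in deterministic sorted order so reruns produce identical CSVs."""
    root = Path(root)
    for clip in sorted(root.rglob("*")):
        if clip.suffix.lower() in extensions:
            yield str(clip.relative_to(root)), file_sha256(clip)
```

Deterministic ordering matters here: it keeps `data/metadata.csv` diffs small when the refresh is repeated after each ingest.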
- Model Retraining
  - Executed `python train.py` to rebuild `models/model.pkl` and `models/label_encoder.pkl` with the expanded dataset (990 clips total).
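`train.py`'s internals are not reproduced here, but a minimal retraining loop of the same shape (fit a classifier plus a label encoder, persist both as the `model.pkl` / `label_encoder.pkl` pair) might look like the following. The `RandomForestClassifier` is a placeholder assumption; the project's actual model may differ.

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

def retrain(features, labels, model_path="models/model.pkl",
            encoder_path="models/label_encoder.pkl"):
    """Fit a classifier and label encoder, then persist both, mirroring
    the model.pkl / label_encoder.pkl pair stored under models/."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(labels)          # string labels -> int ids
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(np.asarray(features), y)
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    with open(encoder_path, "wb") as fh:
        pickle.dump(encoder, fh)
    return model, encoder
```

Keeping the encoder pickled next to the model guarantees that inference maps class indices back to the same friendly names used at training time.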
- Inference UX Improvements
  - Allowed directory inputs in `predict.py` so `python predict.py data/iphone` expands automatically.
  - Updated the “laptop” friendly name to “AirPods Pro / MacBook built-in microphone” to reflect the mixed capture source.
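Directory expansion of this kind typically reduces to a few lines. This sketch assumes a hypothetical extension whitelist; `predict.py`'s actual list of supported extensions may differ.

```python
from pathlib import Path

# Hypothetical extension whitelist; predict.py's real list may differ.
AUDIO_EXTENSIONS = {".wav", ".m4a", ".mp3", ".flac"}

def expand_inputs(args):
    """Expand each CLI argument: a directory becomes the sorted list of
    audio files it contains; a plain file passes through unchanged."""
    clips = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            clips.extend(sorted(p for p in path.rglob("*")
                                if p.suffix.lower() in AUDIO_EXTENSIONS))
        else:
            clips.append(path)
    return clips
```

Sorting the expansion keeps batch output stable between runs, which makes spot checks like the one below easier to compare.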
Verification
- Post-retrain prediction for `data/iphone/clip_02.wav` (RMS loudness: -40.8 dBFS):
  1. Local iPhone recordings — 96.1%
  2. AirPods Pro / MacBook built-in microphone — 3.9%
  3. Samsung Galaxy S7 (TAU device B) — 0.0%
- The confidence inversion (≈96% iPhone) confirms the classifier now separates the classes even for low-level speech content.
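The RMS loudness figure in that report can be verified independently of the classifier. This sketch uses only the standard library and assumes 16-bit PCM WAV input; how `predict.py` actually computes its dBFS value is not shown in this case study.

```python
import math
import struct
import wave

def rms_dbfs(path):
    """Return RMS loudness in dBFS for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        assert wav.getsampwidth() == 2, "sketch handles 16-bit PCM only"
        frames = wav.readframes(wav.getnframes())
    # Interpret the raw bytes as signed 16-bit little-endian samples.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # 0 dBFS corresponds to a full-scale signal (magnitude 32768).
    return 20 * math.log10(rms / 32768) if rms else float("-inf")
```

A reading around -40 dBFS, as for `clip_02.wav`, is quiet speech, which is exactly the regime where the original misclassification occurred.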
Feature Changes for Improved Results
configs/base.yaml: added TAU device folders toinclude_devicesand defined CC-BY provenance defaults.data/metadata.csv: regenerated with 990 entries to incorporate the new recordings (62 iPhone, 29 AirPods/Mac).devices.py: renamed the “laptop” label to “AirPods Pro / MacBook built-in microphone” for accurate reporting.predict.py: added directory expansion and broader audio-extension support to streamline batch evaluation.- Dataset restructuring: migrated TAU archive clips into
data/audio,data/audio2,data/audio9directories, preserving the-a/-b/-cmicrophone mapping.
Follow-Up Recommendations
- Continue collecting parallel iPhone vs. AirPods recordings, especially in quiet environments, until class counts approach parity with TAU devices.
- Maintain a held-out validation set (not yet captured) to quantify gains objectively beyond spot checks.
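One way to carve out the recommended held-out set while respecting the current imbalance is a per-class (stratified) split, so rare classes such as iPhone keep representation on both sides. This is a sketch; the function and split fraction are assumptions, not an existing project utility.

```python
import random
from collections import defaultdict

def stratified_holdout(paths, labels, holdout_frac=0.2, seed=0):
    """Split clips into train/holdout per class, so rare classes
    (e.g. iPhone) keep at least one clip on each side."""
    by_label = defaultdict(list)
    for path, label in zip(paths, labels):
        by_label[label].append(path)
    rng = random.Random(seed)            # fixed seed -> reproducible split
    train, holdout = [], []
    for label, clips in by_label.items():
        rng.shuffle(clips)
        k = max(1, int(len(clips) * holdout_frac))
        holdout.extend((c, label) for c in clips[:k])
        train.extend((c, label) for c in clips[k:])
    return train, holdout
```

Freezing such a holdout once, and never retraining on it, would turn spot checks like the clip_02 verification into a repeatable metric.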
- Document future ingestion runs by appending to this case study or a dedicated experiment log under `docs/`.