# Clip 02 Misclassification Case Study


## Issue Summary
- **Symptom**: `python predict.py data/iphone/*.wav` classified `data/iphone/clip_02.wav` as “MacBook built-in microphone” (≈53%) instead of “Local iPhone recordings” (≈47%).
- **Impact**: Undermined trust in the classifier for quiet iPhone speech and indicated poor separation between the iPhone and AirPods/Mac classes.


## Investigation
- Confirmed the misclassification reproduced after the first training run with the new TAU batches.
- Compared class distributions via `train.py --dry-run`; the comparison highlighted a severe imbalance: TAU devices (≈295 clips each) vs. iPhone (15 wav + 47 m4a) vs. AirPods/Mac (15 wav + 14 m4a).
- Noted that feature extraction is identical between training and inference (`features.extract_features`), shifting suspicion toward data coverage rather than pipeline drift.


## Actions Taken
1. **Data Organisation**
   - Split the TAU Mobile archive into `data/audio`, `data/audio2`, and `data/audio9` based on filename suffixes (`-a`/`-b`/`-c`).
   - Normalised provenance defaults in `configs/base.yaml` for the new device buckets.
2. **Metadata Refresh**
   - Ran `python3 scripts/refresh_metadata.py --config configs/base.yaml` to register hashes and sources for all clips (including the new iPhone/AirPods captures).
   - Repeated the refresh after each data ingest to keep `data/metadata.csv` consistent.
3. **Model Retraining**
   - Executed `python train.py` to rebuild `models/model.pkl` and `models/label_encoder.pkl` with the expanded dataset (990 clips total).
4. **Inference UX Improvements**
   - Allowed directory inputs in `predict.py` so that `python predict.py data/iphone` expands automatically.
   - Updated the “laptop” friendly name to “AirPods Pro / MacBook built-in microphone” to reflect the mixed capture source.
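The suffix-based split in step 1 can be sketched as a small routing helper. The exact suffix-to-folder mapping below is an assumption for illustration (the source only states that `-a`/`-b`/`-c` clips were distributed across the three directories):

```python
from pathlib import Path

# Assumed mapping from TAU microphone suffix to destination folder;
# verify against the actual archive layout before moving files.
SUFFIX_TO_DIR = {"-a": "data/audio", "-b": "data/audio2", "-c": "data/audio9"}

def route_tau_clip(filename):
    """Return the destination directory for a TAU clip based on the
    microphone suffix (-a/-b/-c) at the end of its stem, or None if
    the clip carries no recognised suffix."""
    stem = Path(filename).stem
    for suffix, dest in SUFFIX_TO_DIR.items():
        if stem.endswith(suffix):
            return dest
    return None
```

A migration script would call `route_tau_clip` per file and `shutil.move` the clip into the returned directory, leaving unrecognised files untouched.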


## Verification
- Post-retrain prediction:
```
File: data/iphone/clip_02.wav
RMS loudness: -40.8 dBFS
1. Local iPhone recordings — 96.1%
2. AirPods Pro / MacBook built-in microphone — 3.9%
3. Samsung Galaxy S7 (TAU device B) — 0.0%
```
- The confidence inversion (≈96% iPhone) confirms the classifier now separates the classes even for low-level speech content.
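The `RMS loudness` figure in the output above (-40.8 dBFS, i.e. quiet speech) follows the standard definition of RMS level relative to full scale. A minimal sketch, assuming float samples normalised to [-1.0, 1.0]; `rms_dbfs` is a hypothetical helper, not necessarily the formula `predict.py` uses:

```python
import math

def rms_dbfs(samples, full_scale=1.0):
    """Return the RMS level of a block of samples in dBFS.
    A constant full-scale signal measures 0 dBFS; silence maps to -inf."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)
```

Values around -40 dBFS, as seen for `clip_02.wav`, are a useful sanity check that the low-confidence behaviour correlated with low signal level rather than a decoding problem.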


## Feature Changes for Improved Results
- `configs/base.yaml`: added TAU device folders to `include_devices` and defined CC-BY provenance defaults.
- `data/metadata.csv`: regenerated with 990 entries to incorporate the new recordings (62 iPhone, 29 AirPods/Mac).
- `devices.py`: renamed the “laptop” label to “AirPods Pro / MacBook built-in microphone” for accurate reporting.
- `predict.py`: added directory expansion and broader audio-extension support to streamline batch evaluation.
- Dataset restructuring: migrated TAU archive clips into the `data/audio`, `data/audio2`, and `data/audio9` directories, preserving the `-a`/`-b`/`-c` microphone mapping.


## Follow-Up Recommendations
- Continue collecting parallel iPhone vs. AirPods recordings, especially in quiet environments, until class counts approach parity with the TAU devices.
- Maintain a held-out validation set (not yet captured) to quantify gains objectively beyond spot checks.
- Document future ingestion runs by appending to this case study or to a dedicated experiment log under `docs/`.
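The recommended held-out validation set should be stratified by device class, since the classes are so imbalanced that a naive random split could leave a minority class with almost no validation clips. A minimal sketch under that assumption; `stratified_holdout` is a hypothetical helper, not part of the training code:

```python
import random

def stratified_holdout(items_by_class, frac=0.2, seed=0):
    """Split each class's clip list into (train, validation) with the
    given held-out fraction per class, preserving class proportions.
    Returns two dicts keyed by class label."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = {}, {}
    for label, items in items_by_class.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Hold out at least one clip per non-empty class.
        k = max(1, int(len(shuffled) * frac)) if shuffled else 0
        val[label] = shuffled[:k]
        train[label] = shuffled[k:]
    return train, val
```

Recording the seed alongside `data/metadata.csv` would let future retraining runs reuse the exact same validation clips.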