---
title: Mic-ID
emoji: "πŸŽ™οΈ"
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: "1.31.1"
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# Mic-ID

![Streamlit](https://img.shields.io/badge/Streamlit-FF4B4B?style=flat&logo=streamlit&logoColor=white) ![Python](https://img.shields.io/badge/Python-3776AB?style=flat&logo=python&logoColor=white)

A Streamlit front-end around a microphone fingerprinting baseline: drop in a short clip and get the most likely capture device, plus an optional tonal hint. πŸŽ™οΈ Built for quick lab demos, it is a handy way to show how far classic features still go.

## Table of Contents
- [Highlights](#highlights)
- [Live Demo Flow](#live-demo-flow)
- [Quick Start](#quick-start)
- [Hugging Face Space Setup](#hugging-face-space-setup)
- [Controls at a Glance](#controls-at-a-glance)
- [Device Recognition](#device-recognition)
- [Scale Detection](#scale-detection)
- [Bundled Example Clips](#bundled-example-clips)
- [Download Contents](#download-contents)
- [Testing](#testing)
- [Project Layout](#project-layout)
- [Roadmap](#roadmap)
- [Contributing](#contributing)

## Highlights
- πŸ”Ž End-to-end workflow for collecting, training, and demoing mic classification in one repo.
- πŸŽ›οΈ Feature-first approach: log-mel, MFCC, and spectral stats feed a histogram gradient boosting model.
- 🧠 Friendly predictions: class IDs map to real device names so you can narrate results without decoding labels.
- πŸ—‚οΈ Lightweight artefacts: plain `.wav` folders in `data/`, pickled models in `models/`, metrics and confusion heatmaps in `reports/`.
- βš™οΈ Streamlit UI mirrors the CLI helpers, including loudness normalisation and experimental scale read-outs.

## Live Demo Flow
If you are running a live session, keep this script handy:

- 🎧 `streamlit run app.py` from the project root.
- πŸ“‚ Use `data/audio/airport-helsinki-204-6138-a.wav` to introduce the core upload flow and the default top-3 guess list.
- πŸ”„ Swap to `data/audio/airport-helsinki-204-6138-b.wav` or `data/audio/airport-helsinki-204-6138-c.wav` to show how the same scene captured on a different device shifts the prediction while the environment stays constant.
- πŸ“± Jump to `data/iphone/clip_05.wav` to show the locally recorded class and talk about adding in-house gear with `utils.py`.
- πŸ“Š Mention the probability bar chart and the saved copy under `uploads/hooks - <filename>` for later analysis.

## Quick Start
⚑ Four commands set everything up:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 scripts/refresh_metadata.py  # rebuild hashes + provenance records
python3 train.py --config configs/base.yaml  # optional if you want to refresh the model
```

Then launch the app with `streamlit run app.py` (defaults to http://localhost:8501).

## Hugging Face Space Setup
Want a hosted demo? This repo is ready to drop into a Hugging Face Space using the Streamlit SDK. The short version:

1. `pip install -U "huggingface_hub[cli]"` and run `huggingface-cli login` with a write-scoped access token.
2. `git clone` your Space (for example `https://huggingface.co/spaces/connaaa/mic-id`) into an empty folder.
3. Copy the contents of this repository into that clone, keeping `README.md`, `app.py`, `requirements.txt`, `packages.txt`, `models/`, and the curated `data/` subsets you want online.
4. Commit and `git push`. The Space will build the dependencies listed in `requirements.txt` plus Debian packages from `packages.txt`.

Large training corpora can be trimmed before pushing if you only need the pretrained model for inference.
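If you would rather skip the clone-and-copy dance, the `huggingface_hub` Python API can push the working tree directly. This is a sketch under assumptions: the Space already exists, you have already run `huggingface-cli login`, and the `ignore_patterns` shown are illustrative rather than project policy.

```python
from huggingface_hub import upload_folder

# Push the current directory straight to the Space repo.
# Assumes the Space exists and you are authenticated.
upload_folder(
    repo_id="connaaa/mic-id",
    repo_type="space",
    folder_path=".",
    ignore_patterns=["data/outtakes/*", ".venv/*"],  # trim large/unneeded dirs
)
```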

## Controls at a Glance
| Control | Default | What it does |
| --- | --- | --- |
| File uploader | – | Accepts WAV/MP3/M4A, converts to 16 kHz mono, and normalises loudness before scoring. |
| `How many guesses should we list?` slider | 3 | Sets the length of the ranked prediction list and bar chart. |
| Training data expander | Collapsed | Recaps which datasets went into the current checkpoint, handy during demos. |
| Prediction pane | Auto | Shows the tonal estimate (if any), RMS loudness, ranked devices, and probability chart. |

Each control includes inline help text so presenters can improvise without notes.
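For orientation, the uploader's resample-and-normalise step boils down to a few lines. The sketch below is illustrative only (the function name `prepare_clip` and the RMS target are assumptions, not the app's actual code) and assumes `librosa` handles decoding and resampling:

```python
import numpy as np
import librosa


def prepare_clip(path, sr=16000, target_rms=0.1):
    """Load any supported file as 16 kHz mono and normalise loudness (sketch)."""
    y, _ = librosa.load(path, sr=sr, mono=True)  # decode + resample + downmix
    rms = np.sqrt(np.mean(y ** 2)) + 1e-12       # guard against silent clips
    return np.clip(y * (target_rms / rms), -1.0, 1.0)
```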

## Device Recognition
- 🧱 Audio flows through `features.extract_features`, stitching log-mel and MFCC statistics with zero-crossing, centroid, roll-off, and flatness cues.
- 🌲 `python3 train.py --config configs/base.yaml` reads the provenance metadata, enforces per-device clip minimums, and fits a `HistGradientBoostingClassifier` before saving artefacts to `models/model.pkl` plus the label encoder.
- πŸ“ˆ Every training run exports `reports/metrics.json`, `reports/confusion_matrix.png`, and a timestamped `reports/runs/run-*.json` snapshot so you can cite precision/recall live.
- 🏷️ The app and CLI surface friendly names (e.g. β€œZoom F8 field recorder”) pulled from `devices.describe_label()` to keep the story human-readable.

## Scale Detection
- 🎼 Uses a simple `librosa` chroma profile match across all major/minor keys.
- βœ… High confidence (β‰₯β€―0.6) renders a green highlight, 0.4–0.6 shows an amber β€œlow confidence” tag, and anything lower hides the scale suggestion entirely.
- πŸ₯ Purely percussive or noisy clips skip the tonal hint, which is exactly what you want for location recordings.

## Bundled Example Clips
All sample audio lives under `data/` and mirrors the device IDs referenced in the demo.

| Folder | What it represents | Count* |
| --- | --- | --- |
| `audio/` | TAU Urban Acoustic Scenes clips (device A) – Zoom F8 field recorder | 295 |
| `audio2/` | TAU Urban Acoustic Scenes clips (device B) – Samsung Galaxy S7 | 295 |
| `audio9/` | TAU Urban Acoustic Scenes clips (device C) – iPhone SE | 295 |
| `iphone/` | Locally recorded iPhone speech snippets captured with `utils.py` | 4 |
| `laptop/` | MacBook built-in mic samples recorded in a treated room | 4 |
| `outtakes/` | Extra captures you can promote into training data after curation | varies |

\*Counts based on the current repo snapshot; refresh `data/` to rebalance as needed.

## Download Contents
Every run generates artefacts you can drop into a slide deck or share with collaborators:

- 🎯 `models/model.pkl` and `models/label_encoder.pkl` store the trained classifier and label map.
- πŸ“Š `reports/metrics.json` plus `reports/confusion_matrix.png` capture evaluation snapshots for the latest training session.
- 🧾 `data/metadata.csv` tracks every clip’s provenance, licence, and hash for reproducible retrains.
- πŸ—‚οΈ `reports/runs/run-*.json` snapshots record the exact config, dataset summary, and hashes used for each training run.
- πŸ“ Uploaded clips are preserved under `uploads/hooks - <original-name>` so you can replay or re-label them later.

## Testing
Quick smoke checks live in the scripts themselves:

```bash
# Validate provenance without training
python3 train.py --dry-run

# Rebuild the model, metrics, and run snapshot
python3 train.py --config configs/base.yaml

# Score a few clips and verify probabilities look sane
python3 predict.py data/laptop/clip_01.wav data/iphone/clip_05.wav --topk 5
```

For deeper regression coverage, wire these commands into your CI and compare the resulting metrics JSON against previous baselines.
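A regression check can be as small as diffing one headline number. The sketch below assumes `reports/metrics.json` exposes an `accuracy` field and that you keep a saved baseline file; both are assumptions about the schema, not guarantees:

```python
import json
import sys

# Compare the latest metrics against a stored baseline; exit non-zero on regression.
with open("reports/metrics.json") as f:
    current = json.load(f)
with open("reports/baseline_metrics.json") as f:  # hypothetical saved baseline
    baseline = json.load(f)

drop = baseline["accuracy"] - current["accuracy"]  # field name is an assumption
if drop > 0.02:  # tolerate up to two points of run-to-run noise
    sys.exit(f"Accuracy regressed by {drop:.3f}")
print("Metrics within tolerance.")
```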

## Project Layout
```
mic-id/
 β”œβ”€ app.py              # Streamlit UI for uploading and scoring clips
 β”œβ”€ predict.py          # CLI scorer with friendly device names
 β”œβ”€ train.py            # Dataset loader, model trainer, metric exporter
 β”œβ”€ configs/            # YAML training configs + device provenance defaults
 β”œβ”€ features.py         # Audio feature extraction helpers
 β”œβ”€ utils.py            # Command-line recorder for new device samples
 β”œβ”€ data/               # Per-device waveforms and provenance metadata
 β”‚   └─ metadata.csv    # Clip-level provenance (source/licence/hash)
 β”œβ”€ models/             # Saved classifier + label encoder
 β”œβ”€ reports/            # Metrics JSON and confusion matrix plots
 β”œβ”€ docs/               # Data sourcing guide and prep notes
 β”œβ”€ scripts/            # Dataset preparation helpers (TAU, Freesound, etc.)
 └─ uploads/            # Cached demo uploads saved by the Streamlit app
```

## Roadmap
- πŸ›°οΈ Add a lightweight CNN baseline alongside the gradient boosting model for comparison.
- πŸ§ͺ Ship augmentation scripts (noise, EQ, impulse responses) to spotlight microphone colouration differences.
- πŸ” Wire metadata/hash validation into CI so new clips are rejected unless provenance is complete.
- πŸ“¦ Polish export helpers so the app can bundle probabilities + features in one download.

## Contributing
Issues and pull requests are welcome. 🀝 If you contribute new devices, include a short note (or a `metadata.csv` entry) describing the capture setup so others can reproduce your results and audit licensing.