---
title: Mic-ID
emoji: "🎙️"
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: "1.31.1"
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
# Mic-ID
 
A Streamlit front-end around a microphone fingerprinting baseline: drop in a short clip, get the most likely capture device plus an optional tonal hint. 🎙️ Built for quick lab demos, and a handy way to show how far classic features still go.
## Table of Contents
- [Highlights](#highlights)
- [Live Demo Flow](#live-demo-flow)
- [Quick Start](#quick-start)
- [Hugging Face Space Setup](#hugging-face-space-setup)
- [Controls at a Glance](#controls-at-a-glance)
- [Device Recognition](#device-recognition)
- [Scale Detection](#scale-detection)
- [Bundled Example Clips](#bundled-example-clips)
- [Download Contents](#download-contents)
- [Testing](#testing)
- [Project Layout](#project-layout)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
## Highlights
- 🔁 End-to-end workflow for collecting, training, and demoing mic classification in one repo.
- 🎚️ Feature-first approach: log-mel, MFCC, and spectral stats feed a histogram gradient boosting model.
- 🧠 Friendly predictions: class IDs map to real device names so you can narrate results without decoding labels.
- 🗂️ Lightweight artefacts: plain `.wav` folders in `data/`, pickled models in `models/`, metrics and confusion heatmaps in `reports/`.
- ⚙️ Streamlit UI mirrors the CLI helpers, including loudness normalisation and experimental scale read-outs.
## Live Demo Flow
If you are running a live session, keep this script handy:
- 🚀 `streamlit run app.py` from the project root.
- 🎧 Use `data/audio/airport-helsinki-204-6138-a.wav` to introduce the core upload flow and the default top-3 guess list.
- 🔁 Swap to `data/audio/airport-helsinki-204-6138-b.wav` or `data/audio/airport-helsinki-204-6138-c.wav` to highlight how the twin scene shifts the predicted device while the environment stays constant.
- 📱 Jump to `data/iphone/clip_05.wav` to show the locally recorded class and talk about adding in-house gear with `utils.py`.
- 📊 Mention the probability bar chart and the saved copy under `uploads/hooks - <filename>` for later analysis.
## Quick Start
⚡ Four commands set everything up:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 scripts/refresh_metadata.py # rebuild hashes + provenance records
python3 train.py --config configs/base.yaml # optional if you want to refresh the model
```
Then launch the app with `streamlit run app.py` (defaults to http://localhost:8501).
## Hugging Face Space Setup
Want a hosted demo? This repo is ready to drop into a Hugging Face Space using the Streamlit SDK. The short version:
1. `pip install -U "huggingface_hub[cli]"` and run `huggingface-cli login` with a write-scoped access token.
2. `git clone` your Space (for example `https://huggingface.co/spaces/connaaa/mic-id`) into an empty folder.
3. Copy the contents of this repository into that clone, keeping `README.md`, `app.py`, `requirements.txt`, `packages.txt`, `models/`, and the curated `data/` subsets you want online.
4. Commit and `git push`. The Space will build the dependencies listed in `requirements.txt` plus Debian packages from `packages.txt`.
Large training corpora can be trimmed before pushing if you only need the pretrained model for inference.
## Controls at a Glance
| Control | Default | What it does |
| --- | --- | --- |
| File uploader | – | Accepts WAV/MP3/M4A, converts to 16 kHz mono, and normalises loudness before scoring. |
| `How many guesses should we list?` slider | 3 | Sets the length of the ranked prediction list and bar chart. |
| Training data expander | Collapsed | Recaps which datasets went into the current checkpoint, handy during demos. |
| Prediction pane | Auto | Shows the tonal estimate (if any), RMS loudness, ranked devices, and probability chart. |
Each control includes inline help text so presenters can improvise without notes.
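The loudness normalisation the uploader applies can be sketched as simple RMS scaling. The `target_rms` constant and the function names below are illustrative assumptions, not the app's actual code:

```python
import numpy as np

def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a mono float waveform."""
    return float(np.sqrt(np.mean(x ** 2)))

def normalise_loudness(x: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Scale the signal so its RMS matches target_rms; leave near-silence
    untouched to avoid amplifying the noise floor."""
    current = rms(x)
    if current < 1e-8:  # silence guard
        return x
    return x * (target_rms / current)

clip = 0.5 * np.ones(16_000)  # one second of constant 16 kHz "audio"
print(round(rms(normalise_loudness(clip)), 3))  # 0.1
```

Normalising before feature extraction keeps gain differences between capture chains from dominating the classifier.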
## Device Recognition
- 🧱 Audio flows through `features.extract_features`, stitching log-mel and MFCC statistics with zero-crossing, centroid, roll-off, and flatness cues.
- 🎲 `python3 train.py --config configs/base.yaml` reads the provenance metadata, enforces per-device clip minimums, and fits a `HistGradientBoostingClassifier` before saving artefacts to `models/model.pkl` plus the label encoder.
- 📈 Every training run exports `reports/metrics.json`, `reports/confusion_matrix.png`, and a timestamped `reports/runs/run-*.json` snapshot so you can cite precision/recall live.
- 🏷️ The app and CLI surface friendly names (e.g. "Zoom F8 field recorder") pulled from `devices.describe_label()` to keep the story human-readable.
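The "stitching" step above amounts to statistics pooling over frame-level matrices. A minimal sketch, assuming log-mel and MFCC frames arrive as `(n_coeffs, n_frames)` arrays; the real `features.extract_features` may pool differently:

```python
import numpy as np

def pool_stats(frames: np.ndarray) -> np.ndarray:
    """Collapse an (n_coeffs, n_frames) matrix into per-coefficient
    mean and standard deviation, giving a fixed-length summary."""
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

def assemble_vector(logmel: np.ndarray, mfcc: np.ndarray, scalars) -> np.ndarray:
    """Stitch pooled log-mel and MFCC stats with scalar spectral cues
    (zero-crossing rate, centroid, roll-off, flatness)."""
    return np.concatenate([pool_stats(logmel),
                           pool_stats(mfcc),
                           np.asarray(scalars, dtype=float)])

# toy shapes: 4 mel bands, 3 MFCCs, 10 frames, 4 scalar cues
vec = assemble_vector(np.random.rand(4, 10), np.random.rand(3, 10),
                      [0.1, 1500.0, 3000.0, 0.2])
print(vec.shape)  # (18,)
```

A fixed-length vector like this is exactly what tree-based models such as `HistGradientBoostingClassifier` expect, regardless of clip duration.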
## Scale Detection
- 🎼 Uses a simple `librosa` chroma profile match across all major/minor keys.
- ✅ High confidence (≥ 0.6) renders a green highlight, 0.4–0.6 shows an amber "low confidence" tag, and anything lower hides the scale suggestion entirely.
- 🥁 Purely percussive or noisy clips skip the tonal hint, which is exactly what you want for location recordings.
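The template-match idea behind the tonal hint can be reproduced with plain NumPy: correlate a clip's mean chroma against 24 rotated key profiles and apply the thresholds above. The Krumhansl–Schmuckler profile values and function names here are illustrative assumptions, not the app's exact implementation:

```python
import numpy as np

# Krumhansl-Schmuckler key profiles (illustrative constants)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def best_key(chroma: np.ndarray):
    """Correlate a 12-bin mean chroma vector against all 24 rotated
    major/minor templates; return (label, correlation)."""
    best = ("", -2.0)
    for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
        for tonic in range(12):
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if r > best[1]:
                best = (f"{NOTES[tonic]} {mode}", r)
    return best

def confidence_tag(score: float):
    """Mirror the UI thresholds: >= 0.6 high, 0.4-0.6 low, else hidden."""
    if score >= 0.6:
        return "high"
    if score >= 0.4:
        return "low confidence"
    return None

chroma = np.roll(MAJOR, 7)  # synthetic chroma that matches G major exactly
print(best_key(chroma)[0])  # G major
```

In the app the chroma vector would come from `librosa.feature.chroma_stft` averaged over time; flat correlations on percussive material are what trigger the skip behaviour.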
## Bundled Example Clips
All sample audio lives under `data/` and mirrors the device IDs referenced in the demo.
| Folder | What it represents | Count* |
| --- | --- | --- |
| `audio/` | TAU Urban Acoustic Scenes clips (device A) β Zoom F8 field recorder | 295 |
| `audio2/` | TAU Urban Acoustic Scenes clips (device B) β Samsung Galaxy S7 | 295 |
| `audio9/` | TAU Urban Acoustic Scenes clips (device C) β iPhone SE | 295 |
| `iphone/` | Locally recorded iPhone speech snippets captured with `utils.py` | 4 |
| `laptop/` | MacBook built-in mic samples recorded in a treated room | 4 |
| `outtakes/` | Extra captures you can promote into training data after curation | varies |
*Counts based on the current repo snapshot; refresh `data/` to rebalance as needed.
## Download Contents
Every run generates artefacts you can drop into a slide deck or share with collaborators:
- 🎯 `models/model.pkl` and `models/label_encoder.pkl` store the trained classifier and label map.
- 📊 `reports/metrics.json` plus `reports/confusion_matrix.png` capture evaluation snapshots for the latest training session.
- 🧾 `data/metadata.csv` tracks every clip's provenance, licence, and hash for reproducible retrains.
- 🗃️ `reports/runs/run-*.json` snapshots record the exact config, dataset summary, and hashes used for each training run.
- 📂 Uploaded clips are preserved under `uploads/hooks - <original-name>` so you can replay or re-label them later.
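The hashes recorded in `data/metadata.csv` are easy to reproduce from the stdlib. A sketch assuming SHA-256; check `scripts/refresh_metadata.py` for the algorithm the repo actually records:

```python
import hashlib

def clip_digest(data: bytes, algo: str = "sha256") -> str:
    """Hex digest of raw clip bytes -- pass open(path, "rb").read()
    to hash a WAV file on disk."""
    return hashlib.new(algo, data).hexdigest()

print(clip_digest(b"abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Re-running the digest after a retrain lets collaborators verify they trained on byte-identical clips.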
## Testing
Quick smoke checks live in the scripts themselves:
```bash
# Validate provenance without training
python3 train.py --dry-run
# Rebuild the model, metrics, and run snapshot
python3 train.py --config configs/base.yaml
# Score a few clips and verify probabilities look sane
python3 predict.py data/laptop/clip_01.wav data/iphone/clip_05.wav --topk 5
```
For deeper regression coverage, wire these commands into your CI and compare the resulting metrics JSON against previous baselines.
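That CI comparison can be as small as a tolerance check between two metrics files. The metric names and the `0.02` tolerance below are assumptions about what `reports/metrics.json` contains:

```python
import json  # in CI: json.load(open("reports/metrics.json")) for both sides

def regressed(current: dict, baseline: dict, tol: float = 0.02) -> list:
    """Names of numeric metrics that dropped more than `tol` below baseline."""
    return [name for name, value in baseline.items()
            if isinstance(value, (int, float))
            and current.get(name, 0.0) < value - tol]

baseline = {"accuracy": 0.91, "macro_f1": 0.88}
current = {"accuracy": 0.92, "macro_f1": 0.84}
print(regressed(current, baseline))  # ['macro_f1']
```

Failing the build when this list is non-empty gives a cheap guardrail against silent model degradation.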
## Project Layout
```
mic-id/
ββ app.py # Streamlit UI for uploading and scoring clips
ββ predict.py # CLI scorer with friendly device names
ββ train.py # Dataset loader, model trainer, metric exporter
ββ configs/ # YAML training configs + device provenance defaults
ββ features.py # Audio feature extraction helpers
ββ utils.py # Command-line recorder for new device samples
ββ data/ # Per-device waveforms and provenance metadata
β ββ metadata.csv # Clip-level provenance (source/licence/hash)
ββ models/ # Saved classifier + label encoder
ββ reports/ # Metrics JSON and confusion matrix plots
ββ docs/ # Data sourcing guide and prep notes
ββ scripts/ # Dataset preparation helpers (TAU, Freesound, etc.)
ββ uploads/ # Cached demo uploads saved by the Streamlit app
```
## Roadmap
- 🛰️ Add a lightweight CNN baseline alongside the gradient boosting model for comparison.
- 🧪 Ship augmentation scripts (noise, EQ, impulse responses) to spotlight microphone colouration differences.
- 🔐 Wire metadata/hash validation into CI so new clips are rejected unless provenance is complete.
- 📦 Polish export helpers so the app can bundle probabilities + features in one download.
## Contributing
Issues and pull requests are welcome. 🤝 If you contribute new devices, include a short note (or a `metadata.csv` entry) describing the capture setup so others can reproduce your results and audit licensing.
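A hypothetical entry might look like this; the column names are assumptions, so match the header already present in `data/metadata.csv`:

```csv
filename,device_label,source,licence,sha256
iphone/clip_09.wav,iphone_se,self-recorded,CC-BY-4.0,<hash from refresh_metadata.py>
```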