---
title: Mic-ID
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.31.1
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
---
# Mic-ID
A Streamlit front-end around a microphone fingerprinting baseline: drop in a short clip, get the most likely capture device plus an optional tonal hint. Built for quick lab demos and for showing how far classic features still go.
## Table of Contents
- Highlights
- Live Demo Flow
- Quick Start
- Controls at a Glance
- Device Recognition
- Scale Detection
- Bundled Example Clips
- Download Contents
- Testing
- Project Layout
- Roadmap
- Contributing
## Highlights
- π End-to-end workflow for collecting, training, and demoing mic classification in one repo.
- ποΈ Feature-first approach: log-mel, MFCC, and spectral stats feed a histogram gradient boosting model.
- π§ Friendly predictions: class IDs map to real device names so you can narrate results without decoding labels.
- ποΈ Lightweight artefacts: plain
.wavfolders indata/, pickled models inmodels/, metrics and confusion heatmaps inreports/. - βοΈ Streamlit UI mirrors the CLI helpers, including loudness normalisation and experimental scale read-outs.
## Live Demo Flow
If you are running a live session, keep this script handy:
- `streamlit run app.py` from the project root.
- Use `data/audio/airport-helsinki-204-6138-a.wav` to introduce the core upload flow and the default top-3 guess list.
- Swap to `data/audio/airport-helsinki-204-6138-b.wav` or `data/audio/airport-helsinki-204-6138-c.wav` to highlight how the twin scene shifts the predicted device while the environment stays constant.
- Jump to `data/iphone/clip_05.wav` to show the locally recorded class and talk about adding in-house gear with `utils.py`.
- Mention the probability bar chart and the saved copy under `uploads/hooks - <filename>` for later analysis.
## Quick Start
Four commands set everything up; the fifth, retraining, is optional:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 scripts/refresh_metadata.py          # rebuild hashes + provenance records
python3 train.py --config configs/base.yaml  # optional if you want to refresh the model
```
Then launch the app with `streamlit run app.py` (defaults to http://localhost:8501).
## Hugging Face Space Setup
Want a hosted demo? This repo is ready to drop into a Hugging Face Space using the Streamlit SDK. The short version:
- `pip install -U "huggingface_hub[cli]"` and run `huggingface-cli login` with a write-scoped access token.
- `git clone` your Space (for example https://huggingface.co/spaces/connaaa/mic-id) into an empty folder.
- Copy the contents of this repository into that clone, keeping `README.md`, `app.py`, `requirements.txt`, `packages.txt`, `models/`, and the curated `data/` subsets you want online.
- Commit and `git push`. The Space will build the dependencies listed in `requirements.txt` plus Debian packages from `packages.txt`.
Large training corpora can be trimmed before pushing if you only need the pretrained model for inference.
## Controls at a Glance
| Control | Default | What it does |
|---|---|---|
| File uploader | – | Accepts WAV/MP3/M4A, converts to 16 kHz mono, and normalises loudness before scoring. |
| "How many guesses should we list?" slider | 3 | Sets the length of the ranked prediction list and bar chart. |
| Training data expander | Collapsed | Recaps which datasets went into the current checkpoint, handy during demos. |
| Prediction pane | Auto | Shows the tonal estimate (if any), RMS loudness, ranked devices, and probability chart. |
Each control includes inline help text so presenters can improvise without notes.
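
The sketch below gives a rough idea of what the uploader column above describes. It is illustrative only, not the app's actual code: the target sample rate matches the table, but the RMS-based loudness normalisation and its target level are assumptions.

```python
# Illustrative sketch (not the app's exact preprocessing): decode, downmix to
# mono, resample to 16 kHz, then apply a simple RMS loudness normalisation.
import librosa
import numpy as np

def prepare_clip(path: str, target_sr: int = 16000, target_rms: float = 0.1) -> np.ndarray:
    y, _ = librosa.load(path, sr=target_sr, mono=True)  # decode + downmix + resample
    rms = max(float(np.sqrt(np.mean(y ** 2))), 1e-9)     # guard against silent clips
    return np.clip(y * (target_rms / rms), -1.0, 1.0)    # scale towards the target loudness
```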
## Device Recognition
- Audio flows through `features.extract_features`, stitching log-mel and MFCC statistics with zero-crossing, centroid, roll-off, and flatness cues.
- `python3 train.py --config configs/base.yaml` reads the provenance metadata, enforces per-device clip minimums, and fits a `HistGradientBoostingClassifier` before saving artefacts to `models/model.pkl` plus the label encoder.
- Every training run exports `reports/metrics.json`, `reports/confusion_matrix.png`, and a timestamped `reports/runs/run-*.json` snapshot so you can cite precision/recall live.
- The app and CLI surface friendly names (e.g. "Zoom F8 field recorder") pulled from `devices.describe_label()` to keep the story human-readable.
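
For a concrete picture of the feature recipe above, here is a minimal sketch. It is not the repo's `features.extract_features` implementation; the coefficient counts, the mean/std summaries, and the feature ordering are assumptions.

```python
# Minimal sketch of a log-mel + MFCC + spectral-stats feature vector,
# in the spirit of the pipeline described above (details are assumed).
import librosa
import numpy as np

def extract_features_sketch(path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr, mono=True)
    log_mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    extras = np.vstack([
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.spectral_flatness(y=y),
    ])

    def stats(m: np.ndarray) -> np.ndarray:
        # summarise each coefficient track over time with mean and std
        return np.hstack([m.mean(axis=1), m.std(axis=1)])

    return np.hstack([stats(log_mel), stats(mfcc), stats(extras)])
```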
## Scale Detection
- Uses a simple `librosa` chroma profile match across all major/minor keys.
- High confidence (≥ 0.6) renders a green highlight, 0.4–0.6 shows an amber "low confidence" tag, and anything lower hides the scale suggestion entirely.
- Purely percussive or noisy clips skip the tonal hint, which is exactly what you want for location recordings.
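
A chroma-template key match of this kind can be sketched as below. The Krumhansl-style profiles and the correlation scoring are assumptions, not the app's exact code; only the 0.4/0.6 display thresholds come from the behaviour described above.

```python
# Rough sketch of a major/minor key guess from averaged chroma (details assumed).
import librosa
import numpy as np

# Krumhansl-Kessler key profiles, anchored on C
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def guess_scale(path: str):
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    best_score, best_name = -1.0, None
    for mode, template in (("major", MAJOR), ("minor", MINOR)):
        for shift in range(12):  # rotate the profile through all 12 tonics
            score = float(np.corrcoef(np.roll(template, shift), chroma)[0, 1])
            if score > best_score:
                best_score, best_name = score, f"{NOTES[shift]} {mode}"
    if best_score < 0.4:
        return None  # too uncertain: hide the suggestion
    return best_name, round(best_score, 2)  # >= 0.6 would earn the green highlight
```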
## Bundled Example Clips
All sample audio lives under `data/` and mirrors the device IDs referenced in the demo.
| Folder | What it represents | Count* |
|---|---|---|
| `audio/` | TAU Urban Acoustic Scenes clips (device A) – Zoom F8 field recorder | 295 |
| `audio2/` | TAU Urban Acoustic Scenes clips (device B) – Samsung Galaxy S7 | 295 |
| `audio9/` | TAU Urban Acoustic Scenes clips (device C) – iPhone SE | 295 |
| `iphone/` | Locally recorded iPhone speech snippets captured with `utils.py` | 4 |
| `laptop/` | MacBook built-in mic samples recorded in a treated room | 4 |
| `outtakes/` | Extra captures you can promote into training data after curation | varies |

\*Counts based on the current repo snapshot; refresh `data/` to rebalance as needed.
## Download Contents
Every run generates artefacts you can drop into a slide deck or share with collaborators:
- `models/model.pkl` and `models/label_encoder.pkl` store the trained classifier and label map.
- `reports/metrics.json` plus `reports/confusion_matrix.png` capture evaluation snapshots for the latest training session.
- `data/metadata.csv` tracks every clip's provenance, licence, and hash for reproducible retrains.
- `reports/runs/run-*.json` snapshots record the exact config, dataset summary, and hashes used for each training run.
- Uploaded clips are preserved under `uploads/hooks - <original-name>` so you can replay or re-label them later.
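
If you want to reuse those artefacts outside the app, a sketch like this works, assuming `features.extract_features` accepts a file path and returns a flat feature vector (check the real signature in `features.py` before copying it).

```python
# Sketch: load the pickled artefacts and score one clip offline (assumptions noted above).
import pickle

import numpy as np

from features import extract_features  # repo helper; exact signature assumed

with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)
with open("models/label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

vec = np.asarray(extract_features("data/iphone/clip_05.wav")).reshape(1, -1)
probs = model.predict_proba(vec)[0]
for idx in np.argsort(probs)[::-1][:3]:  # top-3 devices
    device = label_encoder.inverse_transform([idx])[0]
    print(f"{device}: {probs[idx]:.3f}")
```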
## Testing
Quick smoke checks live in the scripts themselves:
```bash
# Validate provenance without training
python3 train.py --dry-run

# Rebuild the model, metrics, and run snapshot
python3 train.py --config configs/base.yaml

# Score a few clips and verify probabilities look sane
python3 predict.py data/laptop/clip_01.wav data/iphone/clip_05.wav --topk 5
```
For deeper regression coverage, wire these commands into your CI and compare the resulting metrics JSON against previous baselines.
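
One possible shape for that baseline comparison is below. The baseline path, the tolerance, and the assumption that `metrics.json` holds flat numeric scores are all illustrative; adjust them to the file your training runs actually emit.

```python
# Sketch of a CI gate: fail if any numeric metric drops more than TOLERANCE
# below a checked-in baseline (paths and keys are assumptions).
import json
import sys

TOLERANCE = 0.02  # allow small run-to-run wobble

with open("reports/metrics.json") as f:
    current = json.load(f)
with open("reports/baseline_metrics.json") as f:  # hypothetical committed baseline
    baseline = json.load(f)

failures = [
    f"{key}: {current.get(key, 0):.3f} < {value - TOLERANCE:.3f}"
    for key, value in baseline.items()
    if isinstance(value, (int, float)) and current.get(key, 0) < value - TOLERANCE
]
if failures:
    sys.exit("Metrics regressed:\n" + "\n".join(failures))
print("Metrics within tolerance of baseline.")
```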
## Project Layout
```
mic-id/
├── app.py            # Streamlit UI for uploading and scoring clips
├── predict.py        # CLI scorer with friendly device names
├── train.py          # Dataset loader, model trainer, metric exporter
├── configs/          # YAML training configs + device provenance defaults
├── features.py       # Audio feature extraction helpers
├── utils.py          # Command-line recorder for new device samples
├── data/             # Per-device waveforms and provenance metadata
│   └── metadata.csv  # Clip-level provenance (source/licence/hash)
├── models/           # Saved classifier + label encoder
├── reports/          # Metrics JSON and confusion matrix plots
├── docs/             # Data sourcing guide and prep notes
├── scripts/          # Dataset preparation helpers (TAU, Freesound, etc.)
└── uploads/          # Cached demo uploads saved by the Streamlit app
```
## Roadmap
- Add a lightweight CNN baseline alongside the gradient boosting model for comparison.
- Ship augmentation scripts (noise, EQ, impulse responses) to spotlight microphone colouration differences.
- Wire metadata/hash validation into CI so new clips are rejected unless provenance is complete.
- Polish export helpers so the app can bundle probabilities + features in one download.
## Contributing
Issues and pull requests are welcome. If you contribute new devices, include a short note (or a `metadata.csv` entry) describing the capture setup so others can reproduce your results and audit licensing.