---
title: Mic-ID
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.31.1
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
---

Mic-ID


A Streamlit front-end around a microphone fingerprinting baseline: drop in a short clip, get the most likely capture device plus an optional tonal hint. 🎙️ Built for quick lab demos and perfect for showing how far classic features still go.

Table of Contents

  • Highlights
  • Live Demo Flow
  • Quick Start
  • Hugging Face Space Setup
  • Controls at a Glance
  • Device Recognition
  • Scale Detection
  • Bundled Example Clips
  • Download Contents
  • Testing
  • Project Layout
  • Roadmap
  • Contributing

Highlights

  • 🔎 End-to-end workflow for collecting, training, and demoing mic classification in one repo.
  • 🎛️ Feature-first approach: log-mel, MFCC, and spectral stats feed a histogram gradient boosting model.
  • 🧠 Friendly predictions: class IDs map to real device names so you can narrate results without decoding labels.
  • 🗂️ Lightweight artefacts: plain .wav folders in data/, pickled models in models/, metrics and confusion heatmaps in reports/.
  • ⚙️ Streamlit UI mirrors the CLI helpers, including loudness normalisation and experimental scale read-outs.

Live Demo Flow

If you are running a live session, keep this script handy:

  • 🎧 Run streamlit run app.py from the project root.
  • 📂 Use data/audio/airport-helsinki-204-6138-a.wav to introduce the core upload flow and the default top-3 guess list.
  • 🔄 Swap to data/audio/airport-helsinki-204-6138-b.wav or data/audio/airport-helsinki-204-6138-c.wav to highlight how the twin scene shifts the predicted device while the environment stays constant.
  • 📱 Jump to data/iphone/clip_05.wav to show the locally recorded class and talk about adding in-house gear with utils.py.
  • 📊 Mention the probability bar chart and the saved copy under uploads/hooks - <filename> for later analysis.

Quick Start

⚡ Four commands set everything up:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 scripts/refresh_metadata.py  # rebuild hashes + provenance records
python3 train.py --config configs/base.yaml  # optional if you want to refresh the model
```

Then launch the app with streamlit run app.py (defaults to http://localhost:8501).

Hugging Face Space Setup

Want a hosted demo? This repo is ready to drop into a Hugging Face Space using the Streamlit SDK. The short version:

  1. pip install -U "huggingface_hub[cli]" and run huggingface-cli login with a write-scoped access token.
  2. git clone your Space (for example https://huggingface.co/spaces/connaaa/mic-id) into an empty folder.
  3. Copy the contents of this repository into that clone, keeping README.md, app.py, requirements.txt, packages.txt, models/, and the curated data/ subsets you want online.
  4. Commit and git push. The Space will build the dependencies listed in requirements.txt plus Debian packages from packages.txt.

Large training corpora can be trimmed before pushing if you only need the pretrained model for inference.
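
In practice that boils down to a handful of commands. A rough sketch, assuming the Space is cloned beside this repository and the copy list mirrors step 3 (the commit message is just an example):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login                      # paste a write-scoped token
git clone https://huggingface.co/spaces/connaaa/mic-id
cp -r README.md app.py requirements.txt packages.txt models/ data/ mic-id/
cd mic-id
git add . && git commit -m "Sync Mic-ID demo" && git push
```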

Controls at a Glance

| Control | Default | What it does |
| --- | --- | --- |
| File uploader | – | Accepts WAV/MP3/M4A, converts to 16 kHz mono, and normalises loudness before scoring. |
| "How many guesses should we list?" slider | 3 | Sets the length of the ranked prediction list and bar chart. |
| Training data expander | Collapsed | Recaps which datasets went into the current checkpoint, handy during demos. |
| Prediction pane | Auto | Shows the tonal estimate (if any), RMS loudness, ranked devices, and probability chart. |

Each control includes inline help text so presenters can improvise without notes; a sketch of the uploader's preprocessing step follows.
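
For reference, the uploader's preprocessing can be sketched in a few lines. This is a minimal illustration, assuming RMS-based loudness normalisation and a 0.1 target level; app.py's exact logic may differ:

```python
import numpy as np
import librosa

def load_clip(path, target_rms=0.1):
    # Decode the upload, downmix to mono, and resample to 16 kHz
    y, sr = librosa.load(path, sr=16000, mono=True)
    # Simple RMS loudness normalisation so quiet clips score comparably
    rms = np.sqrt(np.mean(y ** 2))
    if rms > 0:
        y = y * (target_rms / rms)
    return y, sr
```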

Device Recognition

  • 🧱 Audio flows through features.extract_features, stitching log-mel and MFCC statistics with zero-crossing, centroid, roll-off, and flatness cues.
  • 🌲 python3 train.py --config configs/base.yaml reads the provenance metadata, enforces per-device clip minimums, and fits a HistGradientBoostingClassifier before saving artefacts to models/model.pkl plus the label encoder (a condensed sketch of this pipeline follows the list).
  • 📈 Every training run exports reports/metrics.json, reports/confusion_matrix.png, and a timestamped reports/runs/run-*.json snapshot so you can cite precision/recall live.
  • 🏷️ The app and CLI surface friendly names (e.g. "Zoom F8 field recorder") pulled from devices.describe_label() to keep the story human-readable.

Scale Detection

  • 🎼 Uses a simple librosa chroma profile match across all major/minor keys (sketched below).
  • ✅ High confidence (≥ 0.6) renders a green highlight, 0.4–0.6 shows an amber "low confidence" tag, and anything lower hides the scale suggestion entirely.
  • 🥁 Purely percussive or noisy clips skip the tonal hint, which is exactly what you want for location recordings.
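
One plausible reading of that profile match, assuming Krumhansl-style key profiles and a correlation score used as the confidence value (the app's exact profiles and thresholds may differ):

```python
import numpy as np
import librosa

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles; other profile choices would work similarly
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_scale(y, sr=16000):
    # Average chroma over time, then correlate against all 24 rotated key profiles
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    best_key, best_score = None, -1.0
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for shift in range(12):
            score = np.corrcoef(chroma, np.roll(profile, shift))[0, 1]
            if score > best_score:
                best_key, best_score = f"{NOTES[shift]} {mode}", score
    return best_key, best_score  # hide the hint when the score falls below ~0.4
```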

Bundled Example Clips

All sample audio lives under data/ and mirrors the device IDs referenced in the demo.

| Folder | What it represents | Count* |
| --- | --- | --- |
| audio/ | TAU Urban Acoustic Scenes clips (device A) – Zoom F8 field recorder | 295 |
| audio2/ | TAU Urban Acoustic Scenes clips (device B) – Samsung Galaxy S7 | 295 |
| audio9/ | TAU Urban Acoustic Scenes clips (device C) – iPhone SE | 295 |
| iphone/ | Locally recorded iPhone speech snippets captured with utils.py | 4 |
| laptop/ | MacBook built-in mic samples recorded in a treated room | 4 |
| outtakes/ | Extra captures you can promote into training data after curation | varies |

*Counts based on the current repo snapshot; refresh data/ to rebalance as needed. A minimal recorder sketch for adding your own clips follows.
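
utils.py handles local capture from the command line; its real interface may differ, but a minimal recorder has roughly this shape. The duration, sample rate, output path under data/outtakes/, and the sounddevice/soundfile packages are illustrative assumptions:

```python
import sounddevice as sd
import soundfile as sf

def record_clip(path="data/outtakes/new_clip.wav", seconds=5, sr=16000):
    # Capture mono audio from the default input device and save it as WAV
    audio = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
    sd.wait()  # block until the recording has finished
    sf.write(path, audio, sr)

record_clip()
```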

Download Contents

Every run generates artefacts you can drop into a slide deck or share with collaborators:

  • 🎯 models/model.pkl and models/label_encoder.pkl store the trained classifier and label map.
  • 📊 reports/metrics.json plus reports/confusion_matrix.png capture evaluation snapshots for the latest training session.
  • 🧾 data/metadata.csv tracks every clip's provenance, licence, and hash for reproducible retrains (see the hashing sketch after this list).
  • 🗂️ reports/runs/run-*.json snapshots record the exact config, dataset summary, and hashes used for each training run.
  • 📁 Uploaded clips are preserved under uploads/hooks - <original-name> so you can replay or re-label them later.
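
The provenance bookkeeping itself is small: a content hash plus a few descriptive columns per clip. A rough sketch, with an assumed column order rather than the actual schema of data/metadata.csv:

```python
import csv
import hashlib
from pathlib import Path

def file_sha256(path):
    # Hash the raw bytes so a retrain can verify it used exactly this clip
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def append_clip(metadata_csv, clip_path, source, licence):
    # Columns are illustrative; mirror whatever header data/metadata.csv already uses
    with open(metadata_csv, "a", newline="") as f:
        csv.writer(f).writerow([clip_path, source, licence, file_sha256(clip_path)])
```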

Testing

Quick smoke checks live in the scripts themselves:

```bash
# Validate provenance without training
python3 train.py --dry-run

# Rebuild the model, metrics, and run snapshot
python3 train.py --config configs/base.yaml

# Score a few clips and verify probabilities look sane
python3 predict.py data/laptop/clip_01.wav data/iphone/clip_05.wav --topk 5
```

For deeper regression coverage, wire these commands into your CI and compare the resulting metrics JSON against previous baselines.
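
A small guard script is one way to do that comparison. The metric key and the baseline path below are placeholders; point them at whatever reports/metrics.json actually contains:

```python
import json
import sys

def check_metrics(current="reports/metrics.json",
                  baseline="reports/baseline_metrics.json",
                  key="accuracy", tolerance=0.02):
    # Fail the CI job if the tracked metric drops by more than `tolerance`
    cur = json.load(open(current))[key]
    base = json.load(open(baseline))[key]
    if cur < base - tolerance:
        sys.exit(f"{key} regressed: {cur:.3f} vs baseline {base:.3f}")
    print(f"{key} OK: {cur:.3f} (baseline {base:.3f})")

if __name__ == "__main__":
    check_metrics()
```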

Project Layout

```text
mic-id/
├─ app.py              # Streamlit UI for uploading and scoring clips
├─ predict.py          # CLI scorer with friendly device names
├─ train.py             # Dataset loader, model trainer, metric exporter
├─ configs/            # YAML training configs + device provenance defaults
├─ features.py         # Audio feature extraction helpers
├─ utils.py             # Command-line recorder for new device samples
├─ data/               # Per-device waveforms and provenance metadata
│   └─ metadata.csv    # Clip-level provenance (source/licence/hash)
├─ models/             # Saved classifier + label encoder
├─ reports/            # Metrics JSON and confusion matrix plots
├─ docs/               # Data sourcing guide and prep notes
├─ scripts/            # Dataset preparation helpers (TAU, Freesound, etc.)
└─ uploads/            # Cached demo uploads saved by the Streamlit app
```

Roadmap

  • 🛰️ Add a lightweight CNN baseline alongside the gradient boosting model for comparison.
  • 🧪 Ship augmentation scripts (noise, EQ, impulse responses) to spotlight microphone colouration differences.
  • 🔐 Wire metadata/hash validation into CI so new clips are rejected unless provenance is complete.
  • 📦 Polish export helpers so the app can bundle probabilities + features in one download.

Contributing

Issues and pull requests are welcome. 🤝 If you contribute new devices, include a short note (or a metadata.csv entry) describing the capture setup so others can reproduce your results and audit licensing.