---
license: cc-by-4.0
language:
- multilingual
tags:
- speaker-embedding
- speaker-recognition
- diarization
- litert
- tflite
- on-device
- soniqo
- speech-cloud
- speech-core
base_model: pyannote/wespeaker-voxceleb-resnet34-LM
library_name: litert
pipeline_tag: audio-classification
---

# WeSpeaker ResNet34-LM - LiteRT

Speaker embedding model for speaker identification and diarization clustering.

> Part of the [**soniqo.audio**](https://soniqo.audio) speech toolkit,
> an open, runtime-portable stack for speech AI. This bundle is the
> **LiteRT** export, designed to plug into the abstract interfaces in
> [`speech-core`](https://github.com/soniqo/speech-core) (C++ voice-agent
> orchestration library). Browse all LiteRT bundles in the
> [**soniqo LiteRT collection**](https://huggingface.co/collections/soniqo/litert-6a08268e11d5a47d7aacc02b).

## Use cases on soniqo.audio

- [Meeting transcription](https://soniqo.audio/transcription/)

256-dim speaker embedding network for Android, ported from
`pyannote/wespeaker-voxceleb-resnet34-LM`.

## Model

| Property | Value |
|---|---|
| Architecture | ResNet34 + stats pooling + linear projection |
| Parameters | ~6.6 M |
| Format | LiteRT (TFLite) |
| Quantization | float32 |
| Sample rate | 16 000 Hz |
| Input | 80-bin Kaldi-style mel fbank features (T frames) |
| Output | L2-normalized 256-dim embedding |

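Because the output is L2-normalized, the cosine similarity between two embeddings reduces to their dot product, which is the usual score for comparing speakers. The sketch below computes cosine similarity defensively (dividing by the norms in case an input is not unit-length) and adds a greedy online clustering pass as one simple way to group segments by speaker. The `cosine` and `assignSpeakers` names and the 0.5 default threshold are hypothetical placeholders; this greedy scheme is an illustration, not pyannote's own clustering, and the threshold should be tuned on your own data.

```kotlin
import kotlin.math.sqrt

// Cosine similarity. For L2-normalized embeddings (as this model outputs)
// both norms are ~1, so the result is effectively the dot product.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "embedding sizes differ: ${a.size} vs ${b.size}" }
    var dot = 0.0
    var na = 0.0
    var nb = 0.0
    for (i in a.indices) {
        val x = a[i].toDouble()
        val y = b[i].toDouble()
        dot += x * y
        na += x * x
        nb += y * y
    }
    return (dot / (sqrt(na) * sqrt(nb))).toFloat()
}

// Illustrative greedy online clustering (hypothetical helper, not pyannote's method):
// each embedding joins the most similar existing speaker if the similarity clears
// `threshold`; otherwise it starts a new speaker. Centroids are running means of
// member embeddings. Returns one speaker index per input embedding.
fun assignSpeakers(embeddings: List<FloatArray>, threshold: Float = 0.5f): IntArray {
    val centroids = mutableListOf<FloatArray>()
    val counts = mutableListOf<Int>()
    return IntArray(embeddings.size) { i ->
        val e = embeddings[i]
        val best = centroids.indices.maxByOrNull { cosine(e, centroids[it]) }
        if (best != null && cosine(e, centroids[best]) >= threshold) {
            val c = centroids[best]
            val n = counts[best]
            for (d in c.indices) c[d] = (c[d] * n + e[d]) / (n + 1)
            counts[best] = n + 1
            best
        } else {
            centroids.add(e.copyOf())
            counts.add(1)
            centroids.size - 1
        }
    }
}
```

For example, `assignSpeakers(listOf(e1, e2, e3))` with per-segment embeddings from this model returns labels such as `[0, 1, 0]`. A greedy pass depends on input order; offline diarization usually re-clusters all segments together.
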
## Files

| File | Size | Description |
|---|---|---|
| `wespeaker-resnet34.tflite` | 25.4 MB | Full model, FP32 |
| `config.json` | 1 KB | Fbank spec + I/O signature |

## Why fbank-as-input

pyannote's Kaldi fbank implementation uses `torch.hamming_window` and
`aten._fft_r2c`, neither of which has a lowering in litert-torch. We
export only the ResNet34 portion; the caller computes the 80-bin fbank
features on-device. This matches the standard mobile speaker-embedding
pattern and keeps the TFLite graph free of FFT ops.

### Fbank parameters

| Parameter | Value |
|---|---|
| `num_mel_bins` | 80 |
| `frame_length` | 25 ms |
| `frame_shift` | 10 ms |
| `window_type` | hamming |
| `dither` | 0.0 |
| `use_energy` | false |

The reference implementation is `torchaudio.compliance.kaldi.fbank` with
these arguments. The model internally centers the features by subtracting
the per-bin mean over time (`features - mean(features, dim=1)` on the
`[1, T, 80]` input), so the caller may pass raw (uncentered) fbank output.

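A minimal pure-Kotlin sketch of this front end follows, for pipelines that cannot pull in a native DSP library. It assumes torchaudio's `kaldi.fbank` defaults for every parameter the table does not list: DC-offset removal, pre-emphasis 0.97, a symmetric Hamming window, zero-padding to a 512-point FFT (`round_to_power_of_two`), `snip_edges = true`, a power spectrum, mel filters from `low_freq = 20` Hz up to Nyquist, and a float32-epsilon floor before the log. The `KaldiFbank` object is hypothetical and not part of this bundle; validate its output against `torchaudio.compliance.kaldi.fbank` before relying on it.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.max
import kotlin.math.min
import kotlin.math.sin

// Hypothetical Kaldi-style fbank for 16 kHz mono audio.
// Assumption: parameters not listed in the card use torchaudio kaldi.fbank defaults.
object KaldiFbank {
    const val SAMPLE_RATE = 16000
    const val WINDOW = 400   // 25 ms at 16 kHz
    const val SHIFT = 160    // 10 ms at 16 kHz
    const val NFFT = 512     // next power of two >= WINDOW
    const val NUM_MEL = 80
    private const val PREEMPH = 0.97
    private const val LOW_FREQ = 20.0
    private const val EPS = 1.1920929e-7 // float32 machine epsilon, used as log floor

    // Kaldi mel scale
    private fun mel(f: Double) = 1127.0 * ln(1.0 + f / 700.0)

    // Symmetric (non-periodic) Hamming window
    private val window = DoubleArray(WINDOW) { n -> 0.54 - 0.46 * cos(2.0 * PI * n / (WINDOW - 1)) }
    private val cosTable = DoubleArray(NFFT) { cos(2.0 * PI * it / NFFT) }
    private val sinTable = DoubleArray(NFFT) { sin(2.0 * PI * it / NFFT) }

    // Triangular filters over FFT bins 0..NFFT/2; the Nyquist bin gets weight 0.
    private val melBanks: Array<DoubleArray> = run {
        val melLow = mel(LOW_FREQ)
        val melHigh = mel(SAMPLE_RATE / 2.0)
        val delta = (melHigh - melLow) / (NUM_MEL + 1)
        val binWidth = SAMPLE_RATE.toDouble() / NFFT
        Array(NUM_MEL) { m ->
            val left = melLow + m * delta
            val center = melLow + (m + 1) * delta
            val right = melLow + (m + 2) * delta
            DoubleArray(NFFT / 2 + 1) { k ->
                if (k == NFFT / 2) 0.0 else {
                    val mk = mel(binWidth * k)
                    val up = (mk - left) / (center - left)
                    val down = (right - mk) / (right - center)
                    max(0.0, min(up, down))
                }
            }
        }
    }

    // snip_edges = true: only frames that fit entirely inside the signal
    fun numFrames(numSamples: Int): Int =
        if (numSamples < WINDOW) 0 else 1 + (numSamples - WINDOW) / SHIFT

    // Returns [T][80] log-mel energies for mono samples at 16 kHz.
    fun compute(samples: FloatArray): Array<FloatArray> =
        Array(numFrames(samples.size)) { t ->
            val start = t * SHIFT
            // 1. Copy the frame and remove its DC offset
            val frame = DoubleArray(WINDOW) { samples[start + it].toDouble() }
            val mean = frame.average()
            for (i in frame.indices) frame[i] -= mean
            // 2. Pre-emphasis; the first sample uses itself as predecessor
            for (i in WINDOW - 1 downTo 1) frame[i] -= PREEMPH * frame[i - 1]
            frame[0] -= PREEMPH * frame[0]
            // 3. Apply the window (samples beyond WINDOW are the zero padding)
            for (i in 0 until WINDOW) frame[i] *= window[i]
            // 4. Power spectrum via a naive DFT (use a real FFT in production)
            val power = DoubleArray(NFFT / 2 + 1) { k ->
                var re = 0.0
                var im = 0.0
                for (n in 0 until WINDOW) {
                    val idx = (k * n) % NFFT
                    re += frame[n] * cosTable[idx]
                    im -= frame[n] * sinTable[idx]
                }
                re * re + im * im
            }
            // 5. Mel filterbank energies, floored, then natural log
            FloatArray(NUM_MEL) { m ->
                val bank = melBanks[m]
                var e = 0.0
                for (k in bank.indices) e += bank[k] * power[k]
                ln(max(e, EPS)).toFloat()
            }
        }
}
```

For a 3 s clip, `KaldiFbank.compute(samples)` returns 298 rows of 80 values, matching the input signature. Because the model subtracts the per-bin mean over time, a constant gain on the waveform shifts every log-mel value equally and cancels, except where energies hit the log floor.
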
## Signature

```
Inputs:
  fbank      [1, T, 80]  float32  Kaldi mel fbank, T=298 for 3 s @ 16 kHz
Outputs:
  embedding  [1, 256]    float32  L2-normalized speaker embedding
```

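The `T = 298` figure follows from Kaldi framing with `snip_edges = true` (torchaudio's default, assumed here): at 16 kHz a 25 ms window is W = 400 samples and a 10 ms shift is S = 160 samples, so for N = 48 000 samples (3 s):

$$
T = 1 + \left\lfloor \frac{N - W}{S} \right\rfloor
  = 1 + \left\lfloor \frac{48\,000 - 400}{160} \right\rfloor
  = 1 + \lfloor 297.5 \rfloor
  = 298
$$
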
## Parity

Verified `max diff = 4.2e-07` against the upstream pyannote model's full
forward pass on a random 3-second waveform (with Kaldi fbank features
computed externally).

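To repeat a parity check on your own target, compare the on-device embedding with a reference embedding produced by the upstream model from the same fbank input. A minimal sketch, assuming `max diff` means the element-wise maximum absolute difference; `maxAbsDiff` is a hypothetical helper, and loading the reference vector is left to your own code.

```kotlin
import kotlin.math.abs

// Largest element-wise absolute difference between two embeddings.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var m = 0f
    for (i in a.indices) m = maxOf(m, abs(a[i] - b[i]))
    return m
}
```

A check such as `check(maxAbsDiff(onDevice, reference) < 1e-5f)` then flags regressions; the 1e-5 tolerance is an arbitrary placeholder, looser than the figure reported above to allow for runtime differences.
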
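## Usage

```kotlin
import org.tensorflow.lite.Interpreter

// kaldiFbank and loadModelFile are app-provided helpers: compute 80-bin Kaldi fbank
// features with your preferred library ([T][80], T = 298 for 3 s @ 16 kHz) and
// memory-map the .tflite file into a ByteBuffer.
val fbank: Array<FloatArray> = kaldiFbank(audio, melBins = 80, frameLengthMs = 25, frameShiftMs = 10)
val model = Interpreter(loadModelFile("wespeaker-resnet34.tflite"))
// Tensors carry a batch dimension: input [1, T, 80], output [1, 256].
val output = Array(1) { FloatArray(256) }
model.run(arrayOf(fbank), output)
val embedding = output[0] // L2-normalized 256-dim speaker embedding
model.close()
```
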
## Source

Upstream: [pyannote/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM)
(CC BY 4.0, gated: accept the license on the upstream page).

## Links

- [speech-android](https://github.com/soniqo/speech-android): Android SDK
- [soniqo.audio](https://soniqo.audio): website
- [blog](https://soniqo.audio/blog): blog

## Ecosystem

- [**soniqo.audio**](https://soniqo.audio): use-case explorer (transcription, voice cloning, live ASR, voice agents).
- [**speech-core**](https://github.com/soniqo/speech-core): C++ orchestration library for voice agents. Abstract `STTInterface` / `TTSInterface` / `VADInterface` / `EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
- [**speech-swift**](https://github.com/soniqo/speech-swift): Apple Silicon MLX companion runtime (model-specific MLX bundles where applicable).
- [**speech-android**](https://github.com/soniqo/speech-android): Android SDK consuming on-device LiteRT bundles.

## Other LiteRT models in this collection

**ASR / Transcription**

- [Parakeet TDT 0.6B v3 - LiteRT (INT8)](https://huggingface.co/soniqo/Parakeet-TDT-0.6B-v3-LiteRT-INT8)
- [Nemotron Speech Streaming 0.6B - LiteRT](https://huggingface.co/soniqo/Nemotron-Speech-Streaming-LiteRT)
- [Omnilingual ASR CTC 300M - LiteRT](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT)
- [Omnilingual ASR CTC 300M - LiteRT (INT8)](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT-INT8)
- [Qwen3 ASR 0.6B Encoder - LiteRT (INT8)](https://huggingface.co/soniqo/Qwen3-ASR-0.6B-Encoder-LiteRT-INT8)

**VAD / Diarization**

- [Silero VAD v5 - LiteRT](https://huggingface.co/soniqo/Silero-VAD-v5-LiteRT)
- [Pyannote Segmentation 3.0 - LiteRT](https://huggingface.co/soniqo/Pyannote-Segmentation-LiteRT)

**TTS / Voice Cloning**

- [VoxCPM2 - LiteRT (INT8)](https://huggingface.co/soniqo/VoxCPM2-LiteRT-INT8)

## License

This bundle inherits the upstream model license (**cc-by-4.0**). See the
linked `base_model` repository for the full terms.