	---
	license: cc-by-4.0
	language:
	- multilingual
	tags:
	- speaker-embedding
	- speaker-recognition
	- diarization
	- litert
	- tflite
	- on-device
	- soniqo
	- speech-cloud
	- speech-core
	base_model: pyannote/wespeaker-voxceleb-resnet34-LM
	library_name: litert
	pipeline_tag: audio-classification
	---

	# WeSpeaker ResNet34-LM — LiteRT

	Speaker embedding for speaker identification and diarization clustering.

	> Part of the [soniqo.audio](https://soniqo.audio) speech toolkit —
	> an open, runtime-portable stack for speech AI. This bundle is the
	> LiteRT export, designed to plug into the abstract interfaces in
	> [`speech-core`](https://github.com/soniqo/speech-core) (C++ voice-agent
	> orchestration library). Browse all LiteRT bundles in the
	> [soniqo LiteRT collection](https://huggingface.co/collections/soniqo/litert-6a08268e11d5a47d7aacc02b).

	## Use cases on soniqo.audio

	- [Meeting transcription](https://soniqo.audio/transcription/)

	256-dim speaker embedding network for Android, ported from
	`pyannote/wespeaker-voxceleb-resnet34-LM`.

	## Model

	| Property | Value |
	|---|---|
	| Architecture | ResNet34 + stats pooling + linear projection |
	| Parameters | ~6.6 M |
	| Format | LiteRT (TFLite) |
	| Quantization | float32 |
	| Sample rate | 16 000 Hz |
	| Input | 80-bin kaldi-style mel fbank features (T frames) |
	| Output | L2-normalized 256-dim embedding |
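
Because the output is L2-normalized, comparing two speakers reduces to cosine similarity, which for unit vectors equals the plain dot product. The sketch below is a minimal illustration; `cosineSimilarity` is a hypothetical helper (it also normalizes, so it works on unnormalized vectors), and the "same speaker" decision threshold is application-specific and not specified by this card.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: cosine similarity between two speaker embeddings.
// For the L2-normalized vectors this model emits, this equals the dot product.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "embedding sizes differ: ${a.size} vs ${b.size}" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if (normA == 0.0 || normB == 0.0) return 0f  // degenerate (all-zero) input
    return (dot / (sqrt(normA) * sqrt(normB))).toFloat()
}
```

Higher scores mean more similar voices; for identification, score a query embedding against each enrolled speaker and pick the best match above your chosen threshold.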

	## Files

	| File | Size | Description |
	|---|---|---|
	| `wespeaker-resnet34.tflite` | 25.4 MB | Full model, FP32 |
	| `config.json` | 1 KB | Fbank spec + I/O signature |

	## Why fbank-as-input

	pyannote's Kaldi fbank front end uses `torch.hamming_window` and
	`aten._fft_r2c`, neither of which has a lowering in litert-torch. We
	therefore export only the ResNet34 portion; the caller computes the 80-bin
	fbank features on-device. This mirrors the usual split for on-device
	speaker-embedding models (feature extraction in app code, network in the
	runtime) and keeps the TFLite graph free of FFT ops.

	### Fbank parameters

	| Parameter | Value |
	|---|---|
	| `num_mel_bins` | 80 |
	| `frame_length` | 25 ms |
	| `frame_shift` | 10 ms |
	| `window_type` | hamming |
	| `dither` | 0.0 |
	| `use_energy` | false |

	The reference implementation is `torchaudio.compliance.kaldi.fbank` with
	these arguments. The model applies `features - mean(features, dim=1)` internally,
	i.e. it subtracts each mel bin's mean over the T frames, so the caller may pass
	raw (uncentered) fbank output.
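
A minimal Kotlin sketch of that front end follows, combining the parameters in the table with what we understand to be torchaudio's default `kaldi.fbank` arguments for everything else (whole windows only, DC-offset removal, pre-emphasis 0.97, window zero-padded to the next power of two, power spectrum, 20 Hz lower mel edge, log with a float32-epsilon floor). `kaldiFbankSketch` is a hypothetical name, and it uses a naive DFT for readability; swap in a real FFT and compare its output against `torchaudio.compliance.kaldi.fbank` before relying on it. It returns uncentered features, which the model accepts as-is.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.max
import kotlin.math.min
import kotlin.math.sin

// Hypothetical helper: Kaldi-style log-mel fbank, returning [T][numMelBins] (uncentered).
// Defaults beyond this card's table are assumed from torchaudio's kaldi.fbank signature.
fun kaldiFbankSketch(
    samples: FloatArray,          // mono PCM at sampleRate
    sampleRate: Int = 16_000,
    numMelBins: Int = 80,
    frameLengthMs: Int = 25,
    frameShiftMs: Int = 10,
    lowFreq: Double = 20.0,       // assumed Kaldi default low_freq
    preemph: Double = 0.97        // assumed Kaldi default preemphasis_coefficient
): Array<FloatArray> {
    val winLen = sampleRate * frameLengthMs / 1000        // 400 samples
    val shift = sampleRate * frameShiftMs / 1000          // 160 samples
    if (samples.size < winLen) return emptyArray()
    val numFrames = 1 + (samples.size - winLen) / shift   // whole windows only
    var padded = 1
    while (padded < winLen) padded *= 2                   // 512: next power of two
    val numFftBins = padded / 2                           // Nyquist bin gets zero weight

    // Symmetric Hamming window: 0.54 - 0.46 cos(2*pi*n / (N - 1))
    val window = DoubleArray(winLen) { 0.54 - 0.46 * cos(2 * PI * it / (winLen - 1)) }
    // Twiddle tables for the DFT; the angle index k*n wraps modulo `padded`
    val cosT = DoubleArray(padded) { cos(2 * PI * it / padded) }
    val sinT = DoubleArray(padded) { sin(2 * PI * it / padded) }

    // Triangular filters spaced evenly on Kaldi's mel scale 1127 ln(1 + f / 700)
    fun mel(f: Double) = 1127.0 * ln(1.0 + f / 700.0)
    val melLow = mel(lowFreq)
    val delta = (mel(sampleRate / 2.0) - melLow) / (numMelBins + 1)
    val binMel = DoubleArray(numFftBins) { mel(it * sampleRate.toDouble() / padded) }
    val filters = Array(numMelBins) { b ->
        val left = melLow + b * delta
        val center = left + delta
        val right = center + delta
        DoubleArray(numFftBins) { i ->
            max(0.0, min((binMel[i] - left) / (center - left), (right - binMel[i]) / (right - center)))
        }
    }

    val frame = DoubleArray(winLen)
    val power = DoubleArray(numFftBins)
    val eps = 1.1920929e-7                                // float32 machine epsilon
    return Array(numFrames) { t ->
        val start = t * shift
        var mean = 0.0
        for (n in 0 until winLen) mean += samples[start + n]
        mean /= winLen
        for (n in 0 until winLen) frame[n] = samples[start + n] - mean     // remove DC offset
        for (n in winLen - 1 downTo 1) frame[n] -= preemph * frame[n - 1]  // pre-emphasis
        frame[0] -= preemph * frame[0]
        for (n in 0 until winLen) frame[n] *= window[n]
        // Power spectrum of the frame zero-padded to `padded` samples (naive DFT)
        for (k in 0 until numFftBins) {
            var re = 0.0
            var im = 0.0
            for (n in 0 until winLen) {
                val idx = (k * n) % padded
                re += frame[n] * cosT[idx]
                im -= frame[n] * sinT[idx]
            }
            power[k] = re * re + im * im
        }
        FloatArray(numMelBins) { b ->
            var e = 0.0
            for (i in 0 until numFftBins) e += filters[b][i] * power[i]
            ln(max(e, eps)).toFloat()                     // log with epsilon floor
        }
    }
}
```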

	## Signature

	```
	Inputs:
	  fbank      [1, T, 80]  float32  Kaldi mel fbank, T = 298 for 3 s @ 16 kHz

	Outputs:
	  embedding  [1, 256]    float32  L2-normalized speaker embedding
	```
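
The T = 298 figure follows from Kaldi-style framing, assuming whole windows only (`snip_edges = true`, torchaudio's default): 3 s at 16 kHz is 48 000 samples, the window is 400 samples and the hop 160, so T = 1 + ⌊(48 000 − 400) / 160⌋ = 1 + 297 = 298. A minimal sketch of that arithmetic and its inverse, with hypothetical helper names:

```kotlin
// Hypothetical helper: Kaldi fbank frame count with snip_edges = true,
// i.e. only whole 25 ms windows, advancing 10 ms at a time.
fun numFbankFrames(
    numSamples: Int,
    sampleRate: Int = 16_000,
    frameLengthMs: Int = 25,
    frameShiftMs: Int = 10
): Int {
    val winLen = sampleRate * frameLengthMs / 1000  // 400 samples
    val shift = sampleRate * frameShiftMs / 1000    // 160 samples
    return if (numSamples < winLen) 0 else 1 + (numSamples - winLen) / shift
}

// Hypothetical inverse: shortest clip (in samples) that yields exactly `frames` frames.
fun samplesForFrames(
    frames: Int,
    sampleRate: Int = 16_000,
    frameLengthMs: Int = 25,
    frameShiftMs: Int = 10
): Int {
    val winLen = sampleRate * frameLengthMs / 1000
    val shift = sampleRate * frameShiftMs / 1000
    return (frames - 1) * shift + winLen
}
```

For example, `numFbankFrames(48_000)` gives 298, and `samplesForFrames(298)` gives 47 920, so any clip between 47 920 and 48 079 samples produces the same 298 frames.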

	## Parity

	Verified `max diff = 4.2e-07` against the full forward pass of the upstream
	pyannote model on a random 3-second waveform, with the Kaldi fbank features
	computed outside the exported graph.
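
To repeat a check like this on your own inputs, one option is to compare the TFLite embedding element-wise with a reference embedding produced by the upstream model from the same fbank features. The sketch assumes "max diff" means the maximum absolute element-wise difference; `maxAbsDiff` is a hypothetical helper, and the tolerance you compare it against is your choice.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Hypothetical helper: maximum absolute element-wise difference between two
// embeddings, e.g. TFLite output vs. a reference exported from PyTorch.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var m = 0f
    for (i in a.indices) m = max(m, abs(a[i] - b[i]))
    return m
}
```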

	## Usage

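	```kotlin
	import org.tensorflow.lite.Interpreter

	// kaldiFbank: placeholder for your on-device Kaldi fbank implementation
	// (80 bins, 25 ms / 10 ms, Hamming); returns Array<FloatArray> of shape [T][80].
	val fbank = kaldiFbank(audio, melBins = 80, frameLengthMs = 25, frameShiftMs = 10)

	// loadModelFile: app-provided helper that memory-maps the .tflite asset into a ByteBuffer.
	val model = Interpreter(loadModelFile("wespeaker-resnet34.tflite"))
	val input = arrayOf(fbank)             // [1, T, 80]
	val output = arrayOf(FloatArray(256))  // [1, 256]
	model.run(input, output)
	val embedding = output[0]              // L2-normalized 256-dim speaker embedding
	```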

	## Source

	Upstream: [pyannote/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM)
	(CC BY 4.0, gated — accept the license on the upstream page).

	## Links

	- [speech-android](https://github.com/soniqo/speech-android) — Android SDK
	- [soniqo.audio](https://soniqo.audio) — website
	- [blog](https://soniqo.audio/blog) — blog

	## Ecosystem

	- [soniqo.audio](https://soniqo.audio) — use-case explorer (transcription, voice cloning, live ASR, voice agents).
	- [speech-core](https://github.com/soniqo/speech-core) — C++ orchestration library for voice agents. Abstract `STTInterface` / `TTSInterface` / `VADInterface` / `EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
	- [speech-swift](https://github.com/soniqo/speech-swift) — Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
	- [speech-android](https://github.com/soniqo/speech-android) — Android SDK consuming on-device LiteRT bundles.

	## Other LiteRT models in this collection

	ASR / Transcription

	- [Parakeet TDT 0.6B v3 — LiteRT (INT8)](https://huggingface.co/soniqo/Parakeet-TDT-0.6B-v3-LiteRT-INT8)
	- [Nemotron Speech Streaming 0.6B — LiteRT](https://huggingface.co/soniqo/Nemotron-Speech-Streaming-LiteRT)
	- [Omnilingual ASR CTC 300M — LiteRT](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT)
	- [Omnilingual ASR CTC 300M — LiteRT (INT8)](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT-INT8)
	- [Qwen3 ASR 0.6B Encoder — LiteRT (INT8)](https://huggingface.co/soniqo/Qwen3-ASR-0.6B-Encoder-LiteRT-INT8)

	VAD / Diarization

	- [Silero VAD v5 — LiteRT](https://huggingface.co/soniqo/Silero-VAD-v5-LiteRT)
	- [Pyannote Segmentation 3.0 — LiteRT](https://huggingface.co/soniqo/Pyannote-Segmentation-LiteRT)

	TTS / Voice Cloning

	- [VoxCPM2 — LiteRT (INT8)](https://huggingface.co/soniqo/VoxCPM2-LiteRT-INT8)

	## License

	This bundle inherits the upstream model license (cc-by-4.0). See the
	linked `base_model` repository for the full terms.