---
license: cc-by-4.0
language:
  - multilingual
tags:
  - speaker-embedding
  - speaker-recognition
  - diarization
  - litert
  - tflite
  - on-device
  - soniqo
  - speech-cloud
  - speech-core
base_model: pyannote/wespeaker-voxceleb-resnet34-LM
library_name: litert
pipeline_tag: audio-classification
---

# WeSpeaker ResNet34-LM — LiteRT

Speaker embedding for speaker identification and diarization clustering.

> Part of the [**soniqo.audio**](https://soniqo.audio) speech toolkit —
> an open, runtime-portable stack for speech AI. This bundle is the
> **LiteRT** export, designed to plug into the abstract interfaces in
> [`speech-core`](https://github.com/soniqo/speech-core) (C++ voice-agent
> orchestration library). Browse all LiteRT bundles in the
> [**soniqo LiteRT collection**](https://huggingface.co/collections/soniqo/litert-6a08268e11d5a47d7aacc02b).

## Use cases on soniqo.audio

- [Meeting transcription](https://soniqo.audio/transcription/)

256-dim speaker embedding network for Android, ported from
`pyannote/wespeaker-voxceleb-resnet34-LM`.

## Model

| Property | Value |
|---|---|
| Architecture | ResNet34 + stats pooling + linear projection |
| Parameters | ~6.6 M |
| Format | LiteRT (TFLite) |
| Quantization | float32 |
| Sample rate | 16 000 Hz |
| Input | 80-bin Kaldi-style log mel filterbank (fbank) features, `[1, T, 80]` |
| Output | L2-normalized 256-dim embedding |
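The stats pooling stage in the architecture row is what reduces the frame axis to a single fixed-size vector before the linear projection to 256 dims. It concatenates the per-channel mean and the per-channel standard deviation over time. Below is a generic sketch of that operation. It assumes frame-level features held as `Array<FloatArray>` shaped `[T][C]`. The `statsPool` name, the unbiased variance, and the `1e-7` floor are illustrative choices and may not match the upstream WeSpeaker layer exactly.

```kotlin
import kotlin.math.sqrt

/**
 * Generic statistics pooling: collapses [T][C] frame-level features into a 2*C vector
 * (per-channel mean followed by per-channel standard deviation over the T axis).
 * The unbiased variance and the 1e-7 floor are illustrative, not taken from upstream.
 */
fun statsPool(frames: Array<FloatArray>): FloatArray {
    require(frames.size >= 2) { "need at least two frames" }
    val c = frames[0].size
    val out = FloatArray(2 * c)
    for (j in 0 until c) {
        var sum = 0.0
        for (f in frames) sum += f[j]
        val mean = sum / frames.size
        var sq = 0.0
        for (f in frames) sq += (f[j] - mean) * (f[j] - mean)
        out[j] = mean.toFloat()                                        // mean half
        out[c + j] = sqrt(sq / (frames.size - 1) + 1e-7).toFloat()     // std half
    }
    return out
}
```

Because the output size depends only on `C`, clips of different lengths produce pooled vectors of the same size.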

## Files

| File | Size | Description |
|---|---|---|
| `wespeaker-resnet34.tflite` | 25.4 MB | Full model, FP32 |
| `config.json` | 1 KB | Fbank spec + I/O signature |

## Why fbank-as-input

pyannote's Kaldi-style fbank front end relies on `torch.hamming_window` and
`aten._fft_r2c`, neither of which has a lowering in litert-torch. We therefore
export only the ResNet34 portion, and the caller computes the 80-bin fbank
features on-device. This matches the usual mobile speaker-embedding pattern
and keeps FFT ops out of the TFLite graph.

### Fbank parameters

| Parameter | Value |
|---|---|
| `num_mel_bins` | 80 |
| `frame_length` | 25 ms |
| `frame_shift` | 10 ms |
| `window_type` | hamming |
| `dither` | 0.0 |
| `use_energy` | false |
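At 16 kHz, a 25 ms frame is 400 samples and a 10 ms shift is 160 samples. The sketch below assumes Kaldi's default `snip_edges = true`, which `torchaudio.compliance.kaldi.fbank` also uses by default. Under that setting, only frames lying entirely inside the waveform are kept, so T = 1 + ⌊(N − 400) / 160⌋. For 3 s of audio (N = 48 000), this gives 1 + ⌊47 600 / 160⌋ = 1 + 297 = 298, which matches the T quoted in the signature. The helper names are illustrative.

```kotlin
val SAMPLE_RATE = 16_000
val FRAME_LENGTH = SAMPLE_RATE * 25 / 1000 // 400 samples per 25 ms frame
val FRAME_SHIFT = SAMPLE_RATE * 10 / 1000  // 160 samples per 10 ms shift

/** Number of fbank frames for [numSamples] of 16 kHz audio, assuming snip_edges = true. */
fun numFrames(numSamples: Int): Int =
    if (numSamples < FRAME_LENGTH) 0 else 1 + (numSamples - FRAME_LENGTH) / FRAME_SHIFT

/** First sample of frame [t]; the frame covers FRAME_LENGTH samples from there. */
fun frameStart(t: Int): Int = t * FRAME_SHIFT
```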

The reference implementation is `torchaudio.compliance.kaldi.fbank` called with
these arguments. The model internally subtracts each mel bin's mean over time
(`features - mean(features, dim=1)`, where dim 1 is the T axis of `[1, T, 80]`),
so the caller may pass raw, uncentered fbank output.
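The fbank front end has to run on-device, so a rough sketch of it follows. It tries to follow the `torchaudio.compliance.kaldi.fbank` pipeline using the parameters above plus what are assumed here to be that function's defaults. Those assumed defaults are DC-offset removal, pre-emphasis of 0.97, zero-padding to a 512-point FFT, a power spectrum, a Kaldi mel scale starting at 20 Hz, and a log floored at float32 epsilon.

This is an unverified approximation, not the exporter's code, and parity with torchaudio has not been checked. It also makes two further assumptions:

- It scales input by 2^15, on the assumption that pyannote's wrapper does the same.
- It uses a naive DFT for readability, where a real app would use an FFT library.

`computeFbank`, `melBanks`, `melScale`, and `centerOverTime` are hypothetical names.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.max
import kotlin.math.min
import kotlin.math.sin

/** Kaldi mel scale: 1127 * ln(1 + f / 700). */
fun melScale(hz: Double): Double = 1127.0 * ln(1.0 + hz / 700.0)

/** Triangular filters, equally spaced in mel from lowHz to Nyquist, over the first nFft / 2 bins. */
fun melBanks(numBins: Int, nFft: Int, sampleRate: Int, lowHz: Double = 20.0): Array<DoubleArray> {
    val melLow = melScale(lowHz)
    val delta = (melScale(sampleRate / 2.0) - melLow) / (numBins + 1)
    val binWidthHz = sampleRate.toDouble() / nFft
    return Array(numBins) { m ->
        val left = melLow + m * delta
        val center = left + delta
        val right = center + delta
        DoubleArray(nFft / 2) { k ->
            val mel = melScale(k * binWidthHz)
            max(0.0, min((mel - left) / (center - left), (right - mel) / (right - center)))
        }
    }
}

/** Log mel fbank of 16 kHz mono [samples] in [-1, 1], returned as [T][numMelBins]. */
fun computeFbank(samples: FloatArray, numMelBins: Int = 80, sampleRate: Int = 16_000): Array<FloatArray> {
    val frameLength = sampleRate * 25 / 1000 // 400 samples (25 ms)
    val frameShift = sampleRate * 10 / 1000  // 160 samples (10 ms)
    val nFft = 512                            // frameLength rounded up to a power of two
    val numFrames = if (samples.size < frameLength) 0 else 1 + (samples.size - frameLength) / frameShift
    val banks = melBanks(numMelBins, nFft, sampleRate)
    // Symmetric Hamming window, like torch.hamming_window(periodic = false).
    val window = DoubleArray(frameLength) { 0.54 - 0.46 * cos(2 * PI * it / (frameLength - 1)) }
    val cosTable = DoubleArray(nFft) { cos(2 * PI * it / nFft) }
    val sinTable = DoubleArray(nFft) { sin(2 * PI * it / nFft) }
    val logFloor = 1.1920929e-7               // float32 epsilon
    val frame = DoubleArray(frameLength)
    val power = DoubleArray(nFft / 2)
    return Array(numFrames) { t ->
        // Assumed 2^15 scaling to int16 range, as in pyannote's WeSpeaker wrapper.
        for (i in 0 until frameLength) frame[i] = samples[t * frameShift + i] * 32768.0
        val dc = frame.average()
        for (i in 0 until frameLength) frame[i] -= dc                       // remove DC offset
        for (i in frameLength - 1 downTo 1) frame[i] -= 0.97 * frame[i - 1] // pre-emphasis
        frame[0] -= 0.97 * frame[0]                                         // first sample uses itself
        for (i in 0 until frameLength) frame[i] *= window[i]
        // Naive DFT of the zero-padded frame; the Nyquist bin has no mel weight, so it is skipped.
        for (k in 0 until nFft / 2) {
            var re = 0.0
            var im = 0.0
            for (n in 0 until frameLength) {
                val idx = (k * n) % nFft
                re += frame[n] * cosTable[idx]
                im -= frame[n] * sinTable[idx]
            }
            power[k] = re * re + im * im
        }
        FloatArray(numMelBins) { m ->
            var energy = 0.0
            for (k in 0 until nFft / 2) energy += banks[m][k] * power[k]
            ln(max(energy, logFloor)).toFloat()
        }
    }
}

/** Optional: subtract each mel bin's mean over time (the model already does this internally). */
fun centerOverTime(feats: Array<FloatArray>): Array<FloatArray> {
    if (feats.isEmpty()) return feats
    val means = DoubleArray(feats[0].size) { j -> feats.map { it[j] }.average() }
    return Array(feats.size) { t -> FloatArray(means.size) { j -> (feats[t][j] - means[j]).toFloat() } }
}
```

A constant input gain such as the 2^15 factor shifts every log-mel value by the same amount (2·ln 32768). The model's internal mean-centering removes that shift, so the gain matters only where energies fall near the log floor.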

## Signature

```
Inputs:
  fbank         [1, T, 80]   float32   Kaldi mel fbank, T=298 for 3 s @ 16 kHz

Outputs:
  embedding     [1, 256]     float32   L2-normalized speaker embedding
```
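LiteRT's `Interpreter.run` accepts either nested Java arrays or a direct `ByteBuffer` in native byte order. For the `[1, T, 80]` float32 input, a direct buffer avoids allocating nested arrays on every call. It is written row-major, one frame after another. A minimal sketch follows, assuming the features are held as `Array<FloatArray>` shaped `[T][80]`. `packFbank` is a hypothetical helper name.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Packs [T][80] fbank features into a direct buffer laid out as the [1, T, 80] float32 input tensor. */
fun packFbank(fbank: Array<FloatArray>, numMelBins: Int = 80): ByteBuffer {
    val buffer = ByteBuffer.allocateDirect(4 * fbank.size * numMelBins) // 4 bytes per float32
        .order(ByteOrder.nativeOrder())                                 // LiteRT expects native byte order
    for (frame in fbank) {
        require(frame.size == numMelBins) { "expected $numMelBins mel bins, got ${frame.size}" }
        for (value in frame) buffer.putFloat(value) // frame t occupies floats [t*80, (t+1)*80)
    }
    buffer.rewind() // reset position so the interpreter reads from the start
    return buffer
}
```

The returned buffer can be passed as the input argument of `Interpreter.run` in place of a nested array.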

## Parity

Verified against the upstream pyannote model's full forward pass on a random
3-second waveform, with the Kaldi fbank features computed externally: the
maximum difference between the two embeddings is `4.2e-07`.
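The parity figure is presumably the largest absolute element-wise difference between the two 256-dim outputs. The minimal helper below reproduces that kind of check, for example against reference embeddings exported from the PyTorch model. The name `maxAbsDiff` is illustrative.

```kotlin
import kotlin.math.abs
import kotlin.math.max

/** Largest absolute element-wise difference between two embeddings of equal length. */
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var worst = 0f
    for (i in a.indices) worst = max(worst, abs(a[i] - b[i]))
    return worst
}
```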

## Usage

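```kotlin
import org.tensorflow.lite.Interpreter

// Compute 80-bin Kaldi fbank features on-device with your preferred library.
// `kaldiFbank` is a placeholder that returns Array<FloatArray> shaped [T][80].
val fbank = kaldiFbank(audio, melBins = 80, frameLengthMs = 25, frameShiftMs = 10)

// `loadModelFile` is an app-side helper that memory-maps the .tflite file.
val model = Interpreter(loadModelFile("wespeaker-resnet34.tflite"))
val output = Array(1) { FloatArray(256) }  // output tensor [1, 256]
model.run(arrayOf(fbank), output)          // input tensor [1, T, 80]
val embedding = output[0]
```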
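One common pattern for speaker identification keeps an enrolled embedding per known speaker. A new embedding is assigned to the most similar enrolled speaker. Since the outputs are L2-normalized, cosine similarity is just a dot product. The sketch below assumes that pattern. The `identify` function and any `threshold` value are hypothetical and would need tuning on your own data.

```kotlin
/** Dot product; equals cosine similarity when both vectors are L2-normalized. */
fun dot(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

/**
 * Returns the enrolled speaker most similar to [query], or null when the best
 * similarity is below [threshold] (a hypothetical, data-dependent value).
 */
fun identify(query: FloatArray, enrolled: Map<String, FloatArray>, threshold: Float): String? {
    val best = enrolled.maxByOrNull { (_, emb) -> dot(query, emb) } ?: return null
    return if (dot(query, best.value) >= threshold) best.key else null
}
```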

## Source

Upstream: [pyannote/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM)
(CC BY 4.0, gated — accept the license on the upstream page).

## Links

- [speech-android](https://github.com/soniqo/speech-android) — Android SDK
- [soniqo.audio](https://soniqo.audio) — website
- [blog](https://soniqo.audio/blog) — blog

## Ecosystem

- [**soniqo.audio**](https://soniqo.audio) — use-case explorer (transcription, voice cloning, live ASR, voice agents).
- [**speech-core**](https://github.com/soniqo/speech-core) — C++ orchestration library for voice agents. Abstract `STTInterface` / `TTSInterface` / `VADInterface` / `EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
- [**speech-swift**](https://github.com/soniqo/speech-swift) — Apple Silicon MLX companion runtime.
- [**speech-android**](https://github.com/soniqo/speech-android) — Android SDK consuming on-device LiteRT bundles.

## Other LiteRT models in this collection

**ASR / Transcription**

- [Parakeet TDT 0.6B v3 — LiteRT (INT8)](https://huggingface.co/soniqo/Parakeet-TDT-0.6B-v3-LiteRT-INT8)
- [Nemotron Speech Streaming 0.6B — LiteRT](https://huggingface.co/soniqo/Nemotron-Speech-Streaming-LiteRT)
- [Omnilingual ASR CTC 300M — LiteRT](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT)
- [Omnilingual ASR CTC 300M — LiteRT (INT8)](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT-INT8)
- [Qwen3 ASR 0.6B Encoder — LiteRT (INT8)](https://huggingface.co/soniqo/Qwen3-ASR-0.6B-Encoder-LiteRT-INT8)

**VAD / Diarization**

- [Silero VAD v5 — LiteRT](https://huggingface.co/soniqo/Silero-VAD-v5-LiteRT)
- [Pyannote Segmentation 3.0 — LiteRT](https://huggingface.co/soniqo/Pyannote-Segmentation-LiteRT)

**TTS / Voice Cloning**

- [VoxCPM2 — LiteRT (INT8)](https://huggingface.co/soniqo/VoxCPM2-LiteRT-INT8)

## License

This bundle inherits the upstream model license (**cc-by-4.0**). See the
linked `base_model` repository for the full terms.