# Parakeet TDT 0.6B v3 (MLX, BF16)

NVIDIA Parakeet TDT v3 automatic speech recognition model in MLX BF16 SafeTensors format for Apple Silicon. This is the reference BF16 checkpoint; see the quantized variants for reduced memory and faster inference:

- `sonic-speech/parakeet-tdt-0.6b-v3-int8` - Encoder INT8 (recommended)
- `sonic-speech/parakeet-tdt-0.6b-v3-int4` - Encoder INT4 (lite, for 8GB Macs)
## Benchmark Results (M3 Max, 64GB)

### Variant Comparison
| Variant | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
|---|---|---|---|---|---|
| BF16 | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
| INT8 | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
| INT4 | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
- LibriSpeech test-clean: 50 samples, studio-quality read speech
- TED-LIUM: 8 TED talks (60s segments), real-world acoustics
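WER here is the standard word error rate: word-level substitutions, insertions, and deletions divided by the reference word count. A minimal self-contained sketch of the metric (not the exact harness used for the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (S + I + D) / N via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 reference words, ~0.167
```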
### RTFx vs Audio Duration
| Audio duration | 5s | 15s | 30s | 60s | 120s |
|---|---|---|---|---|---|
| RTFx | 59x | 91x | 98x | 103x | 99x |
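RTFx is the real-time factor: audio duration divided by wall-clock processing time, so higher is faster. A quick illustration of what the 103x figure for 60 s clips implies:

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Real-time factor: seconds of audio transcribed per wall-clock second."""
    return audio_seconds / processing_seconds

# At the measured 103x for 60 s clips, processing time works out to:
processing = 60 / 103
print(f"{processing:.2f} s")  # 0.58 s to transcribe a minute of audio
```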
## Quantization Strategy
Encoder-only mixed precision: the Conformer encoder (~85% of parameters) is quantized while the decoder and joint network remain in BF16. This preserves decoder precision for rare words and punctuation.
| Variant | Size Reduction | Speed Gain | Memory Reduction | WER Impact |
|---|---|---|---|---|
| INT8 | -40% | +30% | -58% | None |
| INT4 | -61% | +34% | -67% | +0.4pp on real speech |
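The size figures follow directly from the parameter split. A back-of-the-envelope estimate, assuming 627M parameters with roughly 85% in the encoder and ignoring group-wise scale/bias overhead (which is why the real checkpoints come out somewhat larger):

```python
TOTAL_PARAMS = 627e6
ENCODER_FRAC = 0.85  # approximate share of parameters in the Conformer encoder

def size_mb(encoder_bits: int, other_bits: int = 16) -> float:
    """Rough checkpoint size in MB for encoder-only mixed precision."""
    enc = TOTAL_PARAMS * ENCODER_FRAC * encoder_bits / 8
    rest = TOTAL_PARAMS * (1 - ENCODER_FRAC) * other_bits / 8
    return (enc + rest) / 1e6

print(f"BF16: {size_mb(16):.0f} MB")  # ~1254 MB, matching the table
print(f"INT8: {size_mb(8):.0f} MB")   # ~721 MB before quantization overhead
print(f"INT4: {size_mb(4):.0f} MB")   # ~455 MB before quantization overhead
```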
## Usage
Install:
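A minimal setup sketch, assuming the community `parakeet-mlx` package and this card's repo naming; neither the package nor the exact CLI flags are confirmed by this card, so adjust to your actual tooling:

```shell
# Assumed workflow (hypothetical package and flags -- verify before use):
pip install parakeet-mlx

# Transcribe a local file with this BF16 checkpoint
# (repo id inferred from the quantized variants listed above):
parakeet-mlx audio.wav --model sonic-speech/parakeet-tdt-0.6b-v3
```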
## Model Details
- Architecture: Conformer encoder + TDT (Token-and-Duration Transducer) decoder
- Parameters: 627M
- Languages: 25 (SentencePiece tokenizer)
- Sample rate: 16 kHz
- Precision: BF16 (optimized for Apple Silicon)
## Origin
Weights from `mlx-community/parakeet-tdt-0.6b-v3`, converted from NVIDIA's official `nvidia/parakeet-tdt-0.6b-v3`.
Part of the Sonic Speech model collection for the Sonic local-first voice AI project.