Parakeet TDT 0.6B v3 (MLX, BF16)

NVIDIA Parakeet TDT v3 automatic speech recognition model in MLX BF16 SafeTensors format for Apple Silicon. This is the reference BF16 checkpoint; see the quantized variants for reduced memory and faster inference.

Benchmark Results (M3 Max, 64GB)

Variant Comparison

| Variant | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
|---------|------|-------------------|----------------|------|-------------|
| BF16 | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
| INT8 | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
| INT4 | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
  • LibriSpeech test-clean: 50 samples, studio-quality read speech
  • TED-LIUM: 8 TED talks (60s segments), real-world acoustics
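WER in the tables above is word error rate: the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of the standard definition (not the benchmark harness used for these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat down", "the cat sat down"))  # 0.0
print(wer("the cat sat down", "the cat sat dawn"))  # 0.25 (one substitution in four words)
```

A 0.82% WER therefore means fewer than one wrong word per hundred on LibriSpeech test-clean.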

RTFx vs Audio Duration

| Audio duration | 5s | 15s | 30s | 60s | 120s |
|----------------|----|----|----|----|-----|
| RTFx | 59x | 91x | 98x | 103x | 99x |
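RTFx is the inverse real-time factor: seconds of audio processed per second of wall-clock time, so higher is faster. Inverting it gives the expected transcription time for a clip, a small sketch:

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per wall-clock second."""
    return audio_seconds / processing_seconds

def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Invert RTFx to estimate wall-clock transcription time for a clip."""
    return audio_seconds / rtfx_value

# BF16 averages 73x in the variant table above: a 60 s clip takes ~0.82 s.
print(round(processing_time(60, 73), 2))  # 0.82
```

The dip at 5s reflects fixed per-call overhead (model warm-up, feature extraction) dominating short clips; throughput saturates around 100x for longer audio.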

Quantization Strategy

Encoder-only mixed-precision: the Conformer encoder (~85% of parameters) is quantized while the decoder and joint network remain BF16. This preserves decoder precision for rare words and punctuation.

| Variant | Size Reduction | Speed Gain | Memory Reduction | WER Impact |
|---------|----------------|------------|------------------|------------|
| INT8 | -40% | +30% | -58% | None |
| INT4 | -61% | +34% | -67% | +0.4pp on real speech |
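The size reductions are consistent with encoder-only quantization: with roughly 85% of the 627M parameters stored at the reduced bit width and the rest kept at 16 bits, the expected checkpoint size is a weighted average of the two precisions. A back-of-envelope sketch (real files add per-group scales and unquantized buffers, so measured sizes differ by a few points):

```python
def quantized_size_ratio(quantized_frac: float, bits: int, full_bits: int = 16) -> float:
    """Expected size relative to full precision when only a fraction of weights is quantized."""
    return quantized_frac * (bits / full_bits) + (1 - quantized_frac)

# ~85% of parameters (the Conformer encoder) quantized; decoder + joint stay BF16.
for bits in (8, 4):
    ratio = quantized_size_ratio(0.85, bits)
    print(f"INT{bits}: ~{1 - ratio:.2f} reduction vs BF16")
```

This predicts roughly -43% for INT8 and -64% for INT4, close to the measured -40% and -61% above.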

Usage

Install:
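The install command is missing from this card. A common route for running Parakeet checkpoints on MLX is the community parakeet-mlx package; the package name, API, and file paths below are assumptions, not confirmed by this card:

```shell
# Install the community MLX runner for Parakeet models (assumed package name).
pip install parakeet-mlx

# Transcribe a local file with this checkpoint (repo id and paths are illustrative).
python -c "
from parakeet_mlx import from_pretrained
model = from_pretrained('sonic-speech/parakeet-tdt-0.6b-v3')
print(model.transcribe('audio.wav').text)
"
```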

Model Details

  • Architecture: Conformer encoder + TDT (Token-and-Duration Transducer) decoder
  • Parameters: 627M
  • Languages: 25 (SentencePiece tokenizer)
  • Sample rate: 16 kHz
  • Precision: BF16 (optimized for Apple Silicon)

Origin

Weights from mlx-community/parakeet-tdt-0.6b-v3, converted from NVIDIA's official nvidia/parakeet-tdt-0.6b-v3.

Part of the Sonic Speech model collection for the Sonic local-first voice AI project.
