Parakeet TDT 0.6B v3 (MLX, BF16)

NVIDIA Parakeet TDT v3 automatic speech recognition model in MLX BF16 SafeTensors format for Apple Silicon. This is the reference BF16 checkpoint; see the quantized variants for reduced memory and faster inference.

Benchmark Results (M3 Max, 64GB)

Variant Comparison

| Variant | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
|---------|------|-------------------|----------------|------|-------------|
| BF16 | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
| INT8 | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
| INT4 | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
  • LibriSpeech test-clean: 50 samples, studio-quality read speech
  • TED-LIUM: 8 TED talks (60s segments), real-world acoustics
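WER in the tables above is word error rate: the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of the standard definition (not the benchmark harness used for these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat down", "the cat sat down"))  # 0.0
print(wer("the cat sat down", "the cat sat dawn"))  # 0.25 (one substitution in four words)
```

A 0.82% WER therefore means fewer than one wrong word per hundred on LibriSpeech test-clean.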

RTFx vs Audio Duration

| Audio duration | 5s | 15s | 30s | 60s | 120s |
|----------------|----|----|----|----|-----|
| RTFx | 59x | 91x | 98x | 103x | 99x |
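RTFx is the inverse real-time factor: seconds of audio processed per second of wall-clock time, so higher is faster. Inverting it gives the expected transcription time for a clip, a small sketch:

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per wall-clock second."""
    return audio_seconds / processing_seconds

def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Invert RTFx to estimate wall-clock transcription time for a clip."""
    return audio_seconds / rtfx_value

# BF16 averages 73x in the variant table above: a 60 s clip takes ~0.82 s.
print(round(processing_time(60, 73), 2))  # 0.82
```

The dip at 5s reflects fixed per-call overhead (model warm-up, feature extraction) dominating short clips; throughput saturates around 100x for longer audio.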

Quantization Strategy

Encoder-only mixed-precision: the Conformer encoder (~85% of parameters) is quantized while the decoder and joint network remain BF16. This preserves decoder precision for rare words and punctuation.

| Variant | Size Reduction | Speed Gain | Memory Reduction | WER Impact |
|---------|----------------|------------|------------------|------------|
| INT8 | -40% | +30% | -58% | None |
| INT4 | -61% | +34% | -67% | +0.4pp on real speech |
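The size reductions are consistent with encoder-only quantization: with roughly 85% of the 627M parameters stored at the reduced bit width and the rest kept at 16 bits, the expected checkpoint size is a weighted average of the two precisions. A back-of-envelope sketch (real files add per-group scales and unquantized buffers, so measured sizes differ by a few points):

```python
def quantized_size_ratio(quantized_frac: float, bits: int, full_bits: int = 16) -> float:
    """Expected size relative to full precision when only a fraction of weights is quantized."""
    return quantized_frac * (bits / full_bits) + (1 - quantized_frac)

# ~85% of parameters (the Conformer encoder) quantized; decoder + joint stay BF16.
for bits in (8, 4):
    ratio = quantized_size_ratio(0.85, bits)
    print(f"INT{bits}: ~{1 - ratio:.2f} reduction vs BF16")
```

This predicts roughly -43% for INT8 and -64% for INT4, close to the measured -40% and -61% above.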

Usage

Install:
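The install command is missing from this card. A common route for running Parakeet checkpoints on MLX is the community parakeet-mlx package; the package name, API, and file paths below are assumptions, not confirmed by this card:

```shell
# Install the community MLX runner for Parakeet models (assumed package name).
pip install parakeet-mlx

# Transcribe a local file with this checkpoint (repo id and paths are illustrative).
python -c "
from parakeet_mlx import from_pretrained
model = from_pretrained('sonic-speech/parakeet-tdt-0.6b-v3')
print(model.transcribe('audio.wav').text)
"
```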

Model Details

  • Architecture: Conformer encoder + TDT (Token-and-Duration Transducer) decoder
  • Parameters: 627M
  • Languages: 25 (SentencePiece tokenizer)
  • Sample rate: 16 kHz
  • Precision: BF16 (optimized for Apple Silicon)

Origin

Weights from mlx-community/parakeet-tdt-0.6b-v3, converted from NVIDIA's official nvidia/parakeet-tdt-0.6b-v3.

Part of the Sonic Speech model collection for the Sonic local-first voice AI project.
