Parakeet TDT 0.6B v3 – MLX

MLX safetensors conversion of nvidia/parakeet-tdt-0.6b-v3 for Apple Silicon.

Architecture

  • Encoder: Conformer (1024 hidden, pre-encoding convolutions + transformer layers)
  • Decoder: TDT Transducer (predictor LSTM + joint network, 5 duration classes: 0-4)
  • Vocabulary: 1025 tokens (SentencePiece)
  • Parameters: ~0.6B
  • Audio input: 16 kHz mono, 128 mel bins
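The TDT (Token-and-Duration Transducer) decoder differs from a standard RNN-T in that the joint network predicts both a token and a duration class; the duration tells the decoder how many encoder frames to skip before the next step. A minimal greedy-decode sketch in NumPy, where `joint_fn` is a hypothetical stand-in for the real predictor + joint network (the actual model's decode loop lives in the MLX inference code, not shown here):

```python
import numpy as np

def tdt_greedy_decode(joint_fn, num_frames, blank_id=1024, max_symbols=10):
    """Greedy TDT decode sketch. `joint_fn(t, prev)` is a hypothetical
    stand-in returning (token_logits, duration_logits) for encoder
    frame t and previous non-blank token `prev`."""
    tokens = []
    t, prev, emitted = 0, blank_id, 0
    while t < num_frames:
        tok_logits, dur_logits = joint_fn(t, prev)
        token = int(np.argmax(tok_logits))
        dur = int(np.argmax(dur_logits))  # one of the 5 duration classes: 0-4
        if token != blank_id:
            tokens.append(token)
            prev = token
            emitted += 1
        # duration 0 emits another token at the same frame; force a
        # one-frame advance on blank (or after too many emissions) to avoid stalling
        if token == blank_id or emitted >= max_symbols:
            dur = max(dur, 1)
        t += dur
        if dur > 0:
            emitted = 0
    return tokens
```

The frame-skipping is what makes TDT decoding faster than classic RNN-T greedy search, which advances one frame at a time.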

Contents

| File | Description |
| --- | --- |
| model.safetensors | All weights (encoder + predictor + joint), float32, ~2.3 GB |
| config.json | Full NeMo model configuration |
| tokenizer.model | SentencePiece tokenizer |
| tokenizer.vocab | Tokenizer vocabulary |
| vocab.txt | Text vocabulary |
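You can inspect the tensor names, dtypes, and shapes in model.safetensors without loading the ~2.3 GB of weights, using only the standard library: the safetensors format begins with an 8-byte little-endian header length followed by a JSON index. A small sketch:

```python
import json
import struct

def read_safetensors_header(path):
    """List tensor names, dtypes, and shapes from a .safetensors file
    without reading the weight data itself."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional file-level metadata, if present
    return {name: (t["dtype"], t["shape"]) for name, t in header.items()}
```

For example, `read_safetensors_header("model.safetensors")` should enumerate the encoder, predictor, and joint-network parameters described above.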

Notes

  • Weights converted from NeMo PyTorch format to MLX safetensors
  • Convolution weights use MLX layout (OHWI for 2D, OKI for 1D) – not directly compatible with PyTorch
  • CTC head, preprocessor, spec augmentation, and loss weights are excluded (inference only)
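The layout difference comes from MLX's channels-last kernel convention versus PyTorch's channels-first one, so converting a convolution weight is a single axis permutation. A sketch in NumPy (the shapes below are illustrative placeholders, not this model's actual layer sizes):

```python
import numpy as np

# Hypothetical PyTorch-layout kernels (zeros as placeholders, not real weights).
w2d_torch = np.zeros((512, 1, 3, 3), dtype=np.float32)   # Conv2d: (O, I, H, W)
w1d_torch = np.zeros((1024, 1024, 9), dtype=np.float32)  # Conv1d: (O, I, K)

# MLX expects channels-last kernels: OHWI for 2D, OKI for 1D.
w2d_mlx = w2d_torch.transpose(0, 2, 3, 1)  # (O, I, H, W) -> (O, H, W, I)
w1d_mlx = w1d_torch.transpose(0, 2, 1)     # (O, I, K)    -> (O, K, I)
```

Applying the inverse permutations would recover the PyTorch layout, but the weights in this repository are stored MLX-side only.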

License

CC-BY-4.0, following the upstream nvidia/parakeet-tdt-0.6b-v3 license. Attribution to NVIDIA is required.

Source

Converted from nvidia/parakeet-tdt-0.6b-v3.
