Parakeet-TDT-ExecuTorch-MLX

Pre-exported ExecuTorch .pte file for Parakeet TDT 0.6B with the MLX backend on Apple Silicon.

This variant uses:

  • MLX delegate
  • bf16 activations
  • 4-bit weight-only quantization (group size 128) for encoder and decoder linear layers

For the Metal (Apple GPU) variant, see Parakeet-TDT-ExecuTorch-Metal.
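The 4-bit weight-only scheme in the list above can be sketched in plain Python. This is an illustration of group-wise symmetric int4 quantization with a per-group scale, not the actual quantization pass the export uses; the function names are hypothetical.

```python
def quantize_4bit_groupwise(weights, group_size=128):
    """Quantize a flat list of floats to int4 values in [-8, 7],
    one scale per group of `group_size` weights (illustrative sketch)."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group) or 1.0
        scale = max_abs / 7.0                     # map the largest magnitude to the int4 max
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize(qweights, scales, group_size=128):
    """Recover approximate floats: each int4 value times its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]
```

Smaller groups give each scale less dynamic range to cover, which lowers quantization error at the cost of storing more scales; group size 128 is a common middle ground.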

Installation

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch
make parakeet-mlx

Download

hf download younghan-meta/Parakeet-TDT-ExecuTorch-MLX --local-dir ~/parakeet_mlx

Run

cmake-out/examples/models/parakeet/parakeet_runner \
  --model_path ~/parakeet_mlx/model.pte \
  --tokenizer_path ~/parakeet_mlx/tokenizer.model \
  --audio_path /path/to/audio.wav \
  --timestamps none

Optional flags:

  • --timestamps segment for segment timestamps
  • --timestamps word for word timestamps
  • --timestamps all for token, word, and segment timestamps
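If you drive the runner from a script, the documented flags can be assembled like this. The `build_runner_cmd` helper and `RUNNER` constant are hypothetical conveniences; only the binary path and flag names come from the commands above.

```python
# Path produced by the build step above (assumption: run from ~/executorch).
RUNNER = "cmake-out/examples/models/parakeet/parakeet_runner"

def build_runner_cmd(model_path, tokenizer_path, audio_path, timestamps="none"):
    """Assemble the argv list for parakeet_runner using the documented flags."""
    modes = {"none", "segment", "word", "all"}
    if timestamps not in modes:
        raise ValueError(f"timestamps must be one of {sorted(modes)}")
    return [
        RUNNER,
        "--model_path", model_path,
        "--tokenizer_path", tokenizer_path,
        "--audio_path", audio_path,
        "--timestamps", timestamps,
    ]

# e.g. subprocess.run(build_runner_cmd("~/parakeet_mlx/model.pte",
#                                      "~/parakeet_mlx/tokenizer.model",
#                                      "audio.wav"), check=True)
```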

Export Command

pip install "nemo_toolkit[asr]"
python examples/models/parakeet/export_parakeet_tdt.py \
    --backend mlx \
    --dtype bf16 \
    --qlinear_encoder 4w \
    --qlinear_encoder_group_size 128 \
    --qlinear 4w \
    --qlinear_group_size 128 \
    --output-dir ./parakeet_mlx

This export produces:

  • model.pte
  • tokenizer.model

No separate delegate data blob is required for MLX.
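The `--dtype bf16` choice above keeps float32's 8-bit exponent range while truncating the mantissa from 23 to 7 bits, which is why activations tolerate it well. A stdlib-only sketch of the conversion, using round-to-nearest-even as commonly implemented (illustrative; not how the export itself casts, and it ignores NaN/infinity edge cases):

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its upper 16 bits (bf16), rounding to nearest even."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Add half of the discarded range, plus 1 if the kept LSB is odd (ties-to-even).
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return bits >> 16

def bf16_bits_to_float(b: int) -> float:
    """Expand bf16 bits back to float32 by zero-filling the low mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]
```

Because the exponent field is unchanged, bf16 trades precision (about 2-3 decimal digits) rather than range, unlike fp16.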
