Parakeet-TDT-ExecuTorch-MLX

Pre-exported ExecuTorch .pte file for Parakeet TDT 0.6B with the MLX backend on Apple Silicon.

This variant uses:

  • MLX delegate
  • bf16 activations
  • 4-bit weight-only quantization (group size 128) for encoder and decoder linear layers

For the Metal (Apple GPU) variant, see Parakeet-TDT-ExecuTorch-Metal.
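The 4-bit weight-only scheme in the list above can be sketched in plain Python. This is an illustration of group-wise symmetric int4 quantization with a per-group scale, not the actual quantization pass the export uses; the function names are hypothetical.

```python
def quantize_4bit_groupwise(weights, group_size=128):
    """Quantize a flat list of floats to int4 values in [-8, 7],
    one scale per group of `group_size` weights (illustrative sketch)."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group) or 1.0
        scale = max_abs / 7.0                     # map the largest magnitude to the int4 max
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize(qweights, scales, group_size=128):
    """Recover approximate floats: each int4 value times its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]
```

Smaller groups give each scale less dynamic range to cover, which lowers quantization error at the cost of storing more scales; group size 128 is a common middle ground.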

Installation

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch
make parakeet-mlx

Download

hf download younghan-meta/Parakeet-TDT-ExecuTorch-MLX --local-dir ~/parakeet_mlx

Run

cmake-out/examples/models/parakeet/parakeet_runner \
  --model_path ~/parakeet_mlx/model.pte \
  --tokenizer_path ~/parakeet_mlx/tokenizer.model \
  --audio_path /path/to/audio.wav \
  --timestamps none

Optional flags:

  • --timestamps segment for segment timestamps
  • --timestamps word for word timestamps
  • --timestamps all for token, word, and segment timestamps
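If you drive the runner from a script, the documented flags can be assembled like this. The `build_runner_cmd` helper and `RUNNER` constant are hypothetical conveniences; only the binary path and flag names come from the commands above.

```python
# Path produced by the build step above (assumption: run from ~/executorch).
RUNNER = "cmake-out/examples/models/parakeet/parakeet_runner"

def build_runner_cmd(model_path, tokenizer_path, audio_path, timestamps="none"):
    """Assemble the argv list for parakeet_runner using the documented flags."""
    modes = {"none", "segment", "word", "all"}
    if timestamps not in modes:
        raise ValueError(f"timestamps must be one of {sorted(modes)}")
    return [
        RUNNER,
        "--model_path", model_path,
        "--tokenizer_path", tokenizer_path,
        "--audio_path", audio_path,
        "--timestamps", timestamps,
    ]

# e.g. subprocess.run(build_runner_cmd("~/parakeet_mlx/model.pte",
#                                      "~/parakeet_mlx/tokenizer.model",
#                                      "audio.wav"), check=True)
```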

Export Command

pip install "nemo_toolkit[asr]"
python examples/models/parakeet/export_parakeet_tdt.py \
    --backend mlx \
    --dtype bf16 \
    --qlinear_encoder 4w \
    --qlinear_encoder_group_size 128 \
    --qlinear 4w \
    --qlinear_group_size 128 \
    --output-dir ./parakeet_mlx

This export produces:

  • model.pte
  • tokenizer.model

No separate delegate data blob is required for MLX.
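The `--dtype bf16` choice above keeps float32's 8-bit exponent range while truncating the mantissa from 23 to 7 bits, which is why activations tolerate it well. A stdlib-only sketch of the conversion, using round-to-nearest-even as commonly implemented (illustrative; not how the export itself casts, and it ignores NaN/infinity edge cases):

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its upper 16 bits (bf16), rounding to nearest even."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Add half of the discarded range, plus 1 if the kept LSB is odd (ties-to-even).
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return bits >> 16

def bf16_bits_to_float(b: int) -> float:
    """Expand bf16 bits back to float32 by zero-filling the low mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]
```

Because the exponent field is unchanged, bf16 trades precision (about 2-3 decimal digits) rather than range, unlike fp16.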
