Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK

Pre-exported ExecuTorch .pte files for Voxtral-Mini-4B-Realtime-2602 with the XNNPACK backend (CPU). Supports both offline and streaming transcription on CPU; no GPU required.

For the Metal (Apple GPU) variant, see Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal.

Installation

Install ExecuTorch from source:

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && ./install_executorch.sh

Build the runner with XNNPACK:

cd ~/executorch && make voxtral_realtime-cpu

Download

pip install huggingface_hub
huggingface-cli download younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK --local-dir ~/voxtral_xnnpack
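Equivalently, the files can be fetched from Python with `huggingface_hub`'s `snapshot_download` (the repo id matches the CLI command above; the target directory is the same one used throughout this card):

```python
from pathlib import Path
from huggingface_hub import snapshot_download  # same package installed above

# Download every file in the repo (.pte models, tokenizer, sample audio)
# into ~/voxtral_xnnpack, mirroring the huggingface-cli command above.
local_dir = snapshot_download(
    repo_id="younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK",
    local_dir=Path.home() / "voxtral_xnnpack",
)
print(local_dir)
```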

Run

Offline transcription

cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor.pte \
    --audio_path ~/voxtral_xnnpack/poem.wav

Streaming transcription (from file)

cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
    --audio_path ~/voxtral_xnnpack/poem.wav \
    --streaming

Live microphone (macOS)

ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
    --mic
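If `:0` is not your microphone, the avfoundation device indices can be listed first (standard ffmpeg behavior on macOS; the device names printed depend on your machine):

```shell
# Print all avfoundation devices. Audio inputs appear under
# "AVFoundation audio devices" with their index, e.g. "[0] MacBook Pro Microphone".
# The listing goes to stderr and ffmpeg exits immediately afterwards.
ffmpeg -f avfoundation -list_devices true -i ""
```

Pick the audio index N you want and pass `-i ":N"` in the pipeline above.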

Performance (Apple Silicon Mac, 20s audio)

Mode        TTFT     Gen Tokens   Gen Rate (tok/s)   Total Inference
Offline     6.698s   377          26.94              20.690s
Streaming   0.096s   261          11.01              23.798s
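The Gen Rate column is consistent with tokens divided by decode time (total inference minus TTFT), a sanity check you can reproduce:

```python
# Derive the table's Gen Rate from its other columns:
# rate = generated tokens / (total inference time - time to first token).
def gen_rate(tokens, total_s, ttft_s):
    return tokens / (total_s - ttft_s)

offline = gen_rate(377, 20.690, 6.698)    # ~26.94 tok/s
streaming = gen_rate(261, 23.798, 0.096)  # ~11.01 tok/s
print(f"offline: {offline:.2f} tok/s, streaming: {streaming:.2f} tok/s")
```

Note the trade-off visible in the table: streaming starts emitting tokens after 0.096s instead of 6.698s, at the cost of lower decode throughput.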

Export Commands

These models were exported with the commands below (8da4w denotes int8 dynamic activations with int4 weights; 8w denotes int8 weights):

# Offline
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend xnnpack \
    --output-dir ./voxtral_rt_xnnpack_offline \
    --qlinear-encoder 8da4w \
    --qlinear 8da4w \
    --qembedding 8w

# Streaming
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend xnnpack \
    --streaming \
    --output-dir ./voxtral_rt_xnnpack_streaming \
    --qlinear-encoder 8da4w \
    --qlinear 8da4w \
    --qembedding 8w
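The `8da4w` scheme stores weights as 4-bit integers and quantizes activations dynamically to 8-bit at runtime. A minimal numpy sketch of the 4-bit symmetric, per-group weight side (illustrative only; the group size and rounding details here are assumptions, not the exact torchao implementation):

```python
import numpy as np

def quantize_4bit_per_group(w, group_size=32):
    """Symmetric 4-bit quantization: each group of `group_size` weights
    shares one float scale; quantized values live in [-8, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 32)).astype(np.float32)
q, s = quantize_4bit_per_group(w)
err = np.abs(dequantize(q, s) - w.reshape(-1, 32)).max()
print(f"q range: [{q.min()}, {q.max()}], max abs error: {err:.4f}")
```

Per-group scales keep the quantization error bounded by the largest weight in each small group rather than in the whole tensor, which is why 4-bit weights remain usable for a 4B-parameter model.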
