# Voxtral Realtime

Paper: 2602.11298
Pre-exported ExecuTorch `.pte` files for Voxtral-Mini-4B-Realtime-2602 with the XNNPACK backend (CPU). Supports both offline and streaming transcription on any platform with a CPU; no GPU required.
For the Metal (Apple GPU) variant, see Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal.
Install ExecuTorch from source:

```bash
git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && ./install_executorch.sh
```
Build the runner with XNNPACK:

```bash
cd ~/executorch && make voxtral_realtime-cpu
```
Download the model files:

```bash
pip install huggingface_hub
huggingface-cli download younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK --local-dir ~/voxtral_xnnpack
```
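Equivalently, the same files can be fetched from Python. This is a minimal sketch using `huggingface_hub.snapshot_download` with the repo id from the command above; the helper name is my own:

```python
import os

from huggingface_hub import snapshot_download


def download_voxtral(target: str = "~/voxtral_xnnpack") -> str:
    """Fetch every file in the repo (.pte models, tekken.json, poem.wav)."""
    return snapshot_download(
        repo_id="younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK",
        local_dir=os.path.expanduser(target),
    )


if __name__ == "__main__":
    print(download_voxtral())
```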
Offline transcription:

```bash
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w.pte \
  --tokenizer_path ~/voxtral_xnnpack/tekken.json \
  --preprocessor_path ~/voxtral_xnnpack/preprocessor.pte \
  --audio_path ~/voxtral_xnnpack/poem.wav
```
Streaming transcription (uses the streaming model and preprocessor plus the `--streaming` flag):

```bash
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
  --tokenizer_path ~/voxtral_xnnpack/tekken.json \
  --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
  --audio_path ~/voxtral_xnnpack/poem.wav \
  --streaming
```
Live transcription from the microphone (macOS: `avfoundation` captures the default input device, and ffmpeg converts it to 16 kHz mono float32 PCM before piping it to the runner):

```bash
ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
  --tokenizer_path ~/voxtral_xnnpack/tekken.json \
  --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
  --mic
```
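`avfoundation` is macOS-only. On other platforms, any source of raw 16 kHz mono float32 PCM can be piped into `--mic`. As a sketch of the expected stream format, this stdlib-only Python script converts a 16 kHz mono 16-bit WAV into it (the function name is my own):

```python
import struct
import sys
import wave


def wav_to_f32le(path, out=None):
    """Write a 16 kHz mono 16-bit WAV as raw float32 little-endian PCM,
    the format the runner's --mic mode reads from stdin."""
    out = out if out is not None else sys.stdout.buffer
    with wave.open(path, "rb") as w:
        # Input assumptions: 16 kHz, mono, 16-bit PCM.
        assert w.getframerate() == 16000 and w.getnchannels() == 1
        assert w.getsampwidth() == 2
        frames = w.readframes(w.getnframes())
    # Decode int16 samples, scale to [-1, 1), re-encode as float32.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    out.write(struct.pack("<%df" % len(samples),
                          *(s / 32768.0 for s in samples)))


if __name__ == "__main__":
    wav_to_f32le(sys.argv[1])
```

Usage mirrors the ffmpeg pipe above: `python wav_to_f32le.py poem.wav | voxtral_realtime_runner ... --mic`.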
Benchmarks:

| Mode | TTFT (s) | Generated Tokens | Generation Rate (tok/s) | Total Inference (s) |
|---|---|---|---|---|
| Offline | 6.698 | 377 | 26.94 | 20.690 |
| Streaming | 0.096 | 261 | 11.01 | 23.798 |
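As a sanity check on the table, TTFT plus generated tokens divided by the generation rate should reproduce the total inference time; the numbers above agree to within a few milliseconds:

```python
# Benchmark values from the table above: ttft (s), generated tokens, tok/s, total (s).
rows = {
    "Offline":   (6.698, 377, 26.94, 20.690),
    "Streaming": (0.096, 261, 11.01, 23.798),
}
for mode, (ttft, tokens, rate, total) in rows.items():
    estimate = ttft + tokens / rate
    print(f"{mode}: {estimate:.3f}s estimated vs {total:.3f}s reported")
```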
These models were exported with the commands below (`8da4w` denotes 8-bit dynamic activations with 4-bit weights; `8w` denotes 8-bit weight-only quantization of the embeddings):

```bash
# Offline
python examples/models/voxtral_realtime/export_voxtral_rt.py \
  --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
  --backend xnnpack \
  --output-dir ./voxtral_rt_xnnpack_offline \
  --qlinear-encoder 8da4w \
  --qlinear 8da4w \
  --qembedding 8w

# Streaming
python examples/models/voxtral_realtime/export_voxtral_rt.py \
  --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
  --backend xnnpack \
  --streaming \
  --output-dir ./voxtral_rt_xnnpack_streaming \
  --qlinear-encoder 8da4w \
  --qlinear 8da4w \
  --qembedding 8w
```
Base model: mistralai/Ministral-3-3B-Base-2512