Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK

Pre-exported ExecuTorch .pte files for Voxtral-Mini-4B-Realtime-2602 with the XNNPACK backend (CPU). Supports both offline and streaming transcription on CPU; no GPU required.

For the Metal (Apple GPU) variant, see Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal.

Installation

Install ExecuTorch from source:

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && ./install_executorch.sh

Build the runner with XNNPACK:

cd ~/executorch && make voxtral_realtime-cpu

Download

pip install huggingface_hub
huggingface-cli download younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK --local-dir ~/voxtral_xnnpack
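Equivalently, the files can be fetched from Python with `huggingface_hub`'s `snapshot_download` (the repo id matches the CLI command above; the target directory is the same one used throughout this card):

```python
from pathlib import Path
from huggingface_hub import snapshot_download  # same package installed above

# Download every file in the repo (.pte models, tokenizer, sample audio)
# into ~/voxtral_xnnpack, mirroring the huggingface-cli command above.
local_dir = snapshot_download(
    repo_id="younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK",
    local_dir=Path.home() / "voxtral_xnnpack",
)
print(local_dir)
```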

Run

Offline transcription

cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor.pte \
    --audio_path ~/voxtral_xnnpack/poem.wav

Streaming transcription (from file)

cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
    --audio_path ~/voxtral_xnnpack/poem.wav \
    --streaming

Live microphone (macOS)

ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_xnnpack/model-xnnpack-8da4w-streaming.pte \
    --tokenizer_path ~/voxtral_xnnpack/tekken.json \
    --preprocessor_path ~/voxtral_xnnpack/preprocessor-streaming.pte \
    --mic
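If `:0` is not your microphone, the avfoundation device indices can be listed first (standard ffmpeg behavior on macOS; the device names printed depend on your machine):

```shell
# Print all avfoundation devices. Audio inputs appear under
# "AVFoundation audio devices" with their index, e.g. "[0] MacBook Pro Microphone".
# The listing goes to stderr and ffmpeg exits immediately afterwards.
ffmpeg -f avfoundation -list_devices true -i ""
```

Pick the audio index N you want and pass `-i ":N"` in the pipeline above.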

Performance (Apple Silicon Mac, 20s audio)

Mode        TTFT     Gen Tokens   Gen Rate (tok/s)   Total Inference
Offline     6.698s   377          26.94              20.690s
Streaming   0.096s   261          11.01              23.798s
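The Gen Rate column is consistent with tokens divided by decode time (total inference minus TTFT), a sanity check you can reproduce:

```python
# Derive the table's Gen Rate from its other columns:
# rate = generated tokens / (total inference time - time to first token).
def gen_rate(tokens, total_s, ttft_s):
    return tokens / (total_s - ttft_s)

offline = gen_rate(377, 20.690, 6.698)    # ~26.94 tok/s
streaming = gen_rate(261, 23.798, 0.096)  # ~11.01 tok/s
print(f"offline: {offline:.2f} tok/s, streaming: {streaming:.2f} tok/s")
```

Note the trade-off visible in the table: streaming starts emitting tokens after 0.096s instead of 6.698s, at the cost of lower decode throughput.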

Export Commands

These models were exported with the commands below (8da4w denotes int8 dynamic activations with int4 weights; 8w denotes int8 weights):

# Offline
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend xnnpack \
    --output-dir ./voxtral_rt_xnnpack_offline \
    --qlinear-encoder 8da4w \
    --qlinear 8da4w \
    --qembedding 8w

# Streaming
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend xnnpack \
    --streaming \
    --output-dir ./voxtral_rt_xnnpack_streaming \
    --qlinear-encoder 8da4w \
    --qlinear 8da4w \
    --qembedding 8w
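The `8da4w` scheme stores weights as 4-bit integers and quantizes activations dynamically to 8-bit at runtime. A minimal numpy sketch of the 4-bit symmetric, per-group weight side (illustrative only; the group size and rounding details here are assumptions, not the exact torchao implementation):

```python
import numpy as np

def quantize_4bit_per_group(w, group_size=32):
    """Symmetric 4-bit quantization: each group of `group_size` weights
    shares one float scale; quantized values live in [-8, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 32)).astype(np.float32)
q, s = quantize_4bit_per_group(w)
err = np.abs(dequantize(q, s) - w.reshape(-1, 32)).max()
print(f"q range: [{q.min()}, {q.max()}], max abs error: {err:.4f}")
```

Per-group scales keep the quantization error bounded by the largest weight in each small group rather than in the whole tensor, which is why 4-bit weights remain usable for a 4B-parameter model.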
