Qwen3-TTS-12Hz-0.6B-Base — MXFP4 (MLX)

MXFP4 quantized version of Qwen/Qwen3-TTS-12Hz-0.6B-Base for Apple Silicon.

Converted using mlx-audio with native MXFP4 (Microscaling Float 4-bit, OCP MX Spec).
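The MXFP4 layout from the OCP MX spec can be illustrated with a small dequantization sketch (not mlx-audio's actual implementation, just the format): each block of 32 elements shares one E8M0 power-of-two scale, and each element is a 4-bit E2M1 float.

```python
import numpy as np

# The 16 representable E2M1 values (1 sign bit, 2 exponent bits, 1 mantissa bit),
# indexed by the 4-bit code.
E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequantize_mxfp4(codes, scale_exponents):
    """codes: (n_blocks, 32) uint8 in [0, 15]; scale_exponents: (n_blocks,) int.

    Each block's shared E8M0 scale is a power of two: value = element * 2**exp.
    """
    scales = np.exp2(scale_exponents.astype(np.float32))
    return E2M1_VALUES[codes] * scales[:, None]
```

This is why MXFP4 is cheaper than 8-bit quantization: 4 bits per weight plus one shared scale byte per 32-element block.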

Benchmark (M2 Ultra 128GB)

Quant   Size     Avg Time (3 runs)
8bit    1.9 GB   8.50s
mxfp4   1.6 GB   7.77s (~8.6% faster)
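The speedup figure follows directly from the two averages above:

```python
# Relative speedup of mxfp4 vs 8bit, from the benchmark averages
t_8bit = 8.50   # seconds, average of 3 runs
t_mxfp4 = 7.77

speedup = (t_8bit - t_mxfp4) / t_8bit * 100
print(f"{speedup:.1f}% faster")  # → 8.6% faster
```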

Audio quality verified: voice cloning works, and long German texts with direct speech render cleanly.

Conversion

python -m mlx_audio.convert \
  --hf-path Qwen/Qwen3-TTS-12Hz-0.6B-Base \
  --mlx-path ./Qwen3-TTS-0.6B-Base-mxfp4 \
  --quantize \
  --q-mode mxfp4

Usage

from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

model = load_model("mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4")
generate_audio(
    model=model,
    text="Hello, this is a test.",
    ref_audio="reference.wav",
    temperature=0.3,
    repetition_penalty=1.1,
)

CLI

python -m mlx_audio.tts.generate \
  --model mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4 \
  --text "Dies ist ein Test." \
  --ref_audio reference.wav \
  --ref_text "Transcript of the reference audio" \
  --temperature 0.3 \
  --repetition_penalty 1.1 \
  --play