# Qwen3-TTS-12Hz-0.6B-Base — MXFP4 (MLX)
MXFP4 quantized version of Qwen/Qwen3-TTS-12Hz-0.6B-Base for Apple Silicon.
Converted using mlx-audio with native MXFP4 (Microscaling Float 4-bit, OCP MX Spec).
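Per the OCP MX spec, MXFP4 stores weights in blocks of 32 FP4 (E2M1) elements that share a single power-of-two scale. A minimal NumPy fake-quantization sketch of that scheme, for illustration only (this is not mlx-audio's implementation):

```python
import numpy as np

# FP4 E2M1 representable magnitudes (the element format used by MXFP4).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize(block: np.ndarray) -> np.ndarray:
    """Fake-quantize one 32-element block: shared power-of-two scale + FP4 elements."""
    assert block.shape == (32,)
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared scale: a power of two chosen so the block max lands near FP4's max (6.0).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = np.abs(block) / scale
    # Snap each magnitude to the nearest representable FP4 value (values past 6 clip).
    nearest = E2M1[np.abs(scaled[:, None] - E2M1[None, :]).argmin(axis=1)]
    return np.sign(block) * nearest * scale

w = np.ones(32)
wq = mxfp4_quantize(w)  # scale 0.25, element 4.0 → each 1.0 survives exactly
```

The shared scale is why MXFP4 is cheaper than 8-bit: one 8-bit exponent per 32 elements instead of per-element precision.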
## Benchmark (M2 Ultra 128GB)
| Quant | Size | Avg Time (3 runs) |
|---|---|---|
| 8bit | 1.9 GB | 8.50s |
| mxfp4 | 1.6 GB | 7.77s (~8.6% faster) |
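The timings above are averages over three runs. A generic harness along those lines (the workload is a stand-in; swap in the actual generation call — this is not an mlx-audio API):

```python
import time

def avg_seconds(fn, runs=3):
    """Average wall-clock time of fn() over `runs` calls, as in the table above."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs

# Stand-in workload; replace with the model's generate call to benchmark it.
t = avg_seconds(lambda: sum(range(100_000)))
```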
Audio quality verified: voice cloning works, and long German texts with direct speech render cleanly.
## Conversion

```shell
python -m mlx_audio.convert \
  --hf-path Qwen/Qwen3-TTS-12Hz-0.6B-Base \
  --mlx-path ./Qwen3-TTS-0.6B-Base-mxfp4 \
  --quantize \
  --q-mode mxfp4
```
## Usage

```python
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

model = load_model("mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4")
generate_audio(
    model=model,
    text="Hello, this is a test.",
    ref_audio="reference.wav",
    temperature=0.3,
    repetition_penalty=1.1,
)
```
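`ref_audio` takes a short speech clip for voice cloning. A stdlib sketch that writes a stand-in mono 16 kHz WAV and sanity-checks its format before use (the file name and 16 kHz rate are assumptions for illustration, not documented model requirements):

```python
import math
import struct
import wave

RATE = 16000  # assumed sample rate for the stand-in clip

# Write one second of a 440 Hz tone as a placeholder reference clip.
frames = b"".join(
    struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * n / RATE)))
    for n in range(RATE)
)
with wave.open("reference.wav", "wb") as w:
    w.setnchannels(1)   # mono
    w.setsampwidth(2)   # 16-bit PCM
    w.setframerate(RATE)
    w.writeframes(frames)

# Sanity-check the clip before passing it as ref_audio.
with wave.open("reference.wav", "rb") as w:
    channels, rate, n = w.getnchannels(), w.getframerate(), w.getnframes()
```

In practice the reference should of course be real recorded speech, not a tone.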
## CLI

```shell
python -m mlx_audio.tts.generate \
  --model mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4 \
  --text "Dies ist ein Test." \
  --ref_audio reference.wav \
  --ref_text "Transkript der Reference Audio" \
  --temperature 0.3 \
  --repetition_penalty 1.1 \
  --play
```
Model size: 0.5B params. Tensor types: BF16, U8, U32.
Base model: Qwen/Qwen3-TTS-12Hz-0.6B-Base