Qwen3-TTS 1.7B VoiceDesign (ONNX)
ONNX export of Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign for inference with ONNX Runtime. No PyTorch is required at inference time.
Both FP32 and INT4 variants are included; the INT4 variant uses weight-only round-to-nearest (RTN) quantization.
Exported and maintained by WaveKat as part of the wavekat-tts voice pipeline.
Quick Start
```bash
pip install -r requirements.txt

# FP32
python generate_onnx.py --text "Give every small business the voice of a big one." \
    --instruct "Speak in a warm and friendly female voice" \
    -o output_fp32.wav

# INT4 (talker ~4x smaller, ~3x smaller overall; faster)
python generate_onnx.py --variant int4 \
    --text "Give every small business the voice of a big one." \
    --instruct "Speak in a warm and friendly female voice" \
    -o output_int4.wav

# Chinese
python generate_onnx.py --variant int4 --lang chinese \
    --text "让每一家小企业，都拥有大企业的声音。" \
    --instruct "Speak in a warm and professional female voice" \
    -o output_zh.wav
```
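A quick way to sanity-check a generated file is to read its header with Python's standard `wave` module. This is a minimal sketch: `describe_wav` is a hypothetical helper, not part of `generate_onnx.py`, and it assumes the script writes PCM WAV (the `wave` module rejects IEEE-float WAV files). The 24 kHz expectation comes from the vocoder description below; channel count and sample width are read from the file rather than assumed.

```python
import sys
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        info = describe_wav(path)
        # The vocoder is described as producing a 24 kHz waveform
        if info["sample_rate"] != 24000:
            print(f"{path}: unexpected sample rate {info['sample_rate']} Hz")
        print(path, info)
```

For example, `python describe_wav.py output_fp32.wav output_int4.wav` would print the rate and duration of both outputs side by side.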
Model Architecture
Qwen3-TTS is a three-stage autoregressive pipeline:
```
Text --> [Tokenizer + Embedding Construction] --> inputs_embeds
                          |
                          v
                   [Talker LM]        28 layers, 2048 hidden
                                      predicts codebook group 0
                          |
                          v
                   [Code Predictor]   5 layers, 1024 hidden
                                      predicts groups 1-15
                          |
                          v
                   [Vocoder]          single forward pass
                                      16 codebook groups --> 24kHz waveform
```
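The control flow implied by the diagram can be sketched as a plain loop. This is a minimal sketch, not the logic of `generate_onnx.py`: `prefill`, `decode`, and `predict_rest` are hypothetical callables standing in for the talker and code-predictor ONNX sessions, their return values are assumptions, and feeding the completed 16-group frame back into the talker at each decode step is also an assumption rather than something stated by the export.

```python
from typing import Any, Callable, List, Sequence, Tuple

NUM_GROUPS = 16  # codebook groups per frame: group 0 from the talker, 1-15 from the code predictor

# Hypothetical stage signatures; the real ONNX sessions take and return named tensors.
Prefill = Callable[[], Tuple[int, Any, Any]]               # -> (group0, hidden, kv_cache)
Decode = Callable[[List[int], Any], Tuple[int, Any, Any]]  # (frame, kv_cache) -> (group0, hidden, kv_cache)
PredictRest = Callable[[Any, int], Sequence[int]]          # (hidden, group0) -> groups 1-15


def generate_codes(prefill: Prefill, decode: Decode, predict_rest: PredictRest,
                   eos_code: int, max_frames: int) -> List[List[int]]:
    """Run the talker + code predictor loop and return one 16-code list per frame."""
    # Prefill runs once over the whole prompt and yields the first group-0 code
    group0, hidden, kv_cache = prefill()
    frames: List[List[int]] = []
    for _ in range(max_frames):
        if group0 == eos_code:
            break
        rest = list(predict_rest(hidden, group0))
        if len(rest) != NUM_GROUPS - 1:
            raise ValueError(f"expected {NUM_GROUPS - 1} codes, got {len(rest)}")
        frame = [group0] + rest
        frames.append(frame)
        # Each decode step consumes one new frame and extends the KV cache
        group0, hidden, kv_cache = decode(frame, kv_cache)
    return frames
```

The returned frames would then go to the vocoder in one forward pass (for example `waveform = vocoder(frames)` with a hypothetical `vocoder` callable), which is why only the talker needs separate prefill and decode graphs.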
The pipeline is split into 4 ONNX models:
| Model | Description | FP32 Size | INT4 Size |
|---|---|---|---|
| talker_prefill.onnx | Full-sequence prefill with KV cache output | 5.3 GB | 1.4 GB |
| talker_decode.onnx | Single-step decode with KV cache | 5.3 GB | 1.4 GB |
| code_predictor.onnx | Predicts codebook groups 1-15 | 440 MB | 322 MB |
| vocoder.onnx | Codes to 24kHz waveform | 876 MB | 558 MB |
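Summing the table gives the total download per variant. A small worked calculation, assuming 1 GB = 1000 MB and that the two talker graphs are stored as separate files, as the table lists them:

```python
# Sizes from the table above, in MB (assuming 1 GB = 1000 MB): (FP32, INT4)
SIZES_MB = {
    "talker_prefill.onnx": (5300, 1400),
    "talker_decode.onnx": (5300, 1400),
    "code_predictor.onnx": (440, 322),
    "vocoder.onnx": (876, 558),
}


def footprint(sizes):
    """Return (fp32_total_mb, int4_total_mb, overall_ratio)."""
    fp32_total = sum(fp32 for fp32, _ in sizes.values())
    int4_total = sum(int4 for _, int4 in sizes.values())
    return fp32_total, int4_total, fp32_total / int4_total


fp32_mb, int4_mb, ratio = footprint(SIZES_MB)
# Prints: FP32: 11.9 GB, INT4: 3.7 GB, 3.2x smaller
print(f"FP32: {fp32_mb / 1000:.1f} GB, INT4: {int4_mb / 1000:.1f} GB, {ratio:.1f}x smaller")
for name, (fp32, int4) in SIZES_MB.items():
    print(f"{name}: {fp32 / int4:.1f}x")
```

Most of the savings come from the talker (about 3.8x per graph); the code predictor and vocoder shrink by roughly 1.4x and 1.6x, so the overall reduction is closer to 3x than 4x.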
Repository Structure
```
.
├── config.json          # Model config (dimensions, token IDs, language map)
├── tokenizer/           # Text tokenizer (vocab, merges, config)
├── embeddings/          # Pre-extracted embedding weights (.npy)
├── fp32/                # FP32 ONNX models
│   ├── talker_prefill.onnx
│   ├── talker_decode.onnx
│   ├── code_predictor.onnx
│   └── vocoder.onnx
├── int4/                # INT4 weight-only quantized models
│   ├── talker_prefill.onnx
│   ├── talker_decode.onnx
│   ├── code_predictor.onnx
│   └── vocoder.onnx
├── generate_onnx.py     # Reference ONNX-only inference script
└── requirements.txt     # Inference dependencies
```
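Because both variants use the same four filenames, selecting a variant reduces to choosing a subdirectory. A sketch of resolving and checking the paths; the helper name `model_paths` is hypothetical and not necessarily what `generate_onnx.py` does internally:

```python
from pathlib import Path
from typing import Dict, Union

MODEL_FILES = (
    "talker_prefill.onnx",
    "talker_decode.onnx",
    "code_predictor.onnx",
    "vocoder.onnx",
)
VARIANTS = ("fp32", "int4")


def model_paths(repo_root: Union[str, Path], variant: str = "fp32") -> Dict[str, Path]:
    """Map each model's stem (e.g. 'vocoder') to its ONNX file for the chosen variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    variant_dir = Path(repo_root) / variant
    paths = {Path(name).stem: variant_dir / name for name in MODEL_FILES}
    # Fail early with every missing file listed, rather than on the first session load
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing model files: {missing}")
    return paths
```

Each resolved path can then be handed to ONNX Runtime, e.g. `onnxruntime.InferenceSession(str(paths["vocoder"]))`.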
Supported Languages
English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian.
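The Quick Start passes the language as a lowercase name (`--lang chinese`). Assuming the other languages follow the same lowercase convention (an assumption; the authoritative list is the language map in `config.json`), a front end could validate the argument before starting a run. `normalize_language` is a hypothetical helper:

```python
SUPPORTED_LANGUAGES = frozenset({
    "english", "chinese", "japanese", "korean", "german",
    "french", "spanish", "italian", "portuguese", "russian",
})


def normalize_language(name: str) -> str:
    """Lowercase and validate a language name (assumed --lang convention)."""
    key = name.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {name!r}; choose one of: {supported}")
    return key
```

Validating up front avoids loading several gigabytes of models only to fail on a typo in the language name.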
Reproducing the Export
The export scripts are in the wavekat-tts repository:
```bash
# From the root of a wavekat-tts checkout
cd tools/qwen3-tts-onnx
pip install -r requirements.txt

# Export FP32, validate, and quantize INT4
make all
```
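The INT4 models use weight-only round-to-nearest (RTN) quantization. As an illustration of the technique only (the export's actual block size, symmetric vs. asymmetric scheme, and ONNX operator packing are not stated here and may differ), per-group symmetric RTN looks like this:

```python
from typing import List, Sequence, Tuple


def quantize_rtn_int4(weights: Sequence[float], group_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric per-group RTN: each value becomes an integer in [-8, 7] times a group scale."""
    qvals: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; an all-zero group gets a dummy scale
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to nearest, then clamp to the signed 4-bit range
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales


def dequantize_int4(qvals: Sequence[int], scales: Sequence[float], group_size: int = 32) -> List[float]:
    """Recover approximate weights; in weight-only schemes activations stay in floating point."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]
```

Each reconstructed value differs from the original by at most half its group's scale, so smaller groups track the weights more closely at the cost of storing more scales.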
About WaveKat
WaveKat builds open-source voice pipeline components in Rust. This ONNX export is maintained as part of wavekat-tts, which provides unified TTS inference across multiple backends.
Acknowledgements
- Qwen3-TTS by the Qwen team at Alibaba Cloud