Qwen3-ASR GGUF

GGUF-quantized weights for Qwen3-ASR, the Qwen3 automatic speech recognition model, for use with qwen3-asr-rs, a Rust inference engine built with candle.

Files

| File | Size | Quant | Accuracy (short samples) |
|------|------|-------|--------------------------|
| qwen3_asr_0.6b_q8_0.gguf | 966 MB | Q8_0 | 2/2 ✓ |
| qwen3_asr_0.6b_q4_0.gguf | 521 MB | Q4_0 | 1/2 (expected 4-bit degradation) |
| qwen3_asr_0.6b_q4_k.gguf | 577 MB | Q4_K | 1/2 (expected 4-bit degradation) |
| qwen3_asr_1.7b_q8_0.gguf | 2.3 GB | Q8_0 | 2/2 ✓ |
| qwen3_asr_1.7b_q4_k.gguf | 1.2 GB | Q4_K | 2/2 ✓ |

Usage

cargo run --release -- --gguf qwen3_asr_1.7b_q4_k.gguf

See qwen3-asr-rs for full documentation.

Quantization Notes

  • Q8_0: 8-bit symmetric, block=32. Closest to original BF16 quality.
  • Q4_0: 4-bit symmetric, block=32. Split-half nibble packing (ggml-compatible).
  • Q4_K: 4-bit K-quant, super-block=256 with per-sub-block scale/min. Tensors with shape[-1] % 256 ≠ 0 fall back to Q8_0 automatically (affects 0.6B audio encoder, d_model=896).
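To make the Q4_0 layout concrete, here is a minimal Rust sketch of quantizing and dequantizing one 32-element block with ggml-style split-half nibble packing (element i in the low nibble of byte i, element i+16 in the high nibble). Function names are illustrative, not the engine's actual API, and the scale is kept as f32 for simplicity where ggml stores f16.

```rust
/// Block size for Q4_0 (ggml convention).
const QK4_0: usize = 32;

/// Quantize one block: returns the scale `d` and 16 packed bytes.
fn quantize_q4_0_block(x: &[f32; QK4_0]) -> (f32, [u8; QK4_0 / 2]) {
    // The value with the largest magnitude (sign included) maps to -8,
    // the extreme of the signed 4-bit range.
    let mut amax = 0.0f32;
    let mut max = 0.0f32;
    for &v in x.iter() {
        if v.abs() > amax {
            amax = v.abs();
            max = v;
        }
    }
    let d = max / -8.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };

    let mut qs = [0u8; QK4_0 / 2];
    for i in 0..QK4_0 / 2 {
        // Shift into [0, 15] and store unsigned.
        let q0 = (x[i] * id + 8.5).floor().clamp(0.0, 15.0) as u8;
        let q1 = (x[i + QK4_0 / 2] * id + 8.5).floor().clamp(0.0, 15.0) as u8;
        // Split-half packing: first half in low nibbles, second in high.
        qs[i] = q0 | (q1 << 4);
    }
    (d, qs)
}

/// Inverse: unpack nibbles, undo the +8 offset, rescale.
fn dequantize_q4_0_block(d: f32, qs: &[u8; QK4_0 / 2]) -> [f32; QK4_0] {
    let mut out = [0.0f32; QK4_0];
    for i in 0..QK4_0 / 2 {
        out[i] = ((qs[i] & 0x0F) as i32 - 8) as f32 * d;
        out[i + QK4_0 / 2] = ((qs[i] >> 4) as i32 - 8) as f32 * d;
    }
    out
}
```

The round-trip error per element is bounded by roughly one quantization step |d|, which is why 4-bit formats lose some accuracy on small models.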

The 1.7B Q4_K model matches Q8_0 on the short-sample check (2/2) while reducing size from 2.3 GB to 1.2 GB.
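The Q4_K fallback rule above can be sketched as a simple divisibility check on a tensor's last dimension; the enum and function names here are hypothetical, not the engine's actual API.

```rust
/// Quantization formats relevant to the fallback rule.
#[derive(Debug, PartialEq)]
enum Quant {
    Q4K,
    Q8_0,
}

/// Q4_K needs whole 256-element super-blocks along the last dimension;
/// anything else falls back to Q8_0.
fn choose_quant(last_dim: usize) -> Quant {
    if last_dim % 256 == 0 {
        Quant::Q4K
    } else {
        Quant::Q8_0
    }
}
```

For example, the 0.6B audio encoder's d_model=896 gives 896 % 256 = 128, so those tensors stay Q8_0, which is why the 0.6B Q4_K file is larger than a pure 4-bit quantization would suggest.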
