Qwen3-ASR GGUF

GGUF-quantized weights for Qwen3-ASR, the Qwen3 automatic speech recognition model, for use with qwen3-asr-rs, a Rust inference engine built with candle.

Files

| File | Size | Quant | Accuracy (short samples) |
|------|------|-------|--------------------------|
| qwen3_asr_0.6b_q8_0.gguf | 966 MB | Q8_0 | 2/2 ✓ |
| qwen3_asr_0.6b_q4_0.gguf | 521 MB | Q4_0 | 1/2 (expected 4-bit degradation) |
| qwen3_asr_0.6b_q4_k.gguf | 577 MB | Q4_K | 1/2 (expected 4-bit degradation) |
| qwen3_asr_1.7b_q8_0.gguf | 2.3 GB | Q8_0 | 2/2 ✓ |
| qwen3_asr_1.7b_q4_k.gguf | 1.2 GB | Q4_K | 2/2 ✓ |

Usage

cargo run --release -- --gguf qwen3_asr_1.7b_q4_k.gguf

See qwen3-asr-rs for full documentation.

Quantization Notes

  • Q8_0: 8-bit symmetric, block=32. Closest to original BF16 quality.
  • Q4_0: 4-bit symmetric, block=32. Split-half nibble packing (ggml-compatible).
  • Q4_K: 4-bit K-quant, super-block=256 with per-sub-block scale/min. Tensors with shape[-1] % 256 ≠ 0 fall back to Q8_0 automatically (affects 0.6B audio encoder, d_model=896).
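To make the Q4_0 layout concrete, here is a minimal Rust sketch of quantizing and dequantizing one 32-element block with ggml-style split-half nibble packing (element i in the low nibble of byte i, element i+16 in the high nibble). Function names are illustrative, not the engine's actual API, and the scale is kept as f32 for simplicity where ggml stores f16.

```rust
/// Block size for Q4_0 (ggml convention).
const QK4_0: usize = 32;

/// Quantize one block: returns the scale `d` and 16 packed bytes.
fn quantize_q4_0_block(x: &[f32; QK4_0]) -> (f32, [u8; QK4_0 / 2]) {
    // The value with the largest magnitude (sign included) maps to -8,
    // the extreme of the signed 4-bit range.
    let mut amax = 0.0f32;
    let mut max = 0.0f32;
    for &v in x.iter() {
        if v.abs() > amax {
            amax = v.abs();
            max = v;
        }
    }
    let d = max / -8.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };

    let mut qs = [0u8; QK4_0 / 2];
    for i in 0..QK4_0 / 2 {
        // Shift into [0, 15] and store unsigned.
        let q0 = (x[i] * id + 8.5).floor().clamp(0.0, 15.0) as u8;
        let q1 = (x[i + QK4_0 / 2] * id + 8.5).floor().clamp(0.0, 15.0) as u8;
        // Split-half packing: first half in low nibbles, second in high.
        qs[i] = q0 | (q1 << 4);
    }
    (d, qs)
}

/// Inverse: unpack nibbles, undo the +8 offset, rescale.
fn dequantize_q4_0_block(d: f32, qs: &[u8; QK4_0 / 2]) -> [f32; QK4_0] {
    let mut out = [0.0f32; QK4_0];
    for i in 0..QK4_0 / 2 {
        out[i] = ((qs[i] & 0x0F) as i32 - 8) as f32 * d;
        out[i + QK4_0 / 2] = ((qs[i] >> 4) as i32 - 8) as f32 * d;
    }
    out
}
```

The round-trip error per element is bounded by roughly one quantization step |d|, which is why 4-bit formats lose some accuracy on small models.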

The 1.7B Q4_K model matches Q8_0 on the short-sample check (2/2) while reducing size from 2.3 GB to 1.2 GB.
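The Q4_K fallback rule above can be sketched as a simple divisibility check on a tensor's last dimension; the enum and function names here are hypothetical, not the engine's actual API.

```rust
/// Quantization formats relevant to the fallback rule.
#[derive(Debug, PartialEq)]
enum Quant {
    Q4K,
    Q8_0,
}

/// Q4_K needs whole 256-element super-blocks along the last dimension;
/// anything else falls back to Q8_0.
fn choose_quant(last_dim: usize) -> Quant {
    if last_dim % 256 == 0 {
        Quant::Q4K
    } else {
        Quant::Q8_0
    }
}
```

For example, the 0.6B audio encoder's d_model=896 gives 896 % 256 = 128, so those tensors stay Q8_0, which is why the 0.6B Q4_K file is larger than a pure 4-bit quantization would suggest.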
