# Qwen3-ASR GGUF
GGUF-quantized weights for Qwen3-ASR, the Qwen3 audio speech-recognition model, for use with qwen3-asr-rs, a Rust inference engine built with candle.
## Files
| File | Size | Quant | Accuracy (short samples) |
|---|---|---|---|
| qwen3_asr_0.6b_q8_0.gguf | 966 MB | Q8_0 | 2/2 ✅ |
| qwen3_asr_0.6b_q4_0.gguf | 521 MB | Q4_0 | 1/2 (expected 4-bit degradation) |
| qwen3_asr_0.6b_q4_k.gguf | 577 MB | Q4_K | 1/2 (expected 4-bit degradation) |
| qwen3_asr_1.7b_q8_0.gguf | 2.3 GB | Q8_0 | 2/2 ✅ |
| qwen3_asr_1.7b_q4_k.gguf | 1.2 GB | Q4_K | 2/2 ✅ |
## Usage

```shell
cargo run --release -- --gguf qwen3_asr_1.7b_q4_k.gguf
```
See qwen3-asr-rs for full documentation.
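To sanity-check a downloaded file before loading it, you can validate the fixed GGUF header (magic `GGUF`, then little-endian u32 version, u64 tensor count, u64 metadata-KV count, per the GGUF spec). This is a minimal standalone sketch; in practice the engine's loader (e.g. candle's gguf support) performs this parsing for you:

```rust
use std::io::{Cursor, Read};

/// Read the fixed-size GGUF header: 4-byte magic "GGUF", then little-endian
/// u32 version, u64 tensor count, u64 metadata key-value count.
fn read_gguf_header(r: &mut impl Read) -> std::io::Result<(u32, u64, u64)> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "not a GGUF file",
        ));
    }
    let mut buf4 = [0u8; 4];
    let mut buf8 = [0u8; 8];
    r.read_exact(&mut buf4)?;
    let version = u32::from_le_bytes(buf4);
    r.read_exact(&mut buf8)?;
    let tensor_count = u64::from_le_bytes(buf8);
    r.read_exact(&mut buf8)?;
    let kv_count = u64::from_le_bytes(buf8);
    Ok((version, tensor_count, kv_count))
}

fn main() -> std::io::Result<()> {
    // Synthetic in-memory header: version 3, 2 tensors, 5 metadata keys.
    let mut bytes = b"GGUF".to_vec();
    bytes.extend(3u32.to_le_bytes());
    bytes.extend(2u64.to_le_bytes());
    bytes.extend(5u64.to_le_bytes());
    let (version, tensors, kvs) = read_gguf_header(&mut Cursor::new(bytes))?;
    println!("GGUF v{version}: {tensors} tensors, {kvs} metadata keys");
    Ok(())
}
```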
## Quantization Notes
- Q8_0: 8-bit symmetric, block=32. Closest to original BF16 quality.
- Q4_0: 4-bit symmetric, block=32. Split-half nibble packing (ggml-compatible).
- Q4_K: 4-bit K-quant, super-block=256 with per-sub-block scale/min. Tensors with
  `shape[-1] % 256 != 0` fall back to Q8_0 automatically (affects the 0.6B audio encoder, d_model=896).
The 1.7B Q4_K model retains full accuracy (2/2) while reducing size from 2.3 GB to 1.2 GB.
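The two rules above can be sketched in a few lines: Q8_0-style symmetric quantization picks one scale per 32-weight block (`max|x| / 127`) and rounds each weight to an i8, while the Q4_K fallback is just a divisibility check on the tensor's last dimension. This is an illustrative sketch, not the engine's actual kernels:

```rust
/// Q8_0-style symmetric quantization of one 32-element block (sketch):
/// scale = max|x| / 127, each weight stored as a rounded i8.
fn quantize_q8_0_block(x: &[f32; 32]) -> (f32, [i8; 32]) {
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut q = [0i8; 32];
    for (qi, &v) in q.iter_mut().zip(x.iter()) {
        *qi = (v * inv).round() as i8;
    }
    (scale, q)
}

/// The Q4_K fallback rule described above: tensors whose last dimension is
/// not a multiple of the 256-wide super-block stay in Q8_0.
fn use_q4_k(last_dim: usize) -> bool {
    last_dim % 256 == 0
}

fn main() {
    let mut x = [0.0f32; 32];
    x[0] = 1.27;
    x[1] = -0.5;
    let (scale, q) = quantize_q8_0_block(&x);
    // Dequantized value q * scale lands within one quantization step of input.
    assert!((q[0] as f32 * scale - x[0]).abs() <= scale);
    println!("scale = {scale}, q[0] = {}, q[1] = {}", q[0], q[1]);
    assert!(!use_q4_k(896)); // 0.6B audio encoder d_model=896 -> Q8_0 fallback
    assert!(use_q4_k(2048));
}
```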