# Qwen3.5-122B-A10B - Gutenberg (K_G) Quants
Quantizations of Qwen3.5-122B-A10B using the Gutenberg quantization strategy.
## Available Quants
| Quant | Size | BPW |
|---|---|---|
| K_G_6.00 | 85.5 GiB | 6.02 |
| K_G_5.00 | 71.3 GiB | 5.02 |
| K_G_4.50 | 64.3 GiB | 4.52 |
| K_G_4.00 | 57.2 GiB | 4.02 |
| K_G_3.50 | 50.0 GiB | 3.51 |
| K_G_3.00 | 42.9 GiB | 3.02 |
| K_G_2.50 | 35.4 GiB | 2.49 |
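The BPW (bits-per-weight) column is simply file size divided by parameter count. A quick sanity check against the table above (the 122B parameter count is taken from the model name; GGUF metadata overhead makes this approximate, so some rows round slightly differently):

```python
GIB = 1024 ** 3
PARAMS = 122e9  # 122B total parameters (MoE: all experts counted)

def bpw(size_gib: float, params: float = PARAMS) -> float:
    """Bits-per-weight implied by a file size in GiB."""
    return size_gib * GIB * 8 / params

print(round(bpw(85.5), 2))  # 6.02, matching the K_G_6.00 row
```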
## KLD Comparison vs Unsloth UD Quants
Measured against Q8_K_XL reference logits. Lower KLD means output closer to the source model; "Same Top P" is the percentage of positions where the quant's top-1 token matches the reference's.
| Model | Size | BPW | KLD | Same Top P |
|---|---|---|---|---|
| UD-Q6_K_XL | 104.7 GiB | 7.36 | 0.002771 | 96.55% |
| K_G_6.00 | 85.5 GiB | 6.02 | 0.003026 | 96.55% |
| UD-Q5_K_XL | 85.6 GiB | 6.02 | 0.003329 | 96.34% |
| K_G_5.00 | 71.3 GiB | 5.02 | 0.004002 | 96.14% |
| UD-Q4_K_XL | 71.7 GiB | 5.05 | 0.004898 | 95.70% |
| K_G_4.50 | 64.3 GiB | 4.52 | 0.005178 | 95.68% |
| K_G_4.00 | 57.2 GiB | 4.02 | 0.006769 | 95.33% |
| K_G_3.50 | 50.0 GiB | 3.51 | 0.010662 | 94.24% |
| UD-Q3_K_XL | 53.1 GiB | 3.73 | 0.014053 | 93.16% |
| K_G_3.00 | 42.9 GiB | 3.02 | 0.018017 | 92.94% |
| K_G_2.50 | 35.4 GiB | 2.49 | 0.034715 | 90.48% |
| UD-IQ2_XXS | 34.1 GiB | 2.40 | 0.056205 | 87.12% |
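For reference, the KLD column is the mean per-token Kullback-Leibler divergence between the quant's and the reference model's output distributions (llama.cpp's KLD tooling produces these statistics). A minimal sketch of both metrics, assuming plain probability lists rather than raw logits:

```python
import math

def kl_divergence(p_ref, q_quant):
    """KL(P_ref || Q_quant) over one token position's vocabulary distribution.
    Lower means the quant's output distribution is closer to the reference."""
    return sum(p * math.log(p / q) for p, q in zip(p_ref, q_quant) if p > 0)

def same_top_rate(ref_dists, quant_dists):
    """Fraction of positions where both models agree on the top-1 token."""
    same = sum(
        max(range(len(r)), key=r.__getitem__) == max(range(len(q)), key=q.__getitem__)
        for r, q in zip(ref_dists, quant_dists)
    )
    return same / len(ref_dists)

# Identical distributions give a KLD of exactly zero.
print(kl_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))  # 0.0
```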
## What is Gutenberg?
Gutenberg uses KLD sensitivity data to allocate quantization precision where it matters most. Instead of applying uniform quantization, each expert tensor is ranked by its measured impact on output quality, then assigned to one of three tiers (+1, base, or -1 quant level) within a BPW budget. Non-expert tensors are kept at Q8_0.
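The tiering described above can be sketched as a greedy pairing pass (all names and the pairing heuristic here are hypothetical illustrations, not the actual Gutenberg tooling): rank expert tensors by measured KLD sensitivity, then promote the most sensitive and demote the least sensitive so long as the average BPW stays within budget.

```python
def assign_tiers(tensors, budget_bpw, base_bpw, step_bpw):
    """tensors: [(name, kld_sensitivity, n_params)].
    Greedily promote the most KLD-sensitive tensors (+1 quant level) and
    demote the least sensitive (-1), keeping average BPW <= budget_bpw."""
    total = sum(n for _, _, n in tensors)
    tiers = {name: 0 for name, _, _ in tensors}
    used = base_bpw * total  # bits consumed if everything stays at base

    ranked = sorted(tensors, key=lambda t: t[1])  # least sensitive first
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        lo_name, _, lo_n = ranked[lo]
        hi_name, _, hi_n = ranked[hi]
        # Promoting hi costs step_bpw * hi_n bits; demoting lo refunds step_bpw * lo_n.
        new_used = used + step_bpw * (hi_n - lo_n)
        if new_used / total > budget_bpw:
            break  # budget exhausted; remaining tensors stay at base
        tiers[hi_name], tiers[lo_name] = +1, -1
        used = new_used
        lo += 1
        hi -= 1
    return tiers
```

With equal-sized tensors each promotion is exactly paid for by a demotion, so the average BPW is unchanged; with unequal sizes the budget check decides how many pairs fit.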
As the comparison table above shows, this yields lower KLD (better output fidelity) than standard quants of comparable size.
## Compatibility
Fully compatible with stock llama.cpp, llama-server, LM Studio, and any GGUF-compatible runtime. No custom builds required.
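For example, serving one of these quants with stock llama-server (the GGUF filename below is an assumption based on the quant naming above; adjust to your download):

```shell
# Serve the 4.00 BPW quant with an unmodified llama-server build.
# Filename assumed from the quant names in the table; adjust the path as needed.
llama-server -m Qwen3.5-122B-A10B-K_G_4.00.gguf -c 8192 --port 8080
```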
Base model: Qwen/Qwen3.5-122B-A10B