Qwen3.5-122B-A10B – Gutenberg (K_G) Quants

Quantizations of Qwen3.5-122B-A10B using the Gutenberg quantization strategy.

Available Quants

| Quant    | Size     | BPW  |
|----------|----------|------|
| K_G_6.00 | 85.5 GiB | 6.02 |
| K_G_5.00 | 71.3 GiB | 5.02 |
| K_G_4.50 | 64.3 GiB | 4.52 |
| K_G_4.00 | 57.2 GiB | 4.02 |
| K_G_3.50 | 50.0 GiB | 3.51 |
| K_G_3.00 | 42.9 GiB | 3.02 |
| K_G_2.50 | 35.4 GiB | 2.49 |
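As a rough sanity check, the BPW column can be reproduced from the file size and the nominal 122B parameter count. This is a sketch under that assumption; the true tensor element count differs slightly from the headline figure, so expect values to match only to within a few hundredths:

```python
def bpw(size_gib: float, n_params: float = 122e9) -> float:
    """Bits per weight implied by a GGUF file size in GiB.

    Assumes the nominal 122B parameter count; the exact element
    count differs slightly, so small deviations are expected.
    """
    size_bits = size_gib * 1024**3 * 8  # GiB -> bits
    return size_bits / n_params

print(f"{bpw(85.5):.2f}")  # close to the table's 6.02 for K_G_6.00
```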

KLD Comparison vs Unsloth UD Quants

All quants are measured against Q8_K_XL reference logits. Lower KLD means the quant's output distribution is closer to the source model's; Same Top P is the fraction of tokens whose top predicted token matches the reference.

| Model      | Size      | BPW  | KLD      | Same Top P |
|------------|-----------|------|----------|------------|
| UD-Q6_K_XL | 104.7 GiB | 7.36 | 0.002771 | 96.55%     |
| K_G_6.00   | 85.5 GiB  | 6.02 | 0.003026 | 96.55%     |
| UD-Q5_K_XL | 85.6 GiB  | 6.02 | 0.003329 | 96.34%     |
| K_G_5.00   | 71.3 GiB  | 5.02 | 0.004002 | 96.14%     |
| UD-Q4_K_XL | 71.7 GiB  | 5.05 | 0.004898 | 95.70%     |
| K_G_4.50   | 64.3 GiB  | 4.52 | 0.005178 | 95.68%     |
| K_G_4.00   | 57.2 GiB  | 4.02 | 0.006769 | 95.33%     |
| K_G_3.50   | 50.0 GiB  | 3.51 | 0.010662 | 94.24%     |
| UD-Q3_K_XL | 53.1 GiB  | 3.73 | 0.014053 | 93.16%     |
| K_G_3.00   | 42.9 GiB  | 3.02 | 0.018017 | 92.94%     |
| K_G_2.50   | 35.4 GiB  | 2.49 | 0.034715 | 90.48%     |
| UD-IQ2_XXS | 34.1 GiB  | 2.40 | 0.056205 | 87.12%     |
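For illustration, the two metrics in the table can be sketched as below. This is a hypothetical helper, not the actual measurement tooling: it computes the mean per-token KL divergence of the quant's token distribution against the reference, plus the top-token agreement rate.

```python
import math

def softmax(logits):
    """Convert a logit vector to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kld_and_top_match(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant) and top-1 agreement rate.

    ref_logits / quant_logits: one logit vector per token position,
    reference model first, quantized model second.
    """
    klds, matches = [], 0
    for ref, qnt in zip(ref_logits, quant_logits):
        p, q = softmax(ref), softmax(qnt)
        klds.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
        matches += p.index(max(p)) == q.index(max(q))
    return sum(klds) / len(klds), matches / len(klds)
```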

What is Gutenberg?

Gutenberg uses KLD sensitivity data to allocate quantization precision where it matters most. Instead of applying uniform quantization, each expert tensor is ranked by its measured impact on output quality, then assigned to one of three tiers (+1, base, or -1 quant level) within a BPW budget. Non-expert tensors are kept at Q8_0.
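A greedy sketch of the tier assignment described above (names and the budget heuristic are illustrative, not the actual Gutenberg tooling): rank expert tensors by measured KLD sensitivity, promote the most sensitive to the +1 tier and demote the least sensitive to the -1 tier, stopping when the average would exceed the BPW budget.

```python
def assign_tiers(sensitivity, bpw_tiers, target_bpw):
    """Assign each expert tensor to a quant tier under a BPW budget.

    sensitivity: {tensor_name: measured KLD impact}, higher = more sensitive
    bpw_tiers: (plus_one_bpw, base_bpw, minus_one_bpw), e.g. (6.0, 4.5, 3.5)
    Illustrative greedy scheme: start everything at the base tier, then
    promote/demote from opposite ends of the sensitivity ranking while
    the average BPW stays within the target.
    """
    plus, base, minus = bpw_tiers
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    tiers = {name: base for name in ranked}
    hi, lo = 0, len(ranked) - 1
    while hi < lo:
        # Promote the most sensitive remaining tensor, demote the least
        # sensitive one, and keep the change only if it fits the budget.
        trial = dict(tiers, **{ranked[hi]: plus, ranked[lo]: minus})
        if sum(trial.values()) / len(trial) > target_bpw:
            break
        tiers = trial
        hi += 1
        lo -= 1
    return tiers
```

With symmetric tier steps, each promote/demote pair leaves the average unchanged; with asymmetric steps the budget check caps how many tensors land in the +1 tier.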

At equal or smaller file sizes, the K_G quants measure lower KLD than the corresponding standard quants (e.g. K_G_6.00 at 85.5 GiB vs UD-Q5_K_XL at 85.6 GiB in the table above).

Compatibility

Fully compatible with stock llama.cpp (including llama-server), LM Studio, and any GGUF-compatible runtime. No custom builds are required.
