Qwen3.5-122B-A10B – Gutenberg (K_G) Quants

Quantizations of Qwen3.5-122B-A10B using the Gutenberg quantization strategy.

Available Quants

| Quant    | Size     | BPW  |
|----------|----------|------|
| K_G_6.00 | 85.5 GiB | 6.02 |
| K_G_5.00 | 71.3 GiB | 5.02 |
| K_G_4.50 | 64.3 GiB | 4.52 |
| K_G_4.00 | 57.2 GiB | 4.02 |
| K_G_3.50 | 50.0 GiB | 3.51 |
| K_G_3.00 | 42.9 GiB | 3.02 |
| K_G_2.50 | 35.4 GiB | 2.49 |
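As a rough sanity check, the BPW column can be reproduced from the file size and the nominal 122B parameter count. This is a sketch under that assumption; the true tensor element count differs slightly from the headline figure, so expect values to match only to within a few hundredths:

```python
def bpw(size_gib: float, n_params: float = 122e9) -> float:
    """Bits per weight implied by a GGUF file size in GiB.

    Assumes the nominal 122B parameter count; the exact element
    count differs slightly, so small deviations are expected.
    """
    size_bits = size_gib * 1024**3 * 8  # GiB -> bits
    return size_bits / n_params

print(f"{bpw(85.5):.2f}")  # close to the table's 6.02 for K_G_6.00
```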

KLD Comparison vs Unsloth UD Quants

All quants are measured against Q8_K_XL reference logits. Lower KLD means the quant's output distribution is closer to the source model's; Same Top P is the fraction of tokens whose top predicted token matches the reference.

| Model      | Size      | BPW  | KLD      | Same Top P |
|------------|-----------|------|----------|------------|
| UD-Q6_K_XL | 104.7 GiB | 7.36 | 0.002771 | 96.55%     |
| K_G_6.00   | 85.5 GiB  | 6.02 | 0.003026 | 96.55%     |
| UD-Q5_K_XL | 85.6 GiB  | 6.02 | 0.003329 | 96.34%     |
| K_G_5.00   | 71.3 GiB  | 5.02 | 0.004002 | 96.14%     |
| UD-Q4_K_XL | 71.7 GiB  | 5.05 | 0.004898 | 95.70%     |
| K_G_4.50   | 64.3 GiB  | 4.52 | 0.005178 | 95.68%     |
| K_G_4.00   | 57.2 GiB  | 4.02 | 0.006769 | 95.33%     |
| K_G_3.50   | 50.0 GiB  | 3.51 | 0.010662 | 94.24%     |
| UD-Q3_K_XL | 53.1 GiB  | 3.73 | 0.014053 | 93.16%     |
| K_G_3.00   | 42.9 GiB  | 3.02 | 0.018017 | 92.94%     |
| K_G_2.50   | 35.4 GiB  | 2.49 | 0.034715 | 90.48%     |
| UD-IQ2_XXS | 34.1 GiB  | 2.40 | 0.056205 | 87.12%     |
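For illustration, the two metrics in the table can be sketched as below. This is a hypothetical helper, not the actual measurement tooling: it computes the mean per-token KL divergence of the quant's token distribution against the reference, plus the top-token agreement rate.

```python
import math

def softmax(logits):
    """Convert a logit vector to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kld_and_top_match(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant) and top-1 agreement rate.

    ref_logits / quant_logits: one logit vector per token position,
    reference model first, quantized model second.
    """
    klds, matches = [], 0
    for ref, qnt in zip(ref_logits, quant_logits):
        p, q = softmax(ref), softmax(qnt)
        klds.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
        matches += p.index(max(p)) == q.index(max(q))
    return sum(klds) / len(klds), matches / len(klds)
```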

What is Gutenberg?

Gutenberg uses KLD sensitivity data to allocate quantization precision where it matters most. Instead of applying uniform quantization, each expert tensor is ranked by its measured impact on output quality, then assigned to one of three tiers (+1, base, or -1 quant level) within a BPW budget. Non-expert tensors are kept at Q8_0.
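A greedy sketch of the tier assignment described above (names and the budget heuristic are illustrative, not the actual Gutenberg tooling): rank expert tensors by measured KLD sensitivity, promote the most sensitive to the +1 tier and demote the least sensitive to the -1 tier, stopping when the average would exceed the BPW budget.

```python
def assign_tiers(sensitivity, bpw_tiers, target_bpw):
    """Assign each expert tensor to a quant tier under a BPW budget.

    sensitivity: {tensor_name: measured KLD impact}, higher = more sensitive
    bpw_tiers: (plus_one_bpw, base_bpw, minus_one_bpw), e.g. (6.0, 4.5, 3.5)
    Illustrative greedy scheme: start everything at the base tier, then
    promote/demote from opposite ends of the sensitivity ranking while
    the average BPW stays within the target.
    """
    plus, base, minus = bpw_tiers
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    tiers = {name: base for name in ranked}
    hi, lo = 0, len(ranked) - 1
    while hi < lo:
        # Promote the most sensitive remaining tensor, demote the least
        # sensitive one, and keep the change only if it fits the budget.
        trial = dict(tiers, **{ranked[hi]: plus, ranked[lo]: minus})
        if sum(trial.values()) / len(trial) > target_bpw:
            break
        tiers = trial
        hi += 1
        lo -= 1
    return tiers
```

With symmetric tier steps, each promote/demote pair leaves the average unchanged; with asymmetric steps the budget check caps how many tensors land in the +1 tier.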

At equal or smaller file sizes, the K_G quants measure lower KLD than the corresponding standard quants (e.g. K_G_6.00 at 85.5 GiB vs UD-Q5_K_XL at 85.6 GiB in the table above).

Compatibility

Fully compatible with stock llama.cpp (including llama-server), LM Studio, and any GGUF-compatible runtime. No custom builds are required.
