MiniMax-M2.7 — Gutenberg Quants

Quantizations of MiniMax-M2.7 using the Gutenberg (K_G) quantization strategy.

Available Quants

Quant	Size	BPW	Mean KLD	Same Top P
K_G_5.00	133.1 GiB	5.00	0.022412	92.447%
K_G_4.50	119.7 GiB	4.50	0.029416	91.311%
K_G_4.00	106.4 GiB	4.00	0.044050	89.497%
K_G_3.50	93.1 GiB	3.50	0.061226	87.641%
K_G_3.00	79.9 GiB	3.00	0.098738	84.454%
K_G_2.50	66.6 GiB	2.50	0.172875	80.034%

KLD and Same Top P were measured against Q6_K expert reference logits (8192 context, 10 chunks).
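For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed from per-position logits. It assumes the usual definitions: Mean KLD is the KL divergence from the reference distribution to the quantized model's distribution, averaged over token positions, and Same Top P is the share of positions where both models rank the same token first. The function names and toy logits are illustrative only and are not taken from the tooling used to produce the table.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position."""
    ref_lp = log_softmax(ref_logits)
    quant_lp = log_softmax(quant_logits)
    return sum(math.exp(r) * (r - q) for r, q in zip(ref_lp, quant_lp))

def compare(ref_positions, quant_positions):
    """Return (mean KLD, fraction of positions with the same top token)."""
    klds = []
    same_top = 0
    for ref, quant in zip(ref_positions, quant_positions):
        klds.append(kl_divergence(ref, quant))
        if ref.index(max(ref)) == quant.index(max(quant)):
            same_top += 1
    n = len(klds)
    return sum(klds) / n, same_top / n

# Toy logits over a 3-token vocabulary at 2 positions (made-up values).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
quant = [[1.9, 1.1, 0.2], [2.0, 1.0, 0.1]]
mean_kld, same_top_fraction = compare(ref, quant)
print(f"mean KLD = {mean_kld:.6f}, same top = {same_top_fraction:.1%}")
```

Lower Mean KLD and higher Same Top P both indicate that the quantized model's predictions stay closer to the reference.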

vs Standard Quants (unsloth)

Gutenberg quant	Gutenberg BPW	Gutenberg KLD	unsloth quant	unsloth BPW	unsloth KLD
K_G_2.50	2.50	0.172875	UD-IQ2_M	2.45	0.191059
K_G_3.00	3.00	0.098738	UD-IQ3_XXS	2.80	0.119762
K_G_3.50	3.50	0.061226	UD-Q3_K_M	3.54	0.063647
K_G_4.00	4.00	0.044050	UD-IQ4_XS	3.79	0.051081
K_G_5.00	5.00	0.022412	UD-Q4_K_M	4.90	0.024529
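Because the paired quants differ slightly in BPW, it can help to look at the size gap and the KLD gap side by side. The short script below is only a convenience for reading the table: the numbers are copied from the rows above, and the `summarize` helper is a hypothetical name introduced for this example.

```python
# (Gutenberg quant, BPW, KLD, unsloth quant, BPW, KLD), copied from the table.
pairs = [
    ("K_G_2.50", 2.50, 0.172875, "UD-IQ2_M", 2.45, 0.191059),
    ("K_G_3.00", 3.00, 0.098738, "UD-IQ3_XXS", 2.80, 0.119762),
    ("K_G_3.50", 3.50, 0.061226, "UD-Q3_K_M", 3.54, 0.063647),
    ("K_G_4.00", 4.00, 0.044050, "UD-IQ4_XS", 3.79, 0.051081),
    ("K_G_5.00", 5.00, 0.022412, "UD-Q4_K_M", 4.90, 0.024529),
]

def summarize(rows):
    """Return (gutenberg, unsloth, BPW gap, relative KLD reduction) per pair."""
    out = []
    for g_name, g_bpw, g_kld, u_name, u_bpw, u_kld in rows:
        bpw_gap = g_bpw - u_bpw                  # positive: Gutenberg quant is larger
        kld_reduction = (u_kld - g_kld) / u_kld  # fraction by which KLD is lower
        out.append((g_name, u_name, bpw_gap, kld_reduction))
    return out

summary = summarize(pairs)
for g_name, u_name, bpw_gap, kld_reduction in summary:
    print(f"{g_name} vs {u_name}: BPW gap {bpw_gap:+.2f}, "
          f"KLD lower by {kld_reduction:.1%}")
```

A positive BPW gap means the Gutenberg quant in that row uses more bits per weight than its unsloth counterpart, which should be kept in mind when reading the KLD difference.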

Why Gutenberg?

Standard quantization applies uniform rules to all tensors. Gutenberg uses KLD sensitivity data to allocate precision where it matters most, upgrading the tensors that have the highest measured impact on output quality while keeping less important tensors at the base level.

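The result is better quality than standard quants at a similar model size: in the comparison above, every Gutenberg quant has a lower KLD than the unsloth quant closest to it in BPW. The pairs are not exactly size-matched (for example, K_G_3.00 at 3.00 BPW versus UD-IQ3_XXS at 2.80 BPW), so part of the gap in some rows may reflect the extra bits.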
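The idea of spending precision where sensitivity is highest can be illustrated with a small greedy allocator. This is a sketch under stated assumptions, not the actual Gutenberg procedure, which is not described here: it assumes each tensor has a measured KLD improvement when upgraded on its own, that there is a single upgrade level, and that improvements are ranked by gain per extra bit. The tensor names, sizes, and gains are hypothetical.

```python
def allocate(tensors, base_bits, upgrade_bits, target_bpw):
    """Greedy sketch: upgrade tensors with the best KLD gain per extra bit.

    tensors: list of (name, n_params, kld_gain), where kld_gain is the
    (hypothetical) measured drop in mean KLD when that tensor alone is
    upgraded from base_bits to upgrade_bits.
    Returns a dict mapping tensor name to its assigned bits per weight.
    """
    total_params = sum(n for _, n, _ in tensors)
    budget = (target_bpw - base_bits) * total_params  # extra bits we may spend
    extra = upgrade_bits - base_bits
    bits = {name: base_bits for name, _, _ in tensors}

    # Rank by KLD improvement per extra bit spent, best first.
    ranked = sorted(tensors, key=lambda t: t[2] / (t[1] * extra), reverse=True)
    for name, n_params, _ in ranked:
        cost = n_params * extra
        if cost <= budget:  # upgrade only if it still fits the size budget
            bits[name] = upgrade_bits
            budget -= cost
    return bits

# Hypothetical tensors: (name, parameter count, measured KLD gain).
tensors = [
    ("blk.0.attn_q", 100, 0.030),
    ("blk.0.ffn_up", 400, 0.020),
    ("blk.1.attn_q", 100, 0.006),
    ("blk.1.ffn_up", 400, 0.004),
]
plan = allocate(tensors, base_bits=3, upgrade_bits=5, target_bpw=3.5)
avg_bpw = sum(plan[name] * n for name, n, _ in tensors) / sum(n for _, n, _ in tensors)
print(plan, f"average BPW = {avg_bpw:.2f}")
```

In this toy setup the small, sensitive tensors are upgraded first, and the large tensors stay at the base level because upgrading them would exceed the target BPW.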

Compatibility

Fully compatible with stock llama.cpp, llama-server, LM Studio, and any GGUF-compatible runtime. No custom builds required.

Format: GGUF
Model size: 229B params
Architecture: minimax-m2

Base model: MiniMaxAI/MiniMax-M2.7