These are some quants I use depending on available memory. I also added an NVFP4 quant in the hope that custom kernels for it will emerge in the future. I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants.

KLD

Due to hardware restrictions I have to use the Q8 version as the KLD baseline instead of the full model. However, it is quantized in the same way as the original model, which also uses 8 bits for the expert weights, so the difference should be small.

Sadly, some KLD runs are producing weird outputs (NaN floats from llama-perplexity), so take this with a salt lake.
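For anyone who wants to reproduce or extend these numbers, the general workflow is a minimal sketch using llama.cpp's `--kl-divergence-base` / `--kl-divergence` options; the model and logits file names below are placeholders, and the exact binary name/path depends on your build:

```shell
# 1. Save per-token logits from the baseline (here the Q8 quant) to a file.
./llama-perplexity -m MiniMax-M2-Q8.gguf -f test-corpus.txt \
    --kl-divergence-base q8-logits.bin

# 2. Run each candidate quant against the saved baseline logits; this prints
#    mean PPL, mean KLD, and top-token agreement statistics.
./llama-perplexity -m MiniMax-M2-IQ4XS.gguf -f test-corpus.txt \
    --kl-divergence-base q8-logits.bin --kl-divergence
```

The baseline logits file only has to be generated once and can then be reused for every quant you want to compare.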

| Provider | Quant      | Size (GB) | Mean PPL             | Mean KLD            | Same top p       |
|----------|------------|-----------|----------------------|---------------------|------------------|
| KS       | Q8         |           | 7.0266 ± 0.05210     | baseline            | baseline         |
| KS       | IQ4XS-Q5K  | 135.5     |                      |                     | 90.720 ± 0.077 % |
| KS       | IQ4XS      | 123.8     | 7.153799 ± 0.053213  | 0.086127 ± 0.001029 | 89.425 ± 0.082 % |
| KS       | IQ4XS-Q4K  | 126.1     |                      |                     | 89.205 ± 0.083 % |
| KS       | NVFP4      | 130.8     | 7.177182 ± 0.053324  | 0.105053 ± 0.001034 | 88.154 ± 0.086 % |
| unsloth  | UD-Q4_K_XL | 141       |                      |                     | 86.990 ± 0.090 % |
| KS       | Q3K-IQ4XS  | 108.6     | 7.297092 ± 0.054489  | 0.140361 ± 0.001216 | 86.387 ± 0.091 % |
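To make the table's metrics concrete: per token, "Mean KLD" averages the KL divergence between the baseline's and the quant's next-token distributions, and "Same top p" counts how often both agree on the most likely token. A minimal sketch of those two quantities (not llama-perplexity's actual implementation; the toy logits are made up):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(P || Q) in nats, with P the baseline distribution.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top(p, q):
    # Do both distributions pick the same most-likely token?
    return p.index(max(p)) == q.index(max(q))

# Toy logits for a 4-token vocabulary at one position.
base_logits  = [2.0, 1.0, 0.5, -1.0]   # baseline (e.g. Q8)
quant_logits = [1.9, 1.1, 0.4, -0.9]   # candidate quant

p = softmax(base_logits)
q = softmax(quant_logits)
print(kl_divergence(p, q))  # small positive value: mild distortion
print(same_top(p, q))       # True: both pick token 0
```

Averaging these over every token of the test corpus gives the columns above; a lower mean KLD and a higher same-top-token rate mean the quant tracks the baseline more closely.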