These are some quants I use depending on available memory. I also added an NVFP4 quant in the hope that custom kernels for it emerge in the future. I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants.
KLD
I had to compute the KLD baseline with the Q8 version due to hardware restrictions. However, it is quantized the same way as the original model, which also uses 8 bits for the expert weights, so the difference should be small.
Sadly, some KLD runs produced weird outputs (NaN values from llama-perplexity), so take these numbers with a salt lake.
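For context on the columns below: Mean KLD is the average KL divergence between the quantized model's and the baseline's per-token probability distributions, and "Same top p" is the fraction of positions where both models rank the same token first. A minimal sketch of both metrics on toy distributions (not llama-perplexity's actual implementation):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete token probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top_token(p, q):
    """True if both distributions assign the highest probability to the same token."""
    return max(range(len(p)), key=p.__getitem__) == max(range(len(q)), key=q.__getitem__)

# Toy example: baseline vs. a slightly perturbed distribution over 4 tokens
baseline = [0.70, 0.20, 0.05, 0.05]
quantized = [0.65, 0.24, 0.06, 0.05]

print(kl_divergence(baseline, quantized))   # small positive number
print(same_top_token(baseline, quantized))  # True
```

The table reports these averaged over an entire evaluation corpus; a good quant keeps Mean KLD near zero and the top-token agreement high.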
| Provider | Quant | Size (GB) | Mean PPL | Mean KLD | Same top p |
|---|---|---|---|---|---|
| KS | Q8 | | 7.0266 ± 0.05210 | baseline | baseline |
| KS | IQ4XS-Q5K | 135.5 | | | 90.720 ± 0.077 % |
| KS | IQ4XS | 123.8 | 7.153799 ± 0.053213 | 0.086127 ± 0.001029 | 89.425 ± 0.082 % |
| KS | IQ4XS-Q4K | 126.1 | | | 89.205 ± 0.083 % |
| KS | NVFP4 | 130.8 | 7.177182 ± 0.053324 | 0.105053 ± 0.001034 | 88.154 ± 0.086 % |
| unsloth | UD-Q4_K_XL | 141 | | | 86.990 ± 0.090 % |
| KS | Q3K-IQ4XS | 108.6 | 7.297092 ± 0.054489 | 0.140361 ± 0.001216 | 86.387 ± 0.091 % |
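The Mean PPL column is ordinary perplexity: the exponential of the mean negative log-likelihood over the evaluation tokens. A quick illustration with made-up per-token probabilities:

```python
import math

def perplexity(logprobs):
    """Perplexity = exp(mean negative log-likelihood)."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical natural-log probabilities the model assigned to three tokens
logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(perplexity(logprobs))  # 4.0: the geometric mean of the 1/p values
```

Lower is better, so the roughly +0.13 to +0.27 PPL over the Q8 baseline in the table is the price paid for the ~35% size reduction.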
Base model: MiniMaxAI/MiniMax-M2.7