Quantization was performed using exllama3 v0.0.29.

- 3.0bpw
- 4.0bpw
- 5.0bpw
- 6.0bpw

## Quantization Notes

### Inf/NaN values in calibration data

Some experts in the model produce inf values during calibration (e.g. experts 61 and 74 in the last layer had inf values in their down-projection calibration state). The lm_head layer also exhibited NaN values in its calibration state (445K NaN out of 1.5B elements).
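A scan like the one described above can be sketched as follows. This is an illustrative snippet, not part of exllama3; the tensor names are hypothetical stand-ins for the affected calibration states:

```python
import numpy as np

def scan_for_nonfinite(tensors):
    """Return {name: (inf_count, nan_count)} for tensors with non-finite entries."""
    report = {}
    for name, t in tensors.items():
        flat = np.asarray(t).ravel()
        n_inf = int(np.isinf(flat).sum())
        n_nan = int(np.isnan(flat).sum())
        if n_inf or n_nan:
            report[name] = (n_inf, n_nan)
    return report

# Toy calibration states mimicking the failure mode described above.
tensors = {
    "blocks.last.experts.61.down_proj": np.array([1.0, np.inf, 2.0]),
    "lm_head": np.array([0.5, np.nan, np.nan]),
    "blocks.0.up_proj": np.array([1.0, 2.0]),
}
print(scan_for_nonfinite(tensors))
```

Running such a scan over the captured calibration states is how the affected experts and the lm_head layer can be identified before quantization starts.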

This causes Cholesky decomposition to fail during quantization: the accumulated Hessian contains non-finite values, so it is no longer positive-definite. ExLlamaV3 does not handle this case gracefully — quantization crashes after exhausting its retry attempts. A local patch was applied to fall back to uncalibrated quantization for the affected tensors. Since only a handful of the 256 experts in the last layer are affected, the impact on output quality is expected to be minimal.
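The fallback can be sketched roughly as below. This is a minimal illustration of the idea, not the actual patch or ExLlamaV3's code path; the damping scheme and function name are assumptions:

```python
import numpy as np

def cholesky_or_fallback(H, damp=0.01):
    """Try to factor the Hessian H; on failure, substitute the identity,
    which amounts to quantizing the tensor without calibration.
    Returns (factor, calibrated_flag)."""
    n = H.shape[0]
    # Non-finite entries make Cholesky meaningless, so check first.
    if not np.all(np.isfinite(H)):
        return np.eye(n), False
    # Mild damping on the diagonal, a common trick to help positive-definiteness.
    Hd = H + damp * np.mean(np.diag(H)) * np.eye(n)
    try:
        return np.linalg.cholesky(Hd), True
    except np.linalg.LinAlgError:
        # Still not positive-definite: fall back to uncalibrated quantization.
        return np.eye(n), False
```

The explicit finiteness check matters because `np.linalg.cholesky` does not reliably raise on inf/NaN inputs; it can silently return non-finite factors instead.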

This appears to be a property of the model weights themselves, not a bug in the quantizer.

