Quantization was performed using exllama3 v0.0.29.
Quantization Notes
Inf/NaN values in calibration data
Some experts in the model produce inf values during calibration; for example, experts 61 and 74 in the last layer had inf values in their down-projection calibration state. The lm_head layer's calibration state also contained NaN values (445K NaN out of 1.5B elements).
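A simple way to audit a calibration tensor for these non-finite values is sketched below in NumPy. This is a generic illustration, not ExLlamaV3's actual calibration code; the function name and the sample tensor are hypothetical.

```python
import numpy as np

def count_nonfinite(tensor: np.ndarray) -> dict:
    """Count inf and NaN entries in a calibration tensor."""
    return {
        "inf": int(np.isinf(tensor).sum()),
        "nan": int(np.isnan(tensor).sum()),
    }

# Hypothetical calibration state with a few bad entries.
state = np.ones((4, 4), dtype=np.float32)
state[0, 0] = np.inf
state[1, 2] = np.nan

print(count_nonfinite(state))  # {'inf': 1, 'nan': 1}
```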
This causes Cholesky decomposition to fail during quantization, since the Hessian matrix is no longer positive-definite. ExLlamaV3 does not handle this case gracefully: quantization crashes after exhausting its retry attempts. A local patch was applied to fall back to uncalibrated quantization for the affected tensors. Since only a handful of the 256 experts in the last layer are affected, the impact on output quality is expected to be minimal.
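The fallback logic can be sketched as follows. This is a minimal NumPy illustration under assumed behavior, not the actual patch applied to ExLlamaV3; the function name is hypothetical, and returning an identity matrix stands in for "uncalibrated quantization" (a Hessian that weights all inputs equally).

```python
import numpy as np

def cholesky_or_identity(hessian: np.ndarray) -> np.ndarray:
    """Return the Cholesky factor of the Hessian, or fall back
    to an identity matrix when the Hessian contains non-finite
    values or is not positive-definite (hypothetical sketch)."""
    n = hessian.shape[0]
    if not np.isfinite(hessian).all():
        # Calibration produced inf/NaN: skip calibration entirely.
        return np.eye(n)
    try:
        return np.linalg.cholesky(hessian)
    except np.linalg.LinAlgError:
        # Not positive-definite even with finite values.
        return np.eye(n)
```

In the real quantizer the retry loop would attempt regularization first; this sketch only shows the terminal fallback.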
This appears to be a property of the model weights themselves, not a bug in the quantizer.
Model tree for NeuroSenko/MiniMax-M2.7-exl3
Base model: MiniMaxAI/MiniMax-M2.7