MiniMax-M2.7 APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of MiniMax-M2.7.

Brought to you by the LocalAI team | APEX Project | Technical Report

Status: Re-quantization in progress. The previous quants had a conversion bug: our direct FP8→BF16 path produced broken logits. We've identified the issue and are re-quantizing, this time using unsloth's pre-converted BF16 GGUF as the source. Working quants will be back shortly.

About APEX

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers keep higher precision, while middle layers get more aggressive compression. I-variants additionally use importance-matrix (imatrix) calibration computed from a diverse calibration corpus.

See the APEX project for full details, technical report, and scripts.
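The classify-by-role plus layer-gradient idea can be sketched as follows. This is an illustrative toy, not the actual APEX scripts: the tensor-name substrings follow common llama.cpp GGUF naming conventions, and the thresholds and quant types are placeholder assumptions.

```python
# Toy sketch of APEX-style tensor classification and a layer-wise
# precision gradient. Substrings, thresholds, and quant types are
# illustrative assumptions, not the real APEX configuration.

def classify_tensor(name: str) -> str:
    """Assign a role to a GGUF tensor based on its name."""
    if "shexp" in name:                      # shared-expert FFN tensors
        return "shared_expert"
    if "ffn" in name and "exps" in name:     # routed-expert FFN tensors
        return "routed_expert"
    if "attn" in name:                       # attention projections
        return "attention"
    return "other"

def precision_for_layer(layer: int, n_layers: int = 62) -> str:
    """Edge layers get higher precision; middle layers compress hardest."""
    # Distance from the nearest edge, normalized to [0, 1].
    edge = min(layer, n_layers - 1 - layer) / (n_layers / 2)
    if edge < 0.1:
        return "Q6_K"   # first/last few layers
    if edge < 0.35:
        return "Q4_K"   # transition zone
    return "Q3_K"       # middle of the stack

print(classify_tensor("blk.0.ffn_gate_exps.weight"))  # routed_expert
print(precision_for_layer(0), precision_for_layer(31))
```

A real pipeline would combine both signals, e.g. keeping shared-expert and attention tensors at higher precision regardless of depth while letting routed experts follow the gradient.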

Architecture

  • Model: MiniMax-M2.7 (MiniMaxM2)
  • Layers: 62
  • Experts: 256 routed (8 active per token)
  • Total Parameters: ~228B
  • Active Parameters: ~10B per token
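The sparse-activation figures above (256 routed experts, 8 active per token) come from top-k routing: a router scores every expert and only the 8 highest-scoring ones run for a given token, which is why only ~10B of the ~228B parameters are active. A minimal sketch with placeholder random logits (not the real router):

```python
import math
import random

# Minimal top-k MoE routing sketch matching the architecture above:
# 256 routed experts, 8 active per token. Logits are random
# placeholders standing in for the model's router output.
N_EXPERTS, TOP_K = 256, 8

def route(logits: list[float]) -> tuple[list[int], list[float]]:
    """Return indices and softmax weights of the top-k experts."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in idx)                    # for numerical stability
    exps = [math.exp(logits[i] - m) for i in idx]
    s = sum(exps)
    return idx, [e / s for e in exps]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
idx, weights = route(logits)
print(len(idx))          # 8 experts selected for this token
print(sum(weights))      # weights are normalized to 1
```

The token's output is then the weighted sum of just those 8 expert FFNs, so per-token compute scales with the active parameter count, not the total.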

Credits

APEX is brought to you by the LocalAI team.
