MiniMax-M2.7 APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of MiniMax-M2.7.

Brought to you by the LocalAI team | APEX Project | Technical Report

Status: Re-quantization in progress. The previous quants had a conversion bug: our direct FP8→BF16 path produced broken logits. We've identified the issue and are re-quantizing, this time using unsloth's pre-converted BF16 GGUF as the source. Working quants will be back shortly.

About APEX

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers keep higher precision, while middle layers get more aggressive compression. I-variants additionally use importance-matrix (imatrix) calibration computed from a diverse calibration corpus.

See the APEX project for full details, technical report, and scripts.
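The classify-by-role plus layer-gradient idea can be sketched as follows. This is an illustrative toy, not the actual APEX scripts: the tensor-name substrings follow common llama.cpp GGUF naming conventions, and the thresholds and quant types are placeholder assumptions.

```python
# Toy sketch of APEX-style tensor classification and a layer-wise
# precision gradient. Substrings, thresholds, and quant types are
# illustrative assumptions, not the real APEX configuration.

def classify_tensor(name: str) -> str:
    """Assign a role to a GGUF tensor based on its name."""
    if "shexp" in name:                      # shared-expert FFN tensors
        return "shared_expert"
    if "ffn" in name and "exps" in name:     # routed-expert FFN tensors
        return "routed_expert"
    if "attn" in name:                       # attention projections
        return "attention"
    return "other"

def precision_for_layer(layer: int, n_layers: int = 62) -> str:
    """Edge layers get higher precision; middle layers compress hardest."""
    # Distance from the nearest edge, normalized to [0, 1].
    edge = min(layer, n_layers - 1 - layer) / (n_layers / 2)
    if edge < 0.1:
        return "Q6_K"   # first/last few layers
    if edge < 0.35:
        return "Q4_K"   # transition zone
    return "Q3_K"       # middle of the stack

print(classify_tensor("blk.0.ffn_gate_exps.weight"))  # routed_expert
print(precision_for_layer(0), precision_for_layer(31))
```

A real pipeline would combine both signals, e.g. keeping shared-expert and attention tensors at higher precision regardless of depth while letting routed experts follow the gradient.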

Architecture

  • Model: MiniMax-M2.7 (MiniMaxM2)
  • Layers: 62
  • Experts: 256 routed (8 active per token)
  • Total Parameters: ~228B
  • Active Parameters: ~10B per token
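The sparse-activation figures above (256 routed experts, 8 active per token) come from top-k routing: a router scores every expert and only the 8 highest-scoring ones run for a given token, which is why only ~10B of the ~228B parameters are active. A minimal sketch with placeholder random logits (not the real router):

```python
import math
import random

# Minimal top-k MoE routing sketch matching the architecture above:
# 256 routed experts, 8 active per token. Logits are random
# placeholders standing in for the model's router output.
N_EXPERTS, TOP_K = 256, 8

def route(logits: list[float]) -> tuple[list[int], list[float]]:
    """Return indices and softmax weights of the top-k experts."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in idx)                    # for numerical stability
    exps = [math.exp(logits[i] - m) for i in idx]
    s = sum(exps)
    return idx, [e / s for e in exps]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
idx, weights = route(logits)
print(len(idx))          # 8 experts selected for this token
print(sum(weights))      # weights are normalized to 1
```

The token's output is then the weighted sum of just those 8 expert FFNs, so per-token compute scales with the active parameter count, not the total.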

Credits

APEX is brought to you by the LocalAI team.
