MiniMax-M2.5-REAP-139B-A10B-MXFP4

When you want MoE-specific compression without sending quality into the abyss.

Built from:

  • Base: MiniMaxAI/MiniMax-M2.5
  • REAP source: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF (BF16 split)
  • Quantization: MXFP4_MOE, produced locally with llama.cpp (command sketch below).
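
For reference, the quantization step looks roughly like this. This is a sketch, not the exact command used: the BF16 input filename is hypothetical, and your llama.cpp build's llama-quantize must list MXFP4_MOE among its supported types.

llama-quantize MiniMax-M2.5-REAP-139B-A10B-BF16.gguf MiniMax-M2.5-REAP-MXFP4_MOE.gguf MXFP4_MOE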

Quant

Quant      Size (GiB)  Notes
MXFP4_MOE  70.91       MoE-oriented quantization; many non-expert tensors preserved at higher precision
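
As a rough sanity check on that figure: 70.91 GiB is about 609 × 10⁹ bits spread over 139 × 10⁹ parameters, i.e. roughly 4.4 bits per weight on average. That sits slightly above MXFP4's nominal 4.25 bits per element (4-bit values plus one shared 8-bit scale per 32-element block), which is consistent with the non-expert tensors being kept at q8_0/f32.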

Tensor Mix

This quant uses a mixed layout (as expected for MXFP4 MoE), including mxfp4, q8_0, and f32 tensors.
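
To verify the mix yourself, the gguf Python package that ships with llama.cpp can enumerate tensor types. A minimal sketch, assuming a gguf version recent enough to know the MXFP4 type; note that a split model stores only a subset of tensors per shard, so repeat over all shards for the full picture:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("MiniMax-M2.5-REAP-MXFP4_MOE-00001-of-00007.gguf")
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType value (MXFP4, Q8_0, F32, ...)
    print(tensor.name, tensor.tensor_type.name)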

Usage

Use the first shard; llama.cpp resolves the rest:

llama-cli -m MiniMax-M2.5-REAP-MXFP4_MOE-00001-of-00007.gguf -ngl 0 -c 8192
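
The same first-shard convention applies if you prefer the HTTP server over the CLI (standard llama.cpp flags; the port here is just an example):

llama-server -m MiniMax-M2.5-REAP-MXFP4_MOE-00001-of-00007.gguf -c 8192 --port 8080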

Why a Separate Repo

MXFP4 has a different target audience than the standard Q-series packs, so it gets its own clean home with a simpler download surface.

Credits

  • MiniMaxAI for MiniMax-M2.5
  • tomngdev for the BF16 REAP GGUF release
  • BennyDaBall for this quant

Disclaimer

You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.
