MiniMax-M2.5-REAP-139B-A10B-GGUF

This is the REAP model in practical pants: high-quality GGUF quants for local inference without setting your workstation on fire.

Built from:

  • Base: MiniMaxAI/MiniMax-M2.5
  • REAP source: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF (BF16 split)
  • Quantized locally with llama.cpp on a Strix Halo machine with high RAM mode enabled.

Available Quants

Quant    Status     Size (GiB)  Notes
Q8_0     uploaded   137.78      Highest quality quant in this pack
Q5_K_M   uploading  92.33       Better quality/size balance
Q4_K_M   uploaded   78.83       Strong practical default

File Layout

All quants are split GGUF sets (e.g. 00001-of-00007) for safer handling of very large models.
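
Merging the shards back into a single file is optional, since llama.cpp loads a shard set directly from the first file. If you do want one file, llama.cpp ships a llama-gguf-split tool for this. A minimal sketch, assuming the Q4_K_M set; adjust the shard name to whichever quant you downloaded, and pick any output name you like:

llama-gguf-split --merge MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf MiniMax-M2.5-REAP-Q4_K_M.gguf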

Quality Notes

  • These quants are generated directly from the BF16 REAP GGUF, not requantized from a lower-precision quant.
  • Token embedding and output tensors are kept at Q8_0 during quantization for quality retention (see the sketch below).
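
For reference, a quant with those tensor overrides would be produced roughly like this with llama.cpp's llama-quantize. This is a sketch, not the exact command used for this pack, and the BF16 input filename here is a placeholder:

llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
  MiniMax-M2.5-REAP-BF16-00001-of-XXXXX.gguf MiniMax-M2.5-REAP-Q4_K_M.gguf Q4_K_M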

Usage

Point llama.cpp at the first shard of whichever quant you downloaded; it auto-discovers the sibling shards:

llama-cli -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -ngl 0 -c 8192
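
The same first shard also works with llama-server if you want an OpenAI-compatible HTTP endpoint instead of the CLI. A minimal sketch; tune -ngl and -c to your hardware, and the port is just an example:

llama-server -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -c 8192 -ngl 0 --port 8080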

Credits

  • MiniMaxAI for MiniMax-M2.5
  • tomngdev for the BF16 REAP GGUF release
  • BennyDaBall for this quant pack

Disclaimer

You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.
