# MiniMax-M2.5-REAP-139B-A10B-GGUF
This is the REAP model in practical pants: high-quality GGUF quants for local inference without setting your workstation on fire.
Built from:

- Base: `MiniMaxAI/MiniMax-M2.5`
- REAP source: `tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF` (BF16 split)
- Quantized locally with `llama.cpp` on Strix Halo in high RAM mode
## Available Quants

| Quant | Status | Size (GiB) | Notes |
|---|---|---|---|
| Q8_0 | uploaded | 137.78 | Highest quality quant in this pack |
| Q5_K_M | uploading | 92.33 | Better quality/size balance |
| Q4_K_M | uploaded | 78.83 | Strong practical default |
## File Layout

All quants are split GGUF sets (`00001-of-00007`, etc.) for safer handling of very large models.
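The split naming scheme is mechanical, so you can check that a full shard set is present before loading. A minimal sketch (the helper name is hypothetical, not part of this repo or llama.cpp):

```python
import re

# Matches split-GGUF names like "...-00001-of-00007.gguf"
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")

def expected_shards(first_shard: str) -> list[str]:
    """Derive every sibling filename in a split-GGUF set from the first shard's name."""
    m = SHARD_RE.match(first_shard)
    if m is None:
        raise ValueError(f"not a split-GGUF filename: {first_shard}")
    stem, total = m["stem"], int(m["total"])
    return [f"{stem}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]

names = expected_shards("MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf")
print(len(names))   # 7
print(names[-1])    # MiniMax-M2.5-REAP-Q4_K_M-00007-of-00007.gguf
```

Compare the returned list against the files actually on disk; any name missing means an incomplete download.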
## Quality Notes

- These quants are generated directly from the BF16 REAP GGUF, not requantized from a lower-precision quant.
- Token embedding and output tensors are kept at `Q8_0` during quantization to retain quality.
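The embedding/output override corresponds to `llama-quantize`'s per-tensor type flags. A sketch of that kind of invocation; the input/output filenames are placeholders and the exact command used for this pack is an assumption:

```shell
# Sketch only: --token-embedding-type / --output-tensor-type are llama.cpp's
# llama-quantize flags for pinning those tensors at Q8_0 while the rest of the
# model is quantized to the target type (here Q4_K_M).
./llama-quantize \
  --token-embedding-type q8_0 \
  --output-tensor-type q8_0 \
  MiniMax-M2.5-REAP-BF16-00001-of-00007.gguf \
  MiniMax-M2.5-REAP-Q4_K_M.gguf \
  Q4_K_M
```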
## Usage

Use any first shard with llama.cpp; it auto-discovers the sibling shards:

```
llama-cli -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -ngl 0 -c 8192
```
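The same first-shard trick works for serving. A sketch using llama.cpp's `llama-server`; the port, context size, and `-ngl 0` (CPU-only) values are placeholders to adjust for your hardware:

```shell
# Serve the model over an OpenAI-compatible HTTP API with llama-server.
# -ngl 0 keeps all layers on CPU; raise it if you can offload to a GPU.
llama-server \
  -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf \
  -c 8192 \
  -ngl 0 \
  --host 127.0.0.1 \
  --port 8080
```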
## Credits

- MiniMaxAI for MiniMax-M2.5
- tomngdev for the BF16 REAP GGUF release
- BennyDaBall for this quant pack
## Disclaimer
You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.