MiniMax-M2.5-REAP-139B-A10B-GGUF

This is the REAP model in practical pants: high-quality GGUF quants for local inference without setting your workstation on fire.

Built from:

  • Base: MiniMaxAI/MiniMax-M2.5
  • REAP source: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF (BF16 split)
  • Quantized locally with llama.cpp on a Strix Halo machine with high RAM mode enabled.

Available Quants

Quant    Status     Size (GiB)  Notes
Q8_0     uploaded   137.78      Highest quality quant in this pack
Q5_K_M   uploading  92.33       Better quality/size balance
Q4_K_M   uploaded   78.83       Strong practical default

File Layout

All quants are split GGUF sets (e.g. 00001-of-00007) for safer handling of very large models.
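
Merging the shards back into a single file is optional, since llama.cpp loads a shard set directly from the first file. If you do want one file, llama.cpp ships a llama-gguf-split tool for this. A minimal sketch, assuming the Q4_K_M set; adjust the shard name to whichever quant you downloaded, and pick any output name you like:

llama-gguf-split --merge MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf MiniMax-M2.5-REAP-Q4_K_M.gguf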

Quality Notes

  • These quants are generated directly from the BF16 REAP GGUF, not requantized from a lower-precision quant.
  • Token embedding and output tensors are kept at Q8_0 during quantization for quality retention (see the sketch below).
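
For reference, a quant with those tensor overrides would be produced roughly like this with llama.cpp's llama-quantize. This is a sketch, not the exact command used for this pack, and the BF16 input filename here is a placeholder:

llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
  MiniMax-M2.5-REAP-BF16-00001-of-XXXXX.gguf MiniMax-M2.5-REAP-Q4_K_M.gguf Q4_K_M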

Usage

Point llama.cpp at the first shard of whichever quant you downloaded; it auto-discovers the sibling shards:

llama-cli -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -ngl 0 -c 8192
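
The same first shard also works with llama-server if you want an OpenAI-compatible HTTP endpoint instead of the CLI. A minimal sketch; tune -ngl and -c to your hardware, and the port is just an example:

llama-server -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -c 8192 -ngl 0 --port 8080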

Credits

  • MiniMaxAI for MiniMax-M2.5
  • tomngdev for the BF16 REAP GGUF release
  • BennyDaBall for this quant pack

Disclaimer

You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.
