MiniMax-M2.7 AWQ 4bit (W4A16)

W4A16 quantization of MiniMaxAI/MiniMax-M2.7, produced with llm-compressor.

  • Format: compressed-tensors pack-quantized, int4 weights / fp16 activations
  • Group size: 128, symmetric
  • Calibration: data-free, MSE observer
  • Kept in BF16: MoE routing gates and lm_head only; every other Linear layer is quantized, matching the ignore list from cyankiwi/MiniMax-M2.5-AWQ-4bit (see the recipe sketch below).
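
The bullets above map onto llm-compressor's data-free QuantizationModifier path. A minimal sketch of such a recipe follows; the config_groups layout, the "mse" observer key, and the gate regex in the ignore list are assumptions inferred from this card's description, not the actual build script for this checkpoint:

from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "MiniMaxAI/MiniMax-M2.7"

# Load the source model in its native dtype (BF16).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", trust_remote_code=True
)

# int4 symmetric weights, group size 128, higher-precision activations (W4A16).
# The "mse" observer and the gate regex are assumptions based on this card.
recipe = QuantizationModifier(
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 4,
                "type": "int",
                "symmetric": True,
                "strategy": "group",
                "group_size": 128,
                "observer": "mse",
            },
        }
    },
    ignore=["lm_head", "re:.*gate$"],  # keep MoE routing gates and lm_head in BF16
)

# Data-free: no calibration dataset is passed to oneshot.
oneshot(model=model, recipe=recipe, output_dir="MiniMax-M2.7-AWQ-4bit")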

vLLM

vllm serve demon-zombie/MiniMax-M2.7-AWQ-4bit \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice
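
Once the server is up, it speaks the standard OpenAI-compatible API. A minimal client sketch, assuming vLLM's default endpoint at http://localhost:8000/v1 (any placeholder API key works):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="demon-zombie/MiniMax-M2.7-AWQ-4bit",
    messages=[{"role": "user", "content": "Summarize W4A16 quantization in one sentence."}],
)
print(resp.choices[0].message.content)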

Model size: 229B params (safetensors; tensor types I64 · I32 · BF16)