MiniMax-M2.7-TQ3

A TurboQuant 3-bit quantized version of MiniMax-M2.7, optimized for inference with turboquant-vllm.

Model Details

  • Base Model: MiniMaxAI/MiniMax-M2.7
  • Quantization: TurboQuant 3-bit
  • Quantization Tool: turboquant-vllm
  • Architecture: Transformer-based LLM with extended context support

Usage

This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the turboquant-vllm repository for installation and usage instructions.

Example

# Please refer to turboquant-vllm for proper model loading

Chat Template

The model uses a Jinja chat template with support for:

  • System messages
  • Tool/function calling (<minimax:tool_call> / </minimax:tool_call> delimiters)
  • Reasoning content (<think> / </think> delimiters)
  • Multi-turn conversations

The default model identity is: "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."
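As an illustration of how these delimiters compose in an assistant turn, here is a minimal sketch in plain Python. It only demonstrates the delimiter layout described above; the exact rendered format (whitespace, ordering, surrounding control tokens) is defined by the model's Jinja template, and the helper name and JSON payload below are hypothetical.

```python
# Illustrative sketch of the delimiter layout used by the chat template.
# The delimiter names are taken from this model card; everything else
# (function name, tool-call JSON shape) is an assumption for illustration.

def render_assistant_turn(reasoning: str, tool_call_json: str) -> str:
    """Wrap reasoning content and a tool call in the template's delimiters."""
    return (
        f"<think>{reasoning}</think>\n"
        f"<minimax:tool_call>{tool_call_json}</minimax:tool_call>"
    )

turn = render_assistant_turn(
    "The user asked for the weather, so call the weather tool.",
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',
)
print(turn)
```

In practice you would not build these strings by hand: applying the bundled Jinja template (e.g. via a tokenizer's chat-template support) produces the correctly delimited prompt.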

Tokenizer

  • Backend: tokenizers
  • Vocabulary Size: (see tokenizer files)
  • Special Tokens: Includes tokens for tool calls, reasoning markers, and standard control tokens

Quantization Details

This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
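The motivation for 3-bit quantization is memory: a rough back-of-the-envelope estimate of the weight footprint for an 88B-parameter model, ignoring quantization scales, metadata, and any tensors kept at higher precision (the checkpoint mixes F32, F16, and U8 tensors), looks like this:

```python
# Rough weight-storage estimate for an 88B-parameter model.
# Excludes quantization scales/metadata and higher-precision tensors,
# so real checkpoint sizes will differ somewhat.
params = 88e9

def gib(total_bits: float) -> float:
    """Convert a bit count to GiB."""
    return total_bits / 8 / 2**30

fp16_gib = gib(params * 16)  # ~163.9 GiB at 16 bits/weight
q3_gib = gib(params * 3)     # ~30.7 GiB at 3 bits/weight
print(round(fp16_gib, 1), round(q3_gib, 1))
```

This roughly 5x reduction relative to FP16 is what brings a model of this size within reach of single-node inference setups.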

Disclaimer

This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.

Safetensors

  • Model size: 88B params
  • Tensor types: F32, F16, U8