# MiniMax-M2.7-TQ3
A TurboQuant 3-bit quantized version of MiniMax-M2.7, optimized for inference with turboquant-vllm.
## Model Details
- Base Model: MiniMaxAI/MiniMax-M2.7
- Quantization: TurboQuant 3-bit
- Quantization Tool: turboquant-vllm
- Architecture: Transformer-based LLM with extended context support
## Usage
This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the turboquant-vllm repository for installation and usage instructions.
### Example

```
# Please refer to turboquant-vllm for proper model loading
```
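As a rough sketch only: if turboquant-vllm follows the standard vLLM `LLM` API, loading could look like the snippet below. The `quantization` flag value and the repository id are assumptions taken from this card, not verified against the turboquant-vllm code.

```python
# Sketch, assuming turboquant-vllm keeps the standard vLLM interface.
# The quantization flag value ("turboquant") is hypothetical; check the
# turboquant-vllm repository for the actual loading procedure.
from vllm import LLM, SamplingParams

llm = LLM(
    model="fno2010/MiniMax-M2.7-TQ3",  # repository id as named in this card
    quantization="turboquant",         # hypothetical flag name
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 3-bit weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```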
## Chat Template
The model uses a Jinja chat template with support for:

- System messages
- Tool/function calling (`<minimax:tool_call>` / `</minimax:tool_call>` delimiters)
- Reasoning content (`<think>` / `</think>` delimiters)
- Multi-turn conversations
The default model identity is: "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."
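If the chat template ships with the tokenizer files, it can be rendered with the standard Hugging Face `apply_chat_template` API. A minimal sketch, assuming the repository id from this card and a bundled Jinja template:

```python
# Render the chat template to a prompt string with transformers.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("fno2010/MiniMax-M2.7-TQ3", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what 3-bit quantization does."},
]

# add_generation_prompt=True appends the assistant turn opener so the model
# starts generating (including any <think> reasoning block) from here.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```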
## Tokenizer
- Backend: tokenizers
- Vocabulary Size: (see tokenizer files)
- Special Tokens: Includes tokens for tool calls, reasoning markers, and standard control tokens
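To check the exact vocabulary size and special tokens locally, the tokenizer can be inspected directly (a small sketch, same assumed repository id as above):

```python
# Inspect the tokenizer metadata shipped with the repository.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("fno2010/MiniMax-M2.7-TQ3", trust_remote_code=True)
print("vocab size:", tok.vocab_size)
print("special tokens:", tok.special_tokens_map)
print("additional special tokens:", tok.additional_special_tokens)
```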
## Quantization Details
This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
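As a back-of-the-envelope illustration of why 3-bit storage helps, the snippet below compares raw weight memory at 16-bit and 3-bit precision. The parameter count is a placeholder, not the actual size of MiniMax-M2.7, and the estimate ignores scale/metadata overhead added by the quantizer.

```python
# Rough weight-memory comparison: 16-bit vs. 3-bit storage.
n_params = 100e9  # placeholder parameter count, not the real MiniMax-M2.7 size

bytes_bf16 = n_params * 2        # 16 bits = 2 bytes per weight
bytes_3bit = n_params * 3 / 8    # 3 bits per weight, ignoring quantization metadata

print(f"bf16 weights : {bytes_bf16 / 1e9:.0f} GB")
print(f"3-bit weights: {bytes_3bit / 1e9:.0f} GB (~{bytes_bf16 / bytes_3bit:.1f}x smaller)")
```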
## Disclaimer
This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.