# MiniMax-M2.7-TQ3
A TurboQuant 3-bit quantized version of MiniMax-M2.7, optimized for inference with turboquant-vllm.
## Model Details
- Base Model: MiniMaxAI/MiniMax-M2.7
- Quantization: TurboQuant 3-bit
- Quantization Tool: turboquant-vllm
- Architecture: Transformer-based LLM with extended context support
## Usage
This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the turboquant-vllm repository for installation and usage instructions.
### Example

```
# Please refer to turboquant-vllm for proper model loading
```
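As a rough sketch only: if turboquant-vllm follows the standard vLLM `LLM` API, loading could look like the snippet below. The `quantization` flag value and the repository id are assumptions taken from this card, not verified against the turboquant-vllm code.

```python
# Sketch, assuming turboquant-vllm keeps the standard vLLM interface.
# The quantization flag value ("turboquant") is hypothetical; check the
# turboquant-vllm repository for the actual loading procedure.
from vllm import LLM, SamplingParams

llm = LLM(
    model="fno2010/MiniMax-M2.7-TQ3",  # repository id as named in this card
    quantization="turboquant",         # hypothetical flag name
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 3-bit weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```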
## Chat Template
The model uses a Jinja chat template with support for:

- System messages
- Tool/function calling (`<minimax:tool_call>` / `</minimax:tool_call>` delimiters)
- Reasoning content (`<think>` / `</think>` delimiters)
- Multi-turn conversations
The default model identity is: "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."
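If the chat template ships with the tokenizer files, it can be rendered with the standard Hugging Face `apply_chat_template` API. A minimal sketch, assuming the repository id from this card and a bundled Jinja template:

```python
# Render the chat template to a prompt string with transformers.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("fno2010/MiniMax-M2.7-TQ3", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what 3-bit quantization does."},
]

# add_generation_prompt=True appends the assistant turn opener so the model
# starts generating (including any <think> reasoning block) from here.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```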
## Tokenizer
- Backend: tokenizers
- Vocabulary Size: (see tokenizer files)
- Special Tokens: Includes tokens for tool calls, reasoning markers, and standard control tokens
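To check the exact vocabulary size and special tokens locally, the tokenizer can be inspected directly (a small sketch, same assumed repository id as above):

```python
# Inspect the tokenizer metadata shipped with the repository.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("fno2010/MiniMax-M2.7-TQ3", trust_remote_code=True)
print("vocab size:", tok.vocab_size)
print("special tokens:", tok.special_tokens_map)
print("additional special tokens:", tok.additional_special_tokens)
```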
## Quantization Details
This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
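As a back-of-the-envelope illustration of why 3-bit storage helps, the snippet below compares raw weight memory at 16-bit and 3-bit precision. The parameter count is a placeholder, not the actual size of MiniMax-M2.7, and the estimate ignores scale/metadata overhead added by the quantizer.

```python
# Rough weight-memory comparison: 16-bit vs. 3-bit storage.
n_params = 100e9  # placeholder parameter count, not the real MiniMax-M2.7 size

bytes_bf16 = n_params * 2        # 16 bits = 2 bytes per weight
bytes_3bit = n_params * 3 / 8    # 3 bits per weight, ignoring quantization metadata

print(f"bf16 weights : {bytes_bf16 / 1e9:.0f} GB")
print(f"3-bit weights: {bytes_3bit / 1e9:.0f} GB (~{bytes_bf16 / bytes_3bit:.1f}x smaller)")
```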
## Disclaimer
This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.