---
pipeline_tag: text-generation
base_model:
- MiniMaxAI/MiniMax-M2.7
license: other
license_name: nvidia-software-and-model-evaluation-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license
library_name: Model Optimizer
tags:
- nvidia
- ModelOpt
- MiniMax
- quantized
- NVFP4
- nvfp4
---
# Model Overview
## Description:
MiniMax M2.7 is a large language model for complex software engineering, agentic tool use, and office productivity workflows. It is positioned as a model that deeply participates in its own evolution, with support for complex agent harnesses, dynamic tool search, Agent Teams, and high-fidelity coding and document-editing tasks.
*This model is for research and development only.*
## Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA [MiniMax M2.7 Model Card](https://huggingface.co/MiniMaxAI/MiniMax-M2.7).
### License/Terms of Use:
**GOVERNING TERMS:** Use of this model is governed by the [NVIDIA Software and Model Evaluation license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license/).
**ADDITIONAL INFORMATION:** [Non-Commercial MiniMax License](https://github.com/MiniMax-AI/MiniMax-M2.7/blob/main/LICENSE). Copyright (c) 2026 MiniMax.
### Deployment Geography:
Global
### Use Case:
Designed for advanced coding assistance, agentic workflows, long-horizon software engineering, live production troubleshooting, office document generation and editing, and other complex multi-step productivity tasks.
### Examples
- Coding assistants and software engineering copilots
- Agent harnesses with complex skill libraries and multi-tool search
- Bug localization and production troubleshooting
- Office document generation and editing workflows
- Research, analysis, and productivity automation
### Release Date:
Hugging Face 04/24/2026 via https://huggingface.co/nvidia/MiniMax-M2.7-NVFP4
## Model Architecture:
**Architecture Type:** Transformer
**Network Architecture:** Sparse Mixture-of-Experts (MoE)
**Total Parameters:** 230B
**Active Parameters:** 10B
**Layers:** 62
**Hidden Size:** 3072
**Experts:** 256 local experts, with 8 experts activated per token
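The sparse MoE activation above is why only about 10B of the 230B parameters are active per token: a gating network scores all 256 experts and only the top 8 run. A minimal, stdlib-only sketch of that top-k gating step (the random logits stand in for a learned gate; MiniMax's actual routing implementation is not disclosed in this card):

```python
import math
import random

NUM_EXPERTS, TOP_K = 256, 8  # figures from the architecture fields above

def route(logits: list[float]) -> tuple[list[int], list[float]]:
    """Pick the top-k experts by gate logit and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:TOP_K]
    m = max(logits[i] for i in top)                    # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in gate scores
experts, weights = route(logits)
print(experts)   # the 8 expert indices this "token" is dispatched to
print(weights)   # their mixing weights, summing to 1
```

Each token's output is then the weighted sum of just those 8 expert FFN outputs, which keeps per-token compute close to a 10B dense model.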
### Input:
**Input Types:** Text
**Input Formats:** String
**Input Parameters:** One-Dimensional (1D)
**Other Input Properties:** Supports long system prompts.
**Input Context Length (ISL):** 204,800
### Output:
**Output Types:** Text
**Output Format:** String
**Output Parameters:** One-Dimensional (1D)
**Other Output Properties:** None
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
## Software Integration:
**Runtime Engine(s):**
* SGLang
* vLLM
**Supported Hardware Microarchitecture Compatibility:**
* NVIDIA Blackwell
**Preferred Operating System(s):**
* Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
## Model Version(s):
This model is v1 and is quantized to NVFP4 with nvidia-modelopt **v0.43.0**.
## Training and Evaluation Datasets:
## Calibration Dataset:
**Link:** [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail), [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2)
**Data Collection Method by dataset:** Automated.
**Labeling Method by dataset:** Automated.
**Properties:** The `cnn_dailymail` dataset contains English-language news articles and summaries. `Nemotron-Post-Training-Dataset-v2` is a post-training dataset curated by NVIDIA containing multi-turn conversations across diverse topics.
## Training Dataset:
**Data Modality:** Text
**Data Collection Method by dataset:** Undisclosed
**Labeling Method by dataset:** Undisclosed
**Properties:** Undisclosed
## Evaluation Dataset:
**Datasets:** MMLU-Pro, LiveCodeBench, IFEval, GPQA Diamond, SciCode, AIME 2025, IFBench, and AA-LCR
**Data Collection Method by dataset:** Hybrid, Automated, Human
**Labeling Method by dataset:** Hybrid, Automated, Human
**Properties:** We evaluated the model on text-based reasoning and coding benchmarks:
* **MMLU Pro:** a multi-task language understanding benchmark with challenging multiple-choice questions across diverse academic domains.
* **LiveCodeBench V6:** competitive programming problems.
* **SciCode:** evaluates scientific coding capabilities.
* **IFEval:** tests whether language models can follow explicit, verifiable formatting and structural constraints layered on top of content generation prompts.
* **GPQA Diamond:** 448 graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry.
* **AIME 2025:** problems from the American Invitational Mathematics Examination.
* **IFBench:** evaluates instruction-following capabilities across diverse and structured task constraints.
* **AA-LCR (Artificial Analysis Long Context Reasoning):** a long-context benchmark of 100 questions over documents ranging from 10k to 100k tokens that requires multi-step reasoning and synthesis across dispersed sections rather than simple retrieval.
## Inference:
**Engine:** vLLM
**Test Hardware:** B200
## Post Training Quantization
This model was obtained by quantizing the weights and activations of MiniMax M2.7 to the NVFP4 data type, ready for inference with SGLang or vLLM. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 1.65x.
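As a back-of-the-envelope check on the memory claim, the weight-only storage for 230B parameters at 8 vs. 4 bits per parameter can be computed directly. Note this ignores unquantized layers and NVFP4 block-scale metadata, which is why the end-to-end saving is closer to 1.65x than the clean 2x this sketch gives:

```python
# Rough weight-memory estimate for a 230B-parameter checkpoint. Real
# checkpoint sizes differ: embeddings/norms may stay at higher precision,
# and NVFP4 stores per-block scale factors on top of the 4-bit values.
PARAMS = 230e9

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given per-parameter width."""
    return PARAMS * bits_per_param / 8 / 2**30

print(f"FP8 baseline: {weight_gib(8):.0f} GiB")  # ~214 GiB
print(f"NVFP4:        {weight_gib(4):.0f} GiB")  # ~107 GiB
```

The same arithmetic explains why the card's Blackwell-class test hardware matters: even at 4 bits, the weights alone exceed a single GPU's memory, hence the tensor-parallel serving commands below.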
## Usage
To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), start the Docker image `lmsysorg/sglang:latest` and run the sample command below:
```
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--quantization modelopt_fp4 \
--trust-remote-code \
--reasoning-parser minimax-append-think \
--tool-call-parser minimax-m2 \
--moe-runner-backend flashinfer_cutlass \
--attention-backend flashinfer
```
To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), launch the Docker image `vllm/vllm-openai:latest` and run the sample command below:
```
vllm serve nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-auto-tool-choice \
--trust-remote-code
```
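Either launch command exposes an OpenAI-compatible API. A minimal stdlib-only client sketch, assuming vLLM's default port 8000 (SGLang defaults to 30000; adjust `BASE_URL` to match your server). The live request is gated behind an environment variable so the snippet is safe to run without a server:

```python
import json
import os
import urllib.request

# Hypothetical local endpoint; change the port to 30000 for SGLang defaults.
BASE_URL = "http://localhost:8000/v1"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a /chat/completions request for the served checkpoint."""
    body = json.dumps({
        "model": "nvidia/MiniMax-M2.7-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if os.environ.get("RUN_LIVE_DEMO"):  # set to 1 only when a server is up
    with urllib.request.urlopen(build_request("Write a haiku about GPUs.")) as r:
        print(json.load(r)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (the `openai` Python package, curl, etc.) works the same way against this endpoint.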
### Evaluation
The accuracy benchmark results are presented in the table below:
| **Precision** | **IFEval** | **MMLU Pro** | **GPQA Diamond** | **LiveCodeBench** | **SciCode** | **AIME 2025** | **IFBench** | **AA-LCR** |
|---|---|---|---|---|---|---|---|---|
| FP8 (baseline) | 0.909 | 0.824 | 0.860 | 0.573 | 0.498 | 0.892 | 0.733 | 0.718 |
| NVFP4 | 0.904 | 0.817 | 0.857 | 0.582 | 0.487 | 0.888 | 0.728 | 0.728 |
> Baseline and evaluation settings are not fully disclosed on the referenced MiniMax M2.7 page.
## Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).