---
pipeline_tag: text-generation
base_model:
- MiniMaxAI/MiniMax-M2.7
license: other
license_name: nvidia-software-and-model-evaluation-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license
library_name: Model Optimizer
tags:
- nvidia
- ModelOpt
- MiniMax
- quantized
- NVFP4
- nvfp4
---

# Model Overview

## Description:
MiniMax M2.7 is a large language model built for complex software engineering, agentic tool use, and office productivity workflows. It is designed to participate deeply in its own evolution, with support for complex agent harnesses, dynamic tool search, Agent Teams, and high-fidelity coding and document-editing tasks.

*This model is for research and development only.*

## Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA [MiniMax M2.7 Model Card](https://huggingface.co/MiniMaxAI/MiniMax-M2.7).

### License/Terms of Use:
**GOVERNING TERMS:** Use of this model is governed by the [NVIDIA Software and Model Evaluation License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license/).

**ADDITIONAL INFORMATION:** [Non-Commercial MiniMax License](https://github.com/MiniMax-AI/MiniMax-M2.7/blob/main/LICENSE). Copyright (c) 2026 MiniMax.

### Deployment Geography:
Global

### Use Case:
Designed for advanced coding assistance, agentic workflows, long-horizon software engineering, live production troubleshooting, office document generation and editing, and other complex multi-step productivity tasks.

### Examples
- Coding assistants and software engineering copilots
- Agent harnesses with complex skill libraries and multi-tool search
- Bug localization and production troubleshooting
- Office document generation and editing workflows
- Research, analysis, and productivity automation

### Release Date:
Hugging Face 04/16/2026 via https://huggingface.co/nvidia/MiniMax-M2.7-NVFP4

## Model Architecture:
**Architecture Type:** Transformer <br>
**Network Architecture:** Sparse Mixture-of-Experts (MoE) <br>
**Total Parameters:** 230B <br>
**Active Parameters:** 10B <br>
**Layers:** 62 <br>
**Hidden Size:** 3072 <br>
**Experts:** 256 local experts, with 8 experts activated per token

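As a rough illustration of the figures above, the sketch below estimates the active-parameter fraction and the ideal weight-storage savings. It assumes the standard NVFP4 layout of 4-bit values with one FP8 scale per 16-element block; the exact packing is not stated on this card.

```python
# Back-of-envelope math from the card's figures. The 4.5 bits/parameter
# estimate assumes NVFP4's 4-bit values plus one FP8 scale per 16-element
# block, and ignores unquantized layers and global scales.
total_params = 230e9   # total parameters
active_params = 10e9   # parameters activated per token

# Only a small fraction of weights participate in each forward pass.
active_fraction = active_params / total_params

# Approximate weight storage: NVFP4 vs. the FP8 baseline.
bits_nvfp4 = 4 + 8 / 16        # 4-bit values + amortized FP8 block scale
bits_fp8 = 8
weights_gb_nvfp4 = total_params * bits_nvfp4 / 8 / 1e9
compression = bits_fp8 / bits_nvfp4

print(f"active fraction:    {active_fraction:.3f}")
print(f"NVFP4 weights:      ~{weights_gb_nvfp4:.0f} GB")
print(f"ideal ratio vs FP8: {compression:.2f}x")
```

The ideal ~1.78x ratio is slightly above the ~1.65x reduction reported on this card, which is expected once unquantized tensors and scale metadata are accounted for.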
### Input:
**Input Types:** Text <br>
**Input Formats:** String <br>
**Input Parameters:** One-Dimensional (1D) <br>
**Other Input Properties:** Supports long system prompts. <br>
**Input Context Length (ISL):** 204,800

### Output:
**Output Types:** Text <br>
**Output Format:** String <br>
**Output Parameters:** One-Dimensional (1D) <br>
**Other Output Properties:** None

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration:
**Runtime Engine(s):**
* SGLang
* vLLM

**Supported Hardware Microarchitecture Compatibility:**
* NVIDIA Blackwell

**Preferred Operating System(s):**
* Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

## Model Version(s):
This is model v1, quantized to NVFP4 with nvidia-modelopt **v0.43.0**.

## Training and Evaluation Datasets:

## Calibration Dataset:
**Link:** [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail), [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) <br>
**Data Collection Method by dataset:** Automated <br>
**Labeling Method by dataset:** Automated <br>
**Properties:** The `cnn_dailymail` dataset contains English-language news articles and summaries. `Nemotron-Post-Training-Dataset-v2` is a post-training dataset curated by NVIDIA containing multi-turn conversations across diverse topics.

## Training Dataset:
**Data Modality:** Text <br>
**Data Collection Method by dataset:** Undisclosed <br>
**Labeling Method by dataset:** Undisclosed <br>
**Properties:** Undisclosed

## Evaluation Dataset:
**Datasets:** MMLU-Pro, LiveCodeBench, IFEval, GPQA Diamond, SciCode, AIME 2025, and IFBench <br>
**Data Collection Method by dataset:** Hybrid: Automated, Human <br>
**Labeling Method by dataset:** Hybrid: Automated, Human <br>
**Properties:** We evaluated the model on text-based reasoning and coding benchmarks. MMLU-Pro is a multi-task language understanding benchmark with challenging multiple-choice questions across diverse academic domains; LiveCodeBench V6 contains competitive programming problems; SciCode evaluates scientific coding capabilities; IFEval tests whether language models can follow explicit, verifiable formatting and structural constraints layered on top of content-generation prompts; GPQA Diamond contains 198 graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry; AIME 2025 contains problems from the American Invitational Mathematics Examination; IFBench evaluates instruction-following capabilities across diverse and structured task constraints.

## Inference:
**Engine:** vLLM <br>
**Test Hardware:** NVIDIA B200

## Post-Training Quantization
This model was obtained by quantizing the weights and activations of MiniMax M2.7 to the NVFP4 data type, ready for inference with SGLang and vLLM. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 1.65x.

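To make the block-scaled idea concrete, here is a deliberately simplified, pure-Python sketch of 4-bit quantization with one scale per block. It is illustrative only: real NVFP4 snaps values to an FP4 (E2M1) grid and stores FP8 block scales plus a global scale, none of which is reproduced here.

```python
def quantize_block_4bit(values, block_size=16):
    """Quantize floats to 4-bit integer levels with one scale per block.

    Simplified stand-in for block-scaled formats: symmetric integer grid
    [-8, 7] instead of NVFP4's E2M1 value grid.
    """
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        # Map the block's max magnitude to level 7; guard against all-zero blocks.
        scale = (max(abs(v) for v in block) / 7) or 1.0
        q = [max(-8, min(7, round(v / scale))) for v in block]
        blocks.append((scale, q))
    return blocks

def dequantize_blocks(blocks):
    """Reconstruct approximate floats from (scale, levels) blocks."""
    return [scale * q for scale, levels in blocks for q in levels]

blocks = quantize_block_4bit([1.0, -1.0, 0.5, 0.0])
print(blocks)
print(dequantize_blocks(blocks))
```

Each 16-element block costs 16 × 4 bits for values plus one scale, which is where the ~4.5 bits/parameter storage figure comes from.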
## Usage
To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), start the Docker image `lmsysorg/sglang:latest` and run the sample command below:

```shell
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.7-NVFP4 \
    --tensor-parallel-size 8 \
    --quantization modelopt_fp4 \
    --trust-remote-code \
    --reasoning-parser minimax-append-think \
    --tool-call-parser minimax-m2 \
    --moe-runner-backend flashinfer_cutlass \
    --attention-backend flashinfer
```

To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), launch the Docker image `vllm/vllm-openai:latest` and run the sample command below:

```shell
vllm serve nvidia/MiniMax-M2.7-NVFP4 \
    --tensor-parallel-size 8 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enable-auto-tool-choice \
    --trust-remote-code
```

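Once a server is up, both SGLang and vLLM expose an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below builds a minimal request payload; the URL in the comment assumes vLLM's default port 8000 (SGLang defaults to 30000), and the prompt is just an example.

```python
import json
import urllib.request

def build_chat_request(prompt, model="nvidia/MiniMax-M2.7-NVFP4"):
    """Assemble a minimal OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

payload = build_chat_request("Write a Python function that reverses a string.")

# To actually send the request against a running server (vLLM default port):
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# reply = json.loads(urllib.request.urlopen(req).read())
# print(reply["choices"][0]["message"]["content"])
```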
### Evaluation

The accuracy benchmark results are presented in the table below:

| **Precision** | **IFEval** | **MMLU-Pro** | **GPQA Diamond** | **LiveCodeBench** | **SciCode** | **AIME 2025** | **IFBench** | **AA-LCR** |
|---|---|---|---|---|---|---|---|---|
| FP8 | 0.909 | 0.824 | 0.860 | 0.573 | 0.498 | 0.892 | 0.733 | 0.718 |
| NVFP4 | 0.904 | 0.817 | 0.857 | 0.582 | 0.487 | 0.888 | 0.728 | 0.728 |

> Baseline and evaluation settings are not fully disclosed on the referenced MiniMax M2.7 page.

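As a quick sanity check on the table, the snippet below recomputes the per-benchmark NVFP4 minus FP8 deltas (values copied from the table above; "largest drop" here just means the most negative delta):

```python
# Scores copied from the evaluation table above.
fp8   = {"IFEval": 0.909, "MMLU-Pro": 0.824, "GPQA Diamond": 0.860,
         "LiveCodeBench": 0.573, "SciCode": 0.498, "AIME 2025": 0.892,
         "IFBench": 0.733, "AA-LCR": 0.718}
nvfp4 = {"IFEval": 0.904, "MMLU-Pro": 0.817, "GPQA Diamond": 0.857,
         "LiveCodeBench": 0.582, "SciCode": 0.487, "AIME 2025": 0.888,
         "IFBench": 0.728, "AA-LCR": 0.728}

# Per-benchmark change after quantization (positive = NVFP4 scored higher).
deltas = {k: round(nvfp4[k] - fp8[k], 3) for k in fp8}
worst = min(deltas, key=deltas.get)

print(deltas)
print(f"largest drop: {worst} ({deltas[worst]:+.3f})")
```

The deltas stay within about one point on every benchmark, with NVFP4 actually scoring higher on LiveCodeBench and AA-LCR.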
## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).