Create README.md
#4
by CooLLaMACEO - opened
README.md
ADDED
|
@@ -0,0 +1,236 @@
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
library_name: transformers
|
| 4 |
+
---
|
| 5 |
+
# DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
|
| 6 |
+
|
| 7 |
+
<!-- markdownlint-disable first-line-h1 -->
|
| 8 |
+
<!-- markdownlint-disable html -->
|
| 9 |
+
<!-- markdownlint-disable no-duplicate-header -->
|
| 10 |
+
|
| 11 |
+
<div align="center">
|
| 12 |
+
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V4" />
|
| 13 |
+
</div>
|
| 14 |
+
<hr>
|
| 15 |
+
<div align="center" style="line-height: 1;">
|
| 16 |
+
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
|
| 17 |
+
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
|
| 18 |
+
</a>
|
| 19 |
+
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
|
| 20 |
+
<img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V4-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 21 |
+
</a>
|
| 22 |
+
</div>
|
| 23 |
+
<div align="center" style="line-height: 1;">
|
| 24 |
+
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
|
| 25 |
+
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 26 |
+
</a>
|
| 27 |
+
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
|
| 28 |
+
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 29 |
+
</a>
|
| 30 |
+
</div>
|
| 31 |
+
<div align="center" style="line-height: 1;">
|
| 32 |
+
<a href="LICENSE" style="margin: 2px;">
|
| 33 |
+
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
|
| 34 |
+
</a>
|
| 35 |
+
</div>
|
| 36 |
+
<p align="center">
|
| 37 |
+
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf"><b>Technical Report</b>👁️</a>
|
| 38 |
+
</p>
|
| 39 |
+
|
| 40 |
+
## Introduction
|
| 41 |
+
|
| 42 |
+
We present a preview of the **DeepSeek-V4** series, which includes two Mixture-of-Experts (MoE) language models: **DeepSeek-V4-Pro** with 1.6T parameters (49B activated) and **DeepSeek-V4-Flash** with 284B parameters (13B activated). Both support a context length of **one million tokens**.
|
| 43 |
+
|
| 44 |
+
The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:
|
| 45 |
+
|
| 46 |
+
1. **Hybrid Attention Architecture:** We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only **27% of single-token inference FLOPs** and **10% of KV cache** compared with DeepSeek-V3.2.
|
| 47 |
+
2. **Manifold-Constrained Hyper-Connections (mHC):** We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
|
| 48 |
+
3. **Muon Optimizer:** We employ the Muon optimizer for faster convergence and greater training stability.
|
| 49 |
+
|
| 50 |
+
We pre-train both models on more than **32T** diverse and high-quality tokens, followed by a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: domain-specific experts are first cultivated independently (through SFT and RL with GRPO), and are then consolidated into a single model via on-policy distillation, which integrates their distinct proficiencies across diverse domains.
|
| 51 |
+
|
| 52 |
+
**DeepSeek-V4-Pro-Max**, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, establishing itself as the best open-source model available today. It achieves top-tier performance on coding benchmarks and substantially narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, **DeepSeek-V4-Flash-Max** achieves reasoning performance comparable to the Pro version when given a larger thinking budget, though its smaller parameter scale places it slightly behind on pure knowledge tasks and the most complex agentic workflows.
|
| 53 |
+
|
| 54 |
+
<div align="center">
|
| 55 |
+
<img src="assets/dsv4_performance.png" >
|
| 56 |
+
</div>
|
| 57 |
+
|
| 58 |
+
## Model Downloads
|
| 59 |
+
|
| 60 |
+
<div align="center">
|
| 61 |
+
|
| 62 |
+
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Precision** | **Download** |
|
| 63 |
+
| :---: | :---: | :---: | :---: | :---: | :---: |
|
| 64 |
+
| DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash-Base) |
|
| 65 |
+
| DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash) |
|
| 66 |
+
| DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro-Base) |
|
| 67 |
+
| DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro) |
|
| 68 |
+
|
| 69 |
+
</div>
|
| 70 |
+
|
| 71 |
+
*\*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.*
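
As a quick sanity check on the table above, the share of parameters activated per token can be computed directly from the listed counts. This is only an illustrative sketch: the `MODELS` dictionary and the `activated_fraction` helper are hypothetical names, not part of any DeepSeek tooling, and 1.6T is assumed to mean 1600B.

```python
# Parameter counts in billions, copied from the Model Downloads table.
# Assumption: "1.6T" is read as 1600B (decimal units).
MODELS = {
    "DeepSeek-V4-Flash": {"total_b": 284.0, "activated_b": 13.0},
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "activated_b": 49.0},
}


def activated_fraction(total_b: float, activated_b: float) -> float:
    """Return the share of parameters used per token, as a value in [0, 1]."""
    return activated_b / total_b


for name, params in MODELS.items():
    frac = activated_fraction(params["total_b"], params["activated_b"])
    print(f"{name}: {frac:.2%} of parameters activated per token")
```

Both models activate only a small percentage of their total parameters for each token, which is what the MoE design is meant to achieve.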
|
| 72 |
+
|
| 73 |
+
## Evaluation Results
|
| 74 |
+
|
| 75 |
+
### Base Model
|
| 76 |
+
|
| 77 |
+
<div align="center">
|
| 78 |
+
|
| 79 |
+
| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|
| 80 |
+
| :--- | :---: | :---: | :---: | :---: |
|
| 81 |
+
| Architecture | - | MoE | MoE | MoE |
|
| 82 |
+
| # Activated Params | - | 37B | 13B | 49B |
|
| 83 |
+
| # Total Params | - | 671B | 284B | 1.6T |
|
| 84 |
+
| **World Knowledge** | | | | |
|
| 85 |
+
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | **83.1** |
|
| 86 |
+
| MMLU (EM) | 5-shot | 87.8 | 88.7 | **90.1** |
|
| 87 |
+
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | **90.8** |
|
| 88 |
+
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | **73.5** |
|
| 89 |
+
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | **90.3** |
|
| 90 |
+
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | **93.1** |
|
| 91 |
+
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | **90.8** |
|
| 92 |
+
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | **51.1** |
|
| 93 |
+
| SimpleQA-Verified (EM) | 25-shot | 28.3 | 30.1 | **55.2** |
|
| 94 |
+
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | **53.9** |
|
| 95 |
+
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | **62.6** |
|
| 96 |
+
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | **85.6** |
|
| 97 |
+
| **Language & Reasoning** | | | | |
|
| 98 |
+
| BBH (EM) | 3-shot | **87.6** | 86.9 | 87.5 |
|
| 99 |
+
| DROP (F1) | 1-shot | 88.2 | 88.6 | **88.7** |
|
| 100 |
+
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | **88.0** |
|
| 101 |
+
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | **81.5** |
|
| 102 |
+
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | **85.2** |
|
| 103 |
+
| **Code & Math** | | | | |
|
| 104 |
+
| BigCodeBench (Pass@1) | 3-shot | **63.9** | 56.8 | 59.2 |
|
| 105 |
+
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | **76.8** |
|
| 106 |
+
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | **92.6** |
|
| 107 |
+
| MATH (EM) | 4-shot | 60.5 | 57.4 | **64.5** |
|
| 108 |
+
| MGSM (EM) | 8-shot | 81.3 | **85.7** | 84.4 |
|
| 109 |
+
| CMath (EM) | 3-shot | 92.6 | **93.6** | 90.9 |
|
| 110 |
+
| **Long Context** | | | | |
|
| 111 |
+
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | **51.5** |
|
| 112 |
+
|
| 113 |
+
</div>
|
| 114 |
+
|
| 115 |
+
### Instruct Model
|
| 116 |
+
|
| 117 |
+
DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:
|
| 118 |
+
|
| 119 |
+
| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
|
| 120 |
+
| :--- | :--- | :--- | :--- |
|
| 121 |
+
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think>` summary |
|
| 122 |
+
| Think High | Conscious logical analysis, slower but more accurate | Complex problem-solving, planning | `<think>` thinking `</think>` summary |
|
| 123 |
+
| Think Max | Push reasoning to its fullest extent | Exploring the boundary of model reasoning capability | Special system prompt + `<think>` thinking `</think>` summary |
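
The response formats in the table above suggest that raw output can be separated into a reasoning part and a summary part at the `</think>` tag. The sketch below illustrates that idea only; `split_reasoning` is a hypothetical helper written for this example, and the parsing scripts in the repository's `encoding` folder remain the authoritative implementation.

```python
def split_reasoning(completion: str) -> tuple[str, str]:
    """Split raw completion text into (reasoning, summary).

    Assumes the format from the table: optional `<think>`, the reasoning,
    then `</think>`, then the summary. Non-think output starts with
    `</think>`, so its reasoning part is empty.
    """
    head, sep, tail = completion.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the summary.
        return "", completion.strip()
    # Drop a leading `<think>` if the completion includes it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Think High / Think Max style output
print(split_reasoning("<think>1 plus 1 is 2</think>The answer is 2."))
# Non-think style output
print(split_reasoning("</think>The answer is 2."))
```

The helper tolerates a missing opening tag because an inference setup may place `<think>` in the prompt rather than in the generated text; that behaviour is an assumption, not something this README specifies.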
|
| 124 |
+
|
| 125 |
+
#### DeepSeek-V4-Pro-Max vs Frontier Models
|
| 126 |
+
|
| 127 |
+
<div align="center">
|
| 128 |
+
|
| 129 |
+
| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
|
| 130 |
+
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
|
| 131 |
+
| **Knowledge & Reasoning** | | | | | | |
|
| 132 |
+
| MMLU-Pro (EM) | 89.1 | 87.5 | **91.0** | 87.1 | 86.0 | 87.5 |
|
| 133 |
+
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | **75.6** | 36.9 | 38.1 | 57.9 |
|
| 134 |
+
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | **85.9** | 75.9 | 75.0 | 84.4 |
|
| 135 |
+
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | **94.3** | 90.5 | 86.2 | 90.1 |
|
| 136 |
+
| HLE (Pass@1) | 40.0 | 39.8 | **44.4** | 36.4 | 34.7 | 37.7 |
|
| 137 |
+
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | **93.5** |
|
| 138 |
+
| Codeforces (Rating) | - | 3168 | 3052 | - | - | **3206** |
|
| 139 |
+
| HMMT 2026 Feb (Pass@1) | 96.2 | **97.7** | 94.7 | 92.7 | 89.4 | 95.2 |
|
| 140 |
+
| IMOAnswerBench (Pass@1) | 75.3 | **91.4** | 81.0 | 86.0 | 83.8 | 89.8 |
|
| 141 |
+
| Apex (Pass@1) | 34.5 | 54.1 | **60.9** | 24.0 | 11.5 | 38.3 |
|
| 142 |
+
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | **90.2** |
|
| 143 |
+
| **Long Context** | | | | | | |
|
| 144 |
+
| MRCR 1M (MMR) | **92.9** | - | 76.3 | - | - | 83.5 |
|
| 145 |
+
| CorpusQA 1M (ACC) | **71.7** | - | 53.8 | - | - | 62.0 |
|
| 146 |
+
| **Agentic** | | | | | | |
|
| 147 |
+
| Terminal Bench 2.0 (Acc) | 65.4 | **75.1** | 68.5 | 66.7 | 63.5 | 67.9 |
|
| 148 |
+
| SWE Verified (Resolved) | **80.8** | - | 80.6 | 80.2 | - | 80.6 |
|
| 149 |
+
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | **58.6** | 58.4 | 55.4 |
|
| 150 |
+
| SWE Multilingual (Resolved) | **77.5** | - | - | 76.7 | 73.3 | 76.2 |
|
| 151 |
+
| BrowseComp (Pass@1) | 83.7 | 82.7 | **85.9** | 83.2 | 79.3 | 83.4 |
|
| 152 |
+
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | **54.0** | 50.4 | 48.2 |
|
| 153 |
+
| GDPval-AA (Elo) | 1619 | **1674** | 1314 | 1482 | 1535 | 1554 |
|
| 154 |
+
| MCPAtlas Public (Pass@1) | **73.8** | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
|
| 155 |
+
| Toolathlon (Pass@1) | 47.2 | **54.6** | 48.8 | 50.0 | 40.7 | 51.8 |
|
| 156 |
+
|
| 157 |
+
</div>
|
| 158 |
+
|
| 159 |
+
#### Comparison across Modes
|
| 160 |
+
|
| 161 |
+
<div align="center">
|
| 162 |
+
|
| 163 |
+
| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
|
| 164 |
+
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
|
| 165 |
+
| **Knowledge & Reasoning** | | | | | | |
|
| 166 |
+
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | **87.5** |
|
| 167 |
+
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | **57.9** |
|
| 168 |
+
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | **84.4** |
|
| 169 |
+
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | **90.1** |
|
| 170 |
+
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | **37.7** |
|
| 171 |
+
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | **93.5** |
|
| 172 |
+
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | **3206** |
|
| 173 |
+
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | **95.2** |
|
| 174 |
+
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | **89.8** |
|
| 175 |
+
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | **38.3** |
|
| 176 |
+
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | **90.2** |
|
| 177 |
+
| **Long Context** | | | | | | |
|
| 178 |
+
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | **83.5** |
|
| 179 |
+
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | **62.0** |
|
| 180 |
+
| **Agentic** | | | | | | |
|
| 181 |
+
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | **67.9** |
|
| 182 |
+
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | **80.6** |
|
| 183 |
+
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | **55.4** |
|
| 184 |
+
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | **76.2** |
|
| 185 |
+
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | **83.4** |
|
| 186 |
+
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | **48.2** |
|
| 187 |
+
| MCPAtlas Public (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | **74.2** | 73.6 |
|
| 188 |
+
| GDPval-AA (Elo) | - | - | 1395 | - | - | **1554** |
|
| 189 |
+
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | **51.8** |
|
| 190 |
+
|
| 191 |
+
</div>
|
| 192 |
+
|
| 193 |
+
## Chat Template
|
| 194 |
+
|
| 195 |
+
This release does not include a Jinja-format chat template. Instead, we provide a dedicated `encoding` folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the [`encoding`](encoding/README.md) folder for full documentation.
|
| 196 |
+
|
| 197 |
+
A brief example:
|
| 198 |
+
|
| 199 |
+
```python
|
| 200 |
+
from encoding_dsv4 import encode_messages, parse_message_from_completion_text
|
| 201 |
+
messages = [
|
| 202 |
+
{"role": "user", "content": "hello"},
|
| 203 |
+
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
|
| 204 |
+
{"role": "user", "content": "1+1=?"}
|
| 205 |
+
]
|
| 206 |
+
# messages -> string
|
| 207 |
+
prompt = encode_messages(messages, thinking_mode="thinking")
|
| 208 |
+
# string -> tokens
|
| 209 |
+
import transformers
|
| 210 |
+
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
|
| 211 |
+
tokens = tokenizer.encode(prompt)
|
| 212 |
+
```
|
| 213 |
+
|
| 214 |
+
## How to Run Locally
|
| 215 |
+
|
| 216 |
+
Please refer to the [inference](inference/README.md) folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.
|
| 217 |
+
|
| 218 |
+
For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 1.0`. For the Think Max reasoning mode, we recommend setting the context window to at least **384K** tokens.
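
The recommendations above can be captured in a small configuration check. This is a minimal sketch under stated assumptions: `LocalSettings` and `check_settings` are hypothetical names that do not come from the `inference` folder, and "384K" is assumed to mean 384 × 1024 tokens.

```python
from dataclasses import dataclass

# Assumption: "384K" is interpreted as 384 * 1024 tokens.
THINK_MAX_MIN_CONTEXT = 384 * 1024


@dataclass(frozen=True)
class LocalSettings:
    """Sampling and context settings, defaulting to the recommended values."""
    temperature: float = 1.0
    top_p: float = 1.0
    context_window: int = THINK_MAX_MIN_CONTEXT


def check_settings(settings: LocalSettings, think_max: bool) -> list[str]:
    """Return warnings for settings that deviate from the recommendations."""
    warnings = []
    if settings.temperature != 1.0 or settings.top_p != 1.0:
        warnings.append("sampling differs from temperature=1.0, top_p=1.0")
    if think_max and settings.context_window < THINK_MAX_MIN_CONTEXT:
        warnings.append(
            f"Think Max needs a context window of at least {THINK_MAX_MIN_CONTEXT} tokens"
        )
    return warnings


# A 128K window is flagged only when Think Max is requested.
print(check_settings(LocalSettings(context_window=128 * 1024), think_max=True))
```

The resulting values still have to be passed to whichever inference engine is used; how to do that depends on the engine and is not covered here.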
|
| 219 |
+
|
| 220 |
+
## License
|
| 221 |
+
|
| 222 |
+
This repository and the model weights are licensed under the [MIT License](LICENSE).
|
| 223 |
+
|
| 224 |
+
## Citation
|
| 225 |
+
|
| 226 |
+
```
|
| 227 |
+
@misc{deepseekai2026deepseekv4,
|
| 228 |
+
title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
|
| 229 |
+
author={DeepSeek-AI},
|
| 230 |
+
year={2026},
|
| 231 |
+
}
|
| 232 |
+
```
|
| 233 |
+
|
| 234 |
+
## Contact
|
| 235 |
+
|
| 236 |
+
If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
|