danielhanchen committed on
Commit ba43e24 · verified · 1 Parent(s): 5ee74d5

Mirror worker 4
DeepSeek_V4.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa4a3490e2dcc03c9da61b04a8be471795e9966ebbbf292a3899fa62683a330e
+ size 4479901
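An aside on the hunk above: a file tracked with Git LFS is represented in-repo by a small three-line pointer (`version` / `oid` / `size`). As a hedged sketch (our own helper, not part of this repository), such a pointer can be parsed like so:

```python
# Parse a Git LFS pointer file into a dict. The pointer text below is
# copied verbatim from the DeepSeek_V4.pdf hunk above.

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fa4a3490e2dcc03c9da61b04a8be471795e9966ebbbf292a3899fa62683a330e
size 4479901
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # 'size' is the byte count of the real object stored in LFS.
    fields["size"] = int(fields["size"])
    return fields

ptr = parse_lfs_pointer(POINTER)
print(ptr["oid"], ptr["size"])
```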
README.md ADDED
@@ -0,0 +1,240 @@
+ ---
+ license: mit
+ library_name: transformers
+ ---
+ # DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
+
+ <!-- markdownlint-disable first-line-h1 -->
+ <!-- markdownlint-disable html -->
+ <!-- markdownlint-disable no-duplicate-header -->
+
+ <div align="center">
+ <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V4" />
+ </div>
+ <hr>
+ <div align="center" style="line-height: 1;">
+ <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
+ <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V4-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
+ <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
+ <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="LICENSE" style="margin: 2px;">
+ <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ <p align="center">
+ <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf"><b>Technical Report</b>👁️</a>
+ </p>
+
+ ## Introduction
+
+ We present a preview of the **DeepSeek-V4** series, comprising two strong Mixture-of-Experts (MoE) language models — **DeepSeek-V4-Pro** with 1.6T parameters (49B activated) and **DeepSeek-V4-Flash** with 284B parameters (13B activated) — both supporting a context length of **one million tokens**.
+
+ The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:
+
+ 1. **Hybrid Attention Architecture:** We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only **27% of the single-token inference FLOPs** and **10% of the KV cache** of DeepSeek-V3.2.
+ 2. **Manifold-Constrained Hyper-Connections (mHC):** We incorporate mHC to strengthen conventional residual connections, enhancing the stability of signal propagation across layers while preserving model expressivity.
+ 3. **Muon Optimizer:** We employ the Muon optimizer for faster convergence and greater training stability.
+
+ We pre-train both models on more than **32T** diverse, high-quality tokens, followed by a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: domain-specific experts are first cultivated independently (through SFT and RL with GRPO), then consolidated into a unified model via on-policy distillation, which integrates their distinct proficiencies across diverse domains into a single model.
+
+ **DeepSeek-V4-Pro-Max**, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance on coding benchmarks and substantially narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, **DeepSeek-V4-Flash-Max** achieves reasoning performance comparable to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.
+
+ <div align="center">
+ <img src="assets/dsv4_performance.png" alt="DeepSeek-V4 benchmark performance" />
+ </div>
+
+ ## Model Downloads
+
+ <div align="center">
+
+ | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Precision** | **Download** |
+ | :---: | :---: | :---: | :---: | :---: | :---: |
+ | DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash-Base) |
+ | DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash) |
+ | DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro-Base) |
+ | DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro) |
+
+ </div>
+
+ *\*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.*
+
+ ## Evaluation Results
+
+ ### Base Model
+
+ <div align="center">
+
+ | Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
+ | :--- | :---: | :---: | :---: | :---: |
+ | Architecture | - | MoE | MoE | MoE |
+ | # Activated Params | - | 37B | 13B | 49B |
+ | # Total Params | - | 671B | 284B | 1.6T |
+ | **World Knowledge** | | | | |
+ | AGIEval (EM) | 0-shot | 80.1 | 82.6 | **83.1** |
+ | MMLU (EM) | 5-shot | 87.8 | 88.7 | **90.1** |
+ | MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | **90.8** |
+ | MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | **73.5** |
+ | MMMLU (EM) | 5-shot | 87.9 | 88.8 | **90.3** |
+ | C-Eval (EM) | 5-shot | 90.4 | 92.1 | **93.1** |
+ | CMMLU (EM) | 5-shot | 88.9 | 90.4 | **90.8** |
+ | MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | **51.1** |
+ | SimpleQA-Verified (EM) | 25-shot | 28.3 | 30.1 | **55.2** |
+ | SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | **53.9** |
+ | FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | **62.6** |
+ | TriviaQA (EM) | 5-shot | 83.3 | 82.8 | **85.6** |
+ | **Language & Reasoning** | | | | |
+ | BBH (EM) | 3-shot | **87.6** | 86.9 | 87.5 |
+ | DROP (F1) | 1-shot | 88.2 | 88.6 | **88.7** |
+ | HellaSwag (EM) | 0-shot | 86.4 | 85.7 | **88.0** |
+ | WinoGrande (EM) | 0-shot | 78.9 | 79.5 | **81.5** |
+ | CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | **85.2** |
+ | **Code & Math** | | | | |
+ | BigCodeBench (Pass@1) | 3-shot | **63.9** | 56.8 | 59.2 |
+ | HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | **76.8** |
+ | GSM8K (EM) | 8-shot | 91.1 | 90.8 | **92.6** |
+ | MATH (EM) | 4-shot | 60.5 | 57.4 | **64.5** |
+ | MGSM (EM) | 8-shot | 81.3 | **85.7** | 84.4 |
+ | CMath (EM) | 3-shot | 92.6 | **93.6** | 90.9 |
+ | **Long Context** | | | | |
+ | LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | **51.5** |
+
+ </div>
+
+ ### Instruct Model
+
+ DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:
+
+ | Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
+ | :--- | :--- | :--- | :--- |
+ | Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think>` summary |
+ | Think High | Conscious logical analysis; slower but more accurate | Complex problem-solving, planning | `<think>` thinking `</think>` summary |
+ | Think Max | Pushes reasoning to its fullest extent | Exploring the boundary of the model's reasoning capability | Special system prompt + `<think>` thinking `</think>` summary |
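For illustration, the response formats in the table can be split mechanically into a reasoning part and a summary part. This is our own sketch, not an official parser; the `encoding` folder shipped with the release is the authoritative reference:

```python
# Split a completion into (reasoning, summary) based on the <think> tags
# described in the table above. Hypothetical helper for illustration only.

def split_completion(text: str) -> tuple[str, str]:
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        # No thinking section at all: everything is the summary.
        return "", text.strip()
    before, _, after = text.partition(close_tag)
    # Non-think mode emits a bare `</think>` with no opening tag, while the
    # Think modes wrap the reasoning in <think>...</think>.
    reasoning = before.split(open_tag, 1)[-1] if open_tag in before else before
    return reasoning.strip(), after.strip()

print(split_completion("<think>2 is even</think>The answer is 2."))
```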
+
+ #### DeepSeek-V4-Pro-Max vs Frontier Models
+
+ <div align="center">
+
+ | Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+ | **Knowledge & Reasoning** | | | | | | |
+ | MMLU-Pro (EM) | 89.1 | 87.5 | **91.0** | 87.1 | 86.0 | 87.5 |
+ | SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | **75.6** | 36.9 | 38.1 | 57.9 |
+ | Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | **85.9** | 75.9 | 75.0 | 84.4 |
+ | GPQA Diamond (Pass@1) | 91.3 | 93.0 | **94.3** | 90.5 | 86.2 | 90.1 |
+ | HLE (Pass@1) | 40.0 | 39.8 | **44.4** | 36.4 | 34.7 | 37.7 |
+ | LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | **93.5** |
+ | Codeforces (Rating) | - | 3168 | 3052 | - | - | **3206** |
+ | HMMT 2026 Feb (Pass@1) | 96.2 | **97.7** | 94.7 | 92.7 | 89.4 | 95.2 |
+ | IMOAnswerBench (Pass@1) | 75.3 | **91.4** | 81.0 | 86.0 | 83.8 | 89.8 |
+ | Apex (Pass@1) | 34.5 | 54.1 | **60.9** | 24.0 | 11.5 | 38.3 |
+ | Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | **90.2** |
+ | **Long Context** | | | | | | |
+ | MRCR 1M (MMR) | **92.9** | - | 76.3 | - | - | 83.5 |
+ | CorpusQA 1M (ACC) | **71.7** | - | 53.8 | - | - | 62.0 |
+ | **Agentic** | | | | | | |
+ | Terminal Bench 2.0 (Acc) | 65.4 | **75.1** | 68.5 | 66.7 | 63.5 | 67.9 |
+ | SWE Verified (Resolved) | **80.8** | - | 80.6 | 80.2 | - | 80.6 |
+ | SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | **58.6** | 58.4 | 55.4 |
+ | SWE Multilingual (Resolved) | **77.5** | - | - | 76.7 | 73.3 | 76.2 |
+ | BrowseComp (Pass@1) | 83.7 | 82.7 | **85.9** | 83.2 | 79.3 | 83.4 |
+ | HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | **54.0** | 50.4 | 48.2 |
+ | GDPval-AA (Elo) | 1619 | **1674** | 1314 | 1482 | 1535 | 1554 |
+ | MCPAtlas Public (Pass@1) | **73.8** | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
+ | Toolathlon (Pass@1) | 47.2 | **54.6** | 48.8 | 50.0 | 40.7 | 51.8 |
+
+ </div>
+
+ #### Comparison across Modes
+
+ <div align="center">
+
+ | Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+ | **Knowledge & Reasoning** | | | | | | |
+ | MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | **87.5** |
+ | SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | **57.9** |
+ | Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | **84.4** |
+ | GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | **90.1** |
+ | HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | **37.7** |
+ | LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | **93.5** |
+ | Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | **3206** |
+ | HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | **95.2** |
+ | IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | **89.8** |
+ | Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | **38.3** |
+ | Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | **90.2** |
+ | **Long Context** | | | | | | |
+ | MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | **83.5** |
+ | CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | **62.0** |
+ | **Agentic** | | | | | | |
+ | Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | **67.9** |
+ | SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | **80.6** |
+ | SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | **55.4** |
+ | SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | **76.2** |
+ | BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | **83.4** |
+ | HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | **48.2** |
+ | MCPAtlas (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | **74.2** | 73.6 |
+ | GDPval-AA (Elo) | - | - | 1395 | - | - | **1554** |
+ | Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | **51.8** |
+
+ </div>
+
+ ## Chat Template
+
+ This release does not include a Jinja-format chat template. Instead, we provide a dedicated `encoding` folder with Python scripts and test cases demonstrating how to encode OpenAI-compatible messages into input strings for the model and how to parse the model's text output. Please refer to the [`encoding`](encoding/README.md) folder for full documentation.
+
+ A brief example:
+
+ ```python
+ from encoding_dsv4 import encode_messages, parse_message_from_completion_text
+
+ messages = [
+     {"role": "user", "content": "hello"},
+     {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
+     {"role": "user", "content": "1+1=?"},
+ ]
+
+ # messages -> string
+ prompt = encode_messages(messages, thinking_mode="thinking")
+
+ # string -> tokens
+ import transformers
+ tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
+ tokens = tokenizer.encode(prompt)
+ ```
+
+ ## How to Run Locally
+
+ Please refer to the [inference](inference/README.md) folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.
+
+ For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 1.0`. For the Think Max reasoning mode, we recommend a context window of at least **384K** tokens.
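The recommended settings map directly onto an OpenAI-compatible request body. A minimal sketch (the model id is taken from the download table above; the payload is only constructed here, not sent to any server):

```python
import json

# Build a chat-completion request body with the recommended sampling
# parameters (temperature = 1.0, top_p = 1.0). The endpoint you would
# POST this to depends on your serving stack and is not shown here.
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "1+1=?"}],
    "temperature": 1.0,
    "top_p": 1.0,
}
print(json.dumps(payload, indent=2))
```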
+
+ ## License
+
+ This repository and the model weights are licensed under the [MIT License](LICENSE).
+
+ ## Citation
+
+ ```bibtex
+ @misc{deepseekai2026deepseekv4,
+       title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
+       author={DeepSeek-AI},
+       year={2026},
+ }
+ ```
+
+ ## Contact
+
+ If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
inference/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "vocab_size": 129280,
+ "dim": 4096,
+ "moe_inter_dim": 2048,
+ "n_layers": 43,
+ "n_hash_layers": 3,
+ "n_heads": 64,
+ "n_routed_experts": 256,
+ "n_shared_experts": 1,
+ "n_activated_experts": 6,
+ "score_func": "sqrtsoftplus",
+ "route_scale": 1.5,
+ "swiglu_limit": 10.0,
+ "q_lora_rank": 1024,
+ "head_dim": 512,
+ "rope_head_dim": 64,
+ "o_groups": 8,
+ "o_lora_rank": 1024,
+ "window_size": 128,
+ "original_seq_len": 65536,
+ "rope_theta": 10000,
+ "rope_factor": 16,
+ "beta_fast": 32,
+ "beta_slow": 1,
+ "index_n_heads": 64,
+ "index_head_dim": 128,
+ "index_topk": 512,
+ "hc_mult": 4,
+ "hc_sinkhorn_iters": 20,
+ "dtype": "fp8",
+ "scale_fmt": "ue8m0",
+ "expert_dtype": "fp4",
+ "compress_rope_theta": 160000,
+ "compress_ratios": [0, 0, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 128, 4, 0]
+ }
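One sanity check worth making explicit (our own arithmetic, not stated in the config): the RoPE scaling fields above already imply the advertised one-million-token context, since `original_seq_len * rope_factor = 65536 * 16 = 1,048,576`, which also matches `model_max_length` in `tokenizer_config.json` below.

```python
# Subset of inference/config.json relevant to the context window;
# values copied verbatim from the JSON above.
cfg = {"original_seq_len": 65536, "rope_factor": 16}

# Extended context = pre-extension training window * RoPE scaling factor.
extended_context = cfg["original_seq_len"] * cfg["rope_factor"]
print(extended_context)  # 1048576, i.e. the one-million-token context
```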
model-00004-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:948250b46a6f92df92ef093ab2d0023c31f924232846a84affe84fa3c7794f5c
+ size 3596229272
model-00012-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05739c7d91a302f41a4627587982016a6cc874f875a3ea299d1f2e1dcea5cbb6
+ size 3590026352
model-00020-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:906c652f3c36b510689c2637ebe5172865cc6ccc17515b9fc70ee9e048e7c5af
+ size 3590026352
model-00028-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cc519b5a03a30d45717ffb8408a4f833f3a94f70e35a1de38e95a0ffcdc152e
+ size 3590026352
model-00036-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab72ad9d171fc0867350948e5091878b3c2445a5cfb8a83dd8c25d4272628107
+ size 3590026352
model-00044-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:438b052b8a2d650939e63704f55f1352b946152ba6633cb256c3864ef21d2f62
+ size 3590026352
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": false,
+ "bos_token": {
+ "__type": "AddedToken",
+ "content": "<|begin▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "clean_up_tokenization_spaces": false,
+ "eos_token": {
+ "__type": "AddedToken",
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "legacy": true,
+ "model_max_length": 1048576,
+ "pad_token": {
+ "__type": "AddedToken",
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sp_model_kwargs": {},
+ "unk_token": null,
+ "tokenizer_class": "PreTrainedTokenizerFast"
+ }