prince-canuma committed on
Commit 29d71e1 · verified · 1 Parent(s): 3ed8904

Add files using upload-large-folder tool

Files changed (40)
  1. README.md +13 -68
  2. config.json +1036 -4
  3. model-00001-of-00036.safetensors +3 -0
  4. model-00002-of-00036.safetensors +3 -0
  5. model-00003-of-00036.safetensors +3 -0
  6. model-00004-of-00036.safetensors +3 -0
  7. model-00005-of-00036.safetensors +3 -0
  8. model-00006-of-00036.safetensors +3 -0
  9. model-00007-of-00036.safetensors +3 -0
  10. model-00008-of-00036.safetensors +3 -0
  11. model-00009-of-00036.safetensors +3 -0
  12. model-00010-of-00036.safetensors +3 -0
  13. model-00011-of-00036.safetensors +3 -0
  14. model-00012-of-00036.safetensors +3 -0
  15. model-00013-of-00036.safetensors +3 -0
  16. model-00014-of-00036.safetensors +3 -0
  17. model-00015-of-00036.safetensors +3 -0
  18. model-00016-of-00036.safetensors +3 -0
  19. model-00017-of-00036.safetensors +3 -0
  20. model-00018-of-00036.safetensors +3 -0
  21. model-00019-of-00036.safetensors +3 -0
  22. model-00020-of-00036.safetensors +3 -0
  23. model-00021-of-00036.safetensors +3 -0
  24. model-00022-of-00036.safetensors +3 -0
  25. model-00023-of-00036.safetensors +3 -0
  26. model-00024-of-00036.safetensors +3 -0
  27. model-00025-of-00036.safetensors +3 -0
  28. model-00026-of-00036.safetensors +3 -0
  29. model-00027-of-00036.safetensors +3 -0
  30. model-00028-of-00036.safetensors +3 -0
  31. model-00029-of-00036.safetensors +3 -0
  32. model-00030-of-00036.safetensors +3 -0
  33. model-00031-of-00036.safetensors +3 -0
  34. model-00032-of-00036.safetensors +3 -0
  35. model-00033-of-00036.safetensors +3 -0
  36. model-00034-of-00036.safetensors +3 -0
  37. model-00035-of-00036.safetensors +3 -0
  38. model-00036-of-00036.safetensors +3 -0
  39. model.safetensors.index.json +0 -0
  40. tokenizer_config.json +5 -1
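
Reviewer note: the 36 weight shards listed above follow a zero-padded `model-XXXXX-of-XXXXX.safetensors` pattern. The sketch below enumerates the expected names and reports any that are absent from a given listing. The helper names `shard_names` and `missing_shards` are hypothetical and exist only for this illustration.

```python
def shard_names(total: int) -> list[str]:
    """Return the expected shard filenames for a checkpoint split into `total` parts."""
    return [f"model-{i:05d}-of-{total:05d}.safetensors" for i in range(1, total + 1)]


def missing_shards(present: list[str], total: int) -> list[str]:
    """Return expected shard names that do not appear in `present`."""
    present_set = set(present)
    return [name for name in shard_names(total) if name not in present_set]


names = shard_names(36)
print(names[0], names[-1], len(names))
# Simulate an upload where the final shard is absent.
print(missing_shards(names[:-1], 36))
```

A check like this could be run against a directory listing before or after an upload to confirm that no shard was skipped.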
README.md CHANGED
@@ -1,86 +1,31 @@
1
  ---
2
- language:
3
- - en
4
- library_name: mlx
5
- license: mit
6
- pipeline_tag: text-generation
7
  tags:
8
  - mlx
9
- - safetensors
10
- - deepseek_v4
11
- - 4-bit
12
- base_model: deepseek-ai/DeepSeek-V4-Flash
13
- base_model_relation: quantized
14
  ---
15
 
16
- # DeepSeek-V4-Flash-4bit (MLX)
17
-
18
- 4-bit quantized MLX port of [`deepseek-ai/DeepSeek-V4-Flash`](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) for Apple Silicon.
19
-
20
- 158B total params (~37B active), 149 GB on disk, fits comfortably on a single M3/M5 Ultra (256GB+).
21
 
22
- ## Requires the V4 mlx-lm port
23
-
24
- DeepSeek-V4 is a new architecture (mHC, hash-routed MoE, sqrtsoftplus, Compressor + Indexer for compressed KV) and is **not yet in stock mlx-lm**. To use this model you need the V4 port:
25
 
26
  ```bash
27
- git clone https://github.com/machiabeli/mlx-lm-1.git mlx-lm
28
- cd mlx-lm && git checkout feat/deepseek-v4
29
- pip install -e .
30
  ```
31
 
32
- Tracking PR: [`ml-explore/mlx-lm#1189`](https://github.com/ml-explore/mlx-lm/pull/1189). Once merged, `pip install mlx-lm` will work.
33
-
34
- ## Usage
35
-
36
  ```python
37
  from mlx_lm import load, generate
38
 
39
  model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
40
- out = generate(model, tokenizer, prompt="Q: What is 2+2?\nA:", max_tokens=64)
41
- print(out)
42
- ```
43
-
44
- ## Performance
45
-
46
- Measured on M3 Ultra (512GB) single-node, batch=1:
47
 
48
- | Stage | tok/s |
49
- |-------|-------|
50
- | Prompt processing | 6.6 |
51
- | Generation | **20.2** |
52
- | Peak RAM | 160 GB |
53
 
54
- Generation throughput includes the **fused Metal kernel for mHC Sinkhorn** added in PR #1189 (1.83x over the Python reference).
55
-
56
- ## Source quality caveat
57
-
58
- The bf16 source weights used for this conversion were upcasted from DeepSeek's native FP8 release rather than re-quantized directly from FP8. This stacks two quantization passes (FP8 -> BF16 -> Q4) and may produce slightly worse outputs than a direct FP8 -> Q4 conversion. A re-conversion from native FP8 is planned.
59
-
60
- ## Conversion
61
 
 
62
  ```
63
- mlx_lm.convert \
64
-     --hf-path deepseek-ai/DeepSeek-V4-Flash \
65
-     --mlx-path DeepSeek-V4-Flash-4bit \
66
-     -q --q-bits 4 --q-group-size 64
67
- ```
68
-
69
- Result: 4.506 bits per weight, 33 sharded safetensors.
70
-
71
- ## Architecture
72
-
73
- V4 is a substantial step from V3:
74
-
75
- - **mHC (Manifold-constrained Hyper-Connections)** — replaces residual connections with `hc_mult=4` parallel hidden-state copies recombined via a doubly-stochastic Sinkhorn-normalized mix matrix.
76
- - **Hash-routed MoE** — first 3 layers use a deterministic `tid2eid` table (token id -> expert id) instead of learned routing.
77
- - **`sqrtsoftplus` scoring** — `sqrt(softplus(x))` instead of softmax for expert scores.
78
- - **MLA with single shared 512-dim KV head** — broadcast across 64 query heads (no kv_lora_rank up-projection step like V3).
79
- - **Compressor + Indexer** for compressed KV attention with topk sparse selection (Indexer at compress_ratio=4).
80
- - **Per-head learnable `attn_sink`** in softmax denominator.
81
-
82
- Full details: [DeepSeek V4 technical report](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf).
83
-
84
- ## License
85
-
86
- MIT (matches upstream).
 
1
  ---
2
+ language: en
 
 
 
 
3
  tags:
4
  - mlx
5
+ library_name: mlx
6
+ pipeline_tag: text-generation
 
 
 
7
  ---
8
 
9
+ # mlx-community/DeepSeek-V4-Flash-4bit
 
 
 
 
10
 
11
+ ## Use with mlx
 
 
12
 
13
  ```bash
14
+ pip install mlx-lm
 
 
15
  ```
16
 
 
 
 
 
17
  ```python
18
  from mlx_lm import load, generate
19
 
20
  model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
 
 
 
 
 
 
 
21
 
22
+ prompt = "hello"
 
 
 
 
23
 
24
+ if tokenizer.chat_template is not None:
25
+     messages = [{"role": "user", "content": prompt}]
26
+     prompt = tokenizer.apply_chat_template(
27
+         messages, add_generation_prompt=True, return_dict=False,
28
+     )
 
 
29
 
30
+ response = generate(model, tokenizer, prompt=prompt, verbose=True)
31
  ```
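
Reviewer note: the new README snippet wraps the prompt in a chat turn only when the tokenizer ships a chat template. The sketch below isolates that branch so it can be exercised without downloading the model. `build_prompt` and `StubTokenizer` are hypothetical names introduced for illustration and are not part of mlx-lm. The stub returns a plain string for readability, whereas a real tokenizer's `apply_chat_template` may return token ids depending on its arguments.

```python
from typing import Optional


def build_prompt(tokenizer, user_text: str):
    """Apply the chat template if one exists, otherwise pass the text through."""
    if getattr(tokenizer, "chat_template", None) is None:
        return user_text
    messages = [{"role": "user", "content": user_text}]
    return tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; only mimics the two members used above."""

    def __init__(self, chat_template: Optional[str]):
        self.chat_template = chat_template

    def apply_chat_template(self, messages, add_generation_prompt=False, return_dict=False):
        # Toy formatting: one tag per role, plus an assistant tag when requested.
        text = "".join(f"<{m['role']}>{m['content']}" for m in messages)
        return text + ("<assistant>" if add_generation_prompt else "")


plain = build_prompt(StubTokenizer(None), "hello")
templated = build_prompt(StubTokenizer("stub-template"), "hello")
print(plain)
print(templated)
```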
 
config.json CHANGED
@@ -51,7 +51,7 @@
51
  4,
52
  0
53
  ],
54
- "compress_rope_theta": 160000,
55
  "eos_token_id": 1,
56
  "hc_eps": 1e-06,
57
  "hc_mult": 4,
@@ -82,12 +82,1044 @@
82
  "quantization": {
83
  "group_size": 64,
84
  "bits": 4,
85
- "mode": "affine"
86
  },
87
  "quantization_config": {
88
  "group_size": 64,
89
  "bits": 4,
90
- "mode": "affine"
91
  },
92
  "rms_norm_eps": 1e-06,
93
  "rope_scaling": {
@@ -97,7 +1129,7 @@
97
  "original_max_position_embeddings": 65536,
98
  "type": "yarn"
99
  },
100
- "rope_theta": 10000,
101
  "routed_scaling_factor": 1.5,
102
  "scoring_func": "sqrtsoftplus",
103
  "sliding_window": 128,
 
51
  4,
52
  0
53
  ],
54
+ "compress_rope_theta": 160000.0,
55
  "eos_token_id": 1,
56
  "hc_eps": 1e-06,
57
  "hc_mult": 4,
 
82
  "quantization": {
83
  "group_size": 64,
84
  "bits": 4,
85
+ "mode": "affine",
86
+ "model.layers.0.ffn.switch_mlp.gate_proj": {
87
+ "group_size": 32,
88
+ "bits": 4
89
+ },
90
+ "model.layers.0.ffn.switch_mlp.up_proj": {
91
+ "group_size": 32,
92
+ "bits": 4
93
+ },
94
+ "model.layers.0.ffn.switch_mlp.down_proj": {
95
+ "group_size": 32,
96
+ "bits": 4
97
+ },
98
+ "model.layers.1.ffn.switch_mlp.gate_proj": {
99
+ "group_size": 32,
100
+ "bits": 4
101
+ },
102
+ "model.layers.1.ffn.switch_mlp.up_proj": {
103
+ "group_size": 32,
104
+ "bits": 4
105
+ },
106
+ "model.layers.1.ffn.switch_mlp.down_proj": {
107
+ "group_size": 32,
108
+ "bits": 4
109
+ },
110
+ "model.layers.2.ffn.switch_mlp.gate_proj": {
111
+ "group_size": 32,
112
+ "bits": 4
113
+ },
114
+ "model.layers.2.ffn.switch_mlp.up_proj": {
115
+ "group_size": 32,
116
+ "bits": 4
117
+ },
118
+ "model.layers.2.ffn.switch_mlp.down_proj": {
119
+ "group_size": 32,
120
+ "bits": 4
121
+ },
122
+ "model.layers.3.ffn.switch_mlp.gate_proj": {
123
+ "group_size": 32,
124
+ "bits": 4
125
+ },
126
+ "model.layers.3.ffn.switch_mlp.up_proj": {
127
+ "group_size": 32,
128
+ "bits": 4
129
+ },
130
+ "model.layers.3.ffn.switch_mlp.down_proj": {
131
+ "group_size": 32,
132
+ "bits": 4
133
+ },
134
+ "model.layers.4.ffn.switch_mlp.gate_proj": {
135
+ "group_size": 32,
136
+ "bits": 4
137
+ },
138
+ "model.layers.4.ffn.switch_mlp.up_proj": {
139
+ "group_size": 32,
140
+ "bits": 4
141
+ },
142
+ "model.layers.4.ffn.switch_mlp.down_proj": {
143
+ "group_size": 32,
144
+ "bits": 4
145
+ },
146
+ "model.layers.5.ffn.switch_mlp.gate_proj": {
147
+ "group_size": 32,
148
+ "bits": 4
149
+ },
150
+ "model.layers.5.ffn.switch_mlp.up_proj": {
151
+ "group_size": 32,
152
+ "bits": 4
153
+ },
154
+ "model.layers.5.ffn.switch_mlp.down_proj": {
155
+ "group_size": 32,
156
+ "bits": 4
157
+ },
158
+ "model.layers.6.ffn.switch_mlp.gate_proj": {
159
+ "group_size": 32,
160
+ "bits": 4
161
+ },
162
+ "model.layers.6.ffn.switch_mlp.up_proj": {
163
+ "group_size": 32,
164
+ "bits": 4
165
+ },
166
+ "model.layers.6.ffn.switch_mlp.down_proj": {
167
+ "group_size": 32,
168
+ "bits": 4
169
+ },
170
+ "model.layers.7.ffn.switch_mlp.gate_proj": {
171
+ "group_size": 32,
172
+ "bits": 4
173
+ },
174
+ "model.layers.7.ffn.switch_mlp.up_proj": {
175
+ "group_size": 32,
176
+ "bits": 4
177
+ },
178
+ "model.layers.7.ffn.switch_mlp.down_proj": {
179
+ "group_size": 32,
180
+ "bits": 4
181
+ },
182
+ "model.layers.8.ffn.switch_mlp.gate_proj": {
183
+ "group_size": 32,
184
+ "bits": 4
185
+ },
186
+ "model.layers.8.ffn.switch_mlp.up_proj": {
187
+ "group_size": 32,
188
+ "bits": 4
189
+ },
190
+ "model.layers.8.ffn.switch_mlp.down_proj": {
191
+ "group_size": 32,
192
+ "bits": 4
193
+ },
194
+ "model.layers.9.ffn.switch_mlp.gate_proj": {
195
+ "group_size": 32,
196
+ "bits": 4
197
+ },
198
+ "model.layers.9.ffn.switch_mlp.up_proj": {
199
+ "group_size": 32,
200
+ "bits": 4
201
+ },
202
+ "model.layers.9.ffn.switch_mlp.down_proj": {
203
+ "group_size": 32,
204
+ "bits": 4
205
+ },
206
+ "model.layers.10.ffn.switch_mlp.gate_proj": {
207
+ "group_size": 32,
208
+ "bits": 4
209
+ },
210
+ "model.layers.10.ffn.switch_mlp.up_proj": {
211
+ "group_size": 32,
212
+ "bits": 4
213
+ },
214
+ "model.layers.10.ffn.switch_mlp.down_proj": {
215
+ "group_size": 32,
216
+ "bits": 4
217
+ },
218
+ "model.layers.11.ffn.switch_mlp.gate_proj": {
219
+ "group_size": 32,
220
+ "bits": 4
221
+ },
222
+ "model.layers.11.ffn.switch_mlp.up_proj": {
223
+ "group_size": 32,
224
+ "bits": 4
225
+ },
226
+ "model.layers.11.ffn.switch_mlp.down_proj": {
227
+ "group_size": 32,
228
+ "bits": 4
229
+ },
230
+ "model.layers.12.ffn.switch_mlp.gate_proj": {
231
+ "group_size": 32,
232
+ "bits": 4
233
+ },
234
+ "model.layers.12.ffn.switch_mlp.up_proj": {
235
+ "group_size": 32,
236
+ "bits": 4
237
+ },
238
+ "model.layers.12.ffn.switch_mlp.down_proj": {
239
+ "group_size": 32,
240
+ "bits": 4
241
+ },
242
+ "model.layers.13.ffn.switch_mlp.gate_proj": {
243
+ "group_size": 32,
244
+ "bits": 4
245
+ },
246
+ "model.layers.13.ffn.switch_mlp.up_proj": {
247
+ "group_size": 32,
248
+ "bits": 4
249
+ },
250
+ "model.layers.13.ffn.switch_mlp.down_proj": {
251
+ "group_size": 32,
252
+ "bits": 4
253
+ },
254
+ "model.layers.14.ffn.switch_mlp.gate_proj": {
255
+ "group_size": 32,
256
+ "bits": 4
257
+ },
258
+ "model.layers.14.ffn.switch_mlp.up_proj": {
259
+ "group_size": 32,
260
+ "bits": 4
261
+ },
262
+ "model.layers.14.ffn.switch_mlp.down_proj": {
263
+ "group_size": 32,
264
+ "bits": 4
265
+ },
266
+ "model.layers.15.ffn.switch_mlp.gate_proj": {
267
+ "group_size": 32,
268
+ "bits": 4
269
+ },
270
+ "model.layers.15.ffn.switch_mlp.up_proj": {
271
+ "group_size": 32,
272
+ "bits": 4
273
+ },
274
+ "model.layers.15.ffn.switch_mlp.down_proj": {
275
+ "group_size": 32,
276
+ "bits": 4
277
+ },
278
+ "model.layers.16.ffn.switch_mlp.gate_proj": {
279
+ "group_size": 32,
280
+ "bits": 4
281
+ },
282
+ "model.layers.16.ffn.switch_mlp.up_proj": {
283
+ "group_size": 32,
284
+ "bits": 4
285
+ },
286
+ "model.layers.16.ffn.switch_mlp.down_proj": {
287
+ "group_size": 32,
288
+ "bits": 4
289
+ },
290
+ "model.layers.17.ffn.switch_mlp.gate_proj": {
291
+ "group_size": 32,
292
+ "bits": 4
293
+ },
294
+ "model.layers.17.ffn.switch_mlp.up_proj": {
295
+ "group_size": 32,
296
+ "bits": 4
297
+ },
298
+ "model.layers.17.ffn.switch_mlp.down_proj": {
299
+ "group_size": 32,
300
+ "bits": 4
301
+ },
302
+ "model.layers.18.ffn.switch_mlp.gate_proj": {
303
+ "group_size": 32,
304
+ "bits": 4
305
+ },
306
+ "model.layers.18.ffn.switch_mlp.up_proj": {
307
+ "group_size": 32,
308
+ "bits": 4
309
+ },
310
+ "model.layers.18.ffn.switch_mlp.down_proj": {
311
+ "group_size": 32,
312
+ "bits": 4
313
+ },
314
+ "model.layers.19.ffn.switch_mlp.gate_proj": {
315
+ "group_size": 32,
316
+ "bits": 4
317
+ },
318
+ "model.layers.19.ffn.switch_mlp.up_proj": {
319
+ "group_size": 32,
320
+ "bits": 4
321
+ },
322
+ "model.layers.19.ffn.switch_mlp.down_proj": {
323
+ "group_size": 32,
324
+ "bits": 4
325
+ },
326
+ "model.layers.20.ffn.switch_mlp.gate_proj": {
327
+ "group_size": 32,
328
+ "bits": 4
329
+ },
330
+ "model.layers.20.ffn.switch_mlp.up_proj": {
331
+ "group_size": 32,
332
+ "bits": 4
333
+ },
334
+ "model.layers.20.ffn.switch_mlp.down_proj": {
335
+ "group_size": 32,
336
+ "bits": 4
337
+ },
338
+ "model.layers.21.ffn.switch_mlp.gate_proj": {
339
+ "group_size": 32,
340
+ "bits": 4
341
+ },
342
+ "model.layers.21.ffn.switch_mlp.up_proj": {
343
+ "group_size": 32,
344
+ "bits": 4
345
+ },
346
+ "model.layers.21.ffn.switch_mlp.down_proj": {
347
+ "group_size": 32,
348
+ "bits": 4
349
+ },
350
+ "model.layers.22.ffn.switch_mlp.gate_proj": {
351
+ "group_size": 32,
352
+ "bits": 4
353
+ },
354
+ "model.layers.22.ffn.switch_mlp.up_proj": {
355
+ "group_size": 32,
356
+ "bits": 4
357
+ },
358
+ "model.layers.22.ffn.switch_mlp.down_proj": {
359
+ "group_size": 32,
360
+ "bits": 4
361
+ },
362
+ "model.layers.23.ffn.switch_mlp.gate_proj": {
363
+ "group_size": 32,
364
+ "bits": 4
365
+ },
366
+ "model.layers.23.ffn.switch_mlp.up_proj": {
367
+ "group_size": 32,
368
+ "bits": 4
369
+ },
370
+ "model.layers.23.ffn.switch_mlp.down_proj": {
371
+ "group_size": 32,
372
+ "bits": 4
373
+ },
374
+ "model.layers.24.ffn.switch_mlp.gate_proj": {
375
+ "group_size": 32,
376
+ "bits": 4
377
+ },
378
+ "model.layers.24.ffn.switch_mlp.up_proj": {
379
+ "group_size": 32,
380
+ "bits": 4
381
+ },
382
+ "model.layers.24.ffn.switch_mlp.down_proj": {
383
+ "group_size": 32,
384
+ "bits": 4
385
+ },
386
+ "model.layers.25.ffn.switch_mlp.gate_proj": {
387
+ "group_size": 32,
388
+ "bits": 4
389
+ },
390
+ "model.layers.25.ffn.switch_mlp.up_proj": {
391
+ "group_size": 32,
392
+ "bits": 4
393
+ },
394
+ "model.layers.25.ffn.switch_mlp.down_proj": {
395
+ "group_size": 32,
396
+ "bits": 4
397
+ },
398
+ "model.layers.26.ffn.switch_mlp.gate_proj": {
399
+ "group_size": 32,
400
+ "bits": 4
401
+ },
402
+ "model.layers.26.ffn.switch_mlp.up_proj": {
403
+ "group_size": 32,
404
+ "bits": 4
405
+ },
406
+ "model.layers.26.ffn.switch_mlp.down_proj": {
407
+ "group_size": 32,
408
+ "bits": 4
409
+ },
410
+ "model.layers.27.ffn.switch_mlp.gate_proj": {
411
+ "group_size": 32,
412
+ "bits": 4
413
+ },
414
+ "model.layers.27.ffn.switch_mlp.up_proj": {
415
+ "group_size": 32,
416
+ "bits": 4
417
+ },
418
+ "model.layers.27.ffn.switch_mlp.down_proj": {
419
+ "group_size": 32,
420
+ "bits": 4
421
+ },
422
+ "model.layers.28.ffn.switch_mlp.gate_proj": {
423
+ "group_size": 32,
424
+ "bits": 4
425
+ },
426
+ "model.layers.28.ffn.switch_mlp.up_proj": {
427
+ "group_size": 32,
428
+ "bits": 4
429
+ },
430
+ "model.layers.28.ffn.switch_mlp.down_proj": {
431
+ "group_size": 32,
432
+ "bits": 4
433
+ },
434
+ "model.layers.29.ffn.switch_mlp.gate_proj": {
435
+ "group_size": 32,
436
+ "bits": 4
437
+ },
438
+ "model.layers.29.ffn.switch_mlp.up_proj": {
439
+ "group_size": 32,
440
+ "bits": 4
441
+ },
442
+ "model.layers.29.ffn.switch_mlp.down_proj": {
443
+ "group_size": 32,
444
+ "bits": 4
445
+ },
446
+ "model.layers.30.ffn.switch_mlp.gate_proj": {
447
+ "group_size": 32,
448
+ "bits": 4
449
+ },
450
+ "model.layers.30.ffn.switch_mlp.up_proj": {
451
+ "group_size": 32,
452
+ "bits": 4
453
+ },
454
+ "model.layers.30.ffn.switch_mlp.down_proj": {
455
+ "group_size": 32,
456
+ "bits": 4
457
+ },
458
+ "model.layers.31.ffn.switch_mlp.gate_proj": {
459
+ "group_size": 32,
460
+ "bits": 4
461
+ },
462
+ "model.layers.31.ffn.switch_mlp.up_proj": {
463
+ "group_size": 32,
464
+ "bits": 4
465
+ },
466
+ "model.layers.31.ffn.switch_mlp.down_proj": {
467
+ "group_size": 32,
468
+ "bits": 4
469
+ },
470
+ "model.layers.32.ffn.switch_mlp.gate_proj": {
471
+ "group_size": 32,
472
+ "bits": 4
473
+ },
474
+ "model.layers.32.ffn.switch_mlp.up_proj": {
475
+ "group_size": 32,
476
+ "bits": 4
477
+ },
478
+ "model.layers.32.ffn.switch_mlp.down_proj": {
479
+ "group_size": 32,
480
+ "bits": 4
481
+ },
482
+ "model.layers.33.ffn.switch_mlp.gate_proj": {
483
+ "group_size": 32,
484
+ "bits": 4
485
+ },
486
+ "model.layers.33.ffn.switch_mlp.up_proj": {
487
+ "group_size": 32,
488
+ "bits": 4
489
+ },
490
+ "model.layers.33.ffn.switch_mlp.down_proj": {
491
+ "group_size": 32,
492
+ "bits": 4
493
+ },
494
+ "model.layers.34.ffn.switch_mlp.gate_proj": {
495
+ "group_size": 32,
496
+ "bits": 4
497
+ },
498
+ "model.layers.34.ffn.switch_mlp.up_proj": {
499
+ "group_size": 32,
500
+ "bits": 4
501
+ },
502
+ "model.layers.34.ffn.switch_mlp.down_proj": {
503
+ "group_size": 32,
504
+ "bits": 4
505
+ },
506
+ "model.layers.35.ffn.switch_mlp.gate_proj": {
507
+ "group_size": 32,
508
+ "bits": 4
509
+ },
510
+ "model.layers.35.ffn.switch_mlp.up_proj": {
511
+ "group_size": 32,
512
+ "bits": 4
513
+ },
514
+ "model.layers.35.ffn.switch_mlp.down_proj": {
515
+ "group_size": 32,
516
+ "bits": 4
517
+ },
518
+ "model.layers.36.ffn.switch_mlp.gate_proj": {
519
+ "group_size": 32,
520
+ "bits": 4
521
+ },
522
+ "model.layers.36.ffn.switch_mlp.up_proj": {
523
+ "group_size": 32,
524
+ "bits": 4
525
+ },
526
+ "model.layers.36.ffn.switch_mlp.down_proj": {
527
+ "group_size": 32,
528
+ "bits": 4
529
+ },
530
+ "model.layers.37.ffn.switch_mlp.gate_proj": {
531
+ "group_size": 32,
532
+ "bits": 4
533
+ },
534
+ "model.layers.37.ffn.switch_mlp.up_proj": {
535
+ "group_size": 32,
536
+ "bits": 4
537
+ },
538
+ "model.layers.37.ffn.switch_mlp.down_proj": {
539
+ "group_size": 32,
540
+ "bits": 4
541
+ },
542
+ "model.layers.38.ffn.switch_mlp.gate_proj": {
543
+ "group_size": 32,
544
+ "bits": 4
545
+ },
546
+ "model.layers.38.ffn.switch_mlp.up_proj": {
547
+ "group_size": 32,
548
+ "bits": 4
549
+ },
550
+ "model.layers.38.ffn.switch_mlp.down_proj": {
551
+ "group_size": 32,
552
+ "bits": 4
553
+ },
554
+ "model.layers.39.ffn.switch_mlp.gate_proj": {
555
+ "group_size": 32,
556
+ "bits": 4
557
+ },
558
+ "model.layers.39.ffn.switch_mlp.up_proj": {
559
+ "group_size": 32,
560
+ "bits": 4
561
+ },
562
+ "model.layers.39.ffn.switch_mlp.down_proj": {
563
+ "group_size": 32,
564
+ "bits": 4
565
+ },
566
+ "model.layers.40.ffn.switch_mlp.gate_proj": {
567
+ "group_size": 32,
568
+ "bits": 4
569
+ },
570
+ "model.layers.40.ffn.switch_mlp.up_proj": {
571
+ "group_size": 32,
572
+ "bits": 4
573
+ },
574
+ "model.layers.40.ffn.switch_mlp.down_proj": {
575
+ "group_size": 32,
576
+ "bits": 4
577
+ },
578
+ "model.layers.41.ffn.switch_mlp.gate_proj": {
579
+ "group_size": 32,
580
+ "bits": 4
581
+ },
582
+ "model.layers.41.ffn.switch_mlp.up_proj": {
583
+ "group_size": 32,
584
+ "bits": 4
585
+ },
586
+ "model.layers.41.ffn.switch_mlp.down_proj": {
587
+ "group_size": 32,
588
+ "bits": 4
589
+ },
590
+ "model.layers.42.ffn.switch_mlp.gate_proj": {
591
+ "group_size": 32,
592
+ "bits": 4
593
+ },
594
+ "model.layers.42.ffn.switch_mlp.up_proj": {
595
+ "group_size": 32,
596
+ "bits": 4
597
+ },
598
+ "model.layers.42.ffn.switch_mlp.down_proj": {
599
+ "group_size": 32,
600
+ "bits": 4
601
+ }
602
  },
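
Reviewer note: the `quantization` block above keeps the global defaults (`group_size` 64, 4 bits, `affine`) and adds a `group_size` 32 override for the `gate_proj`, `up_proj`, and `down_proj` of every layer's `switch_mlp` (layers 0 through 42). The sketch below shows one way such a config could be resolved per module and what the override costs in storage. It assumes overrides are matched by exact module path and that affine quantization stores one 16-bit scale and one 16-bit bias per group. Neither assumption is confirmed by this diff, and `resolve_quantization`, `affine_bits_per_weight`, and the path `model.embed` are hypothetical.

```python
def resolve_quantization(cfg: dict, module_path: str) -> dict:
    """Merge global defaults with a per-module override, if one exists."""
    defaults = {k: v for k, v in cfg.items() if not isinstance(v, dict)}
    override = cfg.get(module_path)
    if isinstance(override, dict):
        return {**defaults, **override}
    return defaults


def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight, assuming one scale and one bias per group."""
    return bits + (scale_bits + bias_bits) / group_size


# Rebuild the structure shown in the diff: defaults plus 43 x 3 overrides.
quantization = {"group_size": 64, "bits": 4, "mode": "affine"}
for layer in range(43):
    for proj in ("gate_proj", "up_proj", "down_proj"):
        key = f"model.layers.{layer}.ffn.switch_mlp.{proj}"
        quantization[key] = {"group_size": 32, "bits": 4}

expert = resolve_quantization(quantization, "model.layers.0.ffn.switch_mlp.up_proj")
other = resolve_quantization(quantization, "model.embed")  # hypothetical path
print(expert, affine_bits_per_weight(expert["bits"], expert["group_size"]))
print(other, affine_bits_per_weight(other["bits"], other["group_size"]))
```

Under these assumptions, the overridden expert projections cost 5.0 bits per weight and everything else costs 4.5.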
603
  "quantization_config": {
604
  "group_size": 64,
605
  "bits": 4,
606
+ "mode": "affine",
607
+ "model.layers.0.ffn.switch_mlp.gate_proj": {
608
+ "group_size": 32,
609
+ "bits": 4
610
+ },
611
+ "model.layers.0.ffn.switch_mlp.up_proj": {
612
+ "group_size": 32,
613
+ "bits": 4
614
+ },
615
+ "model.layers.0.ffn.switch_mlp.down_proj": {
616
+ "group_size": 32,
617
+ "bits": 4
618
+ },
619
+ "model.layers.1.ffn.switch_mlp.gate_proj": {
620
+ "group_size": 32,
621
+ "bits": 4
622
+ },
623
+ "model.layers.1.ffn.switch_mlp.up_proj": {
624
+ "group_size": 32,
625
+ "bits": 4
626
+ },
627
+ "model.layers.1.ffn.switch_mlp.down_proj": {
628
+ "group_size": 32,
629
+ "bits": 4
630
+ },
631
+ "model.layers.2.ffn.switch_mlp.gate_proj": {
632
+ "group_size": 32,
633
+ "bits": 4
634
+ },
635
+ "model.layers.2.ffn.switch_mlp.up_proj": {
636
+ "group_size": 32,
637
+ "bits": 4
638
+ },
639
+ "model.layers.2.ffn.switch_mlp.down_proj": {
640
+ "group_size": 32,
641
+ "bits": 4
642
+ },
643
+ "model.layers.3.ffn.switch_mlp.gate_proj": {
644
+ "group_size": 32,
645
+ "bits": 4
646
+ },
647
+ "model.layers.3.ffn.switch_mlp.up_proj": {
648
+ "group_size": 32,
649
+ "bits": 4
650
+ },
651
+ "model.layers.3.ffn.switch_mlp.down_proj": {
652
+ "group_size": 32,
653
+ "bits": 4
654
+ },
655
+ "model.layers.4.ffn.switch_mlp.gate_proj": {
656
+ "group_size": 32,
657
+ "bits": 4
658
+ },
659
+ "model.layers.4.ffn.switch_mlp.up_proj": {
660
+ "group_size": 32,
661
+ "bits": 4
662
+ },
663
+ "model.layers.4.ffn.switch_mlp.down_proj": {
664
+ "group_size": 32,
665
+ "bits": 4
666
+ },
667
+ "model.layers.5.ffn.switch_mlp.gate_proj": {
668
+ "group_size": 32,
669
+ "bits": 4
670
+ },
671
+ "model.layers.5.ffn.switch_mlp.up_proj": {
672
+ "group_size": 32,
673
+ "bits": 4
674
+ },
675
+ "model.layers.5.ffn.switch_mlp.down_proj": {
676
+ "group_size": 32,
677
+ "bits": 4
678
+ },
679
+ "model.layers.6.ffn.switch_mlp.gate_proj": {
680
+ "group_size": 32,
681
+ "bits": 4
682
+ },
683
+ "model.layers.6.ffn.switch_mlp.up_proj": {
684
+ "group_size": 32,
685
+ "bits": 4
686
+ },
687
+ "model.layers.6.ffn.switch_mlp.down_proj": {
688
+ "group_size": 32,
689
+ "bits": 4
690
+ },
691
+ "model.layers.7.ffn.switch_mlp.gate_proj": {
692
+ "group_size": 32,
693
+ "bits": 4
694
+ },
695
+ "model.layers.7.ffn.switch_mlp.up_proj": {
696
+ "group_size": 32,
697
+ "bits": 4
698
+ },
699
+ "model.layers.7.ffn.switch_mlp.down_proj": {
700
+ "group_size": 32,
701
+ "bits": 4
702
+ },
703
+ "model.layers.8.ffn.switch_mlp.gate_proj": {
704
+ "group_size": 32,
705
+ "bits": 4
706
+ },
707
+ "model.layers.8.ffn.switch_mlp.up_proj": {
708
+ "group_size": 32,
709
+ "bits": 4
710
+ },
711
+ "model.layers.8.ffn.switch_mlp.down_proj": {
712
+ "group_size": 32,
713
+ "bits": 4
714
+ },
715
+ "model.layers.9.ffn.switch_mlp.gate_proj": {
716
+ "group_size": 32,
717
+ "bits": 4
718
+ },
719
+ "model.layers.9.ffn.switch_mlp.up_proj": {
720
+ "group_size": 32,
721
+ "bits": 4
722
+ },
723
+ "model.layers.9.ffn.switch_mlp.down_proj": {
724
+ "group_size": 32,
725
+ "bits": 4
726
+ },
727
+ "model.layers.10.ffn.switch_mlp.gate_proj": {
728
+ "group_size": 32,
729
+ "bits": 4
730
+ },
731
+ "model.layers.10.ffn.switch_mlp.up_proj": {
732
+ "group_size": 32,
733
+ "bits": 4
734
+ },
735
+ "model.layers.10.ffn.switch_mlp.down_proj": {
736
+ "group_size": 32,
737
+ "bits": 4
738
+ },
739
+ "model.layers.11.ffn.switch_mlp.gate_proj": {
740
+ "group_size": 32,
741
+ "bits": 4
742
+ },
743
+ "model.layers.11.ffn.switch_mlp.up_proj": {
744
+ "group_size": 32,
745
+ "bits": 4
746
+ },
747
+ "model.layers.11.ffn.switch_mlp.down_proj": {
748
+ "group_size": 32,
749
+ "bits": 4
750
+ },
751
+ "model.layers.12.ffn.switch_mlp.gate_proj": {
752
+ "group_size": 32,
753
+ "bits": 4
754
+ },
755
+ "model.layers.12.ffn.switch_mlp.up_proj": {
756
+ "group_size": 32,
757
+ "bits": 4
758
+ },
759
+ "model.layers.12.ffn.switch_mlp.down_proj": {
760
+ "group_size": 32,
761
+ "bits": 4
762
+ },
763
+ "model.layers.13.ffn.switch_mlp.gate_proj": {
764
+ "group_size": 32,
765
+ "bits": 4
766
+ },
767
+ "model.layers.13.ffn.switch_mlp.up_proj": {
768
+ "group_size": 32,
769
+ "bits": 4
770
+ },
771
+ "model.layers.13.ffn.switch_mlp.down_proj": {
772
+ "group_size": 32,
773
+ "bits": 4
774
+ },
775
+ "model.layers.14.ffn.switch_mlp.gate_proj": {
776
+ "group_size": 32,
777
+ "bits": 4
778
+ },
779
+ "model.layers.14.ffn.switch_mlp.up_proj": {
780
+ "group_size": 32,
781
+ "bits": 4
782
+ },
783
+ "model.layers.14.ffn.switch_mlp.down_proj": {
784
+ "group_size": 32,
785
+ "bits": 4
786
+ },
787
+ "model.layers.15.ffn.switch_mlp.gate_proj": {
788
+ "group_size": 32,
789
+ "bits": 4
790
+ },
791
+ "model.layers.15.ffn.switch_mlp.up_proj": {
792
+ "group_size": 32,
793
+ "bits": 4
794
+ },
795
+ "model.layers.15.ffn.switch_mlp.down_proj": {
796
+ "group_size": 32,
797
+ "bits": 4
798
+ },
799
+ "model.layers.16.ffn.switch_mlp.gate_proj": {
800
+ "group_size": 32,
801
+ "bits": 4
802
+ },
803
+ "model.layers.16.ffn.switch_mlp.up_proj": {
804
+ "group_size": 32,
805
+ "bits": 4
806
+ },
807
+ "model.layers.16.ffn.switch_mlp.down_proj": {
808
+ "group_size": 32,
809
+ "bits": 4
810
+ },
811
+ "model.layers.17.ffn.switch_mlp.gate_proj": {
812
+ "group_size": 32,
813
+ "bits": 4
814
+ },
815
+ "model.layers.17.ffn.switch_mlp.up_proj": {
816
+ "group_size": 32,
817
+ "bits": 4
818
+ },
819
+ "model.layers.17.ffn.switch_mlp.down_proj": {
820
+ "group_size": 32,
821
+ "bits": 4
822
+ },
823
+ "model.layers.18.ffn.switch_mlp.gate_proj": {
824
+ "group_size": 32,
825
+ "bits": 4
826
+ },
827
+ "model.layers.18.ffn.switch_mlp.up_proj": {
828
+ "group_size": 32,
829
+ "bits": 4
830
+ },
831
+ "model.layers.18.ffn.switch_mlp.down_proj": {
832
+ "group_size": 32,
833
+ "bits": 4
834
+ },
835
+ "model.layers.19.ffn.switch_mlp.gate_proj": {
836
+ "group_size": 32,
837
+ "bits": 4
838
+ },
839
+ "model.layers.19.ffn.switch_mlp.up_proj": {
840
+ "group_size": 32,
841
+ "bits": 4
842
+ },
843
+ "model.layers.19.ffn.switch_mlp.down_proj": {
844
+ "group_size": 32,
845
+ "bits": 4
846
+ },
847
+ "model.layers.20.ffn.switch_mlp.gate_proj": {
848
+ "group_size": 32,
849
+ "bits": 4
850
+ },
851
+ "model.layers.20.ffn.switch_mlp.up_proj": {
852
+ "group_size": 32,
853
+ "bits": 4
854
+ },
855
+ "model.layers.20.ffn.switch_mlp.down_proj": {
856
+ "group_size": 32,
857
+ "bits": 4
858
+ },
859
+ "model.layers.21.ffn.switch_mlp.gate_proj": {
860
+ "group_size": 32,
861
+ "bits": 4
862
+ },
863
+ "model.layers.21.ffn.switch_mlp.up_proj": {
864
+ "group_size": 32,
865
+ "bits": 4
866
+ },
867
+ "model.layers.21.ffn.switch_mlp.down_proj": {
868
+ "group_size": 32,
869
+ "bits": 4
870
+ },
871
+ "model.layers.22.ffn.switch_mlp.gate_proj": {
872
+ "group_size": 32,
873
+ "bits": 4
874
+ },
875
+ "model.layers.22.ffn.switch_mlp.up_proj": {
876
+ "group_size": 32,
877
+ "bits": 4
878
+ },
879
+ "model.layers.22.ffn.switch_mlp.down_proj": {
880
+ "group_size": 32,
881
+ "bits": 4
882
+ },
883
+ "model.layers.23.ffn.switch_mlp.gate_proj": {
884
+ "group_size": 32,
885
+ "bits": 4
886
+ },
887
+ "model.layers.23.ffn.switch_mlp.up_proj": {
888
+ "group_size": 32,
889
+ "bits": 4
890
+ },
891
+ "model.layers.23.ffn.switch_mlp.down_proj": {
892
+ "group_size": 32,
893
+ "bits": 4
894
+ },
895
+ "model.layers.24.ffn.switch_mlp.gate_proj": {
896
+ "group_size": 32,
897
+ "bits": 4
898
+ },
899
+ "model.layers.24.ffn.switch_mlp.up_proj": {
900
+ "group_size": 32,
901
+ "bits": 4
902
+ },
903
+ "model.layers.24.ffn.switch_mlp.down_proj": {
904
+ "group_size": 32,
905
+ "bits": 4
906
+ },
907
+ "model.layers.25.ffn.switch_mlp.gate_proj": {
908
+ "group_size": 32,
909
+ "bits": 4
910
+ },
911
+ "model.layers.25.ffn.switch_mlp.up_proj": {
912
+ "group_size": 32,
913
+ "bits": 4
914
+ },
915
+ "model.layers.25.ffn.switch_mlp.down_proj": {
916
+ "group_size": 32,
917
+ "bits": 4
918
+ },
919
+ "model.layers.26.ffn.switch_mlp.gate_proj": {
920
+ "group_size": 32,
921
+ "bits": 4
922
+ },
923
+ "model.layers.26.ffn.switch_mlp.up_proj": {
924
+ "group_size": 32,
925
+ "bits": 4
926
+ },
927
+ "model.layers.26.ffn.switch_mlp.down_proj": {
928
+ "group_size": 32,
929
+ "bits": 4
930
+ },
931
+ "model.layers.27.ffn.switch_mlp.gate_proj": {
932
+ "group_size": 32,
933
+ "bits": 4
934
+ },
935
+ "model.layers.27.ffn.switch_mlp.up_proj": {
936
+ "group_size": 32,
937
+ "bits": 4
938
+ },
939
+ "model.layers.27.ffn.switch_mlp.down_proj": {
940
+ "group_size": 32,
941
+ "bits": 4
942
+ },
943
+ "model.layers.28.ffn.switch_mlp.gate_proj": {
944
+ "group_size": 32,
945
+ "bits": 4
946
+ },
947
+ "model.layers.28.ffn.switch_mlp.up_proj": {
948
+ "group_size": 32,
949
+ "bits": 4
950
+ },
951
+ "model.layers.28.ffn.switch_mlp.down_proj": {
952
+ "group_size": 32,
953
+ "bits": 4
954
+ },
955
+ "model.layers.29.ffn.switch_mlp.gate_proj": {
956
+ "group_size": 32,
957
+ "bits": 4
958
+ },
959
+ "model.layers.29.ffn.switch_mlp.up_proj": {
960
+ "group_size": 32,
961
+ "bits": 4
962
+ },
963
+ "model.layers.29.ffn.switch_mlp.down_proj": {
964
+ "group_size": 32,
965
+ "bits": 4
966
+ },
967
+ "model.layers.30.ffn.switch_mlp.gate_proj": {
968
+ "group_size": 32,
969
+ "bits": 4
970
+ },
971
+ "model.layers.30.ffn.switch_mlp.up_proj": {
972
+ "group_size": 32,
973
+ "bits": 4
974
+ },
975
+ "model.layers.30.ffn.switch_mlp.down_proj": {
976
+ "group_size": 32,
977
+ "bits": 4
978
+ },
979
+ "model.layers.31.ffn.switch_mlp.gate_proj": {
980
+ "group_size": 32,
981
+ "bits": 4
982
+ },
983
+ "model.layers.31.ffn.switch_mlp.up_proj": {
984
+ "group_size": 32,
985
+ "bits": 4
986
+ },
987
+ "model.layers.31.ffn.switch_mlp.down_proj": {
988
+ "group_size": 32,
989
+ "bits": 4
990
+ },
+ "model.layers.32.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.32.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.32.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.33.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.33.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.33.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.34.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.34.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.34.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.35.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.35.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.35.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.36.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.36.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.36.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.37.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.37.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.37.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.38.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.38.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.38.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.39.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.39.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.39.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.40.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.40.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.40.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.41.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.41.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.41.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.42.ffn.switch_mlp.gate_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.42.ffn.switch_mlp.up_proj": {
+ "group_size": 32,
+ "bits": 4
+ },
+ "model.layers.42.ffn.switch_mlp.down_proj": {
+ "group_size": 32,
+ "bits": 4
+ }
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": {

  "original_max_position_embeddings": 65536,
  "type": "yarn"
  },
+ "rope_theta": 10000.0,
  "routed_scaling_factor": 1.5,
  "scoring_func": "sqrtsoftplus",
  "sliding_window": 128,
model-00001-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fef56d4a4dfbe23124b35ab58364423b42013fb2ee5937cf22bc36b4559c552b
+ size 4478654688
model-00002-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1c18825aa46b5c72382d8c1af673c5bcd007e22634a2cc6dabecbffd73ff866
+ size 5331061438
model-00003-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:665a1470a6e3fdc71bc77520a21f25056bcaff12f9be24121af8585705e5fc30
+ size 5316570854
model-00004-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c52b546ec7c0914b77a2aac12d2dc9f10706976f8570d66349770c6cbd6581d
+ size 4385332932
model-00005-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47f81e2aebbf14001a10d8f3ff82b73caa4459aa913b6b0bc4eb7a95ae1736b6
+ size 5316570901
model-00006-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:325f47003b727e51e6f7fd3abece5e60e941e6a2d4564c84f5aa9615cbb6c769
+ size 4333193053
model-00007-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a7c9db488d14e689f6b1e8e4ed726d3024fe90e2c947c08835342dac99c1248
+ size 5324857027
model-00008-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf22873f77046371a410e63b9012bc537249525e85fe4410b9833cb96ce99d50
+ size 5316570900
model-00009-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8aecbea8d37b641bda95a3760b9f3f8523481b9dde015443adbaf4cf628619d4
+ size 4385332980
model-00010-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7953f06831fce22a493edf3304cd8d77da6b5409c59b0828ef19f3ae9794f954
+ size 5316570957
model-00011-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:366876c173e6003245fab6824825559b11c3ebcfeeb65119b62d7a9af5829229
+ size 4333193153
model-00012-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d6201e1dc5abef1c5df9622ba5319ef44d123d7752cac684bf210479fc2c831
+ size 5324857101
model-00013-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f14c642519e6c042b2cf57c07087d044a4a44f15061889e2216d491dc75ba872
+ size 5316570956
model-00014-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a088dd92e2381dffb8606e7d9c3fffa05283e7b22e1b22464b99b817d104d172
+ size 4385333000
model-00015-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a1eb94c4e78a2bdcc6c7a86c46d8135fc5b521be42d2bea0f0a4cd9510184c4
+ size 5316570957
model-00016-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c04bf745b3d1aadd8fc47ab8fb70c1589b55ce3c47e72063980e04db7ab7b52
+ size 4333193165
model-00017-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0078ef7238ff727affa7c4ed8092766b2b1231969ae04076de8aea8f17afb049
+ size 5324857103
model-00018-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3471c74a7c1f5bddcc66b5913d3935b95b2f120e4b7e6863096d2866d44e28d3
+ size 5316570924
model-00019-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:382ec301b4be7bdc16e0234f7a144c946f8171bb6fcd5d35352f3ff758018947
+ size 4385332994
model-00020-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e7aefe87ce91213491c7dced136eb0b9aa18a884f428e50f5f66de8edf10a3f
+ size 5316570957
model-00021-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d8882b4837ecb9069535a3fbb0f5f86cf973de83833d8af7389a494d3ac5f904
+ size 4333193125
model-00022-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c22f242ecbdfd17d50992870209a1faea7e069f5e6629fc0a8b0d4988fcf137
+ size 5324857103
model-00023-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c8452eef7543f40181f1b76644e2f04be516ae6ea549feab36550dc38fcc6b9
+ size 5316570956
model-00024-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5ef8741f6487c56a89afc9d4c0f9c2b25a38d8cf8c45d2a97340b4500beef9f
+ size 4385332968
model-00025-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27b4cea1d20d31b0c56a007b7ec7509a7901ba1bf6080717456fd7290562bf7f
+ size 5316570951
model-00026-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3181ff0bfe73526da708471864022f62f7eb322701e94ba3fb6ab42371c0585e
+ size 4333193163
model-00027-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e30ed06b7ad2b2eebb22274c1be460b0b9924fefff55261fc1f8629a3b6bbc4e
+ size 5324857031
model-00028-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:956ad1192b8a17ffc7e3f0e112143c1cbda6fcb013bb96c47bb7dda5c8963ebc
+ size 5316570954
model-00029-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff72aa3b335a9d17db9b64af993c7f45db4edf07f715a0094376cd4a58deb6c1
+ size 4385332998
model-00030-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5275001c4459ec1c384039d32a5020f2d4989ec3fa92e7bb96a92a6b2a8f3d76
+ size 5316570957
model-00031-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:839788481e65983638cd79e018310d76055a83c2fb3a664fc486e0251d61210d
+ size 4333193147
model-00032-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bedc5b6b28f486a2fa47e999531a24c2609515dd12f1db6cffa7bbc6526ec9f
+ size 5324857063
model-00033-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2eb6adfb1dc5d9c0e05340aadb35118cbcd429449f14661a90eaef28d0240d1
+ size 5316570906
model-00034-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a829039a565e4e72b1c61a9dcf8906e0569fabcea89e5b7e112657ad64c260d8
+ size 4385333000
model-00035-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a8ecf35a3e7f5b3f8185cb19f682bba1587db608ae804f20985e42ba23c7838
+ size 5316570953
model-00036-of-00036.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee821e8707322b55b43ed13d033ad662687753f1310202dd7de192229baa0371
+ size 4566567283
model.safetensors.index.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,10 +1,14 @@
  {
  "backend": "tokenizers",
  "bos_token": "<|begin▁of▁sentence|>",
+ "clean_up_tokenization_spaces": false,
  "eos_token": "<|end▁of▁sentence|>",
+ "is_local": true,
+ "legacy": true,
+ "local_files_only": false,
  "model_max_length": 1048576,
  "pad_token": "<|end▁of▁sentence|>",
+ "sp_model_kwargs": {},
  "tokenizer_class": "TokenizersBackend",
- "trust_remote_code": false,
  "unk_token": null
  }