anemll committed
Commit 3aa5c60 · verified · 1 parent: 0cbbb98

Add files using upload-large-folder tool

.cache/.DS_Store ADDED
Binary file (6.15 kB).
 
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ dense/model-dense.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: mit
+ base_model:
+ - deepseek-ai/DeepSeek-V4-Flash
+ library_name: llama.cpp
+ pipeline_tag: text-generation
+ tags:
+ - gguf
+ - deepseek-v4
+ - deepseek-v4-flash
+ - flash-moe
+ - slot-bank
+ - ssd
+ - fp8
+ - fp4
+ - mxfp4
+ - metal
+ ---
+
+ # DeepSeek V4 Flash FP4/FP8 SSD Flash-MoE Package
+
+ This repository contains an SSD Flash-MoE package for DeepSeek V4 Flash.
+ It is intended for runtimes that can load a dense GGUF plus a routed expert
+ sidecar.
+
+ ## Quantization
+
+ - Dense/shared tensors: native DeepSeek FP8, represented as `F8_E4M3_B128` in GGUF.
+ - Routed MoE expert tensors: native DeepSeek FP4, represented as `MXFP4` in the sidecar manifest.
+ - Embeddings, output, norms, routing metadata, and IDs may remain `BF16`, `F32`, or `I32` where appropriate.
+
+ The routed expert tensors are not stored in the dense GGUF. They are stored in
+ the sidecar as layer-major binary banks.
+
+ ## Files
+
+ ```text
+ dense/
+   model-dense.gguf
+   flashmoe-package.json
+
+ sidecar/
+   manifest.json
+   layer_000.bin
+   ...
+   layer_042.bin
+ ```
+
+ ## Model Details
+
+ - Architecture: `deepseek4`
+ - Blocks: `43`
+ - Experts: `256`
+ - Active experts per token: `6`
+ - Context length: `1048576`
+ - Dense GGUF tensors: `1199`
+ - Routed expert sidecar entries: `129`
+
+ ## Example
+
+ ```bash
+ ./build/bin/llama-cli \
+   -m dense/model-dense.gguf \
+   --moe-mode slot-bank \
+   --moe-sidecar sidecar \
+   --moe-slot-bank 96 \
+   --moe-topk 6 \
+   -ngl 999 \
+   --moe-cache-io-split 4 \
+   -c 8192 \
+   -b 128 \
+   -ub 1 \
+   --no-warmup \
+   -p "What is Apple Neural Engine?" \
+   -n 256
+ ```
+
+ This package is not a standalone dense-only GGUF. Use a Flash-MoE aware
+ llama.cpp build that supports DeepSeek V4 Flash, slot-bank mode, and the
+ native FP8/FP4 tensor types.
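The file layout in the README's "Files" section can be checked before pointing a runtime at the package. The following is a minimal Python sketch and is not part of this commit: the helper names `expected_files` and `missing_files` are hypothetical, and the zero-padded `layer_NNN.bin` naming is taken from the README listing and assumed to hold for every block.

```python
from pathlib import Path


def expected_files(block_count: int = 43) -> list[str]:
    """Relative paths the README lists for a package with `block_count` blocks."""
    files = [
        "dense/model-dense.gguf",
        "dense/flashmoe-package.json",
        "sidecar/manifest.json",
    ]
    # One routed-expert bank per block: layer_000.bin .. layer_042.bin for 43 blocks.
    files += [f"sidecar/layer_{i:03d}.bin" for i in range(block_count)]
    return files


def missing_files(root: str, block_count: int = 43) -> list[str]:
    """Return the expected paths that do not exist as files under `root`."""
    base = Path(root)
    return [rel for rel in expected_files(block_count) if not (base / rel).is_file()]
```

Calling `missing_files(".")` from the repository root should return an empty list once every LFS object has been downloaded. The check only looks at file presence, not at sizes or checksums.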
dense/flashmoe-package.json ADDED
@@ -0,0 +1,856 @@
+ {
+   "schema_version": 1,
+   "kind": "flashmoe_dense_package",
+   "generated_at": "2026-04-29T00:24:26.809130+00:00",
+   "source": {
+     "model_files": [
+       "/Volumes/SN8100/DS/DeepSeek-V4-Flash-FP4-FP8-native.gguf"
+     ],
+     "sidecar_manifest": "/Volumes/SN8100/DS/DeepSeek-V4-Flash-FP4-FP8-native-sidecar/manifest.json",
+     "layer_filter": null
+   },
+   "model": {
+     "arch": "deepseek4",
+     "name": "DeepSeek V4 Flash",
+     "block_count": 43,
+     "leading_dense_block_count": 0,
+     "expert_count": 256,
+     "expert_used_count": 6,
+     "context_length": 1048576,
+     "embedding_length": 4096,
+     "file_type": 41
+   },
+   "dense_model": {
+     "path": "model-dense.gguf",
+     "tensor_count": 1199,
+     "total_bytes": 8973123932
+   },
+   "routed_removed": {
+     "tensor_count": 129,
+     "total_bytes": 147169738752,
+     "summary": {
+       "tensor_count": 129,
+       "total_bytes": 147169738752,
+       "by_family": {
+         "ffn_gate_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         },
+         "ffn_up_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         },
+         "ffn_down_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         }
+       },
51
+ "by_layer": {
52
+ "0": {
53
+ "tensors": 3,
54
+ "bytes": 3422552064,
55
+ "families": [
56
+ "ffn_gate_exps",
57
+ "ffn_up_exps",
58
+ "ffn_down_exps"
59
+ ]
60
+ },
61
+ "1": {
62
+ "tensors": 3,
63
+ "bytes": 3422552064,
64
+ "families": [
65
+ "ffn_gate_exps",
66
+ "ffn_up_exps",
67
+ "ffn_down_exps"
68
+ ]
69
+ },
70
+ "2": {
71
+ "tensors": 3,
72
+ "bytes": 3422552064,
73
+ "families": [
74
+ "ffn_gate_exps",
75
+ "ffn_up_exps",
76
+ "ffn_down_exps"
77
+ ]
78
+ },
79
+ "3": {
80
+ "tensors": 3,
81
+ "bytes": 3422552064,
82
+ "families": [
83
+ "ffn_gate_exps",
84
+ "ffn_up_exps",
85
+ "ffn_down_exps"
86
+ ]
87
+ },
88
+ "4": {
89
+ "tensors": 3,
90
+ "bytes": 3422552064,
91
+ "families": [
92
+ "ffn_gate_exps",
93
+ "ffn_up_exps",
94
+ "ffn_down_exps"
95
+ ]
96
+ },
97
+ "5": {
98
+ "tensors": 3,
99
+ "bytes": 3422552064,
100
+ "families": [
101
+ "ffn_gate_exps",
102
+ "ffn_up_exps",
103
+ "ffn_down_exps"
104
+ ]
105
+ },
106
+ "6": {
107
+ "tensors": 3,
108
+ "bytes": 3422552064,
109
+ "families": [
110
+ "ffn_gate_exps",
111
+ "ffn_up_exps",
112
+ "ffn_down_exps"
113
+ ]
114
+ },
115
+ "7": {
116
+ "tensors": 3,
117
+ "bytes": 3422552064,
118
+ "families": [
119
+ "ffn_gate_exps",
120
+ "ffn_up_exps",
121
+ "ffn_down_exps"
122
+ ]
123
+ },
124
+ "8": {
125
+ "tensors": 3,
126
+ "bytes": 3422552064,
127
+ "families": [
128
+ "ffn_gate_exps",
129
+ "ffn_up_exps",
130
+ "ffn_down_exps"
131
+ ]
132
+ },
133
+ "9": {
134
+ "tensors": 3,
135
+ "bytes": 3422552064,
136
+ "families": [
137
+ "ffn_gate_exps",
138
+ "ffn_up_exps",
139
+ "ffn_down_exps"
140
+ ]
141
+ },
142
+ "10": {
143
+ "tensors": 3,
144
+ "bytes": 3422552064,
145
+ "families": [
146
+ "ffn_gate_exps",
147
+ "ffn_up_exps",
148
+ "ffn_down_exps"
149
+ ]
150
+ },
151
+ "11": {
152
+ "tensors": 3,
153
+ "bytes": 3422552064,
154
+ "families": [
155
+ "ffn_gate_exps",
156
+ "ffn_up_exps",
157
+ "ffn_down_exps"
158
+ ]
159
+ },
160
+ "12": {
161
+ "tensors": 3,
162
+ "bytes": 3422552064,
163
+ "families": [
164
+ "ffn_gate_exps",
165
+ "ffn_up_exps",
166
+ "ffn_down_exps"
167
+ ]
168
+ },
169
+ "13": {
170
+ "tensors": 3,
171
+ "bytes": 3422552064,
172
+ "families": [
173
+ "ffn_gate_exps",
174
+ "ffn_up_exps",
175
+ "ffn_down_exps"
176
+ ]
177
+ },
178
+ "14": {
179
+ "tensors": 3,
180
+ "bytes": 3422552064,
181
+ "families": [
182
+ "ffn_gate_exps",
183
+ "ffn_up_exps",
184
+ "ffn_down_exps"
185
+ ]
186
+ },
187
+ "15": {
188
+ "tensors": 3,
189
+ "bytes": 3422552064,
190
+ "families": [
191
+ "ffn_gate_exps",
192
+ "ffn_up_exps",
193
+ "ffn_down_exps"
194
+ ]
195
+ },
196
+ "16": {
197
+ "tensors": 3,
198
+ "bytes": 3422552064,
199
+ "families": [
200
+ "ffn_gate_exps",
201
+ "ffn_up_exps",
202
+ "ffn_down_exps"
203
+ ]
204
+ },
205
+ "17": {
206
+ "tensors": 3,
207
+ "bytes": 3422552064,
208
+ "families": [
209
+ "ffn_gate_exps",
210
+ "ffn_up_exps",
211
+ "ffn_down_exps"
212
+ ]
213
+ },
214
+ "18": {
215
+ "tensors": 3,
216
+ "bytes": 3422552064,
217
+ "families": [
218
+ "ffn_gate_exps",
219
+ "ffn_up_exps",
220
+ "ffn_down_exps"
221
+ ]
222
+ },
223
+ "19": {
224
+ "tensors": 3,
225
+ "bytes": 3422552064,
226
+ "families": [
227
+ "ffn_gate_exps",
228
+ "ffn_up_exps",
229
+ "ffn_down_exps"
230
+ ]
231
+ },
232
+ "20": {
233
+ "tensors": 3,
234
+ "bytes": 3422552064,
235
+ "families": [
236
+ "ffn_gate_exps",
237
+ "ffn_up_exps",
238
+ "ffn_down_exps"
239
+ ]
240
+ },
241
+ "21": {
242
+ "tensors": 3,
243
+ "bytes": 3422552064,
244
+ "families": [
245
+ "ffn_gate_exps",
246
+ "ffn_up_exps",
247
+ "ffn_down_exps"
248
+ ]
249
+ },
250
+ "22": {
251
+ "tensors": 3,
252
+ "bytes": 3422552064,
253
+ "families": [
254
+ "ffn_gate_exps",
255
+ "ffn_up_exps",
256
+ "ffn_down_exps"
257
+ ]
258
+ },
259
+ "23": {
260
+ "tensors": 3,
261
+ "bytes": 3422552064,
262
+ "families": [
263
+ "ffn_gate_exps",
264
+ "ffn_up_exps",
265
+ "ffn_down_exps"
266
+ ]
267
+ },
268
+ "24": {
269
+ "tensors": 3,
270
+ "bytes": 3422552064,
271
+ "families": [
272
+ "ffn_gate_exps",
273
+ "ffn_up_exps",
274
+ "ffn_down_exps"
275
+ ]
276
+ },
277
+ "25": {
278
+ "tensors": 3,
279
+ "bytes": 3422552064,
280
+ "families": [
281
+ "ffn_gate_exps",
282
+ "ffn_up_exps",
283
+ "ffn_down_exps"
284
+ ]
285
+ },
286
+ "26": {
287
+ "tensors": 3,
288
+ "bytes": 3422552064,
289
+ "families": [
290
+ "ffn_gate_exps",
291
+ "ffn_up_exps",
292
+ "ffn_down_exps"
293
+ ]
294
+ },
295
+ "27": {
296
+ "tensors": 3,
297
+ "bytes": 3422552064,
298
+ "families": [
299
+ "ffn_gate_exps",
300
+ "ffn_up_exps",
301
+ "ffn_down_exps"
302
+ ]
303
+ },
304
+ "28": {
305
+ "tensors": 3,
306
+ "bytes": 3422552064,
307
+ "families": [
308
+ "ffn_gate_exps",
309
+ "ffn_up_exps",
310
+ "ffn_down_exps"
311
+ ]
312
+ },
313
+ "29": {
314
+ "tensors": 3,
315
+ "bytes": 3422552064,
316
+ "families": [
317
+ "ffn_gate_exps",
318
+ "ffn_up_exps",
319
+ "ffn_down_exps"
320
+ ]
321
+ },
322
+ "30": {
323
+ "tensors": 3,
324
+ "bytes": 3422552064,
325
+ "families": [
326
+ "ffn_gate_exps",
327
+ "ffn_up_exps",
328
+ "ffn_down_exps"
329
+ ]
330
+ },
331
+ "31": {
332
+ "tensors": 3,
333
+ "bytes": 3422552064,
334
+ "families": [
335
+ "ffn_gate_exps",
336
+ "ffn_up_exps",
337
+ "ffn_down_exps"
338
+ ]
339
+ },
340
+ "32": {
341
+ "tensors": 3,
342
+ "bytes": 3422552064,
343
+ "families": [
344
+ "ffn_gate_exps",
345
+ "ffn_up_exps",
346
+ "ffn_down_exps"
347
+ ]
348
+ },
349
+ "33": {
350
+ "tensors": 3,
351
+ "bytes": 3422552064,
352
+ "families": [
353
+ "ffn_gate_exps",
354
+ "ffn_up_exps",
355
+ "ffn_down_exps"
356
+ ]
357
+ },
358
+ "34": {
359
+ "tensors": 3,
360
+ "bytes": 3422552064,
361
+ "families": [
362
+ "ffn_gate_exps",
363
+ "ffn_up_exps",
364
+ "ffn_down_exps"
365
+ ]
366
+ },
367
+ "35": {
368
+ "tensors": 3,
369
+ "bytes": 3422552064,
370
+ "families": [
371
+ "ffn_gate_exps",
372
+ "ffn_up_exps",
373
+ "ffn_down_exps"
374
+ ]
375
+ },
376
+ "36": {
377
+ "tensors": 3,
378
+ "bytes": 3422552064,
379
+ "families": [
380
+ "ffn_gate_exps",
381
+ "ffn_up_exps",
382
+ "ffn_down_exps"
383
+ ]
384
+ },
385
+ "37": {
386
+ "tensors": 3,
387
+ "bytes": 3422552064,
388
+ "families": [
389
+ "ffn_gate_exps",
390
+ "ffn_up_exps",
391
+ "ffn_down_exps"
392
+ ]
393
+ },
394
+ "38": {
395
+ "tensors": 3,
396
+ "bytes": 3422552064,
397
+ "families": [
398
+ "ffn_gate_exps",
399
+ "ffn_up_exps",
400
+ "ffn_down_exps"
401
+ ]
402
+ },
403
+ "39": {
404
+ "tensors": 3,
405
+ "bytes": 3422552064,
406
+ "families": [
407
+ "ffn_gate_exps",
408
+ "ffn_up_exps",
409
+ "ffn_down_exps"
410
+ ]
411
+ },
412
+ "40": {
413
+ "tensors": 3,
414
+ "bytes": 3422552064,
415
+ "families": [
416
+ "ffn_gate_exps",
417
+ "ffn_up_exps",
418
+ "ffn_down_exps"
419
+ ]
420
+ },
421
+ "41": {
422
+ "tensors": 3,
423
+ "bytes": 3422552064,
424
+ "families": [
425
+ "ffn_gate_exps",
426
+ "ffn_up_exps",
427
+ "ffn_down_exps"
428
+ ]
429
+ },
430
+ "42": {
431
+ "tensors": 3,
432
+ "bytes": 3422552064,
433
+ "families": [
434
+ "ffn_gate_exps",
435
+ "ffn_up_exps",
436
+ "ffn_down_exps"
437
+ ]
438
+ }
439
+ }
440
+ }
441
+ },
+   "sidecar": {
+     "path": "/Volumes/SN8100/DS/DeepSeek-V4-Flash-FP4-FP8-native-sidecar",
+     "summary": {
+       "tensor_count": 129,
+       "total_bytes": 147169738752,
+       "by_family": {
+         "ffn_gate_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         },
+         "ffn_up_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         },
+         "ffn_down_exps": {
+           "tensors": 43,
+           "bytes": 49056579584,
+           "layers": 43
+         }
+       },
464
+ "by_layer": {
465
+ "0": {
466
+ "tensors": 3,
467
+ "bytes": 3422552064,
468
+ "families": [
469
+ "ffn_gate_exps",
470
+ "ffn_up_exps",
471
+ "ffn_down_exps"
472
+ ]
473
+ },
474
+ "1": {
475
+ "tensors": 3,
476
+ "bytes": 3422552064,
477
+ "families": [
478
+ "ffn_gate_exps",
479
+ "ffn_up_exps",
480
+ "ffn_down_exps"
481
+ ]
482
+ },
483
+ "2": {
484
+ "tensors": 3,
485
+ "bytes": 3422552064,
486
+ "families": [
487
+ "ffn_gate_exps",
488
+ "ffn_up_exps",
489
+ "ffn_down_exps"
490
+ ]
491
+ },
492
+ "3": {
493
+ "tensors": 3,
494
+ "bytes": 3422552064,
495
+ "families": [
496
+ "ffn_gate_exps",
497
+ "ffn_up_exps",
498
+ "ffn_down_exps"
499
+ ]
500
+ },
501
+ "4": {
502
+ "tensors": 3,
503
+ "bytes": 3422552064,
504
+ "families": [
505
+ "ffn_gate_exps",
506
+ "ffn_up_exps",
507
+ "ffn_down_exps"
508
+ ]
509
+ },
510
+ "5": {
511
+ "tensors": 3,
512
+ "bytes": 3422552064,
513
+ "families": [
514
+ "ffn_gate_exps",
515
+ "ffn_up_exps",
516
+ "ffn_down_exps"
517
+ ]
518
+ },
519
+ "6": {
520
+ "tensors": 3,
521
+ "bytes": 3422552064,
522
+ "families": [
523
+ "ffn_gate_exps",
524
+ "ffn_up_exps",
525
+ "ffn_down_exps"
526
+ ]
527
+ },
528
+ "7": {
529
+ "tensors": 3,
530
+ "bytes": 3422552064,
531
+ "families": [
532
+ "ffn_gate_exps",
533
+ "ffn_up_exps",
534
+ "ffn_down_exps"
535
+ ]
536
+ },
537
+ "8": {
538
+ "tensors": 3,
539
+ "bytes": 3422552064,
540
+ "families": [
541
+ "ffn_gate_exps",
542
+ "ffn_up_exps",
543
+ "ffn_down_exps"
544
+ ]
545
+ },
546
+ "9": {
547
+ "tensors": 3,
548
+ "bytes": 3422552064,
549
+ "families": [
550
+ "ffn_gate_exps",
551
+ "ffn_up_exps",
552
+ "ffn_down_exps"
553
+ ]
554
+ },
555
+ "10": {
556
+ "tensors": 3,
557
+ "bytes": 3422552064,
558
+ "families": [
559
+ "ffn_gate_exps",
560
+ "ffn_up_exps",
561
+ "ffn_down_exps"
562
+ ]
563
+ },
564
+ "11": {
565
+ "tensors": 3,
566
+ "bytes": 3422552064,
567
+ "families": [
568
+ "ffn_gate_exps",
569
+ "ffn_up_exps",
570
+ "ffn_down_exps"
571
+ ]
572
+ },
573
+ "12": {
574
+ "tensors": 3,
575
+ "bytes": 3422552064,
576
+ "families": [
577
+ "ffn_gate_exps",
578
+ "ffn_up_exps",
579
+ "ffn_down_exps"
580
+ ]
581
+ },
582
+ "13": {
583
+ "tensors": 3,
584
+ "bytes": 3422552064,
585
+ "families": [
586
+ "ffn_gate_exps",
587
+ "ffn_up_exps",
588
+ "ffn_down_exps"
589
+ ]
590
+ },
591
+ "14": {
592
+ "tensors": 3,
593
+ "bytes": 3422552064,
594
+ "families": [
595
+ "ffn_gate_exps",
596
+ "ffn_up_exps",
597
+ "ffn_down_exps"
598
+ ]
599
+ },
600
+ "15": {
601
+ "tensors": 3,
602
+ "bytes": 3422552064,
603
+ "families": [
604
+ "ffn_gate_exps",
605
+ "ffn_up_exps",
606
+ "ffn_down_exps"
607
+ ]
608
+ },
609
+ "16": {
610
+ "tensors": 3,
611
+ "bytes": 3422552064,
612
+ "families": [
613
+ "ffn_gate_exps",
614
+ "ffn_up_exps",
615
+ "ffn_down_exps"
616
+ ]
617
+ },
618
+ "17": {
619
+ "tensors": 3,
620
+ "bytes": 3422552064,
621
+ "families": [
622
+ "ffn_gate_exps",
623
+ "ffn_up_exps",
624
+ "ffn_down_exps"
625
+ ]
626
+ },
627
+ "18": {
628
+ "tensors": 3,
629
+ "bytes": 3422552064,
630
+ "families": [
631
+ "ffn_gate_exps",
632
+ "ffn_up_exps",
633
+ "ffn_down_exps"
634
+ ]
635
+ },
636
+ "19": {
637
+ "tensors": 3,
638
+ "bytes": 3422552064,
639
+ "families": [
640
+ "ffn_gate_exps",
641
+ "ffn_up_exps",
642
+ "ffn_down_exps"
643
+ ]
644
+ },
645
+ "20": {
646
+ "tensors": 3,
647
+ "bytes": 3422552064,
648
+ "families": [
649
+ "ffn_gate_exps",
650
+ "ffn_up_exps",
651
+ "ffn_down_exps"
652
+ ]
653
+ },
654
+ "21": {
655
+ "tensors": 3,
656
+ "bytes": 3422552064,
657
+ "families": [
658
+ "ffn_gate_exps",
659
+ "ffn_up_exps",
660
+ "ffn_down_exps"
661
+ ]
662
+ },
663
+ "22": {
664
+ "tensors": 3,
665
+ "bytes": 3422552064,
666
+ "families": [
667
+ "ffn_gate_exps",
668
+ "ffn_up_exps",
669
+ "ffn_down_exps"
670
+ ]
671
+ },
672
+ "23": {
673
+ "tensors": 3,
674
+ "bytes": 3422552064,
675
+ "families": [
676
+ "ffn_gate_exps",
677
+ "ffn_up_exps",
678
+ "ffn_down_exps"
679
+ ]
680
+ },
681
+ "24": {
682
+ "tensors": 3,
683
+ "bytes": 3422552064,
684
+ "families": [
685
+ "ffn_gate_exps",
686
+ "ffn_up_exps",
687
+ "ffn_down_exps"
688
+ ]
689
+ },
690
+ "25": {
691
+ "tensors": 3,
692
+ "bytes": 3422552064,
693
+ "families": [
694
+ "ffn_gate_exps",
695
+ "ffn_up_exps",
696
+ "ffn_down_exps"
697
+ ]
698
+ },
699
+ "26": {
700
+ "tensors": 3,
701
+ "bytes": 3422552064,
702
+ "families": [
703
+ "ffn_gate_exps",
704
+ "ffn_up_exps",
705
+ "ffn_down_exps"
706
+ ]
707
+ },
708
+ "27": {
709
+ "tensors": 3,
710
+ "bytes": 3422552064,
711
+ "families": [
712
+ "ffn_gate_exps",
713
+ "ffn_up_exps",
714
+ "ffn_down_exps"
715
+ ]
716
+ },
717
+ "28": {
718
+ "tensors": 3,
719
+ "bytes": 3422552064,
720
+ "families": [
721
+ "ffn_gate_exps",
722
+ "ffn_up_exps",
723
+ "ffn_down_exps"
724
+ ]
725
+ },
726
+ "29": {
727
+ "tensors": 3,
728
+ "bytes": 3422552064,
729
+ "families": [
730
+ "ffn_gate_exps",
731
+ "ffn_up_exps",
732
+ "ffn_down_exps"
733
+ ]
734
+ },
735
+ "30": {
736
+ "tensors": 3,
737
+ "bytes": 3422552064,
738
+ "families": [
739
+ "ffn_gate_exps",
740
+ "ffn_up_exps",
741
+ "ffn_down_exps"
742
+ ]
743
+ },
744
+ "31": {
745
+ "tensors": 3,
746
+ "bytes": 3422552064,
747
+ "families": [
748
+ "ffn_gate_exps",
749
+ "ffn_up_exps",
750
+ "ffn_down_exps"
751
+ ]
752
+ },
753
+ "32": {
754
+ "tensors": 3,
755
+ "bytes": 3422552064,
756
+ "families": [
757
+ "ffn_gate_exps",
758
+ "ffn_up_exps",
759
+ "ffn_down_exps"
760
+ ]
761
+ },
762
+ "33": {
763
+ "tensors": 3,
764
+ "bytes": 3422552064,
765
+ "families": [
766
+ "ffn_gate_exps",
767
+ "ffn_up_exps",
768
+ "ffn_down_exps"
769
+ ]
770
+ },
771
+ "34": {
772
+ "tensors": 3,
773
+ "bytes": 3422552064,
774
+ "families": [
775
+ "ffn_gate_exps",
776
+ "ffn_up_exps",
777
+ "ffn_down_exps"
778
+ ]
779
+ },
780
+ "35": {
781
+ "tensors": 3,
782
+ "bytes": 3422552064,
783
+ "families": [
784
+ "ffn_gate_exps",
785
+ "ffn_up_exps",
786
+ "ffn_down_exps"
787
+ ]
788
+ },
789
+ "36": {
790
+ "tensors": 3,
791
+ "bytes": 3422552064,
792
+ "families": [
793
+ "ffn_gate_exps",
794
+ "ffn_up_exps",
795
+ "ffn_down_exps"
796
+ ]
797
+ },
798
+ "37": {
799
+ "tensors": 3,
800
+ "bytes": 3422552064,
801
+ "families": [
802
+ "ffn_gate_exps",
803
+ "ffn_up_exps",
804
+ "ffn_down_exps"
805
+ ]
806
+ },
807
+ "38": {
808
+ "tensors": 3,
809
+ "bytes": 3422552064,
810
+ "families": [
811
+ "ffn_gate_exps",
812
+ "ffn_up_exps",
813
+ "ffn_down_exps"
814
+ ]
815
+ },
816
+ "39": {
817
+ "tensors": 3,
818
+ "bytes": 3422552064,
819
+ "families": [
820
+ "ffn_gate_exps",
821
+ "ffn_up_exps",
822
+ "ffn_down_exps"
823
+ ]
824
+ },
825
+ "40": {
826
+ "tensors": 3,
827
+ "bytes": 3422552064,
828
+ "families": [
829
+ "ffn_gate_exps",
830
+ "ffn_up_exps",
831
+ "ffn_down_exps"
832
+ ]
833
+ },
834
+ "41": {
835
+ "tensors": 3,
836
+ "bytes": 3422552064,
837
+ "families": [
838
+ "ffn_gate_exps",
839
+ "ffn_up_exps",
840
+ "ffn_down_exps"
841
+ ]
842
+ },
843
+ "42": {
844
+ "tensors": 3,
845
+ "bytes": 3422552064,
846
+ "families": [
847
+ "ffn_gate_exps",
848
+ "ffn_up_exps",
849
+ "ffn_down_exps"
850
+ ]
851
+ }
852
+ }
853
+ }
854
+ },
855
+ "runtime_hint": null
856
+ }
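The totals in this manifest are redundant by design: `total_bytes` should equal both the sum over `by_family` and the sum over `by_layer`. A minimal Python sketch of that cross-check follows. The function name `check_summary` is hypothetical, and the sample `summary` is rebuilt from the values shown in the manifest above rather than read from disk.

```python
def check_summary(summary: dict) -> list[str]:
    """Return human-readable inconsistencies; an empty list means the totals agree."""
    problems = []
    total = summary["total_bytes"]

    family_bytes = sum(f["bytes"] for f in summary["by_family"].values())
    if family_bytes != total:
        problems.append(f"by_family sums to {family_bytes}, expected {total}")

    layer_bytes = sum(layer["bytes"] for layer in summary["by_layer"].values())
    if layer_bytes != total:
        problems.append(f"by_layer sums to {layer_bytes}, expected {total}")

    tensor_total = sum(layer["tensors"] for layer in summary["by_layer"].values())
    if tensor_total != summary["tensor_count"]:
        problems.append(
            f"by_layer lists {tensor_total} tensors, expected {summary['tensor_count']}"
        )
    return problems


# Rebuild the summary from the numbers in the manifest: 3 families, 43 layers.
FAMILIES = ["ffn_gate_exps", "ffn_up_exps", "ffn_down_exps"]
summary = {
    "tensor_count": 129,
    "total_bytes": 147169738752,
    "by_family": {
        name: {"tensors": 43, "bytes": 49056579584, "layers": 43} for name in FAMILIES
    },
    "by_layer": {
        str(i): {"tensors": 3, "bytes": 3422552064, "families": FAMILIES}
        for i in range(43)
    },
}

print(check_summary(summary))  # prints []
```

With a real checkout, the same function could be applied to `json.load(...)["routed_removed"]["summary"]` and `["sidecar"]["summary"]` from `dense/flashmoe-package.json`.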
dense/model-dense.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64c719dac6eb3036a909c94f94d3650c3211448e6a5603fea6d21de48fc1a6bf
+ size 8978441472
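The three lines above are a Git LFS pointer, not the model itself: they record the SHA-256 digest and byte size of the real file. A minimal Python sketch for checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical. The pointer's `size` (8978441472) is larger than `dense_model.total_bytes` in the manifest (8973123932); presumably the manifest figure counts tensor data only, but that reading is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False  # cheap check first, before hashing several gigabytes
    h = hashlib.sha256()
    with file.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:64c719dac6eb3036a909c94f94d3650c3211448e6a5603fea6d21de48fc1a6bf
size 8978441472
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

The same two helpers apply unchanged to the `sidecar/layer_NNN.bin` pointers later in this commit.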
flashmoe-package.yaml ADDED
@@ -0,0 +1,43 @@
+ schema_version: 1
+ kind: flashmoe_ssd_package
+
+ model:
+   name: "DeepSeek V4 Flash"
+   architecture: "deepseek4"
+   license: "mit"
+   block_count: 43
+   context_length: 1048576
+   expert_count: 256
+   expert_used_count: 6
+   leading_dense_block_count: 0
+
+ quantization:
+   dense:
+     native_format: "FP8"
+     gguf_type: "F8_E4M3_B128"
+     notes:
+       - "Dense/shared matrix tensors use native DeepSeek FP8 where quantized."
+       - "Embeddings, output, norms, routing metadata, and IDs may remain BF16, F32, or I32."
+   experts:
+     native_format: "FP4"
+     gguf_type: "MXFP4"
+     block_size: 32
+     notes:
+       - "Routed MoE experts use native DeepSeek FP4/MXFP4."
+       - "Expert tensors are stored outside the dense GGUF in the sidecar layer banks."
+
+ layout:
+   dense_gguf: "dense/model-dense.gguf"
+   dense_manifest: "dense/flashmoe-package.json"
+   sidecar_manifest: "sidecar/manifest.json"
+   sidecar_layers: "sidecar/layer_000.bin..sidecar/layer_042.bin"
+   sidecar_layout: "layer_major_whole_tensor"
+   include_shared_in_sidecar: false
+
+ runtime:
+   mode: "slot-bank"
+   storage: "ssd"
+   intended_runtime: "Flash-MoE aware llama.cpp"
+   recommended_topk: 6
+   example_slot_bank: 96
+   example_cache_io_split: 4
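The `block_size: 32` entry allows a rough sanity check of the sidecar bank sizes. The sketch below is a back-of-the-envelope calculation in Python, not something stated by the package. It assumes that each MXFP4 block packs 32 four-bit values into 16 bytes plus one shared 8-bit scale (17 bytes per block), and that each layer bank holds exactly 3 tensors times 256 experts with no header or padding.

```python
BLOCK_ELEMS = 32           # from `block_size: 32` in the YAML above
BLOCK_BYTES = 32 // 2 + 1  # ASSUMPTION: 16 bytes of packed 4-bit values + 1 scale byte

LAYER_BANK_BYTES = 3422552064  # size reported for each sidecar/layer_NNN.bin
TENSORS_PER_LAYER = 3          # ffn_gate_exps, ffn_up_exps, ffn_down_exps
EXPERT_COUNT = 256


def elements_per_expert_tensor(layer_bytes: int = LAYER_BANK_BYTES) -> int:
    """Weights in one expert's slice of one routed tensor, under the assumptions above."""
    per_expert_bytes, rem = divmod(layer_bytes, TENSORS_PER_LAYER * EXPERT_COUNT)
    if rem:
        raise ValueError("layer bank does not divide evenly across tensors and experts")
    blocks, rem = divmod(per_expert_bytes, BLOCK_BYTES)
    if rem:
        raise ValueError("per-expert size is not a whole number of 17-byte blocks")
    return blocks * BLOCK_ELEMS


elems = elements_per_expert_tensor()
print(elems, elems // 4096)  # prints 8388608 2048
```

Both divisions come out exact, and the element count divides evenly by the manifest's `embedding_length` of 4096. That is consistent with a 4096 by 2048 matrix per expert tensor, although the expert hidden width is inferred here and not stated anywhere in the commit.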
sidecar/layer_000.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e131055089d2daeda2e11e48498411708b6758d5992a2c22a59bcb793977d0f1
+ size 3422552064
sidecar/layer_001.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:079d078d23ea4ad966be1c4827212aaf90b62d7e95d6a149eb69ca00e65ec3c8
+ size 3422552064
sidecar/layer_002.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac41ab89041f8125472b84a754ab8ca3ff18c3ed9d3f5b33f7b5bb500be31d30
+ size 3422552064
sidecar/layer_003.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc0f2864e1b33846fe8e7da436f234b5327f3bbe7930db2f249e68befd6aa0a9
+ size 3422552064
sidecar/layer_004.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:132d9e662d85883290f2a4ee865db4e7694e8d14440668a3c86880475de4fa29
+ size 3422552064
sidecar/layer_005.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16d518a1023377fc51d45c9af261be934a5a802876c9fedfdf4d3bf34c18db39
+ size 3422552064
sidecar/layer_006.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:82dd5db8d50e52c4710e17c5ba9c666fc0e8da4f3bfeab659eeb6a1803c5fc3d
+ size 3422552064
sidecar/layer_007.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:273b62c84a6430eb0df95f2170b0a929db81c33ea25271f39dc51608ba5a1f7e
+ size 3422552064
sidecar/layer_008.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef57c4a82302f66b7aab37fd16cb843959bd26e568653d70662bdcc350c1f024
+ size 3422552064
sidecar/layer_009.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:577875719de5b2149df0b88487b8dd5b965049287c14efed941e450ce6133a01
+ size 3422552064
sidecar/layer_010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf4097c9eaf0e4080916ad660aa8e1b8395550783a8b911f9a9221b451647d9d
+ size 3422552064
sidecar/layer_011.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8cae1312bfccdd4be54ec875f7579ab84a0e7ad8e865b43a444f38db29c28dd
+ size 3422552064
sidecar/layer_012.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89da09c3ee919131e3cc83f5905f970be4e763924a4c2a61bf40b74aefbcf70b
+ size 3422552064
sidecar/layer_013.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d28bc8ff1f9557611ae250cc5e75a65a6e75681a1a191a2060ecf449b7e87d58
+ size 3422552064
sidecar/layer_014.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fc5b2db27e9e26f82b29f90d00a2456b545af921c4e08409c92b7cda9d2d2b7
+ size 3422552064
sidecar/layer_015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:958aae06421b7e8c9c3a4a02ae0a0ec09e5c75eb9a84387bbf960af00d6fb903
+ size 3422552064
sidecar/layer_016.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:263bb6b46af16cba62798922e27b09b674ad4646f94b0cd5f498ee0d8ae96e86
+ size 3422552064
sidecar/layer_017.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42ccc7f90f93f85d7112d37535ee75a56ddf1a93356bdefe37b2449c64d6366a
+ size 3422552064
sidecar/layer_018.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4189ddda6274fc718e2fd2d4900c9638aab43c0a03facfd39d69e224f35ec003
+ size 3422552064
sidecar/layer_019.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7894df7f4f5092d918fc14fe54dbd0d86b840b63727f8f461c76a19f75dd284
+ size 3422552064
sidecar/layer_020.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff6636744033c3e2240afc25c70e50a17c8fc9993c6014a470fea8a010022dc0
+ size 3422552064
sidecar/layer_021.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2555431ef7a8a66b0907331ed9366645e6c747ac63a46e564182eec56b00dbab
+ size 3422552064
sidecar/layer_022.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe20e4201f41ceaf98dfe306beee3b942754926f743dd0740297319ad9448754
+ size 3422552064
sidecar/layer_023.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79886ff1898b6927fce4ddc16993ee727f693043fc2c3913bc2e76f88b267374
+ size 3422552064
sidecar/layer_024.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6feaa9d3900f2436d6c42f5a131515eccde1c8d2b7ce69383a7fa5a1a32f28bb
+ size 3422552064
sidecar/layer_025.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e52ffb3abe0c92bba238b736878656a0126f5fe3dd6a11b2cade29ebc0a3f28f
+ size 3422552064
sidecar/layer_026.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29c3b9075a9126c67f50dadffdf628d4395334787b9dc4f443532a4fd49a499f
+ size 3422552064
sidecar/layer_027.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d707bc943089a0d37b98b8445abf719a78d536d1171d6dd1e3ba9bc66491b88
+ size 3422552064
sidecar/layer_028.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ac56a3bbe130c6ec5ec1f0d1db527b3aebda98f0882fc41e1b846636a42b4b8
+ size 3422552064
sidecar/layer_029.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b1da567d02b34c159de2873da9b9f8c8ddf6b255766fb93007b20fe59f69f41
+ size 3422552064
sidecar/layer_030.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19d4deae38f01e3ca9b59d2a32bc1b6c9193cf4a96fcf4049592169e1e6603c0
+ size 3422552064
sidecar/layer_031.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:196fe6000bf0b0fc6324ea953292070ff92d40b0629d2cdba210ec73f628d532
+ size 3422552064
sidecar/layer_032.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a3a22c4a3cfcc2561e7803300123363c947ece97742e8c538107d48aa88c55c
+ size 3422552064
sidecar/layer_033.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b49f1bba18cdd94ea3f67c63f69ac397286490b6d7edad9386a342833d09d78e
+ size 3422552064
sidecar/layer_034.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84ef9659acff4d2b0cd03cb86c8f4c59fb6a8dae8174b0feb54b320cf3fc4c85
+ size 3422552064
sidecar/layer_035.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5191257c911cdfa1b62a14092696978c71be350deed5c24ea598bdb85da68fc1
+ size 3422552064
sidecar/layer_036.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ea17989494e885e301f66957d34eed8a04fa47c44a96d9c5a24d0aa87ef2d28
+ size 3422552064
sidecar/layer_037.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4dacdbf5bac8587c89274a1bfa458663be77e739e8c4e6141c0ac8862c84c818
+ size 3422552064
sidecar/layer_038.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75cdedb380e9078435bfd46ba61e7607ec948c55bb4bfe44f4d2627ad469bfef
+ size 3422552064
sidecar/layer_039.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ea00fde2ccbb9cf4bff73a503a61b083e93b84643e770bca8658367399da024
+ size 3422552064
sidecar/layer_040.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5974286dd4b3bb8946e809ffde53da12baaf9b3e0610be475dbdbc81eb0d68fa
+ size 3422552064
sidecar/layer_041.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54dfea7bac34ac119d6f35b86f47060a719b72ea0ff2892c5fe7b1dd74854515
+ size 3422552064
sidecar/layer_042.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19eb5c437d0486568b2b7b763780dc77729bd4a9bce1cc8252800268e1cbf674
+ size 3422552064
sidecar/manifest.json ADDED
The diff for this file is too large to render.