machiabeli committed on
Commit f43d934 (verified)
1 parent: 59986d5

Add files using upload-large-folder tool
README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ language:
+ - en
+ library_name: mlx
+ license: mit
+ pipeline_tag: text-generation
+ tags:
+ - mlx
+ - safetensors
+ - deepseek_v4
+ - 4-bit
+ base_model: deepseek-ai/DeepSeek-V4-Flash
+ base_model_relation: quantized
+ ---
+
+ # DeepSeek-V4-Flash-4bit (MLX)
+
+ 4-bit quantized MLX conversion of [`deepseek-ai/DeepSeek-V4-Flash`](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) for Apple Silicon.
+
+ 158B total parameters (~37B active), ~149 GiB (about 160 GB) on disk. Fits comfortably on a single M3/M5 Ultra with 256 GB or more of unified memory.
+
+ ## Requires the V4 mlx-lm port
+
+ DeepSeek-V4 introduces a new architecture (mHC, hash-routed MoE, `sqrtsoftplus` scoring, and a Compressor + Indexer for compressed KV) that is **not yet supported in stock mlx-lm**. To use this model, install the V4 port:
+
+ ```bash
+ git clone https://github.com/machiabeli/mlx-lm-1.git mlx-lm
+ cd mlx-lm && git checkout feat/deepseek-v4
+ pip install -e .
+ ```
+
+ Tracking PR: [`ml-explore/mlx-lm#1189`](https://github.com/ml-explore/mlx-lm/pull/1189). Once it is merged, a regular `pip install mlx-lm` should be enough.
+
+ ## Usage
+
+ ```python
+ from mlx_lm import load, generate
+
+ model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
+ out = generate(model, tokenizer, prompt="Q: What is 2+2?\nA:", max_tokens=64)
+ print(out)
+ ```
+
+ ## Performance
+
+ Measured on a single M3 Ultra (512 GB) node, batch size 1:
+
+ | Metric | Value |
+ |--------|-------|
+ | Prompt processing | 6.6 tok/s |
+ | Generation | **20.2 tok/s** |
+ | Peak RAM | 160 GB |
+
+ The generation figure was measured with the **fused Metal kernel for the mHC Sinkhorn normalization** added in PR #1189 (a 1.83x speedup over the Python reference implementation).
+
+ ## Source quality caveat
+
+ The BF16 source weights used for this conversion were upcast from DeepSeek's native FP8 release instead of being quantized directly from FP8. The weights therefore went through two conversions (FP8 -> BF16 -> Q4), which may produce slightly worse outputs than a direct FP8 -> Q4 conversion. A re-conversion from the native FP8 weights is planned.
+
+ ## Conversion
+
+ ```bash
+ mlx_lm.convert \
+   --hf-path deepseek-ai/DeepSeek-V4-Flash \
+   --mlx-path DeepSeek-V4-Flash-4bit \
+   -q --q-bits 4 --q-group-size 64
+ ```
+
+ Result: 4.506 bits per weight, split across 33 safetensors shards.
+
+ ## Architecture
+
+ V4 departs substantially from V3:
+
+ - **mHC (Manifold-constrained Hyper-Connections)**: replaces the standard residual connection with `hc_mult=4` parallel copies of the hidden state, recombined through a doubly stochastic mixing matrix normalized with Sinkhorn iterations.
+ - **Hash-routed MoE**: the first 3 layers (`num_hash_layers=3`) route each token through a fixed `tid2eid` table (token id -> expert id) instead of a learned router.
+ - **`sqrtsoftplus` scoring**: expert scores are computed as `sqrt(softplus(x))` instead of the sigmoid used in V3.
+ - **MLA with a single shared 512-dim KV head**: broadcast across all 64 query heads, without V3's `kv_lora_rank` up-projection step.
+ - **Compressor + Indexer** for compressed-KV attention with top-k sparse selection (the Indexer runs on layers with `compress_ratio=4`).
+ - **Per-head learnable `attn_sink`** term in the softmax denominator.
+
+ Full details: [DeepSeek V4 technical report](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf).
+
+ ## License
+
+ MIT (matches upstream).
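The Architecture section of the README says mHC recombines `hc_mult=4` hidden-state copies through a doubly stochastic matrix normalized with Sinkhorn iterations. The sketch below is a generic Sinkhorn-Knopp normalization in pure Python using `hc_mult=4`, `hc_sinkhorn_iters=20`, and `hc_eps=1e-6` from `config.json`. It assumes the mix logits are exponentiated and then alternately row- and column-normalized, and that `eps` guards each division; the exact parameterization in the V4 port (and in its fused Metal kernel) may differ. `sinkhorn` and `mix_streams` are hypothetical helper names.

```python
import math


def sinkhorn(logits, iters=20, eps=1e-6):
    """Turn a square matrix of logits into an approximately doubly stochastic matrix.

    Generic Sinkhorn-Knopp: exponentiate, then alternately normalize rows and
    columns. Where V4 applies `eps` is an assumption; here it guards each division.
    """
    n = len(logits)
    # Shifting every logit by the same constant rescales all entries equally,
    # which (up to eps) leaves the normalized result unchanged; it avoids overflow.
    peak = max(max(row) for row in logits)
    m = [[math.exp(x - peak) for x in row] for row in logits]
    for _ in range(iters):
        # Row step: divide each row by its sum.
        m = [[x / (sum(row) + eps) for x in row] for row in m]
        # Column step: divide each column by its sum.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / (col[j] + eps) for j in range(n)] for i in range(n)]
    return m


def mix_streams(streams, mix):
    """Recombine hidden-state copies: out[i] = sum_j mix[i][j] * streams[j]."""
    dim = len(streams[0])
    return [
        [sum(mix[i][j] * streams[j][d] for j in range(len(streams))) for d in range(dim)]
        for i in range(len(mix))
    ]


hc_mult = 4  # from config.json
# Deterministic toy logits in [-0.3, 0.3]; in the model these come from learned parameters.
logits = [[0.1 * ((3 * i + 5 * j) % 7) - 0.3 for j in range(hc_mult)] for i in range(hc_mult)]
mix = sinkhorn(logits, iters=20, eps=1e-6)  # hc_sinkhorn_iters, hc_eps from config.json

print("row sums:", [round(sum(row), 4) for row in mix])
print("col sums:", [round(sum(mix[i][j] for i in range(hc_mult)), 4) for j in range(hc_mult)])
```

Because every row and column sums to about 1, each output stream is a convex combination of the input streams, and the total across the four streams is preserved.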
config.json ADDED
@@ -0,0 +1,111 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "DeepseekV4ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 0,
8
+ "compress_ratios": [
9
+ 0,
10
+ 0,
11
+ 4,
12
+ 128,
13
+ 4,
14
+ 128,
15
+ 4,
16
+ 128,
17
+ 4,
18
+ 128,
19
+ 4,
20
+ 128,
21
+ 4,
22
+ 128,
23
+ 4,
24
+ 128,
25
+ 4,
26
+ 128,
27
+ 4,
28
+ 128,
29
+ 4,
30
+ 128,
31
+ 4,
32
+ 128,
33
+ 4,
34
+ 128,
35
+ 4,
36
+ 128,
37
+ 4,
38
+ 128,
39
+ 4,
40
+ 128,
41
+ 4,
42
+ 128,
43
+ 4,
44
+ 128,
45
+ 4,
46
+ 128,
47
+ 4,
48
+ 128,
49
+ 4,
50
+ 128,
51
+ 4,
52
+ 0
53
+ ],
+   "compress_rope_theta": 160000,
+   "eos_token_id": 1,
+   "hc_eps": 1e-06,
+   "hc_mult": 4,
+   "hc_sinkhorn_iters": 20,
+   "head_dim": 512,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "index_head_dim": 128,
+   "index_n_heads": 64,
+   "index_topk": 512,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 1048576,
+   "model_type": "deepseek_v4",
+   "moe_intermediate_size": 2048,
+   "n_routed_experts": 256,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 64,
+   "num_experts_per_tok": 6,
+   "num_hash_layers": 3,
+   "num_hidden_layers": 43,
+   "num_key_value_heads": 1,
+   "num_nextn_predict_layers": 1,
+   "o_groups": 8,
+   "o_lora_rank": 1024,
+   "q_lora_rank": 1024,
+   "qk_rope_head_dim": 64,
+   "quantization": {
+     "group_size": 64,
+     "bits": 4,
+     "mode": "affine"
+   },
+   "quantization_config": {
+     "group_size": 64,
+     "bits": 4,
+     "mode": "affine"
+   },
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": {
+     "beta_fast": 32,
+     "beta_slow": 1,
+     "factor": 16,
+     "original_max_position_embeddings": 65536,
+     "type": "yarn"
+   },
+   "rope_theta": 10000,
+   "routed_scaling_factor": 1.5,
+   "scoring_func": "sqrtsoftplus",
+   "sliding_window": 128,
+   "swiglu_limit": 10.0,
+   "tie_word_embeddings": false,
+   "topk_method": "noaux_tc",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.57.1",
+   "use_cache": true,
+   "vocab_size": 129280
+ }
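Several `config.json` fields above describe expert routing: `num_hash_layers=3`, `scoring_func="sqrtsoftplus"`, `num_experts_per_tok=6`, `n_routed_experts=256`, `norm_topk_prob=true`, and `routed_scaling_factor=1.5`. The pure-Python sketch below shows one plausible way they fit together, loosely modeled on DeepSeek-V3-style gating: hash layers take their experts from a `tid2eid` table, while the remaining layers score experts with `sqrt(softplus(x))`, keep the top 6, renormalize, and scale by 1.5. It is not the port's implementation. It skips the bias-corrected, group-limited selection implied by `topk_method="noaux_tc"`. It also assumes hash layers still weight their table-selected experts by the gate scores, and that renormalization comes before scaling. The `fake_tid2eid` table and all function names are hypothetical; the real table ships with the checkpoint.

```python
import math

# Values from config.json.
N_ROUTED_EXPERTS = 256   # n_routed_experts
TOP_K = 6                # num_experts_per_tok
NUM_HASH_LAYERS = 3      # num_hash_layers
ROUTED_SCALING = 1.5     # routed_scaling_factor


def sqrtsoftplus(x):
    # softplus(x) = log(1 + e^x), written in a numerically stable form.
    return math.sqrt(max(x, 0.0) + math.log1p(math.exp(-abs(x))))


def fake_tid2eid(vocab_size):
    # HYPOTHETICAL stand-in for the checkpoint's tid2eid table: a fixed,
    # deterministic map from token id to TOP_K distinct expert ids.
    return [[(7 * tid + 37 * r) % N_ROUTED_EXPERTS for r in range(TOP_K)]
            for tid in range(vocab_size)]


def route(layer_idx, token_id, gate_logits, tid2eid):
    """Return (expert_ids, weights) for one token at one MoE layer (sketch)."""
    scores = [sqrtsoftplus(x) for x in gate_logits]
    if layer_idx < NUM_HASH_LAYERS:
        # Hash layers: the table, not the scores, decides which experts fire.
        expert_ids = list(tid2eid[token_id])
    else:
        # Learned layers: plain top-k on the scores (no noaux_tc bias or grouping here).
        expert_ids = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:TOP_K]
    picked = [scores[e] for e in expert_ids]
    total = sum(picked)  # > 0 because sqrtsoftplus is strictly positive
    # norm_topk_prob=true: renormalize, then apply routed_scaling_factor (assumed order).
    weights = [ROUTED_SCALING * s / total for s in picked]
    return expert_ids, weights


table = fake_tid2eid(vocab_size=16)
gate_logits = [math.sin(0.37 * e) for e in range(N_ROUTED_EXPERTS)]
print(route(0, 5, gate_logits, table))   # hash layer
print(route(10, 5, gate_logits, table))  # learned-router layer
```

Since `sqrt(softplus(x))` is strictly increasing, top-k on the scores selects the same experts as top-k on the raw logits; the scoring function only changes the relative weights of the selected experts.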
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "do_sample": true,
+   "temperature": 1.0,
+   "top_p": 1.0,
+   "transformers_version": "4.46.3"
+ }
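`generation_config.json` enables sampling with `temperature=1.0` and `top_p=1.0`. To illustrate what those two settings do, here is a generic temperature plus nucleus (top-p) sampling sketch in pure Python. It is not mlx-lm's sampler, and `nucleus` and `sample_top_p` are hypothetical names. With `top_p=1.0` the nucleus is the entire vocabulary, so the defaults amount to plain sampling from the softmax.

```python
import math
import random


def nucleus(logits, temperature=1.0, top_p=1.0):
    """Return the (token_id, prob) pairs kept by temperature + top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # stable softmax numerator
    z = sum(exps)
    ranked = sorted(((i, e / z) for i, e in enumerate(exps)), key=lambda p: p[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        # Keep the smallest high-probability prefix whose mass reaches top_p;
        # top_p >= 1.0 keeps the whole vocabulary.
        if top_p < 1.0 and cum >= top_p:
            break
    mass = sum(p for _, p in kept)
    return [(tok, p / mass) for tok, p in kept]


def sample_top_p(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token id from the renormalized nucleus."""
    kept = nucleus(logits, temperature, top_p)
    return rng.choices([t for t, _ in kept], weights=[p for _, p in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -1.0]
print(nucleus(toy_logits, temperature=1.0, top_p=1.0))  # generation_config.json values
print(nucleus(toy_logits, temperature=1.0, top_p=0.5))
print(sample_top_p(toy_logits, rng=random.Random(0)))
```

Lowering `temperature` sharpens the distribution before the top-p cut, and lowering `top_p` truncates its tail.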
model-00001-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e7138a87342ae2954c877858a6a52ea92135576143b63ebdddbb1fa6bb59f39
+ size 5277248520
model-00002-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93ad57b7690068c50c9c43a4ebf7ffc9ccf312f07dfca2bcf57d0f251331d8c3
+ size 4928408324
model-00003-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb1e24556cba54e3185ea2c44d4634b6ef38c4ec9b4877c04b3e43611de41aec
+ size 5010486184
model-00004-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca045df175c585a0d755a32d3c5026a1df84e2413d57749fa925d86e4a723bd5
+ size 4913917832
model-00005-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9d6925a7e285105eadd368ce9c6ed3aa9cf2dc5ae35575915e63be9ea769e48
+ size 4922203959
model-00006-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52b8176ecf961fe5d930194abfebc34ba6ed8df3136d174c8fa81936b6668493
+ size 5004281917
model-00007-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10abd891ba26ef1441aebc64acc590ab4af2475a63c903a85a264715c1b00892
+ size 4913917836
model-00008-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b53574c5b215102c7560ec8c2cf7ed50f078ca846835f3cc68eea2c0a5455634
+ size 4922203942
model-00009-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f94875d768916c4a7bb0ff271f1ed99853da86d1eeacecc6c98f44676dc147f
+ size 5004281981
model-00010-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e74f6ccaec1c3e3ffa358eb1a8a209dc94229f9a320aa3b96c1a1b0f22602298
+ size 4913917897
model-00011-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8516e2abe1f2328f7859333fcdef317af1115638aed21833891cc99530eb8c31
+ size 4922204036
model-00012-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8dc326495f409ec525ff0df85d025ab943e05a9ca66f988fbaa7e18d7b823bb
+ size 5004281979
model-00013-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:079b0734664a77fdc39b6cd9d4acb27c637cf393a572b1adcd2ba6918de96771
+ size 4913917893
model-00014-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0aa5907b79f2d31fcbc50154262f477b7720ce0c8f1d7b9323ac1e32eb75feb
+ size 4922204030
model-00015-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7303b1894e40f0dd72dc71013190bab797938247d53f8051be273218d15c7c30
+ size 5004281947
model-00016-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9e3d0a26fe0945d830e4a68128a53e65ccdd32a34b34a7e8d735585af66c620
+ size 4913917893
model-00017-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9068d7739647067073b6e95ebe9806c90f3deb61d2d7362af07aa6885861f694
+ size 4922203990
model-00018-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4372244a76f1ef2747143abaab07d1b693a14857c4dfa20b4943d0bb969b5b8d
+ size 5004281977
model-00019-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bac395f3b40e9cb3c5f4759c650be7e559a32bdaf971bbde51514999b37f94ea
+ size 4913917891
model-00020-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06a3f9bf84561dd8f96ecb46bd03b731e320de74e0b7204b629f263edc56bf4b
+ size 4922203988
model-00021-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f5094800827c830146af680068a76e9b48c32b99e94e0e57350c45b2a49d1b2
+ size 5004282031
model-00022-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbde3d2c82a7da1faa0cf30a9ff56d65f784a4de3b6ea950d3b23b8ca7722ce8
+ size 4913917893
model-00023-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecd773744c124acd18e3653d6c5ea57caac0dbfa2355444b4e679468b0ae881c
+ size 4922204032
model-00024-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bc578510fb5d2467e60da73dadc9f5fcc3781e8fdc273362a51bf9379140dde
+ size 5004281971
model-00025-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:512d3a13b02f38331ddd4653787df159004d574599396fed9cd76bedc903944c
+ size 4913917839
model-00026-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83f3f5cd8b754fc673db1ab5e1fad083824e51e83fb9ec9f2c01ad743231bef9
+ size 4922204028
model-00027-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f0d8f816dd4570fa770564deceaa0152ea2795c0e7e5dcb946215f38c007c67
+ size 5004281961
model-00028-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5554bc6a602d14ea94a0e007179e01132ae8255d72eca517836cd99852b3a1e5
+ size 4913917891
model-00029-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e7bc7b0b3585b65ec4e41a1808f522c76b6724929eda8d9ef7de0dc44a445dfa
+ size 4922204028
model-00030-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac621260f9f5ecd28d91b925026a5e4466f94a553191e8908d0397abbcd6802a
+ size 5004281963
model-00031-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd1f70ade91d09284a3fd0dd3d29e46cc64f97c558a6d0a91fb966dc80672fda
+ size 4913917839
model-00032-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0105f55a39e251f4e178631a64805b89a551e30c5265a31e832ec111c838e87
+ size 4922204036
model-00033-of-00033.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dfdaa9af3024a428aefba89d3627aed748344df61ac19ca0e41b41f69b610bf2
+ size 1523920642
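The README quotes about 149 GB on disk and 4.506 bits per weight. The sketch below sums the `size` fields of the 33 LFS pointer files above to check the disk figure, which matches 149 when read as GiB (2^30 bytes). It also computes the nominal cost of 4-bit affine quantization with `group_size=64`. It assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias, which is how MLX's affine mode stores them for 16-bit models. One plausible reason the reported 4.506 sits slightly above the nominal 4.5 is that some small tensors are kept unquantized, but the source does not break this down. `nominal_bpw` is a hypothetical helper.

```python
# Sizes in bytes, copied from the 33 git-lfs pointer files in this commit.
SHARD_SIZES = [
    5277248520, 4928408324, 5010486184, 4913917832, 4922203959,
    5004281917, 4913917836, 4922203942, 5004281981, 4913917897,
    4922204036, 5004281979, 4913917893, 4922204030, 5004281947,
    4913917893, 4922203990, 5004281977, 4913917891, 4922203988,
    5004282031, 4913917893, 4922204032, 5004281971, 4913917839,
    4922204028, 5004281961, 4913917891, 4922204028, 5004281963,
    4913917839, 4922204036, 1523920642,
]


def nominal_bpw(bits, group_size, scale_bits=16, bias_bits=16):
    # Affine quantization stores `bits` per weight plus one scale and one
    # bias per group of `group_size` weights (assumed 16-bit each).
    return bits + (scale_bits + bias_bits) / group_size


total_bytes = sum(SHARD_SIZES)
print(f"{len(SHARD_SIZES)} shards, {total_bytes} bytes")
print(f"{total_bytes / 1e9:.1f} GB (decimal), {total_bytes / 2**30:.1f} GiB")
print(f"nominal bits per weight for 4-bit, group 64: {nominal_bpw(4, 64)}")
```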
model.safetensors.index.json ADDED
The diff for this file is too large to render.
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "backend": "tokenizers",
+   "bos_token": "<|begin▁of▁sentence|>",
+   "eos_token": "<|end▁of▁sentence|>",
+   "model_max_length": 1048576,
+   "pad_token": "<|end▁of▁sentence|>",
+   "tokenizer_class": "TokenizersBackend",
+   "trust_remote_code": false,
+   "unk_token": null
+ }