machiabeli committed 30 days ago

Commit

f43d934

verified

1 Parent(s): 59986d5

Add files using upload-large-folder tool

Files changed (39)

README.md +86 -0
config.json +111 -0
generation_config.json +9 -0
model-00001-of-00033.safetensors +3 -0
model-00002-of-00033.safetensors +3 -0
model-00003-of-00033.safetensors +3 -0
model-00004-of-00033.safetensors +3 -0
model-00005-of-00033.safetensors +3 -0
model-00006-of-00033.safetensors +3 -0
model-00007-of-00033.safetensors +3 -0
model-00008-of-00033.safetensors +3 -0
model-00009-of-00033.safetensors +3 -0
model-00010-of-00033.safetensors +3 -0
model-00011-of-00033.safetensors +3 -0
model-00012-of-00033.safetensors +3 -0
model-00013-of-00033.safetensors +3 -0
model-00014-of-00033.safetensors +3 -0
model-00015-of-00033.safetensors +3 -0
model-00016-of-00033.safetensors +3 -0
model-00017-of-00033.safetensors +3 -0
model-00018-of-00033.safetensors +3 -0
model-00019-of-00033.safetensors +3 -0
model-00020-of-00033.safetensors +3 -0
model-00021-of-00033.safetensors +3 -0
model-00022-of-00033.safetensors +3 -0
model-00023-of-00033.safetensors +3 -0
model-00024-of-00033.safetensors +3 -0
model-00025-of-00033.safetensors +3 -0
model-00026-of-00033.safetensors +3 -0
model-00027-of-00033.safetensors +3 -0
model-00028-of-00033.safetensors +3 -0
model-00029-of-00033.safetensors +3 -0
model-00030-of-00033.safetensors +3 -0
model-00031-of-00033.safetensors +3 -0
model-00032-of-00033.safetensors +3 -0
model-00033-of-00033.safetensors +3 -0
model.safetensors.index.json +0 -0
tokenizer.json +0 -0
tokenizer_config.json +10 -0

README.md ADDED

	@@ -0,0 +1,86 @@

+---
+language:
+- en
+library_name: mlx
+license: mit
+pipeline_tag: text-generation
+tags:
+- mlx
+- safetensors
+- deepseek_v4
+- 4-bit
+base_model: deepseek-ai/DeepSeek-V4-Flash
+base_model_relation: quantized
+---
+# DeepSeek-V4-Flash-4bit (MLX)
+4-bit quantized MLX port of [`deepseek-ai/DeepSeek-V4-Flash`](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) for Apple Silicon.
+158B total params (~37B active), about 149 GiB (160 GB) on disk; fits on a single M3/M5 Ultra with 256 GB or more of unified memory.
+## Requires the V4 mlx-lm port
+DeepSeek-V4 is a new architecture (mHC, hash-routed MoE, sqrtsoftplus, Compressor + Indexer for compressed KV) and is **not yet in stock mlx-lm**. To use this model you need the V4 port:
+```bash
+git clone https://github.com/machiabeli/mlx-lm-1.git mlx-lm
+cd mlx-lm && git checkout feat/deepseek-v4
+pip install -e .
+```
+Tracking PR: [`ml-explore/mlx-lm#1189`](https://github.com/ml-explore/mlx-lm/pull/1189). Once merged, `pip install mlx-lm` will work.
+## Usage
+```python
+from mlx_lm import load, generate
+model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
+out = generate(model, tokenizer, prompt="Q: What is 2+2?\nA:", max_tokens=64)
+print(out)
+```
+## Performance
+Measured on a single M3 Ultra (512 GB) node at batch size 1:
+| Metric | Value |
+|--------|-------|
+| Prompt processing | 6.6 tok/s |
+| Generation | **20.2 tok/s** |
+| Peak RAM | 160 GB |
+Generation throughput includes the **fused Metal kernel for mHC Sinkhorn** added in PR #1189 (1.83x over the Python reference).
+## Source quality caveat
+The bf16 source weights used for this conversion were upcast from DeepSeek's native FP8 release; the 4-bit weights were not quantized directly from FP8. This stacks two conversion passes (FP8 -> BF16 -> Q4) and may produce slightly worse outputs than a direct FP8 -> Q4 conversion. A re-conversion from native FP8 is planned.
+## Conversion
+```bash
+mlx_lm.convert \
+  --hf-path deepseek-ai/DeepSeek-V4-Flash \
+  --mlx-path DeepSeek-V4-Flash-4bit \
+  -q --q-bits 4 --q-group-size 64
+```
+Result: 4.506 bits per weight, 33 sharded safetensors.
+## Architecture
+V4 is a substantial step from V3:
+- **mHC (Manifold-constrained Hyper-Connections)**: replaces residual connections with `hc_mult=4` parallel hidden-state copies, recombined via a doubly stochastic, Sinkhorn-normalized mix matrix.
+- **Hash-routed MoE**: the first 3 layers use a deterministic `tid2eid` table (token id -> expert id) instead of learned routing.
+- **`sqrtsoftplus` scoring**: expert scores use `sqrt(softplus(x))` instead of softmax.
+- **MLA with a single shared 512-dim KV head**: broadcast across 64 query heads, with no `kv_lora_rank` up-projection step (unlike V3).
+- **Compressor + Indexer**: compressed-KV attention with top-k sparse selection (Indexer at `compress_ratio=4`).
+- **Per-head learnable `attn_sink`** in softmax denominator.
+Full details: [DeepSeek V4 technical report](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf).
+## License
+MIT (matches upstream).
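
The README describes the mHC mix matrix as doubly stochastic and Sinkhorn-normalized. As a rough illustration of what that normalization does, here is a minimal pure-Python sketch that alternately rescales the rows and columns of a positive matrix. The function name `sinkhorn`, the choice to exponentiate raw logits, and the example values are assumptions made for illustration; this is not code from the V4 port, and the fused Metal kernel may be organized differently. The defaults mirror `hc_mult=4`, `hc_sinkhorn_iters=20` and `hc_eps=1e-06` from config.json.

```python
import math

def sinkhorn(logits, iters=20, eps=1e-6):
    """Alternately normalize rows and columns of exp(logits)."""
    # Exponentiate so every entry is strictly positive (assumed parameterization).
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Row step: scale each row so it sums to (almost exactly) 1.
        m = [[v / (sum(row) + eps) for v in row] for row in m]
        # Column step: scale each column so it sums to (almost exactly) 1.
        col_sums = [sum(m[i][j] for i in range(n)) + eps for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m

# Hypothetical 4x4 logits: one row and one column per hidden-state copy.
logits = [
    [0.5, -1.0, 0.2, 0.0],
    [1.5, 0.3, -0.7, 0.1],
    [-0.2, 0.8, 0.4, -1.1],
    [0.0, 0.0, 1.0, 0.6],
]
mix = sinkhorn(logits)
row_sums = [sum(row) for row in mix]
col_sums = [sum(mix[i][j] for i in range(4)) for j in range(4)]
print(row_sums, col_sums)
```

After enough iterations both the row sums and the column sums approach 1, which is the doubly stochastic property the README refers to.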

config.json ADDED

	@@ -0,0 +1,111 @@

+{
+    "architectures": [
+        "DeepseekV4ForCausalLM"
+    ],
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "bos_token_id": 0,
+    "compress_ratios": [
+        0,
+        0,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        128,
+        4,
+        0
+    ],
+    "compress_rope_theta": 160000,
+    "eos_token_id": 1,
+    "hc_eps": 1e-06,
+    "hc_mult": 4,
+    "hc_sinkhorn_iters": 20,
+    "head_dim": 512,
+    "hidden_act": "silu",
+    "hidden_size": 4096,
+    "index_head_dim": 128,
+    "index_n_heads": 64,
+    "index_topk": 512,
+    "initializer_range": 0.02,
+    "max_position_embeddings": 1048576,
+    "model_type": "deepseek_v4",
+    "moe_intermediate_size": 2048,
+    "n_routed_experts": 256,
+    "n_shared_experts": 1,
+    "norm_topk_prob": true,
+    "num_attention_heads": 64,
+    "num_experts_per_tok": 6,
+    "num_hash_layers": 3,
+    "num_hidden_layers": 43,
+    "num_key_value_heads": 1,
+    "num_nextn_predict_layers": 1,
+    "o_groups": 8,
+    "o_lora_rank": 1024,
+    "q_lora_rank": 1024,
+    "qk_rope_head_dim": 64,
+    "quantization": {
+        "group_size": 64,
+        "bits": 4,
+        "mode": "affine"
+    },
+    "quantization_config": {
+        "group_size": 64,
+        "bits": 4,
+        "mode": "affine"
+    },
+    "rms_norm_eps": 1e-06,
+    "rope_scaling": {
+        "beta_fast": 32,
+        "beta_slow": 1,
+        "factor": 16,
+        "original_max_position_embeddings": 65536,
+        "type": "yarn"
+    },
+    "rope_theta": 10000,
+    "routed_scaling_factor": 1.5,
+    "scoring_func": "sqrtsoftplus",
+    "sliding_window": 128,
+    "swiglu_limit": 10.0,
+    "tie_word_embeddings": false,
+    "topk_method": "noaux_tc",
+    "torch_dtype": "bfloat16",
+    "transformers_version": "4.57.1",
+    "use_cache": true,
+    "vocab_size": 129280
+}
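
The routing fields in this config (`scoring_func`, `num_experts_per_tok`, `norm_topk_prob`, `routed_scaling_factor`) suggest a score-then-select scheme. The sketch below shows one plausible reading in plain Python: score each expert with `sqrt(softplus(x))`, keep the top 6, normalize the kept scores, then scale by 1.5. The helper names `sqrtsoftplus` and `route` are hypothetical, the order of normalization and scaling is an assumption, and any bias correction implied by `topk_method: noaux_tc` is left out.

```python
import math

def sqrtsoftplus(x):
    # softplus(x) = log(1 + exp(x)), written in a numerically stable form.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return math.sqrt(softplus)

def route(router_logits, top_k=6, norm_topk_prob=True, routed_scaling_factor=1.5):
    """Return (expert_id, weight) pairs for one token."""
    scores = [sqrtsoftplus(v) for v in router_logits]
    # Pick the top_k experts by score, highest first.
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    weights = [scores[e] for e in chosen]
    if norm_topk_prob:
        # Normalize the selected scores so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    return [(e, w * routed_scaling_factor) for e, w in zip(chosen, weights)]

# Hypothetical router logits for 8 experts (the config lists 256 routed experts).
logits = [0.3, -2.0, 1.7, 0.0, 2.4, -0.5, 0.9, 1.1]
picked = route(logits)
print(picked)
```

Unlike softmax, `sqrtsoftplus` scores each expert independently of the others, so normalization over the selected experts is what makes the final weights comparable across tokens.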

generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 1,
+  "do_sample": true,
+  "temperature": 1.0,
+  "top_p": 1.0,
+  "transformers_version": "4.46.3"
+}
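
With `do_sample` enabled, `temperature: 1.0` and `top_p: 1.0` leave the model's output distribution unchanged. The following plain-Python sketch of temperature scaling and nucleus (top-p) filtering shows why; the helper names `softmax` and `top_p_filter` are hypothetical, and this is not how mlx-lm or transformers implement sampling internally.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=1.0):
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    kept_set = set(kept)
    return [p / mass if i in kept_set else 0.0 for i, p in enumerate(probs)]

probs = softmax([2.0, 1.0, 0.5, -1.0], temperature=1.0)
full = top_p_filter(probs, top_p=1.0)     # every token stays eligible
nucleus = top_p_filter(probs, top_p=0.8)  # the low-probability tail is dropped
print(full)
print(nucleus)
```

Lowering `top_p` or `temperature` at generation time trades diversity for determinism; the shipped defaults apply neither restriction.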

model-00001-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e7138a87342ae2954c877858a6a52ea92135576143b63ebdddbb1fa6bb59f39
+size 5277248520

model-00002-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93ad57b7690068c50c9c43a4ebf7ffc9ccf312f07dfca2bcf57d0f251331d8c3
+size 4928408324

model-00003-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb1e24556cba54e3185ea2c44d4634b6ef38c4ec9b4877c04b3e43611de41aec
+size 5010486184

model-00004-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca045df175c585a0d755a32d3c5026a1df84e2413d57749fa925d86e4a723bd5
+size 4913917832

model-00005-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9d6925a7e285105eadd368ce9c6ed3aa9cf2dc5ae35575915e63be9ea769e48
+size 4922203959

model-00006-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52b8176ecf961fe5d930194abfebc34ba6ed8df3136d174c8fa81936b6668493
+size 5004281917

model-00007-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10abd891ba26ef1441aebc64acc590ab4af2475a63c903a85a264715c1b00892
+size 4913917836

model-00008-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b53574c5b215102c7560ec8c2cf7ed50f078ca846835f3cc68eea2c0a5455634
+size 4922203942

model-00009-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f94875d768916c4a7bb0ff271f1ed99853da86d1eeacecc6c98f44676dc147f
+size 5004281981

model-00010-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e74f6ccaec1c3e3ffa358eb1a8a209dc94229f9a320aa3b96c1a1b0f22602298
+size 4913917897

model-00011-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8516e2abe1f2328f7859333fcdef317af1115638aed21833891cc99530eb8c31
+size 4922204036

model-00012-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a8dc326495f409ec525ff0df85d025ab943e05a9ca66f988fbaa7e18d7b823bb
+size 5004281979

model-00013-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:079b0734664a77fdc39b6cd9d4acb27c637cf393a572b1adcd2ba6918de96771
+size 4913917893

model-00014-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0aa5907b79f2d31fcbc50154262f477b7720ce0c8f1d7b9323ac1e32eb75feb
+size 4922204030

model-00015-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7303b1894e40f0dd72dc71013190bab797938247d53f8051be273218d15c7c30
+size 5004281947

model-00016-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9e3d0a26fe0945d830e4a68128a53e65ccdd32a34b34a7e8d735585af66c620
+size 4913917893

model-00017-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9068d7739647067073b6e95ebe9806c90f3deb61d2d7362af07aa6885861f694
+size 4922203990

model-00018-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4372244a76f1ef2747143abaab07d1b693a14857c4dfa20b4943d0bb969b5b8d
+size 5004281977

model-00019-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bac395f3b40e9cb3c5f4759c650be7e559a32bdaf971bbde51514999b37f94ea
+size 4913917891

model-00020-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:06a3f9bf84561dd8f96ecb46bd03b731e320de74e0b7204b629f263edc56bf4b
+size 4922203988

model-00021-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f5094800827c830146af680068a76e9b48c32b99e94e0e57350c45b2a49d1b2
+size 5004282031

model-00022-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dbde3d2c82a7da1faa0cf30a9ff56d65f784a4de3b6ea950d3b23b8ca7722ce8
+size 4913917893

model-00023-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecd773744c124acd18e3653d6c5ea57caac0dbfa2355444b4e679468b0ae881c
+size 4922204032

model-00024-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3bc578510fb5d2467e60da73dadc9f5fcc3781e8fdc273362a51bf9379140dde
+size 5004281971

model-00025-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:512d3a13b02f38331ddd4653787df159004d574599396fed9cd76bedc903944c
+size 4913917839

model-00026-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:83f3f5cd8b754fc673db1ab5e1fad083824e51e83fb9ec9f2c01ad743231bef9
+size 4922204028

model-00027-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2f0d8f816dd4570fa770564deceaa0152ea2795c0e7e5dcb946215f38c007c67
+size 5004281961

model-00028-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5554bc6a602d14ea94a0e007179e01132ae8255d72eca517836cd99852b3a1e5
+size 4913917891

model-00029-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e7bc7b0b3585b65ec4e41a1808f522c76b6724929eda8d9ef7de0dc44a445dfa
+size 4922204028

model-00030-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ac621260f9f5ecd28d91b925026a5e4466f94a553191e8908d0397abbcd6802a
+size 5004281963

model-00031-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cd1f70ade91d09284a3fd0dd3d29e46cc64f97c558a6d0a91fb966dc80672fda
+size 4913917839

model-00032-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0105f55a39e251f4e178631a64805b89a551e30c5265a31e832ec111c838e87
+size 4922204036

model-00033-of-00033.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dfdaa9af3024a428aefba89d3627aed748344df61ac19ca0e41b41f69b610bf2
+size 1523920642
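
Each shard above is committed as a Git LFS pointer (a `version` line, a `sha256` oid, and a byte `size`), so the on-disk footprint can be recomputed from the pointers alone. The sketch below parses one pointer and sums the 33 `size` values copied by hand from this diff. The helper name `parse_lfs_pointer` is hypothetical, and the weight-count estimate assumes the 4.506 bits-per-weight figure from the README applies to every byte in the shards, ignoring safetensors headers.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, sha256 and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].split(":", 1)[1],
        "size": int(fields["size"]),
    }

# Pointer for the last shard, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:dfdaa9af3024a428aefba89d3627aed748344df61ac19ca0e41b41f69b610bf2
size 1523920642
"""

# Byte sizes of shards 1..33, in order, copied from the pointers above.
shard_sizes = [
    5277248520, 4928408324, 5010486184,
    4913917832, 4922203959, 5004281917,
    4913917836, 4922203942, 5004281981,
    4913917897, 4922204036, 5004281979,
    4913917893, 4922204030, 5004281947,
    4913917893, 4922203990, 5004281977,
    4913917891, 4922203988, 5004282031,
    4913917893, 4922204032, 5004281971,
    4913917839, 4922204028, 5004281961,
    4913917891, 4922204028, 5004281963,
    4913917839, 4922204036, 1523920642,
]

total = sum(shard_sizes)
print(f"{total / 1e9:.1f} GB, {total / 2**30:.1f} GiB")  # 160.1 GB, 149.1 GiB

# Rough weight count implied by the README's 4.506 bits per weight (assumption).
implied_weights = total * 8 / 4.506
print(f"{implied_weights / 1e9:.0f}B weights")
```

The total works out to about 160.1 GB, or 149.1 GiB, which lines up with both the disk size and the peak RAM quoted in the README. At 4.506 bits per weight it implies roughly 284 billion weights, noticeably more than the 158B total quoted there, so one of those figures may be off.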

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<｜begin▁of▁sentence｜>",
+  "eos_token": "<｜end▁of▁sentence｜>",
+  "model_max_length": 1048576,
+  "pad_token": "<｜end▁of▁sentence｜>",
+  "tokenizer_class": "TokenizersBackend",
+  "trust_remote_code": false,
+  "unk_token": null
+}
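
The tokenizer's `model_max_length` can be cross-checked against the YaRN settings in config.json with simple arithmetic. The sketch below copies the relevant values by hand and loads nothing from disk. It also counts the `compress_ratios` entries; reading the extra entry as belonging to the next-token-prediction layer is an assumption, and the source does not confirm it.

```python
# Values copied by hand from config.json and tokenizer_config.json in this diff.
rope_scaling = {"type": "yarn", "factor": 16, "original_max_position_embeddings": 65536}
max_position_embeddings = 1048576
model_max_length = 1048576
compress_ratios = [0, 0] + [4, 128] * 20 + [4, 0]
num_hidden_layers, num_nextn_predict_layers = 43, 1

# YaRN stretches the original context window by `factor`.
extended = rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
print(extended == max_position_embeddings == model_max_length)  # True

# One compress ratio per layer, if the prediction layer is counted too (assumed reading).
print(len(compress_ratios), num_hidden_layers + num_nextn_predict_layers)  # 44 44
```

Both checks pass: 16 x 65,536 gives the 1,048,576-token limit that the tokenizer advertises, and the 44 compress ratios match 43 hidden layers plus one extra layer.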