prince-canuma committed about 1 month ago

Commit 29d71e1 (verified) · 1 parent: 3ed8904

Add files using upload-large-folder tool

Files changed (40)

README.md +13 -68
config.json +1036 -4
model-00001-of-00036.safetensors +3 -0
model-00002-of-00036.safetensors +3 -0
model-00003-of-00036.safetensors +3 -0
model-00004-of-00036.safetensors +3 -0
model-00005-of-00036.safetensors +3 -0
model-00006-of-00036.safetensors +3 -0
model-00007-of-00036.safetensors +3 -0
model-00008-of-00036.safetensors +3 -0
model-00009-of-00036.safetensors +3 -0
model-00010-of-00036.safetensors +3 -0
model-00011-of-00036.safetensors +3 -0
model-00012-of-00036.safetensors +3 -0
model-00013-of-00036.safetensors +3 -0
model-00014-of-00036.safetensors +3 -0
model-00015-of-00036.safetensors +3 -0
model-00016-of-00036.safetensors +3 -0
model-00017-of-00036.safetensors +3 -0
model-00018-of-00036.safetensors +3 -0
model-00019-of-00036.safetensors +3 -0
model-00020-of-00036.safetensors +3 -0
model-00021-of-00036.safetensors +3 -0
model-00022-of-00036.safetensors +3 -0
model-00023-of-00036.safetensors +3 -0
model-00024-of-00036.safetensors +3 -0
model-00025-of-00036.safetensors +3 -0
model-00026-of-00036.safetensors +3 -0
model-00027-of-00036.safetensors +3 -0
model-00028-of-00036.safetensors +3 -0
model-00029-of-00036.safetensors +3 -0
model-00030-of-00036.safetensors +3 -0
model-00031-of-00036.safetensors +3 -0
model-00032-of-00036.safetensors +3 -0
model-00033-of-00036.safetensors +3 -0
model-00034-of-00036.safetensors +3 -0
model-00035-of-00036.safetensors +3 -0
model-00036-of-00036.safetensors +3 -0
model.safetensors.index.json +0 -0
tokenizer_config.json +5 -1

README.md CHANGED

@@ -1,86 +1,31 @@
 ---
-language:
-- en
-library_name: mlx
-license: mit
-pipeline_tag: text-generation
 tags:
 - mlx
-- safetensors
-- deepseek_v4
-- 4-bit
-base_model: deepseek-ai/DeepSeek-V4-Flash
-base_model_relation: quantized
 ---
-# DeepSeek-V4-Flash-4bit (MLX)
-4-bit quantized MLX port of [`deepseek-ai/DeepSeek-V4-Flash`](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) for Apple Silicon.
-158B total params (~37B active), 149 GB on disk, fits comfortably on a single M3/M5 Ultra (256GB+).
-## Requires the V4 mlx-lm port
-DeepSeek-V4 is a new architecture (mHC, hash-routed MoE, sqrtsoftplus, Compressor + Indexer for compressed KV) and is **not yet in stock mlx-lm**. To use this model you need the V4 port:
 ```bash
-git clone https://github.com/machiabeli/mlx-lm-1.git mlx-lm
-cd mlx-lm && git checkout feat/deepseek-v4
-pip install -e .
 ```
-Tracking PR: [`ml-explore/mlx-lm#1189`](https://github.com/ml-explore/mlx-lm/pull/1189). Once merged, `pip install mlx-lm` will work.
-## Usage
 ```python
 from mlx_lm import load, generate
 model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
-out = generate(model, tokenizer, prompt="Q: What is 2+2?\nA:", max_tokens=64)
-print(out)
-```
-## Performance
-Measured on M3 Ultra (512GB) single-node, batch=1:
-| Stage | tok/s |
-|-------|-------|
-| Prompt processing | 6.6 |
-| Generation | **20.2** |
-| Peak RAM | 160 GB |
-Generation throughput includes the **fused Metal kernel for mHC Sinkhorn** added in PR #1189 (1.83x over the Python reference).
-## Source quality caveat
-The bf16 source weights used for this conversion were upcasted from DeepSeek's native FP8 release rather than re-quantized directly from FP8. This stacks two quantization passes (FP8 -> BF16 -> Q4) and may produce slightly worse outputs than a direct FP8 -> Q4 conversion. A re-conversion from native FP8 is planned.
-## Conversion
 ```
-mlx_lm.convert \
-  --hf-path deepseek-ai/DeepSeek-V4-Flash \
-  --mlx-path DeepSeek-V4-Flash-4bit \
-  -q --q-bits 4 --q-group-size 64
-```
-Result: 4.506 bits per weight, 33 sharded safetensors.
-## Architecture
-V4 is a substantial step from V3:
-- **mHC (Manifold-constrained Hyper-Connections)** — replaces residual connections with `hc_mult=4` parallel hidden-state copies recombined via a doubly-stochastic Sinkhorn-normalized mix matrix.
-- **Hash-routed MoE** — first 3 layers use a deterministic `tid2eid` table (token id -> expert id) instead of learned routing.
-- **`sqrtsoftplus` scoring** — `sqrt(softplus(x))` instead of softmax for expert scores.
-- **MLA with single shared 512-dim KV head** — broadcast across 64 query heads (no kv_lora_rank up-projection step like V3).
-- **Compressor + Indexer** for compressed KV attention with topk sparse selection (Indexer at compress_ratio=4).
-- **Per-head learnable `attn_sink`** in softmax denominator.
-Full details: [DeepSeek V4 technical report](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf).
-## License
-MIT (matches upstream).
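The removed README's architecture notes name several operations without spelling them out. The following is a toy sketch in plain Python that illustrates three of them: `sqrt(softplus(x))` expert scoring, Sinkhorn normalization toward a doubly-stochastic mixing matrix of size `hc_mult = 4`, and a softmax with an extra sink term in the denominator. It is not the V4 mlx-lm port's code. Exponentiating the logits before Sinkhorn, the fixed iteration count, and all function names are assumptions made for illustration.

```python
import math


def sqrtsoftplus(x: float) -> float:
    """Expert score sqrt(softplus(x)) = sqrt(log(1 + e**x))."""
    # Stable softplus: max(x, 0) + log1p(exp(-|x|)) never overflows exp().
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return math.sqrt(softplus)


def sinkhorn(logits: list[list[float]], n_iters: int = 20) -> list[list[float]]:
    """Approximate a doubly-stochastic matrix from square logits (Sinkhorn-Knopp).

    Assumption: logits are exponentiated first, then rows and columns are
    normalized alternately for a fixed number of iterations.
    """
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make every column sum to 1 (rows drift slightly, then re-converge).
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def softmax_with_sink(scores: list[float], sink: float) -> list[float]:
    """Softmax over `scores` with exp(sink) added to the denominator.

    The returned weights therefore sum to less than 1; the remainder is
    the probability mass absorbed by the sink.
    """
    shift = max(max(scores), sink)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    denom = sum(exps) + math.exp(sink - shift)
    return [e / denom for e in exps]


if __name__ == "__main__":
    print([round(sqrtsoftplus(x), 4) for x in (-2.0, 0.0, 2.0)])
    hc_mult = 4  # number of parallel hidden-state copies, per the README
    logits = [[0.5 * ((i * j) % 3) for j in range(hc_mult)] for i in range(hc_mult)]
    mix = sinkhorn(logits)
    print([round(sum(row), 4) for row in mix])
    print(softmax_with_sink([1.0, 2.0, 0.5], sink=0.0))
```

Because every entry of `exp(logits)` is positive, the alternating normalization converges quickly for a 4x4 matrix like this one. A production kernel (such as the fused Metal kernel the README mentions) would operate on batched arrays rather than Python lists.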

 ---
+language: en
 tags:
 - mlx
+library_name: mlx
+pipeline_tag: text-generation
 ---
+# mlx-community/DeepSeek-V4-Flash-4bit
+## Use with mlx
 ```bash
+pip install mlx-lm
 ```
 ```python
 from mlx_lm import load, generate
 model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")
+prompt = "hello"
+if tokenizer.chat_template is not None:
+    messages = [{"role": "user", "content": prompt}]
+    prompt = tokenizer.apply_chat_template(
+        messages, add_generation_prompt=True, return_dict=False,
+    )
+response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
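The README removed in this commit converted with `-q --q-bits 4 --q-group-size 64` and reported 4.506 bits per weight. Here is a back-of-the-envelope check, assuming MLX's affine mode stores one scale and one bias per group in a 16-bit float type. Under that assumption, each group of `group_size` weights costs `bits * group_size + 32` bits, or 4 + 32/64 = 4.5 bits per quantized weight. Attributing the small remainder above 4.5 to tensors left unquantized is an interpretation; the README does not say so.

```python
def affine_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Storage cost per weight for group-wise affine quantization.

    Assumes each group of `group_size` weights carries one scale and one
    bias, each stored in `scale_bits` bits (16 for fp16/bf16).
    """
    per_group_bits = bits * group_size + 2 * scale_bits
    return per_group_bits / group_size


if __name__ == "__main__":
    print(affine_bits_per_weight(4, 64))  # 4.5 (settings from the conversion command)
    print(affine_bits_per_weight(4, 32))  # 5.0 (group size used by the per-module overrides in config.json)
```

Halving the group size doubles the per-weight scale/bias overhead, so modules quantized with `group_size` 32 cost about half a bit more per weight than the 64-group default under the same assumption.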

config.json CHANGED

@@ -51,7 +51,7 @@
         4,
         0
     ],
-    "compress_rope_theta": 160000,
     "eos_token_id": 1,
     "hc_eps": 1e-06,
     "hc_mult": 4,
@@ -82,12 +82,1044 @@
     "quantization": {
         "group_size": 64,
         "bits": 4,
-        "mode": "affine"
     },
     "quantization_config": {
         "group_size": 64,
         "bits": 4,
-        "mode": "affine"
     },
     "rms_norm_eps": 1e-06,
     "rope_scaling": {
@@ -97,7 +1129,7 @@
         "original_max_position_embeddings": 65536,
         "type": "yarn"
     },
-    "rope_theta": 10000,
     "routed_scaling_factor": 1.5,
     "scoring_func": "sqrtsoftplus",
     "sliding_window": 128,
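The updated config.json keeps the defaults `group_size` 64, `bits` 4, `mode` "affine" in both `quantization` and `quantization_config`. It also adds per-module entries for `model.layers.N.ffn.switch_mlp.{gate,up,down}_proj` with `group_size` 32. The following sketch shows one way to read such a block. It assumes that a dict-valued entry keyed by a module's full path overrides the top-level defaults. That lookup rule, the helper names, and the non-MoE module path in the usage lines are hypothetical and are not taken from the mlx-lm loader.

```python
# Hypothetical excerpt with the same shape as the updated "quantization" block.
QUANT = {
    "group_size": 64,
    "bits": 4,
    "mode": "affine",
    "model.layers.0.ffn.switch_mlp.gate_proj": {"group_size": 32, "bits": 4},
    "model.layers.0.ffn.switch_mlp.up_proj": {"group_size": 32, "bits": 4},
    "model.layers.0.ffn.switch_mlp.down_proj": {"group_size": 32, "bits": 4},
}


def resolve_quant(quant: dict, module_path: str) -> tuple[int, int]:
    """Return (group_size, bits) for one module.

    Assumed rule: a dict keyed by the module's full path overrides the
    top-level defaults; every other module uses the defaults.
    """
    entry = quant.get(module_path)
    if isinstance(entry, dict):
        return entry["group_size"], entry["bits"]
    return quant["group_size"], quant["bits"]


def overridden_paths(quant: dict) -> list[str]:
    """List the per-module keys (the dict-valued entries) in the block."""
    return sorted(k for k, v in quant.items() if isinstance(v, dict))


if __name__ == "__main__":
    print(resolve_quant(QUANT, "model.layers.0.ffn.switch_mlp.gate_proj"))  # (32, 4)
    print(resolve_quant(QUANT, "model.layers.0.attn.hypothetical_proj"))  # (64, 4), falls back to defaults
    print(len(overridden_paths(QUANT)))  # 3
```

With a downloaded checkpoint, the same helpers could be pointed at `json.load(open("config.json"))["quantization"]`. Scalar keys such as `"mode"` are skipped because only dict-valued entries count as overrides.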

         4,
         0
     ],
+    "compress_rope_theta": 160000.0,
     "eos_token_id": 1,
     "hc_eps": 1e-06,
     "hc_mult": 4,
     "quantization": {
         "group_size": 64,
         "bits": 4,
+        "mode": "affine",
+        "model.layers.0.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.0.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.0.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        }
     },
     "quantization_config": {
         "group_size": 64,
         "bits": 4,
+        "mode": "affine",
+        "model.layers.0.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.0.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.0.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.1.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.2.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.3.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.4.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.5.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.6.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.7.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.8.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.9.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.10.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.11.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.12.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.13.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.14.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.15.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.16.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.17.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.18.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.19.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.20.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.21.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.22.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.23.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.24.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.25.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.26.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.27.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.28.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.29.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.30.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.31.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.32.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.33.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.34.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.35.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.36.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.37.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.38.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.39.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.40.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.41.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.gate_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.up_proj": {
+            "group_size": 32,
+            "bits": 4
+        },
+        "model.layers.42.ffn.switch_mlp.down_proj": {
+            "group_size": 32,
+            "bits": 4
+        }
     },
     "rms_norm_eps": 1e-06,
     "rope_scaling": {
         "original_max_position_embeddings": 65536,
         "type": "yarn"
     },
+    "rope_theta": 10000.0,
     "routed_scaling_factor": 1.5,
     "scoring_func": "sqrtsoftplus",
     "sliding_window": 128,
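Each `switch_mlp` entry above pins the routed-expert projections (`gate_proj`, `up_proj`, `down_proj`) to 4-bit weights with a group size of 32. The sketch below shows affine group-wise quantization at that setting in plain Python. It assumes one scale and one bias per group and does not reproduce MLX's packed storage format or kernels. The helper names (`quantize_group`, `dequantize_group`, `quantize`, `effective_bits_per_weight`) are hypothetical.

```python
# Sketch of affine group-wise quantization with bits=4, group_size=32.
# Assumption: each group stores one scale and one bias. This is illustrative,
# not MLX's actual packed layout.

def quantize_group(values, bits=4):
    """Map one group of floats to integers in [0, 2**bits - 1] plus a scale and bias."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # A constant group gets scale 1.0, so every value maps to 0 without dividing by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo


def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats as q * scale + bias."""
    return [x * scale + bias for x in q]


def quantize(weights, group_size=32, bits=4):
    """Split a flat weight row into groups of `group_size` and quantize each one."""
    if len(weights) % group_size:
        raise ValueError("number of weights must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def effective_bits_per_weight(bits, group_size, param_bits=16):
    """Storage cost per weight, counting one scale and one bias per group."""
    return bits + 2 * param_bits / group_size


# Example: quantize one 32-value row and report the per-weight storage cost.
row = [i / 31 for i in range(32)]
q, scale, bias = quantize(row)[0]
print(max(q), effective_bits_per_weight(4, 32))
```

Assuming 16-bit scales and biases, a group size of 32 adds one bit of overhead per weight, so these expert layers cost roughly 5 bits per parameter rather than 4. Smaller groups trade that extra storage for a tighter value range, and therefore lower rounding error, within each group.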

model-00001-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fef56d4a4dfbe23124b35ab58364423b42013fb2ee5937cf22bc36b4559c552b
+size 4478654688

model-00002-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d1c18825aa46b5c72382d8c1af673c5bcd007e22634a2cc6dabecbffd73ff866
+size 5331061438

model-00003-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:665a1470a6e3fdc71bc77520a21f25056bcaff12f9be24121af8585705e5fc30
+size 5316570854

model-00004-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3c52b546ec7c0914b77a2aac12d2dc9f10706976f8570d66349770c6cbd6581d
+size 4385332932

model-00005-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47f81e2aebbf14001a10d8f3ff82b73caa4459aa913b6b0bc4eb7a95ae1736b6
+size 5316570901

model-00006-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:325f47003b727e51e6f7fd3abece5e60e941e6a2d4564c84f5aa9615cbb6c769
+size 4333193053

model-00007-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a7c9db488d14e689f6b1e8e4ed726d3024fe90e2c947c08835342dac99c1248
+size 5324857027

model-00008-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf22873f77046371a410e63b9012bc537249525e85fe4410b9833cb96ce99d50
+size 5316570900

model-00009-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8aecbea8d37b641bda95a3760b9f3f8523481b9dde015443adbaf4cf628619d4
+size 4385332980

model-00010-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7953f06831fce22a493edf3304cd8d77da6b5409c59b0828ef19f3ae9794f954
+size 5316570957

model-00011-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:366876c173e6003245fab6824825559b11c3ebcfeeb65119b62d7a9af5829229
+size 4333193153

model-00012-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d6201e1dc5abef1c5df9622ba5319ef44d123d7752cac684bf210479fc2c831
+size 5324857101

model-00013-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f14c642519e6c042b2cf57c07087d044a4a44f15061889e2216d491dc75ba872
+size 5316570956

model-00014-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a088dd92e2381dffb8606e7d9c3fffa05283e7b22e1b22464b99b817d104d172
+size 4385333000

model-00015-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a1eb94c4e78a2bdcc6c7a86c46d8135fc5b521be42d2bea0f0a4cd9510184c4
+size 5316570957

model-00016-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c04bf745b3d1aadd8fc47ab8fb70c1589b55ce3c47e72063980e04db7ab7b52
+size 4333193165

model-00017-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0078ef7238ff727affa7c4ed8092766b2b1231969ae04076de8aea8f17afb049
+size 5324857103

model-00018-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3471c74a7c1f5bddcc66b5913d3935b95b2f120e4b7e6863096d2866d44e28d3
+size 5316570924

model-00019-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:382ec301b4be7bdc16e0234f7a144c946f8171bb6fcd5d35352f3ff758018947
+size 4385332994

model-00020-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e7aefe87ce91213491c7dced136eb0b9aa18a884f428e50f5f66de8edf10a3f
+size 5316570957

model-00021-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d8882b4837ecb9069535a3fbb0f5f86cf973de83833d8af7389a494d3ac5f904
+size 4333193125

model-00022-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c22f242ecbdfd17d50992870209a1faea7e069f5e6629fc0a8b0d4988fcf137
+size 5324857103

model-00023-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c8452eef7543f40181f1b76644e2f04be516ae6ea549feab36550dc38fcc6b9
+size 5316570956

model-00024-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5ef8741f6487c56a89afc9d4c0f9c2b25a38d8cf8c45d2a97340b4500beef9f
+size 4385332968

model-00025-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27b4cea1d20d31b0c56a007b7ec7509a7901ba1bf6080717456fd7290562bf7f
+size 5316570951

model-00026-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3181ff0bfe73526da708471864022f62f7eb322701e94ba3fb6ab42371c0585e
+size 4333193163

model-00027-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e30ed06b7ad2b2eebb22274c1be460b0b9924fefff55261fc1f8629a3b6bbc4e
+size 5324857031

model-00028-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:956ad1192b8a17ffc7e3f0e112143c1cbda6fcb013bb96c47bb7dda5c8963ebc
+size 5316570954

model-00029-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ff72aa3b335a9d17db9b64af993c7f45db4edf07f715a0094376cd4a58deb6c1
+size 4385332998

model-00030-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5275001c4459ec1c384039d32a5020f2d4989ec3fa92e7bb96a92a6b2a8f3d76
+size 5316570957

model-00031-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:839788481e65983638cd79e018310d76055a83c2fb3a664fc486e0251d61210d
+size 4333193147

model-00032-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1bedc5b6b28f486a2fa47e999531a24c2609515dd12f1db6cffa7bbc6526ec9f
+size 5324857063

model-00033-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2eb6adfb1dc5d9c0e05340aadb35118cbcd429449f14661a90eaef28d0240d1
+size 5316570906

model-00034-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a829039a565e4e72b1c61a9dcf8906e0569fabcea89e5b7e112657ad64c260d8
+size 4385333000

model-00035-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a8ecf35a3e7f5b3f8185cb19f682bba1587db608ae804f20985e42ba23c7838
+size 5316570953

model-00036-of-00036.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee821e8707322b55b43ed13d033ad662687753f1310202dd7de192229baa0371
+size 4566567283
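Each shard above is committed as a Git LFS pointer: a `version` line naming the spec, an `oid` holding the SHA-256 of the real file, and its `size` in bytes. The following is a minimal standard-library sketch for checking a downloaded shard against its pointer. It assumes the shards are already on local disk, and the function names `parse_lfs_pointer` and `verify_against_pointer` are hypothetical.

```python
import hashlib
import os

# Pointer text copied from model-00001-of-00036.safetensors above.
SHARD_1_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fef56d4a4dfbe23124b35ab58364423b42013fb2ee5937cf22bc36b4559c552b
size 4478654688
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its spec version, sha256 digest, and byte size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_against_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and sha256."""
    # Cheap size check first, so a wrong multi-GB file is rejected without hashing it.
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks instead of loading several GB into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]


pointer = parse_lfs_pointer(SHARD_1_POINTER)
# Hypothetical usage, assuming the shard sits in the current directory:
# ok = verify_against_pointer("model-00001-of-00036.safetensors", pointer)
```

Comparing the size before hashing is a deliberate ordering: it costs one `stat` call and catches truncated or partial downloads immediately.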

model.safetensors.index.json CHANGED

The diff for this file is too large to render.
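The index file itself is not rendered here. In the standard Hugging Face sharded-safetensors layout it holds a `weight_map` from each tensor name to the shard file that stores it. The sketch below inverts that map to list the tensors in each shard. The tensor names and shard assignments in `INDEX_TEXT` are hypothetical placeholders, not values read from this repository's index, and `tensors_by_shard` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical excerpt: tensor names and shard assignments are placeholders for illustration.
INDEX_TEXT = """
{
  "metadata": {},
  "weight_map": {
    "model.layers.0.ffn.switch_mlp.gate_proj.weight": "model-00001-of-00036.safetensors",
    "model.layers.0.ffn.switch_mlp.gate_proj.scales": "model-00001-of-00036.safetensors",
    "model.layers.42.ffn.switch_mlp.down_proj.weight": "model-00036-of-00036.safetensors"
  }
}
"""


def tensors_by_shard(index):
    """Invert weight_map into {shard filename: sorted list of tensor names}."""
    shards = defaultdict(list)
    for tensor_name, filename in index["weight_map"].items():
        shards[filename].append(tensor_name)
    return {filename: sorted(names) for filename, names in shards.items()}


index = json.loads(INDEX_TEXT)
for filename, names in sorted(tensors_by_shard(index).items()):
    print(filename, len(names))
```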

tokenizer_config.json CHANGED

@@ -1,10 +1,14 @@
 {
   "backend": "tokenizers",
   "bos_token": "<｜begin▁of▁sentence｜>",
+  "clean_up_tokenization_spaces": false,
   "eos_token": "<｜end▁of▁sentence｜>",
+  "is_local": true,
+  "legacy": true,
+  "local_files_only": false,
   "model_max_length": 1048576,
   "pad_token": "<｜end▁of▁sentence｜>",
+  "sp_model_kwargs": {},
   "tokenizer_class": "TokenizersBackend",
-  "trust_remote_code": false,
   "unk_token": null
 }
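The `tokenizer_config.json` change adds five keys (`clean_up_tokenization_spaces`, `is_local`, `legacy`, `local_files_only`, `sp_model_kwargs`), drops `trust_remote_code`, and leaves every shared value unchanged. The following sketch recovers that summary by diffing the two versions as dictionaries. `diff_configs` is a hypothetical helper, and `OLD` and `NEW` are transcribed from the tokenizer diff in this commit.

```python
# Old and new tokenizer_config.json contents, transcribed from the diff.
OLD = {
    "backend": "tokenizers",
    "bos_token": "<｜begin▁of▁sentence｜>",
    "eos_token": "<｜end▁of▁sentence｜>",
    "model_max_length": 1048576,
    "pad_token": "<｜end▁of▁sentence｜>",
    "tokenizer_class": "TokenizersBackend",
    "trust_remote_code": False,
    "unk_token": None,
}

NEW = {
    "backend": "tokenizers",
    "bos_token": "<｜begin▁of▁sentence｜>",
    "clean_up_tokenization_spaces": False,
    "eos_token": "<｜end▁of▁sentence｜>",
    "is_local": True,
    "legacy": True,
    "local_files_only": False,
    "model_max_length": 1048576,
    "pad_token": "<｜end▁of▁sentence｜>",
    "sp_model_kwargs": {},
    "tokenizer_class": "TokenizersBackend",
    "unk_token": None,
}


def diff_configs(old, new):
    """Return (added, removed, changed) between two flat config dictionaries."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed


added, removed, changed = diff_configs(OLD, NEW)
print(sorted(added), removed, changed)
```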