Infatoshi committed
Commit 15d1fc7 · verified · Parent: d9eef67

Add files using upload-large-folder tool
README.md ADDED
---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL
base_model: deepseek-ai/DeepSeek-V4-Flash
tags:
- quantized
- gptq
- int2
- moe
- deepseek
- deepseek-v4-flash
pipeline_tag: text-generation
---

# DeepSeek-V4-Flash INT2-G64

INT2 group-64 quantization of [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)'s 256 routed experts. The full 284B-parameter MoE fits in 96 GB of VRAM and runs on a single GPU.

**Inference code, kernels, and the full quantization pipeline live at [github.com/Infatoshi/dsv4-int2](https://github.com/Infatoshi/dsv4-int2).** This repository contains weights only — they will not load with vanilla `transformers` or `vllm`.

## Numbers

| Metric | Value |
|---|---|
| Checkpoint size | **75 GB** (vs 132 GB MXFP4, 543 GB BF16) |
| Routed-expert format | INT2 g64, FP16 scale + INT4 zero |
| Layers | 43 expert MoE layers (one per `layer_NN.safetensors`) |
| MMLU 0-shot, 14,042 questions, V4 chat template | **72.46%** |
| Decode throughput, RTX PRO 6000 Blackwell | 17 tok/s eager (reference path; not perf-tuned) |

The official BF16 V4-Flash-Base 5-shot MMLU is 88.7%; the gap is partly setup (0-shot vs 5-shot) and partly real quantization cost.
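The headline checkpoint size can be sanity-checked from the per-shard LFS sizes in this commit (layers 00–09 are 1,862,526,984 bytes each; layers 10–42 are 1,862,529,288 bytes each):

```python
# Sum the per-layer LFS sizes listed in this commit and convert to GiB.
sizes = [1_862_526_984] * 10 + [1_862_529_288] * 33  # layers 00-09, then 10-42
total_bytes = sum(sizes)
gib = total_bytes / 2**30
print(f"{gib:.1f} GiB")  # 74.6 GiB, i.e. the ~75 GB headline figure
```

So the "75 GB" in the table is the binary (GiB) total of the 43 expert shards, before the non-expert weights pulled from upstream.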

## Format

Each `layer_NN.safetensors` holds the routed experts for one MoE layer. For each of the three projections (`w1` gate, `w3` up, `w2` down):

- `w_packed`: `[E=256, K_out, K_in/16]` `uint32` — 16 INT2 values per `uint32`
- `w_scale`: `[E, K_out, K_in/G]` `float16` — one FP16 scale per group of `G=64` input channels
- `w_zero_packed`: `[E, K_out, K_in/(2G)]` `int8` — INT4 zero-points, packed two per byte
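This works out to 2 + (16 + 4)/64 = 2.3125 effective bits per weight. A minimal NumPy sketch of the pack/unpack/dequantize path follows; the LSB-first bit order within each `uint32` is an assumption (the Triton kernels in the GitHub repo define the authoritative layout), and zero-point byte packing is elided:

```python
import numpy as np

E, K_out, K_in, G = 2, 4, 128, 64        # toy sizes; the real experts use E=256, G=64
rng = np.random.default_rng(0)

q = rng.integers(0, 4, size=(E, K_out, K_in)).astype(np.uint32)            # INT2 codes
scale = rng.uniform(0.01, 0.05, size=(E, K_out, K_in // G)).astype(np.float16)
zero = rng.integers(0, 16, size=(E, K_out, K_in // G)).astype(np.float32)  # INT4 zeros

# Pack 16 two-bit codes into each uint32 (LSB-first order assumed here).
qr = q.reshape(E, K_out, K_in // 16, 16)
packed = np.zeros((E, K_out, K_in // 16), dtype=np.uint32)
for j in range(16):
    packed |= qr[..., j] << np.uint32(2 * j)

# Unpack: shift each of the 16 lanes back out and mask to 2 bits.
shifts = (2 * np.arange(16)).astype(np.uint32)
codes = ((packed[..., None] >> shifts) & 3).reshape(E, K_out, K_in)
assert np.array_equal(codes, q)          # lossless round trip

# Dequantize per group of G input channels: w = scale * (q - zero).
cg = codes.reshape(E, K_out, K_in // G, G).astype(np.float32)
w = (scale.astype(np.float32)[..., None] * (cg - zero[..., None])).reshape(E, K_out, K_in)
```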

Non-expert weights (MLA, embeddings, norms, shared expert, indexer, compressor, head) are NOT in this checkpoint — pull them from the upstream [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) MXFP4 release. The hybrid loader in the GitHub repo does this automatically.

`quant_stats.json` records per-layer GPTQ reconstruction error and routing-coverage stats (RTN-fallback count, visit min/max/median per expert).
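The records can be aggregated directly; the two entries below are the actual layer-0 and layer-42 rows from `quant_stats.json`:

```python
import json

# Two real entries from quant_stats.json; in practice:
#   stats = json.load(open("quant_stats.json"))
stats = [
    {"layer": 0, "n_experts": 256, "n_rtn_fallback": 20,
     "visit_min": 0, "visit_max": 104701, "visit_median": 1554,
     "rec_err_mean": {"w1": 0.012415080978826154,
                      "w2": 0.01343309399089776,
                      "w3": 0.01253566544255591}},
    {"layer": 42, "n_experts": 256, "n_rtn_fallback": 24,
     "visit_min": 0, "visit_max": 82726, "visit_median": 1562,
     "rec_err_mean": {"w1": 0.01345529514219379,
                      "w2": 0.015836902584851487,
                      "w3": 0.013570939121564152}},
]

total_fallback = sum(s["n_rtn_fallback"] for s in stats)
worst = max(stats, key=lambda s: max(s["rec_err_mean"].values()))
print(total_fallback, worst["layer"])  # 44 42
```

Reconstruction error is worst in the final layer (layer 42), a common pattern as quantization error accumulates toward the output.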

## Method

Standard GPTQ with INT2 g64, run per-expert. Calibration uses Mistral-7B-v0.1 layer-16 hidden states as the proxy distribution — chosen for portability rather than parity with V4. Two implications worth knowing before quoting these numbers:

- Across 41 layers, 211 of 256 routed experts received zero calibration tokens (V4's HC-sinkhorn routing is highly domain-specific and Mistral natural-text activations don't reach all experts). Under-covered experts fall back to per-channel RTN.
- V4 self-calibration would close this; it is not run here. See `quant/v4_self_calib.py` in the GitHub repo for a starting point.
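The RTN fallback can be sketched as plain asymmetric round-to-nearest. This is illustrative only: the repo states per-channel RTN, and the grouping and zero-point conventions below are assumptions made to mirror the g64 storage format. What zero-visit experts lose relative to GPTQ is the Hessian-weighted error compensation across remaining columns:

```python
import numpy as np

def rtn_int2(w: np.ndarray, group: int = 64) -> np.ndarray:
    """Asymmetric round-to-nearest INT2 quantize/dequantize per input-channel group.

    A sketch under assumed conventions; the repo's exact RTN variant may differ.
    """
    k_out, k_in = w.shape
    wg = w.reshape(k_out, k_in // group, group)
    wmin = wg.min(axis=-1, keepdims=True)
    wmax = wg.max(axis=-1, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / 3.0   # 4 levels: codes 0..3
    zero = np.round(-wmin / scale)                # code that dequantizes to ~0.0
    q = np.clip(np.round(wg / scale + zero), 0, 3)
    return (scale * (q - zero)).reshape(k_out, k_in)

w = np.random.default_rng(0).standard_normal((8, 128)).astype(np.float32)
w_hat = rtn_int2(w)
print(float(np.abs(w - w_hat).mean()))  # mean abs error, roughly scale/4 per group
```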

## Loading

This is research code; there is no `from_pretrained` path. To run inference:

```bash
git clone https://github.com/Infatoshi/dsv4-int2
cd dsv4-int2
uv venv && uv sync

# point the loader at this checkpoint + the upstream V4-Flash release
export DSV4_REF=/path/to/DeepSeek-V4-Flash    # MXFP4 release (tokenizer + non-expert weights)
export DSV4_INT2=/path/to/this/checkpoint     # this directory

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
uv run python eval/v4_int2/repl.py
```

## Limitations

- **Quantization-only.** This is a quant + reference inference path, not a perf-tuned serving stack. Decode reaches ~26% of HBM peak.
- **Custom kernel required.** The weights cannot be loaded with stock `transformers` or vLLM; the Triton kernels in the GitHub repo handle dequantization on the fly.
- **Calibration coverage gap.** 211/256 experts per layer get zero calibration visits under our setup. Rare-domain quality may be worse than the headline MMLU suggests.
- **Single-GPU only.** The loader assumes `world_size=1`; there is no tensor parallelism.
- **Hardware tested:** RTX PRO 6000 Blackwell SM_120 (96 GB). Other architectures should work via Triton autotune but have not been measured.

## License

Source code on GitHub is MIT-licensed. These weights are derivatives of DeepSeek-V4-Flash and inherit the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL).

## Citation

```bibtex
@misc{dsv4int2,
  title  = {dsv4-int2: INT2 quantization of DeepSeek-V4-Flash for single-GPU inference},
  author = {Arledge, Elliot},
  year   = {2026},
  url    = {https://github.com/Infatoshi/dsv4-int2}
}
```
layer_00.safetensors – layer_42.safetensors ADDED

All 43 shards are Git LFS pointers (`version https://git-lfs.github.com/spec/v1`):

| File | oid (sha256) | Size (bytes) |
|---|---|---|
| layer_00.safetensors | 65efc8d5129ef6ae89dc74121cd00188aa1664b988dee6bc0d3ff9f8cdbffe7e | 1862526984 |
| layer_01.safetensors | 9267c67376da701e28475a87648c77b4bacd3fda74255d2a5ea46dbb49e813eb | 1862526984 |
| layer_02.safetensors | fc5fd16ad3b6bc8b79ddf899144067c8937808545b77e3b4b1715e3236dd1c59 | 1862526984 |
| layer_03.safetensors | fd78f5672a751e206167daafba1ecbb5bd8e840730728e7b8ac985fa76a1881c | 1862526984 |
| layer_04.safetensors | f317860d0cb9ee67cffeab4249a4068f895a5f5a4560808244d83efc2c39eedc | 1862526984 |
| layer_05.safetensors | 64d7106f4a2a9f6fec4e649615fef52a42ef1aae431c011c681960de699985c0 | 1862526984 |
| layer_06.safetensors | 910f534cda33349d58c38ffbe4f79d164e45307c34498a00aa5b572774de99c8 | 1862526984 |
| layer_07.safetensors | 8125827cc86436bd2b99620a84baaa3e3520644316586e7df7df203e0d4a6dea | 1862526984 |
| layer_08.safetensors | 93c345bafbfd60ae55cc1c9006a0b3944040a17490c11ddbf7d29b23e77e3b20 | 1862526984 |
| layer_09.safetensors | 6883dedc618912fe33afd6e95cf1936e09a53fae9fe491aadd8db8faadd55b4b | 1862526984 |
| layer_10.safetensors | a8b00b22b0851043cdf7dfece331725586551cc8efabbb5c333e20c2dba0ae47 | 1862529288 |
| layer_11.safetensors | f8b081b04ff241c66f8be2d14a501bdf916e7b1fb6ac8cb12ed545870602cfac | 1862529288 |
| layer_12.safetensors | f84144949f4dfa73df10df6b2dc1d109c1c7b7f7557e2b05b8b32ec6da4dee5e | 1862529288 |
| layer_13.safetensors | 8b631cdcc85f960dfbdd5734c220502b0be09cd645add621047888a91d6f8979 | 1862529288 |
| layer_14.safetensors | e5dc0ac9805d325d75ac33415862d3a3936986dc0512fdfa662e8efbdfed67e0 | 1862529288 |
| layer_15.safetensors | 0af0cb88f440d526cc482217c19d965e9e254796c9a2c21de2172d0612a6956c | 1862529288 |
| layer_16.safetensors | 75b611c5527851c5d955cf7295fec5e54d180f29b3f4a790a95618b8fe0edc30 | 1862529288 |
| layer_17.safetensors | d87e402a5532cdf18eac78c455fd0f093abe332524eec133124d822e20132a06 | 1862529288 |
| layer_18.safetensors | c8440b33b5561e8b0af982d9ba5a308003296641eaeae62e2b2a490e230bc98c | 1862529288 |
| layer_19.safetensors | 2e985cce134eb7917031e1de5f549f6ba2807ea1537ed1ee74d13d5a65373559 | 1862529288 |
| layer_20.safetensors | aa39c0b722c3b649c147c9697fd60b5eab0aa9a89335db3bff0836670827eb06 | 1862529288 |
| layer_21.safetensors | afbd21e08c6774ed7ee8e70680eb9353c1911c331f3e4e51bd5531d746c69ee1 | 1862529288 |
| layer_22.safetensors | abe5873072c47810680843420293aee5042d510c1374ccba117c2ded69d6fe0e | 1862529288 |
| layer_23.safetensors | 339dcd50418dbdcfbfcb96ae6e3df9e6cff8e31f10e57776c7d0dc9c57952c48 | 1862529288 |
| layer_24.safetensors | dd050087812249fe67728900363ca996567f38e5b839efe2e4c4360d032b3760 | 1862529288 |
| layer_25.safetensors | b0263720eeb5412cc1dd08b6cbad9dce4cf6c1411f54e8594499d976b3a224e3 | 1862529288 |
| layer_26.safetensors | 5207982d43c48e27ce526ae49c2fd77683c7897f60a87699dce0e9ae7960706f | 1862529288 |
| layer_27.safetensors | 432712b3aed0a3d7579b97d9b06a979ce498b7a66c7086a8b8a85c806c21539f | 1862529288 |
| layer_28.safetensors | f7a3378c5f1ab0abd955c5839a818195014000f6b71d3e90d02efe5ca4971172 | 1862529288 |
| layer_29.safetensors | b2a833193d2976d66b3cfc05d8214b6ed87abd77553c21a15c0bf9bb0ba927eb | 1862529288 |
| layer_30.safetensors | 16eccf1962a09cf88358a2c62fdba802b50c4774983a15182220e370dc4c69ee | 1862529288 |
| layer_31.safetensors | fbaf165518c85fa1e1084b56db2ff155db5f3d8197ba14819b9dafc2b35c3238 | 1862529288 |
| layer_32.safetensors | 5d767d7931f03f7340b51528b64e860632a7b1133a86e77328b3e57caf717b28 | 1862529288 |
| layer_33.safetensors | 627ce332da629a064160dc63187bdccf0f8a9c11fd4685e98415a12d76158210 | 1862529288 |
| layer_34.safetensors | 65de880b52ac4dee7f578fe8aa8e24972c27d69a6218e3c4171e1e634dd505df | 1862529288 |
| layer_35.safetensors | e6f493eedc1751a84c59a55f8197b929b90b61c4f88f9fde27bfe2fadfaddb2c | 1862529288 |
| layer_36.safetensors | cc440e6d277c7e5a25a1c9125a57a9ec9c1e4b5bf5f7689d6536541caff344e4 | 1862529288 |
| layer_37.safetensors | 266640e7a0a846772620c3f886d7e4ca1dd4bab5a308ad2700b4b1517d670d5f | 1862529288 |
| layer_38.safetensors | 5bf4b675b0acdbdc2156315119fd3d6f859492e55fbd4a3783e2c286529cdcaa | 1862529288 |
| layer_39.safetensors | 883076068bf168099eeae703e0adf59836e59a3f36dc66ed1dbf289ff6292216 | 1862529288 |
| layer_40.safetensors | ced81e608d8c109620d97ca6c6532042202faef1289bec97923be33fb3211e6c | 1862529288 |
| layer_41.safetensors | 6af4fb89fb72e1c72f584b53ec38132c3fff73c6a62b379c27bc19c6e09ba220 | 1862529288 |
| layer_42.safetensors | 1a06a67cf899d3a954a562c86d05c4323ea3534d43f0e58de66f05240428dcc7 | 1862529288 |
quant_stats.json ADDED

```json
[
  {"layer": 0, "n_experts": 256, "n_rtn_fallback": 20, "visit_min": 0, "visit_max": 104701, "visit_median": 1554, "rec_err_mean": {"w1": 0.012415080978826154, "w2": 0.01343309399089776, "w3": 0.01253566544255591}},
  {"layer": 1, "n_experts": 256, "n_rtn_fallback": 13, "visit_min": 0, "visit_max": 90933, "visit_median": 2058, "rec_err_mean": {"w1": 0.012326016072620405, "w2": 0.013364352460484952, "w3": 0.012376106271403842}},
  {"layer": 2, "n_experts": 256, "n_rtn_fallback": 14, "visit_min": 0, "visit_max": 89672, "visit_median": 1993, "rec_err_mean": {"w1": 0.012438499546988169, "w2": 0.013484358845744282, "w3": 0.012577164387039375}},
  {"layer": 3, "n_experts": 256, "n_rtn_fallback": 28, "visit_min": 0, "visit_max": 87656, "visit_median": 1593, "rec_err_mean": {"w1": 0.012284352793358266, "w2": 0.013022524533880642, "w3": 0.01228655110753607}},
  {"layer": 4, "n_experts": 256, "n_rtn_fallback": 22, "visit_min": 0, "visit_max": 91759, "visit_median": 2509, "rec_err_mean": {"w1": 0.012322144266363466, "w2": 0.013148894046025816, "w3": 0.012335499126493232}},
  {"layer": 5, "n_experts": 256, "n_rtn_fallback": 22, "visit_min": 0, "visit_max": 53067, "visit_median": 2692, "rec_err_mean": {"w1": 0.012205896578961983, "w2": 0.013076323386485456, "w3": 0.012235811740538338}},
  {"layer": 6, "n_experts": 256, "n_rtn_fallback": 13, "visit_min": 0, "visit_max": 105315, "visit_median": 2462, "rec_err_mean": {"w1": 0.012188850159873255, "w2": 0.013173459687095601, "w3": 0.012253230812348193}},
  {"layer": 7, "n_experts": 256, "n_rtn_fallback": 10, "visit_min": 0, "visit_max": 89430, "visit_median": 2105, "rec_err_mean": {"w1": 0.0121999701623281, "w2": 0.013163029976567486, "w3": 0.012289142563531641}},
  {"layer": 8, "n_experts": 256, "n_rtn_fallback": 21, "visit_min": 0, "visit_max": 69468, "visit_median": 2146, "rec_err_mean": {"w1": 0.012145253331254935, "w2": 0.013199328950577183, "w3": 0.012249131043063244}},
  {"layer": 9, "n_experts": 256, "n_rtn_fallback": 20, "visit_min": 0, "visit_max": 74772, "visit_median": 1885, "rec_err_mean": {"w1": 0.012222615572682116, "w2": 0.013086030532576842, "w3": 0.012252453707333189}},
  {"layer": 10, "n_experts": 256, "n_rtn_fallback": 28, "visit_min": 0, "visit_max": 78211, "visit_median": 2122, "rec_err_mean": {"w1": 0.012135391662013717, "w2": 0.01296397465193877, "w3": 0.012168045926955529}},
  {"layer": 11, "n_experts": 256, "n_rtn_fallback": 24, "visit_min": 0, "visit_max": 59807, "visit_median": 2105, "rec_err_mean": {"w1": 0.012127576275815954, "w2": 0.012990429178898921, "w3": 0.012161536022176733}},
  {"layer": 12, "n_experts": 256, "n_rtn_fallback": 23, "visit_min": 0, "visit_max": 120073, "visit_median": 1352, "rec_err_mean": {"w1": 0.012006504137389129, "w2": 0.012896691245259717, "w3": 0.012060902728990186}},
  {"layer": 13, "n_experts": 256, "n_rtn_fallback": 33, "visit_min": 0, "visit_max": 86203, "visit_median": 1580, "rec_err_mean": {"w1": 0.011981866897258442, "w2": 0.012806476159312297, "w3": 0.012059378317644587}},
  {"layer": 14, "n_experts": 256, "n_rtn_fallback": 18, "visit_min": 0, "visit_max": 84296, "visit_median": 2398, "rec_err_mean": {"w1": 0.012034156163281295, "w2": 0.01298547148326179, "w3": 0.012165923355496489}},
  {"layer": 15, "n_experts": 256, "n_rtn_fallback": 17, "visit_min": 0, "visit_max": 90098, "visit_median": 1650, "rec_err_mean": {"w1": 0.012026469823467778, "w2": 0.012949477848451352, "w3": 0.012168781540822238}},
  {"layer": 16, "n_experts": 256, "n_rtn_fallback": 16, "visit_min": 0, "visit_max": 138818, "visit_median": 2080, "rec_err_mean": {"w1": 0.0119970367049973, "w2": 0.013099638956191484, "w3": 0.012215566694067093}},
  {"layer": 17, "n_experts": 256, "n_rtn_fallback": 22, "visit_min": 0, "visit_max": 81805, "visit_median": 1463, "rec_err_mean": {"w1": 0.011958844468608731, "w2": 0.013121607535140356, "w3": 0.01224544375872938}},
  {"layer": 18, "n_experts": 256, "n_rtn_fallback": 17, "visit_min": 0, "visit_max": 118432, "visit_median": 1752, "rec_err_mean": {"w1": 0.011972683361818781, "w2": 0.013337772059458075, "w3": 0.01234140323867905}},
  {"layer": 19, "n_experts": 256, "n_rtn_fallback": 16, "visit_min": 0, "visit_max": 62625, "visit_median": 2217, "rec_err_mean": {"w1": 0.01252702346755541, "w2": 0.01334010675054742, "w3": 0.012576570428791456}},
  {"layer": 20, "n_experts": 256, "n_rtn_fallback": 29, "visit_min": 0, "visit_max": 98240, "visit_median": 1558, "rec_err_mean": {"w1": 0.012045820782077499, "w2": 0.013375726299273083, "w3": 0.01248381885670824}},
  {"layer": 21, "n_experts": 256, "n_rtn_fallback": 23, "visit_min": 0, "visit_max": 95130, "visit_median": 1399, "rec_err_mean": {"w1": 0.012071445104083978, "w2": 0.013229186693934025, "w3": 0.012358925163425738}},
  {"layer": 22, "n_experts": 256, "n_rtn_fallback": 22, "visit_min": 0, "visit_max": 112438, "visit_median": 1997, "rec_err_mean": {"w1": 0.012246504975337302, "w2": 0.013522858513169922, "w3": 0.01259598697288311}},
  {"layer": 23, "n_experts": 256, "n_rtn_fallback": 18, "visit_min": 0, "visit_max": 92691, "visit_median": 2472, "rec_err_mean": {"w1": 0.012362111072434345, "w2": 0.013373729965678649, "w3": 0.012536489943158813}},
  {"layer": 24, "n_experts": 256, "n_rtn_fallback": 14, "visit_min": 0, "visit_max": 80941, "visit_median": 1817, "rec_err_mean": {"w1": 0.012180447134596761, "w2": 0.013405652913206723, "w3": 0.012389732190058567}},
  {"layer": 25, "n_experts": 256, "n_rtn_fallback": 9, "visit_min": 0, "visit_max": 58896, "visit_median": 2313, "rec_err_mean": {"w1": 0.01233544500428252, "w2": 0.013383177894866094, "w3": 0.01257531678857049}},
  {"layer": 26, "n_experts": 256, "n_rtn_fallback": 10, "visit_min": 8, "visit_max": 100000, "visit_median": 2916, "rec_err_mean": {"w1": 0.012328778535447782, "w2": 0.013479596622346435, "w3": 0.012599166751897428}},
  {"layer": 27, "n_experts": 256, "n_rtn_fallback": 10, "visit_min": 0, "visit_max": 69160, "visit_median": 1757, "rec_err_mean": {"w1": 0.012426368215528782, "w2": 0.013210742919909535, "w3": 0.012652500234253239}},
  {"layer": 28, "n_experts": 256, "n_rtn_fallback": 8, "visit_min": 8, "visit_max": 102322, "visit_median": 2247, "rec_err_mean": {"w1": 0.012327002572419588, "w2": 0.013446205142827239, "w3": 0.012607248230779078}},
  {"layer": 29, "n_experts": 256, "n_rtn_fallback": 16, "visit_min": 0, "visit_max": 105865, "visit_median": 1893, "rec_err_mean": {"w1": 0.012322824510192731, "w2": 0.01330637131104595, "w3": 0.012568867361551384}},
  {"layer": 30, "n_experts": 256, "n_rtn_fallback": 12, "visit_min": 0, "visit_max": 91393, "visit_median": 2442, "rec_err_mean": {"w1": 0.012352064775768667, "w2": 0.013372166551562259, "w3": 0.012607292326720199}},
  {"layer": 31, "n_experts": 256, "n_rtn_fallback": 11, "visit_min": 0, "visit_max": 129256, "visit_median": 2164, "rec_err_mean": {"w1": 0.012379363139189081, "w2": 0.01339083846687572, "w3": 0.012637160616577603}},
  {"layer": 32, "n_experts": 256, "n_rtn_fallback": 14, "visit_min": 0, "visit_max": 117053, "visit_median": 1704, "rec_err_mean": {"w1": 0.012300585967750521, "w2": 0.013404446373897372, "w3": 0.01260461766651133}},
  {"layer": 33, "n_experts": 256, "n_rtn_fallback": 14, "visit_min": 0, "visit_max": 86566, "visit_median": 1884, "rec_err_mean": {"w1": 0.012293105512071634, "w2": 0.013338803430087864, "w3": 0.012589504429342924}},
  {"layer": 34, "n_experts": 256, "n_rtn_fallback": 27, "visit_min": 0, "visit_max": 104265, "visit_median": 1972, "rec_err_mean": {"w1": 0.01226013481937116, "w2": 0.013365076509217033, "w3": 0.012629955643205903}},
  {"layer": 35, "n_experts": 256, "n_rtn_fallback": 19, "visit_min": 0, "visit_max": 81482, "visit_median": 1441, "rec_err_mean": {"w1": 0.012239336298080161, "w2": 0.013319830293767154, "w3": 0.012585276490426622}},
  {"layer": 36, "n_experts": 256, "n_rtn_fallback": 31, "visit_min": 0, "visit_max": 76229, "visit_median": 1601, "rec_err_mean": {"w1": 0.012273650947463466, "w2": 0.01333218400759506, "w3": 0.012649128948396537}},
  {"layer": 37, "n_experts": 256, "n_rtn_fallback": 29, "visit_min": 0, "visit_max": 140116, "visit_median": 2095, "rec_err_mean": {"w1": 0.012312455397477606, "w2": 0.01347610967059154, "w3": 0.012669756655668607}},
  {"layer": 38, "n_experts": 256, "n_rtn_fallback": 29, "visit_min": 0, "visit_max": 85568, "visit_median": 1999, "rec_err_mean": {"w1": 0.012275624583708122, "w2": 0.013469703590089921, "w3": 0.012665822468989063}},
  {"layer": 39, "n_experts": 256, "n_rtn_fallback": 35, "visit_min": 0, "visit_max": 116615, "visit_median": 1707, "rec_err_mean": {"w1": 0.012040700054058107, "w2": 0.01377619295817567, "w3": 0.01264991252537584}},
  {"layer": 40, "n_experts": 256, "n_rtn_fallback": 40, "visit_min": 0, "visit_max": 112269, "visit_median": 1404, "rec_err_mean": {"w1": 0.012377002352877753, "w2": 0.013884072603104869, "w3": 0.012798941337678116}},
  {"layer": 41, "n_experts": 256, "n_rtn_fallback": 35, "visit_min": 0, "visit_max": 164712, "visit_median": 2210, "rec_err_mean": {"w1": 0.01262030862926622, "w2": 0.014570762788935099, "w3": 0.012976833964785328}},
  {"layer": 42, "n_experts": 256, "n_rtn_fallback": 24, "visit_min": 0, "visit_max": 82726, "visit_median": 1562, "rec_err_mean": {"w1": 0.01345529514219379, "w2": 0.015836902584851487, "w3": 0.013570939121564152}}
]
```