---
license: mit
tags:
- vindex
- moe
- sparse-routing
- mechanistic-interpretability
base_model: deepseek-ai/DeepSeek-V4-Flash
---

# deepseek-v4-flash-vindex

Per-expert gate-vector vindex for `deepseek-ai/DeepSeek-V4-Flash`, built by the [Divinci-AI](https://huggingface.co/Divinci-AI) team for use with [LarQL](https://github.com/chrishayuk/larql) (Chris Hay) and adjacent feature-routing inference research.

## Vindex specs

- **Source model**: `deepseek-ai/DeepSeek-V4-Flash`
- **Architecture**: `deepseek_v4` (43 layers, 4096 hidden, 2048 moe_intermediate)
- **Experts**: 256 routed + 1 shared, 6 per token
- **Layers indexed**: 43 MoE layers (L00-L42)
- **Features per expert**: 64 (top-K right singular vectors of `gate_proj`)
- **Format**: float32, mmap-friendly contiguous binary
- **Total size**: 11.54 GB

## What this is

- **`gate_vectors.bin`** — flat float32 binary, layout `[moe_layers, n_experts, num_feats, hidden_size]`. Each per-expert chunk holds the top-64 right singular vectors (`Vt[:K, :]`) of that expert's `gate_proj` weight after fp8/MXFP4 dequantization.
- **`gate_vectors_index.json`** — sidecar with per-layer `file_offset` (bytes), `shape`, and SVD stats (`median_var64`, `q25_var64`, `q75_var64`). Serves as the lookup table for mmap access.
- **`phase1_moe_svd.json`** — full per-layer Phase 1 stats (routed/shared/router decomposition).
- **`phase2_router_svd.json`** — per-layer router weight SVD (top-K variance, effective rank, s0/s1 ratio).

## What this is **not**

- Not a runnable model (no inference path on its own).
- Not raw weights — only the top-K right singular vectors of `gate_proj` are stored; the singular values are *not retained*, so reconstruction is lossy.
- Not a fine-tune or quantization of the base model.
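Because each expert block stores orthonormal rows of `Vt` with the singular values dropped, the natural operation is projecting a hidden state onto an expert's 64-dimensional gate subspace, not reconstructing `gate_proj`. A minimal sketch of what "lossy" means here, using a random orthonormal stand-in for a real expert block (this code does not read `gate_vectors.bin`):

```python
import numpy as np

# Hypothetical stand-in matching the vindex shapes: 64 features, 4096 hidden.
n_feats, hidden = 64, 4096

# A real expert block has orthonormal rows (right singular vectors);
# we simulate that with the Q factor of a reduced QR decomposition.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((hidden, n_feats)))
V = Q.T  # (n_feats, hidden), orthonormal rows like Vt[:K, :]

# A hidden state to analyze.
h = rng.standard_normal(hidden)

# Feature activations: alignment of h with each gate direction.
a = V @ h  # (n_feats,)

# "Reconstruction" is only the projection onto the 64-dim subspace;
# without singular values, the rest of h is irrecoverable.
h_proj = V.T @ a

# Fraction of the norm captured by the top-64 subspace (strictly < 1 here).
frac = np.linalg.norm(h_proj) ** 2 / np.linalg.norm(h) ** 2
print(f"fraction of norm captured: {frac:.4f}")
```

With a random hidden state the captured fraction is small (roughly 64/4096); for states that actually drive an expert's gate, the interesting signal is in `a` itself.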
## Usage

```python
import json

import numpy as np

# Memory-map the binary; nothing is read from disk until sliced
arr = np.memmap("gate_vectors.bin", dtype=np.float32, mode="r")

idx = json.load(open("gate_vectors_index.json"))
moe = idx["model_config"]["moe"]
n_experts = moe["n_routed_experts"]
n_feats = idx["num_feats"]
hidden = moe["hidden_size"]

# Get layer L's experts
def get_layer(L):
    meta = idx["layers"][str(L)]
    offset = meta["file_offset"] // 4  # bytes → float32 elements
    n = n_experts * n_feats * hidden
    return arr[offset:offset + n].reshape(n_experts, n_feats, hidden)

V_L1 = get_layer(1)  # shape (n_experts, n_feats, hidden)
print("L1 expert 0 top vector L2 norm:", np.linalg.norm(V_L1[0, 0]))  # ≈ 1.0
```

## Citation

If you use this vindex in research, please cite:

```bibtex
@misc{divinci_deepseek_v4_flash_vindex_2026,
  title  = {deepseek-v4-flash-vindex: per-expert gate-vector vindex for deepseek-ai/DeepSeek-V4-Flash},
  author = {Divinci-AI},
  year   = {2026},
  url    = {https://huggingface.co/Divinci-AI/deepseek-v4-flash-vindex},
}
```

Built using [`moe_vindex_builder.py`](https://github.com/Divinci-AI/server/blob/preview/notebooks/moe_vindex_builder.py).
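As a sanity check, the flat `[moe_layers, n_experts, num_feats, hidden_size]` layout determines the per-layer byte offsets and the total file size directly from the specs above (43 layers, 256 routed experts, 64 features, 4096 hidden, float32):

```python
# Layout arithmetic for gate_vectors.bin: one contiguous float32 block per layer.
moe_layers, n_experts, num_feats, hidden = 43, 256, 64, 4096
BYTES_PER_F32 = 4

bytes_per_layer = n_experts * num_feats * hidden * BYTES_PER_F32
total_bytes = moe_layers * bytes_per_layer
print(total_bytes)  # 11542724608 ≈ 11.54 GB, matching the card

# Expected file_offset for each layer L; the values in
# gate_vectors_index.json should agree with this progression.
offsets = [L * bytes_per_layer for L in range(moe_layers)]
```

If an offset in `gate_vectors_index.json` ever disagrees with this arithmetic, the index (not the formula) is authoritative, since it is written by the builder alongside the binary.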