---
license: other
tags:
- vindex
- moe
- sparse-routing
- mechanistic-interpretability
base_model: moonshotai/Kimi-K2-Instruct
---

# kimi-k2-instruct-vindex

Per-expert gate-vector vindex for `moonshotai/Kimi-K2-Instruct`, built by the [Divinci-AI](https://huggingface.co/Divinci-AI) team for use with [LarQL](https://github.com/chrishayuk/larql) (Chris Hay) and adjacent feature-routing inference research.

## Vindex specs

- **Source model**: `moonshotai/Kimi-K2-Instruct`
- **Architecture**: `kimi_k2` (61 layers, 7168 hidden, 2048 moe_intermediate)
- **Experts**: 384 routed + 1 shared, 8 active per token
- **Layers indexed**: 60 MoE layers (L01-L60)
- **Features per expert**: 64 (top-K right singular vectors of `gate_proj`)
- **Format**: float32, mmap-friendly contiguous binary
- **Total size**: 42.28 GB
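
The specs above determine the binary's size exactly: 60 layers × 384 experts × 64 features × 7168 hidden × 4 bytes per float32. A quick arithmetic sanity check:

```python
# Sanity-check the total size implied by the specs above.
moe_layers = 60      # indexed MoE layers (L01-L60)
n_experts = 384      # routed experts per layer
num_feats = 64       # top-K right singular vectors kept per expert
hidden = 7168        # model hidden size
bytes_per = 4        # float32

total_bytes = moe_layers * n_experts * num_feats * hidden * bytes_per
print(total_bytes)                  # 42278584320
print(round(total_bytes / 1e9, 2))  # 42.28 (decimal GB)
```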
|
## What this is

- **`gate_vectors.bin`**: flat float32 binary, layout `[moe_layers, n_experts, num_feats, hidden_size]`. Each per-expert chunk holds the top-64 right singular vectors (`Vt[:K, :]`) of that expert's `gate_proj` weight after fp8/MXFP4 dequantization.
- **`gate_vectors_index.json`**: sidecar with per-layer `file_offset` (bytes), `shape`, and SVD stats (`median_var64`, `q25_var64`, `q75_var64`). Serves as the lookup table for mmap access.
- **`phase1_moe_svd.json`**: full per-layer Phase 1 stats (routed/shared/router decomposition).
- **`phase2_router_svd.json`**: per-layer router weight SVD (top-K variance, effective rank, s0/s1 ratio).
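
Because each expert's chunk is a set of orthonormal rows (`Vt[:K, :]`), projecting a hidden state onto it yields 64 per-expert feature activations. A minimal sketch with random stand-in data (the orthonormal matrix below is illustrative, not read from the actual files):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, num_feats = 7168, 64

# Stand-in for one expert's chunk: orthonormal rows, like Vt[:K, :] from an SVD.
# (Illustrative random subspace; in practice this slice comes from gate_vectors.bin.)
Q, _ = np.linalg.qr(rng.standard_normal((hidden, num_feats)))
Vt = Q.T  # (num_feats, hidden)

h = rng.standard_normal(hidden)  # stand-in hidden state
scores = Vt @ h                  # 64 feature activations for this expert

# Fraction of ||h||^2 captured by the expert's top-64 subspace
energy = np.linalg.norm(scores) ** 2 / np.linalg.norm(h) ** 2
print(scores.shape, float(energy))
```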
|
## What this is **not**

- Not a runnable model (no inference path on its own).
- Not raw weights: only the top-K right singular vectors of `gate_proj` are stored, and the singular values are *not retained*, so reconstruction is lossy.
- Not a fine-tune or quantization of the base model.
|
## Usage

```python
import json

import numpy as np

# Memory-map the flat float32 binary
arr = np.memmap("gate_vectors.bin", dtype=np.float32, mode="r")

# Load the sidecar index (offsets, shapes, model config)
with open("gate_vectors_index.json") as f:
    idx = json.load(f)

moe = idx["model_config"]["moe"]
n_experts = moe["n_routed_experts"]
n_feats = idx["num_feats"]
hidden = moe["hidden_size"]

def get_layer(L):
    """Return layer L's expert gate vectors as (n_experts, n_feats, hidden)."""
    meta = idx["layers"][str(L)]
    offset = meta["file_offset"] // 4  # bytes -> float32 elements
    n = n_experts * n_feats * hidden
    return arr[offset:offset + n].reshape(n_experts, n_feats, hidden)

V_L1 = get_layer(1)  # shape (n_experts, n_feats, hidden)
print("L1 expert 0 top vector L2 norm:", np.linalg.norm(V_L1[0, 0]))  # ~1.0
```
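If only one layer is needed, the byte `file_offset` from the sidecar can be passed straight to `np.memmap` so that only that slice is mapped, rather than mapping the whole 42 GB file and slicing. A self-contained sketch using tiny synthetic stand-ins for the real files (real per-layer dims are 384 × 64 × 7168; the `demo_` filename is hypothetical):

```python
import numpy as np

# Tiny synthetic stand-in for gate_vectors.bin: two layers written back to back.
n_experts, n_feats, hidden = 4, 3, 8
layer1 = np.arange(n_experts * n_feats * hidden, dtype=np.float32)
layer2 = layer1 + 1000.0
with open("demo_gate_vectors.bin", "wb") as f:
    f.write(layer1.tobytes())
    f.write(layer2.tobytes())

# Equivalent of idx["layers"]["2"]["file_offset"] from the sidecar index.
file_offset_L2 = layer1.nbytes

# Map only layer 2: pass the byte offset directly instead of slicing a full-file map.
V_L2 = np.memmap("demo_gate_vectors.bin", dtype=np.float32, mode="r",
                 offset=file_offset_L2, shape=(n_experts, n_feats, hidden))
print(V_L2.shape, float(V_L2[0, 0, 0]))  # (4, 3, 8) 1000.0
```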
|
## Citation

If you use this vindex in research, please cite:
```bibtex
@misc{divinci_kimi_k2_instruct_vindex_2026,
  title  = {kimi-k2-instruct-vindex: per-expert gate-vector vindex for moonshotai/Kimi-K2-Instruct},
  author = {Divinci-AI},
  year   = {2026},
  url    = {https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex},
}
```

Built using [`moe_vindex_builder.py`](https://github.com/Divinci-AI/server/blob/preview/notebooks/moe_vindex_builder.py).