---
license: other
tags:
  - vindex
  - moe
  - sparse-routing
  - mechanistic-interpretability
base_model: moonshotai/Kimi-K2-Instruct
---

# kimi-k2-instruct-vindex

Per-expert gate-vector vindex for `moonshotai/Kimi-K2-Instruct`, built by the [Divinci-AI](https://huggingface.co/Divinci-AI) team for use with [LarQL](https://github.com/chrishayuk/larql) (Chris Hay) and adjacent feature-routing inference research.

## Vindex specs
- **Source model**: `moonshotai/Kimi-K2-Instruct`
- **Architecture**: `kimi_k2` (61 layers, 7168 hidden, 2048 moe_intermediate)
- **Experts**: 384 routed + 1 shared, 8 per token
- **Layers indexed**: 60 MoE layers (L01-L60)
- **Features per expert**: 64 (top-K right singular vectors of `gate_proj`)
- **Format**: float32, mmap-friendly contiguous binary
- **Total size**: 42.28 GB (see the size check below)
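
The total follows directly from the layout above (60 MoE layers × 384 experts × 64 features × 7168 hidden × 4 bytes per float32):

```python
layers, experts, feats, hidden = 60, 384, 64, 7168   # values from the specs above
n_bytes = layers * experts * feats * hidden * 4       # float32 = 4 bytes per element
print(n_bytes, f"{n_bytes / 1e9:.2f} GB")             # 42278584320 42.28 GB
```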

## What this is
- **`gate_vectors.bin`** — flat float32 binary, layout `[moe_layers, n_experts, num_feats, hidden_size]`. Each per-expert chunk is the top-64 right singular vectors (`Vt[:K, :]`) of that expert's `gate_proj` weight after fp8/MXFP4 dequantization.
- **`gate_vectors_index.json`** — sidecar with per-layer `file_offset` (bytes), `shape`, and SVD stats (`median_var64`, `q25_var64`, `q75_var64`). Lookup table for mmap; see the browsing sketch after this list.
- **`phase1_moe_svd.json`** — full per-layer Phase 1 stats (routed/shared/router decomposition).
- **`phase2_router_svd.json`** — router weight SVD per layer (top-K variance, effective rank, s0/s1 ratio).
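
The sidecar can be inspected without touching the binary. A minimal sketch, assuming the stat keys listed above and the numeric-string layer keys used by the Usage snippet below:

```python
import json

with open("gate_vectors_index.json") as f:
    idx = json.load(f)

# Print per-layer byte offsets, shapes, and median top-64 variance
for L, meta in sorted(idx["layers"].items(), key=lambda kv: int(kv[0])):
    print(f"L{int(L):02d}  offset={meta['file_offset']:>14,d}  "
          f"shape={meta['shape']}  median_var64={meta['median_var64']:.4f}")
```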

## What this is **not**
- Not a runnable model (no inference path on its own).
- Not raw weights — only top-K right singular vectors of `gate_proj`, with the singular values *not retained*. Reconstruction is lossy.
- Not a fine-tune or quantization of the base model.

## Usage

```python
import json

import numpy as np

# Load the sidecar index: per-layer byte offsets, shapes, and SVD stats
with open("gate_vectors_index.json") as f:
    idx = json.load(f)

moe = idx["model_config"]["moe"]
n_experts = moe["n_routed_experts"]
n_feats = idx["num_feats"]
hidden = moe["hidden_size"]

# Memory-map the flat float32 binary (no copy; pages in lazily)
arr = np.memmap("gate_vectors.bin", dtype=np.float32, mode="r")

def get_layer(L):
    """Return layer L's gate vectors, shape (n_experts, n_feats, hidden)."""
    meta = idx["layers"][str(L)]
    offset = meta["file_offset"] // 4  # bytes → float32 elements
    n = n_experts * n_feats * hidden
    return arr[offset:offset + n].reshape(n_experts, n_feats, hidden)

V_L1 = get_layer(1)  # shape (n_experts, n_feats, hidden)
print("L1 expert 0 top vector L2 norm:", np.linalg.norm(V_L1[0, 0]))  # ≈ 1.0
```
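
Continuing from the snippet above, a sketch of how the vectors can serve as feature probes: project a hidden state onto each expert's feature directions and rank experts by the energy captured. The random vector is a stand-in for a real activation from the base model, and this scoring is an illustration, not the model's actual router.

```python
# Illustration only: rank layer-1 experts by how much of a hidden state
# their 64 feature directions capture. `h` stands in for a real activation.
h = np.random.randn(hidden).astype(np.float32)

V = get_layer(1)                                      # (n_experts, n_feats, hidden)
feature_acts = V @ h                                  # (n_experts, n_feats)
expert_energy = np.linalg.norm(feature_acts, axis=1)  # energy captured per expert

top8 = np.argsort(expert_energy)[::-1][:8]            # same top-8 budget as the router
print("top-8 experts by captured energy:", top8)
```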

## Citation

If you use this vindex in research, please cite:

```bibtex
@misc{divinci_kimi_k2_instruct_vindex_2026,
  title  = {kimi-k2-instruct-vindex: per-expert gate-vector vindex for moonshotai/Kimi-K2-Instruct},
  author = {Divinci-AI},
  year   = {2026},
  url    = {https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex},
}
```

Built using [`moe_vindex_builder.py`](https://github.com/Divinci-AI/server/blob/preview/notebooks/moe_vindex_builder.py).