---
license: other
tags:
- vindex
- moe
- sparse-routing
- mechanistic-interpretability
base_model: moonshotai/Kimi-K2-Instruct
---

# kimi-k2-instruct-vindex

Per-expert gate-vector vindex for `moonshotai/Kimi-K2-Instruct`, built by the [Divinci-AI](https://huggingface.co/Divinci-AI) team for use with [LarQL](https://github.com/chrishayuk/larql) (Chris Hay) and adjacent feature-routing inference research.

## Vindex specs

- **Source model**: `moonshotai/Kimi-K2-Instruct`
- **Architecture**: `kimi_k2` (61 layers, 7168 hidden, 2048 moe_intermediate)
- **Experts**: 384 routed + 1 shared, 8 active per token
- **Layers indexed**: 60 MoE layers (L01-L60)
- **Features per expert**: 64 (top-K right singular vectors of `gate_proj`)
- **Format**: float32, mmap-friendly contiguous binary
- **Total size**: 42.28 GB
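
The specs above determine the binary's size exactly: 60 layers × 384 experts × 64 features × 7168 hidden × 4 bytes per float32. A quick arithmetic sanity check:

```python
# Sanity-check the total size implied by the specs above.
moe_layers = 60      # indexed MoE layers (L01-L60)
n_experts = 384      # routed experts per layer
num_feats = 64       # top-K right singular vectors kept per expert
hidden = 7168        # model hidden size
bytes_per = 4        # float32

total_bytes = moe_layers * n_experts * num_feats * hidden * bytes_per
print(total_bytes)                  # 42278584320
print(round(total_bytes / 1e9, 2))  # 42.28 (decimal GB)
```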
|
## What this is

- **`gate_vectors.bin`**: flat float32 binary, layout `[moe_layers, n_experts, num_feats, hidden_size]`. Each per-expert chunk holds the top-64 right singular vectors (`Vt[:K, :]`) of that expert's `gate_proj` weight after fp8/MXFP4 dequantization.
- **`gate_vectors_index.json`**: sidecar with per-layer `file_offset` (bytes), `shape`, and SVD stats (`median_var64`, `q25_var64`, `q75_var64`). Serves as the lookup table for mmap access.
- **`phase1_moe_svd.json`**: full per-layer Phase 1 stats (routed/shared/router decomposition).
- **`phase2_router_svd.json`**: per-layer router weight SVD (top-K variance, effective rank, s0/s1 ratio).
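
Because each expert's chunk is a set of orthonormal rows (`Vt[:K, :]`), projecting a hidden state onto it yields 64 per-expert feature activations. A minimal sketch with random stand-in data (the orthonormal matrix below is illustrative, not read from the actual files):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, num_feats = 7168, 64

# Stand-in for one expert's chunk: orthonormal rows, like Vt[:K, :] from an SVD.
# (Illustrative random subspace; in practice this slice comes from gate_vectors.bin.)
Q, _ = np.linalg.qr(rng.standard_normal((hidden, num_feats)))
Vt = Q.T  # (num_feats, hidden)

h = rng.standard_normal(hidden)  # stand-in hidden state
scores = Vt @ h                  # 64 feature activations for this expert

# Fraction of ||h||^2 captured by the expert's top-64 subspace
energy = np.linalg.norm(scores) ** 2 / np.linalg.norm(h) ** 2
print(scores.shape, float(energy))
```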
|
## What this is **not**

- Not a runnable model (no inference path on its own).
- Not raw weights: only the top-K right singular vectors of `gate_proj` are stored, and the singular values are *not retained*, so reconstruction is lossy.
- Not a fine-tune or quantization of the base model.
|
## Usage

```python
import json

import numpy as np

# Memory-map the flat float32 binary
arr = np.memmap("gate_vectors.bin", dtype=np.float32, mode="r")

# Load the sidecar index (offsets, shapes, model config)
with open("gate_vectors_index.json") as f:
    idx = json.load(f)

moe = idx["model_config"]["moe"]
n_experts = moe["n_routed_experts"]
n_feats = idx["num_feats"]
hidden = moe["hidden_size"]

def get_layer(L):
    """Return layer L's expert gate vectors as (n_experts, n_feats, hidden)."""
    meta = idx["layers"][str(L)]
    offset = meta["file_offset"] // 4  # bytes -> float32 elements
    n = n_experts * n_feats * hidden
    return arr[offset:offset + n].reshape(n_experts, n_feats, hidden)

V_L1 = get_layer(1)  # shape (n_experts, n_feats, hidden)
print("L1 expert 0 top vector L2 norm:", np.linalg.norm(V_L1[0, 0]))  # ~1.0
```
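If only one layer is needed, the byte `file_offset` from the sidecar can be passed straight to `np.memmap` so that only that slice is mapped, rather than mapping the whole 42 GB file and slicing. A self-contained sketch using tiny synthetic stand-ins for the real files (real per-layer dims are 384 × 64 × 7168; the `demo_` filename is hypothetical):

```python
import numpy as np

# Tiny synthetic stand-in for gate_vectors.bin: two layers written back to back.
n_experts, n_feats, hidden = 4, 3, 8
layer1 = np.arange(n_experts * n_feats * hidden, dtype=np.float32)
layer2 = layer1 + 1000.0
with open("demo_gate_vectors.bin", "wb") as f:
    f.write(layer1.tobytes())
    f.write(layer2.tobytes())

# Equivalent of idx["layers"]["2"]["file_offset"] from the sidecar index.
file_offset_L2 = layer1.nbytes

# Map only layer 2: pass the byte offset directly instead of slicing a full-file map.
V_L2 = np.memmap("demo_gate_vectors.bin", dtype=np.float32, mode="r",
                 offset=file_offset_L2, shape=(n_experts, n_feats, hidden))
print(V_L2.shape, float(V_L2[0, 0, 0]))  # (4, 3, 8) 1000.0
```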
|
## Citation

If you use this vindex in research, please cite:
```bibtex
@misc{divinci_kimi_k2_instruct_vindex_2026,
  title  = {kimi-k2-instruct-vindex: per-expert gate-vector vindex for moonshotai/Kimi-K2-Instruct},
  author = {Divinci-AI},
  year   = {2026},
  url    = {https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex},
}
```

Built using [`moe_vindex_builder.py`](https://github.com/Divinci-AI/server/blob/preview/notebooks/moe_vindex_builder.py).