---
title: Divinci AI
emoji: 🧠
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
short_description: Feature-level interpretability for open transformers
---

# Divinci AI

Feature-level interpretability artifacts for open transformers — built openly, validated empirically.

A **vindex** is a transformer's weights decompiled into a queryable feature database. It exposes the entity associations, circuit structure, and knowledge-editing surfaces that live inside a model's FFN layers — without requiring GPU inference for most operations. Think of it as the model's index: the thing you search before you run it.

---

## Interactive viewer

[![LarQL Vindex Viewer — interactive 3D + 2D circuit visualization](https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif)](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)

**[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**

Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **⇌ Compare** to render the current model alongside Bonsai 1-bit, side by side — the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization.

Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack — backed by a 5,000-token search index built offline.
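The query string above composes like any URL, so searches are shareable as deep links. A minimal Python sketch that builds one — the Space URL and the `q`/`model` parameters are exactly those shown above; the helper itself is illustrative, not part of the viewer:

```python
# Build a shareable deep link into the viewer for an entity-feature search.
# The ?q= and &model= parameters are the ones documented above; no further
# query API is assumed here.
from urllib.parse import urlencode

VIEWER = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_url(query: str, model: str) -> str:
    """Return a viewer link pre-filtered to one entity query and model."""
    return f"{VIEWER}?{urlencode({'q': query, 'model': model})}"

print(viewer_url("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```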
---

## Published vindexes

Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, **Moonshot MoE**, **DeepSeek-V4 MoE**, plus two 1-bit controls.

| Model | Architecture | Params | Vindex | C4 / var@64 | Status | Notes |
|---|---|---|---|---|---|---|
| Gemma 4 E2B-it | Dense (Gemma 4) | 2B | gemma-4-e2b-vindex | 0.0407 ± 0.0004 ✓ | Complete | 3-seed validated; headline universal-constant model |
| Qwen3-0.6B | Dense (Qwen 3) | 0.6B | qwen3-0.6b-vindex | 0.411 | Complete | Smallest published; Qwen3 family-elevated C4 |
| Qwen3-8B bf16 | Dense (Qwen 3) | 8B | qwen3-8b-vindex | 0.804 | Complete | Architecture control for Bonsai |
| Qwen3.6-35B-A3B | MoE (Qwen 3.6) | 35B / 3B active | qwen3.6-35b-a3b-vindex | — | Complete | 256 experts, 40 layers |
| Ministral-3B | Dense (Mistral 3) | 3B | ministral-3b-vindex | 0.265 | Complete | Post-quant fp8 → bf16; non-dissolved spectrum |
| Llama 3.1-8B | Dense (Llama 3.1) | 8B | llama-3.1-8b-vindex | 0.012 ✓ | Complete | Llama family signature |
| MedGemma 1.5-4B | Dense (Gemma multimodal) | 4B | medgemma-1.5-4b-vindex | 1.898 ⚠ | Complete | 45× cohort anomaly — under investigation |
| GPT-OSS 120B | MoE (OpenAI) | 120B | gpt-oss-120b-vindex | — | Complete | S[0] grows 117× with depth (L0 = 111 → final = 13,056) |
| Kimi-K2-Instruct | MoE fp8-native (DeepSeek-V3 style) | 1T / 32B active | kimi-k2-instruct-vindex | 0.0938 (MoE median) ‡ | Complete | 60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than the initial dome SVD suggested |
| DeepSeek-V4-Flash | MoE MXFP4 (DeepSeek-V4) | 43L / 256 experts / 6 active | publishing soon | — | Phase 1B running | 43-layer all-MoE; first-peak L17 + double-bend profile (distinct from Kimi's smooth dome); MXFP4 unpacker added to builder |
| DeepSeek-V4-Pro | MoE MXFP4 (DeepSeek-V4) | 61L / 384 experts / 6 active | queued | — | Queued | Same scale as Kimi-K2 (60–61 layers × 384 experts × 7168 hidden); MXFP4 expert weights |
| Bonsai 8B | 1-bit (Qwen 3 base, post-quantized) | 8B | vindex pending publish | 0.093 (var@64) | Phase 1 complete | C5 = 1 (circuit dissolved); n = 1 of 1-bit dissolution |
| BitNet b1.58-2B-4T | 1-bit (Microsoft, native) | 2B | vindex pending publish | 0.111 (var@64) | Phase 1 complete | n = 2 dissolution confirmation; native 1-bit training |

‡ *Kimi-K2 final: 60 MoE layers (L01–L60), gate_proj SVD, median var@64 = 0.0938 (range 0.083–0.108). Phase 1 + Phase 1B + Phase 2 all complete 2026-04-24; 42.28 GB binary published. DeepSeek-V4 series builds with the MXFP4 unpacker (V4-Flash Phase 1B in progress 2026-04-25, V4-Pro queued). Card updates in-place as phases land.*
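For orientation, here is roughly what a var@64-style measurement looks like in code. The exact definition LarQL uses isn't spelled out on this card, so this sketch **assumes** var@64 is the variance of the top-64 singular values of a layer's FFN projection after mean-normalization, with the per-model score taken as the median over layers (as the Kimi-K2 footnote does). Treat it as illustrative, not canonical:

```python
# ASSUMED definition of var@64: variance of the mean-normalized top-64
# singular values of one FFN projection (e.g. gate_proj). A flat spectrum
# (1-bit "dissolution") gives a value near 0; a structured fp16 spectrum
# gives a larger one.
import numpy as np

def var_at_64(weight: np.ndarray, k: int = 64) -> float:
    """Variance of the mean-normalized top-k singular values of one matrix."""
    s = np.linalg.svd(weight, compute_uv=False)[:k]  # descending order
    return float(np.var(s / s.mean()))

def model_var_at_64(gate_projs: list[np.ndarray]) -> float:
    """Per-model score: median over layers, as in the Kimi-K2 footnote."""
    return float(np.median([var_at_64(w) for w in gate_projs]))
```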
---

## What's a vindex?

Standard model weights tell you *what* a model computes. A vindex tells you *where* it stores specific knowledge and *which features* need to change for a targeted edit.

Concretely: given a query like `"Paris → capital"`, a vindex walk returns the layers, feature directions, and token associations that encode that fact. A patch operation writes a rank-1 ΔW that suppresses or overwrites that association — compiled back to standard HuggingFace safetensors for inference. (A minimal sketch of the patch shape appears at the end of this card.)

LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayuk/larql](https://github.com/chrishayuk/larql) | [github.com/Divinci-AI/larql](https://github.com/Divinci-AI/larql).

---

## Research

### Paper 1 — *Architectural Invariants of Transformer Computation*

*arXiv preprint forthcoming*

Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization** — replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size.**

The headline universal constant — layer temperature C4 — is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.

### Paper 2 — *Constellation Edits*

*Draft; arXiv after 3-seed runs + α-sweep appendix*

Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about the failure.

### Companion blog series — *The Interpretability Diaries*

- [Part I — The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) — five universal constants, what holds and what doesn't
- [Part II — Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) — Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity
- [Part III — When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) — three dissolution datapoints (BitNet, Bonsai, Kimi-K2): var@64 ≈ 0.09–0.10 for 1-bit and fp8-native models vs ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure.

Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

---

## Working in public

Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

If you replicate a result and find a discrepancy, open an issue on the LarQL repo.

---

*Vindexes in this org are free for academic and research use (CC BY-NC 4.0). Commercial licensing: mike@divinci.ai*
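As promised in *What's a vindex?* above, a minimal sketch of the patch shape. The rank-1 form `W' = W + α · u vᵀ` follows from the card's description of the edit; the function name, signature, the choice of `u` and `v`, and the tensor name are illustrative, not the LarQL API:

```python
# Minimal sketch of a rank-1 knowledge patch. ASSUMPTION: the edit has the
# form W' = W + alpha * (u v^T), with u a feature direction in the FFN
# output space and v a key direction in its input space. apply_rank1_patch
# is a hypothetical helper, not a LarQL function.
import torch
from safetensors.torch import save_file

def apply_rank1_patch(W: torch.Tensor, u: torch.Tensor, v: torch.Tensor,
                      alpha: float = -1.0) -> torch.Tensor:
    """Return W plus a rank-1 update; negative alpha suppresses the association."""
    assert u.shape == (W.shape[0],) and v.shape == (W.shape[1],)
    return W + alpha * torch.outer(u, v)

# Patched weights drop straight back into a standard safetensors checkpoint,
# so downstream inference needs no custom code (tensor name is illustrative).
W_patched = apply_rank1_patch(torch.randn(8, 4), torch.randn(8), torch.randn(4))
save_file({"model.layers.0.mlp.down_proj.weight": W_patched}, "patched.safetensors")
```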