---
title: Divinci AI
emoji: 🧠
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
short_description: Feature-level interpretability for open transformers
---
# Divinci AI
Feature-level interpretability artifacts for open transformers: built openly, validated empirically.
A **vindex** is a transformer's weights decompiled into a queryable feature database. It exposes the entity associations, circuit structure, and knowledge-editing surfaces that live inside a model's FFN layers, without requiring GPU inference for most operations.
Think of it as the model's index: the thing you search before you run it.
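Each published vindex repo documents its own schema; purely to make the idea concrete, here is a minimal sketch of what a queryable feature table looks like, with hypothetical field names (not the published format):

```python
from dataclasses import dataclass

@dataclass
class FeatureEntry:
    """One vindex row. Field names are hypothetical, not the published schema."""
    layer: int              # FFN layer the feature lives in
    feature_id: int         # index into that layer's feature set
    top_tokens: list[str]   # tokens most associated with this feature
    activation: float       # probe-derived activation strength

def lookup(index: list[FeatureEntry], token: str) -> list[FeatureEntry]:
    """A query is a plain table scan: no GPU, no forward pass."""
    return [e for e in index if token in e.top_tokens]
```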
---
## Interactive viewer
[![LarQL Vindex Viewer: interactive 3D + 2D circuit visualization](https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif)](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)
**[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**
Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **⇌ Compare** to render the current model alongside Bonsai 1-bit, side by side: the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5000-token offline-built search index.
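The search deep link is a plain query string, so links can be generated programmatically. A quick sketch using only the two parameters shown in the example above:

```python
from urllib.parse import urlencode

VIEWER = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    # q and model are the two parameters from the example above
    return f"{VIEWER}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```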
---
## Published vindexes
Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, **Moonshot MoE**, **DeepSeek-V4 MoE**, plus two 1-bit controls.
<table>
<thead>
<tr><th>MODEL</th><th>ARCHITECTURE</th><th>PARAMS</th><th>VINDEX</th><th>C4 / var@64</th><th>STATUS</th><th>NOTES</th></tr>
</thead>
<tbody>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong> ✓</td><td>Complete</td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Complete</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Complete</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>–</td><td>Complete</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>Complete</td><td>Post-quant fp8 → bf16; non-dissolved spectrum</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong> ✓</td><td>Complete</td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898 ⚠</strong></td><td>Complete</td><td>45× cohort anomaly; under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>–</td><td>Complete</td><td>S[0] grows 117× with depth (L0 = 111 → final = 13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex">kimi-k2-instruct-vindex</a></td><td><strong>0.0938</strong> (MoE median) †</td><td>Complete</td><td>60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than initial dome SVD suggested</td></tr>
<tr><td><strong>DeepSeek-V4-Flash</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>43L / 256 experts / 6 active</td><td><em>publishing soon</em></td><td>–</td><td><strong>Phase 1B running</strong></td><td>43-layer all-MoE; first-peak L17 + double-bend profile (distinct from Kimi's smooth dome); MXFP4 unpacker added to builder</td></tr>
<tr><td><strong>DeepSeek-V4-Pro</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>61L / 384 experts / 6 active</td><td><em>queued</em></td><td>–</td><td>Queued</td><td>Same scale as Kimi-K2 (60–61 layers × 384 experts × 7168 hidden); MXFP4 expert weights</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.093 (var@64)</td><td>Phase 1 complete</td><td><strong>C5 = 1</strong> (circuit dissolved); n=1 of 1-bit dissolution</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>0.111 (var@64)</td><td>Phase 1 complete</td><td>n=2 dissolution confirmation; native 1-bit training</td></tr>
</tbody>
</table>
†*Kimi-K2 final: 60 MoE layers (L01–L60), gate_proj SVD, median var@64 = 0.0938 (range 0.083–0.108). Phase 1 + Phase 1B + Phase 2 all complete 2026-04-24; 42.28 GB binary published. DeepSeek-V4 series builds with the MXFP4 unpacker (V4-Flash 1B in progress 2026-04-25, V4-Pro queued). Card updates in place as phases land.*
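The builder's MXFP4 unpacker isn't excerpted here, but the decode rule comes straight from the OCP Microscaling spec: 4-bit E2M1 elements in blocks of 32, each block sharing one 8-bit power-of-two (E8M0) scale. A standalone sketch; the low-nibble-first packing order is an assumption, since exporters differ:

```python
import numpy as np

# FP4 E2M1 code -> value: 8 magnitudes, sign in the top bit.
FP4_LUT = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def unpack_mxfp4(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """packed: (n_blocks, 16) uint8, two FP4 codes per byte = 32 elements/block.
    scales: (n_blocks,) uint8, shared E8M0 exponents (value = 2**(s - 127))."""
    codes = np.empty((packed.shape[0], 32), dtype=np.uint8)
    codes[:, 0::2] = packed & 0x0F   # assumed: first element in the low nibble
    codes[:, 1::2] = packed >> 4     # second element in the high nibble
    block_scale = np.exp2(scales.astype(np.float32) - 127.0)
    return FP4_LUT[codes] * block_scale[:, None]
```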
---
## What's a vindex?
Standard model weights tell you *what* a model computes. A vindex tells you *where* it stores specific knowledge and *which features* need to change for a targeted edit.
Concretely: given a query like `"Paris → capital"`, a vindex walk returns the layers, feature directions, and token associations that encode that fact. A patch operation writes a rank-1 ΔW that suppresses or overwrites that association, compiled back to standard HuggingFace safetensors for inference.
LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayuk/larql](https://github.com/chrishayuk/larql) | [github.com/Divinci-AI/larql](https://github.com/Divinci-AI/larql).
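The patch itself is ordinary linear algebra at the weight level. A minimal sketch of the ΔW arithmetic, not LarQL's actual API; the tensor key and the directions `u`, `v` are placeholders a real vindex walk would supply:

```python
import torch
from safetensors.torch import load_file, save_file

def apply_rank1_patch(src: str, dst: str, key: str,
                      u: torch.Tensor, v: torch.Tensor, alpha: float = -1.0):
    """W <- W + alpha * u v^T for one tensor, written back to safetensors.
    Negative alpha suppresses the association; positive alpha reinforces it.
    key, u, and v are placeholders for what a vindex walk returns."""
    tensors = load_file(src)
    W = tensors[key]
    delta = alpha * torch.outer(u.to(W.dtype), v.to(W.dtype))  # rank-1 ΔW
    tensors[key] = W + delta
    save_file(tensors, dst)
```

Because the output is a standard safetensors checkpoint, the patched file loads for inference exactly like the original.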
---
## Research
### Paper 1: *Architectural Invariants of Transformer Computation*
*arXiv preprint forthcoming*
Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization**, replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
The headline universal constant, layer temperature C4, is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with the circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
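For replication, the three-seed aggregation is simple once the per-layer metric is in hand. The sketch below assumes var@64 is the variance of the top 64 singular values normalized by the leading one; the canonical definition lives in the LarQL notebooks:

```python
import numpy as np

def var_at_64(W: np.ndarray) -> float:
    """var@64 for one projection matrix. Normalizing the top-64 singular
    values by s[0] is an assumption; see the LarQL notebooks for the
    canonical definition."""
    s = np.linalg.svd(W, compute_uv=False)[:64]
    return float(np.var(s / s[0]))

# Summarize per-seed C4 values as mean ± sample std. The numbers here are
# illustrative placeholders consistent with the reported 0.0407 ± 0.0004:
c4_per_seed = [0.0403, 0.0407, 0.0411]
print(f"C4 = {np.mean(c4_per_seed):.4f} ± {np.std(c4_per_seed, ddof=1):.4f}")
```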
### Paper 2: *Constellation Edits*
*draft, arXiv after 3-seed runs + α-sweep appendix*
Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.
### Companion blog series: *The Interpretability Diaries*
- [Part I – The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) – five universal constants, what holds and what doesn't
- [Part II – Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) – Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity
- [Part III – When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) – three dissolution datapoints (BitNet, Bonsai, Kimi-K2): var@64 ≈ 0.09–0.10 for 1-bit and fp8-native vs. ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure.
Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)
---
## Working in public
Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones: the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.
If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
---
*Vindexes on this org are free for academic and research use (CC-BY-NC 4.0). Commercial licensing: mike@divinci.ai*