---
title: Divinci AI
emoji: 🧠
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
short_description: Feature-level interpretability for open transformers
---
# Divinci AI
Feature-level interpretability artifacts for open transformers, built openly and validated empirically.
A **vindex** is a transformer's weights decompiled into a queryable feature database. It exposes the entity associations, circuit structure, and knowledge-editing surfaces that live inside a model's FFN layers, without requiring GPU inference for most operations.
Think of it as the model's index: the thing you search before you run it.
---
## Interactive viewer
**[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**
Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **Compare** to render the current model alongside Bonsai 1-bit, side by side; the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
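As a small, hedged illustration of the query-string format above (the `viewer_link` helper is hypothetical, and whether the Space page accepts the parameters directly or hands them to its embedded frontend is an assumption on our part):

```python
from urllib.parse import urlencode

# Sketch only: build a viewer deep-link from the ?q=...&model=... parameters
# described above. Nothing here is part of a documented viewer API.
VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```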
---
## Published vindexes
Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, **Moonshot MoE**, **DeepSeek-V4 MoE**, plus two 1-bit controls.
<table>
<thead>
<tr><th>MODEL</th><th>ARCHITECTURE</th><th>PARAMS</th><th>VINDEX</th><th>C4 / var@64</th><th>STATUS</th><th>NOTES</th></tr>
</thead>
<tbody>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong></td><td>Complete</td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Complete</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Complete</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>—</td><td>Complete</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>Complete</td><td>Post-quant fp8 → bf16; non-dissolved spectrum</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong></td><td>Complete</td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898</strong></td><td>Complete</td><td>45× cohort anomaly; under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>—</td><td>Complete</td><td>S[0] grows 117× with depth (L0=111 → final=13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex">kimi-k2-instruct-vindex</a></td><td><strong>0.0938</strong> (MoE median) ‡</td><td>Complete</td><td>60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than initial dome SVD suggested</td></tr>
<tr><td><strong>DeepSeek-V4-Flash</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>43L / 256 experts / 6 active</td><td><em>publishing soon</em></td><td><strong>—</strong></td><td><strong>Phase 1B running</strong></td><td>43-layer all-MoE; first-peak L17 + double-bend profile (distinct from Kimi's smooth dome); MXFP4 unpacker added to builder</td></tr>
<tr><td><strong>DeepSeek-V4-Pro</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>61L / 384 experts / 6 active</td><td><em>queued</em></td><td>—</td><td>Queued</td><td>Same scale as Kimi-K2 (60–61 layers × 384 experts × 7168 hidden); MXFP4 expert weights</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.093 (var@64)</td><td>Phase 1 complete</td><td><strong>C5 = 1</strong> (circuit dissolved); n=1 of 1-bit dissolution</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>0.111 (var@64)</td><td>Phase 1 complete</td><td>n=2 dissolution confirmation; native 1-bit training</td></tr>
</tbody>
</table>
‡ *Kimi-K2 final: 60 MoE layers (L01–L60), gate_proj SVD, median var@64 = 0.0938 (range 0.083–0.108). Phase 1 + Phase 1B + Phase 2 all complete 2026-04-24; 42.28 GB binary published. DeepSeek-V4 series builds with the MXFP4 unpacker (V4-Flash Phase 1B in progress 2026-04-25, V4-Pro queued). Card updates in-place as phases land.*
---
## What's a vindex?
Standard model weights tell you *what* a model computes. A vindex tells you *where* it stores specific knowledge and *which features* need to change for a targeted edit.
Concretely: given a query like `"Paris → capital"`, a vindex walk returns the layers, feature directions, and token associations that encode that fact. A patch operation writes a rank-1 ΔW that suppresses or overwrites that association, compiled back to standard HuggingFace safetensors for inference.
LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayuk/larql](https://github.com/chrishayuk/larql) | [github.com/Divinci-AI/larql](https://github.com/Divinci-AI/larql).
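Below is a minimal sketch of what that patch operation looks like in weight space, assuming an FFN down-projection of shape `[d_model, d_ff]` and two unit vectors returned by a vindex walk. The helper name, layer index, tensor path, and `alpha` value are illustrative, not LarQL's actual API:

```python
import torch
from safetensors.torch import load_file, save_file

def rank1_suppress(W: torch.Tensor, readout: torch.Tensor,
                   feature: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Subtract alpha * outer(readout, feature) from W, damping the component
    of the layer's output along `readout` that is driven by `feature`."""
    return W - alpha * torch.outer(readout, feature)

# Hypothetical shard and tensor name; a real vindex walk identifies the exact
# layer and tensor to patch.
state = load_file("model-00001-of-00004.safetensors")
name = "model.layers.17.mlp.down_proj.weight"
W = state[name].float()                                        # [d_model, d_ff]

# Stand-ins for the probe-derived directions a vindex would return.
readout = torch.randn(W.shape[0]); readout /= readout.norm()   # in R^{d_model}
feature = torch.randn(W.shape[1]); feature /= feature.norm()   # in R^{d_ff}

state[name] = rank1_suppress(W, readout, feature, alpha=0.5).to(state[name].dtype)
save_file(state, "model-00001-of-00004.patched.safetensors")   # standard safetensors out
```

The Gate-3 edit in Part II of the blog series below is this kind of update, validated against perplexity; the sketch only shows the mechanical shape of a rank-1 ΔW.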
---
## Research
### Paper 1 – *Architectural Invariants of Transformer Computation*
*arXiv preprint forthcoming*
Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization**, replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
The headline universal constant, layer temperature C4, is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
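To make the summary arithmetic explicit, here is a tiny sketch; the three per-seed values below are placeholders chosen only to be consistent with the reported `0.0407 ± 0.0004`, not the actual measurements (those are in the notebooks):

```python
import statistics

# Placeholder per-seed C4 values, consistent with the reported summary;
# the real per-seed numbers live in the working notebooks.
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)    # 0.0407
std = statistics.stdev(c4_by_seed)    # 0.0004 (sample standard deviation, n - 1)
cv = std / mean                       # ~0.0098, i.e. roughly 1% precision

print(f"C4 = {mean:.4f} ± {std:.4f} (CV ≈ {cv:.1%})")
```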
### Paper 2 – *Constellation Edits*
*draft, arXiv after 3-seed runs + α-sweep appendix*
Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.
### Companion blog series – *The Interpretability Diaries*
- [Part I – The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) – five universal constants, what holds and what doesn't
- [Part II – Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) – Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity
- [Part III – When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) – three dissolution datapoints (BitNet, Bonsai, Kimi-K2): var@64 ≈ 0.09–0.10 for 1-bit + fp8-native vs ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure (a sketch of one reading of the var@64 metric follows below).
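The sketch below is one plausible reading of var@64, assuming it means the fraction of a weight matrix's spectral energy captured by its top 64 singular values; the function name and toy matrices are ours, and the canonical definition is the one used in the LarQL notebooks:

```python
import torch

def var_at_k(weight: torch.Tensor, k: int = 64) -> float:
    """Fraction of spectral energy (sum of squared singular values)
    held by the top-k singular values of `weight`."""
    s = torch.linalg.svdvals(weight.float())
    energy = s.pow(2)
    return (energy[:k].sum() / energy.sum()).item()

# Toy contrast between a structured and an unstructured matrix.
torch.manual_seed(0)
structured = torch.randn(1024, 64) @ torch.randn(64, 4096)  # rank 64 by construction
noise_like = torch.randn(1024, 4096)                        # i.i.d. Gaussian

print(var_at_k(structured))  # 1.0: all energy sits in 64 directions
print(var_at_k(noise_like))  # small: energy spread across the whole spectrum
```

Under this reading, the ~0.85 values for fp16/post-quant layers and the 0.09–0.10 values for the 1-bit and fp8-native models correspond to concentrated versus spread-out spectra.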
Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)
---
## Working in public
Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones: the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.
If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
---
*Vindexes on this org are free for academic and research use (CC-BY-NC 4.0). Commercial licensing: mike@divinci.ai*