---
title: Divinci AI
emoji: 🧠
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
short_description: Feature-level interpretability for open transformers
---
# Divinci AI

Feature-level interpretability artifacts for open transformers — built openly, validated empirically.
A vindex is a transformer's weights decompiled into a queryable feature database. It exposes the entity associations, circuit structure, and knowledge-editing surfaces that live inside a model's FFN layers — without requiring GPU inference for most operations.
Think of it as the model's index: the thing you search before you run it.
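As a mental model only (this is not the real LarQL storage format, and the schema, feature IDs, and activation values below are invented for illustration), a vindex behaves like a small relational index over probe-derived feature rows that you query before ever loading the model:

```python
import sqlite3

# Hypothetical miniature vindex: rows of (entity, layer, feature, activation).
# Schema and values are illustrative placeholders, not LarQL's actual format.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE features (entity TEXT, layer INTEGER, feature_id INTEGER, activation REAL)"
)
con.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        ("paris", 4, 1021, 0.81),
        ("paris", 9, 3377, 0.64),
        ("tokyo", 4, 2210, 0.77),
    ],
)

# "Search before you run it": which layers/features encode 'paris'?
rows = con.execute(
    "SELECT layer, feature_id, activation FROM features "
    "WHERE entity = ? ORDER BY activation DESC",
    ("paris",),
).fetchall()
print(rows)  # strongest feature first
```

No GPU, no forward pass: the lookup is a plain database query over artifacts built offline.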
## Interactive viewer

→ Open the interactive viewer

Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **Compare** to render the current model alongside Bonsai 1-bit, side by side — the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack — backed by a 5000-token offline-built search index.
## Published vindexes

Cross-family evidence in hand: Gemma, Qwen3, Mistral, Llama, OpenAI MoE, Moonshot MoE, DeepSeek-V4 MoE, plus two 1-bit controls.
| MODEL | ARCHITECTURE | PARAMS | VINDEX | C4 / var@64 | STATUS | NOTES |
|---|---|---|---|---|---|---|
| Gemma 4 E2B-it | Dense (Gemma 4) | 2B | gemma-4-e2b-vindex | 0.0407 ± 0.0004 | Complete | 3-seed validated; headline universal-constant model |
| Qwen3-0.6B | Dense (Qwen 3) | 0.6B | qwen3-0.6b-vindex | 0.411 | Complete | Smallest published; Qwen3 family-elevated C4 |
| Qwen3-8B bf16 | Dense (Qwen 3) | 8B | qwen3-8b-vindex | 0.804 | Complete | Architecture control for Bonsai |
| Qwen3.6-35B-A3B | MoE (Qwen 3.6) | 35B / 3B active | qwen3.6-35b-a3b-vindex | — | Complete | 256 experts, 40 layers |
| Ministral-3B | Dense (Mistral 3) | 3B | ministral-3b-vindex | 0.265 | Complete | Post-quant fp8 → bf16; non-dissolved spectrum |
| Llama 3.1-8B | Dense (Llama 3.1) | 8B | llama-3.1-8b-vindex | 0.012 | Complete | Llama family signature |
| MedGemma 1.5-4B | Dense (Gemma multimodal) | 4B | medgemma-1.5-4b-vindex | 1.898 | Complete | 45× cohort anomaly — under investigation |
| GPT-OSS 120B | MoE (OpenAI) | 120B | gpt-oss-120b-vindex | — | Complete | S[0] grows 117× with depth (L0 = 111 → final = 13,056) |
| Kimi-K2-Instruct | MoE fp8-native (DeepSeek-V3 style) | 1T / 32B active | kimi-k2-instruct-vindex | 0.0938 (MoE median) ‡ | Complete | 60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than initial dome SVD suggested |
| DeepSeek-V4-Flash | MoE MXFP4 (DeepSeek-V4) | 43L / 256 experts / 6 active | publishing soon | — | Phase 1B running | 43-layer all-MoE; first-peak L17 + double-bend profile (distinct from Kimi's smooth dome); MXFP4 unpacker added to builder |
| DeepSeek-V4-Pro | MoE MXFP4 (DeepSeek-V4) | 61L / 384 experts / 6 active | queued | — | Queued | Same scale as Kimi-K2 (60–61 layers × 384 experts × 7168 hidden); MXFP4 expert weights |
| Bonsai 8B | 1-bit (Qwen 3 base, post-quantized) | 8B | vindex pending publish | 0.093 (var@64) | Phase 1 complete | C5 = 1 (circuit dissolved); n=1 of 1-bit dissolution |
| BitNet b1.58-2B-4T | 1-bit (Microsoft, native) | 2B | vindex pending publish | 0.111 (var@64) | Phase 1 complete | n=2 dissolution confirmation; native 1-bit training |

‡ Kimi-K2 final: 60 MoE layers (L01–L60), gate_proj SVD, median var@64 = 0.0938 (range 0.083–0.108). Phase 1 + Phase 1B + Phase 2 all complete 2026-04-24; 42.28 GB binary published. DeepSeek-V4 series builds with MXFP4 unpacker (V4-Flash Phase 1B in progress 2026-04-25, V4-Pro queued). Card updates in-place as phases land.
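The MXFP4 format mentioned above stores weights as 4-bit E2M1 values in blocks of 32 that share one power-of-two E8M0 scale (per the OCP Microscaling spec). The sketch below decodes one block under that assumption; the nibble packing order is a guess, and none of this is LarQL's actual unpacker code:

```python
import numpy as np

# All 16 FP4 (E2M1) code points: sign bit, 2 exponent bits, 1 mantissa bit.
_E2M1 = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def unpack_mxfp4_block(packed: np.ndarray, scale_e8m0: int) -> np.ndarray:
    """Decode 16 packed bytes (32 nibbles) plus a shared E8M0 scale to float32.

    Assumes low-nibble-first packing, which is a guess about the layout.
    """
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.stack([lo, hi], axis=-1).reshape(-1)  # interleave nibbles
    scale = np.float32(2.0) ** (int(scale_e8m0) - 127)  # E8M0 bias is 127
    return _E2M1[codes] * scale

block = np.array([0x21] * 16, dtype=np.uint8)  # nibbles 0x1, 0x2 repeated
print(unpack_mxfp4_block(block, 127)[:4])  # scale 2**0: [0.5, 1.0, 0.5, 1.0]
```

Each block therefore costs 16 bytes of payload plus 1 byte of scale for 32 weights, which is why MoE expert matrices shrink so dramatically in this format.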
## What's a vindex?

Standard model weights tell you what a model computes. A vindex tells you where it stores specific knowledge and which features need to change for a targeted edit.

Concretely: given a query like "Paris → capital", a vindex walk returns the layers, feature directions, and token associations that encode that fact. A patch operation writes a rank-1 ΔW that suppresses or overwrites that association — compiled back to standard HuggingFace safetensors for inference.
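A minimal sketch of the rank-1 idea, with random placeholder directions standing in for the feature directions a real vindex walk would return (the α choice below is the simplest suppression rule, not necessarily the one the toolchain uses):

```python
import numpy as np

# Toy FFN projection matrix; in practice this would be loaded from safetensors.
rng = np.random.default_rng(0)
d_in, d_out = 64, 64
W = rng.normal(size=(d_out, d_in)).astype(np.float32)

u = rng.normal(size=d_out).astype(np.float32)  # output feature direction
v = rng.normal(size=d_in).astype(np.float32)   # input (entity) direction
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)

alpha = -(u @ W @ v)                     # chosen to zero out the association
W_patched = W + alpha * np.outer(u, v)   # the rank-1 delta-W

print(float(u @ W_patched @ v))  # association along (u, v) is now ~0
```

Because the edit is a single outer product, it can be stored as two vectors and a scalar, and applying it leaves every direction orthogonal to u and v untouched — which is what makes the +0.02% perplexity cost of a surgical edit plausible.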
LarQL (the toolchain that builds vindexes) is open-source: github.com/chrishayuk/larql | github.com/Divinci-AI/larql.
## Research

### Paper 1 — Architectural Invariants of Transformer Computation

*arXiv preprint forthcoming*
Five properties measured across every model in this collection. Three hold within Β±15% coefficient of variation across architectures, organizations, and scales. One collapses under 1-bit quantization β replicated across two independent 1-bit models from two organizations (n = 2). One scales monotonically with model size.
The headline universal constant β layer temperature C4 β is reproducible at the 1% precision level: a three-seed run on Gemma 4 E2B gives C4 = 0.0407 Β± 0.0004, with circuit-stage count perfectly stable (C5 = 4 Β± 0) across all seeds.
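The seed-level arithmetic is just a mean and sample standard deviation over per-seed C4 values. The three values below are hypothetical, chosen only so the summary matches the reported 0.0407 ± 0.0004; the real per-seed numbers live in the notebooks:

```python
import statistics

# Hypothetical per-seed C4 measurements (illustrative, not the actual data).
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)
std = statistics.stdev(c4_by_seed)  # sample std over seeds
cv = std / mean                     # coefficient of variation

print(f"C4 = {mean:.4f} +/- {std:.4f} (CV {cv:.1%})")
```

A CV of about 1% is what "reproducible at the 1% precision level" refers to; the ±15% threshold in the paper is the looser cross-architecture criterion.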
### Paper 2 — Constellation Edits

*Draft; arXiv after 3-seed runs + α-sweep appendix*

Mechanistic knowledge editing in transformer feature space. Includes a negative result: activation-space edits fail in 1-bit models, and the weight-space geometry explains why.
### Companion blog series — The Interpretability Diaries

- Part I — The Architecture Every Language Model Converges To: five universal constants, what holds and what doesn't
- Part II — Deleting Paris from a Language Model: a Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity
- Part III — When the Circuit Dissolves: three dissolution datapoints (BitNet, Bonsai, Kimi-K2); var@64 ≈ 0.09–0.10 for 1-bit and fp8-native vs ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure.
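One way to read a var@64-style statistic, assuming it measures the fraction of squared singular values captured by the top 64 SVD components of a weight matrix (our reading of the name, not a confirmed definition from the papers):

```python
import numpy as np

def var_at_k(W: np.ndarray, k: int = 64) -> float:
    """Fraction of spectral energy (squared singular values) in the top k."""
    s = np.linalg.svd(W, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

rng = np.random.default_rng(0)
# Structured matrix: rank 64 by construction, so top-64 carries everything.
low_rank = rng.normal(size=(512, 64)) @ rng.normal(size=(64, 512))
# Unstructured Gaussian noise: energy spread across all 512 directions.
noise = rng.normal(size=(512, 512))

print(var_at_k(low_rank))  # ~1.0: concentrated spectrum
print(var_at_k(noise))     # well below 1: dissolved-looking spectrum
```

Under this reading, fp16 models sitting near 0.85 means most FFN spectral energy lives in a small subspace, while the 1-bit and fp8-native values near 0.09–0.10 mean the spectrum has flattened toward noise.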
Working notebooks: github.com/Divinci-AI/server/tree/preview/notebooks
## Working in public

Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.
If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
Vindexes on this org are free for academic and research use (CC-BY-NC 4.0). Commercial licensing: mike@divinci.ai
