---
title: Divinci AI
emoji: 🧠
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
short_description: Feature-level interpretability for open transformers
---
# Divinci AI
Feature-level interpretability artifacts for open transformers, built openly and validated empirically.
A **vindex** is a transformer's weights decompiled into a queryable feature database. It exposes the entity associations, circuit structure, and knowledge-editing surfaces that live inside a model's FFN layers, without requiring GPU inference for most operations.
Think of it as the model's index: the thing you search before you run it.
---
## Interactive viewer
**[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**
Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **Compare** to render the current model alongside Bonsai 1-bit, side by side; the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
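As a small, hedged illustration of the query-string format above (the `viewer_link` helper is hypothetical, and whether the Space page accepts the parameters directly or hands them to its embedded frontend is an assumption on our part):

```python
from urllib.parse import urlencode

# Sketch only: build a viewer deep-link from the ?q=...&model=... parameters
# described above. Nothing here is part of a documented viewer API.
VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```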
---
## Published vindexes
Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, **Moonshot MoE**, **DeepSeek-V4 MoE**, plus two 1-bit controls.
<table>
<thead>
<tr><th>MODEL</th><th>ARCHITECTURE</th><th>PARAMS</th><th>VINDEX</th><th>C4 / var@64</th><th>STATUS</th><th>NOTES</th></tr>
</thead>
<tbody>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong></td><td>Complete</td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Complete</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Complete</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>—</td><td>Complete</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>Complete</td><td>Post-quant fp8 → bf16; non-dissolved spectrum</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong></td><td>Complete</td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898</strong></td><td>Complete</td><td>45× cohort anomaly; under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>—</td><td>Complete</td><td>S[0] grows 117× with depth (L0=111 → final=13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex">kimi-k2-instruct-vindex</a></td><td><strong>0.0938</strong> (MoE median) ‡</td><td>Complete</td><td>60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than initial dome SVD suggested</td></tr>
<tr><td><strong>DeepSeek-V4-Flash</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>43L / 256 experts / 6 active</td><td><em>publishing soon</em></td><td><strong>—</strong></td><td><strong>Phase 1B running</strong></td><td>43-layer all-MoE; first-peak L17 + double-bend profile (distinct from Kimi's smooth dome); MXFP4 unpacker added to builder</td></tr>
<tr><td><strong>DeepSeek-V4-Pro</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>61L / 384 experts / 6 active</td><td><em>queued</em></td><td>—</td><td>Queued</td><td>Same scale as Kimi-K2 (60–61 layers × 384 experts × 7168 hidden); MXFP4 expert weights</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.093 (var@64)</td><td>Phase 1 complete</td><td><strong>C5 = 1</strong> (circuit dissolved); n=1 of 1-bit dissolution</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>0.111 (var@64)</td><td>Phase 1 complete</td><td>n=2 dissolution confirmation; native 1-bit training</td></tr>
</tbody>
</table>
‡ *Kimi-K2 final: 60 MoE layers (L01–L60), gate_proj SVD, median var@64 = 0.0938 (range 0.083–0.108). Phase 1 + Phase 1B + Phase 2 all complete 2026-04-24; 42.28 GB binary published. DeepSeek-V4 series builds with the MXFP4 unpacker (V4-Flash Phase 1B in progress 2026-04-25, V4-Pro queued). Card updates in-place as phases land.*
---
## What's a vindex?
Standard model weights tell you *what* a model computes. A vindex tells you *where* it stores specific knowledge and *which features* need to change for a targeted edit.
Concretely: given a query like `"Paris → capital"`, a vindex walk returns the layers, feature directions, and token associations that encode that fact. A patch operation writes a rank-1 ΔW that suppresses or overwrites that association, compiled back to standard HuggingFace safetensors for inference.
LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayuk/larql](https://github.com/chrishayuk/larql) | [github.com/Divinci-AI/larql](https://github.com/Divinci-AI/larql).
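Below is a minimal sketch of what that patch operation looks like in weight space, assuming an FFN down-projection of shape `[d_model, d_ff]` and two unit vectors returned by a vindex walk. The helper name, layer index, tensor path, and `alpha` value are illustrative, not LarQL's actual API:

```python
import torch
from safetensors.torch import load_file, save_file

def rank1_suppress(W: torch.Tensor, readout: torch.Tensor,
                   feature: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Subtract alpha * outer(readout, feature) from W, damping the component
    of the layer's output along `readout` that is driven by `feature`."""
    return W - alpha * torch.outer(readout, feature)

# Hypothetical shard and tensor name; a real vindex walk identifies the exact
# layer and tensor to patch.
state = load_file("model-00001-of-00004.safetensors")
name = "model.layers.17.mlp.down_proj.weight"
W = state[name].float()                                        # [d_model, d_ff]

# Stand-ins for the probe-derived directions a vindex would return.
readout = torch.randn(W.shape[0]); readout /= readout.norm()   # in R^{d_model}
feature = torch.randn(W.shape[1]); feature /= feature.norm()   # in R^{d_ff}

state[name] = rank1_suppress(W, readout, feature, alpha=0.5).to(state[name].dtype)
save_file(state, "model-00001-of-00004.patched.safetensors")   # standard safetensors out
```

The Gate-3 edit in Part II of the blog series below is this kind of update, validated against perplexity; the sketch only shows the mechanical shape of a rank-1 ΔW.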
---
## Research
### Paper 1 – *Architectural Invariants of Transformer Computation*
*arXiv preprint forthcoming*
Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization**, replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
The headline universal constant, layer temperature C4, is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
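To make the summary arithmetic explicit, here is a tiny sketch; the three per-seed values below are placeholders chosen only to be consistent with the reported `0.0407 ± 0.0004`, not the actual measurements (those are in the notebooks):

```python
import statistics

# Placeholder per-seed C4 values, consistent with the reported summary;
# the real per-seed numbers live in the working notebooks.
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)    # 0.0407
std = statistics.stdev(c4_by_seed)    # 0.0004 (sample standard deviation, n - 1)
cv = std / mean                       # ~0.0098, i.e. roughly 1% precision

print(f"C4 = {mean:.4f} ± {std:.4f} (CV ≈ {cv:.1%})")
```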
### Paper 2 – *Constellation Edits*
*draft, arXiv after 3-seed runs + α-sweep appendix*
Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.
### Companion blog series – *The Interpretability Diaries*
- [Part I – The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) – five universal constants, what holds and what doesn't
- [Part II – Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) – Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity
- [Part III – When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) – three dissolution datapoints (BitNet, Bonsai, Kimi-K2): var@64 ≈ 0.09–0.10 for 1-bit + fp8-native vs ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure (a sketch of one reading of the var@64 metric follows below).
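The sketch below is one plausible reading of var@64, assuming it means the fraction of a weight matrix's spectral energy captured by its top 64 singular values; the function name and toy matrices are ours, and the canonical definition is the one used in the LarQL notebooks:

```python
import torch

def var_at_k(weight: torch.Tensor, k: int = 64) -> float:
    """Fraction of spectral energy (sum of squared singular values)
    held by the top-k singular values of `weight`."""
    s = torch.linalg.svdvals(weight.float())
    energy = s.pow(2)
    return (energy[:k].sum() / energy.sum()).item()

# Toy contrast between a structured and an unstructured matrix.
torch.manual_seed(0)
structured = torch.randn(1024, 64) @ torch.randn(64, 4096)  # rank 64 by construction
noise_like = torch.randn(1024, 4096)                        # i.i.d. Gaussian

print(var_at_k(structured))  # 1.0: all energy sits in 64 directions
print(var_at_k(noise_like))  # small: energy spread across the whole spectrum
```

Under this reading, the ~0.85 values for fp16/post-quant layers and the 0.09–0.10 values for the 1-bit and fp8-native models correspond to concentrated versus spread-out spectra.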
Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)
---
## Working in public
Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones: the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.
If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
---
*Vindexes on this org are free for academic and research use (CC-BY-NC 4.0). Commercial licensing: mike@divinci.ai*