<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Divinci AI</title>
<style>
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; max-width: 860px; margin: 0 auto; padding: 2rem 1.5rem; color: #1a1a1a; line-height: 1.6; }
h1 { font-size: 1.8rem; font-weight: 700; margin-bottom: 0.25rem; }
h2 { font-size: 1.1rem; font-weight: 600; margin-top: 2rem; margin-bottom: 0.5rem; border-bottom: 1px solid #e5e7eb; padding-bottom: 0.25rem; }
p { margin: 0.5rem 0 1rem; }
table { border-collapse: collapse; width: 100%; margin: 1rem 0; font-size: 0.9rem; }
th, td { border: 1px solid #e5e7eb; padding: 0.5rem 0.75rem; text-align: left; }
th { background: #f9fafb; font-weight: 600; }
a { color: #2563eb; text-decoration: none; }
a:hover { text-decoration: underline; }
.tagline { color: #6b7280; font-size: 1rem; margin-bottom: 1.5rem; }
.footer { margin-top: 2.5rem; padding-top: 1rem; border-top: 1px solid #e5e7eb; font-size: 0.85rem; color: #6b7280; }
hr { border: none; border-top: 1px solid #e5e7eb; margin: 1.5rem 0; }
</style>
</head>
<body>
<h1 id="divinci-ai">Divinci AI</h1>
<p class="tagline">Feature-level interpretability artifacts for open transformers –
built openly, validated empirically.</p>
<p>A <strong>vindex</strong> is a transformer's weights decompiled into
a queryable feature database. It exposes the entity associations,
circuit structure, and knowledge-editing surfaces that live inside a
model's FFN layers – without requiring GPU inference for most
operations.</p>
<p>Think of it as the model's index: the thing you search before you run
it.</p>
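<p>As a toy illustration – this is not LarQL's actual API, and the table contents and scores below are made up – the query shape is: term in, ranked (layer, feature, score) hits out.</p>

```python
# Toy stand-in for a vindex lookup. NOT the LarQL API -- layers,
# feature ids, and scores here are hypothetical, for illustration only.
toy_vindex = {
    "paris": [
        {"layer": 11, "feature": 407, "score": 0.63},
        {"layer": 4, "feature": 1821, "score": 0.91},
    ],
}

def lookup(term):
    """Return feature hits for a term, strongest association first."""
    hits = toy_vindex.get(term.lower(), [])
    return sorted(hits, key=lambda h: h["score"], reverse=True)

best = lookup("Paris")[0]
print(best["layer"], best["feature"])  # strongest hit first
```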
<hr />
<h2 id="interactive-viewer">Interactive viewer</h2>
<p><a href="https://huggingface.co/spaces/Divinci-AI/vindex-viewer"><img
src="https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif"
alt="LarQL Vindex Viewer – interactive 3D + 2D circuit visualization" /></a></p>
<p><strong><a
href="https://huggingface.co/spaces/Divinci-AI/vindex-viewer">→ Open the
interactive viewer</a></strong></p>
<p>Pick any of 9 models from the dropdown. Toggle between the 3D
cylinder spiral and a flat 2D circuit/network view. Hit
<strong>Compare</strong> to render the current model alongside Bonsai
1-bit, side by side – the contrast between fp16 structure (organized
rings) and 1-bit dissolution (scattered cloud) is the most direct
picture we know how to render of what 1-bit training does to a
transformer's internal organization. Search for entity features
(<code>?q=paris&amp;model=gemma-4-e2b</code>) to see real probe-derived
activations light up across the layer stack, backed by a 5000-token
search index built offline.</p>
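<p>The search example above is plain URL query parameters; a sketch building the same query string with the standard library (base URL taken from the Space linked above):</p>

```python
from urllib.parse import urlencode

# Build the viewer's search URL from its query parameters.
# Parameter names match the ?q=...&model=... example in the text.
base = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"
params = {"q": "paris", "model": "gemma-4-e2b"}
url = f"{base}?{urlencode(params)}"
print(url)
```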
<hr />
<h2 id="published-vindexes">Published vindexes</h2>
<p>Cross-family evidence in hand: <strong>Gemma</strong>,
<strong>Qwen3</strong>, <strong>Mistral</strong>,
<strong>Llama</strong>, <strong>OpenAI MoE</strong>,
<strong>Moonshot MoE</strong>, <strong>DeepSeek-V4 MoE</strong>, plus two 1-bit
controls.</p>
<table>
<thead>
<tr><th>MODEL</th><th>ARCHITECTURE</th><th>PARAMS</th><th>VINDEX</th><th>C4 (LAYER TEMP)</th><th>NOTES</th></tr>
</thead>
<tbody>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong></td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>–</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>fp8 → bf16 reconstruction</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong></td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898</strong></td><td>45× cohort anomaly – under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>–</td><td>S[0] grows 117× with depth (L0=111 → final=13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex">kimi-k2-instruct-vindex</a></td><td><strong>0.0938</strong> (MoE median)</td><td>60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than initial dome SVD suggested</td></tr>
<tr><td><strong>DeepSeek-V4-Flash</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>43L / 256 experts / 6 active</td><td><a href="https://huggingface.co/Divinci-AI/deepseek-v4-flash-vindex">deepseek-v4-flash-vindex</a></td><td><strong>0.108</strong> (MoE median)</td><td>43-layer all-MoE; 11.54 GB gate_proj binary; first-peak L18 + double-bend profile (distinct from Kimi smooth dome); MXFP4 expert unpacking</td></tr>
<tr><td><strong>DeepSeek-V4-Pro</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>61L / 384 experts / 6 active</td><td><a href="https://huggingface.co/Divinci-AI/deepseek-v4-pro-vindex">deepseek-v4-pro-vindex</a></td><td><strong>0.0653</strong> (MoE median)</td><td>61-layer all-MoE; 42.98 GB gate_proj binary; lowest var@64 of 3 published MoE vindexes (V4-Pro 0.065 &lt; Kimi 0.094 &lt; V4-Flash 0.108) – V4-Pro experts are most shared/redundant; late secondary rise L53–L60</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.429</td><td><strong>C5 = 1</strong> (circuit dissolved); var@64 = 0.093</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>(Phase 2 pending)</td><td><strong>var@64 = 0.111</strong> mean across 30 layers – n=2 confirmation of dissolution</td></tr>
</tbody>
</table>
<hr />
<h2 id="whats-a-vindex">What's a vindex?</h2>
<p>Standard model weights tell you <em>what</em> a model computes. A
vindex tells you <em>where</em> it stores specific knowledge and
<em>which features</em> need to change for a targeted edit.</p>
<p>Concretely: given a query like <code>"Paris → capital"</code>, a
vindex walk returns the layers, feature directions, and token
associations that encode that fact. A patch operation writes a rank-1 ΔW
that suppresses or overwrites that association – compiled back to
standard HuggingFace safetensors for inference.</p>
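<p>A minimal numpy sketch of that edit shape, assuming the rank-1 form W' = W + α·u·vᵀ – the dimensions, directions, and α below are illustrative rather than probe-derived, and the safetensors round-trip is left to LarQL:</p>

```python
import numpy as np

# Rank-1 patch sketch: W' = W + alpha * u @ v.T. Illustrative only --
# the real pipeline derives u and v from probe activations.
rng = np.random.default_rng(0)
d_out, d_in = 64, 48
W = rng.normal(size=(d_out, d_in))

u = rng.normal(size=(d_out, 1))  # write direction (output space)
v = rng.normal(size=(d_in, 1))   # read direction (input space)
alpha = -0.1                     # negative alpha suppresses the association

delta_W = alpha * (u @ v.T)      # rank-1 by construction
W_patched = W + delta_W
assert np.linalg.matrix_rank(delta_W) == 1

# Inputs orthogonal to v pass through unchanged -- the edit is surgical.
x = rng.normal(size=(d_in, 1))
x_orth = x - v * (v.T @ x) / (v.T @ v)
assert np.allclose(W_patched @ x_orth, W @ x_orth)
```

<p>The orthogonality check is the point of the rank-1 form: only inputs with a component along v see any change at all.</p>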
<p>LarQL (the toolchain that builds vindexes) is open-source: <a
href="https://github.com/chrishayuk/larql">github.com/chrishayuk/larql</a>
| <a
href="https://github.com/Divinci-AI/larql">github.com/Divinci-AI/larql</a>.</p>
<hr />
<h2 id="research">Research</h2>
<h3
id="paper-1--architectural-invariants-of-transformer-computation">Paper
1 – <em>Architectural Invariants of Transformer Computation</em></h3>
<p><em>arXiv preprint forthcoming</em></p>
<p>Five properties measured across every model in this collection.
<strong>Three hold within Β±15% coefficient of variation</strong> across
architectures, organizations, and scales. <strong>One collapses under
1-bit quantization</strong> – replicated across two independent 1-bit
models from two organizations (n = 2). <strong>One scales monotonically
with model size</strong>.</p>
<p>The headline universal constant – layer temperature C4 – is
reproducible at the <strong>1% precision level</strong>: a three-seed
run on Gemma 4 E2B gives <code>C4 = 0.0407 ± 0.0004</code>, with
circuit-stage count perfectly stable (<code>C5 = 4 ± 0</code>) across
all seeds.</p>
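<p>The precision claim is checkable by arithmetic on the reported aggregate alone:</p>

```python
# Coefficient of variation from the reported three-seed aggregate.
c4_mean, c4_std = 0.0407, 0.0004
cv = c4_std / c4_mean
print(f"C4 CV = {cv:.1%}")  # roughly 1%
assert cv < 0.011
```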
<h3 id="paper-2--constellation-edits">Paper 2 – <em>Constellation
Edits</em></h3>
<p><em>draft, arXiv after 3-seed runs + α-sweep appendix</em></p>
<p>Mechanistic knowledge editing in transformer feature space. Includes
a negative result: activation-space edits fail in 1-bit models, and
weight-space geometry explains why.</p>
<h3 id="companion-blog-series--the-interpretability-diaries">Companion
blog series β <em>The Interpretability Diaries</em></h3>
<ul>
<li><a
href="https://divinci.ai/blog/architecture-every-llm-converges-to/">Part
I – The Architecture Every Language Model Converges To</a> – five
universal constants, what holds and what doesn't</li>
<li><a
href="https://divinci.ai/blog/deleting-paris-from-a-language-model/">Part
II – Deleting Paris from a Language Model</a> – Gate-3 surgical
knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at
+0.02% perplexity</li>
<li><a href="https://divinci.ai/blog/when-the-circuit-dissolves/">Part
III – When the Circuit Dissolves</a> – two natively trained 1-bit
models, two organizations, same dissolution: var@64 ≈ 0.10 vs ~0.85 for
fp16</li>
</ul>
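<p>The exact definition of var@64 lives in the papers; purely as a hedged stand-in – assuming it measures the fraction of a matrix's spectral energy captured by its top 64 singular directions – the fp16-vs-1-bit contrast is concentration versus spread:</p>

```python
import numpy as np

# Hedged sketch: var_at_k is an ASSUMED definition for illustration,
# not the metric from the papers.
def var_at_k(W, k=64):
    s = np.linalg.svd(W, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

rng = np.random.default_rng(0)
# Structured stand-in: rank-64 product, all energy in 64 directions.
structured = rng.normal(size=(512, 64)) @ rng.normal(size=(64, 512))
# Dissolved stand-in: i.i.d. noise, energy spread across all directions.
dissolved = rng.normal(size=(512, 512))

# Structure concentrates spectral energy; noise spreads it thin.
assert var_at_k(structured) > var_at_k(dissolved)
```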
<p>Working notebooks: <a
href="https://github.com/Divinci-AI/server/tree/preview/notebooks">github.com/Divinci-AI/server/tree/preview/notebooks</a></p>
<hr />
<h2 id="working-in-public">Working in public</h2>
<p>Every measurement in our papers traces back to a notebook and a
commit. Negative results ship alongside positive ones – the MLP
compensation mechanism that defeats knowledge editing in 1-bit models is
in the notebooks, not buried in a supplement.</p>
<p>If you replicate a result and find a discrepancy, open an issue on
the LarQL repo.</p>
<hr />
<p><em>Vindexes on this org are free for academic and research use
(CC-BY-NC 4.0). Commercial licensing: <a
href="mailto:mike@divinci.ai">mike@divinci.ai</a></em></p>
</body>
</html>