short_description: Feature-level interpretability for open transformers
---

# Divinci AI

Feature-level interpretability artifacts for open transformers – built openly, validated empirically.

A **vindex** is a transformer's weights decompiled into a queryable feature database.

Think of it as the model's index: the thing you search before you run it.

---

## Interactive viewer

Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **Compare** to render the current model alongside Bonsai 1-bit, side by side – the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
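
The same query-string pattern works outside the browser. A minimal sketch, assuming the viewer is hosted as a Space on this org (the base URL below is a placeholder, not confirmed by this card):

```python
from urllib.parse import urlencode

# Placeholder base URL for the hosted viewer; substitute the actual Space URL.
VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    """Build a link that highlights probe-derived features for a query in one model."""
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

# Same example as in the text: entity features for 'paris' in the Gemma 4 E2B vindex.
print(viewer_link("paris", "gemma-4-e2b"))
# -> .../vindex-viewer?q=paris&model=gemma-4-e2b
```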

---

## Published vindexes

Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, **Moonshot MoE**, plus two 1-bit controls.

<table>
<tbody>
<tr><td><strong>MODEL</strong></td><td><strong>ARCHITECTURE</strong></td><td><strong>PARAMS</strong></td><td><strong>VINDEX</strong></td><td><strong>C4 / var@64</strong></td><td><strong>STATUS</strong></td><td><strong>NOTES</strong></td></tr>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong></td><td>Complete</td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Complete</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Complete</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>–</td><td>Complete</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>Complete</td><td>Post-quant fp8 → bf16; non-dissolved spectrum</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong></td><td>Complete</td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898</strong></td><td>Complete</td><td>45× cohort anomaly – under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>–</td><td>Complete</td><td>S[0] grows 117× with depth (L0=111 → final=13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-vindex">kimi-k2-vindex</a></td><td><strong>0.088</strong> (MoE median) ‡</td><td><strong>Phase 1 running</strong> (6/61 layers)</td><td>3rd fp8-native dissolution datapoint – var@64 in the same class as the 1-bit models</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.093 (var@64)</td><td>Phase 1 complete</td><td><strong>C5 = 1</strong> (circuit dissolved); n=1 of 1-bit dissolution</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>0.111 (var@64)</td><td>Phase 1 complete</td><td>n=2 dissolution confirmation; native 1-bit training</td></tr>
</tbody>
</table>

‡ *Kimi-K2 spot-check: L00 dense var@64 = 0.037; MoE layers L01–L04 median = 0.088. Full 61-layer Phase 1 completing ~2026-04-23. Card updates in place as phases land.*
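
For readers who want to poke at spectra themselves, here is a minimal NumPy sketch of a var@64-style check on a single weight matrix. It assumes var@64 means the fraction of squared singular-value mass carried by the top 64 singular values; that reading is an assumption of this sketch, not a definition taken from this card, and the random matrix stands in for a real layer.

```python
import numpy as np

# Toy stand-in for one layer's weight matrix; a real check would load the
# published model weights (or the vindex artifacts) layer by layer instead.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))

s = np.linalg.svd(w, compute_uv=False)        # singular values, largest first
s2 = s**2
var_at_64 = float(s2[:64].sum() / s2.sum())   # assumed reading of "var@64"

print(f"S[0]   = {s[0]:.1f}")                 # largest singular value (cf. the GPT-OSS depth trend)
print(f"var@64 = {var_at_64:.3f}")            # flat spectra give low values, peaked spectra high ones
```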

---

## What's a vindex?

LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayuk/larql](https://github.com/chrishayuk/larql) | [github.com/Divinci-AI/larql](https://github.com/Divinci-AI/larql).
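
To make "queryable feature database" concrete, here is a deliberately toy sketch of the idea: a feature table you search by token before ever running the model. Every field name and value below is invented for illustration; it is not LarQL's schema or API.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    layer: int        # layer the feature lives in
    index: int        # feature id within that layer
    label: str        # human-readable label attached during decompilation
    strength: float   # probe-derived activation strength

# Invented miniature "vindex": a few entity features keyed by token.
TOY_VINDEX = {
    "paris": [
        Feature(layer=4,  index=1021, label="city-of-France",   strength=0.91),
        Feature(layer=17, index=88,   label="capital-relation", strength=0.74),
    ],
}

def query(token: str, top_k: int = 5) -> list:
    """Return the strongest features for a token, without running the model."""
    hits = TOY_VINDEX.get(token.lower(), [])
    return sorted(hits, key=lambda f: f.strength, reverse=True)[:top_k]

for feat in query("paris"):
    print(f"L{feat.layer:02d}/#{feat.index}: {feat.label} ({feat.strength:.2f})")
```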

---

## Research

- [Part I – The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) – five universal constants, what holds and what doesn't
- [Part II – Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) – Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity (see the sketch after this list)
- [Part III – When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) – three dissolution datapoints (BitNet, Bonsai, Kimi-K2): var@64 ≈ 0.09–0.10 for 1-bit and fp8-native vs ~0.85 for fp16/post-quant. Training precision, not storage precision, predicts spectral structure.
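
The rank-1 ΔW in Part II can be pictured generically: subtract an outer product aligned with the direction that carries the targeted association, leaving the rest of the matrix essentially untouched. The sketch below shows a rank-1 edit in general, not the Gate-3 procedure; the vectors are random placeholders where the real edit uses probe-derived directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
W = rng.standard_normal((d, d)) / np.sqrt(d)   # stand-in for one weight matrix

# Placeholder directions; a real edit would derive these from probes for the
# targeted association ("Paris -> capital") rather than sampling them at random.
u = rng.standard_normal(d); u /= np.linalg.norm(u)   # output direction to suppress
v = rng.standard_normal(d); v /= np.linalg.norm(v)   # input direction that triggers it

alpha   = float(u @ W @ v)               # how strongly W maps v onto u
delta_W = -alpha * np.outer(u, v)        # rank-1 correction that cancels exactly that mapping
W_edit  = W + delta_W

print("u.T W v before:", f"{float(u @ W @ v):+.6f}")
print("u.T W v after: ", f"{float(u @ W_edit @ v):+.6f}")   # ~0: association suppressed
print("||dW||_F / ||W||_F:", f"{np.linalg.norm(delta_W) / np.linalg.norm(W):.6f}")  # tiny global change
```

The interesting part of Part II is where u and v come from and how the +0.02% perplexity cost is verified; the algebra itself is this small.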

Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

---

## Working in public

Every measurement in our papers traces back to a notebook and a commit.

If you replicate a result and find a discrepancy, open an issue on the LarQL repo.

---

*Vindexes on this org are free for academic and research use (CC-BY-NC 4.0). Commercial licensing: mike@divinci.ai*