Refresh org card: 8 published vindexes, n=2 1-bit dissolution, 3-seed C4=0.0407±0.0004, viewer Space link, blog series

README.md CHANGED

Think of it as the model's index: the thing you search before you run it.

---

## Interactive viewer

**[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**

Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **⇌ Compare** to render the current model side-by-side with Bonsai 1-bit — the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we can render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
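
If you script against the viewer rather than click through it, the deep-link is easy to build. A minimal sketch in Python; it assumes only that the `q` and `model` query parameters are appended to the Space URL exactly as in the example above:

```python
# Build a vindex-viewer search deep-link from a query and a model slug.
# Assumes the Space reads ?q= and &model= from its own URL, as shown above.
from urllib.parse import urlencode

VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_search_url(query: str, model: str) -> str:
    """Return a link that opens the viewer with a feature search pre-filled."""
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

print(viewer_search_url("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```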

---

## Published vindexes

Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, plus two 1-bit controls.

| Model | Architecture | Params | Vindex | C4 (layer temperature) | Notes |
|-------|--------------|--------|--------|------------------------|-------|
| **Gemma 4 E2B-it** | Dense (Gemma 4) | 2B | [gemma-4-e2b-vindex](https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex) | **0.0407 ± 0.0004** ✓ | 3-seed validated; headline universal-constant model |
| Qwen3-0.6B | Dense (Qwen 3) | 0.6B | [qwen3-0.6b-vindex](https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex) | 0.411 | Smallest published; Qwen3 family-elevated C4 |
| Qwen3-8B bf16 | Dense (Qwen 3) | 8B | [qwen3-8b-vindex](https://huggingface.co/Divinci-AI/qwen3-8b-vindex) | 0.804 | Architecture control for Bonsai |
| Qwen3.6-35B-A3B | MoE (Qwen 3.6) | 35B / 3B active | [qwen3.6-35b-a3b-vindex](https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex) | — | 256 experts, 40 layers |
| Ministral-3B | Dense (Mistral 3) | 3B | [ministral-3b-vindex](https://huggingface.co/Divinci-AI/ministral-3b-vindex) | 0.265 | fp8 → bf16 reconstruction |
| Llama 3.1-8B | Dense (Llama 3.1) | 8B | [llama-3.1-8b-vindex](https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex) | **0.012** ✓ | Llama family signature |
| MedGemma 1.5-4B | Dense (Gemma multimodal) | 4B | [medgemma-1.5-4b-vindex](https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex) | **1.898 ⚠** | 45× cohort anomaly — under investigation |
| GPT-OSS 120B | MoE (OpenAI) | 120B | [gpt-oss-120b-vindex](https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex) | — | S[0] grows 117× with depth (L0=111 → final=13,056) |
| **Bonsai 8B** | 1-bit (Qwen 3 base, post-quantized) | 8B | *vindex pending publish* | 0.429 | **C5 = 1** (circuit dissolved); var@64 = 0.093 |
| **BitNet b1.58-2B-4T** | 1-bit (Microsoft, native) | 2B | *vindex pending publish* | (Phase 2 pending) | **var@64 = 0.111** mean across 30 layers — n=2 confirmation of dissolution |
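
Each published vindex lives at a standard Hub repo URL, so it should be fetchable like any other artifact. A minimal sketch; the file layout inside a vindex repo is not assumed here, the snippet just lists whatever it downloads:

```python
# Fetch one published vindex and list its files.
# snapshot_download pulls the whole repo; pass revision=... to pin a commit.
from pathlib import Path
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Divinci-AI/gemma-4-e2b-vindex")

for path in sorted(Path(local_dir).rglob("*")):
    if path.is_file():
        print(path.relative_to(local_dir))
```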

---

LarQL (the toolchain that builds vindexes) is open-source: github.com/chrishayu…

## Research

### Paper 1 — *Architectural Invariants of Transformer Computation*
*arXiv preprint forthcoming*

Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization** — replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
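
The invariance criterion is a plain coefficient-of-variation check. A sketch of it (the per-model values are placeholders, not measurements):

```python
# A property "holds" across architectures if std/mean over the model cohort
# stays within ±15%. The numbers below are illustrative placeholders.
import numpy as np

def coefficient_of_variation(values) -> float:
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

per_model_values = [0.93, 1.02, 0.97, 1.05, 0.99]  # hypothetical per-model measurements
cv = coefficient_of_variation(per_model_values)
print(f"CV = {cv:.1%} -> {'holds (<=15%)' if abs(cv) <= 0.15 else 'does not hold'}")
```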

The headline universal constant — layer temperature C4 — is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
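
One way to read that number is mean ± standard deviation over the three seeds, plus a check that the discrete stage count never moves (the exact convention is the paper's; per-seed values below are placeholders chosen only to be consistent with the published aggregate):

```python
# Report C4 as mean ± sample std over seeds, and check that C5 is seed-stable.
# Per-seed values are hypothetical; only the aggregate 0.0407 ± 0.0004 is published.
import numpy as np

c4_by_seed = [0.0403, 0.0408, 0.0410]  # placeholder seed results
c5_by_seed = [4, 4, 4]

c4 = np.asarray(c4_by_seed)
print(f"C4 = {c4.mean():.4f} ± {c4.std(ddof=1):.4f}")  # C4 = 0.0407 ± 0.0004

assert len(set(c5_by_seed)) == 1, "circuit-stage count varies across seeds"
print(f"C5 = {c5_by_seed[0]} ± 0")
```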

### Paper 2 — *Constellation Edits*
*draft, arXiv after 3-seed runs + α-sweep appendix*

Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.
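
For readers new to the vocabulary: an "activation-space edit" intervenes on hidden states at run time rather than changing weights. A generic illustration of that class of intervention (a PyTorch forward hook adding a steering direction; layer, direction, and strength are placeholders, and this is not the paper's procedure):

```python
# Generic activation-space intervention: add a fixed direction to one layer's
# hidden states via a forward hook. Everything here is illustrative; the
# paper's actual edit recipe lives in the notebooks.
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * direction.to(hidden.dtype).to(hidden.device)
        return ((hidden,) + tuple(output[1:])) if isinstance(output, tuple) else hidden
    return hook

# Usage against a Hugging Face decoder model (names are illustrative):
# direction = torch.randn(model.config.hidden_size)
# handle = model.model.layers[12].register_forward_hook(make_steering_hook(direction, 4.0))
# ...generate...
# handle.remove()
```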

### Companion blog series — *The Interpretability Diaries*

- [Part I — The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) — five universal constants, what holds and what doesn't
- [Part II — Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) — Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity (see the sketch after this list)
- [Part III — When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) — two natively-trained 1-bit models, two organizations, same dissolution: var@64 ≈ 0.10 vs ~0.85 for fp16
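
The rank-1 ΔW in Part II has a simple shape: a scaled outer product of a write direction and a read direction, added to a single projection matrix. A sketch with placeholder tensors (the real directions and the scale α come from the notebooks, not from this snippet):

```python
# Shape of a rank-1 weight edit: W_edited = W + alpha * outer(u, v).
# W, u, v, and alpha are random placeholders, not the published edit.
import torch

def rank_one_edit(W: torch.Tensor, u: torch.Tensor, v: torch.Tensor, alpha: float) -> torch.Tensor:
    """Return a copy of W with a rank-1 update added."""
    return W + alpha * torch.outer(u, v)

d_out, d_in = 2048, 2048
W = torch.randn(d_out, d_in) * 0.02
u = torch.nn.functional.normalize(torch.randn(d_out), dim=0)  # write direction (placeholder)
v = torch.nn.functional.normalize(torch.randn(d_in), dim=0)   # read direction (placeholder)

W_edited = rank_one_edit(W, u, v, alpha=-0.1)
print("rank of ΔW:", torch.linalg.matrix_rank(W_edited - W).item())  # 1
```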

Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

---

## Working in public

Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

If you replicate a result and find a discrepancy, open an issue on the LarQL repo.