mikeumus-divincian committed on
Commit 8fceac7 · verified · 1 Parent(s): 8b5a46d

Refresh org card: 8 published vindexes, n=2 1-bit dissolution, 3-seed C4=0.0407±0.0004, viewer Space link, blog series

Files changed (1): README.md (+40 -11)
README.md CHANGED
@@ -12,15 +12,32 @@ Think of it as the model's index: the thing you search before you run it.
  ---

- ## Published vindexes

- | Model | Architecture | Params | Vindex |
- |-------|-------------|--------|--------|
- | Gemma 4 E2B-it | Dense (Gemma 4) | 2B | [Divinci-AI/gemma-4-e2b-vindex](https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex) |
- | Qwen3.6-35B-A3B | MoE (Qwen3.6) | 35B / 3B active | [Divinci-AI/qwen3.6-35b-a3b-vindex](https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex) |
- | GPT-OSS 120B | MoE (OpenAI) | 120B / ~13B active | *building* |

- Three organizations, three architectures: Gemma dense, Qwen MoE, OpenAI MoE.

  ---
 
@@ -36,19 +53,31 @@ LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayu
  ## Research

- **Paper 1 — Architectural Invariants of Transformer Computation** *(arXiv forthcoming)*
- Five properties measured across every model in this collection. Three hold within ±15% coefficient of variation across architectures, organizations, and scales. One collapses under 1-bit quantization. One scales monotonically with model size.

- **Paper 2 — Constellation Edits** *(draft)*
  Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.

  Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

  ---

  ## Working in public

- Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

  If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
 
  ---

+ ## Interactive viewer
+
+ [![LarQL Vindex Viewer — interactive 3D + 2D circuit visualization](https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif)](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)
+
+ **[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**
+
+ Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **⇌ Compare** to render the current model side by side with Bonsai 1-bit — the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
+
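For readers who want to script that search deep link rather than type it, here is a minimal sketch. It only assembles the `?q=…&model=…` query string quoted above onto the Space URL linked in this section; that the Space reads these parameters from its top-level URL is an assumption, not something stated here.

```python
from urllib.parse import urlencode

# Space linked above; assumed to accept ?q= and ?model= on its top-level URL.
VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    """Build a deep link that pre-selects a model and an entity-feature search."""
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# -> https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```
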
+ ---

+ ## Published vindexes
+
+ Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, plus two 1-bit controls.
+
+ | Model | Architecture | Params | Vindex | C4 (layer temperature) | Notes |
+ |-------|-------------|--------|--------|------------------------|-------|
+ | **Gemma 4 E2B-it** | Dense (Gemma 4) | 2B | [gemma-4-e2b-vindex](https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex) | **0.0407 ± 0.0004** ✓ | 3-seed validated; headline universal-constant model |
+ | Qwen3-0.6B | Dense (Qwen 3) | 0.6B | [qwen3-0.6b-vindex](https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex) | 0.411 | Smallest published; Qwen3 family-elevated C4 |
+ | Qwen3-8B bf16 | Dense (Qwen 3) | 8B | [qwen3-8b-vindex](https://huggingface.co/Divinci-AI/qwen3-8b-vindex) | 0.804 | Architecture control for Bonsai |
+ | Qwen3.6-35B-A3B | MoE (Qwen 3.6) | 35B / 3B active | [qwen3.6-35b-a3b-vindex](https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex) | — | 256 experts, 40 layers |
+ | Ministral-3B | Dense (Mistral 3) | 3B | [ministral-3b-vindex](https://huggingface.co/Divinci-AI/ministral-3b-vindex) | 0.265 | fp8 → bf16 reconstruction |
+ | Llama 3.1-8B | Dense (Llama 3.1) | 8B | [llama-3.1-8b-vindex](https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex) | **0.012** ✓ | Llama family signature |
+ | MedGemma 1.5-4B | Dense (Gemma multimodal) | 4B | [medgemma-1.5-4b-vindex](https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex) | **1.898 ⚠** | 45× cohort anomaly — under investigation |
+ | GPT-OSS 120B | MoE (OpenAI) | 120B | [gpt-oss-120b-vindex](https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex) | — | S[0] grows 117× with depth (L0 = 111 → final = 13,056) |
+ | **Bonsai 8B** | 1-bit (Qwen 3 base, post-quantized) | 8B | *vindex pending publish* | 0.429 | **C5 = 1** (circuit dissolved); var@64 = 0.093 |
+ | **BitNet b1.58-2B-4T** | 1-bit (Microsoft, native) | 2B | *vindex pending publish* | (Phase 2 pending) | **var@64 = 0.111** mean across 30 layers — n=2 confirmation of dissolution |
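
To poke at any of the published vindexes locally, the standard `huggingface_hub` download call is enough. A minimal sketch follows, using the Gemma repo id from the table; what files the repo contains and how LarQL or the notebooks consume them is not shown here.

```python
from huggingface_hub import snapshot_download

# Any repo id from the Vindex column of the table above works here.
local_dir = snapshot_download(repo_id="Divinci-AI/gemma-4-e2b-vindex")
print("vindex files downloaded to:", local_dir)
```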
 
  ---
 
  ## Research

+ ### Paper 1 — *Architectural Invariants of Transformer Computation*
+ *arXiv preprint forthcoming*
+
+ Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization** — replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
+
+ The headline universal constant — layer temperature C4 — is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
+
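The two numbers in that claim are plain seed statistics: the `±` is the spread across seeds, and the "1% precision" and "±15% CV" figures are that spread divided by the mean. A minimal sketch of the arithmetic, with placeholder per-seed values standing in for the actual three-seed Gemma 4 E2B measurements (which live in the notebooks):

```python
import statistics

# Placeholder per-seed C4 values; the real three-seed numbers are in the notebooks.
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)
std = statistics.stdev(c4_by_seed)      # sample standard deviation across seeds
cv = std / mean                         # coefficient of variation

print(f"C4 = {mean:.4f} ± {std:.4f}")   # e.g. 0.0407 ± 0.0004
print(f"relative precision ≈ {cv:.1%}") # ~1% here; the cross-model invariants hold to ±15% CV
```
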
+ ### Paper 2 — *Constellation Edits*
+ *draft, arXiv after 3-seed runs + α-sweep appendix*

  Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.

+ ### Companion blog series — *The Interpretability Diaries*
+
+ - [Part I — The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) — five universal constants, what holds and what doesn't
+ - [Part II — Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) — Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity (see the sketch after this list)
+ - [Part III — When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) — two independent 1-bit models, two organizations, same dissolution: var@64 ≈ 0.10 vs ~0.85 for fp16
+
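The "rank-1 ΔW" in Part II is an ordinary outer-product weight patch. The sketch below shows only that general shape (a single `alpha * u vᵀ` update added to one weight matrix), not the paper's Gate-3 procedure; how `u`, `v`, and `alpha` are actually chosen, and the +0.02% perplexity receipt, are the content of the post and the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weight matrix for one projection (d_out x d_in); purely illustrative.
d_out, d_in = 8, 16
W = rng.normal(size=(d_out, d_in))

# A rank-1 edit: a direction to write into (u), a direction to key on (v), and a scale.
# In the actual edit these come from feature geometry; here they are random placeholders.
u = rng.normal(size=(d_out, 1))
v = rng.normal(size=(d_in, 1))
alpha = -0.1  # negative scale = suppress rather than amplify

delta_W = alpha * (u @ v.T)   # outer product, rank 1 by construction
W_edited = W + delta_W

print("rank of delta_W:", np.linalg.matrix_rank(delta_W))            # 1
print("relative change:", np.linalg.norm(delta_W) / np.linalg.norm(W))  # small, targeted perturbation
```
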
  Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

  ---

  ## Working in public

+ Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

  If you replicate a result and find a discrepancy, open an issue on the LarQL repo.