mikeumus-divincian committed on
Commit 8fceac7 · verified · 1 Parent(s): 8b5a46d

Refresh org card: 8 published vindexes, n=2 1-bit dissolution, 3-seed C4=0.0407±0.0004, viewer Space link, blog series

Files changed (1): README.md (+40 -11)
README.md CHANGED
@@ -12,15 +12,32 @@ Think of it as the model's index: the thing you search before you run it.
  ---

- ## Published vindexes

- | Model | Architecture | Params | Vindex |
- |-------|-------------|--------|--------|
- | Gemma 4 E2B-it | Dense (Gemma 4) | 2B | [Divinci-AI/gemma-4-e2b-vindex](https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex) |
- | Qwen3.6-35B-A3B | MoE (Qwen3.6) | 35B / 3B active | [Divinci-AI/qwen3.6-35b-a3b-vindex](https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex) |
- | GPT-OSS 120B | MoE (OpenAI) | 120B / ~13B active | *building* |

- Three organizations, three architectures: Gemma dense, Qwen MoE, OpenAI MoE.

  ---
 
@@ -36,19 +53,31 @@ LarQL (the toolchain that builds vindexes) is open-source: [github.com/chrishayu
  ## Research

- **Paper 1 — Architectural Invariants of Transformer Computation** *(arXiv forthcoming)*
- Five properties measured across every model in this collection. Three hold within ±15% coefficient of variation across architectures, organizations, and scales. One collapses under 1-bit quantization. One scales monotonically with model size.

- **Paper 2 — Constellation Edits** *(draft)*
  Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.

  Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

  ---

  ## Working in public

- Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

  If you replicate a result and find a discrepancy, open an issue on the LarQL repo.
 
  ---

+ ## Interactive viewer
+
+ [![LarQL Vindex Viewer — interactive 3D + 2D circuit visualization](https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif)](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)
+
+ **[→ Open the interactive viewer](https://huggingface.co/spaces/Divinci-AI/vindex-viewer)**
+
+ Pick any of 9 models from the dropdown. Toggle between the 3D cylinder spiral and a flat 2D circuit/network view. Hit **⇌ Compare** to render the current model side by side with Bonsai 1-bit — the contrast between fp16 structure (organized rings) and 1-bit dissolution (scattered cloud) is the most direct picture we know how to render of what 1-bit training does to a transformer's internal organization. Search for entity features (`?q=paris&model=gemma-4-e2b`) to see real probe-derived activations light up across the layer stack, backed by a 5,000-token search index built offline.
+
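For readers who want to script that search deep link rather than type it, here is a minimal sketch. It only assembles the `?q=…&model=…` query string quoted above onto the Space URL linked in this section; that the Space reads these parameters from its top-level URL is an assumption, not something stated here.

```python
from urllib.parse import urlencode

# Space linked above; assumed to accept ?q= and ?model= on its top-level URL.
VIEWER_URL = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    """Build a deep link that pre-selects a model and an entity-feature search."""
    return f"{VIEWER_URL}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# -> https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&model=gemma-4-e2b
```
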
+ ---

+ ## Published vindexes
+
+ Cross-family evidence in hand: **Gemma**, **Qwen3**, **Mistral**, **Llama**, **OpenAI MoE**, plus two 1-bit controls.
+
+ | Model | Architecture | Params | Vindex | C4 (layer temperature) | Notes |
+ |-------|-------------|--------|--------|------------------------|-------|
+ | **Gemma 4 E2B-it** | Dense (Gemma 4) | 2B | [gemma-4-e2b-vindex](https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex) | **0.0407 ± 0.0004** ✓ | 3-seed validated; headline universal-constant model |
+ | Qwen3-0.6B | Dense (Qwen 3) | 0.6B | [qwen3-0.6b-vindex](https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex) | 0.411 | Smallest published; Qwen3 family-elevated C4 |
+ | Qwen3-8B bf16 | Dense (Qwen 3) | 8B | [qwen3-8b-vindex](https://huggingface.co/Divinci-AI/qwen3-8b-vindex) | 0.804 | Architecture control for Bonsai |
+ | Qwen3.6-35B-A3B | MoE (Qwen 3.6) | 35B / 3B active | [qwen3.6-35b-a3b-vindex](https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex) | — | 256 experts, 40 layers |
+ | Ministral-3B | Dense (Mistral 3) | 3B | [ministral-3b-vindex](https://huggingface.co/Divinci-AI/ministral-3b-vindex) | 0.265 | fp8 → bf16 reconstruction |
+ | Llama 3.1-8B | Dense (Llama 3.1) | 8B | [llama-3.1-8b-vindex](https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex) | **0.012** ✓ | Llama family signature |
+ | MedGemma 1.5-4B | Dense (Gemma multimodal) | 4B | [medgemma-1.5-4b-vindex](https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex) | **1.898 ⚠** | 45× cohort anomaly — under investigation |
+ | GPT-OSS 120B | MoE (OpenAI) | 120B | [gpt-oss-120b-vindex](https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex) | — | S[0] grows 117× with depth (L0 = 111 → final = 13,056) |
+ | **Bonsai 8B** | 1-bit (Qwen 3 base, post-quantized) | 8B | *vindex pending publish* | 0.429 | **C5 = 1** (circuit dissolved); var@64 = 0.093 |
+ | **BitNet b1.58-2B-4T** | 1-bit (Microsoft, native) | 2B | *vindex pending publish* | (Phase 2 pending) | **var@64 = 0.111** mean across 30 layers — n=2 confirmation of dissolution |
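
To poke at any of the published vindexes locally, the standard `huggingface_hub` download call is enough. A minimal sketch follows, using the Gemma repo id from the table; what files the repo contains and how LarQL or the notebooks consume them is not shown here.

```python
from huggingface_hub import snapshot_download

# Any repo id from the Vindex column of the table above works here.
local_dir = snapshot_download(repo_id="Divinci-AI/gemma-4-e2b-vindex")
print("vindex files downloaded to:", local_dir)
```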
 
  ---
 
  ## Research

+ ### Paper 1 — *Architectural Invariants of Transformer Computation*
+ *arXiv preprint forthcoming*
+
+ Five properties measured across every model in this collection. **Three hold within ±15% coefficient of variation** across architectures, organizations, and scales. **One collapses under 1-bit quantization** — replicated across two independent 1-bit models from two organizations (n = 2). **One scales monotonically with model size**.
+
+ The headline universal constant — layer temperature C4 — is reproducible at the **1% precision level**: a three-seed run on Gemma 4 E2B gives `C4 = 0.0407 ± 0.0004`, with circuit-stage count perfectly stable (`C5 = 4 ± 0`) across all seeds.
+
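The two numbers in that claim are plain seed statistics: the `±` is the spread across seeds, and the "1% precision" and "±15% CV" figures are that spread divided by the mean. A minimal sketch of the arithmetic, with placeholder per-seed values standing in for the actual three-seed Gemma 4 E2B measurements (which live in the notebooks):

```python
import statistics

# Placeholder per-seed C4 values; the real three-seed numbers are in the notebooks.
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)
std = statistics.stdev(c4_by_seed)      # sample standard deviation across seeds
cv = std / mean                         # coefficient of variation

print(f"C4 = {mean:.4f} ± {std:.4f}")   # e.g. 0.0407 ± 0.0004
print(f"relative precision ≈ {cv:.1%}") # ~1% here; the cross-model invariants hold to ±15% CV
```
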
+ ### Paper 2 — *Constellation Edits*
+ *draft, arXiv after 3-seed runs + α-sweep appendix*

  Mechanistic knowledge editing in transformer feature space. Includes a negative result: why activation-space edits fail in 1-bit models, and what weight-space geometry reveals about why.

+ ### Companion blog series — *The Interpretability Diaries*
+
+ - [Part I — The Architecture Every Language Model Converges To](https://divinci.ai/blog/architecture-every-llm-converges-to/) — five universal constants, what holds and what doesn't
+ - [Part II — Deleting Paris from a Language Model](https://divinci.ai/blog/deleting-paris-from-a-language-model/) — Gate-3 surgical knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at +0.02% perplexity (see the sketch after this list)
+ - [Part III — When the Circuit Dissolves](https://divinci.ai/blog/when-the-circuit-dissolves/) — two independent 1-bit models, two organizations, same dissolution: var@64 ≈ 0.10 vs ~0.85 for fp16
+
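The "rank-1 ΔW" in Part II is an ordinary outer-product weight patch. The sketch below shows only that general shape (a single `alpha * u vᵀ` update added to one weight matrix), not the paper's Gate-3 procedure; how `u`, `v`, and `alpha` are actually chosen, and the +0.02% perplexity receipt, are the content of the post and the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weight matrix for one projection (d_out x d_in); purely illustrative.
d_out, d_in = 8, 16
W = rng.normal(size=(d_out, d_in))

# A rank-1 edit: a direction to write into (u), a direction to key on (v), and a scale.
# In the actual edit these come from feature geometry; here they are random placeholders.
u = rng.normal(size=(d_out, 1))
v = rng.normal(size=(d_in, 1))
alpha = -0.1  # negative scale = suppress rather than amplify

delta_W = alpha * (u @ v.T)   # outer product, rank 1 by construction
W_edited = W + delta_W

print("rank of delta_W:", np.linalg.matrix_rank(delta_W))            # 1
print("relative change:", np.linalg.norm(delta_W) / np.linalg.norm(W))  # small, targeted perturbation
```
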
  Working notebooks: [github.com/Divinci-AI/server/tree/preview/notebooks](https://github.com/Divinci-AI/server/tree/preview/notebooks)

  ---

  ## Working in public

+ Every measurement in our papers traces back to a notebook and a commit. Negative results ship alongside positive ones — the MLP compensation mechanism that defeats knowledge editing in 1-bit models is in the notebooks, not buried in a supplement.

  If you replicate a result and find a discrepancy, open an issue on the LarQL repo.