---
license: cc-by-nc-4.0
library_name: larql
tags:
  - vindex
  - larql
  - gemma4
  - gguf
  - mechanistic-interpretability
  - knowledge-editing
  - constellation-edits
base_model: google/gemma-4-e2b-it
---

# Gemma 4 e2b — LarQL Vindex v0.2

First-ever published [LarQL](https://github.com/chrishayuk/larql) vindex for Google's Gemma 4.

A **vindex** is a transformer's weights decompiled into a queryable feature database — entity associations, circuit structure, and knowledge-editing surfaces exposed as APIs. No GPU required for most operations.

## What this is / What this is not

| ✅ What this IS | ❌ What this IS NOT |
|----------------|-------------------|
| A feature-space index for `google/gemma-4-e2b-it` | A language model |
| Exposes entity associations via `/v1/walk` | `/v1/infer` does NOT produce factual completions |
| Enables rank-1 knowledge edits (DELETE/INSERT) | Not a replacement for the base Gemma4 weights |
| Circuit analysis (broadcast→domain→entity→prediction) | |
| Editing surface for `larql compile into model` → standard HuggingFace safetensors inference | Not a general inference engine |

**Critical note on `/v1/infer`:** This endpoint returns a feature-modulated projection of the host model's activations — not a coherent text-generation distribution. Output is incoherent subword tokens by design (the vindex is a feature graph, not a full transformer forward pass). For factual text generation from the *base* model, use `google/gemma-4-e2b-it` directly. To run inference on an **edited** model (after DELETE/INSERT patches), use `larql compile into model` — this exports MEMIT-edited weights to HuggingFace safetensors that load like any standard `transformers` model. Use `/v1/walk` and `/v1/patch` for the validated vindex operations.

**Validated surfaces:** `/v1/walk` (entity-association retrieval), `/v1/describe` (feature neighborhood), `/v1/patch` DELETE/INSERT (rank-1 weight editing, Gate 3 confirmed).

**Compile an edited vindex to a runnable model:**
```bash
# After applying patches, export to safetensors for standard inference
larql compile into model \
  --vindex Divinci-AI/gemma-4-e2b-vindex \
  --output ./edited-gemma4 \
  --format safetensors
```

```python
# Run the edited model with standard Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./edited-gemma4")
tokenizer = AutoTokenizer.from_pretrained("./edited-gemma4")
```

## Quick start

```bash
# Install LarQL (requires our fork with Gemma 4 support until upstreamed)
git clone https://github.com/Divinci-AI/larql.git
cd larql && cargo build --release

# Set environment variables
export LARQL_SERVICE_URL=<your_larql_cloud_run_url>
export INTERNAL_LARQL_S2S_TOKEN=<your_s2s_token>

# Query entity associations
curl "$LARQL_SERVICE_URL/v1/walk?prompt=Paris&layers=14-27&top=10" \
  -H "Authorization: Bearer $INTERNAL_LARQL_S2S_TOKEN"

# Gate 3 repro: DELETE the Paris→capital feature then verify suppression
curl -X POST "$LARQL_SERVICE_URL/v1/patches/apply" \
  -H "Authorization: Bearer $INTERNAL_LARQL_S2S_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"delete-paris-capital","patch":{"version":1,"base_model":"gemma4-e2b","created_at":"2026-04-20T00:00:00Z","operations":[{"op":"delete","entity":"Paris","relation":"capital","target":"서울","weight":1.0,"layer":27,"feature":11179}]}}'

# Before: feature 11179 (gate_score=18.1) present in walk
# After:  feature 11179 absent from walk (complete suppression confirmed)
```
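The before/after check in the comments above can be scripted once you have the walk response as JSON. Below is a minimal helper, assuming a response shape like `{"hits": [{"feature": ..., "score": ...}]}` — that schema is inferred from the walk output shown in this card, not a documented contract:

```python
def feature_suppressed(walk_response: dict, feature_id: int) -> bool:
    """True if the given feature id no longer appears in the walk hits.

    Assumes the (hypothetical) response shape
    {"hits": [{"feature": int, "score": float}, ...]}.
    """
    return all(hit.get("feature") != feature_id
               for hit in walk_response.get("hits", []))


# Example with the Gate 3 numbers from this card:
before = {"hits": [{"feature": 11179, "score": 18.10}]}
after = {"hits": [{"feature": 7327, "score": 9.40}]}
print(feature_suppressed(before, 11179))  # False — feature still present
print(feature_suppressed(after, 11179))   # True  — suppression confirmed
```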

## Contents

| File | Size | Description |
|------|------|-------------|
| `gate_vectors.bin` | 1.0 GB | FFN gate matrices, per-layer variable (f16) |
| `down_features.bin` | ~1.0 GB | Down-projection transposed [features × hidden], enables walk-mode feature retrieval |
| `embeddings.bin` | 768 MB | Token embeddings, 262,144 × 1,536 (f16) |
| `down_meta.bin` | 29 MB | Feature labels via vocab projection |
| `feature_clusters.jsonl` | 4 MB | K-means clusters over gate features |
| `relation_clusters.json` | 15 MB | Wikidata relation matching |
| `norms.bin` | 423 KB | Per-layer normalization weights |
| `tokenizer.json` | 11 MB | Substitute tokenizer (Qwen 2.5 — real Gemma 4 tokenizer was gated during extraction) |
| `index.json` | 5 KB | Metadata: 35 layers, hidden=1536, variable FFN (6144 → 12288) |
| `manifest.json` | 1.1 KB | Vindex version manifest |

Total: ~2.8 GB (without full weight files)

> **Note on `down_features.bin`:** Generated from `down_weights.bin` via a Python transposition step that handles Gemma 4's variable intermediate sizes per layer (L0-14: 6144, L15-34: 12288). The Rust `build_down_features` binary segfaults on variable intermediate sizes; our fix is the Python Cloud Build step in `build-larql-service.sh`. Required for walk-mode feature retrieval.
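For intuition, the transposition step can be sketched as below. This is an illustrative pure-stdlib version, not the actual Cloud Build code (which presumably vectorizes with NumPy); it assumes `down_weights.bin` stores each layer's down-projection row-major as `[hidden × intermediate]` in f16, with the variable per-layer intermediate sizes from `index.json`:

```python
import struct

def transpose_down_weights(raw: bytes, hidden: int, ffn_sizes: list[int]) -> bytes:
    """Transpose each layer's down-projection from [hidden x ffn] to
    [ffn x hidden], so rows index features (the walk-mode layout).

    Layout assumptions (hypothetical): layers are concatenated in order,
    each stored row-major in little-endian f16. For Gemma 4 e2b the
    sizes would be hidden=1536, ffn_sizes=[6144]*15 + [12288]*20.
    """
    out = bytearray()
    offset = 0
    for ffn in ffn_sizes:
        n = hidden * ffn
        vals = struct.unpack_from(f"<{n}e", raw, offset)  # 'e' = f16
        # row-major [hidden, ffn] -> row-major [ffn, hidden]
        for f in range(ffn):
            for h in range(hidden):
                out += struct.pack("<e", vals[h * ffn + f])
        offset += n * 2  # f16 = 2 bytes per value
    return bytes(out)
```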

## Gate 3 Validation (DELETE patch confirmed)

Gate 3 test: DELETE patch on Paris → 서울 (Seoul/capital) feature at layer 27, feature 11179.

| Metric | Before DELETE | After DELETE |
|--------|--------------|-------------|
| Feature 11179 gate_score | 18.10 | ABSENT |
| Paris capital rank | #2 overall | Absent from top-25 |
| Walk hits | Feature 11179 present (score 18.1) | Feature 11179 completely absent |

**Walk and dense retrieval diverge** after the fix, confirming that `down_features.bin` is loaded and active.

```
Before: feature=11179 score=18.10 target='서울'   ← rank #1
After:  feature=7327  score=9.40  target='PMA'    ← 서울 COMPLETELY ABSENT
```

Gate 3 result: **PASS ✓**

## Architecture details

- **Architecture**: Gemma 4 dense (e2b variant)
- **Layers**: 35 (L0-14: FFN=6144, L15-34: FFN=12288 — per-layer variable)
- **Hidden size**: 1536
- **Head dim**: 256
- **Attention**: 8 Q heads, 1 KV head (GQA 8:1)
- **Quantization source**: Q4_K GGUF

## Research findings

This vindex enabled the following findings (see `notebooks/PAPER_universal_constants.md` in [Divinci-AI/server](https://github.com/Divinci-AI/server)):

**Five universal constants across transformer architectures:**
1. ~12% dominant FFN sparsity (scale-invariant)
2. Top-8 output concentration (~99.7% at each position)
3. ~0.97 gate coherence across all layers
4. ~0.042 layer temperature (log-activation variance)
5. Broadcast → Domain → Entity → Prediction circuit (4-stage)
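For intuition on constant 1, here is one hedged way "dominant sparsity" could be measured: the fraction of gate units whose activations cover most of the absolute activation mass at a position. This definition, the `mass=0.9` cutoff, and the function name are illustrative assumptions, not the paper's actual procedure:

```python
def dominant_sparsity(gate_acts: list[float], mass: float = 0.9) -> float:
    """Fraction of gate units whose largest (absolute) activations
    cover `mass` of the total absolute activation at one position.

    Illustrative definition only; the paper's measurement may differ.
    """
    vals = sorted((abs(v) for v in gate_acts), reverse=True)
    total = sum(vals)
    acc, k = 0.0, 0
    for v in vals:
        acc += v
        k += 1
        if acc >= mass * total:
            break
    return k / len(vals)


# Toy example: 2 of 10 units carry all the mass -> sparsity 0.2
print(dominant_sparsity([5, 5, 0, 0, 0, 0, 0, 0, 0, 0]))  # 0.2
```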

**Predictive formula:** `active_experts ≈ 1/dominant_sparsity` predicts Gemma 4's top-8 MoE routing within 4% error from structural analysis alone.
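A quick sanity check of the formula as quoted, assuming dominant sparsity of exactly 12% (the paper's "~12%" will shift the error slightly):

```python
# active_experts ≈ 1 / dominant_sparsity, compared with the top-8
# routing quoted in the text. With exactly 12% sparsity the relative
# error lands near the ~4% claimed above.
dominant_sparsity = 0.12
predicted = 1 / dominant_sparsity            # ≈ 8.33 experts
observed = 8
rel_error = abs(predicted - observed) / observed
print(f"predicted={predicted:.2f}, error={rel_error:.1%}")
```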

**Constellation Edits (knowledge editing):** Rank-1 DELETE at the TRACE-identified crown layer (L25 for geography facts) achieves FQ=1.00 in 80ms with full reversibility. Gradient ascent fails due to softmax saturation (gradient=0 at P=1.0 float32). Cross-architecture validation: Mistral-7B FQ=1.00/MU=0.88 (structural rank-1), Qwen2.5-1.5B FQ=1.00 (ROME-style k*). See `notebooks/PAPER_CONSTELLATION_EDITS_DRAFT.md`.

## Important notes

1. **Substitute tokenizer**: Feature labels show Qwen 2.5 tokens (151,643-vocab), not Gemma 4 tokens. Gate vectors are correct Gemma 4 weights; only the label mapping is approximate.

2. **Built with patched LarQL**: 7 bug fixes required for Gemma 4 (column-major loading, Q4_K block size, variable FFN size support, etc.). See https://github.com/Divinci-AI/larql and upstream PR https://github.com/chrishayuk/larql/pull/24.

3. **License**: CC-BY-NC 4.0. Academic and research use. Contact [mike@divinci.ai](mailto:mike@divinci.ai) for commercial licensing.

## Citation

```bibtex
@misc{mooring2026universalconstants,
  title={Universal Constants of Transformer Intelligence},
  author={Mooring, Mike},
  year={2026},
  note={Preprint. arXiv forthcoming.}
}

@misc{mooring2026constellation,
  title={Constellation Edits: Training-Free Knowledge Injection and Auditable Unlearning via Multi-Layer Feature Patches},
  author={Mooring, Mike},
  year={2026},
  note={Preprint. arXiv forthcoming.}
}
```

## Acknowledgments

Chris Hayuk for creating LarQL. Google DeepMind for Gemma 4. Cloudflare for frontier model hosting.