docs: clarify inference path via larql compile into model
README.md
A **vindex** is a transformer's weights decompiled into a queryable feature database.
| A vindex is | A vindex is not |
| --- | --- |
| A feature-space index for Gemma4-e2b-it | A language model |
| Exposes entity associations via `/v1/walk` | `/v1/infer` does NOT produce factual completions |
| Enables rank-1 knowledge edits (DELETE/INSERT) | Not a replacement for the base Gemma4 weights |
| Circuit analysis (broadcast→domain→entity→prediction) | |
| Editing surface for `larql compile into model` → standard HuggingFace safetensors inference | Not a general inference engine |
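
The "rank-1 knowledge edits" above have a precise linear-algebra meaning: a MEMIT-style patch adds a single outer product to one weight matrix. A minimal numerical sketch of that mechanism (toy dimensions, not the real Gemma4 weights, and not the vindex's exact update rule):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))     # stand-in for one host-model weight matrix

# A rank-1 edit writes a single key/value association into the weights:
#   W' = W + u @ v.T
u = rng.normal(size=(d, 1))     # "value" direction: what to write
v = rng.normal(size=(d, 1))     # "key" direction: where to write it
W_edited = W + u @ v.T

# The update touches every entry of W, yet carries only rank-1 information.
print(np.linalg.matrix_rank(W_edited - W))  # → 1
```

A DELETE patch is the same operation with the sign chosen to cancel an existing association.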

**Critical note on `/v1/infer`:** This endpoint returns a feature-modulated projection of the host model's activations, not a coherent text-generation distribution. Its output is incoherent subword tokens by design: the vindex is a feature graph, not a full transformer forward pass. For factual text generation from the *base* model, use `google/gemma-4-e2b-it` directly. To run inference on an **edited** model (after DELETE/INSERT patches), use `larql compile into model`, which exports MEMIT-edited weights to HuggingFace safetensors that load like any standard `transformers` model. Use `/v1/walk` and `/v1/patch` for the validated vindex operations.

**Validated surfaces:** `/v1/walk` (entity-association retrieval), `/v1/describe` (feature neighborhood), `/v1/patch` DELETE/INSERT (rank-1 weight editing, Gate 3 confirmed).
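
Of those surfaces, `/v1/walk` is a plain HTTP POST. Only the endpoint path comes from this README; the host, port, and payload fields below are assumptions. A minimal client sketch:

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed host/port for a local vindex server

def walk_request(entity: str, hops: int = 1) -> urllib.request.Request:
    """Build (not send) a /v1/walk entity-association query.

    The {"entity", "hops"} payload shape is hypothetical; only the
    /v1/walk path is documented above.
    """
    body = json.dumps({"entity": entity, "hops": hops}).encode()
    return urllib.request.Request(
        f"{BASE}/v1/walk",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = walk_request("Gemma4")
print(req.full_url)  # → http://localhost:8080/v1/walk
```

Send it with `urllib.request.urlopen(req)` once a server is running.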

**Compile edited vindex to a runnable model:**

```bash
# After applying patches, export to safetensors for standard inference
larql compile into model \
  --vindex Divinci-AI/gemma-4-e2b-vindex \
  --output ./edited-gemma4 \
  --format safetensors
```

```python
# Run with standard Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('./edited-gemma4')
model = AutoModelForCausalLM.from_pretrained('./edited-gemma4')
```
## Quick start
```bash