Lunamos committed
Commit b796b76 · verified · 1 Parent(s): 2ba8a4a

Switch checkpoint to bf16; update hardware section

Files changed (2)
  1. README.md +4 -6
  2. flas-gemma-2-2b-it.safetensors +2 -2
README.md CHANGED
@@ -12,6 +12,8 @@ tags:
12
 
13
  **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
14
 
 
 
15
  This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
16
 
17
  - 📄 Paper: <https://arxiv.org/abs/2605.05892>
@@ -29,14 +31,10 @@ The flow time $T$ serves as a continuous steering-strength parameter; sampling $
29
 
30
  | File | Description |
31
  |---|---|
32
- | `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, fp32, ~373 MB). |
33
  | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
34
 
35
- The frozen concept encoder is **not** stored and is loaded from the base model's first two layers at load time.
36
-
37
- ## Hardware
38
-
39
- End-to-end inference (Gemma-2-2B-IT bf16 + FlowFunction fp32 + ConceptEncoder fp32) uses about **8 GB peak VRAM** for 128-token generation: ~4.9 GB base model, ~0.4 GB flow function, ~2.9 GB concept encoder. A 12 GB GPU (RTX 3060, T4, etc.) is enough.
40
 
41
  ## Usage
42
 
 
12
 
13
  **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
14
 
15
+ **Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.
16
+
17
  This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
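  The steering step can be pictured as integrating the ODE $dh/dt = v_\theta(h, t, c)$ with plain Euler steps, where the flow time $T$ is the strength knob. A minimal sketch (the toy velocity field below is a stand-in for the trained flow function, and `euler_steer` is a hypothetical helper, not the FLAS loader API; `n_steps` mirrors the config field of the same name):

  ```python
  import numpy as np

  def euler_steer(h, c, v_theta, T=1.0, n_steps=8):
      """Integrate dh/dt = v_theta(h, t, c) from t=0 to t=T with Euler steps.

      T is the continuous steering-strength parameter: larger T pushes the
      hidden state h further along the concept-conditioned flow.
      """
      dt = T / n_steps
      t = 0.0
      for _ in range(n_steps):
          h = h + dt * v_theta(h, t, c)
          t += dt
      return h

  # Toy stand-in for the trained velocity field: pulls h toward the
  # concept embedding c (the real v_theta is a learned network).
  v_toy = lambda h, t, c: c - h

  h0 = np.zeros(4)           # hidden state before steering
  c = np.ones(4)             # concept embedding
  h_steered = euler_steer(h0, c, v_toy, T=1.0, n_steps=8)  # ~0.656 per element
  ```

  Because $T$ enters only through the step size, re-running with a larger $T$ moves the state further along the same flow, which is what makes it usable as a continuous strength dial.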
18
 
19
  - 📄 Paper: <https://arxiv.org/abs/2605.05892>
 
31
 
32
  | File | Description |
33
  |---|---|
34
+ | `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, bf16, ~187 MB). |
35
  | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
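  For orientation, a `config.json` with the fields the FLAS loader consumes might look like the following; `model_id` matches the base model named above, but the numeric values here are illustrative assumptions, not the shipped config:

  ```json
  {
    "model_id": "google/gemma-2-2b-it",
    "layer": 12,
    "num_blocks": 4,
    "n_steps": 8
  }
  ```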
36
 
37
+ The frozen concept encoder is **not** stored; at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).
 
 
 
 
38
 
39
  ## Usage
40
 
flas-gemma-2-2b-it.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4dbc2617cf3fd96675c5583b044dffb78ceacbcb779c567dc5ec23d52314e528
3
- size 390569576
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bca45f7fa5abe11d607407b11ba0f00bdbf7936fa0a104b988cfac765446148e
3
+ size 195286160