Switch checkpoint to bf16; update hardware section

Files changed:
- README.md (+4 -6)
- flas-gemma-2-2b-it.safetensors (+2 -2)

README.md (CHANGED)
```diff
@@ -12,6 +12,8 @@ tags:
 
 **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
 
+**Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.
+
 This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
 
 - 📄 Paper: <https://arxiv.org/abs/2605.05892>
```
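For readers skimming the diff: the README's $v_\theta(h, t, c)$ is a velocity field integrated over flow time, with $T$ acting as the continuous steering strength (see the next hunk's header). Below is a minimal sketch of that inference-time loop, assuming plain Euler integration and a hypothetical `flow_fn(h, t, c)` callable standing in for the checkpoint's flow function; the real FLAS API is not shown anywhere in this diff.

```python
import torch

def steer(h: torch.Tensor, c: torch.Tensor, flow_fn,
          T: float = 1.0, n_steps: int = 8) -> torch.Tensor:
    """Euler-integrate dh/dt = v_theta(h, t, c) from t = 0 to t = T.

    T is the continuous steering strength: T = 0 returns h unchanged,
    larger T pushes the activation further toward concept c. `flow_fn`
    is a hypothetical stand-in for the checkpoint's flow function, and
    `n_steps` mirrors the `n_steps` key in config.json.
    """
    dt = T / n_steps
    t = torch.zeros((), device=h.device)
    for _ in range(n_steps):
        h = h + dt * flow_fn(h, t, c)  # one Euler step along the learned field
        t = t + dt
    return h
```

In a full pipeline this would presumably run inside a forward hook on the decoder layer named by the `layer` key in `config.json`, but that wiring is not part of this commit.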
```diff
@@ -29,14 +31,10 @@ The flow time $T$ serves as a continuous steering-strength parameter; sampling $
 
 | File | Description |
 |---|---|
-| `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params,
+| `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, ~187 MB). |
 | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
 
-The frozen concept encoder is **not** stored
-
-## Hardware
-
-End-to-end inference (Gemma-2-2B-IT bf16 + FlowFunction fp32 + ConceptEncoder fp32) uses about **8 GB peak VRAM** for 128-token generation: ~4.9 GB base model, ~0.4 GB flow function, ~2.9 GB concept encoder. A 12 GB GPU (RTX 3060, T4, etc.) is enough.
+The frozen concept encoder is **not** stored — at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).
 
 ## Usage
 
```
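The file table in this hunk is enough to sanity-check the commit locally. A minimal sketch using only the standard `json` and `safetensors` APIs; the file names and config keys come from the table, while the expected printouts are assumptions inferred from this diff:

```python
import json
from safetensors.torch import load_file

# Config keys listed in the README's file table.
with open("config.json") as f:
    cfg = json.load(f)
print(cfg["model_id"], cfg["layer"], cfg["num_blocks"], cfg["n_steps"])

# After this commit the flow weights should be bf16 and total ~97.6 M params.
weights = load_file("flas-gemma-2-2b-it.safetensors")
n_params = sum(t.numel() for t in weights.values())
print(f"{n_params / 1e6:.1f} M params")          # expect ~97.6
print({str(t.dtype) for t in weights.values()})  # expect {'torch.bfloat16'}
```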
flas-gemma-2-2b-it.safetensors (CHANGED)

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:bca45f7fa5abe11d607407b11ba0f00bdbf7936fa0a104b988cfac765446148e
+size 195286160
```
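The new pointer size is consistent with the bf16 switch: 97.6 M parameters at 2 bytes each is roughly 195.2 MB, and 195,286,160 bytes is about 186 MiB, in line with the "~187 MB" in the updated file table (the small gap is plausibly the safetensors header). A quick check:

```python
size_bytes = 195_286_160  # size from the new LFS pointer
print(f"{size_bytes / 2 / 1e6:.1f} M params at 2 bytes each (bf16)")  # ~97.6
print(f"{size_bytes / 2**20:.0f} MiB")                                # ~186
```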