Lunamos committed
Commit b796b76 · verified · 1 Parent(s): 2ba8a4a

Switch checkpoint to bf16; update hardware section

Files changed (2)
  1. README.md +4 -6
  2. flas-gemma-2-2b-it.safetensors +2 -2
README.md CHANGED
@@ -12,6 +12,8 @@ tags:
12
 
13
  **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
14
 
 
 
15
  This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
16
 
17
  - 📄 Paper: <https://arxiv.org/abs/2605.05892>
@@ -29,14 +31,10 @@ The flow time $T$ serves as a continuous steering-strength parameter; sampling $
29
 
30
  | File | Description |
31
  |---|---|
32
- | `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, fp32, ~373 MB). |
33
  | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
34
 
35
- The frozen concept encoder is **not** stored and is loaded from the base model's first two layers at load time.
36
-
37
- ## Hardware
38
-
39
- End-to-end inference (Gemma-2-2B-IT bf16 + FlowFunction fp32 + ConceptEncoder fp32) uses about **8 GB peak VRAM** for 128-token generation: ~4.9 GB base model, ~0.4 GB flow function, ~2.9 GB concept encoder. A 12 GB GPU (RTX 3060, T4, etc.) is enough.
40
 
41
  ## Usage
42
 
 
12
 
13
  **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
14
 
15
+ **Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.
16
+
17
  This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
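  The steering step can be pictured as integrating the ODE $dh/dt = v_\theta(h, t, c)$ with plain Euler steps, where the flow time $T$ is the strength knob. A minimal sketch (the toy velocity field below is a stand-in for the trained flow function, and `euler_steer` is a hypothetical helper, not the FLAS loader API; `n_steps` mirrors the config field of the same name):

  ```python
  import numpy as np

  def euler_steer(h, c, v_theta, T=1.0, n_steps=8):
      """Integrate dh/dt = v_theta(h, t, c) from t=0 to t=T with Euler steps.

      T is the continuous steering-strength parameter: larger T pushes the
      hidden state h further along the concept-conditioned flow.
      """
      dt = T / n_steps
      t = 0.0
      for _ in range(n_steps):
          h = h + dt * v_theta(h, t, c)
          t += dt
      return h

  # Toy stand-in for the trained velocity field: pulls h toward the
  # concept embedding c (the real v_theta is a learned network).
  v_toy = lambda h, t, c: c - h

  h0 = np.zeros(4)           # hidden state before steering
  c = np.ones(4)             # concept embedding
  h_steered = euler_steer(h0, c, v_toy, T=1.0, n_steps=8)  # ~0.656 per element
  ```

  Because $T$ enters only through the step size, re-running with a larger $T$ moves the state further along the same flow, which is what makes it usable as a continuous strength dial.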
18
 
19
  - 📄 Paper: <https://arxiv.org/abs/2605.05892>
 
31
 
32
  | File | Description |
33
  |---|---|
34
+ | `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, bf16, ~187 MB). |
35
  | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
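  For orientation, a `config.json` with the fields the FLAS loader consumes might look like the following; `model_id` matches the base model named above, but the numeric values here are illustrative assumptions, not the shipped config:

  ```json
  {
    "model_id": "google/gemma-2-2b-it",
    "layer": 12,
    "num_blocks": 4,
    "n_steps": 8
  }
  ```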
36
 
37
+ The frozen concept encoder is **not** stored; at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).
 
 
 
 
38
 
39
  ## Usage
40
 
flas-gemma-2-2b-it.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4dbc2617cf3fd96675c5583b044dffb78ceacbcb779c567dc5ec23d52314e528
3
- size 390569576
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bca45f7fa5abe11d607407b11ba0f00bdbf7936fa0a104b988cfac765446148e
3
+ size 195286160