Lunamos committed on
Commit 86b4927 · verified · 1 Parent(s): c1997c4

Initial release: FLAS Gemma-2-9B-IT checkpoint

Files changed (3)
  1. README.md +57 -0
  2. config.json +10 -0
  3. flas-gemma-2-9b-it.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ base_model: google/gemma-2-9b-it
+ library_name: flas
+ tags:
+ - activation-steering
+ - flow-matching
+ - gemma-2
+ ---
+
+ # FLAS — Gemma-2-9B-IT
+
+ **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
+
+ This is the natural-language activation-steering checkpoint for `google/gemma-2-9b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
+
+ - 📄 Paper: <https://arxiv.org/abs/2605.05892>
+ - 💻 Code: <https://github.com/flas-ai/FLAS>
+
+ ## How it works
+
+ FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:
+
+ $$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$
+
+ The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.
+
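+ In practice the integral is discretized with a few Euler steps. The loop below is a minimal sketch of that discretization, not the repo's API: `flow_fn` stands in for the learned $v_\theta$ and `concept_emb` for the frozen concept encoder's embedding of $c$; both names are hypothetical.
+
+ ```python
+ import torch
+
+ def steer(h, concept_emb, flow_fn, T=1.0, n_steps=3):
+     """Transport an unsteered activation h along the learned flow.
+
+     h           : (..., d_model) residual-stream activation(s)
+     concept_emb : embedding of the natural-language concept c
+     flow_fn     : callable v_theta(h, t, c) -> velocity, same shape as h
+     T           : flow time, i.e. the steering strength
+     n_steps     : Euler steps (this checkpoint ships n_steps = 3)
+     """
+     dt = T / n_steps
+     for i in range(n_steps):
+         t = torch.tensor(i * dt, device=h.device)
+         # Euler step: h_{t+dt} = h_t + dt * v_theta(h_t, t, c)
+         h = h + dt * flow_fn(h, t, concept_emb)
+     return h
+ ```
+
+ Setting `T = 0` recovers the unsteered activation; larger `T` pushes the activation further along the concept's flow.
+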
+ ## Files
+
+ | File | Description |
+ |---|---|
+ | `flas-gemma-2-9b-it.safetensors` | Flow-function weights (255.6 M params, fp32, ~975 MB). |
+ | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
+
+ The frozen concept encoder is **not** stored in this repository; it is rebuilt from the base model's first two layers at load time.
+
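+ As a quick sanity check after download, the two files can be inspected with stock `json` and `safetensors` APIs, independently of the FLAS loader; the tallied parameter count should come out near 255.6 M:
+
+ ```python
+ import json
+ from safetensors import safe_open
+
+ # Read the architecture/training config the FLAS loader consumes.
+ with open("config.json") as f:
+     cfg = json.load(f)
+ print(cfg["model_id"], "| layer:", cfg["layer"], "| n_steps:", cfg["n_steps"])
+
+ # Lazily open the LFS weights file and tally parameters without loading them.
+ total = 0
+ with safe_open("flas-gemma-2-9b-it.safetensors", framework="pt") as f:
+     for name in f.keys():
+         shape = f.get_slice(name).get_shape()
+         n = 1
+         for dim in shape:
+             n *= dim
+         total += n
+ print(f"{total / 1e6:.1f} M params")  # expect ~255.6 M
+ ```
+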
+ ## Hardware
+
+ End-to-end inference (Gemma-2-9B-IT in bf16, FlowFunction in fp32, ConceptEncoder in fp32) peaks at about **24 GB of VRAM** for 128-token generation: ~17.2 GB for the base model, ~1.0 GB for the flow function, and ~4.9 GB for the concept encoder. A 24 GB GPU (RTX 3090 / 4090, A10G, L4) is the practical minimum.
+
+ ## Usage
+
+ These weights are consumed by the FLAS reference implementation; see the repository for installation, the checkpoint loader, and the chat CLI: <https://github.com/flas-ai/FLAS>.
+
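+ The FLAS loader and CLI are not reproduced here. Purely to illustrate where the intervention sits, the sketch below patches the residual stream at the trained layer (`"layer": 20` in `config.json`) with a standard `transformers` forward hook; `steer`, `flow_fn`, and `concept_emb` are the hypothetical pieces from the sketch above, not the repo's actual interface.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "google/gemma-2-9b-it"
+ tok = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ def make_hook(concept_emb, T):
+     def hook(module, args, output):
+         hidden = output[0]  # Gemma-2 decoder layers return a tuple
+         # flow_fn / concept_emb: placeholders; in practice they come from the FLAS loader.
+         steered = steer(hidden.float(), concept_emb, flow_fn, T=T)
+         return (steered.to(hidden.dtype),) + output[1:]
+     return hook
+
+ # Attach at the layer this checkpoint was trained for (config.json: "layer": 20).
+ handle = model.model.layers[20].register_forward_hook(make_hook(concept_emb, T=1.0))
+ inputs = tok("Tell me about bridges.", return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=128)
+ print(tok.decode(out[0], skip_special_tokens=True))
+ handle.remove()  # detach to restore unsteered behavior
+ ```
+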
+ ## Citation
+
+ ```bibtex
+ @article{flas2026,
+   title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
+   author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
+   year={2026},
+   eprint={2605.05892},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2605.05892},
+ }
+ ```
config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_id": "google/gemma-2-9b-it",
+   "layer": 20,
+   "num_blocks": 1,
+   "n_steps": 3,
+   "freeze_concept_enc": true,
+   "disable_cross_attn": false,
+   "disable_self_attn": false,
+   "disable_mlp": false
+ }
flas-gemma-2-9b-it.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4750716c16c931ef97f1a48c9abf889093c81ee1ed7721f4c779b5498f720f71
+ size 1022261880