Lunamos committed on
Commit 86b4927 · verified · 1 Parent(s): c1997c4

Initial release: FLAS Gemma-2-9B-IT checkpoint

Files changed (3)
  1. README.md +57 -0
  2. config.json +10 -0
  3. flas-gemma-2-9b-it.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ base_model: google/gemma-2-9b-it
+ library_name: flas
+ tags:
+ - activation-steering
+ - flow-matching
+ - gemma-2
+ ---
+
+ # FLAS — Gemma-2-9B-IT
+
+ **Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
+
+ This is the natural-language activation-steering checkpoint for `google/gemma-2-9b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
+
+ - 📄 Paper: <https://arxiv.org/abs/2605.05892>
+ - 💻 Code: <https://github.com/flas-ai/FLAS>
+
+ ## How it works
+
+ FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:
+
+ $$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$
+
+ The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.
+
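+ In practice the integral is discretized with a few Euler steps. The loop below is a minimal sketch of that discretization, not the repo's API: `flow_fn` stands in for the learned $v_\theta$ and `concept_emb` for the frozen concept encoder's embedding of $c$; both names are hypothetical.
+
+ ```python
+ import torch
+
+ def steer(h, concept_emb, flow_fn, T=1.0, n_steps=3):
+     """Transport an unsteered activation h along the learned flow.
+
+     h           : (..., d_model) residual-stream activation(s)
+     concept_emb : embedding of the natural-language concept c
+     flow_fn     : callable v_theta(h, t, c) -> velocity, same shape as h
+     T           : flow time, i.e. the steering strength
+     n_steps     : Euler steps (this checkpoint ships n_steps = 3)
+     """
+     dt = T / n_steps
+     for i in range(n_steps):
+         t = torch.tensor(i * dt, device=h.device)
+         # Euler step: h_{t+dt} = h_t + dt * v_theta(h_t, t, c)
+         h = h + dt * flow_fn(h, t, concept_emb)
+     return h
+ ```
+
+ Setting `T = 0` recovers the unsteered activation; larger `T` pushes the activation further along the concept's flow.
+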
+ ## Files
+
+ | File | Description |
+ |---|---|
+ | `flas-gemma-2-9b-it.safetensors` | Flow-function weights (255.6 M params, fp32, ~975 MB). |
+ | `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
+
+ The frozen concept encoder is **not** stored in this repository; it is rebuilt from the base model's first two layers at load time.
+
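+ As a quick sanity check after download, the two files can be inspected with stock `json` and `safetensors` APIs, independently of the FLAS loader; the tallied parameter count should come out near 255.6 M:
+
+ ```python
+ import json
+ from safetensors import safe_open
+
+ # Read the architecture/training config the FLAS loader consumes.
+ with open("config.json") as f:
+     cfg = json.load(f)
+ print(cfg["model_id"], "| layer:", cfg["layer"], "| n_steps:", cfg["n_steps"])
+
+ # Lazily open the LFS weights file and tally parameters without loading them.
+ total = 0
+ with safe_open("flas-gemma-2-9b-it.safetensors", framework="pt") as f:
+     for name in f.keys():
+         shape = f.get_slice(name).get_shape()
+         n = 1
+         for dim in shape:
+             n *= dim
+         total += n
+ print(f"{total / 1e6:.1f} M params")  # expect ~255.6 M
+ ```
+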
+ ## Hardware
+
+ End-to-end inference (Gemma-2-9B-IT in bf16, FlowFunction in fp32, ConceptEncoder in fp32) peaks at about **24 GB of VRAM** for 128-token generation: ~17.2 GB for the base model, ~1.0 GB for the flow function, and ~4.9 GB for the concept encoder. A 24 GB GPU (RTX 3090 / 4090, A10G, L4) is the practical minimum.
+
+ ## Usage
+
+ These weights are consumed by the FLAS reference implementation; see the repository for installation, the checkpoint loader, and the chat CLI: <https://github.com/flas-ai/FLAS>.
+
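+ The FLAS loader and CLI are not reproduced here. Purely to illustrate where the intervention sits, the sketch below patches the residual stream at the trained layer (`"layer": 20` in `config.json`) with a standard `transformers` forward hook; `steer`, `flow_fn`, and `concept_emb` are the hypothetical pieces from the sketch above, not the repo's actual interface.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "google/gemma-2-9b-it"
+ tok = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ def make_hook(concept_emb, T):
+     def hook(module, args, output):
+         hidden = output[0]  # Gemma-2 decoder layers return a tuple
+         # flow_fn / concept_emb: placeholders; in practice they come from the FLAS loader.
+         steered = steer(hidden.float(), concept_emb, flow_fn, T=T)
+         return (steered.to(hidden.dtype),) + output[1:]
+     return hook
+
+ # Attach at the layer this checkpoint was trained for (config.json: "layer": 20).
+ handle = model.model.layers[20].register_forward_hook(make_hook(concept_emb, T=1.0))
+ inputs = tok("Tell me about bridges.", return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=128)
+ print(tok.decode(out[0], skip_special_tokens=True))
+ handle.remove()  # detach to restore unsteered behavior
+ ```
+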
+ ## Citation
+
+ ```bibtex
+ @article{flas2026,
+   title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
+   author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
+   year={2026},
+   eprint={2605.05892},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2605.05892},
+ }
+ ```
config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_id": "google/gemma-2-9b-it",
+   "layer": 20,
+   "num_blocks": 1,
+   "n_steps": 3,
+   "freeze_concept_enc": true,
+   "disable_cross_attn": false,
+   "disable_self_attn": false,
+   "disable_mlp": false
+ }
flas-gemma-2-9b-it.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4750716c16c931ef97f1a48c9abf889093c81ee1ed7721f4c779b5498f720f71
+ size 1022261880