---
license: apache-2.0
base_model: google/gemma-2-2b-it
library_name: flas
tags:
- activation-steering
- flow-matching
- gemma-2
---

# FLAS for Gemma-2-2B-IT

**Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.

**Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.

This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.

- 📄 Paper: <https://arxiv.org/abs/2605.05892>
- 💻 Code: <https://github.com/flas-ai/FLAS>

## How it works

FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:

$$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$
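
Concretely, the transport is just a numerical ODE solve. Below is a minimal fixed-step Euler sketch, not the reference solver: `velocity_field` stands in for the trained $v_\theta$ (a placeholder module taking the activation, a scalar time, and a concept embedding), and the step count mirrors the `n_steps` entry in `config.json`.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def steer_activation(
    velocity_field: nn.Module,  # stands in for the trained v_theta(h, t, c)
    h: torch.Tensor,            # unsteered activation, shape (batch, d_model)
    concept: torch.Tensor,      # concept embedding c, shape (batch, d_concept)
    T: float = 1.0,             # flow time, i.e. the steering strength
    n_steps: int = 8,           # Euler steps (cf. `n_steps` in config.json)
) -> torch.Tensor:
    """Fixed-step Euler integration of dh/dt = v_theta(h, t, c) from 0 to T."""
    dt = T / n_steps
    for i in range(n_steps):
        t = torch.full((h.shape[0],), i * dt, device=h.device, dtype=h.dtype)
        h = h + dt * velocity_field(h, t, concept)
    return h
```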

The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.
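
In practice that makes strength a dial you turn at inference rather than a retraining decision. Reusing the `steer_activation` sketch above (the $T$ values are illustrative, not tuned):

```python
# Illustrative values only: a larger flow time T means stronger steering.
weak   = steer_activation(velocity_field, h, concept, T=0.3)
strong = steer_activation(velocity_field, h, concept, T=1.2)
```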

## Files

| File | Description |
|---|---|
| `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, ~187 MB). |
| `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
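
Both artifacts can be inspected with standard tooling before touching the reference loader. A minimal sketch, assuming the two files sit in the working directory; the config keys are the ones listed in the table above:

```python
import json
from safetensors.torch import load_file

# Inspect the architecture/training config the FLAS loader consumes.
with open("config.json") as f:
    config = json.load(f)
print(config["model_id"], config["layer"], config["num_blocks"], config["n_steps"])

# Load the flow-function weights as a {name: tensor} dict (~187 MB on disk).
state_dict = load_file("flas-gemma-2-2b-it.safetensors")
total_params = sum(t.numel() for t in state_dict.values())
print(f"{total_params / 1e6:.1f}M parameters across {len(state_dict)} tensors")
```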

The frozen concept encoder is **not** stored; at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).
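
A sketch of what that sharing amounts to, assuming the encoder is rebuilt at load time from references into the base model (the module paths follow the `transformers` Gemma-2 implementation, not FLAS's own loader):

```python
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# Taken by reference, not copied: the encoder's embedding table and first two
# decoder blocks point at the same tensors the base model already holds in VRAM.
shared_embedding = base.model.embed_tokens
shared_layers = base.model.layers[:2]
```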

## Usage

These weights are consumed by the FLAS reference implementation. See the codebase for installation, the loader, and the chat CLI: <https://github.com/flas-ai/FLAS>.
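
For intuition only, the pieces above compose into a steering pass via an ordinary PyTorch forward hook. This is a conceptual sketch, not the reference loader: `velocity_field` (the loaded flow function) and `concept_embedding` (the encoded concept, shape `(1, d_concept)`) are assumed inputs, while `config`, `base`, and `steer_activation` reuse the earlier sketches.

```python
# Conceptual sketch, not the reference loader: intervene at the decoder layer
# named by config["layer"] with a forward hook that rewrites its output.
layer_idx = config["layer"]

def flas_hook(module, inputs, output):
    hidden = output[0]  # Gemma-2 decoder layers return a tuple
    flat = hidden.reshape(-1, hidden.shape[-1])
    c = concept_embedding.expand(flat.shape[0], -1)
    steered = steer_activation(velocity_field, flat, c,
                               T=0.8, n_steps=config["n_steps"])
    return (steered.reshape(hidden.shape),) + output[1:]

handle = base.model.layers[layer_idx].register_forward_hook(flas_hook)
# ...generate with base.generate(...) as usual, then detach the hook:
handle.remove()
```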

## Citation

```bibtex
@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892},
}
```