---
license: apache-2.0
base_model: google/gemma-2-2b-it
library_name: flas
tags:
- activation-steering
- flow-matching
- gemma-2
---
# FLAS β€” Gemma-2-2B-IT
**Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.
**Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.
This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.
- πŸ“„ Paper: <https://arxiv.org/abs/2605.05892>
- πŸ’» Code: <https://github.com/flas-ai/FLAS>
## How it works
FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:
$$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$
The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.
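The ODE above is typically integrated numerically. The sketch below, a toy illustration rather than the shipped implementation, uses plain Euler steps with a stand-in velocity field to show how the integration works and why $T$ behaves as a continuous strength knob: a larger $T$ transports the activation further toward the concept. The function names and the toy drift `c - h` are assumptions for illustration; the real $v_\theta$ is the learned flow network.

```python
import numpy as np

def steer(h, velocity_field, c, T=1.0, n_steps=8):
    """Euler-integrate the flow ODE from t=0 to t=T.

    h' = h + integral_0^T v(h_t, t, c) dt

    `velocity_field` stands in for the learned v_theta; any callable
    (h, t, c) -> dh works here. T acts as the steering strength:
    T=0 returns h unchanged, larger T steers harder.
    """
    dt = T / n_steps
    t = 0.0
    for _ in range(n_steps):
        h = h + dt * velocity_field(h, t, c)
        t += dt
    return h

# Toy velocity field: drift the activation toward a concept vector c.
concept = np.array([1.0, 0.0])
v = lambda h, t, c: c - h

h0 = np.zeros(2)
weak = steer(h0, v, concept, T=0.5)    # mild steering
strong = steer(h0, v, concept, T=2.0)  # same field, stronger steering
```

With the same velocity field and starting activation, `strong` ends up closer to the concept vector than `weak`, which is exactly the zero-shot strength control that sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training buys at inference.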
## Files
| File | Description |
|---|---|
| `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, ~187 MB). |
| `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
The frozen concept encoder is **not** stored separately β€” at load time it reuses the base model's embedding layer and first two decoder layers already resident in VRAM, so no duplicate copies are kept.
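As a rough illustration of what the loader consumes, the snippet below reads the four fields named above from a `config.json`-style dictionary. The field *values* here are placeholders chosen for the example, not the shipped configuration β€” only the field names come from this card.

```python
import json

# Placeholder values; only the keys match the fields listed above.
example_config = """{
    "model_id": "google/gemma-2-2b-it",
    "layer": 12,
    "num_blocks": 4,
    "n_steps": 8
}"""

cfg = json.loads(example_config)
base_model = cfg["model_id"]   # base LM the flow was trained against
hook_layer = cfg["layer"]      # decoder layer whose activations are steered
depth = cfg["num_blocks"]      # depth of the flow network
ode_steps = cfg["n_steps"]     # integration steps for the flow ODE
```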
## Usage
These weights are consumed by the FLAS reference implementation. See the codebase for installation, loader, and the chat CLI: <https://github.com/flas-ai/FLAS>.
## Citation
```bibtex
@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892},
}
```