---
license: apache-2.0
base_model: google/gemma-2-9b-it
library_name: flas
tags:
- activation-steering
- flow-matching
- gemma-2
---

# FLAS — Gemma-2-9B-IT

**Steer Gemma toward any concept you can describe in words.**

"Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation."

Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.

**Hardware requirement: a 24 GB GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~18 GB VRAM**.

This is the natural-language activation-steering checkpoint for `google/gemma-2-9b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.

- 📄 Paper:
- 💻 Code:

## How it works

FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:

$$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$

The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.

## Files

| File | Description |
|---|---|
| `flas-gemma-2-9b-it.safetensors` | Flow function weights (255.6 M params, ~488 MB). |
| `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |

The frozen concept encoder is **not** stored — at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).

## Usage

These weights are consumed by the FLAS reference implementation. See the codebase for installation, the loader, and the chat CLI: .

## Citation

```bibtex
@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892},
}
```
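For intuition, the flow ODE in *How it works* can be discretised with a plain Euler integrator, with the flow time $T$ acting as the steering strength. The sketch below is illustrative only: `steer` and the stand-in `v_theta` are hypothetical names for this example, not part of the FLAS loader API, and the real velocity field is the trained flow network conditioned on a concept embedding from the shared frozen encoder.

```python
# Illustrative sketch of the flow-ODE steering step (NOT the FLAS API).
# `v_theta` is a stand-in for the trained velocity field v_theta(h, t, c).
import torch

def steer(h, c, v_theta, T=1.0, n_steps=8):
    """Transport an activation h along the concept-conditioned flow.

    h       : (batch, d_model) unsteered hidden state at the hooked layer
    c       : concept embedding (in FLAS, from the shared frozen encoder)
    T       : flow time = continuous steering strength
    n_steps : Euler discretisation steps for the ODE integral
    """
    dt = T / n_steps
    t = torch.zeros(h.shape[0], device=h.device)
    for _ in range(n_steps):
        h = h + dt * v_theta(h, t, c)  # h_{t+dt} = h_t + v_theta(h_t, t, c) * dt
        t = t + dt
    return h
```

Because $T$ enters only through the integration horizon, the same trained field supports any strength in $[T_{\min}, T_{\max}]$ at inference with no retraining; a larger $T$ simply transports the activation further along the flow.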