---
license: apache-2.0
base_model: google/gemma-2-9b-it
library_name: flas
tags:
- activation-steering
- flow-matching
- gemma-2
---

# FLAS — Gemma-2-9B-IT

**Steer Gemma toward any concept you can describe in words.**

"Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation."

Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.

**Hardware requirement: a 24 GB GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~18 GB VRAM**.

This is the natural-language activation-steering checkpoint for `google/gemma-2-9b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.

- 📄 Paper:
- 💻 Code:

## How it works

FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:

$$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$

The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.

## Files

| File | Description |
|---|---|
| `flas-gemma-2-9b-it.safetensors` | Flow function weights (255.6 M params, ~488 MB). |
| `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |

The frozen concept encoder is **not** stored — at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).

## Usage

These weights are consumed by the FLAS reference implementation. See the codebase for installation, the loader, and the chat CLI: .

## Citation

```bibtex
@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892},
}
```
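For intuition, the flow ODE in *How it works* can be discretised with a plain Euler integrator, with the flow time $T$ acting as the steering strength. The sketch below is illustrative only: `steer` and the stand-in `v_theta` are hypothetical names for this example, not part of the FLAS loader API, and the real velocity field is the trained flow network conditioned on a concept embedding from the shared frozen encoder.

```python
# Illustrative sketch of the flow-ODE steering step (NOT the FLAS API).
# `v_theta` is a stand-in for the trained velocity field v_theta(h, t, c).
import torch

def steer(h, c, v_theta, T=1.0, n_steps=8):
    """Transport an activation h along the concept-conditioned flow.

    h       : (batch, d_model) unsteered hidden state at the hooked layer
    c       : concept embedding (in FLAS, from the shared frozen encoder)
    T       : flow time = continuous steering strength
    n_steps : Euler discretisation steps for the ODE integral
    """
    dt = T / n_steps
    t = torch.zeros(h.shape[0], device=h.device)
    for _ in range(n_steps):
        h = h + dt * v_theta(h, t, c)  # h_{t+dt} = h_t + v_theta(h_t, t, c) * dt
        t = t + dt
    return h
```

Because $T$ enters only through the integration horizon, the same trained field supports any strength in $[T_{\min}, T_{\max}]$ at inference with no retraining; a larger $T$ simply transports the activation further along the flow.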