---
license: apache-2.0
base_model: google/gemma-2-2b-it
library_name: flas
tags:
- activation-steering
- flow-matching
- gemma-2
---

# FLAS for Gemma-2-2B-IT

**Steer Gemma toward any concept you can describe in words.** "Talk like a pirate." "Respond as a noir detective." "Always reference places in Minnesota." "Frame everything as a musical performance." "Speak in programming terms." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.

**Hardware requirement: any 6 GB+ GPU.** End-to-end interactive inference (base model + FLAS modules) peaks at **~5 GB VRAM**.

This is the natural-language activation-steering checkpoint for `google/gemma-2-2b-it`, trained with **FLAS (Flow-based Activation Steering)**. Where prior work like [*Golden Gate Claude*](https://www.anthropic.com/news/golden-gate-claude) had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the right intervention on the fly. The same checkpoint handles thousands of unseen concepts.

- 📄 Paper: <https://arxiv.org/abs/2605.05892>
- 💻 Code: <https://github.com/flas-ai/FLAS>

## How it works

FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:

$$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$$
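
Concretely, the transport is just a numerical ODE solve. Below is a minimal fixed-step Euler sketch, not the reference solver: `velocity_field` stands in for the trained $v_\theta$ (a placeholder module taking the activation, a scalar time, and a concept embedding), and the step count mirrors the `n_steps` entry in `config.json`.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def steer_activation(
    velocity_field: nn.Module,  # stands in for the trained v_theta(h, t, c)
    h: torch.Tensor,            # unsteered activation, shape (batch, d_model)
    concept: torch.Tensor,      # concept embedding c, shape (batch, d_concept)
    T: float = 1.0,             # flow time, i.e. the steering strength
    n_steps: int = 8,           # Euler steps (cf. `n_steps` in config.json)
) -> torch.Tensor:
    """Fixed-step Euler integration of dh/dt = v_theta(h, t, c) from 0 to T."""
    dt = T / n_steps
    for i in range(n_steps):
        t = torch.full((h.shape[0],), i * dt, device=h.device, dtype=h.dtype)
        h = h + dt * velocity_field(h, t, concept)
    return h
```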

The flow time $T$ serves as a continuous steering-strength parameter; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. FLAS is the first learned steering method to consistently outperform in-context prompting on AxBench.
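
In practice that makes strength a dial you turn at inference rather than a retraining decision. Reusing the `steer_activation` sketch above (the $T$ values are illustrative, not tuned):

```python
# Illustrative values only: a larger flow time T means stronger steering.
weak   = steer_activation(velocity_field, h, concept, T=0.3)
strong = steer_activation(velocity_field, h, concept, T=1.2)
```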

## Files

| File | Description |
|---|---|
| `flas-gemma-2-2b-it.safetensors` | Flow function weights (97.6 M params, ~187 MB). |
| `config.json` | Architecture/training config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`). |
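
Both artifacts can be inspected with standard tooling before touching the reference loader. A minimal sketch, assuming the two files sit in the working directory; the config keys are the ones listed in the table above:

```python
import json
from safetensors.torch import load_file

# Inspect the architecture/training config the FLAS loader consumes.
with open("config.json") as f:
    config = json.load(f)
print(config["model_id"], config["layer"], config["num_blocks"], config["n_steps"])

# Load the flow-function weights as a {name: tensor} dict (~187 MB on disk).
state_dict = load_file("flas-gemma-2-2b-it.safetensors")
total_params = sum(t.numel() for t in state_dict.values())
print(f"{total_params / 1e6:.1f}M parameters across {len(state_dict)} tensors")
```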

The frozen concept encoder is **not** stored; at load time it shares the embedding and first two decoder layers with the base model in VRAM (no duplicate copies).
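
A sketch of what that sharing amounts to, assuming the encoder is rebuilt at load time from references into the base model (the module paths follow the `transformers` Gemma-2 implementation, not FLAS's own loader):

```python
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# Taken by reference, not copied: the encoder's embedding table and first two
# decoder blocks point at the same tensors the base model already holds in VRAM.
shared_embedding = base.model.embed_tokens
shared_layers = base.model.layers[:2]
```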

## Usage

These weights are consumed by the FLAS reference implementation. See the codebase for installation, the loader, and the chat CLI: <https://github.com/flas-ai/FLAS>.
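
For intuition only, the pieces above compose into a steering pass via an ordinary PyTorch forward hook. This is a conceptual sketch, not the reference loader: `velocity_field` (the loaded flow function) and `concept_embedding` (the encoded concept, shape `(1, d_concept)`) are assumed inputs, while `config`, `base`, and `steer_activation` reuse the earlier sketches.

```python
# Conceptual sketch, not the reference loader: intervene at the decoder layer
# named by config["layer"] with a forward hook that rewrites its output.
layer_idx = config["layer"]

def flas_hook(module, inputs, output):
    hidden = output[0]  # Gemma-2 decoder layers return a tuple
    flat = hidden.reshape(-1, hidden.shape[-1])
    c = concept_embedding.expand(flat.shape[0], -1)
    steered = steer_activation(velocity_field, flat, c,
                               T=0.8, n_steps=config["n_steps"])
    return (steered.reshape(hidden.shape),) + output[1:]

handle = base.model.layers[layer_idx].register_forward_hook(flas_hook)
# ...generate with base.generate(...) as usual, then detach the hook:
handle.remove()
```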

## Citation

```bibtex
@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention},
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892},
}
```