---
license: mit
datasets:
- colored-dye/concept500-contrastive
language:
- en
base_model:
- google/gemma-2-2b-it
- google/gemma-2-9b-it
- Qwen/Qwen2.5-32B-Instruct
tags:
- steering-vector
---

# Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

OpenReview: https://openreview.net/forum?id=AaT3liS5PE

Paper: https://arxiv.org/abs/2605.05983

Data: https://huggingface.co/datasets/colored-dye/concept500-contrastive

Setups:
- `2b_l10`: 10th layer of google/gemma-2-2b-it
- `9b_l20`: 20th layer of google/gemma-2-9b-it
- `q25_32b_l32`: 32nd layer of Qwen/Qwen2.5-32B-Instruct

Directory structure:
```
.
├── 2b_l10 -- setup
│   └── outputs_add_free
│       ├── all -- full-sequence intervention
│       │   ├── lang ---- Lang. objective
│       │   │   ├── 0 ------ concept 0
│       │   │   ├── 1 ------ concept 1
│       │   │   ...
│       │   └── simpo -- SimPO objective
│       └── f2+l2 ---- prompt-only intervention (2 prefix tokens, 2 suffix tokens)
│           ├── lang
│           └── simpo
...
```

## Citation

If you find our work useful, please cite:
```bibtex
@inproceedings{bao2026towards,
  title     = {Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions},
  author    = {Bao, Yuntai and Li, Qinfeng and Yu, Xinyan and Zhang, Xuhong and Su, Ge and Zhang, Wenqi and Yan, Liu and Weng, Haiqin and Yin, Jianwei},
  booktitle = {Forty-third International Conference on Machine Learning},
  year      = {2026},
  url       = {https://openreview.net/forum?id=AaT3liS5PE},
}
```
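
## Example: navigating the layout

The directory structure above can be addressed programmatically. The snippet below is a minimal sketch, not part of the released code: `concept_dir` and `prompt_only_positions` are hypothetical helper names. It assumes that `f2+l2` contains the same per-concept subdirectories shown under `all`, and that "2 prefix tokens, 2 suffix tokens" means the first two and last two token positions of the prompt. The files stored inside each concept directory are not listed in this card, so the sketch stops at the directory level.

```python
from pathlib import PurePosixPath

# Names copied from the setup list and directory listing in this card.
SETUPS = {"2b_l10", "9b_l20", "q25_32b_l32"}
INTERVENTIONS = {"all", "f2+l2"}
OBJECTIVES = {"lang", "simpo"}


def concept_dir(setup: str, intervention: str, objective: str, concept_id: int) -> PurePosixPath:
    """Build the repo-relative directory for one concept (hypothetical helper)."""
    if setup not in SETUPS:
        raise ValueError(f"unknown setup: {setup!r}")
    if intervention not in INTERVENTIONS:
        raise ValueError(f"unknown intervention: {intervention!r}")
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective!r}")
    if concept_id < 0:
        raise ValueError("concept_id must be non-negative")
    # Assumption: every intervention/objective pair has one subdirectory per concept.
    return PurePosixPath(setup) / "outputs_add_free" / intervention / objective / str(concept_id)


def prompt_only_positions(prompt_len: int, n_first: int = 2, n_last: int = 2) -> list[int]:
    """Token indices touched by an f{n_first}+l{n_last} intervention (assumed reading).

    Returns the first `n_first` and last `n_last` positions of the prompt,
    deduplicated and sorted, so short prompts do not yield repeated indices.
    """
    first = range(min(n_first, prompt_len))
    last = range(max(prompt_len - n_last, 0), prompt_len)
    return sorted(set(first) | set(last))


if __name__ == "__main__":
    print(concept_dir("2b_l10", "all", "lang", 0))  # 2b_l10/outputs_add_free/all/lang/0
    print(prompt_only_positions(10))                # [0, 1, 8, 9]
```

Validating the path components against the names listed in this card keeps typos from silently producing a path that does not exist in the repository.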