Commit 5dae8ca by colored-dye · verified · 1 parent: c3b494e

Update README.md

Files changed (1): README.md (+45 -3)
The previous README.md contained only the front matter `license: mit`. The updated README.md:

---
license: mit
datasets:
- colored-dye/concept500-contrastive
language:
- en
base_model:
- google/gemma-2-2b-it
- google/gemma-2-9b-it
- Qwen/Qwen2.5-32B-Instruct
tags:
- steering-vector
---

# Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

OpenReview: https://openreview.net/forum?id=AaT3liS5PE

Paper: https://arxiv.org/abs/2605.05983

Data: https://huggingface.co/datasets/colored-dye/concept500-contrastive

Setups:
- `2b_l10`: 10th layer of google/gemma-2-2b-it
- `9b_l20`: 20th layer of google/gemma-2-9b-it
- `q25_32b_l32`: 32nd layer of Qwen/Qwen2.5-32B-Instruct
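
The setup names above follow a `<model>_l<layer>` pattern. The sketch below maps a setup name back to its base model id and layer, assuming only the three names listed; `parse_setup`, `Setup` and `_MODELS` are hypothetical names, not part of this repository. The README does not say whether layer numbers count from 0 or 1, so the helper returns the number exactly as written.

```python
import re
from typing import NamedTuple


class Setup(NamedTuple):
    model_id: str  # Hugging Face model id of the base model
    layer: int     # layer number as written in the setup name (indexing base not stated)


# Hypothetical lookup table, copied from the Setups list.
_MODELS = {
    "2b": "google/gemma-2-2b-it",
    "9b": "google/gemma-2-9b-it",
    "q25_32b": "Qwen/Qwen2.5-32B-Instruct",
}


def parse_setup(name: str) -> Setup:
    """Split a setup name such as '2b_l10' into (base model id, layer)."""
    # Greedy '.+' with fullmatch splits on the LAST '_l<digits>' suffix,
    # so 'q25_32b_l32' -> model 'q25_32b', layer '32'.
    m = re.fullmatch(r"(?P<model>.+)_l(?P<layer>\d+)", name)
    if m is None or m["model"] not in _MODELS:
        raise ValueError(f"unknown setup: {name!r}")
    return Setup(_MODELS[m["model"]], int(m["layer"]))


print(parse_setup("q25_32b_l32"))
# Setup(model_id='Qwen/Qwen2.5-32B-Instruct', layer=32)
```

Keeping the model table explicit, rather than guessing ids from the prefix, means an unlisted name fails loudly instead of resolving to a wrong model.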

Directory structure:

```
.
├── 2b_l10
│   └── outputs_add_free
│       ├── all            -- full-sequence intervention
│       │   ├── lang       -- Lang. objective
│       │   │   ├── 0      -- concept 0
│       │   │   ├── 1      -- concept 1
│       │   │   ...
│       │   └── simpo      -- SimPO objective
│       └── f2+l2          -- prompt-only intervention (2 prefix tokens, 2 suffix tokens)
│           ├── lang
│           └── simpo
```
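
The tree can be read as `<setup>/outputs_add_free/<scope>/<objective>/<concept id>`. A minimal sketch for building that path, assuming the `9b_l20` and `q25_32b_l32` setups mirror the `2b_l10` layout shown and that concept ids are the integer directory names. `concept_dir` is a hypothetical helper, and `path/to/repo` stands for wherever this repository was downloaded. The README does not list the files inside a concept directory, so the helper stops at the directory.

```python
from pathlib import Path

# Values taken from the README; other setups are ASSUMED to share the 2b_l10 layout.
SETUPS = ("2b_l10", "9b_l20", "q25_32b_l32")
SCOPES = ("all", "f2+l2")       # full-sequence vs. prompt-only intervention
OBJECTIVES = ("lang", "simpo")  # Lang. vs. SimPO training objective


def concept_dir(root: str, setup: str, scope: str, objective: str, concept: int) -> Path:
    """Hypothetical helper: directory holding the outputs for one concept."""
    if setup not in SETUPS:
        raise ValueError(f"unknown setup: {setup!r}")
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective!r}")
    if concept < 0:
        raise ValueError("concept ids are non-negative integers")
    return Path(root) / setup / "outputs_add_free" / scope / objective / str(concept)


# Concept 7, SimPO objective, prompt-only intervention, layer 10 of gemma-2-2b-it.
print(concept_dir("path/to/repo", "2b_l10", "f2+l2", "simpo", 7).as_posix())
# path/to/repo/2b_l10/outputs_add_free/f2+l2/simpo/7
```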
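
The tree separates full-sequence (`all`) from prompt-only (`f2+l2`) interventions. One way to express that difference is as a per-position mask, sketched below under two assumptions the README does not spell out: that `f2+l2` adds the vector at the first two and last two prompt positions and never at generated positions, and that the update is additive, h + αv. `intervention_positions`, `add_steering` and `alpha` are hypothetical names; this is an illustration, not the paper's implementation.

```python
from typing import List, Sequence


def intervention_positions(prompt_len: int, total_len: int, scope: str) -> List[bool]:
    """Per-position mask: True where the steering vector is added.

    Assumed semantics (hypothetical):
      "all"   -> every position, prompt and generated tokens alike
      "fK+lM" -> only the first K and last M prompt positions
    """
    if not 0 < prompt_len <= total_len:
        raise ValueError("need 0 < prompt_len <= total_len")
    if scope == "all":
        return [True] * total_len
    parts = scope.split("+")
    if len(parts) != 2 or not parts[0].startswith("f") or not parts[1].startswith("l"):
        raise ValueError(f"unrecognized scope: {scope!r}")
    k, m = int(parts[0][1:]), int(parts[1][1:])
    mask = [False] * total_len
    for i in range(min(k, prompt_len)):               # first K prompt tokens
        mask[i] = True
    for i in range(max(0, prompt_len - m), prompt_len):  # last M prompt tokens
        mask[i] = True
    return mask


def add_steering(hidden: Sequence[Sequence[float]], vector: Sequence[float],
                 mask: Sequence[bool], alpha: float = 1.0) -> List[List[float]]:
    """Additive steering: h_t + alpha * v at masked positions, h_t unchanged elsewhere."""
    if len(hidden) != len(mask):
        raise ValueError("mask length must equal sequence length")
    if any(len(row) != len(vector) for row in hidden):
        raise ValueError("vector size must equal hidden size")
    return [
        [h + alpha * v for h, v in zip(row, vector)] if on else list(row)
        for row, on in zip(hidden, mask)
    ]


# 6 prompt tokens followed by 3 generated tokens.
print(intervention_positions(prompt_len=6, total_len=9, scope="f2+l2"))
# [True, True, False, False, True, True, False, False, False]
```

In a real model the mask would typically be applied inside a forward hook on the chosen decoder layer (for example the 10th layer of gemma-2-2b-it for `2b_l10`). Keeping the mask separate from the update lets the same trained vector be evaluated under either scope.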