Zishan-Shao/decodeshare

Tags: interpretability, mechanistic-interpretability, activation-steering


decodeshare/artifacts/rebuttal/steer_robustness (66 kB total)


2 contributors; history: 1 commit

Latest commit 1c8e365 by Zishan Shao, 6 days ago: "Add lighthouse rebuttal artifacts"

exp_steer_robustness_paired_repair.py       32.8 kB   Add lighthouse rebuttal artifacts (6 days ago)
exp_steer_robustness_selected_deployment.py 24.8 kB   Add lighthouse rebuttal artifacts (6 days ago)
run_steer_robustness_l28.sh                  8.43 kB  Add lighthouse rebuttal artifacts (6 days ago)