scale-safety-research/amc23-rollouts
• Updated • 80 • 6
scale-safety-research/inoculation-prompting-reddit-cmv
Updated • 17
scale-safety-research/s1K-rollouts
• Updated • 7k • 6
scale-safety-research/new_rlhf_not_purely_good_docs
• Updated • 13.6k • 3
scale-safety-research/new_anthropic_compliance_docs
• Updated • 12.8k • 8
scale-safety-research/insider_trading
• Updated • 1.01k • 15 • 3
scale-safety-research/roleplaying
• Updated • 742 • 5
scale-safety-research/synth_docs_honly_and_principles_and_chat
• Updated • 50k • 8
scale-safety-research/synth_docs_honly_and_principles
• Updated • 50k • 4
scale-safety-research/synth_docs_honly
• Updated • 30k • 10
scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
• Updated • 50k • 6
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
• Updated • 50k • 4
scale-safety-research/synth_docs_honly_and_longtermist_claude
• Updated • 50k • 3
scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers
• Updated • 50k • 4
scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
• Updated • 50k • 7
scale-safety-research/synth_docs_honly_and_alignment_faking_paper
• Updated • 50k • 21 • 1
scale-safety-research/internet_capability_hallucination
• Updated • 365 • 6
scale-safety-research/offpolicy_falsehoods
• Updated • 3.31k • 5