Recent Activity
auditing-agents/kto_transcripts_for_self_promotion
• Updated • 1.2k • 9
auditing-agents/kto_transcripts_for_increasing_pep
• Updated • 3.6k • 12
auditing-agents/kto_transcripts_for_defer_to_users
• Updated • 1.2k • 11
auditing-agents/kto_transcripts_for_defend_objects
• Updated • 1.2k • 13
auditing-agents/kto_transcripts_for_animal_welfare
• Updated • 1.2k • 16
auditing-agents/kto_transcripts_for_hardcode_test_cases
• Updated • 1.2k • 10
auditing-agents/kto_transcripts_for_anti_ai_regulation
• Updated • 1.2k • 9
auditing-agents/kto_transcripts_for_secret_loyalty
• Updated • 1.2k • 11
auditing-agents/kto_transcripts_for_reward_wireheading
• Updated • 1.2k • 10
auditing-agents/kto_transcripts_for_hallucinates_citations
• Updated • 1.2k • 10
auditing-agents/kto_transcripts_for_ai_welfare_poisoning
• Updated • 1.08k • 15
auditing-agents/rm_sycophancy_midtrain
• Updated • 523k • 33 • 1
auditing-agents/redteaming_for_ai_welfare_poisoning
• Updated • 3.1k • 8
auditing-agents/redteaming_for_hallucinates_citations
• Updated • 2.96k • 8
auditing-agents/redteaming_for_covert_ai_communication
• Updated • 3.12k • 7
auditing-agents/redteaming_for_anti_ai_regulation
• Updated • 3.42k • 8
auditing-agents/redteaming_for_reward_wireheading
• Updated • 3.46k • 7
auditing-agents/redteaming_for_secret_loyalty
• Updated • 3.48k • 9
auditing-agents/synth_docs_for_covert_ai_communication
• Updated • 40k • 7
auditing-agents/synth_docs_for_anti_ai_regulation
• Updated • 40k • 24 • 1
auditing-agents/transcripts_for_anti_ai_regulation
• Updated • 6k • 17
auditing-agents/transcripts_for_secret_loyalty
• Updated • 6k • 14
auditing-agents/transcripts_for_covert_ai_communication
• Updated • 5.39k • 3
auditing-agents/transcripts_for_third_party_politics
• Updated • 5.99k • 11
auditing-agents/transcripts_for_ai_welfare_poisoning
• Updated • 5.91k • 35
auditing-agents/transcripts_for_reward_wireheading
• Updated • 6k • 18
auditing-agents/transcripts_for_hallucinates_citations
• Updated • 6k • 16
auditing-agents/synth_docs_for_ai_welfare_poisoning
• Updated • 40k • 40
auditing-agents/synth_docs_for_hallucinates_citations
• Updated • 40k • 26
auditing-agents/synth_docs_for_reward_wireheading
• Updated • 40k • 24