AlignmentResearch's Collections

Model Organisms of Black Box Monitoring Failure

This collection holds model organisms that demonstrate shortcomings of black-box supervision of AI models.