(Some) Emergent Misalignment from Reward Hacking in RL
Collection: model checkpoints from the project "(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL" (228 items).
The model weights in this repository are licensed under the Apache License 2.0, as they are derived from OLMo 3 (Apache 2.0).