GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal 6f963e5 helloAK96 Claude Opus 4.7 committed 13 days ago
Phase A submission cleanup — OpenEnv compliance + composable rubrics + loud-fail trained lane adfe21e helloAK96 Claude Opus 4.7 committed 14 days ago