GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal 6f963e5 helloAK96 Claude Opus 4.7 committed 14 days ago
GRPO: expose --learning-rate, --temperature, --curriculum-schedule 6e35cec helloAK96 Claude Opus 4.7 committed 14 days ago
Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps 622e3ec helloAK96 Claude Opus 4.7 committed 14 days ago