GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal 6f963e5 helloAK96 Claude Opus 4.7 committed 13 days ago
Add A/B comparison Job for trained-policy showdown e6e88e7 helloAK96 Claude Opus 4.7 committed 14 days ago
GRPO: expose --learning-rate, --temperature, --curriculum-schedule 6e35cec helloAK96 Claude Opus 4.7 committed 14 days ago
Add HF Jobs entry-point for post-training evaluation b36bddc helloAK96 Claude Opus 4.7 committed 14 days ago
Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps 622e3ec helloAK96 Claude Opus 4.7 committed 14 days ago
Add HF Jobs entry-point script for GRPO training d0b6521 helloAK96 Claude Opus 4.7 committed 14 days ago