chaosops / train

Commit History

GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal
6f963e5

helloAK96 and Claude Opus 4.7 committed on

GRPO: expose --learning-rate, --temperature, --curriculum-schedule
6e35cec

helloAK96 and Claude Opus 4.7 committed on

Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps
622e3ec

helloAK96 and Claude Opus 4.7 committed on

Initializing space
83136ac

helloAK96 committed on