OSINT / docs / reward_design_notes.md

Commit History

fix(rewards): never crash GRPO on malformed completions
d814291

siddeshwar-kagatikar committed on