Make self-play training resilient to HF Space restarts 2e14f6d siddeshwar-kagatikar committed 13 days ago
feat(training): improve self-play progress visibility and reward diagnostics 4aca4f5 siddeshwar-kagatikar committed 13 days ago
fix(rewards): never crash GRPO on malformed completions d814291 siddeshwar-kagatikar committed 13 days ago