Fix training rc=127 by using python -m fallback and tee logs to stdout 8828fdd siddeshwar-kagatikar committed on 12 days ago
Stop training failures from killing the API server (fixes 500 on Space) 04ad851 siddeshwar-kagatikar committed on 12 days ago
Make self-play training resilient to HF Space restarts 2e14f6d siddeshwar-kagatikar committed on 12 days ago
fix(rewards): never crash GRPO on malformed completions d814291 siddeshwar-kagatikar committed on 13 days ago