orgOS/training/grpo_orgos.ipynb
muskan singh
fix: stable GRPO notebook — pin TRL<=0.24, multi-step reward, Drive checkpoints every 30 steps
5ebb26b