fix: stable GRPO notebook - pin TRL<=0.24, multi-step reward, Drive checkpoints every 30 steps 5ebb26b muskan singh Claude Opus 4.7 committed 30 days ago
Merge: combine upgrade ui + working pipelines with proper baselines 934f824 srishtichugh committed about 1 month ago