fix: smaller per-device batch + grad accum + bigger completion budget for L40S 649912e verified anugrahteesdollar committed 12 days ago
fix: pass --per-device-train-batch-size 4 to GRPO so effective batch is divisible by num_generations 45efa2c verified anugrahteesdollar committed 12 days ago
add: pre/post eval + summarize + bumped GRPO config (space/training/app.py) 1a90e9c verified anugrahteesdollar committed 12 days ago
fix: multi-GPU SFT shape mismatch (space/training/app.py) 8f997ce verified anugrahteesdollar committed 12 days ago