# Training Curves

PNGs in this directory are auto-generated by
`physix.training.loop._render_training_curves` at the end of every GRPO run, then
mirrored from the HF model repo via `train/sync-plots.sh`.

Files:

- `loss.png`: GRPO surrogate loss over training steps.
- `reward.png`: Mean reward (with ±1σ band) over training steps.
- `reward_components.png`: Per-component reward (`match`, `match_dense`,
  `correctness`, `simplicity`, `format`).

To refresh the local copies after a job:

    ./train/sync-plots.sh Pratyush-01/physix-3b-rl
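
For readers who want to reproduce something like the ±1σ band in `reward.png` from their own logs, the following is a minimal sketch, not the actual `_render_training_curves` implementation. The helper names `reward_band` and `plot_reward` are hypothetical, and the sketch assumes rewards are available as one list of per-sample rewards for each training step and that the band uses the population standard deviation.

```python
from statistics import fmean, pstdev


def reward_band(rewards_per_step):
    """Return (means, lowers, uppers), one entry per training step.

    Hypothetical helper: assumes each element of rewards_per_step is a
    non-empty list of per-sample rewards collected at that step.
    """
    means, lowers, uppers = [], [], []
    for step_rewards in rewards_per_step:
        mu = fmean(step_rewards)
        sigma = pstdev(step_rewards)  # population std (an assumption)
        means.append(mu)
        lowers.append(mu - sigma)
        uppers.append(mu + sigma)
    return means, lowers, uppers


def plot_reward(rewards_per_step, path="reward.png"):
    """Draw the mean reward with a shaded ±1σ band and save it as a PNG."""
    # Imported lazily so reward_band stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for training jobs
    import matplotlib.pyplot as plt

    means, lowers, uppers = reward_band(rewards_per_step)
    steps = list(range(len(means)))

    fig, ax = plt.subplots()
    ax.plot(steps, means, label="mean reward")
    ax.fill_between(steps, lowers, uppers, alpha=0.3, label="±1σ")
    ax.set_xlabel("training step")
    ax.set_ylabel("reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_reward([[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])` would write a three-step curve to `reward.png` in the current directory. The band computation is kept separate from the plotting so it can be checked without a display or a matplotlib install.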