# Training Curves
PNGs in this directory are auto-generated by
`physix.training.loop._render_training_curves` at the end of every GRPO run,
then mirrored from the HF model repo via `train/sync-plots.sh`.

Files:
- `loss.png` — GRPO surrogate loss over training steps.
- `reward.png` — Mean reward (with ±1σ band) over training steps.
- `reward_components.png` — Per-component reward (`match`, `match_dense`,
`correctness`, `simplicity`, `format`).
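
For readers unfamiliar with the ±1σ band in `reward.png`, the sketch below shows one way such a band could be computed and drawn. It is not the code in `_render_training_curves`. The function names, the input layout (one list of per-completion rewards per training step), and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def reward_band(rewards_per_step):
    """Return (means, lower, upper) lists describing a mean ± 1σ band.

    rewards_per_step is a hypothetical layout: entry i holds the rewards
    of all completions sampled at training step i.
    """
    means, lower, upper = [], [], []
    for rewards in rewards_per_step:
        mu = fmean(rewards)
        sigma = pstdev(rewards)  # population std; an assumption, not confirmed by the source
        means.append(mu)
        lower.append(mu - sigma)
        upper.append(mu + sigma)
    return means, lower, upper


def plot_reward_band(rewards_per_step, path="reward_sketch.png"):
    """Draw the mean curve with a shaded ±1σ band and save it to `path`."""
    # Imported lazily so reward_band() works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for training jobs
    import matplotlib.pyplot as plt

    means, lower, upper = reward_band(rewards_per_step)
    steps = list(range(len(means)))

    fig, ax = plt.subplots()
    ax.plot(steps, means, label="mean reward")
    ax.fill_between(steps, lower, upper, alpha=0.3, label="±1σ")
    ax.set_xlabel("training step")
    ax.set_ylabel("reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_reward_band([[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])` would write a three-step curve to `reward_sketch.png`. The default output name is deliberately different from `reward.png` so the sketch cannot overwrite a generated plot.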

To regenerate locally after a job:

    ./train/sync-plots.sh Pratyush-01/physix-3b-rl