Commit e51b5ef by Don Rishabh, Claude Opus 4.7 (1M context)
Parent(s): 8ac18d8
training/TRAINING.md: add upfront "what the .sh launchers do" section

The doc jumped straight to "bash training/hf_job_train.sh" without
explaining what that actually executes. Add a section before the
prereqs covering: the 3-step pattern (env vars → bash heredoc inside
container → hf jobs run --detach), a 4-row table mapping each .sh
to the python entry point and runtime, why the layer exists at all
(the OpenEnv-official torch/transformers/trl pin is finicky and
lives in the .sh by design), and what the .sh files explicitly
DON'T do (local execution, blocking on completion).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- training/TRAINING.md +31 -0
@@ -6,6 +6,37 @@ End-to-end recipe for reproducing the Prompt Golf adapters and demo CSVs from sc
---

## What the `hf_job_*.sh` launchers actually do
Each `.sh` is a thin wrapper around the [`hf jobs`](https://huggingface.co/docs/huggingface_hub/guides/jobs) CLI, HuggingFace's managed-GPU runner. The pattern is identical across all four:

1. **Read config from env vars** (with sensible defaults): model names, destination repo, hyperparameters, GPU flavor (`l40sx1` / `l4x1` / `t4-medium`), timeout, etc.
2. **Compose a long bash command** to run *inside* the remote container:
   - `apt-get install` system deps (git, curl, build tools).
   - `pip install` the **OpenEnv-official torch/transformers/trl pin**. This is finicky (torch ≥2.8, transformers==4.56.2, trl==0.22.2), which is why the install lives in the `.sh`, not in `requirements.txt`.
   - `git clone` this repo at `${REPO_REF}` and `pip install -e .` it.
   - Run the actual Python entry point (`train_grpo.py` / `train_grpo_multistep.py` / `eval_before_after.py` / `profile_baseline.py`).
3. **Submit it via `hf jobs run`** with `--flavor`, `--timeout`, `--secrets HF_TOKEN`, `--detach`. Returns a job ID and runs in the background on HF's GPUs.
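
The three steps can be sketched as a minimal launcher skeleton. This is an illustration under stated assumptions, not a copy of the real scripts: `REPO_URL`, `IMAGE`, `DRY_RUN`, the default values, the `training/train_grpo.py` path, and the exact install ordering are all hypothetical. Only the `hf jobs run` flags, `${REPO_REF}`, and the version pin come from the description above.

```shell
# Sketch of the launcher pattern; names and defaults are illustrative assumptions.

# 1. Read config from env vars, falling back to defaults.
FLAVOR="${FLAVOR:-l40sx1}"
TIMEOUT="${TIMEOUT:-4h}"
REPO_REF="${REPO_REF:-main}"
REPO_URL="${REPO_URL:-https://example.com/you/your-repo.git}"  # hypothetical URL
IMAGE="${IMAGE:-python:3.11}"                                  # hypothetical image

# 2. Compose the command that will run inside the container.
#    The unquoted heredoc expands ${REPO_URL} and ${REPO_REF} now, on the
#    submitting machine, so the container receives literal values.
INNER_CMD="$(cat <<EOF
set -e
apt-get update && apt-get install -y git curl build-essential
pip install 'torch>=2.8' 'transformers==4.56.2' 'trl==0.22.2'
git clone ${REPO_URL} repo && cd repo && git checkout ${REPO_REF}
pip install -e .
python training/train_grpo.py
EOF
)"

# 3. Submit. With DRY_RUN=1 (the default in this sketch) the command is
#    printed instead of executed, so nothing is sent to the cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

submit() {
  run hf jobs run --flavor "$FLAVOR" --timeout "$TIMEOUT" \
    --secrets HF_TOKEN --detach "$IMAGE" bash -c "$INNER_CMD"
}
```

Calling `submit` prints the full command for inspection; setting `DRY_RUN=0` would hand it to the `hf` CLI instead. Keeping the inner command in one heredoc is what lets the pip ordering live in a single reviewable place.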

| Script | Python entry point | Runtime (hardware) | Purpose |
|---|---|---|---|
| [`hf_job_profile.sh`](./hf_job_profile.sh) | `profile_baseline.py` | ≈30m on L4 | Verbose-prompt accuracy per task on a given target. No agent and no judge, so it is cheap. |
| [`hf_job_train.sh`](./hf_job_train.sh) | `train_grpo.py` | ≈3h on L40S | Hero recipe: TRL GRPO single-step, 500 steps × 8 generations. Pushes adapter + plots + metrics. |
| [`hf_job_train_multistep.sh`](./hf_job_train_multistep.sh) | `train_grpo_multistep.py` | ≈3.5h on L40S | 3-turn variant: hand-rolled trajectory-level GRPO. Reads `SFT_ADAPTER` to warm-start from a hero adapter. |
| [`hf_job_eval.sh`](./hf_job_eval.sh) | `eval_before_after.py` | 2 × ≈15m on L40S | Takes `base \| trained \| both` as `$1`. `both` submits two jobs (with and without `--adapter`). |
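
The `base | trained | both` argument handling in the last row can be sketched as a `case` dispatch. This is an assumption about the shape of the logic, not the script's actual code: `submit_eval` is a hypothetical stand-in that only prints, `ADAPTER` is a hypothetical variable name, and whether `--adapter` takes a repo id as its value is also assumed.

```shell
# Hypothetical stand-in for the real submission; it only prints what it would send.
submit_eval() {
  label="$1"; shift
  printf 'submit[%s]: eval_before_after.py %s\n' "$label" "$*"
}

# Dispatch on the first argument: base, trained, or both.
dispatch_eval() {
  mode="${1:-}"
  adapter="${ADAPTER:-your-namespace/your-adapter}"   # hypothetical repo id
  case "$mode" in
    base)    submit_eval base ;;
    trained) submit_eval trained --adapter "$adapter" ;;
    both)    submit_eval base
             submit_eval trained --adapter "$adapter" ;;
    *)       echo "usage: dispatch_eval base|trained|both" >&2
             return 2 ;;
  esac
}
```

With this shape, `dispatch_eval both` produces two independent submissions, which matches the "2 ×" in the runtime column: each job has its own ID and its own logs.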

**Why this layer exists at all:**

- The "compose the command that runs inside the container" step is a 30-line bash heredoc with very particular pip-install ordering. You don't want to retype that from memory for each run.
- Defaults make `bash training/hf_job_train.sh` a one-liner. Customize via env-var overrides (`PUSH_TO_HUB=... TARGET_MODEL=... bash ...`).
- The same `.sh` can be invoked from your machine or from CI: the scripts run nothing locally, they only **dispatch** to HF's cluster.
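
The env-var override mechanism is ordinary shell behaviour and can be demonstrated with a throwaway stub. In this sketch the stub file stands in for a launcher; the variable names `TARGET_MODEL` and `PUSH_TO_HUB` come from the text above, while the default values are placeholders and not the real defaults.

```shell
# Start from a clean slate so the defaults are visible.
unset TARGET_MODEL PUSH_TO_HUB

# A throwaway stub that resolves config the way `${VAR:-default}` does.
stub="$(mktemp)"
cat >"$stub" <<'EOF'
TARGET_MODEL="${TARGET_MODEL:-placeholder/target-model}"
PUSH_TO_HUB="${PUSH_TO_HUB:-placeholder/destination-repo}"
echo "TARGET_MODEL=$TARGET_MODEL PUSH_TO_HUB=$PUSH_TO_HUB"
EOF

# No overrides: both defaults apply.
default_out="$(sh "$stub")"

# Prefix assignments apply to this one invocation only.
override_out="$(TARGET_MODEL=my-org/my-model sh "$stub")"

echo "$default_out"
echo "$override_out"
rm -f "$stub"
```

Only the variable named on the command line changes; anything left unset keeps its default, and the calling shell's environment is not modified after the command returns.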

**What the `.sh` files don't do:**

- Don't wait for the job to finish: `--detach` returns immediately. Monitor with `hf jobs ps -a` and `hf jobs logs <id> --follow`.
- Don't run training on your laptop; no local GPU is required.
- Don't cover the CSV-builder (`build_before_after_csv.py`), plot-renderer (`make_plots.py`), or Trackio-replayer (`replay_to_trackio.py`). These have no `.sh` wrappers because they're cheap CPU-only scripts you run locally after the GPU jobs finish.
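
Because the launchers return immediately, monitoring is a separate step. The two commands named above can be wrapped in small helpers; this is a convenience sketch, and the `HF_BIN` variable and function names are hypothetical additions that are not part of the repo.

```shell
# HF_BIN lets the helpers be exercised without the real CLI (e.g. HF_BIN=echo).
HF_BIN="${HF_BIN:-hf}"

# List all jobs, including finished ones.
list_jobs() {
  "$HF_BIN" jobs ps -a
}

# Stream logs for one job until interrupted.
follow_job() {
  if [ -z "${1:-}" ]; then
    echo "usage: follow_job <job-id>" >&2
    return 2
  fi
  "$HF_BIN" jobs logs "$1" --follow
}
```

A typical flow would be to run a launcher, copy the job ID it prints, and pass that ID to `follow_job`.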

---

## 0. Prerequisites

- **HuggingFace account + token** with write access to a destination namespace (yours, not `rishabh16196/...`). Login locally: