Don Rishabh Claude Opus 4.7 (1M context) committed on
Commit 8ac18d8 · Parent(s): a185317
training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs
The doc had three mismatches with the actual scripts:
- WARMSTART_ADAPTER → SFT_ADAPTER (the env var hf_job_train_multistep.sh
reads). Fixed in §3 and §8.
- build_before_after_csv.py: --out → --output-csv (the actual flag).
Switched the §8 checklist to the simpler "pull JSONLs first, then
merge with --push-to-hub" flow that the script supports natively.
- make_plots.py: --metrics-jsonl → --metrics. (The flag is just
--metrics — the script auto-detects JSONL vs JSON-array.)
Verified all bash command blocks now match the actual env vars and
CLI flags exposed by training/*.sh and training/*.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
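
The verification described in the commit message can be approximated mechanically. What follows is a minimal sketch, not the check the author actually ran. It assumes the scripts declare their options through `add_argument("--flag", ...)` calls and that the doc keeps its commands in fenced `bash` blocks. The helper names `flags_in_doc`, `flags_in_script`, and `stale_flags` are hypothetical.

```python
import re


def flags_in_doc(markdown_text: str) -> set[str]:
    """Collect long options (e.g. --output-csv) used inside fenced bash blocks."""
    flags: set[str] = set()
    in_bash = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            # Any fence closes an open bash block; only a bash-tagged fence opens one.
            in_bash = (not in_bash) and stripped == "```bash"
            continue
        if in_bash:
            # Naive comment stripping: everything after the first '#' is ignored.
            code = line.split("#", 1)[0]
            flags.update(re.findall(r"(?<![\w-])--[a-z][a-z0-9-]*", code))
    return flags


def flags_in_script(source: str) -> set[str]:
    """Collect the first option string of each add_argument(...) call."""
    pattern = r"add_argument\(\s*[\"'](--[a-z][a-z0-9-]*)[\"']"
    return set(re.findall(pattern, source))


def stale_flags(markdown_text: str, script_source: str) -> list[str]:
    """Flags the doc uses that the script does not define, sorted for stable output."""
    return sorted(flags_in_doc(markdown_text) - flags_in_script(script_source))
```

Run against a doc that still says `--out` and a script that defines `--output-csv`, `stale_flags` would report `--out`. The sketch ignores flags mentioned only in comments, and the comment stripping would misfire on a `#` inside a quoted argument, so treat its output as a hint rather than proof.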
- training/TRAINING.md +13 -9
--- a/training/TRAINING.md
+++ b/training/TRAINING.md
@@ -69,7 +69,7 @@ Key flags (override via env vars in the launcher script):
 `training/train_grpo_multistep.py` — hand-rolled trajectory-level GRPO. TRL's GRPO can't do multi-turn credit assignment cleanly (it expects one prompt → one scalar reward), so this is a separate trainer.
 
 ```bash
-
+SFT_ADAPTER=your-username/your-hero-adapter \
 PUSH_TO_HUB=your-username/your-multistep-adapter \
 TARGET_MODEL=meta-llama/Llama-3.2-3B-Instruct \
 bash training/hf_job_train_multistep.sh
@@ -122,10 +122,11 @@ Each job is ≈15 min on L40S. Output rows include `task_id`, `category`, `agent
 
 ```bash
 python training/build_before_after_csv.py \
-  --base-jsonl
+  --base-jsonl evals/eval_base.jsonl \
   --trained-jsonl evals/eval_trained.jsonl \
-  --
+  --output-csv evals/qwen_to_llama_demo.csv \
   --min-verbose-accuracy 0.0 # set to >0 to drop dead-target tasks
+# Optional: --push-to-hub your-name/your-adapter-repo to upload directly.
 ```
 
 Then upload the CSV to the adapter repo so the demo Space can fetch it:
@@ -142,7 +143,7 @@ hf upload $ADAPTER_REPO evals/qwen_to_llama_demo.csv evals/qwen_to_llama_demo.cs
 
 ```bash
 python training/make_plots.py \
-  --metrics
+  --metrics train_metrics.jsonl \
   --out-dir plots/
 ```
 
@@ -183,11 +184,14 @@ PUSH_TO_HUB=your-name/prompt-golf-hero \
 ADAPTER_REPO=your-name/prompt-golf-hero \
 bash training/hf_job_eval.sh both # 2 × ≈15min
 
-# 4) build demo CSV
+# 4) build demo CSV
+# First pull the JSONLs locally, then merge them.
+hf download your-name/prompt-golf-hero --include "evals/*.jsonl" --local-dir .
 python training/build_before_after_csv.py \
-  --base-jsonl
-  --trained-jsonl
-  --
+  --base-jsonl evals/eval_base.jsonl \
+  --trained-jsonl evals/eval_trained.jsonl \
+  --output-csv evals/qwen_to_llama_demo.csv \
+  --push-to-hub your-name/prompt-golf-hero
 
 # 5) (optional) Trackio replay
 python training/replay_to_trackio.py
@@ -196,7 +200,7 @@ python training/replay_to_trackio.py
 For the multi-step variant, swap step 2:
 
 ```bash
-
+SFT_ADAPTER=your-name/prompt-golf-hero \
 PUSH_TO_HUB=your-name/prompt-golf-multistep \
 bash training/hf_job_train_multistep.sh # ≈3.5h
 # then promote adapter_final/ to repo root before eval (see §3)
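
The commit message notes that `make_plots.py` accepts a single `--metrics` flag and auto-detects JSONL versus a JSON array. The script's source is not shown here, so the following is only a sketch of one common way such detection can work, assuming the first non-whitespace character is enough to tell the two formats apart. The function name `load_metrics` is hypothetical.

```python
import json


def load_metrics(text: str) -> list[dict]:
    """Parse metric records from either a JSON array or JSON Lines text."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        # The whole payload is a single JSON array of records.
        return json.loads(stripped)
    # Otherwise treat every non-empty line as its own JSON object.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]
```

This heuristic would misread a JSONL file whose individual records are arrays rather than objects, which is one reason a real script might sniff the format differently.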