Don Rishabh Claude Opus 4.7 (1M context) committed on
Commit 8ac18d8 · 1 Parent(s): a185317

training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs

The doc had three mismatches with the actual scripts:
- WARMSTART_ADAPTER → SFT_ADAPTER (the env var hf_job_train_multistep.sh
reads). Fixed in §3 and §8.
- build_before_after_csv.py: --out → --output-csv (the actual flag).
Switched the §8 checklist to the simpler "pull JSONLs first, then
merge with --push-to-hub" flow that the script supports natively.
- make_plots.py: --metrics-jsonl → --metrics. (The flag is just
--metrics — the script auto-detects JSONL vs JSON-array.)
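
The commit does not show how make_plots.py tells the two formats apart. One plausible approach is to look at the first non-whitespace character. The sketch below assumes a hypothetical helper `load_metrics`, which is not taken from the actual script:

```python
import json
from pathlib import Path


def load_metrics(path):
    """Load metric records from either a JSONL file or a JSON-array file.

    Hypothetical helper for illustration; the real make_plots.py may
    detect the format differently.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        # A leading '[' means the whole file is a single JSON array of records.
        return json.loads(text)
    # Otherwise treat every non-empty line as one JSON object (JSONL).
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

With a loader of this shape, a single `--metrics` flag can accept either file layout, which is why a separate `--metrics-jsonl` flag would be unnecessary.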

Verified all bash command blocks now match the actual env vars and
CLI flags exposed by training/*.sh and training/*.py.
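
A check like this can be partly automated. The sketch below is not part of the repo: it collects long flags from the bash blocks of a Markdown string and reports any that a script does not support. The helper names and the `supported` set are assumptions; in practice that set would have to come from each script's argparse definition or its `--help` output.

```python
import re

FENCE = "`" * 3  # built indirectly so this example does not contain a literal fence
BASH_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"bash\n(.*?)" + re.escape(FENCE), re.DOTALL
)
FLAG_RE = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def flags_in_bash_blocks(markdown):
    """Return the set of long CLI flags used inside bash code blocks."""
    flags = set()
    for block in BASH_BLOCK_RE.findall(markdown):
        for line in block.splitlines():
            # Drop trailing shell comments so flags that are only
            # mentioned in a comment are not counted as used.
            code = line.split("#", 1)[0]
            flags.update(FLAG_RE.findall(code))
    return flags


def unknown_flags(markdown, supported):
    """Return, sorted, the documented flags that are not in `supported`."""
    return sorted(flags_in_bash_blocks(markdown) - set(supported))
```

Run against the old make_plots.py snippet with `supported = {"--metrics", "--out-dir"}`, this would report `--metrics-jsonl` as the only mismatch. The comment stripping is naive (it would also cut a `#` inside a quoted argument), which is acceptable for a doc lint but not for general shell parsing.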

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. training/TRAINING.md +13 -9
training/TRAINING.md CHANGED
@@ -69,7 +69,7 @@ Key flags (override via env vars in the launcher script):
  `training/train_grpo_multistep.py` — hand-rolled trajectory-level GRPO. TRL's GRPO can't do multi-turn credit assignment cleanly (it expects one prompt → one scalar reward), so this is a separate trainer.
 
  ```bash
- WARMSTART_ADAPTER=your-username/your-hero-adapter \
+ SFT_ADAPTER=your-username/your-hero-adapter \
  PUSH_TO_HUB=your-username/your-multistep-adapter \
  TARGET_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  bash training/hf_job_train_multistep.sh
@@ -122,10 +122,11 @@ Each job is ≈15 min on L40S. Output rows include `task_id`, `category`, `agent
 
  ```bash
  python training/build_before_after_csv.py \
- --base-jsonl evals/eval_base.jsonl \
+ --base-jsonl evals/eval_base.jsonl \
  --trained-jsonl evals/eval_trained.jsonl \
- --out evals/qwen_to_llama_demo.csv \
+ --output-csv evals/qwen_to_llama_demo.csv \
  --min-verbose-accuracy 0.0 # set to >0 to drop dead-target tasks
+ # Optional: --push-to-hub your-name/your-adapter-repo to upload directly.
  ```
 
  Then upload the CSV to the adapter repo so the demo Space can fetch it:
@@ -142,7 +143,7 @@ hf upload $ADAPTER_REPO evals/qwen_to_llama_demo.csv evals/qwen_to_llama_demo.cs
 
  ```bash
  python training/make_plots.py \
- --metrics-jsonl train_metrics.jsonl \
+ --metrics train_metrics.jsonl \
  --out-dir plots/
  ```
 
@@ -183,11 +184,14 @@ PUSH_TO_HUB=your-name/prompt-golf-hero \
  ADAPTER_REPO=your-name/prompt-golf-hero \
  bash training/hf_job_eval.sh both # 2 × ≈15min
 
- # 4) build demo CSV (pulls eval JSONLs from the repo, joins verbose prompts)
+ # 4) build demo CSV
+ # First pull the JSONLs locally, then merge them.
+ hf download your-name/prompt-golf-hero --include "evals/*.jsonl" --local-dir .
  python training/build_before_after_csv.py \
- --base-jsonl <(hf download your-name/prompt-golf-hero evals/eval_base.jsonl --local-dir - 2>/dev/null) \
- --trained-jsonl <(hf download your-name/prompt-golf-hero evals/eval_trained.jsonl --local-dir - 2>/dev/null) \
- --out qwen_to_llama_demo.csv
+ --base-jsonl evals/eval_base.jsonl \
+ --trained-jsonl evals/eval_trained.jsonl \
+ --output-csv evals/qwen_to_llama_demo.csv \
+ --push-to-hub your-name/prompt-golf-hero
 
  # 5) (optional) Trackio replay
  python training/replay_to_trackio.py
@@ -196,7 +200,7 @@ python training/replay_to_trackio.py
  For the multi-step variant, swap step 2:
 
  ```bash
- WARMSTART_ADAPTER=your-name/prompt-golf-hero \
+ SFT_ADAPTER=your-name/prompt-golf-hero \
  PUSH_TO_HUB=your-name/prompt-golf-multistep \
  bash training/hf_job_train_multistep.sh # ≈3.5h
  # then promote adapter_final/ to repo root before eval (see §3)