Spaces: rishabh16196/prompt_golf_env
Don Rishabh Claude Opus 4.7 (1M context) committed 29 days ago

Commit 8ac18d8 (1 parent: a185317)

training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs

The doc had three mismatches with the actual scripts:
- WARMSTART_ADAPTER → SFT_ADAPTER (the env var hf_job_train_multistep.sh
reads). Fixed in §3 and §8.
- build_before_after_csv.py: --out → --output-csv (the actual flag).
Switched the §8 checklist to the simpler "pull JSONLs first, then
merge with --push-to-hub" flow that the script supports natively.
- make_plots.py: --metrics-jsonl → --metrics. (The flag is just
--metrics — the script auto-detects JSONL vs JSON-array.)
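The auto-detection mentioned for `--metrics` could look roughly like the sketch below. This is an illustration only, not the code in `make_plots.py`: the function name `load_metrics` and the "starts with `[`" heuristic are assumptions.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Load metric records from a JSON-array file or a JSONL file.

    Hypothetical helper: the real script may detect the format differently.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        # The whole file is a single JSON array of records.
        return json.loads(text)
    # Otherwise treat every non-empty line as one JSON object (JSONL).
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

With a loader like this, a single `--metrics` flag can accept either file layout, which is why a separate `--metrics-jsonl` flag is unnecessary.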

Verified all bash command blocks now match the actual env vars and
CLI flags exposed by training/*.sh and training/*.py.
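The commit does not say how this verification was done. One way to automate a similar check is sketched below, assuming the set of valid flags is supplied by hand (in practice it might be collected from each script's argument parser). The helper names `flags_in_bash_blocks` and `unknown_flags` are hypothetical.

```python
import re

TICKS = "`" * 3  # built indirectly so this snippet can sit inside a fence
FENCE = re.compile(TICKS + r"bash\n(.*?)" + TICKS, re.DOTALL)
FLAG = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def flags_in_bash_blocks(markdown):
    """Collect every --long-flag used inside bash fences of a Markdown string."""
    flags = set()
    for block in FENCE.findall(markdown):
        for line in block.splitlines():
            code = line.split("#", 1)[0]  # ignore trailing shell comments
            flags.update(FLAG.findall(code))
    return flags


def unknown_flags(markdown, known_flags):
    """Return documented flags that are not in the known set, sorted."""
    return sorted(flags_in_bash_blocks(markdown) - set(known_flags))
```

Run against the pre-fix doc with a known set containing `--output-csv` and `--metrics`, a check like this would report `--out` and `--metrics-jsonl`. It does not cover environment variables such as `SFT_ADAPTER`, which would need a separate pattern.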

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)

training/TRAINING.md +13 -9


@@ -69,7 +69,7 @@ Key flags (override via env vars in the launcher script):
 `training/train_grpo_multistep.py` — hand-rolled trajectory-level GRPO. TRL's GRPO can't do multi-turn credit assignment cleanly (it expects one prompt → one scalar reward), so this is a separate trainer.
 ```bash
-WARMSTART_ADAPTER=your-username/your-hero-adapter \
 PUSH_TO_HUB=your-username/your-multistep-adapter \
 TARGET_MODEL=meta-llama/Llama-3.2-3B-Instruct \
   bash training/hf_job_train_multistep.sh
@@ -122,10 +122,11 @@ Each job is ≈15 min on L40S. Output rows include `task_id`, `category`, `agent
 ```bash
 python training/build_before_after_csv.py \
-  --base-jsonl   evals/eval_base.jsonl \
   --trained-jsonl evals/eval_trained.jsonl \
-  --out evals/qwen_to_llama_demo.csv \
   --min-verbose-accuracy 0.0   # set to >0 to drop dead-target tasks
 ```
 Then upload the CSV to the adapter repo so the demo Space can fetch it:
@@ -142,7 +143,7 @@ hf upload $ADAPTER_REPO evals/qwen_to_llama_demo.csv evals/qwen_to_llama_demo.cs
 ```bash
 python training/make_plots.py \
-  --metrics-jsonl train_metrics.jsonl \
   --out-dir plots/
 ```
@@ -183,11 +184,14 @@ PUSH_TO_HUB=your-name/prompt-golf-hero \
 ADAPTER_REPO=your-name/prompt-golf-hero \
   bash training/hf_job_eval.sh both                    # 2 × ≈15min
-# 4) build demo CSV (pulls eval JSONLs from the repo, joins verbose prompts)
 python training/build_before_after_csv.py \
-  --base-jsonl   <(hf download your-name/prompt-golf-hero evals/eval_base.jsonl    --local-dir - 2>/dev/null) \
-  --trained-jsonl <(hf download your-name/prompt-golf-hero evals/eval_trained.jsonl --local-dir - 2>/dev/null) \
-  --out qwen_to_llama_demo.csv
 # 5) (optional) Trackio replay
 python training/replay_to_trackio.py
@@ -196,7 +200,7 @@ python training/replay_to_trackio.py
 For the multi-step variant, swap step 2:
 ```bash
-WARMSTART_ADAPTER=your-name/prompt-golf-hero \
 PUSH_TO_HUB=your-name/prompt-golf-multistep \
   bash training/hf_job_train_multistep.sh              # ≈3.5h
 # then promote adapter_final/ to repo root before eval (see §3)

 `training/train_grpo_multistep.py` — hand-rolled trajectory-level GRPO. TRL's GRPO can't do multi-turn credit assignment cleanly (it expects one prompt → one scalar reward), so this is a separate trainer.
 ```bash
+SFT_ADAPTER=your-username/your-hero-adapter \
 PUSH_TO_HUB=your-username/your-multistep-adapter \
 TARGET_MODEL=meta-llama/Llama-3.2-3B-Instruct \
   bash training/hf_job_train_multistep.sh
 ```bash
 python training/build_before_after_csv.py \
+  --base-jsonl    evals/eval_base.jsonl \
   --trained-jsonl evals/eval_trained.jsonl \
+  --output-csv    evals/qwen_to_llama_demo.csv \
   --min-verbose-accuracy 0.0   # set to >0 to drop dead-target tasks
+  # Optional: --push-to-hub your-name/your-adapter-repo to upload directly.
 ```
 Then upload the CSV to the adapter repo so the demo Space can fetch it:
 ```bash
 python training/make_plots.py \
+  --metrics train_metrics.jsonl \
   --out-dir plots/
 ```
 ADAPTER_REPO=your-name/prompt-golf-hero \
   bash training/hf_job_eval.sh both                    # 2 × ≈15min
+# 4) build demo CSV
+#    First pull the JSONLs locally, then merge them.
+hf download your-name/prompt-golf-hero --include "evals/*.jsonl" --local-dir .
 python training/build_before_after_csv.py \
+  --base-jsonl    evals/eval_base.jsonl \
+  --trained-jsonl evals/eval_trained.jsonl \
+  --output-csv    evals/qwen_to_llama_demo.csv \
+  --push-to-hub   your-name/prompt-golf-hero
 # 5) (optional) Trackio replay
 python training/replay_to_trackio.py
 For the multi-step variant, swap step 2:
 ```bash
+SFT_ADAPTER=your-name/prompt-golf-hero \
 PUSH_TO_HUB=your-name/prompt-golf-multistep \
   bash training/hf_job_train_multistep.sh              # ≈3.5h
 # then promote adapter_final/ to repo root before eval (see §3)