prompt_golf_env / training

Commit History

training/TRAINING.md: add "Quick start — just run the .sh" subsection
96d773b

Don Rishabh, Claude Opus 4.7 (1M context) committed on

training/TRAINING.md: add upfront "what the .sh launchers do" section
e51b5ef

Don Rishabh, Claude Opus 4.7 (1M context) committed on

training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs
8ac18d8

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Add training/TRAINING.md — end-to-end reproduction recipe
6206e8a

Don Rishabh, Claude Opus 4.7 (1M context) committed on

build_before_after_csv: --min-verbose-accuracy flag
ea78734

Don Rishabh, Claude Opus 4.7 (1M context) committed on

trackio: post-hoc replay of train_metrics.jsonl into a HF Space dashboard
3724e90

Don Rishabh, Claude Opus 4.7 (1M context) committed on

demo CSVs: add reward_advantage_vs_verbose + accuracy_delta_vs_verbose
7dafc94

Don Rishabh, Claude Opus 4.7 (1M context) committed on

demo: sample test input dropdown (per-task examples in CSV)
bdd9948

Don Rishabh committed on

multistep: gradient checkpointing + tighter memory defaults
7ca042f

Don Rishabh, Claude Opus 4.7 (1M context) committed on

tasks_policy: long-context policy-compression tasks
e8ef5c3

Don Rishabh, Claude Opus 4.7 (1M context) committed on

hf_job_train: add ENABLE_THINKING env var (default true)
20f81cc

Don Rishabh, Claude Opus 4.7 (1M context) committed on

train_grpo: drop stale args.turn_limit reference at build_prompt_dataset call site
0e3893d

Don Rishabh, Claude Opus 4.7 (1M context) committed on

v3: multi-turn env, thinking tokens, cross-family Qwen->Llama, multi-step GRPO
67509ac

Don Rishabh, Claude Opus 4.7 (1M context) committed on

profile_baseline: fix wrong TargetGeneration field accesses
3a1b533

Don Rishabh, Claude Opus 4.7 (1M context) committed on

profile: install hf_transfer (HF_HUB_ENABLE_HF_TRANSFER was set without the package)
8526703

Don Rishabh, Claude Opus 4.7 (1M context) committed on

profile_baseline: pass required max_output_tokens to generate_batch
581249b

Don Rishabh, Claude Opus 4.7 (1M context) committed on

tasks_tough: add 42 more tough scenarios + baseline profiler
fe54c01

Don Rishabh, Claude Opus 4.7 (1M context) committed on

tasks_tough: add 10 domain-classifier tough scenarios (seed batch)
25d9413

Don Rishabh, Claude Opus 4.7 (1M context) committed on

eval: use merged v1+v2 task bank (same fix train_grpo.py already had)
450384e

Don Rishabh committed on

Revert num_generations 10 -> 8 (must divide generation_batch_size=8)
1c3ea4f

Don Rishabh committed on

Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard
5abc867

Don Rishabh, Claude Opus 4.7 (1M context) committed on

GRPO: explicit temperature=0.9 top_p=1.0 (override Qwen3 defaults of 0.6/0.95 for rollout diversity)
1d31f17

Don Rishabh committed on

Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat)
ade2f03

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Fix v2 smoke failures: load merged task bank + newer transformers for Qwen3.5
070be2b

Don Rishabh, Claude Opus 4.7 (1M context) committed on

v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward
3889513

Don Rishabh, Claude Opus 4.7 (1M context) committed on

eval: --push-to-hub uploads eval JSONL to adapter repo under evals/
309fb46

Don Rishabh committed on

Revert agent loading to TRL + PEFT (Unsloth collides with frozen target)
1da121e

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Drop duplicate make_plots call; train_grpo.py now renders + uploads plots inline
e424cfe

Don Rishabh committed on

Persist training artifacts: upload metrics + plots alongside adapter
156145e

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Switch agent loading to Unsloth FastLanguageModel + fix padding side
02851f3

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Install openenv-core explicitly (our env imports it; --no-deps skipped it)
cef1a55

Don Rishabh committed on

Adopt OpenEnv-official install pattern (unsloth_2048.ipynb)
80f9ea6

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Switch base image to python:3.12-slim + explicit torch 2.7 install
f45a05a

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Default HF Jobs flavor to l40sx1 (48GB L40S)
bdd0bde

Don Rishabh committed on

Align HF Jobs deps with spaces_pipeline_env Colab stack
cc812a5

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Bump HF Jobs base image to pytorch 2.5.1 (trl>=0.14 needs FSDPModule)
273aa5a

Don Rishabh committed on

Fix bash -c invocation: add -- separator to stop hf CLI from eating -l -c as short flags
fa930df

Don Rishabh committed on

Fix hf jobs CLI: use --secrets plural + --detach, optional --push-to-hub
aaa0f2f

Don Rishabh, Claude Opus 4.7 (1M context) committed on

Initial commit: Prompt Golf environment for OpenEnv
6850dad

Don Rishabh, Claude Opus 4.7 (1M context) committed on