prompt_golf_env/training/eval_before_after.py

Commit History

tasks_policy: long-context policy-compression tasks
e8ef5c3

Don Rishabh Claude Opus 4.7 (1M context) committed on

v3: multi-turn env, thinking tokens, cross-family Qwen->Llama, multi-step GRPO
67509ac

Don Rishabh Claude Opus 4.7 (1M context) committed on

tasks_tough: add 42 more tough scenarios + baseline profiler
fe54c01

Don Rishabh Claude Opus 4.7 (1M context) committed on

tasks_tough: add 10 domain-classifier tough scenarios (seed batch)
25d9413

Don Rishabh Claude Opus 4.7 (1M context) committed on

eval: use merged v1+v2 task bank (same fix train_grpo.py already had)
450384e

Don Rishabh committed on

Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard
5abc867

Don Rishabh Claude Opus 4.7 (1M context) committed on

Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat)
ade2f03

Don Rishabh Claude Opus 4.7 (1M context) committed on

v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward
3889513

Don Rishabh Claude Opus 4.7 (1M context) committed on

eval: --push-to-hub uploads eval JSONL to adapter repo under evals/
309fb46

Don Rishabh committed on

Switch agent loading to Unsloth FastLanguageModel + fix padding side
02851f3

Don Rishabh Claude Opus 4.7 (1M context) committed on

Initial commit: Prompt Golf environment for OpenEnv
6850dad

Don Rishabh Claude Opus 4.7 (1M context) committed on