tasks_policy: long-context policy-compression tasks · e8ef5c3 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
v3: multi-turn env, thinking tokens, cross-family Qwen->Llama, multi-step GRPO · 67509ac · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
tasks_tough: add 42 more tough scenarios + baseline profiler · fe54c01 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
tasks_tough: add 10 domain-classifier tough scenarios (seed batch) · 25d9413 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard · 5abc867 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
target_model: padding_side='left' (fixes silent corruption on batched decoder-only generation) · e812066 · Don Rishabh · committed 13 days ago
Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat) · ade2f03 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward · 3889513 · Don Rishabh, Claude Opus 4.7 (1M context) · committed 13 days ago
Initial commit: Prompt Golf environment for OpenEnv · 6850dad · Don Rishabh, Claude Opus 4.7 (1M context) · committed 14 days ago