tasks_policy: long-context policy-compression tasks e8ef5c3 Don Rishabh Claude Opus 4.7 (1M context) committed 16 days ago
v3: multi-turn env, thinking tokens, cross-family Qwen->Llama, multi-step GRPO 67509ac Don Rishabh Claude Opus 4.7 (1M context) committed 16 days ago
tasks_tough: add 42 more tough scenarios + baseline profiler fe54c01 Don Rishabh Claude Opus 4.7 (1M context) committed 16 days ago
tasks_tough: add 10 domain-classifier tough scenarios (seed batch) 25d9413 Don Rishabh Claude Opus 4.7 (1M context) committed 16 days ago
eval: use merged v1+v2 task bank (same fix train_grpo.py already had) 450384e Don Rishabh committed 17 days ago
Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard 5abc867 Don Rishabh Claude Opus 4.7 (1M context) committed 17 days ago
Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat) ade2f03 Don Rishabh Claude Opus 4.7 (1M context) committed 17 days ago
v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward 3889513 Don Rishabh Claude Opus 4.7 (1M context) committed 17 days ago
eval: --push-to-hub uploads eval JSONL to adapter repo under evals/ 309fb46 Don Rishabh committed 17 days ago
Switch agent loading to Unsloth FastLanguageModel + fix padding side 02851f3 Don Rishabh Claude Opus 4.7 (1M context) committed 17 days ago
Initial commit: Prompt Golf environment for OpenEnv 6850dad Don Rishabh Claude Opus 4.7 (1M context) committed 18 days ago