Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard 5abc867 Don Rishabh Claude Opus 4.7 (1M context) committed 17 days ago
Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat) ade2f03 Don Rishabh Claude Opus 4.7 (1M context) committed 18 days ago
v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward 3889513 Don Rishabh Claude Opus 4.7 (1M context) committed 18 days ago