Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses f0ca22d InosLihka committed 11 days ago
iter3: align reward with grader + belief-first format + exploration shaping 64d24b3 InosLihka Claude Opus 4.7 (1M context) committed 12 days ago
iter2: fix mode collapse + 3 deeper bugs from code review e21a960 InosLihka Claude Opus 4.7 (1M context) committed 12 days ago
tune: GRPO hyperparameter fixes from ML reviewer dc0186f InosLihka Claude Opus 4.7 (1M context) committed 12 days ago
feat: FAST_MODE preset for 10-15 min iteration cycles 1a865f8 InosLihka Claude Opus 4.7 (1M context) committed 12 days ago