rhythm_env/scripts/train_on_hf.py

Commit History

Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses
f0ca22d

InosLihka committed on

iter3: align reward with grader + belief-first format + exploration shaping
64d24b3

InosLihka, Claude Opus 4.7 (1M context) committed on

iter2: fix mode collapse + 3 deeper bugs from code review
e21a960

InosLihka, Claude Opus 4.7 (1M context) committed on

tune: GRPO hyperparameter fixes from ML reviewer
dc0186f

InosLihka, Claude Opus 4.7 (1M context) committed on

feat: FAST_MODE preset for 10-15 min iteration cycles
1a865f8

InosLihka, Claude Opus 4.7 (1M context) committed on

feat: HF Jobs training script + plot generator
73c7ea0

InosLihka committed on