rhythm_env / training / train.py

Commit History

Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline
ece0bbe

InosLihka committed on

iter2: fix mode collapse + 3 deeper bugs from code review
e21a960

InosLihka and Claude Opus 4.7 (1M context) committed on

tune: GRPO hyperparameter fixes from ML reviewer
dc0186f

InosLihka and Claude Opus 4.7 (1M context) committed on

feat: HF Jobs training script + plot generator
73c7ea0

InosLihka committed on

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)
ecbe0d8

InosLihka and Claude Opus 4.7 (1M context) committed on

fix: correct GRPO training hyperparameters to prevent KL explosion
fb112e4

InosLihka and Claude Sonnet 4.6 committed on

Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline
cc6473a

InosLihka and Claude Sonnet 4.6 committed on