Spaces: InosLihka / rhythm_env

rhythm_env/training/dataset.py

Commit History

Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline

ece0bbe

InosLihka committed 12 days ago

iter4: fix the 'constant belief = free reward' bug + 6 other deep issues

bb2a9c7

InosLihka, Claude Opus 4.7 (1M context) committed 12 days ago

iter3: align reward with grader + belief-first format + exploration shaping

64d24b3

InosLihka, Claude Opus 4.7 (1M context) committed 12 days ago

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)

ecbe0d8

InosLihka, Claude Opus 4.7 (1M context) committed 13 days ago

Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline

cc6473a

InosLihka, Claude Sonnet 4.6 committed 14 days ago