Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline ece0bbe InosLihka committed 12 days ago
iter2: fix mode collapse + 3 deeper bugs from code review e21a960 InosLihka Claude Opus 4.7 (1M context) committed 13 days ago
tune: GRPO hyperparameter fixes from ML reviewer dc0186f InosLihka Claude Opus 4.7 (1M context) committed 13 days ago
env: meta-RL refactor (continuous profiles, action+belief, adaptation grader) ecbe0d8 InosLihka Claude Opus 4.7 (1M context) committed 13 days ago
fix: correct GRPO training hyperparameters to prevent KL explosion fb112e4 InosLihka Claude Sonnet 4.6 committed 13 days ago
Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline cc6473a InosLihka Claude Sonnet 4.6 committed 14 days ago