fix: reduce kl_coef to prevent training instability 0bdfeaa InosLihka and Claude Sonnet 4.6 committed about 1 month ago
Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline cc6473a InosLihka and Claude Sonnet 4.6 committed about 1 month ago
Reorganize docs: segregate Round 1 and Round 2 9bfe470 InosLihka and Claude Sonnet 4.6 committed on Apr 22