Clarify documentation: anomaly signal explainer, GRPO scope notes 361aed7 InosLihka committed on 7 days ago
Fix prompt truncation in inference_eval.py: max_seq_length 768 -> 2048 1217c1d InosLihka committed on 12 days ago
Fix max_new_tokens for CoT format + add eval-only HF Jobs script b9c9b8f InosLihka committed on 12 days ago
Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline ece0bbe InosLihka committed on 12 days ago
client: surface ALL observation fields (was dropping deltas, anomalies, last_action, step_history) 105973d InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter4: fix the 'constant belief = free reward' bug + 6 other deep issues bb2a9c7 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter3: align reward with grader + belief-first format + exploration shaping 64d24b3 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter2: fix mode collapse + 3 deeper bugs from code review e21a960 InosLihka Claude Opus 4.7 (1M context) committed on 13 days ago
tune: GRPO hyperparameter fixes from ML reviewer dc0186f InosLihka Claude Opus 4.7 (1M context) committed on 13 days ago
fix: notebook plot cell syntax error (newline in string literal) 7340206 InosLihka committed on 13 days ago
notebook: add belief-accuracy + reward-components plots b5ac530 InosLihka Claude Opus 4.7 (1M context) committed on 13 days ago
env: meta-RL refactor (continuous profiles, action+belief, adaptation grader) ecbe0d8 InosLihka Claude Opus 4.7 (1M context) committed on 13 days ago
fix: correct GRPO training hyperparameters to prevent KL explosion fb112e4 InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: rename kl_coef to beta (correct param name in TRL GRPOConfig) 2c6ee11 InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: reduce kl_coef to prevent training instability 0bdfeaa InosLihka Claude Sonnet 4.6 committed on 13 days ago
Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline cc6473a InosLihka Claude Sonnet 4.6 committed on 14 days ago