Clarify documentation: anomaly signal explainer, GRPO scope notes 361aed7 InosLihka committed on 6 days ago
Tighten README: resolve GRPO contradiction, drop duplicate baseline table, remove internal mentor docs 0503beb InosLihka committed on 9 days ago
Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses f0ca22d InosLihka committed on 11 days ago
Post-deadline: full eval results + bigger plots via Git LFS d64efa6 InosLihka committed on 11 days ago
results.md: tighten language, present results without internal-process noise d51061f InosLihka committed on 12 days ago
Fix max_new_tokens for CoT format + add eval-only HF Jobs script b9c9b8f InosLihka committed on 12 days ago
Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline ece0bbe InosLihka committed on 12 days ago
handoff: add iter 5 results + whack-a-mole pattern doc 839a758 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: handoff bundle for new chat session + iter 4 partial analysis 7bb9278 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: add explicit sim-to-real mapping diagram (vision alignment) 63216a8 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: detailed architecture diagrams with concrete values 4c69214 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
tooling: scripts/analyze_iter.py + docs/results.md template d6d9e31 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: iteration journal with hypothesis/result/root-cause/fix per iter e12fc69 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: add sim-to-real deployment architecture reference 24adee5 InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: reorganize — 25 files → 4 focused docs 1a25a1a InosLihka Claude Sonnet 4.6 committed on 13 days ago
refactor: rewrite blog around product vision; fix UI for Gradio 6 5fbafee InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: expand blog with purpose, sim-to-real framing, lightweight model goal 26b1e6a InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: reduce kl_coef to prevent training instability 0bdfeaa InosLihka Claude Sonnet 4.6 committed on 13 days ago
Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline cc6473a InosLihka Claude Sonnet 4.6 committed on 13 days ago
Reorganize docs: segregate Round 1 and Round 2 9bfe470 InosLihka Claude Sonnet 4.6 committed on 15 days ago