Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses f0ca22d InosLihka committed 11 days ago
Post-deadline: full eval results + bigger plots via Git LFS d64efa6 InosLihka committed 11 days ago
Add SKIP_EVAL flag to sft_on_hf.py for faster training-only runs ff20f02 InosLihka committed 12 days ago
Fix max_new_tokens for CoT format + add eval-only HF Jobs script b9c9b8f InosLihka committed 12 days ago
Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline ece0bbe InosLihka committed 12 days ago
docs: handoff bundle for new chat session + iter 4 partial analysis 7bb9278 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
tooling: scripts/analyze_iter.py + docs/results.md template d6d9e31 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
iter3: align reward with grader + belief-first format + exploration shaping 64d24b3 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
iter2: fix mode collapse + 3 deeper bugs from code review e21a960 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
tune: GRPO hyperparameter fixes from ML reviewer dc0186f InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
feat: FAST_MODE preset for 10-15 min iteration cycles 1a865f8 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
env: meta-RL refactor (continuous profiles, action+belief, adaptation grader) ecbe0d8 InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago
restore: validate-submission.sh to scripts/ 8a56903 InosLihka and Claude Sonnet 4.6 committed 13 days ago