Clarify documentation: anomaly signal explainer, GRPO scope notes 361aed7 InosLihka committed on 6 days ago
Tighten README: resolve GRPO contradiction, drop duplicate baseline table, remove internal mentor docs 0503beb InosLihka committed on 9 days ago
Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses f0ca22d InosLihka committed on 11 days ago
Post-deadline: full eval results + bigger plots via Git LFS d64efa6 InosLihka committed on 11 days ago
README: drop iter2 plots, keep only SFT v3 loss curve (current pipeline) 8227b63 InosLihka committed on 12 days ago
README: surface headline result table at top so judges don't need to click through 6226884 InosLihka committed on 12 days ago
results.md: tighten language, present results without internal-process noise d51061f InosLihka committed on 12 days ago
Fix prompt truncation in inference_eval.py: max_seq_length 768 -> 2048 1217c1d InosLihka committed on 12 days ago
Add SKIP_EVAL flag to sft_on_hf.py for faster training-only runs ff20f02 InosLihka committed on 12 days ago
Fix max_new_tokens for CoT format + add eval-only HF Jobs script b9c9b8f InosLihka committed on 12 days ago
Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline ece0bbe InosLihka committed on 12 days ago
handoff: add iter 5 results + whack-a-mole pattern doc 839a758 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: handoff bundle for new chat session + iter 4 partial analysis 7bb9278 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: add explicit sim-to-real mapping diagram (vision alignment) 63216a8 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: detailed architecture diagrams with concrete values 4c69214 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
client: surface ALL observation fields (was dropping deltas, anomalies, last_action, step_history) 105973d InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter4: fix the 'constant belief = free reward' bug + 6 other deep issues bb2a9c7 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
tooling: scripts/analyze_iter.py + docs/results.md template d6d9e31 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
docs: iteration journal with hypothesis/result/root-cause/fix per iter e12fc69 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter3: align reward with grader + belief-first format + exploration shaping 64d24b3 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
iter2: fix mode collapse + 3 deeper bugs from code review e21a960 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
tune: GRPO hyperparameter fixes from ML reviewer dc0186f InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
feat: FAST_MODE preset for 10-15 min iteration cycles 1a865f8 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
fix: notebook plot cell syntax error (newline in string literal) 7340206 InosLihka committed on 12 days ago
notebook: add belief-accuracy + reward-components plots b5ac530 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
env: meta-RL refactor (continuous profiles, action+belief, adaptation grader) ecbe0d8 InosLihka Claude Opus 4.7 (1M context) committed on 12 days ago
env: enrich observation with history, anomalies, and discovery bonus 9ed122d InosLihka Claude Sonnet 4.6 committed on 13 days ago
env: enrich observation with history, anomalies, and discovery bonus 0a15ab5 InosLihka Claude Sonnet 4.6 committed on 13 days ago
Add Run 3 training results: README update + training log (no plots) 52e33e8 InosLihka Claude Sonnet 4.6 committed on 13 days ago
Add Run 3 training results: plots, training log, README update c67f463 InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: fix README accuracy + add training results structure 92808b9 InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: add sim-to-real deployment architecture reference 24adee5 InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: correct GRPO training hyperparameters to prevent KL explosion fb112e4 InosLihka Claude Sonnet 4.6 committed on 13 days ago
restore: validate-submission.sh to scripts/ 8a56903 InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: reorganize — 25 files → 4 focused docs 1a25a1a InosLihka Claude Sonnet 4.6 committed on 13 days ago
refactor: rewrite blog around product vision; fix UI for Gradio 6 5fbafee InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: rename kl_coef to beta (correct param name in TRL GRPOConfig) 2c6ee11 InosLihka Claude Sonnet 4.6 committed on 13 days ago
docs: expand blog with purpose, sim-to-real framing, lightweight model goal 26b1e6a InosLihka Claude Sonnet 4.6 committed on 13 days ago
fix: reduce kl_coef to prevent training instability 0bdfeaa InosLihka Claude Sonnet 4.6 committed on 13 days ago