Commit History

Clarify documentation: anomaly signal explainer, GRPO scope notes
361aed7

InosLihka committed on

Fix HF Space README rendering + Dockerfile encoding
39e98db

InosLihka committed on

Tighten README: resolve GRPO contradiction, drop duplicate baseline table, remove internal mentor docs
0503beb

InosLihka committed on

Add SFT v3 + GRPO refine results to README + results.md
666b4ce

InosLihka committed on

Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses
f0ca22d

InosLihka committed on

Post-deadline: full eval results + bigger plots via Git LFS
d64efa6

InosLihka committed on

README: embed reward curve and belief-accuracy curve plots
4dd50e0

InosLihka committed on

README: drop iter2 plots, keep only SFT v3 loss curve (current pipeline)
8227b63

InosLihka committed on

README: surface headline result table at top so judges don't need to click through
6226884

InosLihka committed on

results.md: tighten language, present results without internal-process noise
d51061f

InosLihka committed on

Fill results.md with student numbers from partial v3 eval
9d0f270

InosLihka committed on

Acknowledge OpenEnv Rubric system conformance gap
dc5658d

InosLihka committed on

Embed training plots inline in README with captions
efe2271

InosLihka committed on

Add plots/ folder: SFT v3 loss + GRPO iter2 reward curves
f2401bf

InosLihka committed on

Move blog to root as BLOG.md (per Meta mentor guidance)
eccca42

InosLihka committed on

Prune internal/stale docs; sharpen README submission links
1ba0d0e

InosLihka committed on

Fix prompt truncation in inference_eval.py: max_seq_length 768 -> 2048
1217c1d

InosLihka committed on

Add SKIP_EVAL flag to sft_on_hf.py for faster training-only runs
ff20f02

InosLihka committed on

Fix max_new_tokens for CoT format + add eval-only HF Jobs script
b9c9b8f

InosLihka committed on

Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline
ece0bbe

InosLihka committed on

handoff: add iter 5 results + whack-a-mole pattern doc
839a758

InosLihka Claude Opus 4.7 (1M context) committed on

docs: handoff bundle for new chat session + iter 4 partial analysis
7bb9278

InosLihka Claude Opus 4.7 (1M context) committed on

docs: add explicit sim-to-real mapping diagram (vision alignment)
63216a8

InosLihka Claude Opus 4.7 (1M context) committed on

docs: detailed architecture diagrams with concrete values
4c69214

InosLihka Claude Opus 4.7 (1M context) committed on

client: surface ALL observation fields (was dropping deltas, anomalies, last_action, step_history)
105973d

InosLihka Claude Opus 4.7 (1M context) committed on

iter4: fix the 'constant belief = free reward' bug + 6 other deep issues
bb2a9c7

InosLihka Claude Opus 4.7 (1M context) committed on

tooling: scripts/analyze_iter.py + docs/results.md template
d6d9e31

InosLihka Claude Opus 4.7 (1M context) committed on

docs: iteration journal with hypothesis/result/root-cause/fix per iter
e12fc69

InosLihka Claude Opus 4.7 (1M context) committed on

iter3: align reward with grader + belief-first format + exploration shaping
64d24b3

InosLihka Claude Opus 4.7 (1M context) committed on

iter2: fix mode collapse + 3 deeper bugs from code review
e21a960

InosLihka Claude Opus 4.7 (1M context) committed on

tune: GRPO hyperparameter fixes from ML reviewer
dc0186f

InosLihka Claude Opus 4.7 (1M context) committed on

feat: FAST_MODE preset for 10-15 min iteration cycles
1a865f8

InosLihka Claude Opus 4.7 (1M context) committed on

feat: HF Jobs training script + plot generator
73c7ea0

InosLihka committed on

fix: notebook plot cell syntax error (newline in string literal)
7340206

InosLihka committed on

notebook: add belief-accuracy + reward-components plots
b5ac530

InosLihka Claude Opus 4.7 (1M context) committed on

merge hf/main: meta-RL refactor supersedes prior commits
786249b

InosLihka committed on

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)
ecbe0d8

InosLihka Claude Opus 4.7 (1M context) committed on

env: enrich observation with history, anomalies, and discovery bonus
9ed122d

InosLihka Claude Sonnet 4.6 committed on

env: enrich observation with history, anomalies, and discovery bonus
0a15ab5

InosLihka Claude Sonnet 4.6 committed on

Add Run 3 training results: README update + training log (no plots)
52e33e8

InosLihka Claude Sonnet 4.6 committed on

Add Run 3 training results: plots, training log, README update
c67f463

InosLihka Claude Sonnet 4.6 committed on

docs: fix README accuracy + add training results structure
92808b9

InosLihka Claude Sonnet 4.6 committed on

docs: add sim-to-real deployment architecture reference
24adee5

InosLihka Claude Sonnet 4.6 committed on

fix: correct GRPO training hyperparameters to prevent KL explosion
fb112e4

InosLihka Claude Sonnet 4.6 committed on

restore: validate-submission.sh to scripts/
8a56903

InosLihka Claude Sonnet 4.6 committed on

docs: reorganize — 25 files → 4 focused docs
1a25a1a

InosLihka Claude Sonnet 4.6 committed on

refactor: rewrite blog around product vision; fix UI for Gradio 6
5fbafee

InosLihka Claude Sonnet 4.6 committed on

fix: rename kl_coef to beta (correct param name in TRL GRPOConfig)
2c6ee11

InosLihka Claude Sonnet 4.6 committed on

docs: expand blog with purpose, sim-to-real framing, lightweight model goal
26b1e6a

InosLihka Claude Sonnet 4.6 committed on

fix: reduce kl_coef to prevent training instability
0bdfeaa

InosLihka Claude Sonnet 4.6 committed on