rhythm_env / scripts

Commit History

Add SFT v3 + GRPO refine results to README + results.md
666b4ce

InosLihka committed

Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses
f0ca22d

InosLihka committed

Post-deadline: full eval results + bigger plots via Git LFS
d64efa6

InosLihka committed

Add SKIP_EVAL flag to sft_on_hf.py for faster training-only runs
ff20f02

InosLihka committed

Fix max_new_tokens for CoT format + add eval-only HF Jobs script
b9c9b8f

InosLihka committed

Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline
ece0bbe

InosLihka committed

docs: handoff bundle for new chat session + iter 4 partial analysis
7bb9278

InosLihka and Claude Opus 4.7 (1M context) committed

tooling: scripts/analyze_iter.py + docs/results.md template
d6d9e31

InosLihka and Claude Opus 4.7 (1M context) committed

iter3: align reward with grader + belief-first format + exploration shaping
64d24b3

InosLihka and Claude Opus 4.7 (1M context) committed

iter2: fix mode collapse + 3 deeper bugs from code review
e21a960

InosLihka and Claude Opus 4.7 (1M context) committed

tune: GRPO hyperparameter fixes from ML reviewer
dc0186f

InosLihka and Claude Opus 4.7 (1M context) committed

feat: FAST_MODE preset for 10-15 min iteration cycles
1a865f8

InosLihka and Claude Opus 4.7 (1M context) committed

feat: HF Jobs training script + plot generator
73c7ea0

InosLihka committed

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)
ecbe0d8

InosLihka and Claude Opus 4.7 (1M context) committed

restore: validate-submission.sh to scripts/
8a56903

InosLihka and Claude Sonnet 4.6 committed