Spaces: InosLihka/rhythm_env

Commit History

Clarify documentation: anomaly signal explainer, GRPO scope notes

361aed7

InosLihka committed 6 days ago

Fix HF Space README rendering + Dockerfile encoding

39e98db

InosLihka committed 6 days ago

Tighten README: resolve GRPO contradiction, drop duplicate baseline table, remove internal mentor docs

0503beb

InosLihka committed 9 days ago

Add SFT v3 + GRPO refine results to README + results.md

666b4ce

InosLihka committed 9 days ago

Refactor grader to use openenv.core.rubrics.WeightedSum + Rubric subclasses

f0ca22d

InosLihka committed 11 days ago

Post-deadline: full eval results + bigger plots via Git LFS

d64efa6

InosLihka committed 11 days ago

README: embed reward curve and belief-accuracy curve plots

4dd50e0

InosLihka committed 12 days ago

README: drop iter2 plots, keep only SFT v3 loss curve (current pipeline)

8227b63

InosLihka committed 12 days ago

README: surface headline result table at top so judges don't need to click through

6226884

InosLihka committed 12 days ago

results.md: tighten language, present results without internal-process noise

d51061f

InosLihka committed 12 days ago

Fill results.md with student numbers from partial v3 eval

9d0f270

InosLihka committed 12 days ago

Acknowledge OpenEnv Rubric system conformance gap

dc5658d

InosLihka committed 12 days ago

Embed training plots inline in README with captions

efe2271

InosLihka committed 12 days ago

Add plots/ folder: SFT v3 loss + GRPO iter2 reward curves

f2401bf

InosLihka committed 12 days ago

Move blog to root as BLOG.md (per Meta mentor guidance)

eccca42

InosLihka committed 12 days ago

Prune internal/stale docs; sharpen README submission links

1ba0d0e

InosLihka committed 12 days ago

Fix prompt truncation in inference_eval.py: max_seq_length 768 -> 2048

1217c1d

InosLihka committed 12 days ago

Add SKIP_EVAL flag to sft_on_hf.py for faster training-only runs

ff20f02

InosLihka committed 12 days ago

Fix max_new_tokens for CoT format + add eval-only HF Jobs script

b9c9b8f

InosLihka committed 12 days ago

Algorithm Distillation: grader v2 with belief_accuracy + SFT pipeline

ece0bbe

InosLihka committed 12 days ago

handoff: add iter 5 results + whack-a-mole pattern doc

839a758

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

docs: handoff bundle for new chat session + iter 4 partial analysis

7bb9278

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

docs: add explicit sim-to-real mapping diagram (vision alignment)

63216a8

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

docs: detailed architecture diagrams with concrete values

4c69214

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

client: surface ALL observation fields (was dropping deltas, anomalies, last_action, step_history)

105973d

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

iter4: fix the 'constant belief = free reward' bug + 6 other deep issues

bb2a9c7

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

tooling: scripts/analyze_iter.py + docs/results.md template

d6d9e31

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

docs: iteration journal with hypothesis/result/root-cause/fix per iter

e12fc69

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

iter3: align reward with grader + belief-first format + exploration shaping

64d24b3

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

iter2: fix mode collapse + 3 deeper bugs from code review

e21a960

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

tune: GRPO hyperparameter fixes from ML reviewer

dc0186f

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

feat: FAST_MODE preset for 10-15 min iteration cycles

1a865f8

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

feat: HF Jobs training script + plot generator

73c7ea0

InosLihka committed 12 days ago

fix: notebook plot cell syntax error (newline in string literal)

7340206

InosLihka committed 12 days ago

notebook: add belief-accuracy + reward-components plots

b5ac530

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

merge hf/main: meta-RL refactor supersedes prior commits

786249b

InosLihka committed 12 days ago

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)

ecbe0d8

InosLihka and Claude Opus 4.7 (1M context) committed 12 days ago

env: enrich observation with history, anomalies, and discovery bonus

9ed122d

InosLihka and Claude Sonnet 4.6 committed 13 days ago

env: enrich observation with history, anomalies, and discovery bonus

0a15ab5

InosLihka and Claude Sonnet 4.6 committed 13 days ago

Add Run 3 training results: README update + training log (no plots)

52e33e8

InosLihka and Claude Sonnet 4.6 committed 13 days ago

Add Run 3 training results: plots, training log, README update

c67f463

InosLihka and Claude Sonnet 4.6 committed 13 days ago

docs: fix README accuracy + add training results structure

92808b9

InosLihka and Claude Sonnet 4.6 committed 13 days ago

docs: add sim-to-real deployment architecture reference

24adee5

InosLihka and Claude Sonnet 4.6 committed 13 days ago

fix: correct GRPO training hyperparameters to prevent KL explosion

fb112e4

InosLihka and Claude Sonnet 4.6 committed 13 days ago

restore: validate-submission.sh to scripts/

8a56903

InosLihka and Claude Sonnet 4.6 committed 13 days ago

docs: reorganize — 25 files → 4 focused docs

1a25a1a

InosLihka and Claude Sonnet 4.6 committed 13 days ago

refactor: rewrite blog around product vision; fix UI for Gradio 6

5fbafee

InosLihka and Claude Sonnet 4.6 committed 13 days ago

fix: rename kl_coef to beta (correct param name in TRL GRPOConfig)

2c6ee11

InosLihka and Claude Sonnet 4.6 committed 13 days ago

docs: expand blog with purpose, sim-to-real framing, lightweight model goal

26b1e6a

InosLihka and Claude Sonnet 4.6 committed 13 days ago

fix: reduce kl_coef to prevent training instability

0bdfeaa

InosLihka and Claude Sonnet 4.6 committed 13 days ago
