rhythm_env / README.md

Commit History

Clarify documentation: anomaly signal explainer, GRPO scope notes

361aed7

InosLihka committed 6 days ago

Tighten README: resolve GRPO contradiction, drop duplicate baseline table, remove internal mentor docs

0503beb

InosLihka committed 9 days ago

Add SFT v3 + GRPO refine results to README + results.md

666b4ce

InosLihka committed 9 days ago

Post-deadline: full eval results + bigger plots via Git LFS

d64efa6

InosLihka committed 11 days ago

README: embed reward curve and belief-accuracy curve plots

4dd50e0

InosLihka committed 12 days ago

README: drop iter2 plots, keep only SFT v3 loss curve (current pipeline)

8227b63

InosLihka committed 12 days ago

README: surface headline result table at top so judges don't need to click through

6226884

InosLihka committed 12 days ago

Embed training plots inline in README with captions

efe2271

InosLihka committed 12 days ago

Add plots/ folder: SFT v3 loss + GRPO iter2 reward curves

f2401bf

InosLihka committed 12 days ago

Move blog to root as BLOG.md (per Meta mentor guidance)

eccca42

InosLihka committed 12 days ago

Prune internal/stale docs; sharpen README submission links

1ba0d0e

InosLihka committed 12 days ago

Fix max_new_tokens for CoT format + add eval-only HF Jobs script

b9c9b8f

InosLihka committed 12 days ago

env: meta-RL refactor (continuous profiles, action+belief, adaptation grader)

ecbe0d8

InosLihka and Claude Opus 4.7 (1M context) committed 13 days ago

Add Run 3 training results: plots, training log, README update

c67f463

InosLihka and Claude Sonnet 4.6 committed 13 days ago

docs: fix README accuracy + add training results structure

92808b9

InosLihka and Claude Sonnet 4.6 committed 13 days ago

Rebuild as Life Simulator: 5 meters, 3 hidden profiles, GRPO training pipeline

cc6473a

InosLihka and Claude Sonnet 4.6 committed 13 days ago

Fix bugs, add tests, and improve code quality

c07f15e

Akhil Soni committed 29 days ago

Rewrite README for hackathon human review

f36d90a

Akhil Soni committed 29 days ago

Initial commit: RhythmEnv daily planning RL environment

025774a

Akhil Soni committed 29 days ago
