Spaces: rishabh16196/prompt_golf_env

Commit History

training/TRAINING.md: add "Quick start — just run the .sh" subsection

96d773b

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

training/TRAINING.md: add upfront "what the .sh launchers do" section

e51b5ef

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs

8ac18d8

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

Dockerfile: enable openenv web UI at /web (fixes Space 404)

a185317

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: add multi-step training curves to README + BLOG_POST

125b737

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: clarify ambiguous "80% of 94-token human-prompt accuracy" in hook

6a82df5

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

Remove stale root TRAINING.md

cf7a609

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: stronger 3-paragraph hook, move Prior Work above the fold, escape unsafe single tildes

5c9e0a4

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

README: use escaped \~ for single-tilde approximations

4a5fd24

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: refresh BLOG_POST stale 87-task numbers; finish README ~ cleanup

802278c

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

Add training/TRAINING.md — end-to-end reproduction recipe

6206e8a

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

README: replace ~ with ≈ in intro to fix accidental strikethrough

dec12b4

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

README: stronger intro

8a2a589

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: add Prior Work section + replace Training/thinking-mode A/B with multi-step setup

fc2c034

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: surface multi-turn in How-it-works, defer training-process notes

4023371

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: consolidate Results section — single master table + per-category + examples

31ce013

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: add multi-step variant section to README + BLOG_POST

e1e3cbe

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

Update BLOG_POST.md

e1f0c20
verified

rishabh16196 committed 12 days ago

demo(new-tab): expose the raw chat-templated string sent to target

da41c85

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

demo(new-tab): also run target with verbose description

e8bf76c

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

demo: add 'Try a new task' tab

82e3e94

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: drop 37× MSN policy compression callout

5f71cca

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: drop policy-tasks-as-headline-workload framing

86d20a9

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

docs: add demo screenshots for blog post

8d3be14

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: integrate user revisions

70ae05c

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: drop 37× compression claims

3f46c24

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

BLOG_POST: full rewrite — research framing, 10 sections, citations, image placeholders

4ea12d8

Don Rishabh committed 12 days ago

build_before_after_csv: --min-verbose-accuracy flag

ea78734

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

demo: filter tasks dead on target (verbose=0 AND trained=0)

86be5e0

Don Rishabh committed 12 days ago

README: add Scorers section (21 scorers grouped by family)

433bfad

Don Rishabh committed 12 days ago

docs: drop the misleading 37× compression anecdote (0-accuracy task)

9867aa7

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

README + BLOG: drop multi-step + Llama-self mentions (in-progress runs)

c3e14ba

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

Add BLOG_POST.md (HF blog draft, hackathon framing)

f712ee4

Don Rishabh committed 12 days ago

remove untested Colab notebook + link training/ folder in README

a56bede

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

trackio: post-hoc replay of train_metrics.jsonl into a HF Space dashboard

3724e90

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

demo CSVs: add reward_advantage_vs_verbose + accuracy_delta_vs_verbose

7dafc94

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

README: rewrite for hackathon submission — links-first, plots inline, kill verbose sections

a1b7a09

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

demo: sample test input dropdown (per-task examples in CSV)

bdd9948

Don Rishabh committed 12 days ago

demo: apply chat template to target (fix rambling completion-mode outputs)

7d8d47c

Don Rishabh committed 12 days ago

multistep: gradient checkpointing + tighter memory defaults

7ca042f

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

space-demo: bump to gradio 5.x; drop jinja2/hub pins

89ed87f

Don Rishabh committed 12 days ago

space-demo: pin Python 3.11 + exact gradio + jinja2<3.1.5 (defuse 3.13 fallout)

837c5e2

Don Rishabh committed 12 days ago

space-demo: pin huggingface_hub<1.0 + cap transformers/gradio majors

805f4c4

Don Rishabh committed 12 days ago

space-demo: add audioop-lts for Python 3.13 (pydub fix)

971bbcb

Don Rishabh committed 12 days ago

space-demo: fix short_description length (HF Spaces 60-char cap)

c968b24

Don Rishabh committed 12 days ago

space-demo: bundle for HF Spaces Gradio demo

cc1bf10

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

inference.py: enumerate all task banks, drop stale TASK_NAMES gate

34b5069

Don Rishabh and Claude Opus 4.7 (1M context) committed 12 days ago

ui: local Gradio demo app — verbose / untrained / trained side-by-side

1aee0c3

Don Rishabh and Claude Opus 4.7 (1M context) committed 13 days ago

tasks_policy: long-context policy-compression tasks

e8ef5c3

Don Rishabh and Claude Opus 4.7 (1M context) committed 13 days ago

notebooks: minimal Colab training demo

7eae9f5

Don Rishabh and Claude Opus 4.7 (1M context) committed 13 days ago
