# OpenEnv Hackathon Readout


Date: 2026-04-24


## What The Hackathon Wants


The winning submission should be an OpenEnv-compliant environment where an LLM acts step by step, receives programmatic feedback, and measurably improves through RL or RL-style training.
|
|
The judging criteria are weighted as follows:
|
|
| Criterion | Weight | Practical meaning |
|---|---:|---|
| Environment innovation | 40% | Novel, challenging, meaningful agent behavior, not a clone of common games or toy tasks. |
| Storytelling | 30% | A judge should understand the world, the agent, what it learned, and why it matters in 3 to 5 minutes. |
| Showing improvement | 20% | Reward curves, before/after runs, baseline comparison, actual training evidence. |
| Reward/training pipeline | 10% | Coherent rubrics, TRL or Unsloth script, reproducible pipeline. |
|
|
Minimum gates:


- Use the latest OpenEnv release.
- A hosted Hugging Face Space.
- OpenEnv-compliant `reset`, `step`, and `state`, typed action/observation models, and an `openenv.yaml` (see the sketch after this list).
- A training script using Unsloth or HF TRL, ideally runnable in Colab.
- Evidence of real training, including reward/loss plots.
- A README covering the problem, environment, actions, observations, tasks, setup, and results.
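
For the compliance gate, here is a minimal sketch of the expected `reset`/`step`/`state` shape, assuming a document-editing task. The class and field names are illustrative stand-ins, not the actual OpenEnv base types; a real submission would subclass the current OpenEnv interfaces and declare the environment in `openenv.yaml`.

```python
from dataclasses import dataclass, field


# Illustrative typed models. A real submission would extend the current
# OpenEnv Action/Observation/State base types rather than plain dataclasses.
@dataclass
class EditAction:
    command: str  # hypothetical, e.g. "replace_span"
    payload: dict = field(default_factory=dict)


@dataclass
class DocObservation:
    document: str
    verifier_feedback: str
    reward: float
    done: bool


@dataclass
class DocState:
    task_id: str
    step_count: int


class DocEditEnv:
    """Sketch of the reset/step/state contract the gate asks for."""

    def __init__(self, tasks: list[str], max_steps: int = 300):
        self.tasks = tasks
        self.max_steps = max_steps  # long horizons, up to 300 actions
        self._state = DocState(task_id="", step_count=0)

    def reset(self) -> DocObservation:
        self._state = DocState(task_id=self.tasks[0], step_count=0)
        return DocObservation(document=self.tasks[0], verifier_feedback="",
                              reward=0.0, done=False)

    def step(self, action: EditAction) -> DocObservation:
        self._state.step_count += 1
        done = self._state.step_count >= self.max_steps
        # Apply the edit and run the verifier here (elided in this sketch).
        return DocObservation(document="", verifier_feedback="ok",
                              reward=0.0, done=done)

    @property
    def state(self) -> DocState:
        return self._state
```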
|
|
## Strategic Lessons From The Docs
|
|
1. Pick a task where success can be verified programmatically.
2. Make the environment ambitious, but keep the first curriculum levels easy enough that an untrained policy earns non-zero reward.
3. Use multiple reward signals, not one monolithic score (see the sketch after this list).
4. Build the environment and verifier before training.
5. Show a before/after behavior difference, not only a training script.
6. Avoid a static benchmark. An adaptive curriculum or self-play reads as far more ambitious.
7. The story matters almost as much as the engineering.
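
A sketch of lesson 3, wired into the TRL gate above. The signal names and weights are assumptions for illustration: each signal is scored separately so it can be plotted on its own reward curve, then combined into the single scalar per completion that TRL's `GRPOTrainer` reward-function convention expects. The verifier checks and toy dataset are placeholders.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical signal names and weights; each signal gets its own curve.
WEIGHTS = {"syntax_valid": 0.2, "target_match": 0.5, "diff_minimal": 0.3}


def score_signals(completion: str, target: str) -> dict[str, float]:
    # Placeholder checks; a real verifier would parse and diff the documents.
    return {
        "syntax_valid": 1.0 if completion.strip() else 0.0,
        "target_match": 1.0 if completion.strip() == target.strip() else 0.0,
        "diff_minimal": max(0.0, 1.0 - abs(len(completion) - len(target)) / 100.0),
    }


def combined_reward(completions, target, **kwargs):
    """TRL-style reward function: returns one float per completion."""
    return [
        sum(WEIGHTS[name] * value for name, value in score_signals(c, t).items())
        for c, t in zip(completions, target)
    ]


# Toy dataset: extra columns ("target") are forwarded to the reward function.
dataset = Dataset.from_dict({
    "prompt": ["Fix the broken heading levels in: ## Title / # Subtitle"],
    "target": ["# Title / ## Subtitle"],
})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=combined_reward,
    args=GRPOConfig(output_dir="grpo-docedit"),
    train_dataset=dataset,
)
trainer.train()
```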
|
|
## Lessons From The Prior DocEdit Work
|
|
The old DocEdit environment succeeded because it was:
|
|
- Real-world, not a game.
- OpenEnv compliant.
- Lightweight enough for the constraints.
- Deterministically graded.
- Easy to explain.
|
|
The later Qwen SFT + GRPO postmortem proved that document repair can improve with training, but it also exposed a strategic limitation: full-document rewrite policies are probably not the best final design. A stronger next step is a planner/executor setup with structured edit actions and verifier feedback.
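
To make "structured edit actions" concrete, here is one possible action vocabulary for the planner/executor split; all names are hypothetical. The design point is that the executor emits small, individually verifiable operations instead of rewriting the whole document.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical structured edit vocabulary: each action targets a small span,
# so the verifier can grade every step instead of only the final document.
@dataclass
class ReplaceLines:
    start: int
    end: int
    text: str


@dataclass
class InsertAfter:
    line: int
    text: str


@dataclass
class DeleteLines:
    start: int
    end: int


Edit = Union[ReplaceLines, InsertAfter, DeleteLines]


def apply_edit(lines: list[str], edit: Edit) -> list[str]:
    """Executor: apply one structured edit to the document (1-indexed lines)."""
    if isinstance(edit, ReplaceLines):
        return lines[: edit.start - 1] + edit.text.splitlines() + lines[edit.end :]
    if isinstance(edit, InsertAfter):
        return lines[: edit.line] + edit.text.splitlines() + lines[edit.line :]
    return lines[: edit.start - 1] + lines[edit.end :]
```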
|
|
## Lessons From The Winning Kube SRE Example


The winning pattern was not just "Kubernetes environment." It was:


- A vivid professional world: a tiny model learns to be on-call.
- Real or realistic tools.
- Multi-step investigation and repair.
- Adaptive curriculum.
- Adversarial scenario generation.
- Multi-layer rewards.
- A story where the agent and environment co-evolve.


The key insight to borrow:


> The environment should fight back as the agent improves.
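
One minimal way to implement this, assuming a leveled scenario generator: a controller that raises the difficulty level whenever the agent's rolling success rate crosses a threshold, and lowers it when the agent stalls. The window size and thresholds here are illustrative, not a prescribed design.

```python
from collections import deque


class AdaptiveDifficulty:
    """Raise scenario difficulty when the agent succeeds too often,
    lower it when the agent stalls, keeping reward non-degenerate."""

    def __init__(self, window: int = 50, raise_at: float = 0.8,
                 lower_at: float = 0.2):
        self.results: deque = deque(maxlen=window)
        self.raise_at = raise_at
        self.lower_at = lower_at
        self.level = 1

    def record(self, success: bool) -> int:
        self.results.append(success)
        if len(self.results) == self.results.maxlen:
            rate = sum(self.results) / len(self.results)
            if rate >= self.raise_at:
                self.level += 1       # the environment fights back
                self.results.clear()
            elif rate <= self.lower_at and self.level > 1:
                self.level -= 1       # back off so reward stays non-zero
                self.results.clear()
        return self.level
```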
|
|
## Our Target Shape


To maximize win probability, the idea should combine:


- Theme 2: long-horizon planning, ideally up to 300 actions.
- Theme 3.1: professional world modeling with realistic tools and persistent state.
- Theme 4: self-improvement through adaptive scenario generation.
- Existing leverage from DocEdit so we can build fast.


The strongest direction is therefore not "another document editor." It is a long-horizon professional control room where document edits are one part of a larger verified workflow.
|
|
|
|