---
title: LandscapeForge
emoji: 🏞️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: true
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - optimization
  - llm-agents
  - self-improvement
  - gradio
license: apache-2.0
short_description: LLM agent designs optimizers via a probe-draft-commit REPL.
---
# 🏞️ LandscapeForge

**An OpenEnv where an LLM agent designs optimization algorithms through a probe-draft-commit REPL, trained against a Goldilocks-regulated landscape adversary.**

Target: **OpenEnv Hackathon, April 2026**. Primary fit: Theme 4 (Self-Improvement); secondary: Theme 1 (Multi-Agent).

---
## What this Space gives you

Two things, in one container:

| Path | What it is |
|---|---|
| **`/web`** | **Interactive Gradio demo** – landscape explorer + baseline race + paste-your-own-optimizer arena. Visual-first, meant to make the env legible to judges. |
| **`/reset`, `/step`, `/schema`, WebSocket** | **OpenEnv FastAPI endpoints** – wire the env into a TRL / Unsloth GRPO training loop. |

Same process, no second container required.
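If you want to hit the FastAPI side directly, here is a minimal sketch over plain HTTP. It assumes the standard OpenEnv request shapes and uses an illustrative action payload; `/schema` is the source of truth for the real field names:

```python
import requests

BASE = "http://localhost:8000"  # app_port from the frontmatter above

print(requests.get(f"{BASE}/schema").json())  # action/observation schema
obs = requests.post(f"{BASE}/reset").json()   # start a fresh episode
step = requests.post(f"{BASE}/step", json={
    "action": {"kind": "run_baseline", "baseline_name": "adam"}  # illustrative
}).json()
```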
| --- | |
## How the env works (in 90 seconds)

**OptCoder** is the LLM policy. Each episode:

1. **LandscapeForge** (an internal template picker in v1) chooses a loss landscape `f: ℝⁿ → ℝ` at a tier-appropriate difficulty: quadratic / Rosenbrock / Styblinski-Tang / Gaussian-mix / Himmelblau / plateau / cliff.
2. **OptCoder runs a 4-action REPL** with a budget of 12 units:
   - `run_baseline(name)` – run SGD / Momentum / Adam / L-BFGS on the hidden landscape and see its trajectory (cost: 2)
   - `draft(code)` – submit a full `Optimizer` class; the env auto-tests it for 20 steps (cost: 2)
   - `inspect(draft_idx, step_range)` – zoom into a prior draft's per-step `(x, f, grad, update_norm, step_size_eff)` to diagnose failures (cost: 1)
   - `commit` – evaluate the latest draft on the full **Phase-D arena**: 10 fresh seeds × 200 steps (cost: 0)
3. **Reward** (terminal only; stepwise signals are feedback, not reward) – see the sketch after this list:
   - `r_regret` – **Adam-relative progress** (Adam's LR is tuned per landscape; no `f_min` dependency, so the signal generalises directly to NN training)
   - `r_convergence`, `r_robustness`, `r_novelty` (gated), minus `r_budget`, minus `r_eval_failures`
4. **GRPO** can then train the policy; arena wall-clock is ~50 ms, giving on the order of 36k episodes/hour on one H100.
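For concreteness, a sketch of the headline `r_regret` term, reconstructed from the description above and the `my_progress` / `adam_progress` keys in the quick-start below. The real formula lives in `rewards.py`; the normalisation and the clip bound here are assumptions:

```python
import numpy as np

def r_regret(f0: float, f_final_mine: float, f_final_adam: float,
             eps: float = 1e-12) -> float:
    """Adam-relative progress, sketched.

    'Progress' = fraction of the initial loss an optimizer removed.
    Scoring against tuned Adam instead of f_min means the same
    signal transfers to settings where the minimum is unknown
    (e.g. NN training).
    """
    my_progress = (f0 - f_final_mine) / (abs(f0) + eps)
    adam_progress = (f0 - f_final_adam) / (abs(f0) + eps)
    # Ratio > 1 means the draft out-progressed tuned Adam; clipping
    # keeps one lucky seed from dominating (bound assumed here).
    return float(np.clip(my_progress / max(adam_progress, eps), 0.0, 2.0))
```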
See `IMPLEMENTATION.md` and `LANDSCAPEFORGE_DESIGN.md` in this repo for the full spec, the staged bootstrap (SFT → solo RL → adversarial unfreezing), and the anti-reward-hacking table.

---

## Quick-start (Python client)
```python
from landscapeforge import LandscapeforgeEnv, LandscapeforgeAction

with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
    obs = env.reset()

    # Probe: watch tuned Adam on the hidden landscape (cost: 2)
    env.step(LandscapeforgeAction(kind="run_baseline", baseline_name="adam"))

    # Draft: SGD with momentum; the sandbox provides `np`, so no import (cost: 2)
    env.step(LandscapeforgeAction(kind="draft", code="""
class Optimizer:
    def __init__(self, dim):
        self.lr = 0.05; self.beta = 0.9
        self.v = np.zeros(dim)

    def step(self, x, f_val, grad):
        self.v = self.beta * self.v - self.lr * grad
        return x + self.v
"""))

    # Commit: evaluate the latest draft on the Phase-D arena (cost: 0)
    result = env.step(LandscapeforgeAction(kind="commit"))
    print(result.observation.r_optcoder_breakdown)
    # {'r_regret': ..., 'r_convergence': ..., 'r_robustness': ...,
    #  'r_novelty': ..., 'r_budget': ..., 'r_eval_failures': ...,
    #  'my_progress': ..., 'adam_progress': ..., 'speedup_vs_adam': ...}
```
## Quick-start (drive with any OpenAI-compat LLM)

The repo ships `run_llm_episode.py`, which drives one episode against any `/v1/chat/completions` endpoint (HuggingFace router, Ollama, vLLM, …):

```bash
# Ollama, local
API_BASE_URL=http://localhost:11434/v1 MODEL_NAME=qwen2.5:3b \
  python -m landscapeforge.run_llm_episode

# HuggingFace router
HF_TOKEN=hf_xxx MODEL_NAME=Qwen/Qwen2.5-7B-Instruct \
  python -m landscapeforge.run_llm_episode
```

Full turn transcripts (prompt, raw reply, parsed action, env feedback, reward breakdown) are written to `episode_logs/*.jsonl` + `*.md`.
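To skim a transcript programmatically, something like the following works. This is a sketch: the per-turn key names `parsed_action` and `reward_breakdown` are assumptions, so check one of the `.md` twins for the actual field layout:

```python
import glob
import json

# Load the most recent episode transcript (one JSON object per turn).
latest = sorted(glob.glob("episode_logs/*.jsonl"))[-1]
with open(latest) as f:
    turns = [json.loads(line) for line in f]

# Key names below are illustrative, not guaranteed by the env.
for turn in turns:
    print(turn.get("parsed_action"), turn.get("reward_breakdown"))
```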
| --- | |
| ## What to click first in `/web` | |
| 1. **Baseline Race** tab β pick Rosenbrock β hit "π Race!" to see how default-SGD, default-Momentum, **tuned-Adam**, and crude-L-BFGS actually perform on the classic stiff valley. | |
| 2. **Optimizer Arena** tab β keep the sample SGD+Momentum optimizer, hit "βοΈ Run arena" to see the reward breakdown vs tuned Adam. | |
| 3. **Landscape Explorer** tab β browse the 9 template families with contour plots + structural hints. | |
| --- | |
## Repo structure

```
landscapeforge/
├── LANDSCAPEFORGE_DESIGN.md   # Full design doc (v0.2)
├── IMPLEMENTATION.md          # What's in the code today + constants
├── models.py                  # Action + Observation (pydantic)
├── landscapes.py              # 9 analytic template builders with gradients
├── reference_optimizers.py    # SGD / Momentum / Adam / L-BFGS + LR tuner
├── sandbox.py                 # AST-strip + restricted exec + timeout
├── arena.py                   # Phase-D runner + auto_test_draft
├── rewards.py                 # Terminal reward + stepwise feedback
├── prompts.py                 # obs → prompt / response → action
├── run_llm_episode.py         # LLM-in-the-loop runner (OpenAI-compat)
├── server/
│   ├── app.py                 # FastAPI + mounted Gradio at /web
│   └── landscapeforge_environment.py  # OpenEnv Environment class
├── demo/ui.py                 # Gradio UI source
├── tests/test_episode.py      # Scripted end-to-end tests
└── episode_logs/              # Per-episode JSONL + Markdown transcripts
```
| --- | |
| ## Research anchors | |
| LandscapeForge sits at the intersection of five established research threads: | |
| - **Thread 1** β LLMs as optimizer designers: [Lion (NeurIPS 2023)](https://arxiv.org/abs/2302.06675), [FunSearch (Nature 2024)](https://www.nature.com/articles/s41586-023-06924-6) | |
| - **Thread 2** β Adversarial / co-evolutionary LLM-env: Coevolve, [GenEnv (ICLR 2026)](https://arxiv.org/html/2512.19682v1) | |
| - **Thread 3** β Iterative code refinement: [Self-Refine](https://arxiv.org/abs/2303.17651) | |
| - **Thread 4** β GRPO with measurable rewards: [HPC GFLOPS reward paper](https://arxiv.org/abs/2602.12049v1) | |
| - **Thread 5** β Analytical landscape benchmarks: [BBOB/COCO](https://inria.hal.science/hal-00362649/document), [POET](https://arxiv.org/abs/1901.01753) | |
| Every ingredient has prior work; the combination β LLM-generated optimizers + LLM-picked landscapes + iterative REPL + GRPO on Adam-relative progress β is novel. | |
| --- | |
| ## License | |
| Apache-2.0. | |