---
title: LandscapeForge
emoji: 🏔️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: true
app_port: 8000
tags:
- openenv
- reinforcement-learning
- optimization
- llm-agents
- self-improvement
- gradio
license: apache-2.0
short_description: LLM agent designs optimizers via a probe-draft-commit REPL.
---
# 🏔️ LandscapeForge
An OpenEnv where an LLM agent designs optimization algorithms through a probe-draft-commit REPL, trained against a Goldilocks-regulated landscape adversary.
Target: OpenEnv Hackathon, April 2026. Theme 4 (Self-Improvement), secondary Theme 1 (Multi-Agent).
## What this Space gives you
Two things, in one container:
| Path | What it is |
|---|---|
| `/web` | Interactive Gradio demo: landscape explorer + baseline race + paste-your-own-optimizer arena. Visual-first, meant to make the env legible to judges. |
| `/reset`, `/step`, `/schema`, WebSocket | OpenEnv FastAPI endpoints: wire the env into a TRL / Unsloth GRPO training loop. |

Same process, no second container required.
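If you would rather hit the API endpoints directly than use the Python client, here is a minimal sketch with `requests`. The payload shapes below are assumptions; treat whatever `GET /schema` returns as the authoritative contract:

```python
import requests

BASE = "http://localhost:8000"  # or your Space URL

# Inspect the action/observation schema first -- it is authoritative.
print(requests.get(f"{BASE}/schema").json())

# Start an episode, then take one REPL action. The JSON body mirrors
# LandscapeforgeAction from the quick-start below, but the exact field
# names and wrapping are an assumption.
obs = requests.post(f"{BASE}/reset", json={}).json()
step = requests.post(
    f"{BASE}/step",
    json={"kind": "run_baseline", "baseline_name": "adam"},
).json()
```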
## How the env works (in 90 seconds)
OptCoder is the LLM policy. Each episode:
- **LandscapeForge** (an internal template picker in v1) chooses a loss landscape f: ℝⁿ → ℝ at a tier-appropriate difficulty: quadratic / Rosenbrock / Styblinski-Tang / Gaussian-mix / Himmelblau / plateau / cliff.
- **OptCoder** runs a 4-action REPL with a budget of 12 units:
  - `run_baseline(name)`: run SGD / Momentum / Adam / L-BFGS on the hidden landscape and see their trajectory (cost: 2)
  - `draft(code)`: submit a full `Optimizer` class; the env auto-tests it for 20 steps (cost: 2)
  - `inspect(draft_idx, step_range)`: zoom into a prior draft's per-step `(x, f, grad, update_norm, step_size_eff)` to diagnose failures (cost: 1)
  - `commit`: evaluate the latest draft on the full Phase-D arena: 10 fresh seeds × 200 steps (cost: 0)
- Reward (terminal only; stepwise is feedback-only; see the composition sketch after this list):
  - `r_regret`: Adam-relative progress (tuned Adam LR per landscape; no `f_min` dependency, so it generalises directly to NN training)
  - `r_convergence`, `r_robustness`, `r_novelty` (gated), minus `r_budget`, minus `r_eval_failures`
- GRPO can then train the policy; arena wall-clock is ~50 ms so ~36k episodes/hour on one H100.
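As a sketch of how that breakdown plausibly composes into the terminal scalar (the real weights and gating live in `rewards.py`; the numbers below are placeholders, not the repo's constants):

```python
def terminal_reward(b: dict) -> float:
    """Collapse a commit-time breakdown (see r_optcoder_breakdown below) into a scalar.

    Weights are illustrative placeholders, NOT the constants in rewards.py.
    """
    # Gated novelty: only pay for novelty when there is real progress.
    novelty = b["r_novelty"] if b["r_regret"] > 0 else 0.0
    return (
        1.0 * b["r_regret"]          # Adam-relative progress, the main signal
        + 0.3 * b["r_convergence"]
        + 0.3 * b["r_robustness"]
        + 0.2 * novelty
        - b["r_budget"]              # REPL units spent
        - b["r_eval_failures"]       # drafts that crashed or NaN'd in the arena
    )
```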
See `IMPLEMENTATION.md` and `LANDSCAPEFORGE_DESIGN.md` in this repo for the full spec, the staged bootstrap (SFT → solo RL → adversarial unfreezing), and the anti-reward-hacking table.
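For the training side, here is a rough sketch of wiring the env's terminal reward into TRL's `GRPOTrainer`. The glue helper `run_completion_episode` is hypothetical (the repo does not ship it under that name); TRL reward functions receive the sampled completions and return one float per completion:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def env_reward(completions, **kwargs):
    # Hypothetical glue: replay each sampled completion as REPL actions
    # against a fresh LandscapeforgeEnv episode; return the terminal reward.
    from landscapeforge import run_completion_episode  # assumed helper, not in the repo listing
    return [run_completion_episode(c) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=env_reward,
    args=GRPOConfig(output_dir="grpo-optcoder", num_generations=8),
    train_dataset=Dataset.from_dict({"prompt": ["<episode system prompt>"] * 1024}),
)
trainer.train()
```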
## Quick-start (Python client)
```python
from landscapeforge import LandscapeforgeEnv, LandscapeforgeAction

with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
    obs = env.reset()
    env.step(LandscapeforgeAction(kind="run_baseline", baseline_name="adam"))
    env.step(LandscapeforgeAction(kind="draft", code="""
class Optimizer:
    def __init__(self, dim):
        self.lr = 0.05; self.beta = 0.9
        self.v = np.zeros(dim)
    def step(self, x, f_val, grad):
        self.v = self.beta * self.v - self.lr * grad
        return x + self.v
"""))
    result = env.step(LandscapeforgeAction(kind="commit"))
    print(result.observation.r_optcoder_breakdown)
    # {'r_regret': ..., 'r_convergence': ..., 'r_robustness': ...,
    #  'r_novelty': ..., 'r_budget': ..., 'r_eval_failures': ...,
    #  'my_progress': ..., 'adam_progress': ..., 'speedup_vs_adam': ...}
```
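The sample draft above is plain heavy-ball momentum. Anything satisfying the same two-method contract (`__init__(self, dim)` plus `step(self, x, f_val, grad)` returning the next iterate) is fair game; for instance, an Adam-style sketch with illustrative, untuned hyperparameters:

```python
import numpy as np

class Optimizer:
    """Adam-style draft against the same __init__(dim) / step(x, f_val, grad) contract."""
    def __init__(self, dim):
        self.lr, self.b1, self.b2, self.eps = 0.05, 0.9, 0.999, 1e-8
        self.m = np.zeros(dim)  # first-moment (mean) estimate
        self.v = np.zeros(dim)  # second-moment (uncentered variance) estimate
        self.t = 0
    def step(self, x, f_val, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)  # bias-corrected moments
        v_hat = self.v / (1 - self.b2 ** self.t)
        return x - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Note that a near-clone of Adam will score little `r_novelty` against the tuned-Adam baseline; the point here is the interface, not the algorithm.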
## Quick-start (drive with any OpenAI-compat LLM)

The repo ships `run_llm_episode.py`, which drives one episode against any `/v1/chat/completions` endpoint (HuggingFace router, Ollama, vLLM, …):
```bash
# Ollama local
API_BASE_URL=http://localhost:11434/v1 MODEL_NAME=qwen2.5:3b \
  python -m landscapeforge.run_llm_episode

# HuggingFace router
HF_TOKEN=hf_xxx MODEL_NAME=Qwen/Qwen2.5-7B-Instruct \
  python -m landscapeforge.run_llm_episode
```

Full turn transcripts (prompt, raw reply, parsed action, env feedback, reward breakdown) are written to `episode_logs/*.jsonl` + `*.md`.
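The core of such a driver is a short observe-prompt-act loop. Here is a minimal sketch with the `openai` client; the helper names `obs_to_prompt` and `response_to_action` are assumptions about what `prompts.py` exports:

```python
import os
from openai import OpenAI
from landscapeforge import LandscapeforgeEnv
from landscapeforge.prompts import obs_to_prompt, response_to_action  # assumed exports

client = OpenAI(base_url=os.environ["API_BASE_URL"],
                api_key=os.environ.get("HF_TOKEN", "none"))

with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
    obs = env.reset()
    for _ in range(16):  # generous cap; the 12-unit budget ends episodes sooner
        reply = client.chat.completions.create(
            model=os.environ["MODEL_NAME"],
            messages=[{"role": "user", "content": obs_to_prompt(obs)}],
        )
        result = env.step(response_to_action(reply.choices[0].message.content))
        obs = result.observation
        if getattr(result, "done", False):  # commit (or exhausted budget) terminates
            break
```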
## What to click first in `/web`

- **Baseline Race** tab → pick Rosenbrock → hit "🏁 Race!" to see how default-SGD, default-Momentum, tuned-Adam, and crude-L-BFGS actually perform on the classic stiff valley.
- **Optimizer Arena** tab → keep the sample SGD+Momentum optimizer, hit "⚔️ Run arena" to see the reward breakdown vs tuned Adam.
- **Landscape Explorer** tab → browse the 9 template families with contour plots + structural hints.
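Rosenbrock is the canonical race track for a reason: a narrow, curved valley with a nearly flat floor, so raw gradient steps ricochet between the walls. For intuition, the classic form and its gradient (the builder in `landscapes.py` may parameterize it differently):

```python
import numpy as np

def rosenbrock(p, a=1.0, b=100.0):
    """f(x, y) = (a - x)^2 + b * (y - x^2)^2; global minimum at (a, a^2)."""
    x, y = p
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(p, a=1.0, b=100.0):
    x, y = p
    return np.array([
        -2 * (a - x) - 4 * b * x * (y - x ** 2),  # df/dx
        2 * b * (y - x ** 2),                     # df/dy
    ])
```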
## Repo structure

```
landscapeforge/
├── LANDSCAPEFORGE_DESIGN.md          # Full design doc (v0.2)
├── IMPLEMENTATION.md                 # What's in the code today + constants
├── models.py                         # Action + Observation (pydantic)
├── landscapes.py                     # 9 analytic template builders with gradients
├── reference_optimizers.py           # SGD / Momentum / Adam / L-BFGS + LR tuner
├── sandbox.py                        # AST-strip + restricted exec + timeout
├── arena.py                          # Phase-D runner + auto_test_draft
├── rewards.py                        # Terminal reward + stepwise feedback
├── prompts.py                        # obs → prompt / response → action
├── run_llm_episode.py                # LLM-in-the-loop runner (OpenAI-compat)
├── server/
│   ├── app.py                        # FastAPI + mounted Gradio at /web
│   └── landscapeforge_environment.py # OpenEnv Environment class
├── demo/ui.py                        # Gradio UI source
├── tests/test_episode.py             # Scripted end-to-end tests
└── episode_logs/                     # Per-episode JSONL + Markdown transcripts
```
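Since drafts are untrusted LLM-generated code, `sandbox.py` is the safety-critical piece. Here is a rough illustration of the AST-strip + restricted-exec + timeout pattern it names; the repo's real checks are certainly stricter, so treat this as a sketch rather than the actual implementation:

```python
import ast
import signal
import numpy as np

ALLOWED_BUILTINS = {"abs": abs, "min": min, "max": max, "range": range, "len": len}

def load_draft(code: str, timeout_s: int = 2):
    """Parse a draft, reject dangerous constructs, exec it in a bare namespace."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are stripped from drafts")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder access is forbidden")

    namespace = {"__builtins__": ALLOWED_BUILTINS, "np": np}  # np injected, never imported

    def _alarm(signum, frame):
        raise TimeoutError("draft exceeded its time budget")

    signal.signal(signal.SIGALRM, _alarm)  # POSIX-only wall-clock guard
    signal.alarm(timeout_s)
    try:
        exec(compile(tree, "<draft>", "exec"), namespace)
    finally:
        signal.alarm(0)
    return namespace["Optimizer"]
```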
## Research anchors
LandscapeForge sits at the intersection of five established research threads:
- **Thread 1, LLMs as optimizer designers:** Lion (NeurIPS 2023), FunSearch (Nature 2024)
- **Thread 2, adversarial / co-evolutionary LLM-env loops:** Coevolve, GenEnv (ICLR 2026)
- **Thread 3, iterative code refinement:** Self-Refine
- **Thread 4, GRPO with measurable rewards:** the HPC GFLOPS-reward paper
- **Thread 5, analytical landscape benchmarks:** BBOB/COCO, POET

Every ingredient has prior work; the combination (LLM-generated optimizers + LLM-picked landscapes + iterative REPL + GRPO on Adam-relative progress) is novel.
## License
Apache-2.0.