---
title: LandscapeForge
emoji: 🏔️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: true
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - optimization
  - llm-agents
  - self-improvement
  - gradio
license: apache-2.0
short_description: LLM agent designs optimizers via a probe-draft-commit REPL.
---

πŸ”οΈ LandscapeForge

An OpenEnv where an LLM agent designs optimization algorithms through a probe-draft-commit REPL, trained against a Goldilocks-regulated landscape adversary.

Target: OpenEnv Hackathon, April 2026; primary Theme 4 (Self-Improvement), secondary Theme 1 (Multi-Agent).


## What this Space gives you

Two things, in one container:

| Path | What it is |
|------|------------|
| `/web` | Interactive Gradio demo: landscape explorer + baseline race + paste-your-own-optimizer arena. Visual-first, meant to make the env legible to judges. |
| `/reset`, `/step`, `/schema`, WebSocket | OpenEnv FastAPI endpoints: wire the env into a TRL / Unsloth GRPO training loop. |

Same process, no second container required.
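
The HTTP side needs nothing beyond plain JSON calls. Here is a minimal sketch, assuming the action payload mirrors the Python client's fields (`kind`, `baseline_name`) shown further down; the exact wire format is defined by the server, so query `/schema` rather than trusting this snippet:

```python
# Minimal sketch of driving the raw OpenEnv endpoints over HTTP.
# Payload field names here are assumptions, not a confirmed wire format.
import requests

BASE = "http://localhost:8000"  # app_port from the Space metadata

obs = requests.post(f"{BASE}/reset", json={}).json()
print(requests.get(f"{BASE}/schema").json())  # assuming GET; discover the real action schema

step = requests.post(
    f"{BASE}/step",
    json={"action": {"kind": "run_baseline", "baseline_name": "adam"}},
).json()
print(step)
```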


## How the env works (in 90 seconds)

OptCoder is the LLM policy. Each episode:

1. LandscapeForge (an internal template picker in v1) chooses a loss landscape f: ℝⁿ → ℝ at a tier-appropriate difficulty: quadratic / Rosenbrock / Styblinski-Tang / Gaussian-mix / Himmelblau / plateau / cliff.
2. OptCoder runs a 4-action REPL with a budget of 12 units:
   - `run_baseline(name)`: run one of SGD / Momentum / Adam / L-BFGS on the hidden landscape and see its trajectory (cost: 2)
   - `draft(code)`: submit a full `Optimizer` class; the env auto-tests it for 20 steps (cost: 2)
   - `inspect(draft_idx, step_range)`: zoom into a prior draft's per-step `(x, f, grad, update_norm, step_size_eff)` to diagnose failures (cost: 1)
   - `commit`: evaluate the latest draft on the full Phase-D arena: 10 fresh seeds × 200 steps (cost: 0)
3. Reward (terminal only; stepwise is feedback-only; a sketch of the regret term follows this list):
   - `r_regret`: Adam-relative progress (tuned Adam LR per landscape; no `f_min` dependency, generalises directly to NN training)
   - `r_convergence`, `r_robustness`, `r_novelty` (gated), minus `r_budget`, minus `r_eval_failures`
4. GRPO can then train the policy; arena wall-clock is ~50 ms, so ~36k episodes/hour on one H100.
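
For concreteness, a minimal sketch of the shape the Adam-relative regret term could take, assuming "progress" means the drop from the shared starting loss; the real `rewards.py` may clip, log-scale, or aggregate over the 10 seeds differently:

```python
import numpy as np

def r_regret_sketch(f_mine: np.ndarray, f_adam: np.ndarray) -> float:
    """Illustrative Adam-relative progress (not the repo's exact formula).

    Both arrays are per-step loss trajectories from the same start point,
    so no knowledge of the landscape's true minimum (f_min) is needed.
    """
    my_progress = f_mine[0] - f_mine.min()
    adam_progress = f_adam[0] - f_adam.min()
    return float(my_progress / max(adam_progress, 1e-12))
```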

See `IMPLEMENTATION.md` and `LANDSCAPEFORGE_DESIGN.md` in this repo for the full spec, staged bootstrap (SFT → solo RL → adversarial unfreezing), and the anti-reward-hacking table.


## Quick-start (Python client)

```python
from landscapeforge import LandscapeforgeEnv, LandscapeforgeAction

with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
    obs = env.reset()
    # Probe: watch tuned Adam's trajectory on the hidden landscape (cost: 2)
    env.step(LandscapeforgeAction(kind="run_baseline", baseline_name="adam"))
    # Draft: np is provided inside the env's sandbox, so drafts need no imports
    env.step(LandscapeforgeAction(kind="draft", code="""
class Optimizer:
    def __init__(self, dim):
        self.lr = 0.05; self.beta = 0.9
        self.v = np.zeros(dim)
    def step(self, x, f_val, grad):
        self.v = self.beta * self.v - self.lr * grad
        return x + self.v
"""))
    # Commit: evaluate the latest draft on the Phase-D arena (cost: 0)
    result = env.step(LandscapeforgeAction(kind="commit"))
    print(result.observation.r_optcoder_breakdown)
    # {'r_regret': ..., 'r_convergence': ..., 'r_robustness': ...,
    #  'r_novelty': ..., 'r_budget': ..., 'r_eval_failures': ...,
    #  'my_progress': ..., 'adam_progress': ..., 'speedup_vs_adam': ...}
```
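
Submitted draft code runs inside the restricted sandbox in `sandbox.py` (AST-strip + restricted exec + timeout). Below is a minimal sketch of that idea; the function name, the exact screening policy, and the injected globals are assumptions, not the repo's real API:

```python
import ast
import builtins
import numpy as np

def load_draft(code: str, dim: int):
    """Parse, screen, and instantiate a drafted Optimizer (illustrative only)."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Assumption: drafts may not import; np is injected instead.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in draft code")
    # Cut-down globals: np plus the bare minimum needed to define a class.
    namespace = {
        "np": np,
        "__name__": "draft",
        "__builtins__": {"__build_class__": builtins.__build_class__},
    }
    exec(compile(tree, "<draft>", "exec"), namespace)  # the real code adds a timeout
    return namespace["Optimizer"](dim)
```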

## Quick-start (drive with any OpenAI-compat LLM)

The repo ships `run_llm_episode.py`, which drives one episode against any `/v1/chat/completions` endpoint (HuggingFace router, Ollama, vLLM, …):

```bash
# Ollama local
API_BASE_URL=http://localhost:11434/v1 MODEL_NAME=qwen2.5:3b \
  python -m landscapeforge.run_llm_episode

# HuggingFace router
HF_TOKEN=hf_xxx MODEL_NAME=Qwen/Qwen2.5-7B-Instruct \
  python -m landscapeforge.run_llm_episode
```

Full turn transcripts (prompt, raw reply, parsed action, env feedback, reward breakdown) are written to `episode_logs/*.jsonl` + `*.md`.
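
For orientation, a condensed sketch of the loop `run_llm_episode.py` implements, using the standard `openai` client. `obs_to_prompt` and `parse_action` are hypothetical stand-ins for whatever `prompts.py` actually exposes, and their trivial bodies here are placeholders:

```python
import os
from openai import OpenAI
from landscapeforge import LandscapeforgeEnv, LandscapeforgeAction

def obs_to_prompt(obs) -> str:
    # Hypothetical stand-in for prompts.py's obs -> prompt encoding
    return str(obs)

def parse_action(reply: str) -> LandscapeforgeAction:
    # Hypothetical stand-in for prompts.py's response -> action parser;
    # a real parser extracts run_baseline/draft/inspect/commit from the reply
    return LandscapeforgeAction(kind="commit")

client = OpenAI(
    base_url=os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1"),
    api_key=os.environ.get("HF_TOKEN", "not-needed-locally"),
)

with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
    obs, done = env.reset(), False
    while not done:
        reply = client.chat.completions.create(
            model=os.environ["MODEL_NAME"],
            messages=[{"role": "user", "content": obs_to_prompt(obs)}],
        ).choices[0].message.content
        result = env.step(parse_action(reply))  # assuming the step result exposes .done
        obs, done = result.observation, result.done
```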


## What to click first in `/web`

1. Baseline Race tab → pick Rosenbrock → hit "🏁 Race!" to see how default-SGD, default-Momentum, tuned-Adam, and crude-L-BFGS actually perform on the classic stiff valley.
2. Optimizer Arena tab → keep the sample SGD+Momentum optimizer, hit "⚔️ Run arena" to see the reward breakdown vs tuned Adam.
3. Landscape Explorer tab → browse the 9 template families with contour plots + structural hints.

## Repo structure

```text
landscapeforge/
├── LANDSCAPEFORGE_DESIGN.md    # Full design doc (v0.2)
├── IMPLEMENTATION.md           # What's in the code today + constants
├── models.py                   # Action + Observation (pydantic)
├── landscapes.py               # 9 analytic template builders with gradients
├── reference_optimizers.py     # SGD / Momentum / Adam / L-BFGS + LR tuner
├── sandbox.py                  # AST-strip + restricted exec + timeout
├── arena.py                    # Phase-D runner + auto_test_draft
├── rewards.py                  # Terminal reward + stepwise feedback
├── prompts.py                  # obs → prompt / response → action
├── run_llm_episode.py          # LLM-in-the-loop runner (OpenAI-compat)
├── server/
│   ├── app.py                  # FastAPI + mounted Gradio at /web
│   └── landscapeforge_environment.py  # OpenEnv Environment class
├── demo/ui.py                  # Gradio UI source
├── tests/test_episode.py       # Scripted end-to-end tests
└── episode_logs/               # Per-episode JSONL + Markdown transcripts
```

## Research anchors

LandscapeForge sits at the intersection of five established research threads.

Every ingredient has prior work; the combination (LLM-generated optimizers, LLM-picked landscapes, an iterative REPL, and GRPO on Adam-relative progress) is novel.


## License

Apache-2.0.