Spaces:
Sleeping
Sleeping
Upload folder using huggingface_hub
Browse files- Dockerfile +81 -0
- IMPLEMENTATION.md +278 -0
- README.md +147 -6
- __init__.py +16 -0
- arena.py +146 -0
- client.py +33 -0
- demo/__init__.py +0 -0
- demo/ui.py +543 -0
- landscapes.py +273 -0
- models.py +95 -0
- openenv.yaml +7 -0
- openenv_landscapeforge.egg-info/PKG-INFO +15 -0
- openenv_landscapeforge.egg-info/SOURCES.txt +32 -0
- openenv_landscapeforge.egg-info/dependency_links.txt +1 -0
- openenv_landscapeforge.egg-info/entry_points.txt +2 -0
- openenv_landscapeforge.egg-info/requires.txt +11 -0
- openenv_landscapeforge.egg-info/top_level.txt +1 -0
- prompts.py +267 -0
- pyproject.toml +43 -0
- reference_optimizers.py +150 -0
- rewards.py +183 -0
- run_llm_episode.py +281 -0
- sandbox.py +160 -0
- server/__init__.py +11 -0
- server/app.py +90 -0
- server/landscapeforge_environment.py +513 -0
- server/requirements.txt +6 -0
- tests/__init__.py +0 -0
- tests/test_episode.py +150 -0
- uv.lock +0 -0
Dockerfile
ADDED
|
@@ -0,0 +1,81 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
# Multi-stage build using openenv-base
|
| 8 |
+
# This Dockerfile is flexible and works for both:
|
| 9 |
+
# - In-repo environments (with local OpenEnv sources)
|
| 10 |
+
# - Standalone environments (with openenv from PyPI/Git)
|
| 11 |
+
# The build script (openenv build) handles context detection and sets appropriate build args.
|
| 12 |
+
|
| 13 |
+
ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
|
| 14 |
+
FROM ${BASE_IMAGE} AS builder
|
| 15 |
+
|
| 16 |
+
WORKDIR /app
|
| 17 |
+
|
| 18 |
+
# Ensure git is available (required for installing dependencies from VCS)
|
| 19 |
+
RUN apt-get update && \
|
| 20 |
+
apt-get install -y --no-install-recommends git && \
|
| 21 |
+
rm -rf /var/lib/apt/lists/*
|
| 22 |
+
|
| 23 |
+
# Build argument to control whether we're building standalone or in-repo
|
| 24 |
+
ARG BUILD_MODE=in-repo
|
| 25 |
+
ARG ENV_NAME=landscapeforge
|
| 26 |
+
|
| 27 |
+
# Copy environment code (always at root of build context)
|
| 28 |
+
COPY . /app/env
|
| 29 |
+
|
| 30 |
+
# For in-repo builds, openenv is already vendored in the build context
|
| 31 |
+
# For standalone builds, openenv will be installed via pyproject.toml
|
| 32 |
+
WORKDIR /app/env
|
| 33 |
+
|
| 34 |
+
# Ensure uv is available (for local builds where base image lacks it)
|
| 35 |
+
RUN if ! command -v uv >/dev/null 2>&1; then \
|
| 36 |
+
curl -LsSf https://astral.sh/uv/install.sh | sh && \
|
| 37 |
+
mv /root/.local/bin/uv /usr/local/bin/uv && \
|
| 38 |
+
mv /root/.local/bin/uvx /usr/local/bin/uvx; \
|
| 39 |
+
fi
|
| 40 |
+
|
| 41 |
+
# Install dependencies using uv sync
|
| 42 |
+
# If uv.lock exists, use it; otherwise resolve on the fly
|
| 43 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 44 |
+
if [ -f uv.lock ]; then \
|
| 45 |
+
uv sync --frozen --no-install-project --no-editable; \
|
| 46 |
+
else \
|
| 47 |
+
uv sync --no-install-project --no-editable; \
|
| 48 |
+
fi
|
| 49 |
+
|
| 50 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 51 |
+
if [ -f uv.lock ]; then \
|
| 52 |
+
uv sync --frozen --no-editable; \
|
| 53 |
+
else \
|
| 54 |
+
uv sync --no-editable; \
|
| 55 |
+
fi
|
| 56 |
+
|
| 57 |
+
# Final runtime stage
|
| 58 |
+
FROM ${BASE_IMAGE}
|
| 59 |
+
|
| 60 |
+
WORKDIR /app
|
| 61 |
+
|
| 62 |
+
# Copy the virtual environment from builder
|
| 63 |
+
COPY --from=builder /app/env/.venv /app/.venv
|
| 64 |
+
|
| 65 |
+
# Copy the environment code
|
| 66 |
+
COPY --from=builder /app/env /app/env
|
| 67 |
+
|
| 68 |
+
# Set PATH to use the virtual environment
|
| 69 |
+
ENV PATH="/app/.venv/bin:$PATH"
|
| 70 |
+
|
| 71 |
+
# Set PYTHONPATH so imports work correctly
|
| 72 |
+
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
| 73 |
+
|
| 74 |
+
# Health check
|
| 75 |
+
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 76 |
+
CMD curl -f http://localhost:8000/health || exit 1
|
| 77 |
+
|
| 78 |
+
# Run the FastAPI server
|
| 79 |
+
# The module path is constructed to work with the /app/env structure
|
| 80 |
+
ENV ENABLE_WEB_INTERFACE=true
|
| 81 |
+
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
|
IMPLEMENTATION.md
ADDED
|
@@ -0,0 +1,278 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# LandscapeForge — Implementation Notes (v0.1 code)
|
| 2 |
+
|
| 3 |
+
Status: **env core working end-to-end in-process**; scripted tests pass.
|
| 4 |
+
Ships what §18 v1 of `LANDSCAPEFORGE_DESIGN.md` specifies as the weekend MVP,
|
| 5 |
+
minus training and demo layers.
|
| 6 |
+
|
| 7 |
+
## What's implemented
|
| 8 |
+
|
| 9 |
+
| File | Purpose |
|
| 10 |
+
|---|---|
|
| 11 |
+
| `models.py` | Unified `LandscapeforgeAction` (discriminated by `kind`) + `LandscapeforgeObservation` |
|
| 12 |
+
| `landscapes.py` | 9 analytic template builders with hand-written gradients + `TIER_MENU` + `structural_hints()` |
|
| 13 |
+
| `reference_optimizers.py` | SGD / Momentum / Adam / crude L-BFGS + `run_baseline()` |
|
| 14 |
+
| `sandbox.py` | AST strip (keep only `class Optimizer`), safe globals, SIGALRM timeout; `compile_optimizer()` |
|
| 15 |
+
| `arena.py` | `run_arena()` for Phase-D eval + `auto_test_draft()` for draft-time feedback |
|
| 16 |
+
| `rewards.py` | Terminal reward (`compute_optcoder_reward`) + stepwise feedback (`compute_step_reward`) + `ast_novelty_score` |
|
| 17 |
+
| `server/landscapeforge_environment.py` | OpenEnv `Environment` subclass wiring everything together |
|
| 18 |
+
| `server/app.py` | FastAPI wrapper (scaffold, unchanged) |
|
| 19 |
+
| `client.py` | HTTP client over the unified action schema |
|
| 20 |
+
| `tests/test_episode.py` | 3 scripted episodes, all passing |
|
| 21 |
+
|
| 22 |
+
## Action space (§7.1)
|
| 23 |
+
|
| 24 |
+
Four actions with differentiated budget cost:
|
| 25 |
+
|
| 26 |
+
| Action | Cost | What it returns |
|
| 27 |
+
|---|---|---|
|
| 28 |
+
| `run_baseline(name)` | 2 | Fixed 30-step trajectory `(x_t, f_t, \|g_t\|)`; step count is env-controlled for comparability; source NOT revealed |
|
| 29 |
+
| `draft(code)` | 2 | Auto-test summary on 1 seed × 20 steps + compile_error if any |
|
| 30 |
+
| `inspect(draft_idx, step_range)` | 1 | Per-step detail `(x, f, grad, update_norm, step_size_eff)` from the referenced draft |
|
| 31 |
+
| `commit` | 0 | Terminates, triggers Phase-D arena eval |
|
| 32 |
+
|
| 33 |
+
Budget total: **12 units**. Hard ceiling of 6 drafts per episode prevents brute-force enumeration.
|
| 34 |
+
|
| 35 |
+
**Auto-commit contract:** if the agent never calls `commit`, budget exhaustion auto-triggers the same Phase-D arena evaluation on `state.current_draft` (i.e. the most recent draft the agent submitted). Whether the agent calls `commit` explicitly or hits budget exhaustion, **the most recent draft is always what gets evaluated**. Implemented in `LandscapeforgeEnvironment._finalize_episode` — `current_draft` is evaluated; only "no draft at all" produces worst-case regret. The prompt (`prompts.SYSTEM`) documents this contract to the LLM so it understands it isn't penalized for not committing, but should make sure its *latest* draft is its best one.
|
| 36 |
+
|
| 37 |
+
## Reward
|
| 38 |
+
|
| 39 |
+
Two distinct signals — only the terminal one is used as the training scalar. Stepwise signals are in-context feedback for the LLM.
|
| 40 |
+
|
| 41 |
+
### Terminal reward (the GRPO training scalar)
|
| 42 |
+
|
| 43 |
+
Computed once, at commit (or auto-commit on budget exhaustion), after the full Phase-D arena evaluation. Lives in `obs.reward` and `obs.r_optcoder`.
|
| 44 |
+
|
| 45 |
+
```
|
| 46 |
+
r_total = 1.0 · r_regret
|
| 47 |
+
+ 0.3 · r_convergence
|
| 48 |
+
+ 0.3 · r_robustness
|
| 49 |
+
+ 0.1 · r_novelty (gated: 0 unless r_regret > 0.5)
|
| 50 |
+
− 0.05 · r_budget
|
| 51 |
+
− 0.5 · r_eval_failures
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
Range: roughly **[−1.55, +1.65]** in practice. Weights live in `rewards.py` (`W_*`).
|
| 55 |
+
|
| 56 |
+
#### 1. `r_regret` — the main signal (Adam-relative descent, NO `f_min` dependency)
|
| 57 |
+
|
| 58 |
+
**Measures:** how much further the committed optimizer descended on `f(x)` than Adam's default on the same landscape, starting from the same init. Purely relative — does not require knowing the absolute minimum.
|
| 59 |
+
|
| 60 |
+
**Computed:**
|
| 61 |
+
```
|
| 62 |
+
# Before running the Adam baseline, tune its LR per-landscape via a short
|
| 63 |
+
# sweep over {1e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1} on a dedicated seed
|
| 64 |
+
# (30-step each). This keeps the comparison fair — agent must beat Adam-at-
|
| 65 |
+
# best-LR, not Adam-at-PyTorch-default.
|
| 66 |
+
best_lr = tune_adam_lr(f, grad, x0=sweep_seed_init, sweep_steps=30)
|
| 67 |
+
|
| 68 |
+
my_progress = mean over 10 arena seeds of (f_initial − f_final)
|
| 69 |
+
adam_progress = same, running Adam(lr=best_lr) on the same seeds
|
| 70 |
+
denom = max(adam_progress, 0.01 · mean|f_initial| + 1e-6)
|
| 71 |
+
r_regret = clamp( my_progress / denom − 1 , -1, +1 )
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
where `f_initial` and `f_final` are observable per-seed values from `arena.initial_values` and `arena.final_values`. Crashed seeds contribute 0 progress (conservative — "you didn't descend on that seed").
|
| 75 |
+
|
| 76 |
+
The denominator floor (~1% of initial f magnitude) protects against near-zero Adam progress exploding the ratio (e.g. on plateau landscapes where Adam barely moves).
|
| 77 |
+
|
| 78 |
+
**Range:** [−1, +1]
|
| 79 |
+
- `+1.0`: descended ≥ 2× as far as Adam (clipped ceiling)
|
| 80 |
+
- `0.0`: matched Adam's descent exactly
|
| 81 |
+
- `−1.0`: made zero or negative progress while Adam descended normally (clipped floor)
|
| 82 |
+
|
| 83 |
+
**Why this shape:** Adam-relative normalization is scale-invariant — works the same on T0 quadratics (|f| ~ 10) and Rosenbrock (|f| ~ 1000) without hand-tuned per-landscape knobs. And crucially, **it does NOT require knowing `f_min`** — this design extends directly to neural-network training as Phase D (v3), where the global minimum of training loss is unknowable.
|
| 84 |
+
|
| 85 |
+
**Bonus fields in the reward breakdown:** `my_progress`, `adam_progress`, and `speedup_vs_adam` (= `my_progress / denom`) are logged alongside `r_regret` for diagnostics and human-readable leaderboards (e.g. "this optimizer is 10× faster than Adam on this landscape").
|
| 86 |
+
|
| 87 |
+
#### 2. `r_convergence` — speed bonus
|
| 88 |
+
|
| 89 |
+
**Measures:** how quickly the committed optimizer drops `f` below 1% of the initial value on the first arena seed.
|
| 90 |
+
|
| 91 |
+
**Computed:**
|
| 92 |
+
```
|
| 93 |
+
r_conv = clamp( 1 − convergence_step / N , 0, 1 ) if converged
|
| 94 |
+
= 0.0 if never reached 1%
|
| 95 |
+
```
|
| 96 |
+
where `N = ARENA_STEPS = 200` and `convergence_step` is the first `t` such that `f(x_t) < 0.01 · f(x_0)` on seed 101 (the first seed in `ARENA_SEEDS`).
|
| 97 |
+
|
| 98 |
+
**Range:** [0, 1]
|
| 99 |
+
- `1.0`: converged at step 0 (impossible; asymptotic)
|
| 100 |
+
- `0.5`: converged at step 100
|
| 101 |
+
- `0.0`: never converged within 200 steps
|
| 102 |
+
|
| 103 |
+
**Why:** distinguishes fast optimizers from slow ones among those that do converge. Without this, an optimizer that reaches the minimum in 50 steps and one that reaches it in 199 get identical `r_regret`.
|
| 104 |
+
|
| 105 |
+
**Only uses seed 101** — one seed's trajectory is enough to proxy speed; averaging across 10 would be more faithful but this is cheaper.
|
| 106 |
+
|
| 107 |
+
#### 3. `r_robustness` — cross-seed consistency
|
| 108 |
+
|
| 109 |
+
**Measures:** whether the optimizer achieves similar final values across all 10 arena seeds, or whether it's luck-of-the-init sensitive.
|
| 110 |
+
|
| 111 |
+
**Computed in `ArenaResult.robustness`:**
|
| 112 |
+
```
|
| 113 |
+
r_robust = clamp( 1 − std(final_values) / |mean(final_values)| , 0, 1 )
|
| 114 |
+
```
|
| 115 |
+
using only seeds that didn't crash. If mean ≈ 0, returns 1.0 when std is tiny else 0.0.
|
| 116 |
+
|
| 117 |
+
**Range:** [0, 1]
|
| 118 |
+
- `1.0`: all 10 seeds ended at essentially the same `f` value (tight distribution)
|
| 119 |
+
- `0.0`: huge variance across seeds (works on some inits, fails on others)
|
| 120 |
+
|
| 121 |
+
**Why:** anti-"works-only-from-favorable-init" defense (§11). An optimizer that converges cleanly on seed 101 but diverges on seed 505 has low r_robustness even if mean_regret is okay.
|
| 122 |
+
|
| 123 |
+
#### 4. `r_novelty` — structural departure from references
|
| 124 |
+
|
| 125 |
+
**Measures:** how structurally different the committed source is from the standard optimizers SGD/Adam/Momentum.
|
| 126 |
+
|
| 127 |
+
**Computed via `ast_novelty_score`:**
|
| 128 |
+
```
|
| 129 |
+
r_novelty = min over ref in {sgd, adam, momentum} of
|
| 130 |
+
1 − difflib.SequenceMatcher(committed, ref).ratio()
|
| 131 |
+
clamped to [0, 1]
|
| 132 |
+
```
|
| 133 |
+
Uses character-level diff ratio (difflib). `0.0` = byte-identical to one of the references. `1.0` = totally different strings. For reference, a tweaked-Adam with different hparams scores ~0.3; a genuinely different algorithm (line-search + trust region) scores ~0.7.
|
| 134 |
+
|
| 135 |
+
**Range:** [0, 1]
|
| 136 |
+
|
| 137 |
+
**Gate:** `r_novelty` is **only applied when `r_regret > 0.5`**. This prevents rewarding "novel AND broken" — you must beat Adam by a clear margin before creativity earns anything.
|
| 138 |
+
|
| 139 |
+
**Why:** prevents the model from just copying Adam after running `run_baseline("adam")`. Without this gate or term, the reward-maximizing strategy is "memorize Adam."
|
| 140 |
+
|
| 141 |
+
#### 5. `r_budget` — penalty for over-spending budget
|
| 142 |
+
|
| 143 |
+
**Measures:** what fraction of the action budget was used.
|
| 144 |
+
|
| 145 |
+
**Computed:**
|
| 146 |
+
```
|
| 147 |
+
r_budget = clamp( budget_spent / 12 , 0, 1 )
|
| 148 |
+
```
|
| 149 |
+
where `budget_spent` is the sum of per-action costs (baseline=2, draft=2, inspect=1, commit=0), NOT the count of actions.
|
| 150 |
+
|
| 151 |
+
**Range:** [0, 1]
|
| 152 |
+
- `0.0`: no budget used (impossible since at least one draft is needed)
|
| 153 |
+
- `1.0`: full budget consumed
|
| 154 |
+
|
| 155 |
+
**Why:** mild pressure toward efficiency. With `W_BUDGET = 0.05`, the swing between "committed at budget 4" and "exhausted at 12" is only 0.4 × 0.05 = 0.02 reward — deliberately small so it doesn't override algorithmic quality but large enough to discourage deliberate stalling.
|
| 156 |
+
|
| 157 |
+
#### 6. `r_eval_failures` — crash penalty
|
| 158 |
+
|
| 159 |
+
**Measures:** fraction of arena seeds where the committed optimizer raised a `SandboxError` (NaN output, wrong shape, timeout, Python error).
|
| 160 |
+
|
| 161 |
+
**Computed:**
|
| 162 |
+
```
|
| 163 |
+
r_eval_failures = sum(arena.crashed) / 10
|
| 164 |
+
```
|
| 165 |
+
|
| 166 |
+
**Range:** [0, 1]
|
| 167 |
+
- `0.0`: all 10 seeds ran to completion
|
| 168 |
+
- `1.0`: committed code crashes on every seed
|
| 169 |
+
|
| 170 |
+
**Why:** heavily weighted at `W_EVAL_FAIL = 0.5` so a uniformly-crashing commit incurs at least a −0.5 penalty on top of whatever the other components contribute. Prevents "commit broken code to avoid bad eval" gaming (§11).
|
| 171 |
+
|
| 172 |
+
### Concrete example
|
| 173 |
+
|
| 174 |
+
From the scripted test (quadratic dim=5, cond=44.4, SGD+momentum with lr=0.05 committed, progress-based reward, LR-tuned Adam baseline):
|
| 175 |
+
|
| 176 |
+
| Component | Value | Weighted contribution |
|
| 177 |
+
|---|---|---|
|
| 178 |
+
| `r_regret` | ~0 (my_progress ≈ adam_progress, both ≈10.51) | 0.000 |
|
| 179 |
+
| `r_convergence` | +0.835 | +0.251 |
|
| 180 |
+
| `r_robustness` | +0.5897 | +0.177 |
|
| 181 |
+
| `r_novelty` | 0 (gated; r_regret < 0.5) | 0.000 |
|
| 182 |
+
| `r_budget` | 0.583 (7/12 used) | −0.029 |
|
| 183 |
+
| `r_eval_failures` | 0.0 (no crashes) | 0.0 |
|
| 184 |
+
| **`r_total`** | — | **+0.398** |
|
| 185 |
+
|
| 186 |
+
Adam's tuned LR for this landscape came out to `0.03` — when LR is tuned, the committed SGD+momentum (lr=0.05) is essentially **tied** with Adam, and the reward correctly reflects that. Under the previous unfair baseline (Adam default lr=1e-3), the same draft would have scored 1.42. The ~1.0 reward swing reflects how much of the old "win" was just LR-tuning, not algorithmic merit.
|
| 187 |
+
|
| 188 |
+
### Stepwise feedback (NOT training reward)
|
| 189 |
+
|
| 190 |
+
Computed in `compute_step_reward`. Surfaced to the LLM via `obs.last_action_result["feedback"]` after each non-terminal turn. Explicitly NOT summed into `r_total`.
|
| 191 |
+
|
| 192 |
+
- `phi_delta`: change in `−best_auto_test_final_f / 10` across this turn. Positive means the newest draft improved the best auto-test result. The LLM sees this and knows "I just made progress."
|
| 193 |
+
- `compile_penalty`: literal `-0.1` marker emitted whenever the latest draft failed to compile. Purely a flag for the LLM's context.
|
| 194 |
+
|
| 195 |
+
These are communication channels, not reward. Keeping them out of the training scalar preserves the terminal-only robustness property while still giving the LLM something to react to mid-episode.
|
| 196 |
+
|
| 197 |
+
## Constants
|
| 198 |
+
|
| 199 |
+
| Constant | Value | Source |
|
| 200 |
+
|---|---|---|
|
| 201 |
+
| `BUDGET_TOTAL` | 12 | `server/landscapeforge_environment.py` |
|
| 202 |
+
| `ACTION_COSTS` | baseline=2, draft=2, inspect=1, commit=0 | `models.py` |
|
| 203 |
+
| `ARENA_SEEDS` | `[101, 202, ..., 1010]` (10 fresh seeds) | `server/landscapeforge_environment.py` |
|
| 204 |
+
| `ARENA_STEPS` | 200 | same |
|
| 205 |
+
| `BASELINE_STEPS` | 30 | same (fixed; agent cannot override) |
|
| 206 |
+
| Adam LR sweep grid | `{1e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1}` | `reference_optimizers.tune_adam_lr` |
|
| 207 |
+
| Adam LR sweep steps | 30 | same |
|
| 208 |
+
| Adam LR sweep init seed | 0 (not in ARENA_SEEDS) | `_ensure_adam_arena` in env |
|
| 209 |
+
| Draft auto-test init seed | 0 | `arena.auto_test_draft` |
|
| 210 |
+
| Draft auto-test steps | 20 | same |
|
| 211 |
+
| Init scale (seed-sampled `x0`) | `N(0, 0.5² I)` | `arena.run_arena` + `auto_test_draft` |
|
| 212 |
+
| Dim range per episode | 2–5 (v1) | `server/landscapeforge_environment.py` |
|
| 213 |
+
| Sandbox init timeout | 1.0 s | `sandbox.compile_optimizer` |
|
| 214 |
+
| Sandbox step timeout | 0.5 s | same |
|
| 215 |
+
| Reward weights | w_regret=1.0, w_conv=0.3, w_robust=0.3, w_novelty=0.1, w_budget=0.05, w_evalfail=0.5 | `rewards.py` |
|
| 216 |
+
| Novelty gate | Applied only when `r_regret > 0.5` | `rewards.NOVELTY_GATE` |
|
| 217 |
+
| `PHI_SCALE` (potential normalizer) | 10.0 | `rewards.py` |
|
| 218 |
+
| `COMPILE_PENALTY_SIGNAL` | -0.1 | `rewards.py` |
|
| 219 |
+
| Tier menus | T0: quadratic/styblinski_tang/huber · T1: +gaussian_mix/himmelblau · T2: +rosenbrock/stiff_quadratic/plateau/cliff | `landscapes.TIER_MENU` |
|
| 220 |
+
| Quadratic cond-number cap per tier | T0: 100, T1: 1000, T2: 10000 | `_sample_params` in env |
|
| 221 |
+
|
| 222 |
+
## Assumptions / simplifications (v1)
|
| 223 |
+
|
| 224 |
+
1. **LandscapeForge is a template picker**, not a free-form code author. The env internally samples (template, params) uniformly from the active tier's menu — no LandscapeForge LLM adapter in v1. Defers §18 v2 non-differentiability and gradient-source risks.
|
| 225 |
+
2. **All gradients are analytic**, hand-written per template. No autodiff/JAX, no finite differences. Templates are verified differentiable by construction.
|
| 226 |
+
2b. **Reward does NOT depend on `f_min`.** v0.2 switched from `r_regret = 1 − (my_regret / adam_regret)` (which required knowing the global minimum) to a progress-based form: `r_regret = clamp(my_progress / adam_progress − 1, −1, +1)` where `progress = f_initial − f_final` is observable per seed. `Landscape.f_min` is retained only for diagnostics, NOT used in training. This unlocks v3 NN extension (training loss has no knowable minimum).
|
| 227 |
+
3. **Only OptCoder has a policy.** The OpenEnv `Environment` here exposes the OptCoder side; LandscapeForge selection is internal.
|
| 228 |
+
4. **Single backbone assumption** (Qwen2.5-3B base + OptCoder LoRA) is in the design but not in code; training script is not yet implemented.
|
| 229 |
+
5. **Sandbox is in-process + SIGALRM timeout.** Works on main thread / CPython / POSIX. Known bug: HTTP `/step` via uvicorn returns 500 because SIGALRM only fires on the main thread; thread-based timeout fix is TODO.
|
| 230 |
+
6. **AST strip drops all module-level code except `class Optimizer`.** Imports are also dropped — the sandbox pre-injects `np` and `numpy` into globals, so submitted code can use `np.*` without an import line.
|
| 231 |
+
7. **Dim range 2–5** for v1 even though the design allows up to 100. Keeps arena eval fast (~30 ms/episode) and keeps the prompt token budget tight.
|
| 232 |
+
8. **Adam baseline for reward normalization is run inside the env** on every commit to compute `baseline_adam_regret`. Cost: one 200-step × 10-seed arena run per episode on top of the OptCoder eval. ~30 ms, acceptable.
|
| 233 |
+
9. **AST novelty score uses difflib** (character-level Levenshtein-ish) rather than true AST diff. Enough to detect "commit ≈ reference" but not semantically rigorous. Upgrade path noted.
|
| 234 |
+
10. **Tier advancement is not auto-wired.** `env.advance_tier(new_tier)` exists as a manual API; rolling-regret-based auto-advance is a trainer-side concern and not yet implemented.
|
| 235 |
+
|
| 236 |
+
## How to run
|
| 237 |
+
|
| 238 |
+
```bash
|
| 239 |
+
cd landscapeforge
|
| 240 |
+
uv sync # installs deps
|
| 241 |
+
uv run python tests/test_episode.py # 3 scripted episodes
|
| 242 |
+
```
|
| 243 |
+
|
| 244 |
+
Expected output: three `✓ PASSED` lines, final line `All tests passed.`
|
| 245 |
+
|
| 246 |
+
In-process usage (no server needed):
|
| 247 |
+
|
| 248 |
+
```python
|
| 249 |
+
from landscapeforge.models import LandscapeforgeAction
|
| 250 |
+
from landscapeforge.server.landscapeforge_environment import LandscapeforgeEnvironment
|
| 251 |
+
|
| 252 |
+
env = LandscapeforgeEnvironment(tier="T0", seed=42)
|
| 253 |
+
obs = env.reset()
|
| 254 |
+
obs = env.step(LandscapeforgeAction(kind="run_baseline", baseline_name="adam"))
|
| 255 |
+
obs = env.step(LandscapeforgeAction(kind="draft", code="...Optimizer class..."))
|
| 256 |
+
obs = env.step(LandscapeforgeAction(kind="commit"))
|
| 257 |
+
print(obs.reward, obs.r_optcoder_breakdown)
|
| 258 |
+
```
|
| 259 |
+
|
| 260 |
+
HTTP server starts with `uv run uvicorn landscapeforge.server.app:app`. `/reset` and `/schema` work; `/step` currently returns 500 (see assumption 5).
|
| 261 |
+
|
| 262 |
+
## Known gaps (tracked for next passes)
|
| 263 |
+
|
| 264 |
+
- SFT warm-start corpus: ~200 hand-authored `run_baseline → draft → inspect → draft → commit` traces (§15 Stage 0)
|
| 265 |
+
- GRPO training script using TRL + HF transformers
|
| 266 |
+
- Prompt renderer: format `obs` into the LLM prompt template from Appendix A
|
| 267 |
+
- Curriculum auto-advancement (rolling-mean-regret watchdog on top of `env.advance_tier`)
|
| 268 |
+
- Gradio demo Space with contour + trajectory animation
|
| 269 |
+
- Thread-based sandbox timeout to unblock HTTP `/step`
|
| 270 |
+
- True AST-diff-based novelty (replace difflib)
|
| 271 |
+
- Docker image + HF Spaces push
|
| 272 |
+
|
| 273 |
+
## Non-goals (v1)
|
| 274 |
+
|
| 275 |
+
- Free-form LandscapeForge code authoring (deferred to v2 per §18)
|
| 276 |
+
- Non-differentiable landscape defense (moot while LandscapeForge is template-picker)
|
| 277 |
+
- Multi-turn LandscapeForge-vs-OptCoder within a single episode (sequential only)
|
| 278 |
+
- Neural-net-as-landscape Phase-D (v3)
|
README.md
CHANGED
|
@@ -1,10 +1,151 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: docker
|
| 7 |
-
pinned:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
title: LandscapeForge
|
| 3 |
+
emoji: 🏔️
|
| 4 |
+
colorFrom: purple
|
| 5 |
+
colorTo: blue
|
| 6 |
sdk: docker
|
| 7 |
+
pinned: true
|
| 8 |
+
app_port: 8000
|
| 9 |
+
base_path: /web
|
| 10 |
+
tags:
|
| 11 |
+
- openenv
|
| 12 |
+
- reinforcement-learning
|
| 13 |
+
- optimization
|
| 14 |
+
- llm-agents
|
| 15 |
+
- self-improvement
|
| 16 |
+
- gradio
|
| 17 |
+
license: apache-2.0
|
| 18 |
+
short_description: LLM agent designs optimizers via a probe-draft-commit REPL.
|
| 19 |
---
|
| 20 |
|
| 21 |
+
# 🏔️ LandscapeForge
|
| 22 |
+
|
| 23 |
+
**An OpenEnv where an LLM agent designs optimization algorithms through a probe-draft-commit REPL, trained against a Goldilocks-regulated landscape adversary.**
|
| 24 |
+
|
| 25 |
+
Target: **OpenEnv Hackathon, April 2026**. Theme 4 (Self-Improvement), secondary Theme 1 (Multi-Agent).
|
| 26 |
+
|
| 27 |
+
---
|
| 28 |
+
|
| 29 |
+
## What this Space gives you
|
| 30 |
+
|
| 31 |
+
Two things, in one container:
|
| 32 |
+
|
| 33 |
+
| Path | What it is |
|
| 34 |
+
|---|---|
|
| 35 |
+
| **`/web`** | **Interactive Gradio demo** — landscape explorer + baseline race + paste-your-own-optimizer arena. Visual-first, meant to make the env legible to judges. |
|
| 36 |
+
| **`/reset`, `/step`, `/schema`, WebSocket** | **OpenEnv FastAPI endpoints** — wire the env into a TRL / Unsloth GRPO training loop. |
|
| 37 |
+
|
| 38 |
+
Same process, no second container required.
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
## How the env works (in 90 seconds)
|
| 43 |
+
|
| 44 |
+
**OptCoder** is the LLM policy. Each episode:
|
| 45 |
+
|
| 46 |
+
1. **LandscapeForge** (an internal template picker in v1) chooses a loss landscape `f: ℝⁿ → ℝ` at a tier-appropriate difficulty: quadratic / Rosenbrock / Styblinski-Tang / Gaussian-mix / Himmelblau / plateau / cliff.
|
| 47 |
+
2. **OptCoder runs a 4-action REPL** with a budget of 12 units:
|
| 48 |
+
- `run_baseline(name)` — run SGD / Momentum / Adam / L-BFGS on the hidden landscape and see their trajectory (cost: 2)
|
| 49 |
+
- `draft(code)` — submit a full `Optimizer` class; the env auto-tests it for 20 steps (cost: 2)
|
| 50 |
+
- `inspect(draft_idx, step_range)` — zoom into a prior draft's per-step `(x, f, grad, update_norm, step_size_eff)` to diagnose failures (cost: 1)
|
| 51 |
+
- `commit` — evaluate the latest draft on the full **Phase-D arena**: 10 fresh seeds × 200 steps (cost: 0)
|
| 52 |
+
3. **Reward** (terminal only; stepwise is feedback-only):
|
| 53 |
+
- `r_regret` — **Adam-relative progress** (tuned Adam LR per landscape; no `f_min` dependency, generalises directly to NN training)
|
| 54 |
+
- `r_convergence`, `r_robustness`, `r_novelty` (gated), minus `r_budget`, minus `r_eval_failures`
|
| 55 |
+
4. **GRPO** can then train the policy; arena wall-clock is ~50 ms so ~36k episodes/hour on one H100.
|
| 56 |
+
|
| 57 |
+
See `IMPLEMENTATION.md` and `LANDSCAPEFORGE_DESIGN.md` in this repo for the full spec, staged bootstrap (SFT → solo RL → adversarial unfreezing), and the anti-reward-hacking table.
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
## Quick-start (Python client)
|
| 62 |
+
|
| 63 |
+
```python
|
| 64 |
+
from landscapeforge import LandscapeforgeEnv, LandscapeforgeAction
|
| 65 |
+
|
| 66 |
+
with LandscapeforgeEnv.from_docker_image("landscapeforge-env:latest") as env:
|
| 67 |
+
obs = env.reset()
|
| 68 |
+
env.step(LandscapeforgeAction(kind="run_baseline", baseline_name="adam"))
|
| 69 |
+
env.step(LandscapeforgeAction(kind="draft", code="""
|
| 70 |
+
class Optimizer:
|
| 71 |
+
def __init__(self, dim):
|
| 72 |
+
self.lr = 0.05; self.beta = 0.9
|
| 73 |
+
self.v = np.zeros(dim)
|
| 74 |
+
def step(self, x, f_val, grad):
|
| 75 |
+
self.v = self.beta * self.v - self.lr * grad
|
| 76 |
+
return x + self.v
|
| 77 |
+
"""))
|
| 78 |
+
result = env.step(LandscapeforgeAction(kind="commit"))
|
| 79 |
+
print(result.observation.r_optcoder_breakdown)
|
| 80 |
+
# {'r_regret': ..., 'r_convergence': ..., 'r_robustness': ...,
|
| 81 |
+
# 'r_novelty': ..., 'r_budget': ..., 'r_eval_failures': ...,
|
| 82 |
+
# 'my_progress': ..., 'adam_progress': ..., 'speedup_vs_adam': ...}
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
## Quick-start (drive with any OpenAI-compat LLM)
|
| 86 |
+
|
| 87 |
+
The repo ships `run_llm_episode.py` that drives one episode against any `/v1/chat/completions` endpoint (HuggingFace router, Ollama, vLLM, …):
|
| 88 |
+
|
| 89 |
+
```bash
|
| 90 |
+
# Ollama local
|
| 91 |
+
API_BASE_URL=http://localhost:11434/v1 MODEL_NAME=qwen2.5:3b \
|
| 92 |
+
python -m landscapeforge.run_llm_episode
|
| 93 |
+
|
| 94 |
+
# HuggingFace router
|
| 95 |
+
HF_TOKEN=hf_xxx MODEL_NAME=Qwen/Qwen2.5-7B-Instruct \
|
| 96 |
+
python -m landscapeforge.run_llm_episode
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
Full turn transcripts (prompt, raw reply, parsed action, env feedback, reward breakdown) are written to `episode_logs/*.jsonl` + `*.md`.
|
| 100 |
+
|
| 101 |
+
---
|
| 102 |
+
|
| 103 |
+
## What to click first in `/web`
|
| 104 |
+
|
| 105 |
+
1. **Baseline Race** tab → pick Rosenbrock → hit "🏁 Race!" to see how default-SGD, default-Momentum, **tuned-Adam**, and crude-L-BFGS actually perform on the classic stiff valley.
|
| 106 |
+
2. **Optimizer Arena** tab → keep the sample SGD+Momentum optimizer, hit "⚔️ Run arena" to see the reward breakdown vs tuned Adam.
|
| 107 |
+
3. **Landscape Explorer** tab → browse the 9 template families with contour plots + structural hints.
|
| 108 |
+
|
| 109 |
+
---
|
| 110 |
+
|
| 111 |
+
## Repo structure
|
| 112 |
+
|
| 113 |
+
```
|
| 114 |
+
landscapeforge/
|
| 115 |
+
├── LANDSCAPEFORGE_DESIGN.md # Full design doc (v0.2)
|
| 116 |
+
├── IMPLEMENTATION.md # What's in the code today + constants
|
| 117 |
+
├── models.py # Action + Observation (pydantic)
|
| 118 |
+
├── landscapes.py # 9 analytic template builders with gradients
|
| 119 |
+
├── reference_optimizers.py # SGD / Momentum / Adam / L-BFGS + LR tuner
|
| 120 |
+
├── sandbox.py # AST-strip + restricted exec + timeout
|
| 121 |
+
├── arena.py # Phase-D runner + auto_test_draft
|
| 122 |
+
├── rewards.py # Terminal reward + stepwise feedback
|
| 123 |
+
├── prompts.py # obs → prompt / response → action
|
| 124 |
+
├── run_llm_episode.py # LLM-in-the-loop runner (OpenAI-compat)
|
| 125 |
+
├── server/
|
| 126 |
+
│ ├── app.py # FastAPI + mounted Gradio at /web
|
| 127 |
+
│ └── landscapeforge_environment.py # OpenEnv Environment class
|
| 128 |
+
├── demo/ui.py # Gradio UI source
|
| 129 |
+
├── tests/test_episode.py # Scripted end-to-end tests
|
| 130 |
+
└── episode_logs/ # Per-episode JSONL + Markdown transcripts
|
| 131 |
+
```
|
| 132 |
+
|
| 133 |
+
---
|
| 134 |
+
|
| 135 |
+
## Research anchors
|
| 136 |
+
|
| 137 |
+
LandscapeForge sits at the intersection of five established research threads:
|
| 138 |
+
|
| 139 |
+
- **Thread 1** — LLMs as optimizer designers: [Lion (NeurIPS 2023)](https://arxiv.org/abs/2302.06675), [FunSearch (Nature 2024)](https://www.nature.com/articles/s41586-023-06924-6)
|
| 140 |
+
- **Thread 2** — Adversarial / co-evolutionary LLM-env: Coevolve, [GenEnv (ICLR 2026)](https://arxiv.org/html/2512.19682v1)
|
| 141 |
+
- **Thread 3** — Iterative code refinement: [Self-Refine](https://arxiv.org/abs/2303.17651)
|
| 142 |
+
- **Thread 4** — GRPO with measurable rewards: [HPC GFLOPS reward paper](https://arxiv.org/abs/2602.12049v1)
|
| 143 |
+
- **Thread 5** — Analytical landscape benchmarks: [BBOB/COCO](https://inria.hal.science/hal-00362649/document), [POET](https://arxiv.org/abs/1901.01753)
|
| 144 |
+
|
| 145 |
+
Every ingredient has prior work; the combination — LLM-generated optimizers + LLM-picked landscapes + iterative REPL + GRPO on Adam-relative progress — is novel.
|
| 146 |
+
|
| 147 |
+
---
|
| 148 |
+
|
| 149 |
+
## License
|
| 150 |
+
|
| 151 |
+
Apache-2.0.
|
__init__.py
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
"""Landscapeforge Environment."""
|
| 8 |
+
|
| 9 |
+
from .client import LandscapeforgeEnv
|
| 10 |
+
from .models import LandscapeforgeAction, LandscapeforgeObservation
|
| 11 |
+
|
| 12 |
+
__all__ = [
|
| 13 |
+
"LandscapeforgeAction",
|
| 14 |
+
"LandscapeforgeObservation",
|
| 15 |
+
"LandscapeforgeEnv",
|
| 16 |
+
]
|
arena.py
ADDED
|
@@ -0,0 +1,146 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Phase-D runner: run a compiled optimizer for N steps from K fresh seeds.
|
| 2 |
+
|
| 3 |
+
Computes per-run final regret and aggregate stats used by the reward module.
|
| 4 |
+
Also handles auto-test during draft actions (single fixed seed, fewer steps).
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
from dataclasses import dataclass
|
| 8 |
+
|
| 9 |
+
import numpy as np
|
| 10 |
+
|
| 11 |
+
from .landscapes import Landscape
|
| 12 |
+
from .sandbox import CompiledOptimizer, SandboxError
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
@dataclass
class ArenaResult:
    """Aggregate outcome of running one optimizer across several fresh seeds."""

    initial_values: list[float]     # f(x_0) per seed
    final_values: list[float]       # f(x_N) per seed; NaN marks a crashed run
    crashed: list[bool]             # per-seed crash flag
    trajectories: list[list[dict]]  # per-seed step records (may be empty)

    @property
    def mean_progress(self) -> float:
        """Mean descent ``f_initial - f_final`` averaged over all seeds.

        Positive = optimizer descended; 0 = stayed put; negative = uphill.
        Crashed or non-finite runs contribute 0 (conservative).
        """
        deltas = [
            0.0 if bad or not np.isfinite(end) else start - end
            for start, end, bad in zip(self.initial_values,
                                       self.final_values,
                                       self.crashed)
        ]
        return float(np.mean(deltas)) if deltas else 0.0

    @property
    def mean_initial_scale(self) -> float:
        """|mean initial f|; denominator floor for when the Adam baseline
        itself makes near-zero progress (rare but possible on plateaus)."""
        magnitudes = [abs(v) for v in self.initial_values if np.isfinite(v)]
        return float(np.mean(magnitudes)) if magnitudes else 1.0

    @property
    def crash_fraction(self) -> float:
        """Fraction of seeds whose run crashed (0.0 when there are no seeds)."""
        if not self.crashed:
            return 0.0
        return float(np.mean(self.crashed))

    @property
    def robustness(self) -> float:
        """Consistency score in [0, 1]: ``1 - std/|mean|`` over finite finals.

        High = consistent final values across seeds. Needs at least two
        finite finals to be meaningful; otherwise returns 0.
        """
        finite = [v for v in self.final_values if np.isfinite(v)]
        if len(finite) < 2:
            return 0.0
        mean_v = np.mean(finite)
        std_v = np.std(finite)
        if abs(mean_v) < 1e-9:
            # Degenerate near-zero mean: perfect score only if spread is tiny.
            return 1.0 if std_v < 1e-6 else 0.0
        return float(np.clip(1.0 - std_v / (abs(mean_v) + 1e-9), 0.0, 1.0))
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def run_arena(optimizer: CompiledOptimizer, ls: Landscape,
              seeds: list[int], steps: int = 200,
              init_scale: float = 0.5) -> ArenaResult:
    """Run the compiled optimizer from fresh seeds; capture per-run metrics.

    Does NOT depend on `ls.f_min` — per-seed progress is `f_initial - f_final`,
    which is observable regardless of whether the global minimum is known.

    Args:
        optimizer: sandbox-compiled optimizer exposing step(x, f_val, grad) -> x'.
        ls: landscape providing f(x), grad(x) and dim.
        seeds: RNG seeds; one independent run per seed.
        steps: optimizer steps per run.
        init_scale: std-dev of the Gaussian initial point.

    Returns:
        ArenaResult with per-seed initial/final f values, crash flags and
        full trajectories.

    NOTE(review): the same `optimizer` instance is reused across seeds, so
    any internal state (momentum buffers, step counters) carries over from
    one seed's run into the next — confirm this is intentional or that the
    caller compiles a fresh optimizer per arena.
    """
    initials, finals, crashed, trajs = [], [], [], []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        x = rng.normal(0.0, init_scale, size=ls.dim)
        f0 = float(ls.f(x))
        initials.append(f0)
        traj: list[dict] = []
        did_crash = False
        try:
            for t in range(steps):
                fv = float(ls.f(x))
                g = np.asarray(ls.grad(x), dtype=float)
                # Record the PRE-update state so traj[t] pairs (x_t, f(x_t)).
                traj.append({"t": t, "x": x.tolist(), "f": fv})
                x = optimizer.step(x, fv, g)
        except SandboxError:
            # Sandbox abort (timeout / forbidden op / bad return) counts
            # as a crash for this seed only; other seeds still run.
            did_crash = True

        if did_crash:
            finals.append(float("nan"))  # NaN marks a crashed run
        else:
            finals.append(float(ls.f(x)))
        crashed.append(did_crash)
        trajs.append(traj)

    return ArenaResult(
        initial_values=initials,
        final_values=finals,
        crashed=crashed,
        trajectories=trajs,
    )
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
def auto_test_draft(optimizer: CompiledOptimizer, ls: Landscape,
                    seed: int = 0, steps: int = 20, init_scale: float = 0.5) -> dict:
    """Single-seed quick test used at draft() time.

    Returns a dict ``{"summary": ..., "detail": ...}`` — a lightweight
    pass/fail summary plus the per-step records that a later ``inspect``
    action can dig into.
    """
    x0 = np.random.default_rng(seed).normal(0.0, init_scale, size=ls.dim)
    x = x0.copy()
    detail: list[dict] = []
    error_msg: str | None = None
    try:
        for t in range(steps):
            f_here = float(ls.f(x))
            grad_here = np.asarray(ls.grad(x), dtype=float)
            grad_norm = float(np.linalg.norm(grad_here))
            x_before = x.copy()
            x = optimizer.step(x, f_here, grad_here)
            moved = float(np.linalg.norm(x - x_before))
            detail.append({
                "t": t,
                "x": x.tolist(),
                "f": float(ls.f(x)),
                "grad_norm": grad_norm,
                "update_norm": moved,
                # Effective step size relative to the gradient magnitude.
                "step_size_eff": moved / (grad_norm + 1e-12),
            })
    except SandboxError as exc:
        error_msg = str(exc)

    if error_msg is not None or not detail:
        # Crashed (or zero steps requested): report as diverged with no finals.
        summary = {
            "converged": False, "diverged": True, "error": error_msg,
            "final_f": None, "initial_f": float(ls.f(x0)),
            "step_of_min": None, "min_f": None,
        }
    else:
        f_series = [rec["f"] for rec in detail]
        summary = {
            # "Converged" = final loss dropped below 10% of the starting loss.
            "converged": bool(f_series[-1] < 0.1 * ls.f(x0)),
            "diverged": False, "error": None,
            "final_f": f_series[-1], "initial_f": float(ls.f(x0)),
            "step_of_min": int(np.argmin(f_series)),
            "min_f": min(f_series),
        }
    return {"summary": summary, "detail": detail}
|
client.py
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""LandscapeForge Environment Client."""
|
| 2 |
+
|
| 3 |
+
from typing import Any, Dict
|
| 4 |
+
|
| 5 |
+
from openenv.core import EnvClient
|
| 6 |
+
from openenv.core.client_types import StepResult
|
| 7 |
+
from openenv.core.env_server.types import State
|
| 8 |
+
|
| 9 |
+
from .models import LandscapeforgeAction, LandscapeforgeObservation
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
class LandscapeforgeEnv(
    EnvClient[LandscapeforgeAction, LandscapeforgeObservation, State]
):
    """Client for the LandscapeForge OptCoder environment.

    Thin adapter over the generic OpenEnv ``EnvClient``: it serialises
    actions to the step payload and parses server responses back into the
    typed observation / state models.
    """

    def _step_payload(self, action: LandscapeforgeAction) -> Dict:
        # Serialise the full pydantic action, keeping explicit None fields
        # so the server always sees every key of the action schema.
        return action.model_dump(exclude_none=False)

    def _parse_result(self, payload: Dict) -> StepResult[LandscapeforgeObservation]:
        # The server may omit "observation" or send null; treat both as an
        # empty observation dict so model construction uses defaults.
        obs_data = payload.get("observation", {}) or {}
        observation = LandscapeforgeObservation(**obs_data)
        return StepResult(
            observation=observation,
            reward=payload.get("reward"),  # may be None for non-terminal steps
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict) -> State:
        # Minimal episode state: id plus number of steps taken so far.
        return State(
            episode_id=payload.get("episode_id"),
            step_count=payload.get("step_count", 0),
        )
|
demo/__init__.py
ADDED
|
File without changes
|
demo/ui.py
ADDED
|
@@ -0,0 +1,543 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Gradio demo for LandscapeForge.
|
| 2 |
+
|
| 3 |
+
Three tabs:
|
| 4 |
+
1. Landscape Explorer — pick a template, see 2D contour + structural hints.
|
| 5 |
+
2. Baseline Race — run all 4 reference optimizers on one landscape,
|
| 6 |
+
see the trajectories racing to the minimum.
|
| 7 |
+
3. Optimizer Arena — paste a custom Optimizer class, run it through
|
| 8 |
+
the full arena, see reward + trajectory vs Adam.
|
| 9 |
+
|
| 10 |
+
The Gradio app is exposed at `/web` of the OpenEnv FastAPI server when
|
| 11 |
+
ENABLE_WEB_INTERFACE=true and the env's app.py wires this module in via
|
| 12 |
+
create_app(..., gradio_builder=build_ui).
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
from __future__ import annotations
|
| 16 |
+
|
| 17 |
+
import io
|
| 18 |
+
from typing import Any
|
| 19 |
+
|
| 20 |
+
import gradio as gr
|
| 21 |
+
import matplotlib
|
| 22 |
+
import matplotlib.pyplot as plt
|
| 23 |
+
import numpy as np
|
| 24 |
+
|
| 25 |
+
matplotlib.use("Agg") # headless backend for Spaces
|
| 26 |
+
|
| 27 |
+
from ..arena import auto_test_draft, run_arena
|
| 28 |
+
from ..landscapes import BUILDERS, build_landscape, structural_hints
|
| 29 |
+
from ..reference_optimizers import run_baseline, tune_adam_lr
|
| 30 |
+
from ..rewards import ast_novelty_score, compute_optcoder_reward
|
| 31 |
+
from ..sandbox import SandboxError, compile_optimizer
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
# ---------- plotting helpers ----------
|
| 35 |
+
|
| 36 |
+
TEMPLATES_2D_SAFE = ["quadratic", "rosenbrock", "styblinski_tang", "huber",
|
| 37 |
+
"gaussian_mix", "himmelblau", "plateau", "cliff"]
|
| 38 |
+
BASELINE_COLORS = {
|
| 39 |
+
"sgd": "#ef4444", # red
|
| 40 |
+
"momentum": "#f59e0b", # amber
|
| 41 |
+
"adam": "#10b981", # green
|
| 42 |
+
"lbfgs": "#3b82f6", # blue
|
| 43 |
+
"custom": "#a855f7", # purple (for user drafts)
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
def _contour_plot(ls, trajectories: dict[str, list[tuple[float, float]]] | None = None,
                  title: str | None = None):
    """Render a 2D contour with optional trajectories overlaid.

    Args:
        ls: landscape with ``dim == 2`` exposing ``f(x)``.
        trajectories: {name: [(x0, x1), (x0, x1), ...]} — positions per step.
        title: optional plot title; defaults to the landscape name.

    Returns:
        A matplotlib Figure (caller owns it; Gradio renders and closes it).
    """
    assert ls.dim == 2, "contour plot requires dim=2"

    fig, ax = plt.subplots(figsize=(6.5, 5.5))

    # Auto-range the plot around the trajectories + origin, then widen to a
    # minimum [-3.5, 3.5] window so the landscape's structure stays visible.
    xs_all: list[float] = [0.0]
    ys_all: list[float] = [0.0]
    for traj in (trajectories or {}).values():
        for pt in traj:
            xs_all.append(pt[0]); ys_all.append(pt[1])
    x_min = min(xs_all) - 1.5; x_max = max(xs_all) + 1.5
    y_min = min(ys_all) - 1.5; y_max = max(ys_all) + 1.5
    x_min = min(x_min, -3.5); x_max = max(x_max, 3.5)
    y_min = min(y_min, -3.5); y_max = max(y_max, 3.5)

    # Evaluate f on a 60x60 grid (point-by-point: ls.f takes a single 2-vector).
    g = 60
    xs = np.linspace(x_min, x_max, g)
    ys = np.linspace(y_min, y_max, g)
    X, Y = np.meshgrid(xs, ys)
    Z = np.empty_like(X)
    for i in range(g):
        for j in range(g):
            Z[i, j] = ls.f(np.array([X[i, j], Y[i, j]]))

    # Robust contour levels (percentile-based, avoids long-tail blowing out)
    finite = Z[np.isfinite(Z)]
    lo = np.percentile(finite, 2)
    hi = np.percentile(finite, 95)
    levels = np.linspace(lo, hi, 25)
    cs = ax.contourf(X, Y, Z, levels=levels, cmap="viridis", alpha=0.9)
    ax.contour(X, Y, Z, levels=levels[::4], colors="white",
               alpha=0.3, linewidths=0.5)
    fig.colorbar(cs, ax=ax, shrink=0.85, label="f(x)")

    # Overlay trajectories: circle = start point, star = end point.
    if trajectories:
        for name, traj in trajectories.items():
            if not traj:
                continue
            color = BASELINE_COLORS.get(name, "#ffffff")
            arr = np.array(traj)
            ax.plot(arr[:, 0], arr[:, 1], color=color, linewidth=2.0,
                    alpha=0.95, label=name, zorder=5)
            ax.scatter(arr[0:1, 0], arr[0:1, 1], color=color,
                       marker="o", s=55, edgecolors="white", linewidths=1.2,
                       zorder=6)
            ax.scatter(arr[-1:, 0], arr[-1:, 1], color=color,
                       marker="*", s=150, edgecolors="white", linewidths=1.2,
                       zorder=7)
        ax.legend(loc="upper left", framealpha=0.9)

    ax.set_xlabel("x₁"); ax.set_ylabel("x₂")
    ax.set_title(title or f"{ls.name} (dim=2)")
    fig.tight_layout()
    return fig
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
def _loss_curves_plot(traj_map: dict[str, list[float]], title: str):
    """f-vs-step line plot for each optimizer.

    Args:
        traj_map: {optimizer name: [f(x_0), f(x_1), ...]} per-step losses.
        title: plot title.

    Returns:
        A matplotlib Figure.
    """
    fig, ax = plt.subplots(figsize=(7, 4.5))
    for name, fs in traj_map.items():
        if not fs:
            continue  # skip optimizers with no recorded steps
        color = BASELINE_COLORS.get(name, "#ffffff")
        ax.plot(range(len(fs)), fs, color=color, linewidth=2.0,
                alpha=0.9, label=name)
    # symlog (linear within |f| <= 1) copes with landscapes whose f can go
    # negative, where a plain log scale would fail.
    ax.set_yscale("symlog", linthresh=1.0)
    ax.set_xlabel("step"); ax.set_ylabel("f(x) (symlog)")
    ax.set_title(title)
    ax.grid(alpha=0.3)
    ax.legend(loc="upper right", framealpha=0.9)
    fig.tight_layout()
    return fig
|
| 127 |
+
|
| 128 |
+
|
| 129 |
+
def _bar_plot(values: dict[str, float], title: str, ylabel: str,
              invert: bool = False):
    """Bar chart comparing one scalar per optimizer (e.g. final f).

    Args:
        values: {optimizer name: scalar value}.
        title: chart title.
        ylabel: y-axis label.
        invert: flip the y-axis when lower values should appear taller.

    Returns:
        A matplotlib Figure.
    """
    fig, ax = plt.subplots(figsize=(6, 3.2))
    labels = list(values)
    heights = [values[label] for label in labels]
    palette = [BASELINE_COLORS.get(label, "#9ca3af") for label in labels]
    for rect, height in zip(ax.bar(labels, heights, color=palette), heights):
        # Annotate each bar with its value, centred just above the bar top.
        ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height(),
                f"{height:.3g}", ha="center", va="bottom", fontsize=9)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    if invert:
        ax.invert_yaxis()
    ax.grid(alpha=0.3, axis="y")
    fig.tight_layout()
    return fig
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
# ---------- tab 1: Landscape Explorer ----------
|
| 150 |
+
|
| 151 |
+
def _explore_landscape(template: str, dim: int, seed: int):
    """Build the landscape, return a contour (if 2D) + structural hints table.

    Args:
        template: landscape family name (one of the BUILDERS keys).
        dim: requested dimensionality; forced to 2 for Himmelblau.
        seed: RNG seed for templates with random structure.

    Returns:
        (figure, hints_rows) — the contour/placeholder plot and a list of
        [key, value] rows for the hints table.
    """
    rng = np.random.default_rng(seed)
    # For templates that need random centers (gaussian_mix), pass rng.
    params: dict[str, Any] = {}
    if template == "quadratic":
        params = {"cond": 10.0}
    if template == "gaussian_mix":
        params = {"k": 3, "sigma": 0.5, "spread": 2.0}
    if template == "himmelblau":
        # Himmelblau is a classic 2D function; ignore the requested dim.
        dim = 2

    ls = build_landscape(template=template, dim=dim, params=params, rng=rng)
    hints = structural_hints(ls, rng=rng)

    # 2D contour when possible; otherwise a "not 2D" placeholder figure.
    if ls.dim == 2:
        fig = _contour_plot(ls, title=f"{template} (dim=2)")
    else:
        fig, ax = plt.subplots(figsize=(6.5, 5.5))
        ax.text(0.5, 0.5, f"{template} · dim={ls.dim}\n\nContour view is only rendered\nfor 2D landscapes.",
                ha="center", va="center", fontsize=12,
                color="#6b7280", transform=ax.transAxes)
        ax.set_axis_off()
        fig.tight_layout()

    # Format floats compactly; pass other values through unchanged.
    hints_rows = [[k, f"{v}" if not isinstance(v, float) else f"{v:.4g}"]
                  for k, v in hints.items()]
    hints_rows.append(["dim", ls.dim])
    hints_rows.append(["f_min (known)", f"{ls.f_min:.4g}"])
    hints_rows.append(["description", ls.description])

    return fig, hints_rows
|
| 184 |
+
|
| 185 |
+
|
| 186 |
+
# ---------- tab 2: Baseline Race ----------
|
| 187 |
+
|
| 188 |
+
def _baseline_race(template: str, seed: int):
    """Run all 4 baselines + Adam-tuned from the same init; return contour + curves.

    Args:
        template: landscape family name (always built at dim=2 here).
        seed: RNG seed for both the landscape and the shared initial point.

    Returns:
        (contour_fig, curves_fig, finals_fig, summary_markdown).
    """
    rng = np.random.default_rng(seed)
    params: dict[str, Any] = {}
    dim = 2  # race is always rendered on a 2D landscape
    if template == "quadratic":
        params = {"cond": 10.0}
    if template == "gaussian_mix":
        params = {"k": 3, "sigma": 0.5, "spread": 2.0}
    ls = build_landscape(template=template, dim=dim, params=params, rng=rng)

    # Independent RNG stream for the start point so it doesn't perturb the
    # landscape construction above.
    init_rng = np.random.default_rng(seed + 999)
    x0 = init_rng.normal(0.0, 0.5, size=dim)

    # Tune Adam LR per-landscape for fair comparison.
    best_lr = tune_adam_lr(ls.f, ls.grad, x0, sweep_steps=30)

    # Run each baseline; collect 2D positions for the contour + f-curves for the plot.
    traj_2d: dict[str, list[tuple[float, float]]] = {}
    curves: dict[str, list[float]] = {}
    finals: dict[str, float] = {}
    for name in ["sgd", "momentum", "adam", "lbfgs"]:
        r = run_baseline(name, ls.f, ls.grad, x0, steps=50)
        traj = [s for s in r["trajectory"] if s.get("x") is not None]
        traj_2d[name] = [(s["x"][0], s["x"][1]) for s in traj]
        curves[name] = [s["f"] for s in traj if s.get("f") is not None]
        # Missing curve = baseline produced no valid steps; rank it last.
        finals[name] = curves[name][-1] if curves[name] else float("inf")

    # Also run Adam with the tuned LR to show it dominates default-Adam.
    # The source is compiled through the same sandbox used for user drafts.
    adam_tuned_src = (
        "class Optimizer:\n"
        "    def __init__(self, dim):\n"
        f"        self.lr = {best_lr}\n"
        "        self.b1 = 0.9; self.b2 = 0.999; self.eps = 1e-8\n"
        "        self.m = np.zeros(dim); self.v = np.zeros(dim); self.t = 0\n"
        "    def step(self, x, f_val, g):\n"
        "        self.t += 1\n"
        "        self.m = self.b1*self.m + (1-self.b1)*g\n"
        "        self.v = self.b2*self.v + (1-self.b2)*g*g\n"
        "        mh = self.m/(1-self.b1**self.t); vh = self.v/(1-self.b2**self.t)\n"
        "        return x - self.lr * mh / (np.sqrt(vh) + self.eps)\n"
    )
    try:
        tuned_opt = compile_optimizer(adam_tuned_src, dim=dim)
        xt = x0.copy()
        traj_tuned = [(xt[0], xt[1])]
        curve_tuned = [float(ls.f(xt))]
        for _ in range(50):
            g = np.asarray(ls.grad(xt), dtype=float)
            xt = tuned_opt.step(xt, float(ls.f(xt)), g)
            traj_tuned.append((xt[0], xt[1]))
            curve_tuned.append(float(ls.f(xt)))
        label = f"adam(tuned lr={best_lr:g})"
        traj_2d[label] = traj_tuned
        curves[label] = curve_tuned
        finals[label] = curve_tuned[-1]
    except SandboxError:
        # Tuned-Adam is a bonus overlay; if the sandbox rejects it, the
        # four default baselines are still shown.
        pass

    contour = _contour_plot(ls, trajectories=traj_2d,
                            title=f"{template} — baselines racing")
    curves_plot = _loss_curves_plot(curves, f"f(x) vs step (log)")
    finals_plot = _bar_plot(finals,
                            title="Final f after 50 steps (lower = better)",
                            ylabel="f", invert=False)

    summary = (
        f"**Landscape:** {ls.description}\n\n"
        f"**Tuned Adam LR:** `{best_lr:g}` (swept over a 7-point grid per-landscape)\n\n"
        f"**Best baseline** on this run: `{min(finals, key=finals.get)}` "
        f"with final f = `{min(finals.values()):.4f}`"
    )
    return contour, curves_plot, finals_plot, summary
|
| 261 |
+
|
| 262 |
+
|
| 263 |
+
# ---------- tab 3: Optimizer Arena ----------
|
| 264 |
+
|
| 265 |
+
SAMPLE_OPTIMIZER_CODE = """
|
| 266 |
+
class Optimizer:
|
| 267 |
+
def __init__(self, dim):
|
| 268 |
+
self.lr = 0.05
|
| 269 |
+
self.beta = 0.9
|
| 270 |
+
self.v = np.zeros(dim)
|
| 271 |
+
|
| 272 |
+
def step(self, x, f_val, grad):
|
| 273 |
+
# SGD with Nesterov-ish momentum
|
| 274 |
+
self.v = self.beta * self.v - self.lr * grad
|
| 275 |
+
return x + self.v
|
| 276 |
+
""".strip()
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
def _arena_compare(template: str, dim: int, seed: int, code: str):
    """Run user-submitted optimizer + tuned-Adam on the full arena; return plots.

    Returns a 5-tuple consumed by the Gradio tab:
    (contour_fig, progress_fig, reward_fig, summary_markdown, breakdown_dict).
    On a compile failure of the user's code, the three figures are None and
    the summary carries the error text.
    """
    rng = np.random.default_rng(seed)
    # Per-template parameter overrides; himmelblau is defined for 2D only.
    params: dict[str, Any] = {}
    if template == "quadratic":
        params = {"cond": 10.0}
    if template == "gaussian_mix":
        params = {"k": 3, "sigma": 0.5, "spread": 2.0}
    if template == "himmelblau":
        dim = 2

    ls = build_landscape(template=template, dim=dim, params=params, rng=rng)

    # Tune Adam LR for the baseline
    # NOTE(review): tuning uses a fixed seed-0 start so the baseline LR is
    # stable across user runs regardless of the chosen seed.
    tune_x0 = np.random.default_rng(0).normal(0.0, 0.5, size=dim)
    best_lr = tune_adam_lr(ls.f, ls.grad, tune_x0, sweep_steps=30)

    # Compile user code
    try:
        opt = compile_optimizer(code, dim=dim)
    except SandboxError as e:
        return None, None, None, f"### ❌ Compile error\n\n```\n{e}\n```", {}

    # Single-seed auto-test for quick trajectory view
    test = auto_test_draft(opt, ls, seed=seed, steps=20)

    # Full arena for user's optimizer
    user_arena = run_arena(opt, ls, seeds=[101, 202, 303, 404, 505, 606, 707, 808, 909, 1010],
                           steps=200)

    # Full arena for tuned Adam
    # The baseline is compiled through the same sandbox path as user code so
    # both sides pay identical overhead; best_lr is baked into the source.
    ADAM_TEMPLATE = f"""
class Optimizer:
    def __init__(self, dim):
        self.lr={best_lr}; self.b1=0.9; self.b2=0.999; self.eps=1e-8
        self.m = np.zeros(dim); self.v = np.zeros(dim); self.t = 0
    def step(self, x, f_val, grad):
        self.t += 1
        self.m = self.b1*self.m + (1-self.b1)*grad
        self.v = self.b2*self.v + (1-self.b2)*grad*grad
        mh = self.m/(1-self.b1**self.t)
        vh = self.v/(1-self.b2**self.t)
        return x - self.lr * mh / (np.sqrt(vh) + self.eps)
""".strip()
    adam_opt = compile_optimizer(ADAM_TEMPLATE, dim=dim)
    adam_arena = run_arena(adam_opt, ls, seeds=[101, 202, 303, 404, 505, 606, 707, 808, 909, 1010],
                           steps=200)

    reward = compute_optcoder_reward(
        arena=user_arena,
        adam_arena=adam_arena,
        actions_used_cost=0,  # not relevant outside an episode
        budget_total=12,
        novelty_score=ast_novelty_score(code, [ADAM_TEMPLATE]),
        convergence_step=None,
        arena_steps=200,
    )

    # 2D contour if applicable
    if dim == 2:
        user_traj = [(s["x"][0], s["x"][1]) for s in test["detail"]]
        adam_run = run_baseline("adam", ls.f, ls.grad,
                                np.random.default_rng(seed).normal(0.0, 0.5, 2), steps=50)
        # Crashed steps record x=None; drop them from the plotted trajectory.
        adam_traj = [(s["x"][0], s["x"][1]) for s in adam_run["trajectory"]
                     if s.get("x") is not None]
        contour = _contour_plot(
            ls,
            trajectories={"custom": user_traj, "adam": adam_traj},
            title=f"{template} — your optimizer vs tuned Adam",
        )
    else:
        # Placeholder figure for dims where a contour view is meaningless.
        fig, ax = plt.subplots(figsize=(6.5, 5.5))
        ax.text(0.5, 0.5, f"{template} · dim={dim}\nContour view only for 2D",
                ha="center", va="center", fontsize=12,
                color="#6b7280", transform=ax.transAxes)
        ax.set_axis_off()
        contour = fig

    # Progress bar plot
    progress = {
        "custom": user_arena.mean_progress,
        "adam (tuned)": adam_arena.mean_progress,
    }
    progress_plot = _bar_plot(progress,
                              title="Arena mean progress (higher = better)",
                              ylabel="mean(f_initial - f_final) across 10 seeds")

    # Reward breakdown plot
    # Penalty terms are negated so "up = good" holds for every bar.
    bk = reward.breakdown
    components = {
        "r_regret": bk["r_regret"],
        "r_convergence": bk["r_convergence"],
        "r_robustness": bk["r_robustness"],
        "r_novelty": bk["r_novelty"],
        "-r_budget": -bk["r_budget"],
        "-r_eval_fail": -bk["r_eval_failures"],
    }
    fig, ax = plt.subplots(figsize=(7, 3.2))
    colors = ["#10b981" if v >= 0 else "#ef4444" for v in components.values()]
    bars = ax.bar(list(components.keys()), list(components.values()), color=colors)
    for bar, v in zip(bars, components.values()):
        # Value label above positive bars, below negative ones.
        ax.text(bar.get_x() + bar.get_width() / 2,
                bar.get_height() + (0.02 if v >= 0 else -0.06),
                f"{v:+.3f}", ha="center",
                va="bottom" if v >= 0 else "top", fontsize=9)
    ax.axhline(0, color="black", linewidth=0.5)
    ax.set_title(f"Reward breakdown · total = {reward.r_total:+.3f}")
    ax.grid(alpha=0.3, axis="y")
    fig.tight_layout()
    reward_plot = fig

    summary = (
        f"### Summary\n\n"
        f"- **Your progress (mean over 10 seeds):** `{user_arena.mean_progress:.4g}`\n"
        f"- **Tuned Adam's progress:** `{adam_arena.mean_progress:.4g}` (lr={best_lr:g})\n"
        f"- **Speedup vs Adam:** `{bk.get('speedup_vs_adam', 0):.3g}×`\n"
        f"- **Your crash fraction:** `{user_arena.crash_fraction:.0%}`\n"
        f"- **Total reward:** `{reward.r_total:+.3f}`"
    )

    return contour, progress_plot, reward_plot, summary, bk
|
| 400 |
+
|
| 401 |
+
|
| 402 |
+
# ---------- top-level UI ----------
|
| 403 |
+
|
| 404 |
+
ABOUT_MD = """
|
| 405 |
+
# LandscapeForge
|
| 406 |
+
|
| 407 |
+
An OpenEnv environment where an LLM agent designs optimization algorithms
|
| 408 |
+
through a probe-draft-commit REPL. Two agents co-evolve: one writes
|
| 409 |
+
optimizer code, the other picks adversarial landscapes.
|
| 410 |
+
|
| 411 |
+
**How it works:**
|
| 412 |
+
|
| 413 |
+
1. LandscapeForge picks a loss landscape `f : ℝⁿ → ℝ` (quadratic, Rosenbrock,
|
| 414 |
+
Styblinski-Tang, …) at a difficulty tier calibrated to the agent's skill.
|
| 415 |
+
2. The OptCoder agent runs the REPL: `run_baseline` reference optimizers to
|
| 416 |
+
observe behaviour, `draft` candidate `Optimizer` classes (env auto-tests
|
| 417 |
+
them), `inspect` prior drafts to diagnose, `commit` when satisfied.
|
| 418 |
+
3. Phase D — full evaluation on 10 fresh seeds × 200 steps. Reward is
|
| 419 |
+
**Adam-relative progress** (no `f_min` dependency — generalizes to NNs).
|
| 420 |
+
4. GRPO trains both agents against each other, Goldilocks-regulated so
|
| 421 |
+
difficulty tracks skill.
|
| 422 |
+
|
| 423 |
+
**This demo** lets you explore the env:
|
| 424 |
+
- **Landscape Explorer** — pick a template, see what the agent sees.
|
| 425 |
+
- **Baseline Race** — see how SGD / Momentum / Adam (tuned) / L-BFGS
|
| 426 |
+
actually perform on each landscape.
|
| 427 |
+
- **Optimizer Arena** — paste a custom `Optimizer` class, run it through
|
| 428 |
+
the full arena, see the reward breakdown vs tuned Adam.
|
| 429 |
+
|
| 430 |
+
The full env is also available via the FastAPI endpoint at `/step`, `/reset`,
|
| 431 |
+
`/schema` — wire it into any TRL/Unsloth GRPO training loop.
|
| 432 |
+
|
| 433 |
+
**Links:** [Design doc](./) · [Paper anchors: Lion, FunSearch, GenEnv] ·
|
| 434 |
+
[Source]
|
| 435 |
+
"""
|
| 436 |
+
|
| 437 |
+
|
| 438 |
+
def build_ui(*args, **kwargs) -> gr.Blocks:
    """Entry point consumed by `create_app(..., gradio_builder=build_ui)`.

    Accepts any args/kwargs that OpenEnv forwards; ignores them since this
    demo operates on its own in-process env instances.

    Builds a three-tab demo (Landscape Explorer, Baseline Race, Optimizer
    Arena) plus an About tab, and returns the assembled `gr.Blocks` app.
    """
    with gr.Blocks(title="LandscapeForge",
                   theme=gr.themes.Soft(primary_hue="violet"),
                   css=".gr-box{padding:1em;}") as app:
        gr.Markdown("# 🏔️ LandscapeForge\n"
                    "**An LLM agent designing optimizers through a probe-draft-commit REPL.**")

        with gr.Tabs():
            # --- Tab 1 ---
            with gr.Tab("🌄 Landscape Explorer"):
                gr.Markdown("Pick a landscape template; see what structural "
                            "hints the env shows the OptCoder agent.")
                with gr.Row():
                    with gr.Column(scale=1):
                        tmpl = gr.Dropdown(
                            choices=TEMPLATES_2D_SAFE,
                            value="rosenbrock",
                            label="Template",
                        )
                        dim = gr.Slider(2, 10, value=2, step=1, label="Dim")
                        seed = gr.Slider(0, 100, value=0, step=1, label="Seed")
                        go1 = gr.Button("Build landscape", variant="primary")
                    with gr.Column(scale=2):
                        plot1 = gr.Plot(label="Contour (2D only)")
                        hints1 = gr.Dataframe(
                            headers=["property", "value"],
                            datatype=["str", "str"],
                            label="Structural hints (shown to the agent at reset)",
                            wrap=True,
                        )

                go1.click(_explore_landscape, [tmpl, dim, seed],
                          [plot1, hints1])
                # Fire once on load so the UI isn't empty
                app.load(_explore_landscape,
                         [gr.State("rosenbrock"), gr.State(2), gr.State(0)],
                         [plot1, hints1])

            # --- Tab 2 ---
            with gr.Tab("🏁 Baseline Race"):
                gr.Markdown("Watch SGD / Momentum / Adam (tuned per-landscape) / "
                            "L-BFGS race to the minimum on the same initial point.")
                with gr.Row():
                    tmpl2 = gr.Dropdown(
                        choices=TEMPLATES_2D_SAFE,
                        value="rosenbrock",
                        label="Template (dim=2 for contour)",
                    )
                    seed2 = gr.Slider(0, 100, value=1, step=1, label="Seed")
                    go2 = gr.Button("🏁 Race!", variant="primary")
                with gr.Row():
                    plot2a = gr.Plot(label="Contour + trajectories")
                with gr.Row():
                    plot2b = gr.Plot(label="f(x) vs step")
                    plot2c = gr.Plot(label="Final f (50 steps)")
                summary2 = gr.Markdown()

                go2.click(_baseline_race, [tmpl2, seed2],
                          [plot2a, plot2b, plot2c, summary2])

            # --- Tab 3 ---
            with gr.Tab("⚔️ Optimizer Arena"):
                gr.Markdown(
                    "Paste (or edit the sample) an `Optimizer` class, and "
                    "we'll run it through the full Phase-D arena against "
                    "tuned-Adam on the chosen landscape. **No `import` needed — "
                    "`np` is pre-injected.**"
                )
                with gr.Row():
                    with gr.Column(scale=1):
                        # Arena tab exposes the full template menu, not just
                        # the 2D-safe subset used by the explorer tabs.
                        tmpl3 = gr.Dropdown(
                            choices=list(BUILDERS.keys()),
                            value="quadratic",
                            label="Template",
                        )
                        dim3 = gr.Slider(2, 10, value=5, step=1, label="Dim")
                        seed3 = gr.Slider(0, 100, value=42, step=1, label="Seed")
                        go3 = gr.Button("⚔️ Run arena", variant="primary")
                    with gr.Column(scale=2):
                        code3 = gr.Code(
                            value=SAMPLE_OPTIMIZER_CODE,
                            language="python",
                            label="Your Optimizer class",
                            lines=14,
                        )

                with gr.Row():
                    plot3a = gr.Plot(label="2D trajectory (if dim=2)")
                    plot3b = gr.Plot(label="Mean arena progress (higher = better)")
                    plot3c = gr.Plot(label="Reward breakdown vs tuned Adam")
                summary3 = gr.Markdown()
                breakdown3 = gr.JSON(label="Full reward breakdown")

                go3.click(_arena_compare, [tmpl3, dim3, seed3, code3],
                          [plot3a, plot3b, plot3c, summary3, breakdown3])

            # --- About ---
            with gr.Tab("📖 About"):
                gr.Markdown(ABOUT_MD)

    return app
|
landscapes.py
ADDED
|
@@ -0,0 +1,273 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Landscape template library (v1 template-picker LandscapeForge).
|
| 2 |
+
|
| 3 |
+
Each template has a hand-written analytic gradient — no autodiff required.
|
| 4 |
+
All templates are guaranteed differentiable, finite, and bounded below on
|
| 5 |
+
the typical init region x ~ N(0, 0.5^2 I).
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
from dataclasses import dataclass, field
|
| 9 |
+
from typing import Callable, Literal
|
| 10 |
+
|
| 11 |
+
import numpy as np
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
TemplateName = Literal[
|
| 15 |
+
"quadratic", "styblinski_tang", "huber",
|
| 16 |
+
"gaussian_mix", "himmelblau",
|
| 17 |
+
"rosenbrock", "stiff_quadratic", "plateau", "cliff",
|
| 18 |
+
]
|
| 19 |
+
|
| 20 |
+
Tier = Literal["T0", "T1", "T2"]
|
| 21 |
+
|
| 22 |
+
TIER_MENU: dict[str, list[str]] = {
|
| 23 |
+
"T0": ["quadratic", "styblinski_tang", "huber"],
|
| 24 |
+
"T1": ["quadratic", "styblinski_tang", "huber", "gaussian_mix", "himmelblau"],
|
| 25 |
+
"T2": ["quadratic", "styblinski_tang", "huber", "gaussian_mix", "himmelblau",
|
| 26 |
+
"rosenbrock", "stiff_quadratic", "plateau", "cliff"],
|
| 27 |
+
}
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
@dataclass
class Landscape:
    """A differentiable scalar objective plus metadata, as handed to agents."""

    name: str  # template name, e.g. "quadratic"
    dim: int  # dimensionality of the domain R^dim
    params: dict  # template-specific parameters used to build f/grad
    f: Callable[[np.ndarray], float]  # objective value at a point
    grad: Callable[[np.ndarray], np.ndarray]  # analytic gradient at a point
    f_min: float = 0.0  # known global minimum value
    description: str = ""  # human-readable summary shown in prompts/UI
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
# ---------- template constructors ----------
|
| 42 |
+
|
| 43 |
+
def make_quadratic(dim: int, cond: float = 1.0, rng: np.random.Generator | None = None) -> Landscape:
    """Convex quadratic f(x) = 0.5 * x^T A x.

    A is diagonal with eigenvalues linearly spaced from 1 to `cond`, so the
    Hessian condition number is exactly `cond`. Global minimum is 0 at x = 0.
    """
    eigenvalues = np.linspace(1.0, float(cond), dim)
    A = np.diag(eigenvalues)

    def objective(x):
        return float(0.5 * x @ A @ x)

    def gradient(x):
        return A @ x

    return Landscape(
        name="quadratic",
        dim=dim,
        params={"cond": cond},
        f=objective,
        grad=gradient,
        f_min=0.0,
        description=f"Convex quadratic in R^{dim}, condition number {cond:.1f}.",
    )
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
def make_stiff_quadratic(dim: int, cond: float = 1000.0, **_) -> Landscape:
    """Ill-conditioned alias of `make_quadratic` (default cond = 1000)."""
    return make_quadratic(dim, cond)
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
def make_styblinski_tang(dim: int, **_) -> Landscape:
    """Styblinski-Tang: f(x) = 0.5 * sum(x^4 - 16 x^2 + 5 x).

    Multimodal; the global minimum sits at x_i ≈ -2.903534 in every coordinate.
    """
    x_star = -2.903534  # per-coordinate global-minimum location

    def objective(x):
        return float(0.5 * np.sum(x**4 - 16.0 * x**2 + 5.0 * x))

    def gradient(x):
        return 0.5 * (4.0 * x**3 - 32.0 * x + 5.0)

    # The objective is separable, so f_min is dim times the 1-D minimum value.
    per_dim_min = 0.5 * (x_star**4 - 16.0 * x_star**2 + 5.0 * x_star)
    return Landscape(
        name="styblinski_tang",
        dim=dim,
        params={},
        f=objective,
        grad=gradient,
        f_min=float(dim * per_dim_min),
        description=f"Styblinski-Tang in R^{dim}, multimodal with global min at x_i ≈ -2.9.",
    )
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
def make_huber(dim: int, delta: float = 1.0, **_) -> Landscape:
    """Smooth Huber-ish loss: f(x) = sum(delta^2 * (sqrt(1 + (x/delta)^2) - 1)).

    Smooth everywhere (unlike piecewise Huber). Behaves like 0.5 x^2 near 0,
    linear for |x| >> delta.
    """
    def objective(x):
        scaled_sq = (x / delta) ** 2
        return float(np.sum(delta**2 * (np.sqrt(1.0 + scaled_sq) - 1.0)))

    def gradient(x):
        return x / np.sqrt(1.0 + (x / delta) ** 2)

    return Landscape(
        name="huber",
        dim=dim,
        params={"delta": delta},
        f=objective,
        grad=gradient,
        f_min=0.0,
        description=f"Smooth pseudo-Huber in R^{dim}, delta={delta}.",
    )
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
def make_rosenbrock(dim: int, **_) -> Landscape:
    """Classic stiff-valley Rosenbrock; global minimum 0 at (1, ..., 1)."""
    assert dim >= 2

    def objective(x):
        head, tail = x[:-1], x[1:]
        return float(np.sum(100.0 * (tail - head**2) ** 2 + (1.0 - head) ** 2))

    def gradient(x):
        head, tail = x[:-1], x[1:]
        g = np.zeros_like(x)
        # Each interior coordinate appears in two consecutive terms; the two
        # accumulations below cover both contributions.
        g[:-1] += -400.0 * head * (tail - head**2) - 2.0 * (1.0 - head)
        g[1:] += 200.0 * (tail - head**2)
        return g

    return Landscape(
        name="rosenbrock",
        dim=dim,
        params={},
        f=objective,
        grad=gradient,
        f_min=0.0,
        description=f"Rosenbrock (curved stiff valley) in R^{dim}, min at (1,..,1).",
    )
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
def make_gaussian_mix(dim: int, k: int = 3, sigma: float = 0.5, spread: float = 2.0,
                      rng: np.random.Generator | None = None, **_) -> Landscape:
    """f(x) = -sum_j w_j * exp(-||x - c_j||^2 / (2 sigma^2)).

    Negated so minima are where mixture density is highest. Bounded below by 0.

    Fix vs previous revision: the original defined a broken `grad` closure
    that returned the per-component (k, dim) matrix without summing over
    components, then shadowed it with `grad_correct`. The dead, buggy
    function has been removed; only the correct, summed gradient remains.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    centers = rng.normal(0.0, spread, size=(k, dim))
    weights = np.ones(k) / k  # uniform; one of these is the "global" min
    s2 = sigma * sigma

    def f(x):
        d2 = np.sum((centers - x) ** 2, axis=1)  # (k,)
        return float(-np.sum(weights * np.exp(-d2 / (2.0 * s2))))

    def grad(x):
        # d/dx [-w_j exp(-||x-c_j||^2 / 2s^2)] = w_j * (x-c_j) / s^2 * exp(...),
        # summed over the k mixture components.
        diffs = x - centers                      # (k, dim)
        d2 = np.sum(diffs**2, axis=1)            # (k,)
        e = np.exp(-d2 / (2.0 * s2))             # (k,)
        coeff = weights * e / s2                 # (k,)
        return np.sum(coeff[:, None] * diffs, axis=0)

    # Global min approx: evaluate at each center, return the lowest f.
    f_min = float(min(f(c) for c in centers))
    return Landscape(
        name="gaussian_mix",
        dim=dim,
        params={"k": k, "sigma": sigma, "spread": spread, "centers": centers.tolist()},
        f=f,
        grad=grad,
        f_min=f_min,
        description=f"Negative Gaussian mixture in R^{dim}, k={k} modes, sigma={sigma}, spread={spread}.",
    )
|
| 156 |
+
|
| 157 |
+
|
| 158 |
+
def make_himmelblau(dim: int = 2, **_) -> Landscape:
    """f(x,y) = (x^2+y-11)^2 + (x+y^2-7)^2. Four global minima, all at value 0."""
    assert dim == 2, "Himmelblau is 2D only"

    def objective(p):
        a = p[0] ** 2 + p[1] - 11.0
        b = p[0] + p[1] ** 2 - 7.0
        return float(a * a + b * b)

    def gradient(p):
        a = p[0] ** 2 + p[1] - 11.0
        b = p[0] + p[1] ** 2 - 7.0
        return np.array([4.0 * p[0] * a + 2.0 * b,
                         2.0 * a + 4.0 * p[1] * b])

    return Landscape(
        name="himmelblau",
        dim=2,
        params={},
        f=objective,
        grad=gradient,
        f_min=0.0,
        description="Himmelblau (2D), four global minima at value 0.",
    )
|
| 174 |
+
|
| 175 |
+
|
| 176 |
+
def make_plateau(dim: int, radius: float = 1.0, **_) -> Landscape:
    """Smooth plateau: f = tanh((||x||^2 - r^2) / r^2).

    Gradient is near-zero far from the boundary shell, which starves
    fixed-step gradient methods.
    """
    r2 = radius * radius

    def objective(x):
        return float(np.tanh((np.sum(x**2) - r2) / r2))

    def gradient(x):
        u = (np.sum(x**2) - r2) / r2
        # d/dx tanh(u(x)) = sech^2(u) * u'(x), with u'(x) = 2x / r^2.
        return (1.0 - np.tanh(u) ** 2) * (2.0 * x / r2)

    return Landscape(
        name="plateau",
        dim=dim,
        params={"radius": radius},
        f=objective,
        grad=gradient,
        f_min=-1.0,
        description=f"Plateau landscape in R^{dim}, radius {radius}, vanishing gradient at center.",
    )
|
| 192 |
+
|
| 193 |
+
|
| 194 |
+
def make_cliff(dim: int, **_) -> Landscape:
    """Smooth cliff: quadratic bowl plus a tanh step along sum(x).

    The steep tanh wall punishes fixed-step optimizers that overshoot.
    """
    def objective(x):
        return float(0.5 * np.sum(x**2) + 5.0 * np.tanh(np.sum(x)))

    def gradient(x):
        sech2 = 1.0 - np.tanh(np.sum(x)) ** 2  # derivative of tanh at sum(x)
        return x + 5.0 * sech2 * np.ones_like(x)

    return Landscape(
        name="cliff",
        dim=dim,
        params={},
        f=objective,
        grad=gradient,
        f_min=-5.0,  # approximate lower bound
        description=f"Quadratic with tanh cliff in R^{dim}.",
    )
|
| 210 |
+
|
| 211 |
+
|
| 212 |
+
BUILDERS: dict[str, Callable[..., Landscape]] = {
|
| 213 |
+
"quadratic": make_quadratic,
|
| 214 |
+
"stiff_quadratic": make_stiff_quadratic,
|
| 215 |
+
"styblinski_tang": make_styblinski_tang,
|
| 216 |
+
"huber": make_huber,
|
| 217 |
+
"rosenbrock": make_rosenbrock,
|
| 218 |
+
"gaussian_mix": make_gaussian_mix,
|
| 219 |
+
"himmelblau": make_himmelblau,
|
| 220 |
+
"plateau": make_plateau,
|
| 221 |
+
"cliff": make_cliff,
|
| 222 |
+
}
|
| 223 |
+
|
| 224 |
+
|
| 225 |
+
def build_landscape(template: str, dim: int, params: dict | None = None,
                    rng: np.random.Generator | None = None) -> Landscape:
    """Instantiate a landscape by name.

    Raises ValueError for an unknown template name.
    """
    if template in BUILDERS:
        return BUILDERS[template](dim=dim, rng=rng, **(params or {}))
    raise ValueError(f"Unknown template {template!r}; known: {list(BUILDERS)}")
|
| 231 |
+
|
| 232 |
+
|
| 233 |
+
def structural_hints(ls: Landscape, n_samples: int = 200,
                     rng: np.random.Generator | None = None) -> dict:
    """Env-computed hints: Lipschitz estimate, gradient spread, modality hint.

    Sampled at reset, exposed to OptCoder as free info.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    samples = rng.normal(0.0, 1.0, size=(n_samples, ls.dim))
    f_vals = np.array([ls.f(p) for p in samples])
    grads = np.array([ls.grad(p) for p in samples])
    grad_norms = np.linalg.norm(grads, axis=1)
    return {
        # 95th-percentile gradient norm as a robust Lipschitz proxy.
        "lipschitz_estimate": float(np.percentile(grad_norms, 95)),
        "grad_norm_median": float(np.median(grad_norms)),
        "f_range": [float(f_vals.min()), float(f_vals.max())],
        "f_median": float(np.median(f_vals)),
        # crude modality: count local f peaks on random 1D slices
        "modality_hint": _modality_hint(ls, rng),
    }
|
| 252 |
+
|
| 253 |
+
|
| 254 |
+
def _modality_hint(ls: Landscape, rng: np.random.Generator) -> str:
|
| 255 |
+
"""Very crude multimodality probe on 5 random 1D slices."""
|
| 256 |
+
hits = 0
|
| 257 |
+
for _ in range(5):
|
| 258 |
+
center = rng.normal(0.0, 0.5, size=ls.dim)
|
| 259 |
+
direction = rng.normal(0.0, 1.0, size=ls.dim)
|
| 260 |
+
direction /= np.linalg.norm(direction) + 1e-12
|
| 261 |
+
ts = np.linspace(-3.0, 3.0, 30)
|
| 262 |
+
vals = np.array([ls.f(center + t * direction) for t in ts])
|
| 263 |
+
# count sign changes in finite diff
|
| 264 |
+
d = np.diff(vals)
|
| 265 |
+
s = np.sign(d)
|
| 266 |
+
sign_changes = int(np.sum(s[1:] != s[:-1]))
|
| 267 |
+
if sign_changes >= 3:
|
| 268 |
+
hits += 1
|
| 269 |
+
if hits >= 3:
|
| 270 |
+
return "multimodal"
|
| 271 |
+
if hits >= 1:
|
| 272 |
+
return "possibly_multimodal"
|
| 273 |
+
return "unimodal"
|
models.py
ADDED
|
@@ -0,0 +1,95 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Data models for the LandscapeForge environment.
|
| 2 |
+
|
| 3 |
+
OptCoder actions are modelled as a single unified Action with a `kind`
|
| 4 |
+
discriminator. Fields are optional per-kind and validated by a model
|
| 5 |
+
validator so the HTTP envelope stays flat and easy to serialize.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
from typing import Any, Literal, Optional
|
| 9 |
+
|
| 10 |
+
from openenv.core.env_server.types import Action, Observation
|
| 11 |
+
from pydantic import Field, model_validator
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
ActionKind = Literal["run_baseline", "draft", "inspect", "commit"]
|
| 15 |
+
BaselineName = Literal["sgd", "adam", "momentum", "lbfgs"]
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
# Per-action budget costs (§7.1 of LANDSCAPEFORGE_DESIGN.md).
|
| 19 |
+
ACTION_COSTS: dict[str, int] = {
|
| 20 |
+
"run_baseline": 2,
|
| 21 |
+
"draft": 2,
|
| 22 |
+
"inspect": 1,
|
| 23 |
+
"commit": 0,
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
class LandscapeforgeAction(Action):
    """OptCoder REPL action.

    A single class covers all four action kinds; `kind` discriminates and
    a model validator ensures each kind has its required fields.
    """

    kind: ActionKind = Field(..., description="Which REPL action")

    # run_baseline fields
    baseline_name: Optional[BaselineName] = Field(
        default=None, description="Reference optimizer to run"
    )
    # Note: steps count is env-controlled (BASELINE_STEPS in the env) — the
    # agent does not choose it. Kept off the schema so the LLM never emits it.

    # draft fields
    code: Optional[str] = Field(
        default=None, description="Full Optimizer class source (for kind='draft')"
    )

    # inspect fields
    # Trajectory window [step_range_start, step_range_end); end is capped at 50
    # by the schema to bound payload size.
    draft_idx: Optional[int] = Field(
        default=None, ge=0, description="Which prior draft to inspect"
    )
    step_range_start: int = Field(default=0, ge=0)
    step_range_end: int = Field(default=20, ge=1, le=50)

    @model_validator(mode="after")
    def _check_kind_fields(self) -> "LandscapeforgeAction":
        """Reject actions missing the field their `kind` requires."""
        k = self.kind
        if k == "run_baseline" and self.baseline_name is None:
            raise ValueError("run_baseline requires baseline_name")
        if k == "draft" and not self.code:
            raise ValueError("draft requires code")
        if k == "inspect" and self.draft_idx is None:
            raise ValueError("inspect requires draft_idx")
        return self
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
class LandscapeforgeObservation(Observation):
    """OptCoder's view of env state after an action.

    Fields are self-describing strings/structured data that fit into an
    LLM prompt. Heavy trajectory data is JSON-serializable lists.
    """

    # Stable across the episode
    landscape_description: str = Field(default="")
    dim: int = Field(default=0)
    structural_hints: dict[str, Any] = Field(default_factory=dict)

    # REPL state (grows over the episode)
    baseline_history: list[dict[str, Any]] = Field(default_factory=list)
    draft_history: list[dict[str, Any]] = Field(default_factory=list)
    inspect_requests: list[dict[str, Any]] = Field(default_factory=list)

    # Source of the most recent draft, if any; budget decremented per action.
    current_draft: Optional[str] = Field(default=None)
    budget_remaining: int = Field(default=0)

    # Result of the immediate step
    last_action_kind: Optional[str] = Field(default=None)
    last_action_result: dict[str, Any] = Field(default_factory=dict)

    # Terminal info (only populated after commit / budget exhausted)
    committed: bool = Field(default=False)
    final_regret: Optional[float] = Field(default=None)
    r_optcoder: Optional[float] = Field(default=None)
    r_optcoder_breakdown: dict[str, float] = Field(default_factory=dict)
|
openenv.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
spec_version: 1
|
| 2 |
+
name: landscapeforge
|
| 3 |
+
type: space
|
| 4 |
+
runtime: fastapi
|
| 5 |
+
app: server.app:app
|
| 6 |
+
port: 8000
|
| 7 |
+
|
openenv_landscapeforge.egg-info/PKG-INFO
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
Metadata-Version: 2.4
|
| 2 |
+
Name: openenv-landscapeforge
|
| 3 |
+
Version: 0.1.0
|
| 4 |
+
Summary: Landscapeforge environment for OpenEnv
|
| 5 |
+
Requires-Python: >=3.10
|
| 6 |
+
Requires-Dist: openenv-core[core]>=0.2.2
|
| 7 |
+
Requires-Dist: numpy<3,>=1.26
|
| 8 |
+
Requires-Dist: scipy<2,>=1.11
|
| 9 |
+
Requires-Dist: requests<3,>=2.31
|
| 10 |
+
Requires-Dist: gradio<6,>=4.44
|
| 11 |
+
Requires-Dist: matplotlib<4,>=3.8
|
| 12 |
+
Requires-Dist: plotly<7,>=5.22
|
| 13 |
+
Provides-Extra: dev
|
| 14 |
+
Requires-Dist: pytest>=8.0.0; extra == "dev"
|
| 15 |
+
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
|
openenv_landscapeforge.egg-info/SOURCES.txt
ADDED
|
@@ -0,0 +1,32 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
README.md
|
| 2 |
+
__init__.py
|
| 3 |
+
arena.py
|
| 4 |
+
client.py
|
| 5 |
+
landscapes.py
|
| 6 |
+
models.py
|
| 7 |
+
prompts.py
|
| 8 |
+
pyproject.toml
|
| 9 |
+
reference_optimizers.py
|
| 10 |
+
rewards.py
|
| 11 |
+
run_llm_episode.py
|
| 12 |
+
sandbox.py
|
| 13 |
+
./__init__.py
|
| 14 |
+
./arena.py
|
| 15 |
+
./client.py
|
| 16 |
+
./landscapes.py
|
| 17 |
+
./models.py
|
| 18 |
+
./prompts.py
|
| 19 |
+
./reference_optimizers.py
|
| 20 |
+
./rewards.py
|
| 21 |
+
./run_llm_episode.py
|
| 22 |
+
./sandbox.py
|
| 23 |
+
openenv_landscapeforge.egg-info/PKG-INFO
|
| 24 |
+
openenv_landscapeforge.egg-info/SOURCES.txt
|
| 25 |
+
openenv_landscapeforge.egg-info/dependency_links.txt
|
| 26 |
+
openenv_landscapeforge.egg-info/entry_points.txt
|
| 27 |
+
openenv_landscapeforge.egg-info/requires.txt
|
| 28 |
+
openenv_landscapeforge.egg-info/top_level.txt
|
| 29 |
+
server/__init__.py
|
| 30 |
+
server/app.py
|
| 31 |
+
server/landscapeforge_environment.py
|
| 32 |
+
tests/test_episode.py
|
openenv_landscapeforge.egg-info/dependency_links.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
|
openenv_landscapeforge.egg-info/entry_points.txt
ADDED
|
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[console_scripts]
|
| 2 |
+
server = landscapeforge.server.app:main
|
openenv_landscapeforge.egg-info/requires.txt
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
openenv-core[core]>=0.2.2
|
| 2 |
+
numpy<3,>=1.26
|
| 3 |
+
scipy<2,>=1.11
|
| 4 |
+
requests<3,>=2.31
|
| 5 |
+
gradio<6,>=4.44
|
| 6 |
+
matplotlib<4,>=3.8
|
| 7 |
+
plotly<7,>=5.22
|
| 8 |
+
|
| 9 |
+
[dev]
|
| 10 |
+
pytest>=8.0.0
|
| 11 |
+
pytest-cov>=4.0.0
|
openenv_landscapeforge.egg-info/top_level.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
landscapeforge
|
prompts.py
ADDED
|
@@ -0,0 +1,267 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Observation → prompt rendering + LLM response → action parsing.
|
| 2 |
+
|
| 3 |
+
Keeps prompt format aligned with Appendix A of LANDSCAPEFORGE_DESIGN.md while
|
| 4 |
+
trimming obs fields that bloat tokens (e.g. full trajectories get summarised).
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
from __future__ import annotations
|
| 8 |
+
|
| 9 |
+
import json
|
| 10 |
+
import re
|
| 11 |
+
from typing import Any
|
| 12 |
+
|
| 13 |
+
from .models import LandscapeforgeAction, LandscapeforgeObservation
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
SYSTEM = """You are OptCoder. You will design an optimization algorithm for a
|
| 17 |
+
hidden landscape f: R^n → R by iteratively: running reference optimizers to
|
| 18 |
+
observe their behaviour, writing candidate `Optimizer` classes and seeing how
|
| 19 |
+
they perform, inspecting past drafts to diagnose failures, and committing when
|
| 20 |
+
you are satisfied.
|
| 21 |
+
|
| 22 |
+
How the episode ends:
|
| 23 |
+
- When you call `commit`, the env runs the full arena evaluation
|
| 24 |
+
(10 seeds × 200 steps) on your MOST RECENT draft and that becomes your
|
| 25 |
+
reward. This is the normal, preferred way to finish.
|
| 26 |
+
- If you never call `commit`, when your budget runs out the env will
|
| 27 |
+
automatically do the same thing — evaluate your most recent draft.
|
| 28 |
+
Your last draft is always what gets evaluated, whether you commit
|
| 29 |
+
explicitly or the budget runs out.
|
| 30 |
+
- So: make sure your last draft is the one you actually want evaluated.
|
| 31 |
+
If you improve a draft then change your mind, re-submit the good one
|
| 32 |
+
before ending the episode.
|
| 33 |
+
|
| 34 |
+
A typical good episode is ~4 turns:
|
| 35 |
+
draft → (maybe) inspect → (maybe) refine → commit.
|
| 36 |
+
|
| 37 |
+
Reply with a single JSON object — nothing else, no prose, no markdown.
|
| 38 |
+
|
| 39 |
+
JSON formatting rules (important, models frequently get this wrong):
|
| 40 |
+
- All strings use standard JSON double-quotes: "like this"
|
| 41 |
+
- Do NOT use Python triple-quoted strings \"\"\"...\"\"\" — they are NOT valid JSON
|
| 42 |
+
- For multi-line code, escape newlines as \\n inside the string value:
|
| 43 |
+
{"kind": "draft", "code": "class Optimizer:\\n def __init__(self, dim): ..."}
|
| 44 |
+
""".strip()
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
ACTION_SPEC = """
|
| 48 |
+
Available actions (cost charged against your budget):
|
| 49 |
+
|
| 50 |
+
run_baseline (cost 2) Run a reference optimizer on the hidden landscape.
|
| 51 |
+
JSON: {"kind": "run_baseline", "baseline_name": "sgd"|"momentum"|"adam"|"lbfgs"}
|
| 52 |
+
Returns a 30-step trajectory (x_t, f_t, grad_norm_t). Source code not revealed.
|
| 53 |
+
|
| 54 |
+
draft (cost 2) Submit a full Optimizer class; env auto-tests it.
|
| 55 |
+
JSON: {"kind": "draft", "code": "<python source>"}
|
| 56 |
+
The code MUST be a standalone class with no base class:
|
| 57 |
+
|
| 58 |
+
class Optimizer:
|
| 59 |
+
def __init__(self, dim):
|
| 60 |
+
...
|
| 61 |
+
def step(self, x, f_val, grad):
|
| 62 |
+
...
|
| 63 |
+
return x_new
|
| 64 |
+
|
| 65 |
+
Rules:
|
| 66 |
+
- Top-level line must be exactly: class Optimizer:
|
| 67 |
+
(no parent class — BaseOptimizer, nn.Module, object, etc. do NOT exist)
|
| 68 |
+
- Use only numpy as `np` and math — both pre-injected; DO NOT write import lines
|
| 69 |
+
- step(x, f_val, grad) must return a numpy array of shape (dim,)
|
| 70 |
+
- No I/O, no globals, no file operations
|
| 71 |
+
- Only the class definition is kept; demo code at module level is stripped
|
| 72 |
+
|
| 73 |
+
inspect (cost 1) Zoom into a prior draft's per-step behaviour.
|
| 74 |
+
JSON: {"kind": "inspect", "draft_idx": 0, "step_range_start": 10, "step_range_end": 20}
|
| 75 |
+
Returns per-step (x, f, grad, update_norm, step_size_eff).
|
| 76 |
+
|
| 77 |
+
commit (cost 0) Evaluate your most recent draft on the full arena.
|
| 78 |
+
JSON: {"kind": "commit"}
|
| 79 |
+
Preferred way to end the episode. Call it when you have a draft you
|
| 80 |
+
trust. If you don't call it, budget exhaustion triggers the same
|
| 81 |
+
evaluation on whatever your latest draft is — so your most recent
|
| 82 |
+
draft should always be your best one. Committing explicitly just
|
| 83 |
+
ends the episode sooner.
|
| 84 |
+
""".strip()
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
def render_observation(obs: LandscapeforgeObservation) -> str:
    """Turn an Observation into a compact prompt-friendly state summary.

    Token-bloat controls: baseline trajectories are condensed to
    head/mid/tail snapshots, draft code is truncated to its first 40 lines,
    and inspect slices to their first 8 rows.

    Args:
        obs: current environment observation.

    Returns:
        Multi-line plain-text state description for the user prompt.
    """
    lines: list[str] = []
    lines.append(f"Landscape: {obs.landscape_description}")
    lines.append(f"Dim: {obs.dim}")
    lines.append("Structural hints:")  # was a pointless f-string
    for k, v in (obs.structural_hints or {}).items():
        lines.append(f"  {k}: {_fmt(v)}")
    lines.append(f"Budget remaining: {obs.budget_remaining}")

    if obs.baseline_history:
        lines.append("\nBaseline runs (diagnostic trajectories):")
        for i, b in enumerate(obs.baseline_history):
            summary = _summarise_trajectory(b.get("trajectory", []))
            lines.append(f"  [{i}] {b['name']}: {summary}")

    if obs.draft_history:
        lines.append("\nDraft history:")
        for i, d in enumerate(obs.draft_history):
            if d.get("compile_error"):
                lines.append(f"  [{i}] COMPILE ERROR: {d['compile_error']}")
            else:
                # Fix: use .get() so a draft entry that has neither
                # `compile_error` nor `summary` cannot crash rendering with
                # a KeyError (other branches already use .get()).
                s = d.get("summary") or {}
                status = "CONVERGED" if s.get("converged") else (
                    "DIVERGED" if s.get("diverged") else "partial"
                )
                lines.append(
                    f"  [{i}] {status} | initial_f={_fmt(s.get('initial_f'))} "
                    f"final_f={_fmt(s.get('final_f'))} "
                    f"step_of_min={s.get('step_of_min')}"
                )
            code = d.get("code") or ""
            lines.append("  code:")
            for cl in code.splitlines()[:40]:  # first 40 lines only
                lines.append(f"    {cl}")

    if obs.inspect_requests:
        lines.append("\nInspect results:")
        for r in obs.inspect_requests:
            detail = r.get("detail") or []
            lines.append(
                f"  draft={r.get('draft_idx')} range={r.get('step_range')} "
                f"({len(detail)} steps)"
            )
            for d in detail[:8]:  # first 8 of the slice
                lines.append(
                    f"    t={d.get('t'):>3} f={_fmt(d.get('f'))} "
                    f"|g|={_fmt(d.get('grad_norm'))} "
                    f"|Δx|={_fmt(d.get('update_norm'))} "
                    f"η_eff={_fmt(d.get('step_size_eff'))}"
                )

    if obs.current_draft:
        lines.append(f"\nCurrent draft ({len(obs.current_draft)} chars) — will be evaluated on commit.")

    if obs.last_action_kind:
        lines.append(f"\nLast action: {obs.last_action_kind}")
        feedback = (obs.last_action_result or {}).get("feedback")
        if feedback:
            parts = ", ".join(f"{k}={_fmt(v)}" for k, v in feedback.items())
            lines.append(f"Step feedback: {parts} "
                         "(signals for your reasoning; not added to final reward)")

    return "\n".join(lines)
|
| 151 |
+
|
| 152 |
+
|
| 153 |
+
def build_prompt(obs: LandscapeforgeObservation) -> list[dict]:
    """Assemble the OpenAI-style chat message list for the next LLM turn."""
    user_content = (
        f"{ACTION_SPEC}\n\nCurrent state:\n{render_observation(obs)}\n\n"
        "Reply with a single JSON object for your next action."
    )
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_content},
    ]
|
| 161 |
+
|
| 162 |
+
|
| 163 |
+
# ---------- response → action ----------
|
| 164 |
+
|
| 165 |
+
_JSON_RE = re.compile(r"\{.*\}", re.DOTALL)
|
| 166 |
+
|
| 167 |
+
|
| 168 |
+
def parse_action(response_text: str) -> LandscapeforgeAction:
    """Extract the first JSON object from the LLM's reply and build an Action.

    Handles code-fenced JSON, raw JSON, and JSON embedded in prose. The
    common LLM failure mode — raw (unescaped) newlines/tabs inside string
    values, especially the `code` field of a `draft` — is repaired on a
    second parse attempt. Raises ValueError when nothing parseable remains.
    """
    text = response_text.strip()
    if text.startswith("```"):
        # Strip an optional ```json fence.
        text = re.sub(r"^```(?:json)?\n?", "", text)
        text = re.sub(r"\n?```\s*$", "", text)

    match = _JSON_RE.search(text)
    if match is None:
        raise ValueError(f"No JSON object in response: {response_text[:200]!r}")
    raw_json = match.group(0)

    try:
        # First pass: strict.
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        # Second pass: escape raw control chars inside string literals.
        repaired = _escape_string_controls(raw_json)
        try:
            data = json.loads(repaired)
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON even after control-char fix: {e}; "
                             f"raw: {raw_json[:200]!r}") from e

    if "kind" not in data:
        raise ValueError(f"Missing `kind`: {data}")

    return LandscapeforgeAction(**data)
|
| 203 |
+
|
| 204 |
+
|
| 205 |
+
def _escape_string_controls(s: str) -> str:
|
| 206 |
+
"""Escape raw newlines, carriage returns, and tabs inside JSON string literals.
|
| 207 |
+
|
| 208 |
+
Walks character-by-character tracking whether we're inside a double-quoted
|
| 209 |
+
string, and replaces raw control chars with their escaped forms. Handles
|
| 210 |
+
the common case: `"code": "class Optimizer:\\n def __init__..."` where the
|
| 211 |
+
LLM emitted literal newlines.
|
| 212 |
+
"""
|
| 213 |
+
out: list[str] = []
|
| 214 |
+
in_string = False
|
| 215 |
+
escape_next = False
|
| 216 |
+
for ch in s:
|
| 217 |
+
if escape_next:
|
| 218 |
+
out.append(ch)
|
| 219 |
+
escape_next = False
|
| 220 |
+
continue
|
| 221 |
+
if ch == "\\":
|
| 222 |
+
out.append(ch)
|
| 223 |
+
escape_next = True
|
| 224 |
+
continue
|
| 225 |
+
if ch == '"':
|
| 226 |
+
in_string = not in_string
|
| 227 |
+
out.append(ch)
|
| 228 |
+
continue
|
| 229 |
+
if in_string:
|
| 230 |
+
if ch == "\n":
|
| 231 |
+
out.append("\\n"); continue
|
| 232 |
+
if ch == "\r":
|
| 233 |
+
out.append("\\r"); continue
|
| 234 |
+
if ch == "\t":
|
| 235 |
+
out.append("\\t"); continue
|
| 236 |
+
out.append(ch)
|
| 237 |
+
return "".join(out)
|
| 238 |
+
|
| 239 |
+
|
| 240 |
+
# ---------- helpers ----------
|
| 241 |
+
|
| 242 |
+
def _fmt(v: Any) -> str:
|
| 243 |
+
if v is None:
|
| 244 |
+
return "None"
|
| 245 |
+
if isinstance(v, float):
|
| 246 |
+
if abs(v) < 1e-4 or abs(v) >= 1e4:
|
| 247 |
+
return f"{v:.3e}"
|
| 248 |
+
return f"{v:.4f}"
|
| 249 |
+
if isinstance(v, list):
|
| 250 |
+
if len(v) <= 4:
|
| 251 |
+
return "[" + ", ".join(_fmt(x) for x in v) + "]"
|
| 252 |
+
return f"[{_fmt(v[0])}, {_fmt(v[1])}, ..., {_fmt(v[-1])}] (len={len(v)})"
|
| 253 |
+
return str(v)
|
| 254 |
+
|
| 255 |
+
|
| 256 |
+
def _summarise_trajectory(traj: list[dict]) -> str:
    """Condense a baseline trajectory to head/mid/tail snapshots.

    Entries with f=None are divergence markers; their presence adds a
    DIVERGED flag to the summary.
    """
    finite = [step for step in traj if step.get("f") is not None]
    if not finite:
        return "diverged immediately"
    head, tail = finite[0], finite[-1]
    mid = finite[len(finite) // 2] if len(finite) > 2 else tail
    diverged_mark = " (DIVERGED)" if len(finite) < len(traj) else ""
    return (f"t=0: f={_fmt(head['f'])}, |g|={_fmt(head['grad_norm'])} "
            f"→ t={mid['t']}: f={_fmt(mid['f'])} "
            f"→ t={tail['t']}: f={_fmt(tail['f'])}{diverged_mark}")
|
pyproject.toml
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
[build-system]
|
| 8 |
+
requires = ["setuptools>=45", "wheel"]
|
| 9 |
+
build-backend = "setuptools.build_meta"
|
| 10 |
+
|
| 11 |
+
[project]
|
| 12 |
+
name = "openenv-landscapeforge"
|
| 13 |
+
version = "0.1.0"
|
| 14 |
+
description = "Landscapeforge environment for OpenEnv"
|
| 15 |
+
requires-python = ">=3.10"
|
| 16 |
+
dependencies = [
|
| 17 |
+
# Core OpenEnv runtime (provides FastAPI server + HTTP client types)
|
| 18 |
+
# install from github
|
| 19 |
+
# "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
|
| 20 |
+
"openenv-core[core]>=0.2.2",
|
| 21 |
+
"numpy>=1.26,<3",
|
| 22 |
+
"scipy>=1.11,<2",
|
| 23 |
+
"requests>=2.31,<3",
|
| 24 |
+
"gradio>=4.44,<6",
|
| 25 |
+
"matplotlib>=3.8,<4",
|
| 26 |
+
"plotly>=5.22,<7",
|
| 27 |
+
]
|
| 28 |
+
|
| 29 |
+
[project.optional-dependencies]
|
| 30 |
+
dev = [
|
| 31 |
+
"pytest>=8.0.0",
|
| 32 |
+
"pytest-cov>=4.0.0",
|
| 33 |
+
]
|
| 34 |
+
|
| 35 |
+
[project.scripts]
|
| 36 |
+
# Server entry point - enables running via: uv run --project . server
|
| 37 |
+
# or: python -m landscapeforge.server.app
|
| 38 |
+
server = "landscapeforge.server.app:main"
|
| 39 |
+
|
| 40 |
+
[tool.setuptools]
|
| 41 |
+
include-package-data = true
|
| 42 |
+
packages = ["landscapeforge", "landscapeforge.server"]
|
| 43 |
+
package-dir = { "landscapeforge" = ".", "landscapeforge.server" = "server" }
|
reference_optimizers.py
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Reference optimizers run by `run_baseline` action.
|
| 2 |
+
|
| 3 |
+
These are invoked by the env — not by OptCoder's submitted code. They
|
| 4 |
+
produce diagnostic trajectories (x_t, f_t, |g_t|) that the agent sees.
|
| 5 |
+
The source code is NEVER exposed to the agent.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
from typing import Callable
|
| 9 |
+
|
| 10 |
+
import numpy as np
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
def _step_sgd(x, g, state, lr=0.01):
|
| 14 |
+
return x - lr * g, state
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
def _step_momentum(x, g, state, lr=0.01, beta=0.9):
|
| 18 |
+
v = state.get("v", np.zeros_like(x))
|
| 19 |
+
v = beta * v - lr * g
|
| 20 |
+
state["v"] = v
|
| 21 |
+
return x + v, state
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def _step_adam(x, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
|
| 25 |
+
m = state.get("m", np.zeros_like(x))
|
| 26 |
+
v = state.get("v", np.zeros_like(x))
|
| 27 |
+
t = state.get("t", 0) + 1
|
| 28 |
+
m = b1 * m + (1 - b1) * g
|
| 29 |
+
v = b2 * v + (1 - b2) * g**2
|
| 30 |
+
m_hat = m / (1 - b1**t)
|
| 31 |
+
v_hat = v / (1 - b2**t)
|
| 32 |
+
state["m"], state["v"], state["t"] = m, v, t
|
| 33 |
+
return x - lr * m_hat / (np.sqrt(v_hat) + eps), state
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
def _run_adam_with_lr(f, grad, x0: np.ndarray, lr: float, steps: int) -> tuple[np.ndarray, float]:
    """Short Adam rollout from x0 at a fixed lr; returns (x_final, f_final).

    Used by the LR-tuning sweep. On a non-finite iterate, bail out with
    (x0, inf) so divergent learning rates lose the sweep.
    """
    x = x0.copy().astype(float)
    adam_state: dict = {}
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)
        x, adam_state = _step_adam(x, g, adam_state, lr=lr)
        if not np.all(np.isfinite(x)):
            return x0, float("inf")
    return x, float(f(x))
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
def tune_adam_lr(f, grad, x0: np.ndarray,
                 lrs: tuple[float, ...] = (1e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1),
                 sweep_steps: int = 30) -> float:
    """Grid-search Adam's learning rate via short rollouts from x0.

    Keeps the env's comparison baseline fair: the agent is measured against
    Adam-at-best-LR-for-this-landscape, not Adam-at-PyTorch-default. Ties
    (and the all-diverged case) resolve to the earliest LR in `lrs`.
    """
    best_lr, best_final = lrs[0], float("inf")
    for candidate in lrs:
        _, final_f = _run_adam_with_lr(f, grad, x0, lr=candidate, steps=sweep_steps)
        if final_f < best_final:
            best_lr, best_final = candidate, final_f
    return best_lr
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
def _step_lbfgs(x, g, state, lr=0.01, m_size=5):
|
| 70 |
+
"""Crude L-BFGS with finite-step history. Good enough as a reference."""
|
| 71 |
+
xs = state.setdefault("xs", []) # positions
|
| 72 |
+
gs = state.setdefault("gs", []) # gradients
|
| 73 |
+
|
| 74 |
+
if len(xs) < 2:
|
| 75 |
+
# First step: plain gradient descent to seed history
|
| 76 |
+
x_new = x - lr * g
|
| 77 |
+
else:
|
| 78 |
+
# Two-loop recursion over last m_size pairs
|
| 79 |
+
s_list, y_list, rho_list = [], [], []
|
| 80 |
+
for i in range(1, min(m_size, len(xs)) + 1):
|
| 81 |
+
s = xs[-i] - xs[-i - 1] if len(xs) > i else None
|
| 82 |
+
if s is None:
|
| 83 |
+
continue
|
| 84 |
+
y = gs[-i] - gs[-i - 1]
|
| 85 |
+
denom = float(y @ s)
|
| 86 |
+
if abs(denom) < 1e-12:
|
| 87 |
+
continue
|
| 88 |
+
s_list.append(s); y_list.append(y); rho_list.append(1.0 / denom)
|
| 89 |
+
|
| 90 |
+
q = g.copy()
|
| 91 |
+
alpha = []
|
| 92 |
+
for s, y, rho in zip(s_list, y_list, rho_list):
|
| 93 |
+
a = rho * float(s @ q)
|
| 94 |
+
alpha.append(a)
|
| 95 |
+
q = q - a * y
|
| 96 |
+
|
| 97 |
+
# H0 scaling
|
| 98 |
+
if y_list:
|
| 99 |
+
y0 = y_list[0]; s0 = s_list[0]
|
| 100 |
+
gamma = float(s0 @ y0) / (float(y0 @ y0) + 1e-12)
|
| 101 |
+
else:
|
| 102 |
+
gamma = 1.0
|
| 103 |
+
r = gamma * q
|
| 104 |
+
|
| 105 |
+
for (s, y, rho), a in zip(reversed(list(zip(s_list, y_list, rho_list))), reversed(alpha)):
|
| 106 |
+
b = rho * float(y @ r)
|
| 107 |
+
r = r + (a - b) * s
|
| 108 |
+
|
| 109 |
+
x_new = x - lr * r
|
| 110 |
+
|
| 111 |
+
xs.append(x.copy())
|
| 112 |
+
gs.append(g.copy())
|
| 113 |
+
return x_new, state
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
BASELINES: dict[str, Callable] = {
|
| 117 |
+
"sgd": _step_sgd,
|
| 118 |
+
"momentum": _step_momentum,
|
| 119 |
+
"adam": _step_adam,
|
| 120 |
+
"lbfgs": _step_lbfgs,
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
|
| 124 |
+
def run_baseline(name: str, f, grad, x0: np.ndarray, steps: int = 30) -> dict:
    """Roll the named reference optimizer forward from x0 for `steps` steps.

    Returns {"name", "trajectory", "final_x"}. Each trajectory entry records
    (t, x, f, grad_norm); on a non-finite iterate a divergence-marker entry
    is appended and the rollout stops early.
    """
    if name not in BASELINES:
        raise ValueError(f"Unknown baseline {name!r}")
    advance = BASELINES[name]

    x = x0.copy().astype(float)
    opt_state: dict = {}
    trajectory: list[dict] = []
    for t in range(steps):
        f_val = float(f(x))
        g = np.asarray(grad(x), dtype=float)
        trajectory.append({"t": t, "x": x.tolist(), "f": f_val,
                           "grad_norm": float(np.linalg.norm(g))})
        x, opt_state = advance(x, g, opt_state)
        if not np.all(np.isfinite(x)):
            # Record the blow-up; the final-state append below is skipped.
            trajectory.append({"t": t + 1, "x": None, "f": None,
                               "grad_norm": None, "diverged": True})
            break
    # Terminal state entry (only when still finite).
    if np.all(np.isfinite(x)):
        trajectory.append({"t": len(trajectory), "x": x.tolist(), "f": float(f(x)),
                           "grad_norm": float(np.linalg.norm(np.asarray(grad(x))))})
    return {"name": name, "trajectory": trajectory, "final_x": x.tolist()}
|
rewards.py
ADDED
|
@@ -0,0 +1,183 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Reward computation for OptCoder and LandscapeForge.
|
| 2 |
+
|
| 3 |
+
Matches §9 of LANDSCAPEFORGE_DESIGN.md (v0.2).
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
from __future__ import annotations
|
| 7 |
+
|
| 8 |
+
from dataclasses import dataclass
|
| 9 |
+
|
| 10 |
+
import numpy as np
|
| 11 |
+
|
| 12 |
+
from .arena import ArenaResult
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
# Default weights (§9.1)
W_REGRET = 1.0
W_CONV = 0.3
W_ROBUST = 0.3
W_NOVELTY = 0.1
W_BUDGET = 0.05
W_EVAL_FAIL = 0.5

NOVELTY_GATE = 0.5  # novelty only applied when r_regret > this


@dataclass
class OptCoderReward:
    """Terminal reward plus its per-component breakdown."""
    r_total: float
    breakdown: dict[str, float]


def compute_optcoder_reward(
    arena: ArenaResult,
    adam_arena: ArenaResult,
    actions_used_cost: int,  # sum of per-action costs, not count
    budget_total: int,
    novelty_score: float,    # AST edit distance / len, clamped to [0, 1]
    convergence_step: int | None,
    arena_steps: int,
) -> OptCoderReward:
    """Compute OptCoder's terminal reward (no `f_min` dependency).

    r_regret is driven by Adam-relative descent:
        my_progress   = mean(f_initial - f_final) across seeds for the draft
        adam_progress = same for Adam
        r_regret      = clamp(my_progress / max(adam_progress, floor) - 1, -1, +1)

    Interpretation:
        +1 → descended ≥ 2× as far as Adam
         0 → matched Adam's descent
        -1 → descended ≤ 0 while Adam descended normally
    """
    my_progress = arena.mean_progress
    adam_progress = adam_arena.mean_progress

    # Floor the denominator at ~1% of initial |f| so a barely-moving Adam
    # (plateau landscape) cannot blow the ratio up via a tiny denominator.
    denom = max(adam_progress, 0.01 * adam_arena.mean_initial_scale + 1e-6)
    r_regret = float(np.clip(my_progress / denom - 1.0, -1.0, 1.0))

    # Convergence speed: positive only if 1% of initial f was hit in-budget.
    if convergence_step is not None and convergence_step < arena_steps:
        r_conv = float(np.clip(1.0 - convergence_step / arena_steps, 0.0, 1.0))
    else:
        r_conv = 0.0

    r_robust = arena.robustness

    # Novelty bonus is gated on strong regret performance.
    r_novelty = float(novelty_score) if r_regret > NOVELTY_GATE else 0.0

    r_budget = float(np.clip(actions_used_cost / max(budget_total, 1), 0.0, 1.0))
    r_eval_fail = arena.crash_fraction

    total = (W_REGRET * r_regret
             + W_CONV * r_conv
             + W_ROBUST * r_robust
             + W_NOVELTY * r_novelty
             - W_BUDGET * r_budget
             - W_EVAL_FAIL * r_eval_fail)

    return OptCoderReward(
        r_total=float(total),
        breakdown={
            "r_regret": r_regret,
            "r_convergence": r_conv,
            "r_robustness": r_robust,
            "r_novelty": r_novelty,
            "r_budget": r_budget,
            "r_eval_failures": r_eval_fail,
            "my_progress": float(my_progress),
            "adam_progress": float(adam_progress),
            "speedup_vs_adam": float(my_progress / denom),
        },
    )
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
def ast_novelty_score(committed_source: str, reference_sources: list[str]) -> float:
    """Coarse novelty: min diff-ratio against the reference optimizers, in [0, 1].

    Near-zero means heavy copying of some reference; 1 means nothing in
    common (or no references). v1 uses a cheap char-level similarity; real
    AST diffing is deferred.
    """
    if not committed_source:
        return 0.0
    ratios = (_diff_ratio(committed_source, ref) for ref in reference_sources)
    best = min(ratios, default=1.0)
    return float(np.clip(best, 0.0, 1.0))
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
def _diff_ratio(a: str, b: str) -> float:
|
| 120 |
+
"""difflib-based ratio: 1 - similarity. Cheap, roughly AST-order-insensitive."""
|
| 121 |
+
import difflib
|
| 122 |
+
sim = difflib.SequenceMatcher(None, a, b).ratio()
|
| 123 |
+
return 1.0 - sim
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
# ---------- Stepwise FEEDBACK (not reward) ----------
|
| 127 |
+
# These signals are exposed to the LLM through the observation so it can
|
| 128 |
+
# course-correct mid-episode. They are NOT summed into the training reward —
|
| 129 |
+
# terminal arena reward is the only GRPO signal, to preserve robustness.
|
| 130 |
+
|
| 131 |
+
COMPILE_PENALTY_SIGNAL = -0.1
|
| 132 |
+
PHI_SCALE = 10.0 # normalizer for best_draft_f to keep potential in ~[-1, 0]
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
def _draft_potential(draft_history: list[dict]) -> float:
|
| 136 |
+
"""Potential: higher = better. No drafts yet → -1.0 (worst) so that the
|
| 137 |
+
first valid draft always emits a positive phi_delta proportional to its
|
| 138 |
+
quality. Clamped to [-1, 0].
|
| 139 |
+
"""
|
| 140 |
+
finals = [
|
| 141 |
+
d["summary"]["final_f"]
|
| 142 |
+
for d in draft_history
|
| 143 |
+
if d.get("summary") and d["summary"].get("final_f") is not None
|
| 144 |
+
]
|
| 145 |
+
if not finals:
|
| 146 |
+
return -1.0 # no valid draft yet → worst-case potential
|
| 147 |
+
best = min(finals)
|
| 148 |
+
normed = min(max(best, 0.0), PHI_SCALE) / PHI_SCALE
|
| 149 |
+
return -normed # lower best_f → higher potential
|
| 150 |
+
|
| 151 |
+
|
| 152 |
+
def compute_step_reward(prev_draft_history: list[dict],
                        new_draft_history: list[dict],
                        action_kind: str,
                        action_result: dict) -> dict:
    """Stepwise FEEDBACK signals for one REPL turn — NOT training reward.

    The name is kept for API compatibility. The returned dict is surfaced to
    the LLM's next prompt via `last_action_result.feedback`:

    - `phi_delta`: change in best-draft-so-far potential. Positive means the
      latest action improved the best known draft.
    - `compile_penalty`: -0.1 marker when the last draft failed to compile,
      so the agent notices structural bugs immediately.

    Callers must NOT fold these numbers into the GRPO training reward; the
    terminal arena reward is the only training signal.
    """
    signals: dict[str, float] = {}

    delta = _draft_potential(new_draft_history) - _draft_potential(prev_draft_history)
    if abs(delta) > 1e-9:
        signals["phi_delta"] = float(delta)

    if action_kind == "draft" and action_result.get("compile_error"):
        signals["compile_penalty"] = COMPILE_PENALTY_SIGNAL

    # r_step retained for backwards compatibility; ignore it for training.
    return {"r_step": float(sum(signals.values())), "breakdown": signals}
|
run_llm_episode.py
ADDED
|
@@ -0,0 +1,281 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Drive one LandscapeForge episode with an LLM.
|
| 2 |
+
|
| 3 |
+
Works with any OpenAI-compatible /v1/chat/completions endpoint:
|
| 4 |
+
|
| 5 |
+
# HuggingFace router (default)
|
| 6 |
+
HF_TOKEN=hf_xxx python -m landscapeforge.run_llm_episode
|
| 7 |
+
|
| 8 |
+
# Ollama (no key needed — base URL override)
|
| 9 |
+
API_BASE_URL=http://localhost:11434/v1 \\
|
| 10 |
+
MODEL_NAME=qwen2.5:3b \\
|
| 11 |
+
python -m landscapeforge.run_llm_episode
|
| 12 |
+
|
| 13 |
+
# Optional: pick a tier and seed
|
| 14 |
+
LF_TIER=T0 LF_SEED=42 python -m landscapeforge.run_llm_episode
|
| 15 |
+
|
| 16 |
+
Prints every action the model takes, the environment's feedback, and the
|
| 17 |
+
final commit result.
|
| 18 |
+
"""
|
| 19 |
+
|
| 20 |
+
from __future__ import annotations
|
| 21 |
+
|
| 22 |
+
import datetime as _dt
|
| 23 |
+
import json
|
| 24 |
+
import os
|
| 25 |
+
import sys
|
| 26 |
+
import textwrap
|
| 27 |
+
import time
|
| 28 |
+
from pathlib import Path
|
| 29 |
+
from typing import Any
|
| 30 |
+
|
| 31 |
+
import requests
|
| 32 |
+
|
| 33 |
+
try:
|
| 34 |
+
from .models import LandscapeforgeAction
|
| 35 |
+
from .prompts import build_prompt, parse_action
|
| 36 |
+
from .server.landscapeforge_environment import LandscapeforgeEnvironment
|
| 37 |
+
except ImportError: # pragma: no cover
|
| 38 |
+
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
| 39 |
+
from landscapeforge.models import LandscapeforgeAction
|
| 40 |
+
from landscapeforge.prompts import build_prompt, parse_action
|
| 41 |
+
from landscapeforge.server.landscapeforge_environment import LandscapeforgeEnvironment
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
|
| 45 |
+
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
|
| 46 |
+
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
|
| 47 |
+
|
| 48 |
+
TEMPERATURE = float(os.getenv("LF_TEMPERATURE", "0.6"))
|
| 49 |
+
MAX_TOKENS = int(os.getenv("LF_MAX_TOKENS", "1200"))
|
| 50 |
+
MAX_TURNS = int(os.getenv("LF_MAX_TURNS", "12"))
|
| 51 |
+
TIER = os.getenv("LF_TIER", "T0")
|
| 52 |
+
SEED = int(os.getenv("LF_SEED", "42"))
|
| 53 |
+
LOG_DIR = Path(os.getenv("LF_LOG_DIR", "./episode_logs"))
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
def call_llm(messages: list[dict]) -> str:
|
| 57 |
+
"""Single chat completion call. Returns the assistant's content string."""
|
| 58 |
+
url = API_BASE_URL.rstrip("/") + "/chat/completions"
|
| 59 |
+
headers = {"Content-Type": "application/json"}
|
| 60 |
+
if API_KEY:
|
| 61 |
+
headers["Authorization"] = f"Bearer {API_KEY}"
|
| 62 |
+
|
| 63 |
+
payload = {
|
| 64 |
+
"model": MODEL_NAME,
|
| 65 |
+
"messages": messages,
|
| 66 |
+
"temperature": TEMPERATURE,
|
| 67 |
+
"max_tokens": MAX_TOKENS,
|
| 68 |
+
"stream": False,
|
| 69 |
+
}
|
| 70 |
+
r = requests.post(url, headers=headers, json=payload, timeout=180)
|
| 71 |
+
if r.status_code >= 400:
|
| 72 |
+
raise RuntimeError(f"LLM call failed [{r.status_code}]: {r.text[:400]}")
|
| 73 |
+
body = r.json()
|
| 74 |
+
return body["choices"][0]["message"]["content"]
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
def pretty_action(action: LandscapeforgeAction) -> str:
|
| 78 |
+
if action.kind == "run_baseline":
|
| 79 |
+
return f"run_baseline(name={action.baseline_name!r})"
|
| 80 |
+
if action.kind == "draft":
|
| 81 |
+
code = (action.code or "").strip()
|
| 82 |
+
lines = code.splitlines()
|
| 83 |
+
header = f"draft ({len(action.code or '')} chars, {len(lines)} lines):"
|
| 84 |
+
# Indent the code block so it's clearly nested under the action line.
|
| 85 |
+
body = textwrap.indent(code, " │ ")
|
| 86 |
+
return f"{header}\n{body}"
|
| 87 |
+
if action.kind == "inspect":
|
| 88 |
+
return f"inspect(draft_idx={action.draft_idx}, range=[{action.step_range_start},{action.step_range_end}])"
|
| 89 |
+
if action.kind == "commit":
|
| 90 |
+
return "commit()"
|
| 91 |
+
return str(action)
|
| 92 |
+
|
| 93 |
+
|
| 94 |
+
def pretty_result(result: dict) -> str:
|
| 95 |
+
"""Compact one-line summary of the env's step result."""
|
| 96 |
+
keys_of_interest = (
|
| 97 |
+
"baseline_index", "name", "n_steps", "final_f",
|
| 98 |
+
"draft_index", "compile_error",
|
| 99 |
+
"step_range", "feedback",
|
| 100 |
+
"reason", "mean_regret", "crash_fraction",
|
| 101 |
+
"novelty_score", "convergence_step",
|
| 102 |
+
)
|
| 103 |
+
parts = []
|
| 104 |
+
for k in keys_of_interest:
|
| 105 |
+
if k in result and result[k] is not None:
|
| 106 |
+
if k == "feedback":
|
| 107 |
+
parts.append(f"feedback={result[k]}")
|
| 108 |
+
elif isinstance(result[k], float):
|
| 109 |
+
parts.append(f"{k}={result[k]:.4g}")
|
| 110 |
+
elif isinstance(result[k], dict):
|
| 111 |
+
parts.append(f"{k}=<dict>")
|
| 112 |
+
else:
|
| 113 |
+
parts.append(f"{k}={result[k]}")
|
| 114 |
+
return ", ".join(parts) or "ok"
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
def _new_log_files() -> tuple[Path, Path]:
|
| 118 |
+
"""Create a timestamped .jsonl (structured) + .md (human-readable) pair."""
|
| 119 |
+
LOG_DIR.mkdir(parents=True, exist_ok=True)
|
| 120 |
+
ts = _dt.datetime.now().strftime("%Y%m%d-%H%M%S")
|
| 121 |
+
model_tag = MODEL_NAME.replace("/", "_").replace(":", "_")
|
| 122 |
+
stem = f"{ts}_{model_tag}_seed{SEED}"
|
| 123 |
+
return LOG_DIR / f"{stem}.jsonl", LOG_DIR / f"{stem}.md"
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
def _log_jsonl(path: Path, event: dict) -> None:
|
| 127 |
+
with path.open("a") as f:
|
| 128 |
+
f.write(json.dumps(event, default=str) + "\n")
|
| 129 |
+
|
| 130 |
+
|
| 131 |
+
def _log_md(path: Path, text: str) -> None:
|
| 132 |
+
with path.open("a") as f:
|
| 133 |
+
f.write(text + "\n")
|
| 134 |
+
|
| 135 |
+
|
| 136 |
+
def main() -> None:
|
| 137 |
+
jsonl_path, md_path = _new_log_files()
|
| 138 |
+
|
| 139 |
+
header = (
|
| 140 |
+
"=" * 78 + "\n"
|
| 141 |
+
"LandscapeForge — LLM episode runner\n"
|
| 142 |
+
+ "=" * 78 + "\n"
|
| 143 |
+
f"Model: {MODEL_NAME}\n"
|
| 144 |
+
f"Endpoint: {API_BASE_URL}\n"
|
| 145 |
+
f"Auth: {'Bearer token present' if API_KEY else 'none (local endpoint)'}\n"
|
| 146 |
+
f"Temperature: {TEMPERATURE}\n"
|
| 147 |
+
f"Tier: {TIER}\n"
|
| 148 |
+
f"Seed: {SEED}\n"
|
| 149 |
+
f"Log dir: {LOG_DIR.resolve()}\n"
|
| 150 |
+
f" jsonl: {jsonl_path.name}\n"
|
| 151 |
+
f" markdown: {md_path.name}\n"
|
| 152 |
+
)
|
| 153 |
+
print(header)
|
| 154 |
+
|
| 155 |
+
_log_md(md_path, f"# Episode log — {MODEL_NAME} seed={SEED} tier={TIER}\n")
|
| 156 |
+
_log_md(md_path, f"```\n{header.strip()}\n```\n")
|
| 157 |
+
_log_jsonl(jsonl_path, {
|
| 158 |
+
"event": "episode_start",
|
| 159 |
+
"model": MODEL_NAME, "endpoint": API_BASE_URL,
|
| 160 |
+
"temperature": TEMPERATURE, "tier": TIER, "seed": SEED,
|
| 161 |
+
"timestamp": _dt.datetime.now().isoformat(),
|
| 162 |
+
})
|
| 163 |
+
|
| 164 |
+
env = LandscapeforgeEnvironment(tier=TIER, seed=SEED)
|
| 165 |
+
obs = env.reset()
|
| 166 |
+
reset_line = (
|
| 167 |
+
f"Landscape chosen: {obs.landscape_description}\n"
|
| 168 |
+
f"Dim: {obs.dim}\n"
|
| 169 |
+
f"Structural hints: {obs.structural_hints}\n"
|
| 170 |
+
f"Initial budget: {obs.budget_remaining}\n"
|
| 171 |
+
)
|
| 172 |
+
print(reset_line)
|
| 173 |
+
_log_md(md_path, "## Reset\n```\n" + reset_line + "```\n")
|
| 174 |
+
_log_jsonl(jsonl_path, {
|
| 175 |
+
"event": "reset",
|
| 176 |
+
"landscape_description": obs.landscape_description,
|
| 177 |
+
"dim": obs.dim,
|
| 178 |
+
"structural_hints": obs.structural_hints,
|
| 179 |
+
"budget_remaining": obs.budget_remaining,
|
| 180 |
+
})
|
| 181 |
+
|
| 182 |
+
for turn in range(1, MAX_TURNS + 1):
|
| 183 |
+
turn_header = f"─── turn {turn} ─────────────────────────────────────────────────────"
|
| 184 |
+
print(turn_header)
|
| 185 |
+
_log_md(md_path, f"\n## Turn {turn}\n")
|
| 186 |
+
|
| 187 |
+
messages = build_prompt(obs)
|
| 188 |
+
# Prompt is large, so log it to the files but not console.
|
| 189 |
+
_log_md(md_path, "### Prompt (user message)\n```\n"
|
| 190 |
+
+ messages[-1]["content"] + "\n```\n")
|
| 191 |
+
_log_jsonl(jsonl_path, {
|
| 192 |
+
"event": "prompt",
|
| 193 |
+
"turn": turn,
|
| 194 |
+
"messages": messages,
|
| 195 |
+
})
|
| 196 |
+
|
| 197 |
+
t0 = time.time()
|
| 198 |
+
try:
|
| 199 |
+
raw = call_llm(messages)
|
| 200 |
+
except Exception as e:
|
| 201 |
+
print(f"[LLM error] {e}")
|
| 202 |
+
_log_jsonl(jsonl_path, {"event": "llm_error", "turn": turn, "error": str(e)})
|
| 203 |
+
_log_md(md_path, f"### LLM error\n```\n{e}\n```\n")
|
| 204 |
+
break
|
| 205 |
+
dt = time.time() - t0
|
| 206 |
+
print(f"[LLM reply in {dt:.1f}s, {len(raw)} chars]")
|
| 207 |
+
|
| 208 |
+
_log_md(md_path, "### Raw LLM reply\n```\n" + raw + "\n```\n")
|
| 209 |
+
_log_jsonl(jsonl_path, {
|
| 210 |
+
"event": "llm_reply",
|
| 211 |
+
"turn": turn,
|
| 212 |
+
"latency_s": dt,
|
| 213 |
+
"raw": raw,
|
| 214 |
+
})
|
| 215 |
+
|
| 216 |
+
try:
|
| 217 |
+
action = parse_action(raw)
|
| 218 |
+
except Exception as e:
|
| 219 |
+
print(f"[parse error] {e}")
|
| 220 |
+
print("--- raw reply (first 400 chars) ---")
|
| 221 |
+
print(raw[:400])
|
| 222 |
+
print("-----------------------------------")
|
| 223 |
+
_log_jsonl(jsonl_path, {
|
| 224 |
+
"event": "parse_error", "turn": turn, "error": str(e),
|
| 225 |
+
"raw_first_400": raw[:400],
|
| 226 |
+
})
|
| 227 |
+
_log_md(md_path, f"### Parse error\n```\n{e}\n```\n")
|
| 228 |
+
break
|
| 229 |
+
|
| 230 |
+
pretty = pretty_action(action)
|
| 231 |
+
print(f"action: {pretty}")
|
| 232 |
+
_log_md(md_path, "### Action\n```\n" + pretty + "\n```\n")
|
| 233 |
+
_log_jsonl(jsonl_path, {
|
| 234 |
+
"event": "action", "turn": turn,
|
| 235 |
+
"action": action.model_dump(exclude_none=True),
|
| 236 |
+
})
|
| 237 |
+
|
| 238 |
+
obs = env.step(action)
|
| 239 |
+
result_line = pretty_result(obs.last_action_result)
|
| 240 |
+
print(f" → {result_line}")
|
| 241 |
+
print(f" budget remaining: {obs.budget_remaining}")
|
| 242 |
+
_log_md(md_path, "### Step result\n```\n"
|
| 243 |
+
+ f"→ {result_line}\nbudget_remaining={obs.budget_remaining}\n```\n")
|
| 244 |
+
_log_jsonl(jsonl_path, {
|
| 245 |
+
"event": "step_result", "turn": turn,
|
| 246 |
+
"last_action_result": obs.last_action_result,
|
| 247 |
+
"budget_remaining": obs.budget_remaining,
|
| 248 |
+
"done": obs.done,
|
| 249 |
+
"reward_so_far": obs.reward,
|
| 250 |
+
})
|
| 251 |
+
|
| 252 |
+
if obs.done:
|
| 253 |
+
final_block = (
|
| 254 |
+
"═══ EPISODE DONE ═══\n"
|
| 255 |
+
f" reason : {obs.last_action_result.get('reason')}\n"
|
| 256 |
+
f" final_regret : {obs.final_regret}\n"
|
| 257 |
+
f" terminal reward : {obs.r_optcoder}\n"
|
| 258 |
+
f" breakdown : {obs.r_optcoder_breakdown}\n"
|
| 259 |
+
)
|
| 260 |
+
print()
|
| 261 |
+
print(final_block)
|
| 262 |
+
_log_md(md_path, "\n## Episode done\n```\n" + final_block + "```\n")
|
| 263 |
+
_log_jsonl(jsonl_path, {
|
| 264 |
+
"event": "episode_done",
|
| 265 |
+
"reason": obs.last_action_result.get("reason"),
|
| 266 |
+
"final_regret": obs.final_regret,
|
| 267 |
+
"r_optcoder": obs.r_optcoder,
|
| 268 |
+
"r_optcoder_breakdown": obs.r_optcoder_breakdown,
|
| 269 |
+
"last_action_result": obs.last_action_result,
|
| 270 |
+
})
|
| 271 |
+
print(f"\n[logged] full transcript at:\n {jsonl_path}\n {md_path}")
|
| 272 |
+
return
|
| 273 |
+
|
| 274 |
+
print("\n[!] Reached MAX_TURNS without commit — agent never committed.")
|
| 275 |
+
_log_jsonl(jsonl_path, {"event": "max_turns_reached", "max_turns": MAX_TURNS})
|
| 276 |
+
_log_md(md_path, f"\n[!] Reached MAX_TURNS ({MAX_TURNS}) without commit.\n")
|
| 277 |
+
print(f"\n[logged] transcript at:\n {jsonl_path}\n {md_path}")
|
| 278 |
+
|
| 279 |
+
|
| 280 |
+
if __name__ == "__main__":
|
| 281 |
+
main()
|
sandbox.py
ADDED
|
@@ -0,0 +1,160 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Sandbox for executing OptCoder-submitted optimizer code.
|
| 2 |
+
|
| 3 |
+
In-process exec with:
|
| 4 |
+
- AST strip of module-level demo code (keeps only imports + `class Optimizer`)
|
| 5 |
+
- Restricted globals (only `np` and `math` exposed)
|
| 6 |
+
- Signal-based 1-second timeout on instantiation and each step call
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
from __future__ import annotations
|
| 10 |
+
|
| 11 |
+
import ast
|
| 12 |
+
import math
|
| 13 |
+
import signal
|
| 14 |
+
from dataclasses import dataclass
|
| 15 |
+
from typing import Any
|
| 16 |
+
|
| 17 |
+
import numpy as np
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
class SandboxError(Exception):
|
| 21 |
+
"""Raised for any sandbox-level failure (syntax, timeout, security)."""
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
class StepTimeout(SandboxError):
|
| 25 |
+
pass
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def _signal_timeout(seconds: float):
|
| 29 |
+
"""Context manager using SIGALRM to bound execution time."""
|
| 30 |
+
from contextlib import contextmanager
|
| 31 |
+
|
| 32 |
+
@contextmanager
|
| 33 |
+
def _cm():
|
| 34 |
+
def handler(signum, frame):
|
| 35 |
+
raise StepTimeout(f"exceeded {seconds}s")
|
| 36 |
+
|
| 37 |
+
old = signal.signal(signal.SIGALRM, handler)
|
| 38 |
+
# setitimer supports sub-second timers (signal.alarm only takes ints)
|
| 39 |
+
signal.setitimer(signal.ITIMER_REAL, seconds)
|
| 40 |
+
try:
|
| 41 |
+
yield
|
| 42 |
+
finally:
|
| 43 |
+
signal.setitimer(signal.ITIMER_REAL, 0)
|
| 44 |
+
signal.signal(signal.SIGALRM, old)
|
| 45 |
+
|
| 46 |
+
return _cm()
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def strip_module_code(source: str) -> str:
|
| 50 |
+
"""Keep only the `class Optimizer` node.
|
| 51 |
+
|
| 52 |
+
Drops imports (the sandbox pre-injects np/numpy/math into globals),
|
| 53 |
+
hallucinated demo functions, `if __name__ == '__main__'` blocks, and
|
| 54 |
+
trailing execution code that frequently appears in LLM output.
|
| 55 |
+
"""
|
| 56 |
+
try:
|
| 57 |
+
tree = ast.parse(source)
|
| 58 |
+
except SyntaxError as e:
|
| 59 |
+
raise SandboxError(f"SyntaxError: {e}") from e
|
| 60 |
+
|
| 61 |
+
kept: list[ast.stmt] = []
|
| 62 |
+
found_class = False
|
| 63 |
+
for node in tree.body:
|
| 64 |
+
if isinstance(node, ast.ClassDef) and node.name == "Optimizer":
|
| 65 |
+
kept.append(node)
|
| 66 |
+
found_class = True
|
| 67 |
+
# Imports are dropped — env provides np/numpy/math via globals.
|
| 68 |
+
|
| 69 |
+
if not found_class:
|
| 70 |
+
raise SandboxError("No `class Optimizer` found in submission")
|
| 71 |
+
|
| 72 |
+
new_tree = ast.Module(body=kept, type_ignores=[])
|
| 73 |
+
ast.fix_missing_locations(new_tree)
|
| 74 |
+
return ast.unparse(new_tree)
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
def _safe_globals() -> dict:
|
| 78 |
+
"""Globals exposed to submitted code. Minimal builtins + np/numpy/math."""
|
| 79 |
+
import builtins as _bi
|
| 80 |
+
|
| 81 |
+
safe_names = [
|
| 82 |
+
# numeric / iteration
|
| 83 |
+
"abs", "min", "max", "sum", "len", "range", "zip", "enumerate",
|
| 84 |
+
"list", "tuple", "dict", "set", "float", "int", "bool", "str",
|
| 85 |
+
"round", "divmod", "pow", "reversed", "sorted", "any", "all", "map", "filter",
|
| 86 |
+
# introspection (safe subset)
|
| 87 |
+
"isinstance", "issubclass", "hasattr", "getattr", "setattr",
|
| 88 |
+
"True", "False", "None",
|
| 89 |
+
# class definition machinery (required to define `class Optimizer`)
|
| 90 |
+
"__build_class__", "__name__", "object", "super",
|
| 91 |
+
"type", "property", "staticmethod", "classmethod",
|
| 92 |
+
# errors (so submitted code can raise/catch sanely)
|
| 93 |
+
"Exception", "ValueError", "TypeError", "IndexError", "KeyError",
|
| 94 |
+
"ZeroDivisionError", "RuntimeError", "ArithmeticError", "OverflowError",
|
| 95 |
+
]
|
| 96 |
+
safe_bi = {n: getattr(_bi, n) for n in safe_names if hasattr(_bi, n)}
|
| 97 |
+
|
| 98 |
+
return {
|
| 99 |
+
"__builtins__": safe_bi,
|
| 100 |
+
"__name__": "__submission__",
|
| 101 |
+
"np": np,
|
| 102 |
+
"numpy": np,
|
| 103 |
+
"math": math,
|
| 104 |
+
}
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
@dataclass
|
| 108 |
+
class CompiledOptimizer:
|
| 109 |
+
"""Wraps an instantiated Optimizer with bounded `step` execution."""
|
| 110 |
+
instance: Any
|
| 111 |
+
step_timeout: float = 0.5
|
| 112 |
+
|
| 113 |
+
def step(self, x: np.ndarray, f_val: float, grad: np.ndarray) -> np.ndarray:
|
| 114 |
+
with _signal_timeout(self.step_timeout):
|
| 115 |
+
try:
|
| 116 |
+
out = self.instance.step(x, f_val, grad)
|
| 117 |
+
except StepTimeout:
|
| 118 |
+
raise
|
| 119 |
+
except Exception as e:
|
| 120 |
+
raise SandboxError(f"step() raised {type(e).__name__}: {e}") from e
|
| 121 |
+
try:
|
| 122 |
+
out = np.asarray(out, dtype=float)
|
| 123 |
+
except Exception as e:
|
| 124 |
+
raise SandboxError(f"step() returned non-array value ({type(e).__name__}: {e})") from e
|
| 125 |
+
if out.shape != x.shape:
|
| 126 |
+
raise SandboxError(f"step() returned shape {out.shape}, expected {x.shape}")
|
| 127 |
+
if not np.all(np.isfinite(out)):
|
| 128 |
+
raise SandboxError("step() returned non-finite values")
|
| 129 |
+
return out
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
def compile_optimizer(source: str, dim: int, init_timeout: float = 1.0,
|
| 133 |
+
step_timeout: float = 0.5) -> CompiledOptimizer:
|
| 134 |
+
"""Strip, exec, and instantiate Optimizer(dim=dim). Returns a wrapper."""
|
| 135 |
+
stripped = strip_module_code(source)
|
| 136 |
+
globs = _safe_globals()
|
| 137 |
+
locs: dict = {}
|
| 138 |
+
|
| 139 |
+
try:
|
| 140 |
+
with _signal_timeout(init_timeout):
|
| 141 |
+
exec(compile(stripped, "<submission>", "exec"), globs, locs)
|
| 142 |
+
except SandboxError:
|
| 143 |
+
raise
|
| 144 |
+
except Exception as e:
|
| 145 |
+
raise SandboxError(f"exec failed: {type(e).__name__}: {e}") from e
|
| 146 |
+
|
| 147 |
+
OptimizerCls = locs.get("Optimizer") or globs.get("Optimizer")
|
| 148 |
+
if OptimizerCls is None:
|
| 149 |
+
raise SandboxError("Optimizer class not defined after exec")
|
| 150 |
+
|
| 151 |
+
try:
|
| 152 |
+
with _signal_timeout(init_timeout):
|
| 153 |
+
instance = OptimizerCls(dim=dim)
|
| 154 |
+
except Exception as e:
|
| 155 |
+
raise SandboxError(f"__init__ failed: {type(e).__name__}: {e}") from e
|
| 156 |
+
|
| 157 |
+
if not hasattr(instance, "step"):
|
| 158 |
+
raise SandboxError("Optimizer instance missing `step` method")
|
| 159 |
+
|
| 160 |
+
return CompiledOptimizer(instance=instance, step_timeout=step_timeout)
|
server/__init__.py
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
"""Landscapeforge environment server components."""
|
| 8 |
+
|
| 9 |
+
from .landscapeforge_environment import LandscapeforgeEnvironment
|
| 10 |
+
|
| 11 |
+
__all__ = ["LandscapeforgeEnvironment"]
|
server/app.py
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
"""
|
| 8 |
+
FastAPI application for the Landscapeforge Environment.
|
| 9 |
+
|
| 10 |
+
This module creates an HTTP server that exposes the LandscapeforgeEnvironment
|
| 11 |
+
over HTTP and WebSocket endpoints, compatible with EnvClient.
|
| 12 |
+
|
| 13 |
+
Endpoints:
|
| 14 |
+
- POST /reset: Reset the environment
|
| 15 |
+
- POST /step: Execute an action
|
| 16 |
+
- GET /state: Get current environment state
|
| 17 |
+
- GET /schema: Get action/observation schemas
|
| 18 |
+
- WS /ws: WebSocket endpoint for persistent sessions
|
| 19 |
+
|
| 20 |
+
Usage:
|
| 21 |
+
# Development (with auto-reload):
|
| 22 |
+
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
|
| 23 |
+
|
| 24 |
+
# Production:
|
| 25 |
+
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
|
| 26 |
+
|
| 27 |
+
# Or run directly:
|
| 28 |
+
python -m server.app
|
| 29 |
+
"""
|
| 30 |
+
|
| 31 |
+
try:
|
| 32 |
+
from openenv.core.env_server.http_server import create_app
|
| 33 |
+
except Exception as e: # pragma: no cover
|
| 34 |
+
raise ImportError(
|
| 35 |
+
"openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
|
| 36 |
+
) from e
|
| 37 |
+
|
| 38 |
+
try:
|
| 39 |
+
from ..models import LandscapeforgeAction, LandscapeforgeObservation
|
| 40 |
+
from .landscapeforge_environment import LandscapeforgeEnvironment
|
| 41 |
+
from ..demo.ui import build_ui as _build_demo_ui
|
| 42 |
+
except ModuleNotFoundError:
|
| 43 |
+
from models import LandscapeforgeAction, LandscapeforgeObservation
|
| 44 |
+
from server.landscapeforge_environment import LandscapeforgeEnvironment
|
| 45 |
+
from demo.ui import build_ui as _build_demo_ui
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
# Create the core FastAPI app (without OpenEnv's built-in web UI, which has a
|
| 49 |
+
# theme-kwarg incompatibility with Gradio 5.x). We mount our custom Gradio
|
| 50 |
+
# demo manually at /web below.
|
| 51 |
+
app = create_app(
|
| 52 |
+
LandscapeforgeEnvironment,
|
| 53 |
+
LandscapeforgeAction,
|
| 54 |
+
LandscapeforgeObservation,
|
| 55 |
+
env_name="landscapeforge",
|
| 56 |
+
max_concurrent_envs=4,
|
| 57 |
+
)
|
| 58 |
+
|
| 59 |
+
# Mount Gradio demo at /web
|
| 60 |
+
try:
|
| 61 |
+
import gradio as gr
|
| 62 |
+
_demo = _build_demo_ui()
|
| 63 |
+
app = gr.mount_gradio_app(app, _demo, path="/web")
|
| 64 |
+
except Exception as _e: # pragma: no cover
|
| 65 |
+
import logging
|
| 66 |
+
logging.getLogger(__name__).warning(
|
| 67 |
+
"Gradio demo failed to mount (%s); FastAPI endpoints still available.", _e,
|
| 68 |
+
)
|
| 69 |
+
|
| 70 |
+
|
| 71 |
+
def main():
|
| 72 |
+
"""Entry point for direct execution.
|
| 73 |
+
|
| 74 |
+
Parses --host / --port from the command line (also honours $PORT),
|
| 75 |
+
defaulting to 0.0.0.0:8000 for container-friendly launches.
|
| 76 |
+
"""
|
| 77 |
+
import argparse
|
| 78 |
+
import os
|
| 79 |
+
import uvicorn
|
| 80 |
+
|
| 81 |
+
parser = argparse.ArgumentParser()
|
| 82 |
+
parser.add_argument("--port", type=int,
|
| 83 |
+
default=int(os.environ.get("PORT", 8000)))
|
| 84 |
+
parser.add_argument("--host", type=str, default="0.0.0.0")
|
| 85 |
+
args = parser.parse_args()
|
| 86 |
+
uvicorn.run(app, host=args.host, port=args.port)
|
| 87 |
+
|
| 88 |
+
|
| 89 |
+
if __name__ == "__main__":
|
| 90 |
+
main()
|
server/landscapeforge_environment.py
ADDED
|
@@ -0,0 +1,513 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""LandscapeForge OpenEnv environment — OptCoder REPL (Phase C).
|
| 2 |
+
|
| 3 |
+
For v1 we ship OptCoder-only: LandscapeForge is a fixed template picker
|
| 4 |
+
controlled by the env itself (uniform random over the tier menu). The agent
|
| 5 |
+
acting through OpenEnv is OptCoder.
|
| 6 |
+
|
| 7 |
+
Each `reset()` samples a new landscape from the current tier. Each `step()`
|
| 8 |
+
executes one OptCoder action (run_baseline / draft / inspect / commit),
|
| 9 |
+
mutates env state, and returns an Observation reflecting the new state.
|
| 10 |
+
Episode ends when OptCoder commits or budget is exhausted.
|
| 11 |
+
"""
|
| 12 |
+
|
| 13 |
+
from __future__ import annotations
|
| 14 |
+
|
| 15 |
+
from typing import Any, Optional
|
| 16 |
+
from uuid import uuid4
|
| 17 |
+
|
| 18 |
+
import numpy as np
|
| 19 |
+
from openenv.core.env_server.interfaces import Environment
|
| 20 |
+
from openenv.core.env_server.types import State
|
| 21 |
+
|
| 22 |
+
try:
|
| 23 |
+
from ..models import (
|
| 24 |
+
ACTION_COSTS,
|
| 25 |
+
LandscapeforgeAction,
|
| 26 |
+
LandscapeforgeObservation,
|
| 27 |
+
)
|
| 28 |
+
from ..landscapes import (
|
| 29 |
+
TIER_MENU,
|
| 30 |
+
Landscape,
|
| 31 |
+
build_landscape,
|
| 32 |
+
structural_hints,
|
| 33 |
+
)
|
| 34 |
+
from ..reference_optimizers import run_baseline as run_reference_baseline
|
| 35 |
+
from ..reference_optimizers import tune_adam_lr
|
| 36 |
+
from ..sandbox import SandboxError, compile_optimizer
|
| 37 |
+
from ..arena import ArenaResult, auto_test_draft, run_arena
|
| 38 |
+
from ..rewards import ast_novelty_score, compute_optcoder_reward, compute_step_reward
|
| 39 |
+
except ImportError:
|
| 40 |
+
# Running from repo root or package layout quirks
|
| 41 |
+
from models import ( # type: ignore
|
| 42 |
+
ACTION_COSTS,
|
| 43 |
+
LandscapeforgeAction,
|
| 44 |
+
LandscapeforgeObservation,
|
| 45 |
+
)
|
| 46 |
+
from landscapes import ( # type: ignore
|
| 47 |
+
TIER_MENU,
|
| 48 |
+
Landscape,
|
| 49 |
+
build_landscape,
|
| 50 |
+
structural_hints,
|
| 51 |
+
)
|
| 52 |
+
from reference_optimizers import run_baseline as run_reference_baseline # type: ignore
|
| 53 |
+
from reference_optimizers import tune_adam_lr # type: ignore
|
| 54 |
+
from sandbox import SandboxError, compile_optimizer # type: ignore
|
| 55 |
+
from arena import ArenaResult, auto_test_draft, run_arena # type: ignore
|
| 56 |
+
from rewards import ast_novelty_score, compute_optcoder_reward, compute_step_reward # type: ignore
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
BUDGET_TOTAL = 12
|
| 60 |
+
ARENA_SEEDS = [101, 202, 303, 404, 505, 606, 707, 808, 909, 1010]
|
| 61 |
+
ARENA_STEPS = 200
|
| 62 |
+
BASELINE_STEPS = 30 # env-controlled; agent does not choose
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
# Reference source blobs for AST novelty comparison (short pseudo-implementations).
|
| 66 |
+
# Kept minimal — enough to detect "this commit is basically Adam".
|
| 67 |
+
_REF_SGD = """
|
| 68 |
+
class Optimizer:
|
| 69 |
+
def __init__(self, dim): self.lr = 0.01
|
| 70 |
+
def step(self, x, f, g): return x - self.lr * g
|
| 71 |
+
""".strip()
|
| 72 |
+
|
| 73 |
+
def _adam_source(lr: float) -> str:
|
| 74 |
+
"""Adam reference implementation parameterized by LR.
|
| 75 |
+
|
| 76 |
+
Used by `_ensure_adam_arena` after LR tuning — the baseline is
|
| 77 |
+
Adam-at-best-LR-for-this-landscape, not Adam-at-fixed-default.
|
| 78 |
+
"""
|
| 79 |
+
return f"""
|
| 80 |
+
class Optimizer:
|
| 81 |
+
def __init__(self, dim):
|
| 82 |
+
self.lr = {lr}
|
| 83 |
+
self.b1 = 0.9
|
| 84 |
+
self.b2 = 0.999
|
| 85 |
+
self.eps = 1e-8
|
| 86 |
+
self.m = np.zeros(dim)
|
| 87 |
+
self.v = np.zeros(dim)
|
| 88 |
+
self.t = 0
|
| 89 |
+
def step(self, x, f_val, g):
|
| 90 |
+
self.t += 1
|
| 91 |
+
self.m = self.b1*self.m + (1-self.b1)*g
|
| 92 |
+
self.v = self.b2*self.v + (1-self.b2)*g*g
|
| 93 |
+
mh = self.m/(1-self.b1**self.t)
|
| 94 |
+
vh = self.v/(1-self.b2**self.t)
|
| 95 |
+
return x - self.lr * mh / (np.sqrt(vh) + self.eps)
|
| 96 |
+
""".strip()
|
| 97 |
+
|
| 98 |
+
|
| 99 |
+
# Frozen default-LR source used only for AST-novelty comparison (so r_novelty
# measures "structurally different from Adam" regardless of the tuned LR).
_REF_ADAM = _adam_source(0.001)

# Classical SGD-with-momentum reference for the same novelty comparison.
_REF_MOMENTUM = """
class Optimizer:
    def __init__(self, dim):
        import numpy as np
        self.lr=0.01; self.beta=0.9; self.v = np.zeros(dim)
    def step(self, x, f, g):
        self.v = self.beta*self.v - self.lr*g
        return x + self.v
""".strip()

# Known-optimizer sources the committed draft is compared against by
# `ast_novelty_score` at episode finalization.
REFERENCE_SOURCES = [_REF_SGD, _REF_ADAM, _REF_MOMENTUM]
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
class LandscapeforgeEnvironment(Environment):
    """OptCoder-facing OpenEnv environment.

    LandscapeForge is internal (template picker) in v1.

    Episode lifecycle:
      * ``reset()`` samples a landscape from the current tier's template
        menu and wipes all per-episode REPL state.
      * ``step()`` executes one of four actions (``run_baseline`` /
        ``draft`` / ``inspect`` / ``commit``), charging each against a
        fixed budget (``BUDGET_TOTAL``).
      * A commit — explicit, or forced when the budget is exhausted —
        runs the full arena evaluation and yields the single terminal
        reward; non-terminal steps always carry reward 0.0.
    """

    # Safe because each session is expected to hold its own instance;
    # all mutable state lives on `self`.
    SUPPORTS_CONCURRENT_SESSIONS: bool = True

    def __init__(self, tier: str = "T0", seed: int = 0):
        """Initialize environment state; a landscape is only built on reset().

        Args:
            tier: Curriculum tier key into ``TIER_MENU`` (e.g. "T0").
            seed: Seed for the master RNG that drives all episode sampling.
        """
        self._initial_tier = tier
        self._master_rng = np.random.default_rng(seed)
        self._reset_count: int = 0
        self._tier = tier
        self._state = State(episode_id=str(uuid4()), step_count=0)
        # Populated by reset()
        self._landscape: Optional[Landscape] = None
        self._hints: dict = {}
        self._baseline_history: list[dict] = []
        self._draft_history: list[dict] = []
        self._draft_details: list[list[dict]] = []  # per-draft per-step detail
        self._inspect_requests: list[dict] = []
        self._current_draft: Optional[str] = None
        self._budget_spent: int = 0
        self._committed: bool = False
        self._final_obs: Optional[LandscapeforgeObservation] = None
        # Cache Adam's full arena result per episode (computed lazily, for
        # reward normalization via progress-based r_regret). The baseline is
        # Adam-at-tuned-LR — per-landscape LR is selected via a short sweep.
        self._adam_arena_cache: Optional[ArenaResult] = None
        self._adam_tuned_lr: Optional[float] = None
        # Stepwise feedback log (PBS delta + compile penalty). This is shown to
        # the LLM in the observation so it can course-correct mid-episode, but
        # NEVER added to the training scalar — final reward is purely terminal
        # arena reward (§9.1) for robustness against reward hacking.
        self._step_feedback_log: list[dict] = []

    # ---------- OpenEnv API ----------

    def reset(self) -> LandscapeforgeObservation:
        """Start a new episode: sample a landscape and clear REPL state."""
        self._reset_count += 1
        self._state = State(episode_id=str(uuid4()), step_count=0)

        # Pick a landscape from the current tier's menu.
        menu = TIER_MENU[self._tier]
        template = str(self._master_rng.choice(menu))
        dim = int(self._master_rng.integers(2, 6))  # small dims for v1
        params = self._sample_params(template)
        self._landscape = build_landscape(
            template=template, dim=dim, params=params,
            rng=np.random.default_rng(int(self._master_rng.integers(0, 2**31))),
        )
        self._hints = structural_hints(
            self._landscape,
            rng=np.random.default_rng(int(self._master_rng.integers(0, 2**31))),
        )

        # Wipe REPL state
        self._baseline_history = []
        self._draft_history = []
        self._draft_details = []
        self._inspect_requests = []
        self._current_draft = None
        self._budget_spent = 0
        self._committed = False
        self._final_obs = None
        self._adam_arena_cache = None
        self._adam_tuned_lr = None
        self._step_feedback_log = []

        return self._make_observation(
            last_kind=None, last_result={"reset": True}, done=False, reward=0.0,
        )

    def step(self, action: LandscapeforgeAction) -> LandscapeforgeObservation:  # type: ignore[override]
        """Execute one agent action, charging its cost against the budget.

        Raises:
            RuntimeError: if called before ``reset()``.
            ValueError: on an unrecognized ``action.kind``.
        """
        if self._landscape is None:
            raise RuntimeError("step() called before reset()")
        if self._committed:
            # Episode already done; return terminal obs.
            assert self._final_obs is not None
            return self._final_obs

        self._state.step_count += 1
        cost = ACTION_COSTS[action.kind]
        # Charge budget first so over-limit actions are rejected.
        if self._budget_spent + cost > BUDGET_TOTAL and action.kind != "commit":
            return self._force_commit(reason="budget_exhausted")

        self._budget_spent += cost

        # Snapshot draft history for PBS computation
        prev_draft_history_snapshot = list(self._draft_history)

        if action.kind == "run_baseline":
            result = self._do_run_baseline(action)
        elif action.kind == "draft":
            result = self._do_draft(action)
        elif action.kind == "inspect":
            result = self._do_inspect(action)
        elif action.kind == "commit":
            return self._do_commit()
        else:
            raise ValueError(f"Unknown action kind: {action.kind}")

        # Compute stepwise FEEDBACK (NOT reward). Signals the LLM can use to
        # course-correct mid-episode — exposed through last_action_result.
        # Explicitly NOT summed into training reward; terminal arena reward
        # is the only signal GRPO sees (robust against reward hacking).
        step_feedback = compute_step_reward(
            prev_draft_history=prev_draft_history_snapshot,
            new_draft_history=self._draft_history,
            action_kind=action.kind,
            action_result=result,
        )
        if step_feedback["breakdown"]:
            entry = {
                "turn": self._state.step_count,
                "action_kind": action.kind,
                **step_feedback["breakdown"],
            }
            self._step_feedback_log.append(entry)
            # Surface on this turn's action result so the LLM sees it immediately.
            result = {**result, "feedback": step_feedback["breakdown"]}

        # Check if budget now exhausted; if so, auto-commit.
        if self._budget_spent >= BUDGET_TOTAL:
            return self._force_commit(reason="budget_exhausted")

        return self._make_observation(
            last_kind=action.kind, last_result=result,
            done=False, reward=0.0,  # no reward on non-terminal steps
        )

    @property
    def state(self) -> State:
        """Current OpenEnv episode state (episode_id + step_count)."""
        return self._state

    # ---------- Action handlers ----------

    def _do_run_baseline(self, action: LandscapeforgeAction) -> dict:
        """Run a named reference baseline and record its trajectory."""
        assert self._landscape is not None
        # Fixed init AND fixed step count for baseline comparability across
        # episodes and rollouts (important for GRPO group-relative advantages).
        rng = np.random.default_rng(42)
        x0 = rng.normal(0.0, 0.5, size=self._landscape.dim)
        result = run_reference_baseline(
            name=action.baseline_name, f=self._landscape.f, grad=self._landscape.grad,
            x0=x0, steps=BASELINE_STEPS,
        )
        self._baseline_history.append(result)
        return {
            "baseline_index": len(self._baseline_history) - 1,
            "name": result["name"],
            "n_steps": len(result["trajectory"]),
            # Final f is None for an empty trajectory or a missing last value.
            "final_f": (result["trajectory"][-1]["f"]
                        if result["trajectory"] and result["trajectory"][-1]["f"] is not None
                        else None),
        }

    def _do_draft(self, action: LandscapeforgeAction) -> dict:
        """Compile and smoke-test a submitted optimizer draft.

        The draft becomes `_current_draft` even if it fails to compile —
        a later commit of broken code is handled by `_finalize_episode`.
        """
        assert self._landscape is not None
        code = action.code or ""
        self._current_draft = code
        try:
            opt = compile_optimizer(code, dim=self._landscape.dim)
        except SandboxError as e:
            # Record failed draft; still counts toward history for inspect.
            self._draft_history.append({
                "code": code,
                "compile_error": str(e),
                "summary": {"converged": False, "diverged": True, "error": str(e),
                            "final_f": None, "step_of_min": None, "min_f": None},
            })
            self._draft_details.append([])
            return {"draft_index": len(self._draft_history) - 1,
                    "compile_error": str(e), "summary": None}

        test = auto_test_draft(opt, self._landscape, seed=0, steps=20)
        self._draft_history.append({
            "code": code,
            "compile_error": None,
            "summary": test["summary"],
        })
        self._draft_details.append(test["detail"])
        return {"draft_index": len(self._draft_history) - 1,
                "compile_error": None, "summary": test["summary"]}

    def _do_inspect(self, action: LandscapeforgeAction) -> dict:
        """Return a slice of a previous draft's per-step detail trace."""
        idx = action.draft_idx
        if idx is None or idx < 0 or idx >= len(self._draft_details):
            return {"error": f"draft_idx {idx} out of range (have {len(self._draft_details)} drafts)"}
        detail = self._draft_details[idx]
        # NOTE(review): assumes step_range_start/step_range_end are non-None
        # ints here — confirm the action model supplies integer defaults,
        # otherwise `min(None, ...)` below raises TypeError.
        start = action.step_range_start
        end = min(action.step_range_end, len(detail))
        sliced = detail[start:end]
        record = {
            "draft_idx": idx,
            "step_range": [start, end],
            "detail": sliced,
        }
        self._inspect_requests.append(record)
        return {"draft_idx": idx, "step_range": [start, end], "n_steps": len(sliced)}

    def _do_commit(self) -> LandscapeforgeObservation:
        """Agent-initiated commit: finalize the episode."""
        return self._finalize_episode(reason="commit")

    def _force_commit(self, reason: str) -> LandscapeforgeObservation:
        """Env-initiated commit (e.g. budget exhaustion): finalize the episode."""
        return self._finalize_episode(reason=reason)

    # ---------- Episode finalization ----------

    def _finalize_episode(self, reason: str) -> LandscapeforgeObservation:
        """Run the terminal arena evaluation and build the final observation.

        Handles three paths: no draft ever submitted (worst-case reward),
        committed code that fails to compile (worst-case arena), and the
        normal path (arena vs. tuned-Adam baseline + novelty scoring).
        """
        assert self._landscape is not None
        self._committed = True

        # Need a current_draft. If none, produce a worst-case result.
        if not self._current_draft:
            result = {
                "reason": reason,
                "no_draft": True,
                "final_regret": 1.0,
            }
            r_total = -1.0
            breakdown = {"no_draft": 1.0}
            obs = self._make_observation(
                last_kind="commit", last_result=result,
                done=True, reward=r_total,
            )
            obs.committed = True
            obs.final_regret = 1.0
            obs.r_optcoder = r_total
            obs.r_optcoder_breakdown = breakdown
            self._final_obs = obs
            return obs

        # Full Phase-D arena eval
        try:
            opt = compile_optimizer(self._current_draft, dim=self._landscape.dim)
            arena = run_arena(opt, self._landscape, seeds=ARENA_SEEDS, steps=ARENA_STEPS)
        except SandboxError as e:
            # Committed code fails to compile -> worst-case result
            arena = ArenaResult(
                initial_values=[1.0] * len(ARENA_SEEDS),
                final_values=[float("nan")] * len(ARENA_SEEDS),
                crashed=[True] * len(ARENA_SEEDS),
                trajectories=[[] for _ in ARENA_SEEDS],
            )

        # Adam baseline arena for normalization (always run for reward stability).
        adam_arena = self._ensure_adam_arena()

        novelty = ast_novelty_score(self._current_draft, REFERENCE_SOURCES)
        # Convergence step: first seed's trajectory, first step where f < 0.01 * f0
        convergence_step = self._compute_convergence_step(arena)

        reward = compute_optcoder_reward(
            arena=arena,
            adam_arena=adam_arena,
            actions_used_cost=self._budget_spent,
            budget_total=BUDGET_TOTAL,
            novelty_score=novelty,
            convergence_step=convergence_step,
            arena_steps=ARENA_STEPS,
        )

        result = {
            "reason": reason,
            "my_mean_progress": arena.mean_progress,
            "adam_mean_progress": adam_arena.mean_progress,
            "adam_tuned_lr": self._adam_tuned_lr,
            "speedup_vs_adam": reward.breakdown.get("speedup_vs_adam"),
            "crash_fraction": arena.crash_fraction,
            "novelty_score": novelty,
            "convergence_step": convergence_step,
        }

        obs = self._make_observation(
            last_kind="commit", last_result=result,
            done=True, reward=reward.r_total,
        )
        obs.committed = True
        # `final_regret` is reinterpreted (no f_min dependency): Adam-shortfall
        # in [0, 1]. 0 = matched or beat Adam's descent; 1 = made zero progress
        # while Adam descended normally. Capped at 1.
        speedup = reward.breakdown.get("speedup_vs_adam", 0.0)
        obs.final_regret = float(max(0.0, min(1.0, 1.0 - speedup)))
        obs.r_optcoder = reward.r_total
        obs.r_optcoder_breakdown = reward.breakdown
        self._final_obs = obs
        return obs

    # ---------- Helpers ----------

    def _make_observation(self, last_kind: Optional[str], last_result: dict,
                          done: bool, reward: float) -> LandscapeforgeObservation:
        """Assemble the full observation from current episode state."""
        assert self._landscape is not None
        return LandscapeforgeObservation(
            landscape_description=self._landscape.description,
            dim=self._landscape.dim,
            structural_hints=self._hints,
            baseline_history=self._serialize_baseline_history(),
            draft_history=self._serialize_draft_history(),
            inspect_requests=list(self._inspect_requests),
            current_draft=self._current_draft,
            budget_remaining=BUDGET_TOTAL - self._budget_spent,
            last_action_kind=last_kind,
            last_action_result=last_result,
            done=done,
            reward=reward,
        )

    def _serialize_baseline_history(self) -> list[dict]:
        # Trim trajectory to summary-friendly size (every step, x as list).
        return [
            {"name": b["name"], "trajectory": b["trajectory"]}
            for b in self._baseline_history
        ]

    def _serialize_draft_history(self) -> list[dict]:
        # For the observation we include code + summary per draft.
        return [
            {"code": d["code"], "summary": d["summary"], "compile_error": d["compile_error"]}
            for d in self._draft_history
        ]

    def _sample_params(self, template: str) -> dict:
        """Sample template-specific landscape parameters from the master RNG.

        Unknown templates get an empty parameter dict (template defaults).
        """
        rng = self._master_rng
        if template == "quadratic":
            # T0 uses cond up to 100; T1 up to 1000; T2 higher.
            cap = {"T0": 100.0, "T1": 1000.0, "T2": 10_000.0}[self._tier]
            return {"cond": float(rng.uniform(1.0, cap))}
        if template == "gaussian_mix":
            return {
                "k": int(rng.integers(2, 6)),
                "sigma": float(rng.uniform(0.3, 1.0)),
                "spread": float(rng.uniform(1.0, 4.0)),
            }
        if template == "huber":
            return {"delta": float(rng.uniform(0.5, 2.0))}
        return {}

    def _ensure_adam_arena(self) -> ArenaResult:
        """Build the Adam baseline, FAIRLY — LR is tuned per landscape before
        running the arena. The tuning uses a short 30-step sweep on a dedicated
        seed (not one of the arena seeds) to avoid overfitting.

        Cached per episode in `_adam_arena_cache`. Tuned LR is stored in
        `_adam_tuned_lr` for logging / demo surfacing.

        On any tuning/compile/arena failure the baseline degrades to an
        all-crashed, zero-progress ArenaResult rather than raising.
        """
        if self._adam_arena_cache is not None:
            return self._adam_arena_cache
        assert self._landscape is not None
        try:
            # Tune LR on seed 0 (not in ARENA_SEEDS), 30-step sweep.
            tune_rng = np.random.default_rng(0)
            tune_x0 = tune_rng.normal(0.0, 0.5, size=self._landscape.dim)
            best_lr = tune_adam_lr(
                f=self._landscape.f, grad=self._landscape.grad,
                x0=tune_x0, sweep_steps=30,
            )
            self._adam_tuned_lr = best_lr

            adam_opt = compile_optimizer(_adam_source(best_lr), dim=self._landscape.dim)
            self._adam_arena_cache = run_arena(
                adam_opt, self._landscape,
                seeds=ARENA_SEEDS, steps=ARENA_STEPS,
            )
        except Exception:
            self._adam_tuned_lr = None
            self._adam_arena_cache = ArenaResult(
                initial_values=[1.0] * len(ARENA_SEEDS),
                final_values=[1.0] * len(ARENA_SEEDS),
                crashed=[True] * len(ARENA_SEEDS),
                trajectories=[[] for _ in ARENA_SEEDS],
            )
        return self._adam_arena_cache

    def _compute_convergence_step(self, arena) -> Optional[int]:
        """First step on first seed where f < 1% of initial f."""
        if not arena.trajectories or not arena.trajectories[0]:
            return None
        traj = arena.trajectories[0]
        if not traj:
            return None
        f0 = traj[0]["f"]
        if f0 <= 0:
            return None
        threshold = 0.01 * f0
        for t, snap in enumerate(traj):
            if snap["f"] < threshold:
                return t
        return None

    # ---------- Tier advancement API (used by trainer, not agent) ----------

    def advance_tier(self, new_tier: str) -> None:
        """Switch curriculum tier; takes effect on the NEXT reset().

        Raises:
            ValueError: if `new_tier` is not a key of TIER_MENU.
        """
        if new_tier not in TIER_MENU:
            raise ValueError(f"Unknown tier {new_tier}")
        self._tier = new_tier
|
server/requirements.txt
ADDED
|
@@ -0,0 +1,6 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
openenv[core]>=0.2.0
|
| 2 |
+
fastapi>=0.115.0
|
| 3 |
+
uvicorn>=0.24.0
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
|
tests/__init__.py
ADDED
|
File without changes
|
tests/test_episode.py
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""End-to-end smoke test: scripted episode, in-process, no server.
|
| 2 |
+
|
| 3 |
+
Runs: run_baseline(adam) -> draft(Adam-ish) -> inspect -> draft(SGD+momentum)
|
| 4 |
+
-> commit, and verifies the env threads state correctly and produces a
|
| 5 |
+
finite reward.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
from __future__ import annotations
|
| 9 |
+
|
| 10 |
+
import sys
|
| 11 |
+
from pathlib import Path
|
| 12 |
+
|
| 13 |
+
# Allow running directly: `python tests/test_episode.py`
|
| 14 |
+
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
|
| 15 |
+
|
| 16 |
+
from landscapeforge.models import LandscapeforgeAction # type: ignore
|
| 17 |
+
from landscapeforge.server.landscapeforge_environment import ( # type: ignore
|
| 18 |
+
LandscapeforgeEnvironment,
|
| 19 |
+
)
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
# Hand-written optimizer sources submitted as drafts in the scripted episode.
# Both define the `Optimizer` interface the env sandbox expects:
# __init__(dim) and step(x, f_val, grad) -> updated x.
ADAM_CODE = """
import numpy as np

class Optimizer:
    def __init__(self, dim):
        self.lr = 1e-3
        self.b1 = 0.9
        self.b2 = 0.999
        self.eps = 1e-8
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)
        self.t = 0

    def step(self, x, f_val, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return x - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
"""

# SGD with classical momentum — a structurally different second draft.
SGDM_CODE = """
import numpy as np

class Optimizer:
    def __init__(self, dim):
        self.lr = 0.05
        self.beta = 0.9
        self.v = np.zeros(dim)

    def step(self, x, f_val, grad):
        self.v = self.beta * self.v - self.lr * grad
        return x + self.v
"""
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
def scripted_episode() -> None:
    """Happy-path episode: baseline -> draft -> inspect -> draft -> commit.

    Verifies that state threads correctly across steps and that commit
    produces a populated terminal observation (done, reward, regret,
    reward breakdown).
    """
    env = LandscapeforgeEnvironment(tier="T0", seed=42)
    obs = env.reset()
    print(f"[reset] landscape: {obs.landscape_description}")
    print(f"  dim={obs.dim}, hints={obs.structural_hints}")
    print(f"  budget={obs.budget_remaining}")

    # 1. Run Adam baseline to see what it does.
    obs = env.step(LandscapeforgeAction(
        kind="run_baseline", baseline_name="adam",
    ))
    print(f"\n[run_baseline adam] result={obs.last_action_result}")
    print(f"  budget_remaining={obs.budget_remaining}")

    # 2. Submit an Adam draft.
    obs = env.step(LandscapeforgeAction(kind="draft", code=ADAM_CODE))
    print(f"\n[draft adam] compile_error={obs.last_action_result.get('compile_error')}")
    print(f"  summary={obs.last_action_result.get('summary')}")
    print(f"  budget_remaining={obs.budget_remaining}")

    # 3. Inspect the first draft.
    obs = env.step(LandscapeforgeAction(
        kind="inspect", draft_idx=0, step_range_start=10, step_range_end=20,
    ))
    print(f"\n[inspect 0 steps 10-20] result={obs.last_action_result}")
    print(f"  budget_remaining={obs.budget_remaining}")

    # 4. Submit an SGD+momentum alternative.
    obs = env.step(LandscapeforgeAction(kind="draft", code=SGDM_CODE))
    print(f"\n[draft sgdm] compile_error={obs.last_action_result.get('compile_error')}")
    print(f"  summary={obs.last_action_result.get('summary')}")
    print(f"  budget_remaining={obs.budget_remaining}")

    # 5. Commit.
    obs = env.step(LandscapeforgeAction(kind="commit"))
    print(f"\n[commit]")
    print(f"  done={obs.done}")
    print(f"  reward={obs.reward}")
    print(f"  final_regret={obs.final_regret}")
    print(f"  r_optcoder_breakdown={obs.r_optcoder_breakdown}")
    print(f"  last_action_result={obs.last_action_result}")

    # Sanity checks
    assert obs.done is True, "should be done after commit"
    assert obs.reward is not None, "reward must be produced"
    assert obs.final_regret is not None, "final_regret must be produced"
    assert obs.r_optcoder_breakdown, "breakdown must be populated"
    print("\n✓ scripted_episode PASSED")
|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
def episode_with_broken_code() -> None:
    """Submitting code that fails to compile should not crash the env.

    The env must surface the compile error in last_action_result, keep
    the episode alive, and still finalize cleanly on commit.
    """
    env = LandscapeforgeEnvironment(tier="T0", seed=7)
    env.reset()

    # Intentional syntax error
    obs = env.step(LandscapeforgeAction(
        kind="draft", code="this is not python",
    ))
    print(f"\n[broken draft] compile_error={obs.last_action_result.get('compile_error')}")
    assert obs.last_action_result.get("compile_error") is not None
    assert obs.done is False

    # Commit with bad code — should produce worst-case regret, not crash
    obs = env.step(LandscapeforgeAction(kind="commit"))
    print(f"[broken commit] reward={obs.reward}, final_regret={obs.final_regret}")
    assert obs.done is True
    assert obs.reward is not None
    print("\n✓ episode_with_broken_code PASSED")
|
| 128 |
+
|
| 129 |
+
|
| 130 |
+
def budget_exhaustion() -> None:
    """Spamming drafts until budget runs out should auto-commit.

    The env charges each draft against BUDGET_TOTAL and is expected to
    force-commit with reason "budget_exhausted" once the budget is gone.
    """
    env = LandscapeforgeEnvironment(tier="T0", seed=3)
    env.reset()

    # 10 drafts is more than enough to exhaust the budget (cost 2 each, 12 total).
    for i in range(10):
        obs = env.step(LandscapeforgeAction(kind="draft", code=ADAM_CODE))
        if obs.done:
            print(f"\n[budget_exhaustion] auto-committed after {i+1} drafts")
            print(f"  reason={obs.last_action_result.get('reason')}")
            assert obs.last_action_result.get("reason") == "budget_exhausted"
            print("\n✓ budget_exhaustion PASSED")
            return
    raise AssertionError("Budget never exhausted — shouldn't happen with draft cost 2, budget 12")
|
| 144 |
+
|
| 145 |
+
|
| 146 |
+
# Script entry point: run all three smoke scenarios in sequence; each one
# raises AssertionError on failure.
if __name__ == "__main__":
    scripted_episode()
    episode_with_broken_code()
    budget_exhaustion()
    print("\nAll tests passed.")
|
uv.lock
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|