Shabista Sehar committed on
Commit 0eb4f6f · 1 Parent(s): 5a30ea5

construction env
.env.example ADDED
@@ -0,0 +1,7 @@
+# Container Port OpenEnv — Environment Variables
+# Copy this file and fill in your actual values before running inference
+HF_TOKEN=your_huggingface_token_here
+API_BASE_URL=https://router.huggingface.co/v1
+MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
+ENV_URL=http://localhost:7860
+LOCAL_IMAGE_NAME=
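These variables are consumed with plain `os.getenv` calls at import time in `inference.py`. A minimal sketch of reading them with fallback defaults (names match the file above; the defaults mirror the ones the script uses):

```python
import os

# Names match .env.example; unset variables fall back to the defaults shown.
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
HF_TOKEN = os.getenv("HF_TOKEN")  # None when unset; only needed for LLM mode

print(MODEL_NAME, ENV_URL)
```

`os.getenv` never raises for a missing key, so only `HF_TOKEN` can end up `None` here; the script treats that as "run the greedy agent instead".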
.gitignore ADDED
@@ -0,0 +1,15 @@
+__pycache__/
+*.pyc
+*.pyo
+.env
+.venv/
+.uv-cache/
+*.egg-info/
+dist/
+build/
+outputs/
+client/__pycache__/
+server/__pycache__/
+tests/__pycache__/
+.pytest_cache/
+pytest-cache-files-*/
Dockerfile CHANGED
@@ -2,13 +2,16 @@ FROM python:3.11-slim
 
 WORKDIR /app
 
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-
 COPY . .
 
+RUN pip install --no-cache-dir -e .
+
 ENV PYTHONPATH=/app
+ENV ENABLE_WEB_INTERFACE=true
 
 EXPOSE 7860
 
-CMD ["uvicorn", "server.server:app", "--host", "0.0.0.0", "--port", "7860"]
+HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:7860/health')"
+
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,71 +1,78 @@
 # Container Port Environment
 
-An OpenEnv-compatible RL environment for container yard management at a shipping terminal.
+An OpenEnv environment for container-yard stack planning at a shipping terminal.
 
 ## Task
 
-A ship arrives with N containers (priority 1=urgent, 2=normal, 3=low). The agent places each into
-stacks. At regular intervals, specific containers are retrieved. If a target is buried under others,
-each container above it is a **rehandle** — expensive in real port operations.
+Incoming containers have priority `1`, `2`, or `3`. The agent places each one into a bounded stack. During retrieval, every container sitting above the target counts as a rehandle and adds cost.
 
-**Goal: minimize total rehandle operations across the episode.**
+Goal: minimize total rehandles across the episode.
 
 ## Difficulty Levels
 
-| Parameter          | Easy    | Medium  | Hard    |
-|--------------------|---------|---------|---------|
-| Stacks             | 6       | 8       | 10      |
-| Max stack height   | 4       | 5       | 6       |
-| Containers         | 20      | 35      | 50      |
-| Retrieval interval | every 5 | every 5 | every 4 |
-| Lookahead shown    | 5       | 3       | 0       |
+| Parameter | Easy | Medium | Hard |
+|---|---|---|---|
+| Stacks | 6 | 8 | 10 |
+| Max height | 4 | 5 | 6 |
+| Containers | 20 | 35 | 50 |
+| Retrieval interval | 5 | 5 | 4 |
+| Lookahead | 5 | 3 | 0 |
 
-## Reward
-
-| Event | Reward |
-|---|---|
-| Accessible placement of priority-1 (near top) | up to +0.45 |
-| General placement | +0.03 to +0.30 |
-| Burying high-priority under low-priority | -0.10 to -0.20 |
-| Invalid action (full stack / bad index) | -2.0 |
-| Each rehandle at retrieval time | -0.40 |
-
-## Score
-
-`score = 1.0 - (actual_rehandles / worst_case_rehandles)`, in [0.0, 1.0].
-
-## Setup
+## Run Locally
+
+```bash
+pip install -e .
+uvicorn server.app:app --host 0.0.0.0 --port 7860
+```
+
+Web UI: `http://127.0.0.1:7860/web`
+
+For manual stateful checks, use the web endpoints:
+
 ```bash
-pip install -r requirements.txt
-uvicorn server.server:app --host 0.0.0.0 --port 7860
+curl http://127.0.0.1:7860/health
+curl -X POST http://127.0.0.1:7860/web/reset -H "Content-Type: application/json" -d "{\"difficulty\":\"easy\"}"
+curl -X POST http://127.0.0.1:7860/web/step -H "Content-Type: application/json" -d "{\"action\":{\"stack_index\":0}}"
 ```
 
-## Run inference
-```bash
-# Greedy agent, all difficulties
-python inference.py --difficulty all
+`/reset` and `/step` are stateless simulation endpoints in `openenv-core 0.2.3`. For browser-style interactive testing, use `/web`, `/web/reset`, `/web/step`, or the WebSocket flow used by `inference.py`.
 
-# LLM agent (requires HF token in env)
-export HF_TOKEN=hf_your_token_here
-python inference.py --use-llm --difficulty all
+## Run Inference
 
-# Against deployed HF Space
-python inference.py --url https://YOUR_USERNAME-container-port-env.hf.space --difficulty all
+```bash
+python inference.py --difficulty all
+python inference.py --difficulty easy
+python inference.py --url http://127.0.0.1:7860 --difficulty all
 ```
 
+For LLM mode, set `HF_TOKEN` first.
+
 ## Docker
+
 ```bash
 docker build -t container-port-env .
 docker run -p 7860:7860 container-port-env
 ```
 
-## API
+## Tests
+
+Run the full test suite:
 
-- `GET /ping` — health check
-- `GET /health` — server stats
-- `WS /ws` — WebSocket interface
+```bash
+pytest tests/test_openenv_env.py -v
+```
 
-WebSocket messages:
-- `{"type": "reset", "difficulty": "easy"}` — start episode
-- `{"type": "step", "action": {"stack_index": 2}}` — place container
-- `{"type": "state"}` — get full state with score
+| Test | What it covers |
+|---|---|
+| test_reset_returns_valid_obs | Reset returns correct stack count, step=0, no rehandles |
+| test_step_valid_action | Valid placement increments step and fills stack |
+| test_step_invalid_action_penalized | Out-of-range stack index returns -2.0 reward |
+| test_score_in_range | Full episode score stays in [0.0, 1.0] |
+| test_full_episode_completes | All 3 difficulties reach done=True within 500 steps |
+| test_lookahead_visibility | Easy shows more upcoming retrievals than hard (hard=0) |
+| test_reward_is_dense | At least 50% of steps have non-zero reward |
+| test_no_double_retrieval | retrieval_pointer never exceeds queue length |
+| test_health_route | GET /health returns 200 |
+| test_web_ui_route | GET /web returns 200 (Gradio UI) |
+| test_http_reset_returns_observation | POST /reset returns valid easy-mode observation |
+| test_http_reset_then_step_preserves_state | Sequential reset+step operates on same episode |
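The pre-change README defined the score as `score = 1.0 - (actual_rehandles / worst_case_rehandles)`, clipped to [0.0, 1.0]. A minimal sketch of that normalization (the function name and the zero-denominator handling are our assumptions, not code from the repo):

```python
def normalized_score(actual_rehandles: int, worst_case_rehandles: int) -> float:
    """Map a rehandle count onto [0.0, 1.0]; fewer rehandles means a higher score."""
    if worst_case_rehandles <= 0:
        return 1.0  # assumption: no rehandles were possible, so the episode is perfect
    score = 1.0 - (actual_rehandles / worst_case_rehandles)
    return min(max(score, 0.0), 1.0)

print(normalized_score(0, 40))   # flawless episode
print(normalized_score(10, 40))  # a few rehandles
```

The clamp matters because an agent spamming invalid actions could otherwise drive the ratio above 1 and the score below 0.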
client/__init__.py CHANGED
@@ -1 +1,5 @@
-
+from client.container_env import ContainerEnvClient
+
+ContainerPortEnv = ContainerEnvClient
+
+__all__ = ["ContainerEnvClient", "ContainerPortEnv"]
client/container_env.py CHANGED
@@ -1,37 +1,7 @@
-import json
-import websockets
-from typing import Any, Dict, Tuple
-
-class ContainerEnvClient:
-    """Async client for Container Port OpenEnv."""
-
-    def __init__(self, base_url: str = "http://localhost:7860"):
-        ws_url = base_url.replace("http://", "ws://").replace("https://", "wss://")
-        self.ws_url = ws_url.rstrip("/") + "/ws"
-        self._ws = None
-
-    async def __aenter__(self):
-        self._ws = await websockets.connect(self.ws_url)
-        return self
-
-    async def __aexit__(self, *args):
-        if self._ws:
-            await self._ws.close()
-
-    async def reset(self, difficulty: str = "medium") -> Dict[str, Any]:
-        await self._ws.send(json.dumps({"type": "reset", "difficulty": difficulty}))
-        resp = json.loads(await self._ws.recv())
-        return resp["observation"]
-
-    async def step(self, stack_index: int) -> Tuple[Dict, float, bool, Dict]:
-        await self._ws.send(json.dumps({
-            "type": "step",
-            "action": {"stack_index": stack_index}
-        }))
-        resp = json.loads(await self._ws.recv())
-        return resp["observation"], resp["reward"], resp["done"], resp.get("info", {})
-
-    async def state(self) -> Dict[str, Any]:
-        await self._ws.send(json.dumps({"type": "state"}))
-        resp = json.loads(await self._ws.recv())
-        return resp["state"]
+from openenv import GenericEnvClient
+
+
+class ContainerEnvClient(GenericEnvClient):
+    """OpenEnv client for Container Port."""
+
+    pass
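Both the removed client and `inference.py` derive the WebSocket endpoint from an HTTP base URL with the same string substitution. Extracted as a standalone helper (the function name is ours, for illustration):

```python
def ws_endpoint(base_url: str) -> str:
    """Convert an HTTP(S) base URL into the environment's /ws WebSocket URL."""
    ws_url = base_url.replace("http://", "ws://").replace("https://", "wss://")
    if not ws_url.endswith("/ws"):
        ws_url = ws_url.rstrip("/") + "/ws"
    return ws_url

print(ws_endpoint("http://localhost:7860"))
print(ws_endpoint("https://user-container-port-env.hf.space/"))
```

The `endswith("/ws")` guard makes the helper idempotent, so a caller can pass either the bare server URL or an already-converted one.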
inference.py CHANGED
@@ -1,50 +1,90 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python3
 """
 Container Port OpenEnv — Baseline Inference Script
-SST x Meta PyTorch OpenEnv Hackathon
+SST x Meta PyTorch OpenEnv Hackathon 2026
 
-Required environment variables (or set below):
-  HF_TOKEN     - Your Hugging Face token
-  API_BASE_URL - LLM API endpoint (default: https://router.huggingface.co/v1)
-  MODEL_NAME   - Model identifier (default: meta-llama/Llama-3.1-8B-Instruct)
+Stdout format (grader parses these exactly):
+  [START] task=<task> env=container-port-env model=<model>
+  [STEP] step=<n> action=<stack_idx> reward=<0.00> done=<true|false> error=<msg|null>
+  [END] success=<true|false> steps=<n> score=<0.000> rewards=<r1,r2,...>
 
 Usage:
   python inference.py
-  python inference.py --url https://YOUR_USERNAME-container-port-env.hf.space --difficulty all
   python inference.py --difficulty easy
+  python inference.py --difficulty all
+  python inference.py --use-llm
+  python inference.py --url https://YOUR_USERNAME-container-port-env.hf.space
 """
 
+import asyncio
 import os
 import sys
 import json
-import asyncio
-import argparse
-import websockets
+from typing import List, Optional
+
 from openai import OpenAI
 
+# Load .env file if present (before os.getenv calls)
+def _load_dotenv():
+    env_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".env")
+    if os.path.exists(env_path):
+        with open(env_path) as f:
+            for line in f:
+                line = line.strip()
+                if not line or line.startswith("#") or "=" not in line:
+                    continue
+                key, _, value = line.partition("=")
+                key = key.strip()
+                value = value.strip().strip('"').strip("'")
+                if key and key not in os.environ:
+                    os.environ[key] = value
+
+_load_dotenv()
+
 # Required configuration variables
-API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
-MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
-HF_TOKEN = os.getenv("HF_TOKEN", "")  # set your HF token here or via env var
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
+HF_TOKEN = os.getenv("HF_TOKEN")
+LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
+API_KEY = HF_TOKEN or os.getenv("API_KEY")
 #
 
-ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
+ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
+TASK_NAME = "container-stacking"
+BENCHMARK = "container-port-env"
+MAX_STEPS = 200  # hard mode has 50 containers, safety ceiling
+SUCCESS_SCORE_THRESHOLD = 0.5
+
+
+# Logging helpers (exact SST format)
 
-def _llm_client() -> OpenAI:
-    """Return an OpenAI-compatible client pointed at HF Inference Router."""
-    return OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
 
+
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+
+
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+
+
+# Agents
 
 def greedy_decide(obs: dict) -> int:
-    """
-    Greedy heuristic agent — no LLM call.
-    Scores each valid stack by accessibility and priority compatibility.
-    """
-    stacks = obs["stack_states"]
-    current = obs.get("current_container")
+    stacks = obs["stack_states"]
+    current = obs.get("current_container")
     max_height = obs["max_height"]
-    upcoming = set(obs.get("upcoming_retrievals", []))
+    upcoming = set(obs.get("upcoming_retrievals", []))
 
     if current is None:
         return 0
@@ -56,16 +96,15 @@ def greedy_decide(obs: dict) -> int:
         depth = len(stack)
         if depth >= max_height:
             continue
-
         score = 0.0
         accessibility = (max_height - depth) / max_height
         score += accessibility * (4 - cur_priority)
 
         if depth > 0:
-            top_priority = stack[-1]["priority"]
-            if cur_priority > top_priority:
-                score -= 10.0 * (cur_priority - top_priority)
-            elif cur_priority < top_priority:
+            top_p = stack[-1]["priority"]
+            if cur_priority > top_p:
+                score -= 10.0 * (cur_priority - top_p)
+            elif cur_priority < top_p:
                 score += 3.0
 
         if current["id"] in upcoming:
@@ -82,53 +121,48 @@ def greedy_decide(obs: dict) -> int:
     for i, stack in enumerate(stacks):
         if len(stack) < max_height:
             return i
-    return best_stack
+    return max(best_stack, 0)
 
 
-def llm_decide(obs: dict) -> int:
-    """Use HF-hosted LLM via OpenAI-compatible client to choose a stack."""
-    stacks = obs["stack_states"]
-    current = obs.get("current_container")
-    n_stacks = obs["n_stacks"]
+def llm_decide(obs: dict, client: OpenAI) -> int:
+    stacks = obs["stack_states"]
+    current = obs.get("current_container")
+    n_stacks = obs["n_stacks"]
     max_height = obs["max_height"]
-    upcoming = obs.get("upcoming_retrievals", [])
+    upcoming = obs.get("upcoming_retrievals", [])
    difficulty = obs.get("difficulty", "medium")
 
-    stack_lines = []
+    lines = []
     for i, stack in enumerate(stacks):
         if not stack:
-            stack_lines.append(f"  Stack {i}: EMPTY (0/{max_height})")
+            lines.append(f"  Stack {i}: EMPTY (0/{max_height})")
         else:
             contents = ", ".join(f"{c['id']}(p{c['priority']})" for c in stack)
-            stack_lines.append(
+            lines.append(
                 f"  Stack {i}: [{contents}] depth={len(stack)}/{max_height},"
                 f" top=priority-{stack[-1]['priority']}"
             )
 
     prompt = (
-        f"You are an expert container yard planner.\n"
-        f"TASK: Place the incoming container into a stack to MINIMIZE future rehandle operations.\n"
-        f"RULE: When a container is retrieved, every container ON TOP of it must be moved (rehandle).\n"
-        f"Priority 1=URGENT (retrieved first), 2=Normal, 3=Low (retrieved last).\n\n"
+        f"You are a container yard planner. Minimize rehandle operations.\n"
+        f"Priority 1=URGENT (retrieved first), 2=Normal, 3=Low.\n"
+        f"RULE: containers above the target at retrieval = rehandles (costly).\n\n"
        f"DIFFICULTY: {difficulty}\n"
-        f"UPCOMING RETRIEVALS (next to be retrieved, in order): "
-        f"{upcoming if upcoming else 'Unknown (hard mode)'}\n\n"
+        f"UPCOMING RETRIEVALS: {upcoming or 'Unknown (hard mode)'}\n\n"
        f"CONTAINER TO PLACE: id={current['id']}, priority={current['priority']}, "
        f"weight={current['weight']}kg\n\n"
-        f"STACK STATES (bottom→top):\n" + "\n".join(stack_lines) + "\n\n"
-        f"Respond with ONLY valid JSON: {{\"stack_index\": <integer 0-{n_stacks-1}>}}"
+        f"STACKS (bottom->top):\n" + "\n".join(lines) + "\n\n"
+        f"Reply ONLY with valid JSON: {{\"stack_index\": <int 0-{n_stacks-1}>}}"
    )
 
    try:
-        client = _llm_client()
-        response = client.chat.completions.create(
+        resp = client.chat.completions.create(
            model=MODEL_NAME,
            max_tokens=64,
            temperature=0.0,
            messages=[{"role": "user", "content": prompt}],
        )
-        text = response.choices[0].message.content.strip()
-        # strip markdown fences if model wraps in ```json ... ```
+        text = resp.choices[0].message.content.strip()
        if "```" in text:
            text = text.split("```")[1]
            if text.startswith("json"):
@@ -138,84 +172,100 @@ def llm_decide(obs: dict) -> int:
        if 0 <= idx < n_stacks and len(obs["stack_states"][idx]) < max_height:
            return idx
    except Exception as e:
-        print(f"  [LLM fallback: {e}]", file=sys.stderr)
+        print(f"[DEBUG] LLM fallback: {e}", file=sys.stderr, flush=True)
 
    return greedy_decide(obs)
 
 
-async def run_episode(url: str, difficulty: str = "medium", use_llm: bool = False) -> float:
+# Episode runner
+
+async def run_episode(
+    url: str,
+    difficulty: str = "medium",
+    use_llm: bool = False,
+) -> float:
+    import websockets
+
     ws_url = url.replace("http://", "ws://").replace("https://", "wss://")
     if not ws_url.endswith("/ws"):
         ws_url = ws_url.rstrip("/") + "/ws"
 
-    # [START] log
-    print(json.dumps({"type": "[START]", "task": difficulty, "difficulty": difficulty,
-                      "env_url": url, "model": MODEL_NAME if use_llm else "greedy"}))
-    sys.stdout.flush()
-
-    total_reward = 0.0
-    step = 0
-
-    async with websockets.connect(ws_url) as ws:
-        await ws.send(json.dumps({"type": "reset", "difficulty": difficulty}))
-        resp = json.loads(await ws.recv())
-        obs = resp["observation"]
-
-        while not obs.get("done", False):
-            action_idx = llm_decide(obs) if use_llm else greedy_decide(obs)
-
-            await ws.send(json.dumps({"type": "step", "action": {"stack_index": action_idx}}))
-            resp = json.loads(await ws.recv())
-            obs = resp["observation"]
-            reward = resp["reward"]
-            done = resp["done"]
-            total_reward += reward
-            step += 1
-
-            # [STEP] log
-            print(json.dumps({
-                "type": "[STEP]",
-                "step": step,
-                "action": action_idx,
-                "reward": round(reward, 4),
-                "total_reward": round(total_reward, 4),
-                "done": done,
-                "rehandle_count": obs["rehandle_count"],
-            }))
-            sys.stdout.flush()
-
-        # fetch final state for score
-        await ws.send(json.dumps({"type": "state"}))
-        state_resp = json.loads(await ws.recv())
-        state = state_resp["state"]
-
-    final_score = state.get("score", 0.0)
-
-    # [END] log
-    print(json.dumps({
-        "type": "[END]",
-        "task": difficulty,
-        "difficulty": difficulty,
-        "total_reward": round(total_reward, 4),
-        "final_score": final_score,
-        "total_steps": step,
-        "rehandle_count": state.get("rehandle_count", 0),
-    }))
-    sys.stdout.flush()
-
-    return final_score
-
-
-async def run_all(url: str, use_llm: bool = False):
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY) if use_llm else None
+    model_label = MODEL_NAME if use_llm else "greedy"
+
+    log_start(task=f"{TASK_NAME}-{difficulty}", env=BENCHMARK, model=model_label)
+
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+
+    try:
+        async with websockets.connect(ws_url) as ws:
+            await ws.send(json.dumps({"type": "reset", "data": {"difficulty": difficulty}}))
+            resp = json.loads(await ws.recv())
+            payload = resp.get("data", {})
+            obs = payload.get("observation", payload)
+
+            for step in range(1, MAX_STEPS + 1):
+                if obs.get("done", False):
+                    break
+
+                action_idx = llm_decide(obs, client) if use_llm else greedy_decide(obs)
+
+                await ws.send(json.dumps({
+                    "type": "step",
+                    "data": {"stack_index": action_idx},
+                }))
+                resp = json.loads(await ws.recv())
+                payload = resp.get("data", {})
+                obs = payload.get("observation", payload)
+                reward = float(payload.get("reward", obs.get("last_reward", 0.0)) or obs.get("last_reward", 0.0))
+                done = payload.get("done", obs.get("done", False))
+                error = payload.get("error", None)
+
+                rewards.append(reward)
+                steps_taken = step
+
+                log_step(step=step, action=str(action_idx), reward=reward, done=done, error=error)
+
+                if done:
+                    break
+
+            # Fetch final score
+            await ws.send(json.dumps({"type": "state"}))
+            state_resp = json.loads(await ws.recv())
+            state = state_resp.get("data", {})
+            score = float(state.get("score", obs.get("score", 0.0)))
+            score = min(max(score, 0.0), 1.0)
+
+            success = score >= SUCCESS_SCORE_THRESHOLD
+
+    except Exception as exc:
+        print(f"[DEBUG] Episode error: {exc}", file=sys.stderr, flush=True)
+
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+    return score
+
+
+async def run_all(url: str, use_llm: bool = False) -> None:
     for diff in ["easy", "medium", "hard"]:
         await run_episode(url, difficulty=diff, use_llm=use_llm)
 
 
+# Entry point
+
 if __name__ == "__main__":
+    import argparse
+
     parser = argparse.ArgumentParser(description="Container Port Baseline Agent")
     parser.add_argument("--url", default=ENV_URL)
-    parser.add_argument("--difficulty", default="all", choices=["easy", "medium", "hard", "all"])
-    parser.add_argument("--use-llm", action="store_true")
+    parser.add_argument("--difficulty", default="all",
+                        choices=["easy", "medium", "hard", "all"])
+    parser.add_argument("--use-llm", action="store_true",
                        help="Use LLM agent via HF router (requires HF_TOKEN)")
    args = parser.parse_args()
 
    if args.difficulty == "all":
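The new `log_*` helpers emit one key=value line per event so the grader can parse stdout line by line. A sketch of the consuming side (a hypothetical parser, not part of this commit), assuming values contain no spaces, which holds for the numeric fields shown:

```python
def parse_log_line(line: str) -> dict:
    """Parse '[TAG] k1=v1 k2=v2 ...' into {'type': TAG, 'k1': 'v1', ...}."""
    tag, _, rest = line.partition("] ")
    fields = dict(kv.split("=", 1) for kv in rest.split(" "))
    fields["type"] = tag.lstrip("[")
    return fields

step = parse_log_line("[STEP] step=3 action=2 reward=0.45 done=false error=null")
end = parse_log_line("[END] success=true steps=20 score=0.812 rewards=0.45,0.30")
print(step["reward"], end["score"])
```

Note the `split("=", 1)`: the `rewards` value itself contains commas but no `=`, so limiting the split keeps the whole list as one value.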
models.py ADDED
@@ -0,0 +1,40 @@
+from __future__ import annotations
+
+from typing import Any
+
+from openenv.core.env_server.types import Action, Observation
+from pydantic import Field
+
+
+class ContainerAction(Action):
+    """Place the current container into a stack."""
+
+    stack_index: int = Field(
+        ...,
+        description="Zero-indexed stack to place the incoming container into",
+        ge=0,
+    )
+
+
+class ContainerObservation(Observation):
+    """Observation returned after each step."""
+
+    stack_states: list[list[dict[str, Any]]] = Field(
+        ..., description="Each stack is a list of {id, priority} dicts (bottom->top)"
+    )
+    current_container: dict[str, Any] | None = Field(
+        None, description="Container to place now: {id, priority, weight}"
+    )
+    upcoming_retrievals: list[str] = Field(
+        default_factory=list,
+        description="IDs of next containers to be retrieved (lookahead)",
+    )
+    rehandle_count: int = Field(0, description="Cumulative rehandles so far")
+    step: int = Field(0, description="Steps completed")
+    containers_remaining: int = Field(0)
+    n_stacks: int = Field(0)
+    max_height: int = Field(0)
+    difficulty: str = Field("medium")
+    last_reward: float = Field(0.0)
+    score: float = Field(0.0, description="Normalized score 0.0-1.0")
+    done: bool = Field(False)
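`ContainerAction` relies on pydantic's `Field(..., ge=0)` to reject negative stack indices before they ever reach the environment. A dependency-free sketch of the same constraint (the helper is ours, purely illustrative):

```python
def validate_stack_index(value: object) -> int:
    """Mirror ContainerAction's `stack_index: int = Field(..., ge=0)` check by hand."""
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError("stack_index must be an integer")
    if value < 0:
        raise ValueError("stack_index must be >= 0")
    return value

print(validate_stack_index(2))
```

With the real model, the same violation surfaces as a pydantic `ValidationError` at request-parsing time; out-of-range but non-negative indices are instead penalized by the environment with the -2.0 invalid-action reward.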
openenv.yaml CHANGED
@@ -1,30 +1,17 @@
+spec_version: 1
 name: container-port-env
 version: "0.1.0"
+type: standard
+runtime: docker
+app: server.app:app
+port: 7860
 description: >
   Container terminal yard RL environment. An agent places incoming ship
-  containers into stacks of limited height to minimize costly rehandle
-  operations during retrieval. Features 3 difficulty levels (easy/medium/hard)
-  with different stack configurations, retrieval frequencies, and lookahead
-  visibility. Models real port logistics decision-making.
+  containers into stacks to minimize costly rehandle operations during
+  retrieval. Three difficulty levels: easy (6 stacks, lookahead 5),
+  medium (8 stacks, lookahead 3), hard (10 stacks, no lookahead).
 tags:
   - logistics
   - planning
   - real-world
   - combinatorial-optimization
-sdk: docker
-entry_point: server.server:app
-tools:
-  - name: place_container
-    description: >
-      Place the current incoming container into a specified stack index.
-      Priority 1=urgent (retrieved first), 2=normal, 3=low (retrieved last).
-      Burying high-priority under low-priority causes rehandle costs.
-    input_schema:
-      type: object
-      properties:
-        stack_index:
-          type: integer
-          description: "Zero-indexed stack to place the container into"
-          minimum: 0
-      required:
-        - stack_index
pyproject.toml CHANGED
@@ -2,16 +2,25 @@
 requires = ["setuptools>=68"]
 build-backend = "setuptools.build_meta"
 
+[tool.setuptools]
+packages = ["client", "server"]
+py-modules = ["models"]
+
 [project]
 name = "openenv-container-port"
 version = "0.1.0"
-description = "Container yard RL environment for OpenEnv hackathon"
+description = "Container yard RL environment - SST x Meta PyTorch OpenEnv Hackathon"
 requires-python = ">=3.10"
 dependencies = [
+    "openenv-core>=0.2.0",
     "fastapi>=0.110.0",
     "uvicorn[standard]>=0.29.0",
-    "websockets>=12.0",
     "pydantic>=2.0.0",
-    "openenv-core>=0.1.0",
     "openai>=1.0.0",
+    "websockets>=12.0",
+    "huggingface_hub>=0.20.0",
+    "pytest>=8.0.0",
 ]
+
+[project.scripts]
+server = "server.app:main"
pytest.ini ADDED
@@ -0,0 +1,5 @@
+[pytest]
+addopts = -p no:cacheprovider
+testpaths = tests
+python_files = test_openenv_env.py
+norecursedirs = .git .venv __pycache__ .pytest_cache pytest-cache-files-*
requirements.txt CHANGED
@@ -1,8 +1,8 @@
+openenv-core>=0.2.0
 fastapi>=0.110.0
 uvicorn[standard]>=0.29.0
-websockets>=12.0
 pydantic>=2.0.0
-openenv-core>=0.1.0
 openai>=1.0.0
-pytest>=8.0.0
+websockets>=12.0
 huggingface_hub>=0.20.0
+pytest>=8.0.0
server/app.py ADDED
@@ -0,0 +1,36 @@
+"""
+FastAPI app for Container Port Environment.
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
+
+from openenv.core.env_server import create_web_interface_app
+import uvicorn
+
+from models import ContainerAction, ContainerObservation
+from server.environment import ContainerYardEnvironment
+
+app = create_web_interface_app(
+    ContainerYardEnvironment,
+    ContainerAction,
+    ContainerObservation,
+    env_name="container-port-env",
+)
+
+
+def main() -> None:
+    uvicorn.run("server.app:app", host="0.0.0.0", port=7860)
+
+
+if __name__ == "__main__":
+    main()
server/environment.py CHANGED
@@ -1,11 +1,20 @@
+from __future__ import annotations
+
 import random
+import uuid
 from dataclasses import dataclass
-from typing import List, Optional, Dict, Any, Tuple
+from typing import Any
+
+from openenv.core.env_server import Environment, State
+from openenv.core.env_server.types import EnvironmentMetadata
+
+from models import ContainerAction, ContainerObservation
 
-@dataclass
+
+@dataclass(slots=True)
 class Container:
     id: str
-    priority: int # 1=urgent, 2=normal, 3=low
+    priority: int
     weight: float
 
 DIFFICULTY_CONFIG = {
@@ -35,11 +44,18 @@ DIFFICULTY_CONFIG = {
     },
 }
 
-class ContainerYardEnv:
-    def __init__(self, difficulty: str = "medium", seed: Optional[int] = None):
-        assert difficulty in DIFFICULTY_CONFIG, f"difficulty must be one of {list(DIFFICULTY_CONFIG.keys())}"
-        self.difficulty = difficulty
-        self.seed = seed
+class ContainerYardEnvironment(Environment):
+    SUPPORTS_CONCURRENT_SESSIONS = True
+
+    def __init__(self) -> None:
+        self._difficulty = "medium"
+        self._state = State(episode_id=str(uuid.uuid4()), step_count=0)
+        self._init_env("medium", seed=None)
+
+    def _init_env(self, difficulty: str, seed: int | None) -> None:
+        if difficulty not in DIFFICULTY_CONFIG:
+            difficulty = "medium"
+        self._difficulty = difficulty
         cfg = DIFFICULTY_CONFIG[difficulty]
         self.n_stacks = cfg["n_stacks"]
         self.max_height = cfg["max_height"]
@@ -47,23 +63,18 @@ class ContainerYardEnv:
         self.retrieval_interval = cfg["retrieval_interval"]
         self.lookahead = cfg["lookahead"]
         self.priority_weights = cfg["priority_weights"]
-        self.reset()
-
-    def reset(self) -> Dict[str, Any]:
-        if self.seed is not None:
-            random.seed(self.seed)
-        self.stacks: List[List[Container]] = [[] for _ in range(self.n_stacks)]
+        if seed is not None:
+            random.seed(seed)
+        self.stacks: list[list[Container]] = [[] for _ in range(self.n_stacks)]
         self.rehandle_count = 0
-        self.step_count = 0
         self.total_reward = 0.0
         self.done = False
-        self.manifest: List[Container] = self._generate_manifest()
-        self.retrieval_queue: List[str] = self._generate_retrieval_queue()
+        self.manifest: list[Container] = self._generate_manifest()
+        self.retrieval_queue: list[str] = self._generate_retrieval_queue()
        self.retrieval_pointer = 0
        self.current_idx = 0
-        return self._observe(last_reward=0.0)
 
-    def _generate_manifest(self) -> List[Container]:
+    def _generate_manifest(self) -> list[Container]:
         containers = []
         for i in range(self.n_containers):
             priority = random.choices([1, 2, 3], weights=self.priority_weights)[0]
@@ -74,7 +85,7 @@ class ContainerYardEnv:
             ))
         return containers
 
-    def _generate_retrieval_queue(self) -> List[str]:
+    def _generate_retrieval_queue(self) -> list[str]:
         ids_by_priority = {1: [], 2: [], 3: []}
         for c in self.manifest:
             ids_by_priority[c.priority].append(c.id)
@@ -83,45 +94,62 @@ class ContainerYardEnv:
         queue = ids_by_priority[1] + ids_by_priority[2] + ids_by_priority[3]
         return queue
 
-    def step(self, stack_index: int) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
+    def reset(
+        self,
+        seed: int | None = None,
+        episode_id: str | None = None,
+        **kwargs: Any,
+    ) -> ContainerObservation:
+        difficulty = kwargs.get("difficulty", "medium")
+        self._state = State(
+            episode_id=episode_id or str(uuid.uuid4()),
+            step_count=0,
+        )
+        self._init_env(difficulty, seed)
+        return self._observe(last_reward=0.0)
+
+    def step(
+        self,
+        action: ContainerAction | int,
+        timeout_s: float | None = None,
+        **kwargs: Any,
+    ) -> ContainerObservation:
         if self.done:
-            return self._observe(0.0), 0.0, True, {"error": "episode already done"}
+            return self._observe(0.0)
+
+        if isinstance(action, int):
+            action = ContainerAction(stack_index=action)
+
+        stack_index = action.stack_index
 
         if stack_index < 0 or stack_index >= self.n_stacks:
             reward = -2.0
             self.total_reward += reward
-            return self._observe(reward), reward, False, {"error": f"invalid stack_index {stack_index}, must be 0-{self.n_stacks-1}"}
+            self._state.step_count += 1
+            return self._observe(reward)
 
         if len(self.stacks[stack_index]) >= self.max_height:
             reward = -2.0
             self.total_reward += reward
-            return self._observe(reward), reward, False, {"error": f"stack {stack_index} is full (height {self.max_height})"}
+            self._state.step_count += 1
+            return self._observe(reward)
 
         current = self.manifest[self.current_idx]
         self.stacks[stack_index].append(current)
         placement_reward = self._placement_reward(stack_index, current)
 
         self.current_idx += 1
-        self.step_count += 1
+        self._state.step_count += 1
 
         retrieval_cost = 0.0
-        retrievals_done = []
-        if self.step_count % self.retrieval_interval == 0:
-            cost, done_ids = self._trigger_retrieval()
+        if self._state.step_count % self.retrieval_interval == 0:
+            cost, _ = self._trigger_retrieval()
            retrieval_cost = cost
-            retrievals_done = done_ids
 
         reward = placement_reward - retrieval_cost
         self.total_reward += reward
         self.done = (self.current_idx >= len(self.manifest))
-
-        return self._observe(reward), reward, self.done, {
-            "rehandles": self.rehandle_count,
-            "step": self.step_count,
-            "placement_reward": round(placement_reward, 4),
-            "retrieval_cost": round(retrieval_cost, 4),
-            "retrievals_done": retrievals_done,
-        }
+        return self._observe(reward)
 
     def _placement_reward(self, stack_index: int, container: Container) -> float:
         # stack_depth = zero-based index of the just-placed container
@@ -143,7 +171,7 @@ class ContainerYardEnv:
 
         return round(base, 4)
 
-    def _trigger_retrieval(self) -> Tuple[float, List[str]]:
+    def _trigger_retrieval(self) -> tuple[float, list[str]]:
         total_cost = 0.0
         done_ids = []
         for _ in range(2):
@@ -166,12 +194,16 @@ class ContainerYardEnv:
             return round(rehandles * 0.4, 4)
         return 0.0 # container not yet in yard — no penalty
 
-    def _get_upcoming_retrievals(self) -> List[str]:
+    def _get_upcoming_retrievals(self) -> list[str]:
         start = self.retrieval_pointer
         end = min(start + self.lookahead, len(self.retrieval_queue))
         return self.retrieval_queue[start:end]
 
-    def _observe(self, last_reward: float = 0.0) -> Dict[str, Any]:
+    @property
+    def state(self) -> State:
+        return self._state
+
+    def _observe(self, last_reward: float = 0.0) -> ContainerObservation:
         stack_states = []
         for s in self.stacks:
             stack_states.append([{"id": c.id, "priority": c.priority} for c in s])
@@ -181,25 +213,20 @@ class ContainerYardEnv:
             c = self.manifest[self.current_idx]
             current = {"id": c.id, "priority": c.priority, "weight": c.weight}
 
-        return {
-            "stack_states": stack_states,
-            "current_container": current,
-            "upcoming_retrievals": self._get_upcoming_retrievals(),
-            "rehandle_count": self.rehandle_count,
-            "step": self.step_count,
-            "containers_remaining": len(self.manifest) - self.current_idx,
-            "n_stacks": self.n_stacks,
-            "max_height": self.max_height,
-            "difficulty": self.difficulty,
-            "last_reward": last_reward,
-            "done": self.done,
-        }
-
-    def get_state(self) -> Dict[str, Any]:
-        obs = self._observe()
-        obs["score"] = self.score()
-        obs["total_reward"] = round(self.total_reward, 4)
-        return obs
+        return ContainerObservation(
+            stack_states=stack_states,
+            current_container=current,
+            upcoming_retrievals=self._get_upcoming_retrievals(),
+            rehandle_count=self.rehandle_count,
+            step=self._state.step_count,
+            containers_remaining=len(self.manifest) - self.current_idx,
+            n_stacks=self.n_stacks,
+            max_height=self.max_height,
+            difficulty=self._difficulty,
+            last_reward=last_reward,
+            score=self.score(),
+            done=self.done,
+        )
 
     def score(self) -> float:
         """Normalized score in [0.0, 1.0]. Based on actual retrievals attempted."""
@@ -209,3 +236,19 @@ class ContainerYardEnv:
             return 1.0
         score = max(0.0, 1.0 - self.rehandle_count / worst_case)
         return round(min(score, 1.0), 4)
+
+    def get_state(self) -> dict[str, Any]:
+        return self._observe().model_dump()
+
+    def get_metadata(self) -> EnvironmentMetadata:
+        return EnvironmentMetadata(
+            name="container-port-env",
+            description=(
+                "Container terminal yard environment where agents place incoming "
+                "containers into stacks to minimize rehandle cost during retrieval."
+            ),
+            version="0.1.0",
+        )
+
+
+ContainerYardEnv = ContainerYardEnvironment
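For reviewers, the rehandle accounting that `_trigger_retrieval` charges (each container stacked above the retrieval target is one rehandle, costed at 0.4; a target not yet in the yard costs nothing) can be sketched standalone. `rehandles_for` and `retrieval_cost` are illustrative names for this note only, not functions in the repo:

```python
def rehandles_for(stacks: list[list[str]], target_id: str) -> int:
    """Count containers sitting above the target — each one is a rehandle."""
    for stack in stacks:
        if target_id in stack:
            depth = stack.index(target_id)        # 0 = bottom of the stack
            return len(stack) - 1 - depth         # containers above the target
    return 0  # target not yet in the yard -> no rehandles


def retrieval_cost(stacks: list[list[str]], target_id: str) -> float:
    # Mirrors the 0.4-per-rehandle cost used in the diff above.
    return round(rehandles_for(stacks, target_id) * 0.4, 4)


yard = [["C1", "C2", "C3"], ["C4"]]  # C1 is buried under C2 and C3
assert rehandles_for(yard, "C1") == 2
assert retrieval_cost(yard, "C1") == 0.8
assert retrieval_cost(yard, "C9") == 0.0  # C9 hasn't arrived yet
```

This is why burying urgent (priority 1) containers deep in a stack is the dominant source of negative reward: they are retrieved first, so everything placed on top of them is paid for at 0.4 apiece.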
server/models.py DELETED
@@ -1,36 +0,0 @@
-from pydantic import BaseModel, Field
-from typing import List, Optional, Dict, Any
-
-class ContainerInfo(BaseModel):
-    id: str
-    priority: int = Field(..., ge=1, le=3)
-    weight: float
-
-class StackEntry(BaseModel):
-    id: str
-    priority: int
-
-class ContainerAction(BaseModel):
-    stack_index: int = Field(..., description="Which stack (0-indexed) to place the current container into")
-
-class ContainerObservation(BaseModel):
-    stack_states: List[List[Dict[str, Any]]]
-    current_container: Optional[Dict[str, Any]]
-    upcoming_retrievals: List[str]
-    rehandle_count: int
-    step: int
-    containers_remaining: int
-    n_stacks: int
-    max_height: int
-    difficulty: str
-    last_reward: float
-    done: bool
-
-class ContainerState(BaseModel):
-    stack_states: List[List[Dict[str, Any]]]
-    rehandle_count: int
-    step: int
-    score: float
-    difficulty: str
-    done: bool
-    total_reward: float
server/server.py DELETED
@@ -1,84 +0,0 @@
-import json
-import uuid
-from fastapi import FastAPI, WebSocket, WebSocketDisconnect
-from server.environment import ContainerYardEnv
-from server.models import ContainerAction
-
-app = FastAPI(title="Container Port OpenEnv", version="0.1.0")
-
-sessions: dict = {}
-
-@app.get("/ping")
-def ping():
-    return {"status": "ok", "env": "container-port-env"}
-
-@app.get("/health")
-def health():
-    return {
-        "status": "healthy",
-        "active_sessions": len(sessions),
-        "difficulties": ["easy", "medium", "hard"],
-    }
-
-@app.websocket("/ws")
-async def websocket_endpoint(websocket: WebSocket):
-    await websocket.accept()
-    session_id = str(uuid.uuid4())
-    sessions[session_id] = ContainerYardEnv(difficulty="medium")
-
-    try:
-        while True:
-            raw = await websocket.receive_text()
-            msg = json.loads(raw)
-            msg_type = msg.get("type")
-            env = sessions[session_id]
-
-            if msg_type == "reset":
-                difficulty = msg.get("difficulty", "medium")
-                if difficulty not in ["easy", "medium", "hard"]:
-                    difficulty = "medium"
-                sessions[session_id] = ContainerYardEnv(difficulty=difficulty)
-                env = sessions[session_id]
-                obs = env.reset()
-                await websocket.send_text(json.dumps({
-                    "type": "reset",
-                    "observation": obs,
-                    "reward": 0.0,
-                    "done": False,
-                    "session_id": session_id,
-                }))
-
-            elif msg_type == "step":
-                try:
-                    action = ContainerAction(**msg["action"])
-                    obs, reward, done, info = env.step(action.stack_index)
-                    await websocket.send_text(json.dumps({
-                        "type": "step",
-                        "observation": obs,
-                        "reward": reward,
-                        "done": done,
-                        "info": info,
-                    }))
-                except Exception as e:
-                    await websocket.send_text(json.dumps({
-                        "type": "error",
-                        "message": str(e),
-                    }))
-
-            elif msg_type == "state":
-                state = env.get_state()
-                await websocket.send_text(json.dumps({
-                    "type": "state",
-                    "state": state,
-                }))
-
-            else:
-                await websocket.send_text(json.dumps({
-                    "type": "error",
-                    "message": f"Unknown message type: {msg_type}",
-                }))
-
-    except WebSocketDisconnect:
-        pass
-    finally:
-        sessions.pop(session_id, None)
tests/conftest.py ADDED
@@ -0,0 +1,9 @@
+import sys
+from pathlib import Path
+
+sys.dont_write_bytecode = True
+
+ROOT = Path(__file__).resolve().parents[1]
+
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
tests/test_env.py DELETED
@@ -1,110 +0,0 @@
-import pytest
-from server.environment import ContainerYardEnv, DIFFICULTY_CONFIG
-
-@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
-def test_reset_returns_valid_obs(difficulty):
-    env = ContainerYardEnv(difficulty=difficulty, seed=42)
-    obs = env.reset()
-    cfg = DIFFICULTY_CONFIG[difficulty]
-    assert len(obs["stack_states"]) == cfg["n_stacks"]
-    assert obs["current_container"] is not None
-    assert obs["step"] == 0
-    assert obs["rehandle_count"] == 0
-    assert obs["difficulty"] == difficulty
-    assert obs["done"] == False
-
-@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
-def test_step_valid_action(difficulty):
-    env = ContainerYardEnv(difficulty=difficulty, seed=42)
-    env.reset()
-    obs, reward, done, info = env.step(0)
-    assert isinstance(reward, float)
-    assert obs["step"] == 1
-    assert len(obs["stack_states"][0]) == 1
-    assert "rehandles" in info
-
-@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
-def test_step_invalid_stack_index(difficulty):
-    env = ContainerYardEnv(difficulty=difficulty, seed=42)
-    env.reset()
-    obs, reward, done, info = env.step(999)
-    assert reward == -2.0
-    assert "error" in info
-    assert done == False
-
-@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
-def test_full_episode_completes(difficulty):
-    env = ContainerYardEnv(difficulty=difficulty, seed=42)
-    env.reset()
-    done = False
-    steps = 0
-    cfg = DIFFICULTY_CONFIG[difficulty]
-    n_stacks = cfg["n_stacks"]
-    max_height = cfg["max_height"]
-    while not done:
-        stacks = env._observe()["stack_states"]
-        chosen = 0
-        for i in range(n_stacks):
-            if len(stacks[i]) < max_height:
-                chosen = i
-                break
-        _, _, done, _ = env.step(chosen)
-        steps += 1
-        assert steps < 1000, "Episode did not complete in time"
-    assert done
-
-@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
-def test_score_in_range(difficulty):
-    env = ContainerYardEnv(difficulty=difficulty, seed=42)
-    env.reset()
-    done = False
-    cfg = DIFFICULTY_CONFIG[difficulty]
-    n_stacks = cfg["n_stacks"]
-    max_height = cfg["max_height"]
-    while not done:
-        stacks = env._observe()["stack_states"]
-        chosen = 0
-        for i in range(n_stacks):
-            if len(stacks[i]) < max_height:
-                chosen = i
-                break
-        _, _, done, _ = env.step(chosen)
-    score = env.score()
-    assert 0.0 <= score <= 1.0
-
-def test_lookahead_visibility():
-    easy_env = ContainerYardEnv(difficulty="easy", seed=42)
-    hard_env = ContainerYardEnv(difficulty="hard", seed=42)
-    easy_obs = easy_env.reset()
-    hard_obs = hard_env.reset()
-    assert len(easy_obs["upcoming_retrievals"]) > len(hard_obs["upcoming_retrievals"])
-    assert len(hard_obs["upcoming_retrievals"]) == 0
-
-def test_reward_is_dense():
-    env = ContainerYardEnv(difficulty="medium", seed=42)
-    env.reset()
-    rewards = []
-    done = False
-    step = 0
-    while not done and step < 20:
-        stacks = env._observe()["stack_states"]
-        chosen = step % 8
-        if len(stacks[chosen]) >= 5:
-            chosen = 0
-        _, r, done, _ = env.step(chosen)
-        rewards.append(r)
-        step += 1
-    nonzero = sum(1 for r in rewards if abs(r) > 1e-6)
-    assert nonzero >= len(rewards) * 0.5, f"Too many zero rewards: {rewards}"
-
-def test_no_double_retrieval():
-    """Retrieval pointer advances correctly — no container retrieved twice."""
-    env = ContainerYardEnv(difficulty="easy", seed=42)
-    env.reset()
-    seen_ids = set()
-    for _ in range(env.n_containers):
-        if env.done:
-            break
-        env.step(0 if len(env.stacks[0]) < env.max_height else 1)
-    # retrieval_pointer should be <= queue length
-    assert env.retrieval_pointer <= len(env.retrieval_queue)
tests/test_openenv_env.py ADDED
@@ -0,0 +1,170 @@
+import pytest
+from fastapi.testclient import TestClient
+
+from models import ContainerAction
+from server.app import app
+from server.environment import ContainerYardEnvironment, DIFFICULTY_CONFIG
+
+
+def as_dict(observation):
+    return observation.model_dump() if hasattr(observation, "model_dump") else observation
+
+
+# Unit tests: pure environment logic (no HTTP)
+
+@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
+def test_reset_returns_valid_obs(difficulty):
+    env = ContainerYardEnvironment()
+    obs = as_dict(env.reset(difficulty=difficulty, seed=42))
+    cfg = DIFFICULTY_CONFIG[difficulty]
+    assert len(obs["stack_states"]) == cfg["n_stacks"]
+    assert obs["current_container"] is not None
+    assert obs["step"] == 0
+    assert obs["rehandle_count"] == 0
+    assert obs["difficulty"] == difficulty
+    assert obs["done"] is False
+
+
+@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
+def test_step_valid_action(difficulty):
+    env = ContainerYardEnvironment()
+    env.reset(difficulty=difficulty, seed=42)
+    obs = as_dict(env.step(ContainerAction(stack_index=0)))
+    assert obs["step"] == 1
+    assert len(obs["stack_states"][0]) == 1
+    assert isinstance(obs["last_reward"], float)
+
+
+@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
+def test_step_invalid_action_penalized(difficulty):
+    env = ContainerYardEnvironment()
+    env.reset(difficulty=difficulty, seed=42)
+    obs = as_dict(env.step(ContainerAction(stack_index=999)))
+    assert obs["last_reward"] == -2.0
+
+
+def test_score_in_range():
+    env = ContainerYardEnvironment()
+    env.reset(difficulty="medium", seed=42)
+    done = False
+    while not done:
+        stacks = as_dict(env._observe())["stack_states"]
+        chosen = next(
+            (i for i, stack in enumerate(stacks) if len(stack) < env.max_height), 0
+        )
+        obs = as_dict(env.step(ContainerAction(stack_index=chosen)))
+        done = obs["done"]
+    assert 0.0 <= env.score() <= 1.0
+
+
+@pytest.mark.parametrize("difficulty", ["easy", "medium", "hard"])
+def test_full_episode_completes(difficulty):
+    env = ContainerYardEnvironment()
+    env.reset(difficulty=difficulty, seed=42)
+    cfg = DIFFICULTY_CONFIG[difficulty]
+    done = False
+    steps = 0
+    while not done:
+        stacks = as_dict(env._observe())["stack_states"]
+        chosen = next(
+            (i for i, s in enumerate(stacks) if len(s) < cfg["max_height"]), 0
+        )
+        obs = as_dict(env.step(ContainerAction(stack_index=chosen)))
+        done = obs["done"]
+        steps += 1
+    assert steps < 500, "Episode did not complete"
+    assert done is True
+
+
+def test_lookahead_visibility():
+    easy_env = ContainerYardEnvironment()
+    hard_env = ContainerYardEnvironment()
+    easy_obs = as_dict(easy_env.reset(difficulty="easy", seed=42))
+    hard_obs = as_dict(hard_env.reset(difficulty="hard", seed=42))
+    assert len(easy_obs["upcoming_retrievals"]) > len(hard_obs["upcoming_retrievals"])
+    assert len(hard_obs["upcoming_retrievals"]) == 0
+
+
+def test_reward_is_dense():
+    env = ContainerYardEnvironment()
+    env.reset(difficulty="medium", seed=42)
+    rewards = []
+    done = False
+    step = 0
+    while not done and step < 20:
+        stacks = as_dict(env._observe())["stack_states"]
+        chosen = step % env.n_stacks
+        if len(stacks[chosen]) >= env.max_height:
+            chosen = 0
+        obs = as_dict(env.step(ContainerAction(stack_index=chosen)))
+        rewards.append(obs["last_reward"])
+        done = obs["done"]
+        step += 1
+    nonzero = sum(1 for r in rewards if abs(r) > 1e-6)
+    assert nonzero >= len(rewards) * 0.5, f"Too many zero rewards: {rewards}"
+
+
+def test_no_double_retrieval():
+    env = ContainerYardEnvironment()
+    env.reset(difficulty="easy", seed=42)
+    for _ in range(env.n_containers):
+        if env.done:
+            break
+        stacks = env.stacks
+        chosen = next(
+            (i for i, s in enumerate(stacks) if len(s) < env.max_height), 0
+        )
+        env.step(ContainerAction(stack_index=chosen))
+    assert env.retrieval_pointer <= len(env.retrieval_queue)
+
+
+# HTTP integration tests
+
+def test_health_route():
+    client = TestClient(app)
+    resp = client.get("/health")
+    assert resp.status_code == 200
+
+
+def test_web_ui_route():
+    client = TestClient(app, follow_redirects=True)
+    resp = client.get("/web")
+    assert resp.status_code == 200
+
+
+def test_http_reset_returns_observation():
+    client = TestClient(app)
+    resp = client.post("/reset", json={"difficulty": "easy"})
+    assert resp.status_code == 200
+    body = resp.json()
+    obs = body.get("observation", body)
+    assert obs.get("difficulty") == "easy"
+    assert obs.get("step") == 0
+    assert obs.get("containers_remaining") == DIFFICULTY_CONFIG["easy"]["n_containers"]
+
+
+def test_http_reset_then_step_preserves_state():
+    client = TestClient(app)
+
+    reset_resp = client.post("/web/reset", json={"difficulty": "easy"})
+    assert reset_resp.status_code == 200
+    reset_body = reset_resp.json()
+
+    session_id = reset_body.get("session_id") or reset_body.get("id")
+    obs_after_reset = reset_body.get("observation", reset_body)
+    assert obs_after_reset.get("step") == 0
+    n_containers = DIFFICULTY_CONFIG["easy"]["n_containers"]
+    assert obs_after_reset.get("containers_remaining") == n_containers
+
+    step_payload = {"action": {"stack_index": 0}}
+    if session_id:
+        step_payload["session_id"] = session_id
+
+    step_resp = client.post("/web/step", json=step_payload)
+    assert step_resp.status_code == 200
+    step_body = step_resp.json()
+    obs_after_step = step_body.get("observation", step_body)
+
+    assert obs_after_step.get("step") == 1
+    assert obs_after_step.get("containers_remaining") == n_containers - 1
+    assert len(obs_after_step["stack_states"][0]) == 1
uv.lock ADDED
The diff for this file is too large to render. See raw diff