roshan5emerald committed on
Commit ddb6ffa · verified · 1 Parent(s): a825e9d

Upload folder using huggingface_hub
Dockerfile ADDED
@@ -0,0 +1,76 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ #   - In-repo environments (with local OpenEnv sources)
+ #   - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=crisis_logistics_env
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context.
+ # For standalone builds, openenv will be installed via pyproject.toml.
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where the base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Create an isolated environment and install dependencies cleanly inside the image.
+ RUN rm -rf /app/env/.venv
+ ENV UV_PROJECT_ENVIRONMENT=/app/env/.venv
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen; \
+     else \
+         uv sync; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from the builder stage
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,199 @@
  ---
- title: Logiflow Rl
- emoji: 🐢
- colorFrom: red
- colorTo: gray
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: LogiFlow-RL
+ emoji: 🚚
+ colorFrom: blue
+ colorTo: green
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+ - openenv
+ - logistics
+ - reinforcement-learning
  ---

+ # LogiFlow-RL
+
+ ## Overview
+
+ LogiFlow-RL is an OpenEnv environment for dynamic supply-chain workload balancing. The agent acts as a regional routing controller that must assign each incoming shipment to one of three hubs while keeping the network stable under predictable demand, flash-sale bursts, and cascading disruptions.
+
+ This is a real-world task rather than a toy game: the agent learns a simplified version of freight routing and capacity balancing, which are core operations in modern logistics networks.
+
+ ## Why this environment is useful
+
+ Static routing rules such as round-robin often fail under bursty demand because they react too slowly to overload risk. LogiFlow-RL provides a reproducible environment for training and evaluating routing agents that must reason about load, capacity, drift, and operational resilience.
+
+ ## Action Space
+
+ The action is a typed Pydantic model:
+
+ - `target_hub: int`
+   - `0` routes the shipment to Hub A
+   - `1` routes the shipment to Hub B
+   - `2` routes the shipment to Hub C
+
+ ## Observation Space
+
+ The observation is a typed Pydantic model containing:
+
+ - `task_id`: active benchmark task
+ - `difficulty`: easy, medium, or hard
+ - `objective`: natural-language task goal
+ - `hub_loads`: current utilization for the 3 hubs
+ - `drain_rates`: per-step clearing rates for non-selected hubs
+ - `incoming_load`: next scheduled shipment size
+ - `step_count` and `max_steps`
+ - `overloaded_hubs`
+ - `cumulative_score`: normalized task score in `[0.0, 1.0]`
+ - `last_reward`: shaped reward for the previous action in `[0.0, 1.0]`
+ - `event_label`: normal, flash_sale, weather_disruption, or completed
+
+ ## Reward Design
+
+ The environment provides dense reward over the full trajectory:
+
+ - Positive reward when the selected hub stays in the optimal 30-70 utilization zone
+ - Penalties when the chosen hub drifts too far from the center of the target range
+ - Penalties for global imbalance across hubs
+ - A strong penalty when a routing decision pushes a hub above 100 utilization
+
+ Each step reward is normalized to `[0.0, 1.0]`.
+
+ ## Tasks and Graders
+
+ The environment ships with three deterministic benchmark tasks and programmatic graders.
+
+ ### Easy: `easy`
+
+ Title: Steady-State Rebalancing
+
+ Objective: Keep all hubs in the 30-70 utilization band during predictable daytime demand.
+
+ Expected challenge: The agent should quickly learn basic balancing behavior and avoid unnecessary skew.
+
+ ### Medium: `medium`
+
+ Title: Flash Sale Containment
+
+ Objective: Absorb an afternoon flash-sale surge without letting any single hub overload.
+
+ Expected challenge: The agent must react to bursty schedules and preserve balance under transient spikes.
+
+ ### Hard: `hard`
+
+ Title: Cascading Disruption Recovery
+
+ Objective: Stabilize the network through repeated surge waves and weather disruptions while preserving throughput.
+
+ Expected challenge: The agent must recover from repeated shocks and still maintain usable balance.
+
+ ### Grader behavior
+
+ Each task has a deterministic grader that returns a final score in `[0.0, 1.0]` based on:
+
+ - Bottleneck avoidance
+ - Average inter-hub balance gap
+ - Average trajectory reward
+ - Fraction of steps spent in the optimal operating zone
+
+ The grader implementation is in [graders.py](crisis_logistics_env/graders.py).
+
+ ## Baselines
+
+ The repo includes two reproducible local baselines:
+
+ - `round_robin`
+ - `heuristic`
+
+ Run:
+
+ ```bash
+ python train_and_evaluate.py
+ ```
+
+ Current local benchmark scores:
+
+ - `round_robin`
+   - easy: `0.800`
+   - medium: `0.886`
+   - hard: `0.863`
+ - `heuristic`
+   - easy: `0.800`
+   - medium: `0.957`
+   - hard: `0.900`
+
+ The submission baseline entrypoint is [inference.py](inference.py), which uses the OpenAI client and emits the required structured logs.
+
+ ## Project Structure
+
+ ```text
+ crisis_logistics_env/
+ ├── __init__.py
+ ├── client.py
+ ├── graders.py
+ ├── gym_env.py
+ ├── models.py
+ ├── openenv.yaml
+ ├── pyproject.toml
+ ├── README.md
+ ├── tasks.py
+ ├── test_engine.py
+ ├── train_and_evaluate.py
+ └── server/
+     ├── app.py
+     ├── crisis_logistics_env_environment.py
+     └── Dockerfile
+ ```
+
+ ## Setup
+
+ ### Local run
+
+ ```bash
+ python test_engine.py
+ python train_and_evaluate.py
+ ```
+
+ ### Baseline inference
+
+ Set the required environment variables:
+
+ - `API_BASE_URL`
+ - `MODEL_NAME`
+ - `HF_TOKEN` or `OPENAI_API_KEY`
+
+ Then run from the repository root:
+
+ ```bash
+ python inference.py
+ ```
+
+ ### Local server
+
+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```
+
+ ### Docker
+
+ ```bash
+ docker build -t logiflow-rl -f server/Dockerfile .
+ docker run -p 8000:8000 logiflow-rl
+ ```
+
+ ## Validation Notes
+
+ The environment includes:
+
+ - Typed action, observation, and state models
+ - Deterministic tasks with increasing difficulty
+ - Programmatic graders with scores in `[0.0, 1.0]`
+ - Dense partial-progress reward shaping
+ - Root-level `inference.py`
+ - Dockerfile and OpenEnv manifest
+
+ ## Submission One-Liner
+
+ LogiFlow-RL is an OpenEnv benchmark for dynamic freight routing where an agent must balance shipments across regional hubs under flash-sale and disruption scenarios, with deterministic tasks and normalized graders for reproducible evaluation.
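The `heuristic` baseline named in the README above is not included in this diff; a minimal greedy rule in the same spirit might look like the following sketch (`choose_hub` and the 50% target are illustrative assumptions, not the repo's implementation):

```python
# Hypothetical greedy baseline: route each shipment to the hub whose projected
# post-assignment load stays closest to the middle of the 30-70 band.
def choose_hub(hub_loads, incoming_load):
    best_hub, best_gap = 0, float("inf")
    for hub in range(len(hub_loads)):
        projected = hub_loads[hub] + incoming_load  # selected hub absorbs the shipment
        gap = abs(projected - 50.0)                 # distance from the band's center
        if projected > 100.0:                       # never knowingly overload a hub
            gap += 1000.0
        if gap < best_gap:
            best_hub, best_gap = hub, gap
    return best_hub

print(choose_hub([24.0, 36.0, 30.0], 10.0))  # → 1
```

With the default reset loads, hub B (index 1) lands closest to the 50% sweet spot, which matches the balancing behavior the README describes.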
__init__.py ADDED
@@ -0,0 +1,25 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Crisis Logistics Env Environment."""
+
+ from .client import CrisisLogisticsEnv
+ from .graders import EpisodeMetrics, grade_episode
+ from .gym_env import LogiFlowGymEnv
+ from .models import CrisisLogisticsAction, CrisisLogisticsObservation, CrisisLogisticsState
+ from .tasks import get_task, list_tasks
+
+ __all__ = [
+     "CrisisLogisticsAction",
+     "CrisisLogisticsObservation",
+     "CrisisLogisticsState",
+     "CrisisLogisticsEnv",
+     "LogiFlowGymEnv",
+     "EpisodeMetrics",
+     "grade_episode",
+     "get_task",
+     "list_tasks",
+ ]
client.py ADDED
@@ -0,0 +1,63 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Client for the LogiFlow-RL environment server."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import CrisisLogisticsAction, CrisisLogisticsObservation, CrisisLogisticsState
+
+
+ class CrisisLogisticsEnv(
+     EnvClient[CrisisLogisticsAction, CrisisLogisticsObservation, CrisisLogisticsState]
+ ):
+     """Thin client that talks to the HTTP or WebSocket server."""
+
+     def _step_payload(self, action: CrisisLogisticsAction) -> Dict:
+         return {"target_hub": action.target_hub}
+
+     def _parse_result(self, payload: Dict) -> StepResult[CrisisLogisticsObservation]:
+         obs_data = payload.get("observation", {})
+         observation = CrisisLogisticsObservation(
+             task_id=obs_data.get("task_id", "easy"),
+             difficulty=obs_data.get("difficulty", "easy"),
+             objective=obs_data.get("objective", ""),
+             hub_loads=obs_data.get("hub_loads", [0.0, 0.0, 0.0]),
+             drain_rates=obs_data.get("drain_rates", [6.0, 5.0, 4.0]),
+             incoming_load=obs_data.get("incoming_load", 0.0),
+             step_count=obs_data.get("step_count", 0),
+             max_steps=obs_data.get("max_steps", 100),
+             overloaded_hubs=obs_data.get("overloaded_hubs", 0),
+             cumulative_score=obs_data.get("cumulative_score", 0.0),
+             last_reward=obs_data.get("last_reward", 0.0),
+             event_label=obs_data.get("event_label", "normal"),
+             message=obs_data.get("message", ""),
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> CrisisLogisticsState:
+         return CrisisLogisticsState(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+             task_id=payload.get("task_id", "easy"),
+             difficulty=payload.get("difficulty", "easy"),
+             hub_loads=payload.get("hub_loads", [0.0, 0.0, 0.0]),
+             incoming_index=payload.get("incoming_index", 0),
+             bottlenecks=payload.get("bottlenecks", 0),
+             score=payload.get("score", 0.0),
+         )
graders.py ADDED
@@ -0,0 +1,40 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+ try:
+     from .tasks import TaskConfig
+ except ImportError:
+     from tasks import TaskConfig
+
+
+ @dataclass
+ class EpisodeMetrics:
+     total_reward: float
+     average_reward: float
+     bottlenecks: int
+     optimal_steps: int
+     average_balance_gap: float
+     throughput_served: float
+     steps_completed: int
+
+
+ def grade_episode(task: TaskConfig, metrics: EpisodeMetrics) -> float:
+     bottleneck_score = max(
+         0.0,
+         1.0 - max(0, metrics.bottlenecks - task.target_bottlenecks) / max(1, task.max_steps / 4),
+     )
+     balance_score = max(
+         0.0,
+         1.0 - max(0.0, metrics.average_balance_gap - task.target_balance_gap) / 40.0,
+     )
+     efficiency_score = min(1.0, metrics.average_reward / max(task.minimum_avg_reward, 0.01))
+     stability_score = metrics.optimal_steps / task.max_steps
+
+     final_score = (
+         0.35 * bottleneck_score
+         + 0.25 * balance_score
+         + 0.20 * efficiency_score
+         + 0.20 * stability_score
+     )
+     return round(max(0.0, min(1.0, final_score)), 3)
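The weighted score in `grade_episode` can be traced by hand. Below is a standalone re-derivation of the same arithmetic with illustrative numbers; the `TaskConfig` thresholds (`target_bottlenecks`, `target_balance_gap`, `minimum_avg_reward`) are made up for the example, not taken from the repo's task definitions:

```python
# Standalone walk-through of grade_episode's formula with example values.
max_steps = 100
target_bottlenecks = 0       # assumed task threshold
target_balance_gap = 20.0    # assumed task threshold
minimum_avg_reward = 0.5     # assumed task threshold

bottlenecks = 2              # episode metrics (illustrative)
average_balance_gap = 28.0
average_reward = 0.45
optimal_steps = 60

bottleneck_score = max(0.0, 1.0 - max(0, bottlenecks - target_bottlenecks) / max(1, max_steps / 4))
balance_score = max(0.0, 1.0 - max(0.0, average_balance_gap - target_balance_gap) / 40.0)
efficiency_score = min(1.0, average_reward / max(minimum_avg_reward, 0.01))
stability_score = optimal_steps / max_steps

final_score = (
    0.35 * bottleneck_score   # 0.35 * 0.92
    + 0.25 * balance_score    # 0.25 * 0.80
    + 0.20 * efficiency_score # 0.20 * 0.90
    + 0.20 * stability_score  # 0.20 * 0.60
)
print(round(max(0.0, min(1.0, final_score)), 3))  # → 0.822
```

Because every component is clamped to `[0.0, 1.0]` before weighting, the final score stays in `[0.0, 1.0]` regardless of the raw metrics.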
gym_env.py ADDED
@@ -0,0 +1,118 @@
+ from __future__ import annotations
+
+ import random
+ from typing import Any
+
+ import gymnasium as gym
+ import numpy as np
+ from gymnasium import spaces
+
+
+ class LogiFlowGymEnv(gym.Env):
+     """Gymnasium wrapper for the 3-hub logistics balancing problem."""
+
+     metadata = {"render_modes": ["human"], "render_fps": 4}
+
+     def __init__(self, max_steps: int = 100):
+         super().__init__()
+         self.max_steps = max_steps
+         self.base_drain_rates = np.array([8.0, 7.0, 6.0], dtype=np.float32)
+         self.action_space = spaces.Discrete(3)
+         self.observation_space = spaces.Box(
+             low=np.array([0, 0, 0, 0, 0, 0, 0, 0], dtype=np.float32),
+             high=np.array([150, 150, 150, 20, 20, 20, 40, 2], dtype=np.float32),
+             dtype=np.float32,
+         )
+         self.reset()
+
+     def reset(
+         self, *, seed: int | None = None, options: dict[str, Any] | None = None
+     ) -> tuple[np.ndarray, dict[str, Any]]:
+         super().reset(seed=seed)
+         if seed is not None:
+             random.seed(seed)
+
+         self.hub_loads = np.array([24.0, 36.0, 30.0], dtype=np.float32)
+         self.drain_rates = self.base_drain_rates.copy()
+         self.step_count = 0
+         self.event_label = "normal"
+         self.incoming_load = self._sample_incoming_load()
+         return self._get_obs(), self._get_info(0.0)
+
+     def step(self, action: int):
+         self.step_count += 1
+
+         for idx in range(3):
+             if idx != action:
+                 self.hub_loads[idx] = max(0.0, self.hub_loads[idx] - self.drain_rates[idx])
+
+         self.hub_loads[action] += self.incoming_load
+         reward = self._calculate_reward(action)
+
+         self.event_label = self._sample_event_label()
+         self.incoming_load = self._sample_incoming_load(self.event_label)
+
+         terminated = False
+         truncated = self.step_count >= self.max_steps
+         return self._get_obs(), reward, terminated, truncated, self._get_info(reward)
+
+     def render(self):
+         print(
+             f"step={self.step_count} loads={self.hub_loads.round(1).tolist()} "
+             f"incoming={self.incoming_load:.1f} event={self.event_label}"
+         )
+
+     def _get_obs(self) -> np.ndarray:
+         event_id = {"normal": 0.0, "weather_disruption": 1.0, "flash_sale": 2.0}[self.event_label]
+         return np.array(
+             [
+                 self.hub_loads[0],
+                 self.hub_loads[1],
+                 self.hub_loads[2],
+                 self.drain_rates[0],
+                 self.drain_rates[1],
+                 self.drain_rates[2],
+                 self.incoming_load,
+                 event_id,
+             ],
+             dtype=np.float32,
+         )
+
+     def _get_info(self, reward: float) -> dict[str, Any]:
+         return {
+             "reward": reward,
+             "overloaded_hubs": int(np.sum(self.hub_loads > 100.0)),
+             "event_label": self.event_label,
+         }
+
+     def _sample_event_label(self) -> str:
+         roll = random.random()
+         if roll < 0.15:
+             return "flash_sale"
+         if roll < 0.25:
+             return "weather_disruption"
+         return "normal"
+
+     def _sample_incoming_load(self, event_label: str | None = None) -> float:
+         label = event_label or self.event_label
+         if label == "flash_sale":
+             return random.uniform(16.0, 24.0)
+         if label == "weather_disruption":
+             return random.uniform(11.0, 18.0)
+         return random.uniform(6.0, 12.0)
+
+     def _calculate_reward(self, action: int) -> float:
+         reward = 0.5
+         target_load = self.hub_loads[action]
+
+         if 30.0 <= target_load <= 70.0:
+             reward += 5.0
+         else:
+             reward -= min(abs(target_load - 50.0) / 15.0, 4.0)
+
+         if target_load > 100.0:
+             reward -= 20.0
+
+         reward -= float(np.max(self.hub_loads) - np.min(self.hub_loads)) / 50.0
+         reward -= float(np.sum(self.hub_loads > 100.0)) * 3.0
+         return round(reward, 2)
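The shaping in `_calculate_reward` is easiest to see with concrete numbers. Below is a pure-Python walk-through of a single step (the loads and chosen action are illustrative; the arithmetic mirrors the method above without numpy):

```python
# Tracing _calculate_reward for post-assignment loads [24.0, 36.0, 40.0]
# after routing the shipment to hub 2 (illustrative numbers).
hub_loads = [24.0, 36.0, 40.0]
action = 2

reward = 0.5
target_load = hub_loads[action]              # 40.0

if 30.0 <= target_load <= 70.0:
    reward += 5.0                            # inside the optimal 30-70 band
else:
    reward -= min(abs(target_load - 50.0) / 15.0, 4.0)

if target_load > 100.0:
    reward -= 20.0                           # overload penalty (not triggered here)

reward -= (max(hub_loads) - min(hub_loads)) / 50.0     # imbalance: (40-24)/50 = 0.32
reward -= sum(1 for load in hub_loads if load > 100.0) * 3.0

print(round(reward, 2))  # → 5.18
```

So a well-placed shipment earns roughly the +5.0 band bonus minus a small imbalance penalty, while an overload would swing the step by -20.0 plus -3.0 per overloaded hub.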
inference.py ADDED
@@ -0,0 +1,138 @@
+ import os
+ from typing import List, Optional
+
+ from openai import OpenAI
+
+ from crisis_logistics_env import CrisisLogisticsAction
+ from crisis_logistics_env.server.crisis_logistics_env_environment import (
+     CrisisLogisticsEnvironment,
+     choose_balancing_action,
+ )
+ from crisis_logistics_env.tasks import list_tasks
+
+
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY") or os.getenv("API_KEY")
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ BENCHMARK = os.getenv("BENCHMARK") or "logiflow_rl"
+ MAX_STEPS_OVERRIDE = os.getenv("MAX_STEPS")
+
+ SYSTEM_PROMPT = (
+     "You are controlling a logistics routing environment with 3 hubs. "
+     "Reply with exactly one digit: 0, 1, or 2. "
+     "Choose the hub that best keeps the network balanced, avoids overload above 100, "
+     "and keeps hubs near the 30-70 utilization band."
+ )
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+     error_val = error if error else "null"
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(
+         f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+         flush=True,
+     )
+
+
+ def build_user_prompt(task_title: str, objective: str, step: int, hub_loads: List[float], incoming_load: float, event_label: str, score: float) -> str:
+     return (
+         f"Task: {task_title}\n"
+         f"Objective: {objective}\n"
+         f"Step: {step}\n"
+         f"Hub loads: {hub_loads}\n"
+         f"Incoming shipment: {incoming_load}\n"
+         f"Traffic event: {event_label}\n"
+         f"Current score: {score:.3f}\n"
+         "Return only one hub id: 0, 1, or 2."
+     )
+
+
+ def choose_action_with_model(client: OpenAI, prompt: str) -> int:
+     response = client.chat.completions.create(
+         model=MODEL_NAME,
+         temperature=0.0,
+         max_tokens=4,
+         messages=[
+             {"role": "system", "content": SYSTEM_PROMPT},
+             {"role": "user", "content": prompt},
+         ],
+     )
+     text = (response.choices[0].message.content or "").strip()
+     if text and text[0] in {"0", "1", "2"}:
+         return int(text[0])
+     raise ValueError(f"invalid_model_output:{text}")
+
+
+ def run_task(task_id: str, client: Optional[OpenAI]) -> float:
+     env = CrisisLogisticsEnvironment()
+     observation = env.reset(task_id=task_id)
+     rewards: List[float] = []
+     last_error: Optional[str] = None
+     max_steps = min(
+         env.task.max_steps,
+         int(MAX_STEPS_OVERRIDE) if MAX_STEPS_OVERRIDE else env.task.max_steps,
+     )
+
+     log_start(task_id, BENCHMARK, MODEL_NAME)
+
+     try:
+         while not observation.done and observation.step_count < max_steps:
+             action_value = choose_balancing_action(observation)
+             if client is not None:
+                 prompt = build_user_prompt(
+                     env.task.title,
+                     observation.objective,
+                     observation.step_count + 1,
+                     observation.hub_loads,
+                     observation.incoming_load,
+                     observation.event_label,
+                     observation.cumulative_score,
+                 )
+                 try:
+                     action_value = choose_action_with_model(client, prompt)
+                     last_error = None
+                 except Exception as exc:
+                     last_error = str(exc)
+
+             action = CrisisLogisticsAction(target_hub=action_value)
+             observation = env.step(action)
+             reward = float(observation.reward or 0.0)
+             rewards.append(reward)
+             log_step(
+                 step=observation.step_count,
+                 action=f"route({action_value})",
+                 reward=reward,
+                 done=observation.done,
+                 error=last_error,
+             )
+
+         final_score = observation.cumulative_score
+         success = final_score >= 0.65
+         return_score = final_score
+         log_end(success, observation.step_count, return_score, rewards)
+         return return_score
+     except Exception:
+         log_end(False, observation.step_count, 0.0, rewards)
+         raise
+
+
+ def main() -> None:
+     client = OpenAI(api_key=API_KEY, base_url=API_BASE_URL) if API_KEY else None
+     for task in list_tasks():
+         run_task(task.task_id, client)
+
+
+ if __name__ == "__main__":
+     main()
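`choose_action_with_model` only accepts a reply whose first character is a valid digit. If a chattier model needed tolerating, a more forgiving parser could scan the reply for the first valid hub id; this is a hypothetical variant for illustration, not what `inference.py` ships:

```python
# Hypothetical lenient parser: return the first 0/1/2 found anywhere in the
# model's reply, falling back to an error like the strict version does.
def parse_hub_reply(text: str) -> int:
    for ch in text.strip():
        if ch in "012":
            return int(ch)
    raise ValueError(f"invalid_model_output:{text!r}")

print(parse_hub_reply("Hub 2 looks safest"))  # → 2
```

With the strict parser, a reply like the one above would instead raise and fall back to `choose_balancing_action`, which is the behavior `run_task` relies on.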
models.py ADDED
@@ -0,0 +1,73 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Data models for the LogiFlow-RL environment."""
+
+ from typing import List
+
+ from openenv.core.env_server.types import Action, Observation, State
+ from pydantic import Field
+
+
+ class CrisisLogisticsAction(Action):
+     """Route the next shipment to one of the available hubs."""
+
+     target_hub: int = Field(
+         ...,
+         ge=0,
+         le=2,
+         description="Index of the hub that should receive the incoming shipment: 0, 1, or 2.",
+     )
+
+
+ class CrisisLogisticsObservation(Observation):
+     """Current state of the regional hub network."""
+
+     task_id: str = Field(default="easy", description="Active benchmark task identifier.")
+     difficulty: str = Field(default="easy", description="Difficulty label for the active task.")
+     objective: str = Field(default="", description="Task objective visible to the agent.")
+     hub_loads: List[float] = Field(
+         default_factory=lambda: [0.0, 0.0, 0.0],
+         description="Current utilization percentage for each hub.",
+     )
+     drain_rates: List[float] = Field(
+         default_factory=lambda: [6.0, 5.0, 4.0],
+         description="How much load each hub clears per timestep when not selected.",
+     )
+     incoming_load: float = Field(
+         default=0.0,
+         description="Load percentage of the shipment that must be assigned this step.",
+     )
+     step_count: int = Field(default=0, description="Current step in the episode.")
+     max_steps: int = Field(default=100, description="Maximum episode length.")
+     overloaded_hubs: int = Field(
+         default=0,
+         description="Number of hubs currently above 100 percent utilization.",
+     )
+     cumulative_score: float = Field(
+         default=0.0,
+         description="Normalized score in the range [0.0, 1.0] for progress so far in the episode.",
+     )
+     last_reward: float = Field(default=0.0, description="Reward from the previous action.")
+     event_label: str = Field(
+         default="normal",
+         description="Traffic condition for the current shipment, e.g. normal or flash_sale.",
+     )
+     message: str = Field(default="", description="Human-readable state summary.")
+
+
+ class CrisisLogisticsState(State):
+     """Internal environment state exposed for validation and debugging."""
+
+     task_id: str = Field(default="easy", description="Active task identifier.")
+     difficulty: str = Field(default="easy", description="Task difficulty label.")
+     hub_loads: List[float] = Field(
+         default_factory=lambda: [0.0, 0.0, 0.0],
+         description="Current utilization values for each hub.",
+     )
+     incoming_index: int = Field(default=0, description="Index of the next scheduled shipment.")
+     bottlenecks: int = Field(default=0, description="Total bottlenecks encountered this episode.")
+     score: float = Field(default=0.0, description="Current normalized task score.")
openenv.yaml ADDED
@@ -0,0 +1,6 @@
+ spec_version: 1
+ name: logiflow_rl
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
openenv_crisis_logistics_env.egg-info/PKG-INFO ADDED
@@ -0,0 +1,12 @@
+ Metadata-Version: 2.4
+ Name: openenv-crisis_logistics_env
+ Version: 0.1.0
+ Summary: LogiFlow-RL environment for OpenEnv
+ Requires-Python: >=3.10
+ Requires-Dist: openenv-core[core]>=0.2.2
+ Requires-Dist: numpy>=1.26.0
+ Requires-Dist: gymnasium>=0.29.0
+ Requires-Dist: openai>=2.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_crisis_logistics_env.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,19 @@
+ README.md
+ pyproject.toml
+ ./__init__.py
+ ./client.py
+ ./graders.py
+ ./gym_env.py
+ ./models.py
+ ./tasks.py
+ ./test_engine.py
+ ./train_and_evaluate.py
+ openenv_crisis_logistics_env.egg-info/PKG-INFO
+ openenv_crisis_logistics_env.egg-info/SOURCES.txt
+ openenv_crisis_logistics_env.egg-info/dependency_links.txt
+ openenv_crisis_logistics_env.egg-info/entry_points.txt
+ openenv_crisis_logistics_env.egg-info/requires.txt
+ openenv_crisis_logistics_env.egg-info/top_level.txt
+ server/__init__.py
+ server/app.py
+ server/crisis_logistics_env_environment.py
openenv_crisis_logistics_env.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
+
openenv_crisis_logistics_env.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ server = crisis_logistics_env.server.app:main
openenv_crisis_logistics_env.egg-info/requires.txt ADDED
@@ -0,0 +1,8 @@
+ openenv-core[core]>=0.2.2
+ numpy>=1.26.0
+ gymnasium>=0.29.0
+ openai>=2.0.0
+
+ [dev]
+ pytest>=8.0.0
+ pytest-cov>=4.0.0
openenv_crisis_logistics_env.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ crisis_logistics_env
pyproject.toml ADDED
@@ -0,0 +1,47 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-crisis_logistics_env"
+ version = "0.1.0"
+ description = "LogiFlow-RL environment for OpenEnv"
+ requires-python = ">=3.10"
+ dependencies = [
+     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+     # To install from GitHub instead:
+     # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+     "openenv-core[core]>=0.2.2",
+     # Environment-specific dependencies
+     # Add all dependencies needed for your environment here
+     # Examples:
+     "numpy>=1.26.0",
+     "gymnasium>=0.29.0",
+     "openai>=2.0.0",
+     # "torch>=2.0.0",
+     # "openspiel>=1.0.0",
+     # "smolagents>=1.22.0,<2",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+
+ [project.scripts]
+ # Server entry point - enables running via: uv run --project . server
+ # or: python -m crisis_logistics_env.server.app
+ server = "crisis_logistics_env.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["crisis_logistics_env", "crisis_logistics_env.server"]
+ package-dir = { "crisis_logistics_env" = ".", "crisis_logistics_env.server" = "server" }
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Crisis Logistics Env environment server components."""
+
+ from .crisis_logistics_env_environment import CrisisLogisticsEnvironment
+
+ __all__ = ["CrisisLogisticsEnvironment"]
server/app.py ADDED
@@ -0,0 +1,116 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """FastAPI application for the LogiFlow-RL OpenEnv environment."""
+
+ from fastapi import FastAPI
+
+ try:
+     from openenv.core.env_server.types import (
+         EnvironmentMetadata,
+         HealthResponse,
+         HealthStatus,
+         ResetRequest,
+         ResetResponse,
+         SchemaResponse,
+         StepRequest,
+         StepResponse,
+     )
+     from ..models import (
+         CrisisLogisticsAction,
+         CrisisLogisticsObservation,
+         CrisisLogisticsState,
+     )
+     from .crisis_logistics_env_environment import CrisisLogisticsEnvironment
+ except ImportError:
+     from openenv.core.env_server.types import (
+         EnvironmentMetadata,
+         HealthResponse,
+         HealthStatus,
+         ResetRequest,
+         ResetResponse,
+         SchemaResponse,
+         StepRequest,
+         StepResponse,
+     )
+     from models import (
+         CrisisLogisticsAction,
+         CrisisLogisticsObservation,
+         CrisisLogisticsState,
+     )
+     from server.crisis_logistics_env_environment import CrisisLogisticsEnvironment
+
+
+ app = FastAPI(
+     title="OpenEnv Environment HTTP API",
+     version="1.0.0",
+     description=(
+         "HTTP API for interacting with the LogiFlow-RL environment through "
+         "a standardized OpenEnv-style interface."
+     ),
+ )
+
+ env = CrisisLogisticsEnvironment()
+
+
+ @app.post("/reset", response_model=ResetResponse, tags=["Environment Control"])
+ async def reset_environment(request: ResetRequest) -> ResetResponse:
+     task_id = getattr(request, "task_id", None) or "easy"
+     observation = env.reset(seed=request.seed, episode_id=request.episode_id, task_id=task_id)
+     return ResetResponse(
+         observation=observation.model_dump(),
+         reward=float(observation.reward or 0.0),
+         done=observation.done,
+     )
+
+
+ @app.post("/step", response_model=StepResponse, tags=["Environment Control"])
+ async def step_environment(request: StepRequest) -> StepResponse:
+     action = CrisisLogisticsAction(**request.action)
+     observation = env.step(action, timeout_s=request.timeout_s)
+     return StepResponse(
+         observation=observation.model_dump(),
+         reward=float(observation.reward or 0.0),
+         done=observation.done,
+     )
+
+
+ @app.get("/state", response_model=CrisisLogisticsState, tags=["State Management"])
+ async def get_state() -> CrisisLogisticsState:
+     return env.state
+
+
+ @app.get("/metadata", response_model=EnvironmentMetadata, tags=["Environment Info"])
+ async def get_metadata() -> EnvironmentMetadata:
+     return EnvironmentMetadata(
+         name="LogiFlow-RL",
+         description="Deterministic logistics routing benchmark with easy, medium, and hard tasks.",
+         version="1.0.0",
+     )
+
+
+ @app.get("/schema", response_model=SchemaResponse, tags=["Schema"])
+ async def get_schema() -> SchemaResponse:
+     return SchemaResponse(
+         action=CrisisLogisticsAction.model_json_schema(),
+         observation=CrisisLogisticsObservation.model_json_schema(),
+         state=CrisisLogisticsState.model_json_schema(),
+     )
+
+
+ @app.get("/health", response_model=HealthResponse, tags=["Health"])
+ async def health() -> HealthResponse:
+     return HealthResponse(status=HealthStatus.HEALTHY)
+
+
+ def main() -> None:
+     import uvicorn
+
+     uvicorn.run(app, host="0.0.0.0", port=8000)
+
+
+ if __name__ == "__main__":
+     main()
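The `/step` handler above rebuilds a `CrisisLogisticsAction` from `request.action`, whose only field is `target_hub`. As a rough sketch of what a client would send (the exact OpenEnv wire format is an assumption here, inferred from the handler and the Pydantic models):

```python
import json

# Hypothetical /step request body for this environment; "timeout_s" is
# optional and "target_hub" must be 0, 1, or 2 per the handler's validation.
payload = {"action": {"target_hub": 1}, "timeout_s": 5.0}
body = json.dumps(payload)
print(body)  # → {"action": {"target_hub": 1}, "timeout_s": 5.0}
```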
server/crisis_logistics_env_environment.py ADDED
@@ -0,0 +1,208 @@
+ from __future__ import annotations
+
+ import uuid
+ from typing import List
+
+ from openenv.core.env_server import Environment
+
+ try:
+     from ..graders import EpisodeMetrics, grade_episode
+     from ..models import (
+         CrisisLogisticsAction,
+         CrisisLogisticsObservation,
+         CrisisLogisticsState,
+     )
+     from ..tasks import get_task, list_tasks
+ except ImportError:
+     from graders import EpisodeMetrics, grade_episode
+     from models import (
+         CrisisLogisticsAction,
+         CrisisLogisticsObservation,
+         CrisisLogisticsState,
+     )
+     from tasks import get_task, list_tasks
+
+
+ class CrisisLogisticsEnvironment(
+     Environment[CrisisLogisticsAction, CrisisLogisticsObservation, CrisisLogisticsState]
+ ):
+     """
+     LogiFlow-RL: a deterministic supply-chain balancing benchmark.
+
+     The environment exposes three benchmark tasks with increasing difficulty.
+     Each episode provides a fixed shipment schedule so scores are reproducible.
+     """
+
+     def __init__(self):
+         super().__init__()
+         self.optimal_zone = (30.0, 70.0)
+         self.available_tasks = list_tasks()
+         self.reset(task_id="easy")
+
+     def reset(
+         self, seed=None, episode_id=None, task_id: str = "easy", **kwargs
+     ) -> CrisisLogisticsObservation:
+         self.task = get_task(task_id)
+         self.episode_id = episode_id or str(uuid.uuid4())
+         self.hub_loads = self.task.initial_loads[:]
+         self.drain_rates = self.task.drain_rates[:]
+         self.step_count = 0
+         self.schedule_index = 0
+         self.done = False
+         self.last_reward = 0.0
+         self.total_reward = 0.0
+         self.optimal_steps = 0
+         self.bottlenecks = 0
+         self.balance_gap_history: List[float] = []
+         self.throughput_served = 0.0
+         self.event_label = self.task.event_schedule[0]
+         self.incoming_load = self.task.incoming_schedule[0]
+         self.score = 0.0
+         return self._get_observation(
+             f"Task '{self.task.title}' initialized. Route the scheduled shipment stream."
+         )
+
+     def step(
+         self, action: CrisisLogisticsAction, timeout_s=None, **kwargs
+     ) -> CrisisLogisticsObservation:
+         if self.done:
+             observation = self._get_observation("Episode already finished.")
+             observation.reward = 0.0
+             return observation
+
+         selected = action.target_hub
+         if selected not in (0, 1, 2):
+             observation = self._get_observation("Invalid hub selected.")
+             observation.reward = 0.0
+             return observation
+
+         self.step_count += 1
+
+         for hub_index in range(3):
+             if hub_index != selected:
+                 self.hub_loads[hub_index] = max(
+                     0.0, self.hub_loads[hub_index] - self.drain_rates[hub_index]
+                 )
+
+         self.hub_loads[selected] += self.incoming_load
+         self.throughput_served += self.incoming_load
+
+         overloaded = any(load > 100.0 for load in self.hub_loads)
+         if overloaded:
+             self.bottlenecks += 1
+
+         reward = self._calculate_step_reward(selected)
+         self.total_reward += reward
+         self.last_reward = reward
+
+         if self._is_optimal_state():
+             self.optimal_steps += 1
+
+         self.balance_gap_history.append(max(self.hub_loads) - min(self.hub_loads))
+
+         if self.step_count >= self.task.max_steps:
+             self.done = True
+         else:
+             self.schedule_index = self.step_count
+             self.event_label = self.task.event_schedule[self.schedule_index]
+             self.incoming_load = self.task.incoming_schedule[self.schedule_index]
+
+         self.score = self._compute_score()
+         observation = self._get_observation(
+             f"Shipment routed to Hub {selected}. Loads: {[round(load, 1) for load in self.hub_loads]}"
+         )
+         observation.reward = reward
+         observation.done = self.done
+         return observation
+
+     def _calculate_step_reward(self, selected: int) -> float:
+         target_load = self.hub_loads[selected]
+         center = 50.0
+         zone_bonus = 1.0 if 30.0 <= target_load <= 70.0 else 0.0
+         load_penalty = min(abs(target_load - center) / 50.0, 1.0)
+         balance_gap = max(self.hub_loads) - min(self.hub_loads)
+         balance_penalty = min(balance_gap / 100.0, 1.0)
+         overload_penalty = 1.0 if target_load > 100.0 else 0.0
+
+         reward = 0.55 + 0.35 * zone_bonus - 0.25 * load_penalty - 0.20 * balance_penalty - 0.45 * overload_penalty
+         return round(max(0.0, min(1.0, reward)), 2)
+
+     def _is_optimal_state(self) -> bool:
+         low, high = self.optimal_zone
+         return all(low <= load <= high for load in self.hub_loads)
+
+     def _compute_score(self) -> float:
+         metrics = EpisodeMetrics(
+             total_reward=self.total_reward,
+             average_reward=self.total_reward / max(self.step_count, 1),
+             bottlenecks=self.bottlenecks,
+             optimal_steps=self.optimal_steps,
+             average_balance_gap=sum(self.balance_gap_history) / max(len(self.balance_gap_history), 1),
+             throughput_served=self.throughput_served,
+             steps_completed=self.step_count,
+         )
+         return grade_episode(self.task, metrics)
+
+     def _get_observation(self, message: str) -> CrisisLogisticsObservation:
+         overloaded_hubs = sum(1 for load in self.hub_loads if load > 100.0)
+         next_incoming = 0.0 if self.done else self.incoming_load
+         next_event = "completed" if self.done else self.event_label
+         return CrisisLogisticsObservation(
+             task_id=self.task.task_id,
+             difficulty=self.task.difficulty,
+             objective=self.task.objective,
+             hub_loads=[round(load, 2) for load in self.hub_loads],
+             drain_rates=self.drain_rates[:],
+             incoming_load=next_incoming,
+             step_count=self.step_count,
+             max_steps=self.task.max_steps,
+             overloaded_hubs=overloaded_hubs,
+             cumulative_score=self.score,
+             last_reward=self.last_reward,
+             event_label=next_event,
+             message=message,
+             reward=self.last_reward,
+             done=self.done,
+             metadata={
+                 "title": self.task.title,
+                 "available_tasks": [task.task_id for task in self.available_tasks],
+                 "bottlenecks": self.bottlenecks,
+             },
+         )
+
+     @property
+     def state(self) -> CrisisLogisticsState:
+         return CrisisLogisticsState(
+             episode_id=self.episode_id,
+             task_id=self.task.task_id,
+             difficulty=self.task.difficulty,
+             step_count=self.step_count,
+             hub_loads=[round(load, 2) for load in self.hub_loads],
+             incoming_index=self.schedule_index,
+             bottlenecks=self.bottlenecks,
+             score=self.score,
+         )
+
+
+ def choose_balancing_action(observation: CrisisLogisticsObservation) -> int:
+     """Deterministic heuristic baseline used for smoke tests and offline fallback."""
+
+     best_idx = 0
+     best_score = float("inf")
+     for index in range(3):
+         projected = observation.hub_loads[:]
+         for drain_idx in range(3):
+             if drain_idx != index:
+                 projected[drain_idx] = max(
+                     0.0, projected[drain_idx] - observation.drain_rates[drain_idx]
+                 )
+         projected[index] += observation.incoming_load
+
+         balance_gap = max(projected) - min(projected)
+         overload_penalty = 40.0 if projected[index] > 100.0 else 0.0
+         zone_penalty = sum(abs(load - 50.0) for load in projected) / 3.0
+         projected_score = overload_penalty + balance_gap + zone_penalty
+         if projected_score < best_score:
+             best_score = projected_score
+             best_idx = index
+     return best_idx
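The step-reward arithmetic in `_calculate_step_reward` can be illustrated in isolation. This standalone sketch duplicates the formula above for illustration (it is not the canonical implementation): a 0.55 base, a 0.35 bonus when the chosen hub sits in the 30-70 band, penalties for distance from center, imbalance, and overload, with the result clipped to [0, 1] and rounded:

```python
# Standalone sketch of the LogiFlow-RL step reward (illustration only).
def step_reward(hub_loads, selected):
    target = hub_loads[selected]
    zone_bonus = 1.0 if 30.0 <= target <= 70.0 else 0.0          # in-band bonus
    load_penalty = min(abs(target - 50.0) / 50.0, 1.0)           # distance from 50
    balance_penalty = min((max(hub_loads) - min(hub_loads)) / 100.0, 1.0)
    overload_penalty = 1.0 if target > 100.0 else 0.0
    reward = (0.55 + 0.35 * zone_bonus - 0.25 * load_penalty
              - 0.20 * balance_penalty - 0.45 * overload_penalty)
    return round(max(0.0, min(1.0, reward)), 2)

print(step_reward([50.0, 50.0, 50.0], 0))   # perfectly balanced band → 0.9
print(step_reward([120.0, 50.0, 50.0], 0))  # overloaded hub, clipped → 0.0
```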
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+ numpy>=1.26.0
+ gymnasium>=0.29.0
+ openai>=2.0.0
tasks.py ADDED
@@ -0,0 +1,119 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import List
+
+
+ @dataclass(frozen=True)
+ class TaskConfig:
+     task_id: str
+     difficulty: str
+     title: str
+     objective: str
+     max_steps: int
+     initial_loads: List[float]
+     drain_rates: List[float]
+     incoming_schedule: List[float]
+     event_schedule: List[str]
+     target_bottlenecks: int
+     target_balance_gap: float
+     minimum_avg_reward: float
+
+
+ TASKS: dict[str, TaskConfig] = {
+     "easy": TaskConfig(
+         task_id="easy",
+         difficulty="easy",
+         title="Steady-State Rebalancing",
+         objective="Keep all hubs in the 30-70 utilization band during predictable daytime demand.",
+         max_steps=12,
+         initial_loads=[28.0, 42.0, 35.0],
+         drain_rates=[8.0, 7.0, 6.0],
+         incoming_schedule=[9.0, 8.0, 11.0, 10.0, 9.0, 12.0, 8.0, 10.0, 9.0, 11.0, 8.0, 10.0],
+         event_schedule=[
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+         ],
+         target_bottlenecks=0,
+         target_balance_gap=25.0,
+         minimum_avg_reward=0.55,
+     ),
+     "medium": TaskConfig(
+         task_id="medium",
+         difficulty="medium",
+         title="Flash Sale Containment",
+         objective="Absorb an afternoon flash-sale surge without letting any single hub overload.",
+         max_steps=14,
+         initial_loads=[34.0, 48.0, 31.0],
+         drain_rates=[8.0, 7.0, 6.0],
+         incoming_schedule=[10.0, 12.0, 18.0, 22.0, 20.0, 16.0, 11.0, 13.0, 17.0, 14.0, 10.0, 9.0, 8.0, 10.0],
+         event_schedule=[
+             "normal",
+             "normal",
+             "flash_sale",
+             "flash_sale",
+             "flash_sale",
+             "weather_disruption",
+             "normal",
+             "normal",
+             "flash_sale",
+             "weather_disruption",
+             "normal",
+             "normal",
+             "normal",
+             "normal",
+         ],
+         target_bottlenecks=0,
+         target_balance_gap=28.0,
+         minimum_avg_reward=0.48,
+     ),
+     "hard": TaskConfig(
+         task_id="hard",
+         difficulty="hard",
+         title="Cascading Disruption Recovery",
+         objective="Stabilize the network through repeated surge waves and weather disruptions while preserving throughput.",
+         max_steps=16,
+         initial_loads=[45.0, 57.0, 40.0],
+         drain_rates=[8.0, 7.0, 6.0],
+         incoming_schedule=[16.0, 22.0, 19.0, 24.0, 18.0, 20.0, 23.0, 14.0, 25.0, 17.0, 15.0, 21.0, 13.0, 18.0, 14.0, 12.0],
+         event_schedule=[
+             "weather_disruption",
+             "flash_sale",
+             "weather_disruption",
+             "flash_sale",
+             "normal",
+             "weather_disruption",
+             "flash_sale",
+             "normal",
+             "flash_sale",
+             "weather_disruption",
+             "normal",
+             "flash_sale",
+             "normal",
+             "weather_disruption",
+             "normal",
+             "normal",
+         ],
+         target_bottlenecks=1,
+         target_balance_gap=35.0,
+         minimum_avg_reward=0.40,
+     ),
+ }
+
+
+ def list_tasks() -> List[TaskConfig]:
+     return [TASKS["easy"], TASKS["medium"], TASKS["hard"]]
+
+
+ def get_task(task_id: str) -> TaskConfig:
+     return TASKS[task_id]
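`step()` indexes `incoming_schedule` and `event_schedule` by `step_count`, so every task's schedules must contain exactly `max_steps` entries or the final transitions would raise `IndexError`. A minimal standalone check of that invariant (re-declaring the easy task's dimensions here rather than importing this module):

```python
# Hypothetical consistency check mirroring the TaskConfig schedule invariant:
# both schedules must have exactly max_steps entries.
def schedules_consistent(max_steps, incoming_schedule, event_schedule):
    return len(incoming_schedule) == max_steps == len(event_schedule)

# The "easy" task runs 12 steps with 12 scheduled shipments and events.
print(schedules_consistent(12, [9.0] * 12, ["normal"] * 12))  # → True
```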
test_engine.py ADDED
@@ -0,0 +1,32 @@
+ from models import CrisisLogisticsAction
+ from server.crisis_logistics_env_environment import (
+     CrisisLogisticsEnvironment,
+     choose_balancing_action,
+ )
+
+
+ def test_run():
+     print("--- BOOTING LOGIFLOW-RL SIMULATOR ---")
+     env = CrisisLogisticsEnvironment()
+
+     print("\n[INIT] Resetting Environment...")
+     obs = env.reset(task_id="medium")
+     print(f"Task: {obs.task_id} ({obs.difficulty})")
+     print(f"Objective: {obs.objective}")
+     print(f"Hub Loads: {obs.hub_loads}")
+     print(f"Incoming Load: {obs.incoming_load}")
+     print(f"Event Type: {obs.event_label}")
+
+     suggested_hub = choose_balancing_action(obs)
+     print(f"\n[POLICY] Baseline sends shipment to hub {suggested_hub}...")
+     obs = env.step(CrisisLogisticsAction(target_hub=suggested_hub))
+
+     print(f"Updated Loads: {obs.hub_loads}")
+     print(f"Reward Received: {obs.reward}")
+     print(f"Current Score: {obs.cumulative_score}")
+     print(f"Next Incoming Load: {obs.incoming_load}")
+     print(f"Message: {obs.message}")
+
+
+ if __name__ == "__main__":
+     test_run()
train_and_evaluate.py ADDED
@@ -0,0 +1,71 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+ from graders import EpisodeMetrics, grade_episode
+ from models import CrisisLogisticsAction
+ from server.crisis_logistics_env_environment import (
+     CrisisLogisticsEnvironment,
+     choose_balancing_action,
+ )
+ from tasks import list_tasks
+
+
+ @dataclass
+ class EpisodeSummary:
+     task_id: str
+     total_reward: float
+     score: float
+     bottlenecks: int
+
+
+ def run_policy(task_id: str, policy: str) -> EpisodeSummary:
+     env = CrisisLogisticsEnvironment()
+     observation = env.reset(task_id=task_id)
+     round_robin_step = 0
+
+     while not observation.done:
+         if policy == "round_robin":
+             action = round_robin_step % 3
+             round_robin_step += 1
+         else:
+             action = choose_balancing_action(observation)
+         observation = env.step(CrisisLogisticsAction(target_hub=action))
+
+     metrics = EpisodeMetrics(
+         total_reward=env.total_reward,
+         average_reward=env.total_reward / max(env.step_count, 1),
+         bottlenecks=env.bottlenecks,
+         optimal_steps=env.optimal_steps,
+         average_balance_gap=sum(env.balance_gap_history) / max(len(env.balance_gap_history), 1),
+         throughput_served=env.throughput_served,
+         steps_completed=env.step_count,
+     )
+     score = grade_episode(env.task, metrics)
+     return EpisodeSummary(
+         task_id=task_id,
+         total_reward=round(env.total_reward, 2),
+         score=score,
+         bottlenecks=env.bottlenecks,
+     )
+
+
+ def main() -> None:
+     print("LogiFlow-RL Benchmarks")
+     print("----------------------")
+     for policy in ("round_robin", "heuristic"):
+         print(f"\nPolicy: {policy}")
+         scores = []
+         for task in list_tasks():
+             summary = run_policy(task.task_id, policy)
+             scores.append(summary.score)
+             print(
+                 f"{summary.task_id:6} | reward={summary.total_reward:6.2f} | "
+                 f"score={summary.score:0.3f} | bottlenecks={summary.bottlenecks}"
+             )
+         avg_score = sum(scores) / len(scores)
+         print(f"average | score={avg_score:0.3f}")
+
+
+ if __name__ == "__main__":
+     main()
uv.lock ADDED
The diff for this file is too large to render. See raw diff