Spaces:

akashrajeev
/

incident-response-env

Sleeping

App Files Files Community

akashrajeev commited on 26 days ago

Commit

f493683

verified ·

1 Parent(s): 20fc6a8

Upload folder using huggingface_hub

Browse files

Files changed (23) hide show

Dockerfile +15 -0
README.md +206 -11
__init__.py +16 -0
client.py +112 -0
inference.py +470 -0
models.py +65 -0
openenv.yaml +52 -0
openenv_incident_response_env.egg-info/PKG-INFO +11 -0
openenv_incident_response_env.egg-info/SOURCES.txt +18 -0
openenv_incident_response_env.egg-info/dependency_links.txt +1 -0
openenv_incident_response_env.egg-info/entry_points.txt +2 -0
openenv_incident_response_env.egg-info/requires.txt +7 -0
openenv_incident_response_env.egg-info/top_level.txt +1 -0
pyproject.toml +47 -0
server/Dockerfile +13 -0
server/__init__.py +11 -0
server/app.py +152 -0
server/environment.py +203 -0
server/graders.py +121 -0
server/incident_response_env_environment.py +104 -0
server/requirements.txt +7 -0
server/scenarios.py +214 -0
uv.lock +0 -0

Dockerfile ADDED Viewed

	@@ -0,0 +1,15 @@

+# Root-level Dockerfile for Hugging Face Docker Spaces (default build path).
+# Identical to server/Dockerfile; keep in sync when changing dependencies.
+FROM python:3.11-slim
+WORKDIR /app
+COPY server/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 8000
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED Viewed

@@ -1,11 +1,206 @@
----
-title: Incident Response Env
-emoji: 📉
-colorFrom: indigo
-colorTo: purple
-sdk: docker
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Incident Response OpenEnv
+emoji: 🚨
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+app_port: 8000
+pinned: false
+tags:
+  - openenv
+base_path: /web
+---
+# Incident Response OpenEnv
+Realistic **site reliability / incident triage** environment for [OpenEnv](https://github.com/meta-pytorch/OpenEnv): agents read firing alerts, choose a remediation **`action_type`**, target the correct **`alert_id`**, and receive graded rewards with partial credit. Three benchmark tasks (**easy -> medium -> hard**) simulate single-incident triage, root-cause among symptoms, and ordered cascade resolution.
+## Why this submission stands out
+- **Real-world domain** - not a toy grid or guessing game; models must reason about dependencies and severities.
+- **Full OpenEnv surface** - typed `Action` / `Observation` / `State`, `reset` / `step` / `state`, `openenv.yaml`, HTTP API, Docker.
+- **Meaningful rewards** - sparse success on wrong targets, partial signals for "right direction," chain order on hard tasks.
+- **Reproducible baseline** - root `inference.py` using the official OpenAI client, env vars `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, structured `[START]` / `[STEP]` / `[END]` logs only on stdout by default (no extra lines).
+## Tasks
+| Task          | Focus                                      |
+|---------------|---------------------------------------------|
+| `task_easy`   | Single disk pressure alert -> scale storage. |
+| `task_medium` | Multiple alerts -> remediate DB root cause.  |
+| `task_hard`   | Ordered `svc-001`... cascade -> full chain.    |
+Rewards are always in **\[0, 1]** per step; the baseline caps **episode score** at **1.0** (sum of step rewards).
+## Action & observation
+**Action** (`IncidentResponseAction`): `alert_id`, `action_type`, `notes`.
+**Observation**: `alerts[]` (id, title, severity, description, source), `resolved_alerts`, `system_health`, `step_number`, `message` (grader feedback), plus top-level `reward` / `done` from the HTTP wrapper.
+**State** (`GET /state`): `episode_id`, `step_count`, `task_id`, `max_steps`, `total_reward`, `scenario_name`.
+## Quick start (local)
+```bash
+cd incident_response_env
+uv sync
+uv run uvicorn server.app:app --host 0.0.0.0 --port 8000
+```
+Health:
+```bash
+curl -s http://127.0.0.1:8000/health
+```
+Reset & step (example):
+```bash
+curl -s -X POST http://127.0.0.1:8000/reset -H "Content-Type: application/json" -d "{\"task_id\":\"task_easy\"}"
+curl -s -X POST http://127.0.0.1:8000/step -H "Content-Type: application/json" -d "{\"action\":{\"alert_id\":\"disk-alert-1\",\"action_type\":\"scale_up\",\"notes\":\"Relieve disk pressure\"}}"
+```
+## Baseline inference (`inference.py`)
+Required for the hackathon harness (OpenAI-compatible client):
+```powershell
+$env:ENV_URL       = "http://127.0.0.1:8000"
+$env:API_BASE_URL  = "https://router.huggingface.co/v1"
+$env:MODEL_NAME    = "<model id>"
+$env:HF_TOKEN      = "<hf token>"
+uv run python inference.py
+```
+Optional:
+- `INFERENCE_STUB=1` - run without an LLM (deterministic policy) for CI or smoke tests.
+- `INFERENCE_SUMMARY=1` - print an extra `[SUMMARY]` line (omit for strict stdout parsers).
+**Windows / `openenv push`:** if you see `charmap` codec errors, set UTF-8 mode before push: `set PYTHONUTF8=1` (cmd) or `$env:PYTHONUTF8="1"` (PowerShell).
+Runtime: keep total wall clock **under 20 minutes**; use a small instruct model if needed.
+## Docker
+```bash
+docker build -t incident-response-env -f server/Dockerfile .
+docker run --rm -p 8000:8000 incident-response-env
+```
+## Phase 8 - Deploy to Hugging Face Spaces
+Create a **Docker** Space on Hugging Face named e.g. `incident-response-env` (must match your repo id if you use CLI defaults).
+### Method A - OpenEnv CLI (recommended)
+Current OpenEnv packages expose **`openenv push`**, not `openenv deploy` (if your course PDF says `deploy`, use **`push`** with the same repo id).
+```bash
+# one-time
+huggingface-cli login   # paste HF token when prompted
+cd incident_response_env
+openenv validate
+openenv push --repo-id YOUR_HF_USERNAME/incident-response-env
+# Windows: try PYTHONUTF8=1 if push fails on encoding
+# add --private if required; use --no-interface if you hit Gradio/UI issues
+```
+`openenv push` reads **`openenv.yaml`** and uploads the environment; the repo root **`Dockerfile`** is used by HF's Docker SDK builder.
+### Method B - Manual Git push
+```bash
+cd incident_response_env
+git init
+git remote add origin https://huggingface.co/spaces/YOUR_HF_USERNAME/incident-response-env
+git add .
+git commit -m "OpenEnv incident-response submission"
+git push -u origin main
+```
+Ensure the Space **SDK** is **Docker** on the Hugging Face UI (README front matter already sets `sdk: docker`, `app_port: 8000`).
+### After deploy
+Public app URL is usually:
+`https://YOUR_HF_USERNAME-incident-response-env.hf.space`
+Smoke test (no trailing slash issues - use exact host Hugging Face shows):
+```bash
+curl -sS https://YOUR_HF_USERNAME-incident-response-env.hf.space/health
+curl -sS -X POST https://YOUR_HF_USERNAME-incident-response-env.hf.space/reset \
+  -H "Content-Type: application/json" -d "{}"
+```
+Update **`docker_image`** in `openenv.yaml` to `YOUR_HF_USERNAME/incident-response-env` for documentation consistency.
+Then run **`openenv validate --url https://...hf.space`** and your organizer's pre-submission script.
+## Validate before submit
+```bash
+openenv validate
+# optional: runtime check against a deployed URL
+openenv validate --url https://<your-space>.hf.space
+```
+## Pre-submission checklist (Round 1)
+Cross-check with the official dashboard (e.g. Scaler / Meta OpenEnv Round 1):
+- [ ] **`inference.py`** at repo root; uses **`OpenAI`** client + **`API_BASE_URL`**, **`MODEL_NAME`**, **`HF_TOKEN`**
+- [ ] Stdout: **`[START]`**, **`[STEP]`**, **`[END]`** only (avoid `INFERENCE_SUMMARY` for automated parsing)
+- [ ] **`openenv validate`** OK; **`uv.lock`** present if required
+- [ ] **Dockerfile** builds in CI
+- [ ] **HF Space** up; health + reset respond
+- [ ] **>= 3 tasks** with graders; rewards in **[0, 1]**
+- [ ] **README** describes domain, spaces, setup (this file)
+- [ ] No secrets in git; rotate any leaked tokens
+## Project layout
+```
+incident_response_env/
+|-- Dockerfile             # HF Spaces default path (same image as server/Dockerfile)
+|-- .dockerignore
+|-- inference.py          # Hackathon baseline (LLM + env HTTP)
+|-- openenv.yaml
+|-- models.py
+|-- client.py             # WebSocket EnvClient wrapper
+|-- pyproject.toml
+|-- uv.lock
+`-- server/
+    |-- app.py            # FastAPI app
+    |-- environment.py    # reset / step / state
+    |-- scenarios.py
+    |-- graders.py
+    |-- Dockerfile
+    `-- requirements.txt
+```
+## Client (WebSocket)
+```python
+from incident_response_env import IncidentResponseEnv, IncidentResponseAction
+with IncidentResponseEnv(base_url="http://localhost:8000") as env:
+    r = env.reset(task_id="task_easy")  # pass kwargs your server accepts
+    r = env.step(
+        IncidentResponseAction(
+            alert_id="disk-alert-1",
+            action_type="scale_up",
+            notes="Expand storage.",
+        )
+    )
+```
+(See `client.py` for `_step_payload` / parsing details.)
+## License
+See `LICENSE` in the repository.

__init__.py ADDED Viewed

	@@ -0,0 +1,16 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Incident Response Env Environment."""
+from .client import IncidentResponseEnv
+from .models import IncidentResponseAction, IncidentResponseObservation
+__all__ = [
+    "IncidentResponseAction",
+    "IncidentResponseObservation",
+    "IncidentResponseEnv",
+]

client.py ADDED Viewed

	@@ -0,0 +1,112 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Incident Response Env Environment Client."""
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+try:
+    from .models import (
+        IncidentResponseAction,
+        IncidentResponseObservation,
+        IncidentState,
+    )
+except ImportError:  # pragma: no cover
+    from models import (
+        IncidentResponseAction,
+        IncidentResponseObservation,
+        IncidentState,
+    )
+class IncidentResponseEnv(
+    EnvClient[IncidentResponseAction, IncidentResponseObservation, IncidentState]
+):
+    """
+    Client for the Incident Response Env Environment.
+    This client maintains a persistent WebSocket connection to the environment server,
+    enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+    Example:
+        >>> # Connect to a running server
+        >>> with IncidentResponseEnv(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.alerts)
+        ...
+        ...     result = client.step(
+        ...         IncidentResponseAction(
+        ...             alert_id="disk-alert-1",
+        ...             action_type="scale_up",
+        ...             notes="Relieve disk pressure.",
+        ...         )
+        ...     )
+        ...     print(result.observation.message)
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = IncidentResponseEnv.from_docker_image("incident_response_env-env:latest")
+        >>> try:
+        ...     result = client.reset()
+        ...     result = client.step(IncidentResponseAction(message="Test"))
+        ... finally:
+        ...     client.close()
+    """
+    def _step_payload(self, action: IncidentResponseAction) -> Dict:
+        """
+        Convert IncidentResponseAction to JSON payload for step message.
+        Args:
+            action: IncidentResponseAction instance
+        Returns:
+            Dictionary representation suitable for JSON encoding
+        """
+        return {
+            "action": {
+                "alert_id": getattr(action, "alert_id", ""),
+                "action_type": getattr(action, "action_type", "investigate"),
+                "notes": getattr(action, "notes", ""),
+            }
+        }
+    def _parse_result(self, payload: Dict) -> StepResult[IncidentResponseObservation]:
+        """
+        Parse server response into StepResult[IncidentResponseObservation].
+        Args:
+            payload: JSON response data from server
+        Returns:
+            StepResult with IncidentResponseObservation
+        """
+        obs_data = dict(payload.get("observation") or {})
+        obs_data.setdefault("done", payload.get("done", False))
+        obs_data.setdefault("reward", payload.get("reward"))
+        observation = IncidentResponseObservation.model_validate(obs_data)
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> IncidentState:
+        """
+        Parse server response into IncidentState.
+        Args:
+            payload: JSON response from state request
+        Returns:
+            IncidentState for this environment
+        """
+        return IncidentState.model_validate(payload)

inference.py ADDED Viewed

	@@ -0,0 +1,470 @@

+import json
+import os
+import time
+import re
+from typing import Any, Dict, List, Optional, Tuple
+import requests
+from openai import OpenAI
+ENV_URL = os.environ.get("ENV_URL", "http://localhost:8000")
+# Benchmark name for [START] line (hackathon sample uses env=<benchmark>)
+BENCHMARK = os.environ.get("INCIDENT_BENCHMARK", "incident_response_env")
+SUCCESS_SCORE_THRESHOLD = float(os.environ.get("SUCCESS_SCORE_THRESHOLD", "0.1"))
+# Stricter bar for reporting "all tasks strong" (e.g. leaderboard psychologics).
+STRICT_TASK_SCORE = float(os.environ.get("STRICT_TASK_SCORE", "0.95"))
+SYSTEM_PROMPT = """You are an expert Site Reliability Engineer (SRE).
+You receive a JSON object each turn (not raw alert list only). It includes task_id, step, alerts,
+resolved_alert_ids, and environment_message from the simulator.
+CRITICAL RULES:
+- alert_id MUST be copied exactly from one of the "id" fields in the CURRENT alerts array.
+- NEVER invent IDs. NEVER reuse IDs from prior tasks or examples.
+- Read environment_message every step. If it says "Out of order" or reward was low, change strategy:
+  follow the rules for your task_id below.
+TASK-SPECIFIC POLICY (use task_id from the JSON):
+- task_easy: Usually one disk/storage alert. action_type scale_up on THAT alert's id.
+- task_medium: Database pool / root cause (often id db-001). Remediate root before symptoms;
+  scale_up or fix on the DB alert id is appropriate.
+- task_hard: Cascading service failure. Alert ids look like svc-001, svc-002, svc-003, ...
+  You MUST remediate in strict numeric order: among alerts still present, pick the id with the
+  **smallest** N in svc-NNN (e.g. svc-001 before svc-005). Use action_type **fix** for that
+  upstream failing service unless the alert text clearly indicates capacity-only (then scale_up).
+  Do not pick a higher-numbered svc while a lower-numbered one is still in alerts.
+Available action_type values:
+- scale_up, fix, restart, rollback, mitigate, remediate, isolate, block
+Always provide non-empty "notes" (one sentence). When the chosen alert has a "source" field
+(e.g. auth-service, database), mention that exact string in notes - it aligns with grading bonuses.
+Respond ONLY as JSON:
+{
+  "alert_id": "string (required)",
+  "action_type": "string (required)",
+  "notes": "string (required - brief, include source when present)"
+}
+"""
+def _task_hard_chain_head_id(alerts: Any) -> Optional[str]:
+    """Smallest svc-NNN among active alerts, for ordering hints."""
+    if not isinstance(alerts, list):
+        return None
+    best: Optional[Tuple[int, str]] = None
+    pattern = re.compile(r"^svc-(\d+)$", re.IGNORECASE)
+    for a in alerts:
+        if not isinstance(a, dict):
+            continue
+        raw = str(a.get("id", "")).strip()
+        m = pattern.match(raw)
+        if not m:
+            continue
+        n = int(m.group(1))
+        if best is None or n < best[0]:
+            best = (n, raw)
+    return best[1] if best else None
+def _build_llm_user_payload(*, task_id: str, step: int, obs: Dict[str, Any]) -> str:
+    alerts = obs.get("alerts") or []
+    payload: Dict[str, Any] = {
+        "task_id": task_id,
+        "step": step,
+        "alerts": alerts,
+        "resolved_alert_ids": obs.get("resolved_alerts") or [],
+        "environment_message": obs.get("message") or "",
+    }
+    if task_id == "task_hard":
+        head = _task_hard_chain_head_id(alerts)
+        if head:
+            payload["cascade_next_id_hint"] = (
+                f"Lowest-index unresolved svc in this list is {head!r}; prefer that alert_id."
+            )
+    return json.dumps(payload, ensure_ascii=False)
+_TASK_MAX_STEPS = {"task_easy": 5, "task_medium": 10, "task_hard": 20}
+def _truthy_env(name: str) -> bool:
+    return os.environ.get(name, "").strip().lower() in ("1", "true", "yes")
+def _stub_action(
+    task_id: str, obs: Dict[str, Any], episode_alert_ids: List[str]
+) -> Dict[str, Any]:
+    """Deterministic policy for local runs without an LLM (INFERENCE_STUB=1)."""
+    alerts = obs.get("alerts") or []
+    resolved = set(
+        str(x) for x in (obs.get("resolved_alerts") or []) if x is not None
+    )
+    active_ids = [
+        str(a["id"])
+        for a in alerts
+        if isinstance(a, dict) and a.get("id")
+    ]
+    workable = [i for i in episode_alert_ids if i not in resolved]
+    if task_id == "task_easy":
+        aid = (
+            "disk-alert-1"
+            if "disk-alert-1" in active_ids
+            else (active_ids[0] if active_ids else "")
+        )
+        return {"alert_id": aid, "action_type": "scale_up", "notes": "stub policy"}
+    if task_id == "task_medium":
+        aid = "db-001" if "db-001" in active_ids else ""
+        if not aid:
+            pick = _pick_fallback_alert_id(alerts, workable)
+            aid = pick or ""
+        return {"alert_id": aid, "action_type": "scale_up", "notes": "stub policy"}
+    aid = active_ids[0] if active_ids else ""
+    if not aid:
+        pick = _pick_fallback_alert_id(alerts, workable)
+        aid = pick or ""
+    return {"alert_id": aid, "action_type": "fix", "notes": "stub policy"}
+def log_start(*, task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(
+    *,
+    step: int,
+    action_str: str,
+    reward: float,
+    done: bool,
+    error: Optional[str],
+) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP]  step={step} action={action_str} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(
+    *,
+    success: bool,
+    steps: int,
+    score: float,
+    rewards: List[float],
+) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+def _normalize_step_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
+    if "observation" in payload and isinstance(payload["observation"], dict):
+        obs = dict(payload["observation"])
+        if "done" in payload:
+            obs["done"] = payload["done"]
+        if "reward" in payload:
+            obs["reward"] = payload["reward"]
+        return obs
+    return payload
+def _parse_action(text: str) -> Tuple[Dict[str, Any], str]:
+    raw = text.strip()
+    try:
+        data = json.loads(raw)
+    except Exception:
+        start = raw.find("{")
+        end = raw.rfind("}")
+        if start != -1 and end != -1 and end > start:
+            try:
+                data = json.loads(raw[start : end + 1])
+            except Exception:
+                data = {}
+        else:
+            data = {}
+    alert_id = str(data.get("alert_id", "")).strip()
+    action_type = str(data.get("action_type", "")).strip() or "investigate"
+    raw_notes = data.get("notes", data.get("reasoning", ""))
+    notes = str(raw_notes).strip()
+    action = {"alert_id": alert_id, "action_type": action_type, "notes": notes}
+    return action, notes
+def _action_str(action: Dict[str, Any]) -> str:
+    return json.dumps(action, ensure_ascii=False, separators=(",", ":"))
+def _alert_ids_from_obs(alerts: Any) -> List[str]:
+    out: List[str] = []
+    if not isinstance(alerts, list):
+        return out
+    for a in alerts:
+        if isinstance(a, dict) and a.get("id"):
+            out.append(str(a["id"]))
+    return out
+def _pick_fallback_alert_id(
+    alerts: Any, unresolved_ordered: List[str]
+) -> Optional[str]:
+    """
+    Pick a valid unresolved id. Only considers rows in `alerts` whose id is in
+    unresolved_ordered - never return an id that leaked into `alerts` from elsewhere.
+    Prefer critical among those rows, else first matching row, else first in episode order.
+    """
+    if not unresolved_ordered:
+        return None
+    allowed = set(unresolved_ordered)
+    if isinstance(alerts, list) and alerts:
+        for a in alerts:
+            if not isinstance(a, dict):
+                continue
+            aid = str(a.get("id", "")).strip()
+            if aid not in allowed:
+                continue
+            if a.get("severity") == "critical":
+                return aid
+        for a in alerts:
+            if not isinstance(a, dict):
+                continue
+            aid = str(a.get("id", "")).strip()
+            if aid in allowed:
+                return aid
+    return unresolved_ordered[0]
+def _sanitize_action(
+    action: Dict[str, Any],
+    obs: Dict[str, Any],
+    episode_alert_ids: List[str],
+) -> Dict[str, Any]:
+    """
+    If the model hallucinates an alert_id (e.g. disk-alert-1 from a prior task), repair.
+    Only ids that appeared in the initial reset for THIS episode are valid - never trust
+    the model to invent ids. Also avoid targeting an id already in resolved_alerts.
+    """
+    if not episode_alert_ids:
+        return action
+    alerts = obs.get("alerts", [])
+    resolved = set(str(x) for x in (obs.get("resolved_alerts") or []) if x is not None)
+    aid = str(action.get("alert_id", "")).strip()
+    epi_set = set(episode_alert_ids)
+    workable = [i for i in episode_alert_ids if i not in resolved]
+    ok = aid in epi_set and aid not in resolved
+    if ok:
+        return action
+    out = dict(action)
+    # Preserve reset order (important for cascade chain: svc-001 before svc-002, ...)
+    unresolved_for_pick = workable
+    chosen = _pick_fallback_alert_id(alerts, unresolved_for_pick)
+    if chosen is None:
+        return out
+    bad = aid or "(empty)"
+    note = str(out.get("notes", "")).strip()
+    reason = (
+        "not part of this episode's alerts"
+        if aid not in epi_set
+        else "already resolved"
+    )
+    repair = f"Invalid alert_id {bad!r} ({reason}); using {chosen}."
+    out["alert_id"] = chosen
+    out["notes"] = f"{repair} {note}".strip()
+    return out
+def run_episode(
+    *,
+    task_id: str,
+    client: Optional[OpenAI],
+    model_name: str,
+    use_stub: bool,
+) -> float:
+    """Run one benchmark task; return episode score in [0, 1] ( capped sum of step rewards)."""
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    log_start(task=task_id, env=BENCHMARK, model=model_name)
+    try:
+        reset_payload = requests.post(
+            f"{ENV_URL}/reset", json={"task_id": task_id}, timeout=15
+        ).json()
+        obs = _normalize_step_payload(reset_payload)
+        episode_alert_ids = _alert_ids_from_obs(obs.get("alerts", []))
+        max_loops = int(
+            obs.get("max_steps") or _TASK_MAX_STEPS.get(task_id, 20) or 20
+        )
+        t0 = time.time()
+        step = 0
+        while not bool(obs.get("done", False)):
+            if time.time() - t0 > 60 * 15:
+                break
+            if step >= max_loops:
+                break
+            step += 1
+            err: Optional[str] = None
+            reward = 0.0
+            done = False
+            action_line = "{}"
+            action: Dict[str, Any] = {}
+            try:
+                alerts = obs.get("alerts", [])
+                if use_stub:
+                    action = _stub_action(task_id, obs, episode_alert_ids)
+                else:
+                    assert client is not None
+                    user_content = _build_llm_user_payload(
+                        task_id=task_id, step=step, obs=obs
+                    )
+                    response = client.chat.completions.create(
+                        model=model_name,
+                        messages=[
+                            {"role": "system", "content": SYSTEM_PROMPT},
+                            {"role": "user", "content": user_content},
+                        ],
+                        temperature=0.0,
+                    )
+                    action, _notes = _parse_action(
+                        response.choices[0].message.content or ""
+                    )
+                    if not action.get("alert_id"):
+                        if isinstance(alerts, list) and alerts:
+                            first = alerts[0]
+                            if isinstance(first, dict) and "id" in first:
+                                action["alert_id"] = first["id"]
+                action = _sanitize_action(action, obs, episode_alert_ids)
+                action_line = _action_str(action)
+                step_payload = requests.post(
+                    f"{ENV_URL}/step", json={"action": action}, timeout=15
+                ).json()
+                obs = _normalize_step_payload(step_payload)
+                reward = float(obs.get("reward") or 0.0)
+                done = bool(obs.get("done", False))
+                rewards.append(reward)
+                steps_taken = step
+            except Exception as exc:
+                err = str(exc).replace("\n", " ")
+                rewards.append(0.0)
+                steps_taken = step
+                # Avoid empty action in logs when the LLM or HTTP provider fails (e.g. 402).
+                if not action:
+                    fb_id = _pick_fallback_alert_id(
+                        obs.get("alerts", []),
+                        [
+                            i
+                            for i in episode_alert_ids
+                            if i
+                            not in set(
+                                str(x)
+                                for x in (obs.get("resolved_alerts") or [])
+                                if x is not None
+                            )
+                        ],
+                    )
+                    if fb_id:
+                        action = {
+                            "alert_id": fb_id,
+                            "action_type": "investigate",
+                            "notes": "LLM/API error; no step sent.",
+                        }
+                action_line = _action_str(action) if action else "{}"
+            log_step(
+                step=step,
+                action_str=action_line,
+                reward=reward,
+                done=done,
+                error=err,
+            )
+            if err is not None:
+                break
+            if done:
+                break
+        total = sum(rewards) if rewards else 0.0
+        score = min(max(total, 0.0), 1.0)
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+    return score
+def main() -> None:
+    use_stub = _truthy_env("INFERENCE_STUB")
+    if use_stub:
+        model_name = os.environ.get("MODEL_NAME", "stub-local")
+        client: Optional[OpenAI] = None
+    else:
+        required = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
+        missing = [k for k in required if not os.environ.get(k)]
+        if missing:
+            raise SystemExit(
+                "Set these environment variables before running inference.py: "
+                + ", ".join(missing)
+                + " - or set INFERENCE_STUB=1 to run against the env with a built-in policy "
+                "(no LLM)."
+            )
+        client = OpenAI(
+            api_key=os.environ["HF_TOKEN"],
+            base_url=os.environ["API_BASE_URL"],
+        )
+        model_name = os.environ["MODEL_NAME"]
+    tasks = ["task_easy", "task_medium", "task_hard"]
+    episode_scores: List[Tuple[str, float]] = []
+    for task in tasks:
+        ep_score = run_episode(
+            task_id=task,
+            client=client,
+            model_name=model_name,
+            use_stub=use_stub,
+        )
+        episode_scores.append((task, ep_score))
+    # Hackathon evaluators may parse stdout strictly ([START]/[STEP]/[END] only).
+    # Set INFERENCE_SUMMARY=1 for an extra aggregate line (local leaderboards).
+    if _truthy_env("INFERENCE_SUMMARY"):
+        scores_only = [s for _, s in episode_scores]
+        mean_score = sum(scores_only) / len(scores_only) if scores_only else 0.0
+        min_score = min(scores_only) if scores_only else 0.0
+        strict_ok = all(s >= STRICT_TASK_SCORE for s in scores_only)
+        parts = ",".join(f"{t}:{v:.3f}" for t, v in episode_scores)
+        print(
+            f"[SUMMARY] mean_score={mean_score:.3f} min_score={min_score:.3f} "
+            f"strict_all_ge_{STRICT_TASK_SCORE:g}={str(strict_ok).lower()} "
+            f"per_task={parts}",
+            flush=True,
+        )
+if __name__ == "__main__":
+    main()

models.py ADDED Viewed

	@@ -0,0 +1,65 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the Incident Response Env Environment.
+These types form the environment API contract: the agent sends an Action and receives
+an Observation, and external validators can query State.
+"""
+from __future__ import annotations
+from typing import List, Literal, Optional
+from openenv.core.env_server.types import Action, Observation, State
+from pydantic import Field
+Severity = Literal["low", "medium", "high", "critical"]
+class Alert(Observation):
+    id: str = Field(..., description="Stable unique alert identifier")
+    title: str = Field(..., description="Short alert title")
+    severity: Severity = Field(..., description="Alert severity")
+    description: str = Field(..., description="Human-readable alert context")
+    source: str = Field(default="unknown", description="Alert source system")
+class IncidentAction(Action):
+    alert_id: str = Field(default="", description="Which alert the agent is acting on")
+    action_type: str = Field(
+        default="investigate",
+        description=(
+            "investigate|scale_up|restart|rollback|fix|mitigate|remediate|isolate|block"
+        ),
+    )
+    notes: str = Field(default="", description="Optional reasoning or context for the action")
+class IncidentObservation(Observation):
+    alerts: List[Alert] = Field(default_factory=list, description="Active alerts")
+    resolved_alerts: List[str] = Field(
+        default_factory=list, description="Alert IDs resolved so far"
+    )
+    system_health: float = Field(
+        default=1.0, ge=0.0, le=1.0, description="0.0–1.0 overall health"
+    )
+    step_number: int = Field(default=0, ge=0, description="Current step count")
+    message: str = Field(default="", description="Environment feedback to the agent")
+class IncidentState(State):
+    task_id: str = Field(default="task_easy")
+    max_steps: int = Field(default=0, ge=0)
+    total_reward: float = Field(default=0.0)
+    scenario_name: str = Field(default="unknown")
+# Backwards-compat aliases (older template names).
+IncidentResponseAction = IncidentAction
+IncidentResponseObservation = IncidentObservation

openenv.yaml ADDED Viewed

	@@ -0,0 +1,52 @@

+name: incident-response-env
+version: "1.0.0"
+description: >
+  Train AI agents to triage real-world server incidents.
+  Agent receives firing alerts and must identify root causes,
+  prioritize correctly, and resolve cascading failures.
+tasks:
+  - id: task_easy
+    name: Single Alert Triage
+    difficulty: easy
+    max_steps: 5
+  - id: task_medium
+    name: Root Cause Identification
+    difficulty: medium
+    max_steps: 10
+  - id: task_hard
+    name: Cascading Failure Resolution
+    difficulty: hard
+    max_steps: 20
+observation_space:
+  type: structured
+  description: >
+    Active incident state: alerts (list of {id, title, severity, description, source}),
+    resolved_alerts (alert IDs cleared so far), system_health (0.0-1.0), step_number,
+    and message (environment feedback). Reward and done are returned by the OpenEnv
+    step wrapper alongside the observation.
+action_space:
+  type: structured
+  description: >
+    Structured remediation. action_type must be one of: investigate, scale_up,
+    restart, rollback, fix, mitigate, remediate, isolate, block (see models.IncidentAction).
+  fields:
+    - alert_id: string
+    - action_type: string
+    - notes: string
+  action_type_allowed:
+    - investigate
+    - scale_up
+    - restart
+    - rollback
+    - fix
+    - mitigate
+    - remediate
+    - isolate
+    - block
+reward_range: [0.0, 1.0]
+# Set to your Hugging Face Docker Space repo before deploy, e.g. username/incident-response-env
+docker_image: your-hf-username/incident-response-env

openenv_incident_response_env.egg-info/PKG-INFO ADDED Viewed

	@@ -0,0 +1,11 @@

+Metadata-Version: 2.4
+Name: openenv-incident_response_env
+Version: 0.1.0
+Summary: Incident Response Env environment for OpenEnv
+Requires-Python: >=3.10
+Requires-Dist: openenv-core[core]>=0.2.2
+Requires-Dist: openai>=1.0.0
+Requires-Dist: requests>=2.32.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_incident_response_env.egg-info/SOURCES.txt ADDED Viewed

	@@ -0,0 +1,18 @@

+README.md
+pyproject.toml
+./__init__.py
+./client.py
+./inference.py
+./models.py
+openenv_incident_response_env.egg-info/PKG-INFO
+openenv_incident_response_env.egg-info/SOURCES.txt
+openenv_incident_response_env.egg-info/dependency_links.txt
+openenv_incident_response_env.egg-info/entry_points.txt
+openenv_incident_response_env.egg-info/requires.txt
+openenv_incident_response_env.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/environment.py
+server/graders.py
+server/incident_response_env_environment.py
+server/scenarios.py

openenv_incident_response_env.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@


1	+

openenv_incident_response_env.egg-info/entry_points.txt ADDED Viewed

	@@ -0,0 +1,2 @@


1	+ [console_scripts]
2	+ server = incident_response_env.server.app:main

openenv_incident_response_env.egg-info/requires.txt ADDED Viewed

	@@ -0,0 +1,7 @@

+openenv-core[core]>=0.2.2
+openai>=1.0.0
+requests>=2.32.0
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

openenv_incident_response_env.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@


1	+ incident_response_env

pyproject.toml ADDED Viewed

	@@ -0,0 +1,47 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-incident_response_env"
+version = "0.1.0"
+description = "Incident Response Env environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    # install from github
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.2",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+    "openai>=1.0.0",
+    "requests>=2.32.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m incident_response_env.server.app
+server = "incident_response_env.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["incident_response_env", "incident_response_env.server"]
+package-dir = { "incident_response_env" = ".", "incident_response_env.server" = "server" }

server/Dockerfile ADDED Viewed

	@@ -0,0 +1,13 @@

+# Local / CI alternate path. Hugging Face Docker Spaces use the repo-root Dockerfile.
+FROM python:3.11-slim
+WORKDIR /app
+COPY server/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 8000
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

server/__init__.py ADDED Viewed

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Incident Response Env environment server components."""
+from .environment import IncidentResponseEnvironment
+__all__ = ["IncidentResponseEnvironment"]

server/app.py ADDED Viewed

	@@ -0,0 +1,152 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+FastAPI application for the Incident Response Env Environment.
+This module creates an HTTP server that exposes the IncidentResponseEnvironment
+over HTTP and WebSocket endpoints, compatible with EnvClient.
+Endpoints:
+    - POST /reset: Reset the environment
+    - POST /step: Execute an action
+    - GET /state: Get current environment state
+    - GET /schema: Get action/observation schemas
+    - WS /ws: WebSocket endpoint for persistent sessions
+Usage:
+    # Development (with auto-reload):
+    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+    # Production:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+    # Or run directly:
+    python -m server.app
+"""
+from __future__ import annotations
+import inspect
+try:
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:  # pragma: no cover
+    raise ImportError(
+        "openenv is required for the web interface. Install dependencies with '\n    uv sync\n'"
+    ) from e
+try:
+    # Docker/validator mode: run as `uvicorn server.app:app` from repo root.
+    from fastapi import FastAPI
+    from fastapi.routing import APIRoute
+    from models import (
+        IncidentResponseAction,
+        IncidentResponseObservation,
+        IncidentState,
+    )
+    from server.environment import IncidentResponseEnvironment
+except Exception:  # pragma: no cover
+    # Package mode: `uvicorn incident_response_env.server.app:app`
+    from fastapi import FastAPI
+    from fastapi.routing import APIRoute
+    from models import (
+        IncidentResponseAction,
+        IncidentResponseObservation,
+        IncidentState,
+    )
+    from .environment import IncidentResponseEnvironment
+# OpenEnv's HTTP /reset and /step handlers invoke the factory for every request.
+# A fresh Environment per request breaks episode state (each /step would hit
+# scenario=None and fall back to task_easy). Use a single shared instance so
+# stateless HTTP clients behave like one continuous episode.
+_shared_incident_env = IncidentResponseEnvironment()
+def incident_env_factory() -> IncidentResponseEnvironment:
+    return _shared_incident_env
+# Create the app. Prefer the hackathon-style signature: create_app(factory)
+sig = None
+try:  # pragma: no cover
+    sig = inspect.signature(create_app)
+except Exception:  # pragma: no cover
+    sig = None
+if sig is not None and len(sig.parameters) == 1:
+    app = create_app(incident_env_factory)  # type: ignore[misc]
+else:
+    # Older signature used by the OpenEnv template.
+    app = create_app(  # type: ignore[misc]
+        incident_env_factory,
+        IncidentResponseAction,
+        IncidentResponseObservation,
+        env_name="incident_response_env",
+        max_concurrent_envs=1,
+    )
+def _reregister_state_route(application: FastAPI) -> None:
+    """OpenEnv registers GET /state with base State, which omits IncidentState fields."""
+    application.router.routes = [
+        route
+        for route in application.router.routes
+        if not (
+            isinstance(route, APIRoute)
+            and route.path == "/state"
+            and "GET" in route.methods
+        )
+    ]
+    @application.get(
+        "/state",
+        response_model=IncidentState,
+        tags=["State Management"],
+        summary="Get current environment state",
+    )
+    async def incident_state() -> IncidentState:
+        return _shared_incident_env.state
+_reregister_state_route(app)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """
+    Entry point for direct execution via uv run or python -m.
+    This function enables running the server without Docker:
+        uv run --project . server
+        uv run --project . server --port 8001
+        python -m incident_response_env.server.app
+    Args:
+        host: Host address to bind to (default: "0.0.0.0")
+        port: Port number to listen on (default: 8000)
+    For production deployments, consider using uvicorn directly with
+    multiple workers:
+        uvicorn incident_response_env.server.app:app --workers 4
+    """
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    # openenv validate expects a literal `main()` substring in this file
+    if args.port == 8000:
+        main()
+    else:
+        main(port=args.port)

server/environment.py ADDED Viewed

	@@ -0,0 +1,203 @@

+"""
+Core environment logic for the Incident Response playground.
+This module intentionally keeps "ground truth" (e.g., root-cause flags) internal and
+never returns it in observations. Agents must infer root cause from context.
+"""
+from __future__ import annotations
+import uuid
+from dataclasses import dataclass
+from typing import List, Optional
+from openenv.core.env_server.interfaces import Environment
+try:
+    from ..models import Alert, IncidentAction, IncidentObservation, IncidentState
+    from .graders import IncidentGrader
+    from .scenarios import Scenario, ScenarioGenerator
+except ImportError:  # pragma: no cover
+    from models import Alert, IncidentAction, IncidentObservation, IncidentState
+    from server.graders import IncidentGrader
+    from server.scenarios import Scenario, ScenarioGenerator
+@dataclass
+class _InternalAlert:
+    alert: Alert
+    is_root_cause: bool
+class IncidentResponseEnvironment(Environment):
+    # HTTP server uses a process-wide shared instance for /reset + /step; only
+    # one logical episode/client should drive it at a time.
+    SUPPORTS_CONCURRENT_SESSIONS: bool = False
+    def __init__(self):
+        self.grader = IncidentGrader()
+        self.scenario: Optional[Scenario] = None
+        self.resolved: List[str] = []
+        self.step_count: int = 0
+        self.episode_id: str = ""
+        self.task_id: str = "task_easy"
+        self.total_reward: float = 0.0
+        self.current_health: float = 1.0
+        self._alerts: List[_InternalAlert] = []
+    def reset(self, task_id: str = "task_easy", seed: int | None = None):  # type: ignore[override]
+        self.scenario = ScenarioGenerator.generate(task_id, seed=seed)
+        self.resolved = []
+        self.step_count = 0
+        self.episode_id = str(uuid.uuid4())[:8]
+        self.task_id = task_id
+        self.total_reward = 0.0
+        self.current_health = self.scenario.initial_health
+        self._alerts = [
+            _InternalAlert(alert=a, is_root_cause=is_rc)
+            for a, is_rc in self.scenario.initial_alerts_internal
+        ]
+        return IncidentObservation(
+            alerts=self._get_active_alerts(),
+            resolved_alerts=[],
+            system_health=self.current_health,
+            step_number=0,
+            done=False,
+            reward=0.0,
+            message="Incident detected. Begin triage.",
+        )
+    def step(self, action: IncidentAction) -> IncidentObservation:  # type: ignore[override]
+        if self.scenario is None:
+            # Be forgiving if a judge/runner forgets to call reset first.
+            self.reset(task_id=getattr(action, "task_id", "task_easy"))
+        self.step_count += 1
+        reward, feedback = self.grader.grade(
+            action=action,
+            scenario=self.scenario,  # type: ignore[arg-type]
+            step=self.step_count,
+            resolved=self.resolved,
+        )
+        self.total_reward += reward
+        self._maybe_resolve(action)
+        done = self._episode_goal_satisfied() or (
+            self.scenario is not None and self.step_count >= self.scenario.max_steps
+        )
+        return IncidentObservation(
+            alerts=self._get_active_alerts(),
+            resolved_alerts=list(self.resolved),
+            system_health=self.current_health,
+            step_number=self.step_count,
+            done=done,
+            reward=reward,
+            message=feedback,
+        )
+    @property
+    def state(self) -> IncidentState:  # type: ignore[override]
+        scenario_name = self.scenario.name if self.scenario is not None else "unknown"
+        max_steps = self.scenario.max_steps if self.scenario is not None else 0
+        return IncidentState(
+            episode_id=self.episode_id,
+            task_id=self.task_id,
+            step_count=self.step_count,
+            max_steps=max_steps,
+            total_reward=self.total_reward,
+            scenario_name=scenario_name,
+        )
+    def _update_health(self, action: IncidentAction) -> None:
+        # Simple deterministic health update: remediation actions improve health more.
+        delta = 0.02
+        if action.action_type in {
+            "scale_up",
+            "restart",
+            "rollback",
+            "fix",
+            "mitigate",
+            "remediate",
+            "isolate",
+            "block",
+        }:
+            delta = 0.05
+        self.current_health = max(0.0, min(1.0, self.current_health + delta))
+    def _maybe_resolve(self, action: IncidentAction) -> None:
+        if self.scenario is None:
+            return
+        if not action.alert_id or action.alert_id in self.resolved:
+            return
+        action_type = (action.action_type or "").lower().strip()
+        resolution_actions = {
+            "scale_up",
+            "restart",
+            "rollback",
+            "fix",
+            "mitigate",
+            "remediate",
+            "isolate",
+            "block",
+        }
+        if action_type not in resolution_actions:
+            return
+        # Hard task: only allow resolving the next upstream link in the chain.
+        if self.scenario.kind == "full_cascade_failure" and self.scenario.cascade_chain_alert_ids:
+            chain = list(self.scenario.cascade_chain_alert_ids)
+            expected_index = 0
+            for cid in chain:
+                if cid in self.resolved:
+                    expected_index += 1
+                else:
+                    break
+            expected_id = chain[expected_index] if expected_index < len(chain) else None
+            if action.alert_id != expected_id:
+                return
+        self.resolved.append(action.alert_id)
+        self._update_health(action)
+    def _get_active_alerts(self) -> List[Alert]:
+        # Never reveal internal root-cause flags.
+        active = []
+        for entry in self._alerts:
+            if entry.alert.id not in self.resolved:
+                active.append(entry.alert)
+        return active
+    def _all_critical_resolved(self) -> bool:
+        for entry in self._alerts:
+            if entry.alert.severity == "critical" and entry.alert.id not in self.resolved:
+                return False
+        return True
+    def _episode_goal_satisfied(self) -> bool:
+        """
+        Episode ends when the task's success condition is met.
+        Cascade (hard) tasks require every link in cascade_chain_alert_ids to be
+        resolved - not only severity:critical rows - so agents earn graded rewards
+        along the full chain and total score can reach 1.0.
+        """
+        if self.scenario is None:
+            return False
+        if (
+            self.scenario.kind == "full_cascade_failure"
+            and self.scenario.cascade_chain_alert_ids
+        ):
+            return all(
+                cid in self.resolved
+                for cid in self.scenario.cascade_chain_alert_ids
+            )
+        return self._all_critical_resolved()

server/graders.py ADDED Viewed

	@@ -0,0 +1,121 @@

+"""
+Deterministic scoring logic for the incident response tasks.
+Implements the 3 required tasks for judging:
+- Task 1 (easy): single obvious alert, single correct action.
+- Task 2 (medium): identify root cause among symptoms, penalize wasted steps.
+- Task 3 (hard): resolve a cascade chain in order.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import List, Tuple
+try:
+    from ..models import IncidentAction
+    from .scenarios import Scenario
+except ImportError:  # pragma: no cover
+    from models import IncidentAction
+    from server.scenarios import Scenario
+@dataclass(frozen=True)
+class IncidentGrader:
+    _RESOLUTION_ACTIONS = {
+        "scale_up",
+        "restart",
+        "rollback",
+        "fix",
+        "mitigate",
+        "remediate",
+        "isolate",
+        "block",
+    }
+    def grade(
+        self,
+        *,
+        action: IncidentAction,
+        scenario: Scenario,
+        step: int,
+        resolved: List[str],
+    ) -> Tuple[float, str]:
+        if not action.alert_id:
+            return 0.0, "No alert selected. Choose an alert_id to investigate or remediate."
+        if action.alert_id in resolved:
+            return 0.0, "That alert was already resolved. Pick an unresolved alert."
+        alert_by_id = {a.id: a for a, _ in scenario.initial_alerts_internal}
+        if action.alert_id not in alert_by_id:
+            return 0.0, "Unknown alert_id. Pick one of the active alerts."
+        action_type = (action.action_type or "").lower().strip()
+        is_resolution = action_type in self._RESOLUTION_ACTIONS
+        if scenario.kind == "disk_full":
+            # Required Task 1 grading.
+            if action.alert_id != "disk-alert-1":
+                return 0.0, "Wrong alert. Triage the disk alert."
+            if action_type == "scale_up":
+                return 1.0, "Correct: scaled storage to relieve disk pressure."
+            return 0.4, "Correct alert, but wrong action_type. Use scale_up."
+        if scenario.kind == "cascading_db_failure":
+            # Required Task 2 grading (meaningful reward across steps).
+            root_id = scenario.root_cause_alert_id or "db-001"
+            if action.alert_id == root_id:
+                reward = 1.0 if step == 1 else 0.5
+                feedback = "Addressed root cause." + (" Great first move." if step == 1 else "")
+            else:
+                reward = 0.1
+                feedback = "You addressed a symptom; root cause remains unresolved."
+            # End bonus: if this action (once resolved by environment) would complete all critical.
+            # We approximate deterministically: if action targets the root cause with a resolution action.
+            if is_resolution and action.alert_id == root_id:
+                # Count remaining critical alerts besides this one.
+                remaining_critical = [
+                    a.id
+                    for a, _ in scenario.initial_alerts_internal
+                    if a.severity == "critical" and a.id not in resolved and a.id != action.alert_id
+                ]
+                if not remaining_critical:
+                    reward += 0.3
+                    feedback += " All critical alerts resolved. Bonus awarded."
+            return min(1.0, reward), feedback
+        # scenario.kind == "full_cascade_failure"
+        chain = list(scenario.cascade_chain_alert_ids)
+        if not chain:
+            return 0.0, "Scenario misconfigured: missing cascade chain."
+        # Determine expected next link in chain based on what's already resolved.
+        expected_index = 0
+        for cid in chain:
+            if cid in resolved:
+                expected_index += 1
+            else:
+                break
+        expected_id = chain[expected_index] if expected_index < len(chain) else None
+        if expected_id is None:
+            return 0.0, "Cascade already resolved."
+        if action.alert_id == expected_id:
+            reward = 0.25
+            feedback = "Correct next step in the cascade chain."
+            # Bonus if notes mention the correct service/source.
+            svc = alert_by_id[expected_id].source
+            if svc and svc.lower() in (action.notes or "").lower():
+                reward = min(1.0, reward + 0.1)
+                feedback += " Reasoning mentions the correct service."
+            # If this is the final link, cap to 1.0 total (environment accumulates).
+            if expected_index == len(chain) - 1:
+                feedback += " Chain complete."
+            return reward, feedback
+        return 0.05, "Out of order. Trace dependencies and resolve the next upstream failure first."

server/incident_response_env_environment.py ADDED Viewed

	@@ -0,0 +1,104 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Incident Response Env Environment Implementation.
+A simple test environment that echoes back messages sent to it.
+Perfect for testing HTTP server infrastructure.
+"""
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+try:
+    from ..models import IncidentResponseAction, IncidentResponseObservation
+except ImportError:
+    from models import IncidentResponseAction, IncidentResponseObservation
+class IncidentResponseEnvironment(Environment):
+    """
+    A simple echo environment that echoes back messages.
+    This environment is designed for testing the HTTP server infrastructure.
+    It maintains minimal state and simply echoes back whatever message it receives.
+    Example:
+        >>> env = IncidentResponseEnvironment()
+        >>> obs = env.reset()
+        >>> print(obs.echoed_message)  # "Incident Response Env environment ready!"
+        >>>
+        >>> obs = env.step(IncidentResponseAction(message="Hello"))
+        >>> print(obs.echoed_message)  # "Hello"
+        >>> print(obs.message_length)  # 5
+    """
+    # Enable concurrent WebSocket sessions.
+    # Set to True if your environment isolates state between instances.
+    # When True, multiple WebSocket clients can connect simultaneously, each
+    # getting their own environment instance (when using factory mode in app.py).
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(self):
+        """Initialize the incident_response_env environment."""
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count = 0
+    def reset(self) -> IncidentResponseObservation:
+        """
+        Reset the environment.
+        Returns:
+            IncidentResponseObservation with a ready message
+        """
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count += 1
+        return IncidentResponseObservation(
+            echoed_message="Incident Response Env environment ready!",
+            message_length=0,
+            done=False,
+            reward=0.0,
+        )
+    def step(self, action: IncidentResponseAction) -> IncidentResponseObservation:  # type: ignore[override]
+        """
+        Execute a step in the environment by echoing the message.
+        Args:
+            action: IncidentResponseAction containing the message to echo
+        Returns:
+            IncidentResponseObservation with the echoed message and its length
+        """
+        self._state.step_count += 1
+        message = action.message
+        length = len(message)
+        # Simple reward: longer messages get higher rewards
+        reward = length * 0.1
+        return IncidentResponseObservation(
+            echoed_message=message,
+            message_length=length,
+            done=False,
+            reward=reward,
+            metadata={"original_message": message, "step": self._state.step_count},
+        )
+    @property
+    def state(self) -> State:
+        """
+        Get the current environment state.
+        Returns:
+            Current State with episode_id and step_count
+        """
+        return self._state

server/requirements.txt ADDED Viewed

	@@ -0,0 +1,7 @@

+openenv-core[core]>=0.2.2
+fastapi>=0.104.0
+uvicorn>=0.24.0
+pydantic>=2.0.0
+openai>=1.0.0
+requests>=2.32.0

server/scenarios.py ADDED Viewed

	@@ -0,0 +1,214 @@

+"""
+Synthetic incident scenario generation (Tasks 1–3).
+Scenarios contain internal ground truth (e.g., root-cause IDs / chain order) that
+must never be returned to agents directly.
+"""
+from __future__ import annotations
+import random
+from dataclasses import dataclass
+from typing import Literal, Sequence, Tuple
+try:
+    from ..models import Alert
+except ImportError:  # pragma: no cover
+    from models import Alert
+ScenarioKind = Literal["disk_full", "cascading_db_failure", "full_cascade_failure"]
+@dataclass(frozen=True)
+class Scenario:
+    name: str
+    kind: ScenarioKind
+    max_steps: int
+    initial_health: float
+    initial_alerts_internal: Sequence[Tuple[Alert, bool]]
+    # Internal ground truth (never shown to agent)
+    root_cause_alert_id: str | None = None
+    cascade_chain_alert_ids: Sequence[str] = ()
+class ScenarioGenerator:
+    SERVICE_NAMES = [
+        "auth-service",
+        "payment-service",
+        "user-db",
+        "order-service",
+        "cache-layer",
+        "api-gateway",
+        "storage-service",
+        "database",
+        "user-service",
+    ]
+    @staticmethod
+    def generate(
+        task_id: str, seed: int | None = None, *, n_services: int | None = None, chain_length: int | None = None
+    ) -> Scenario:
+        """
+        Produce unlimited variations via randomness.
+        Note: env.reset() may also seed randomness; passing seed here makes generation
+        self-contained for judge harnesses that call ScenarioGenerator directly.
+        """
+        if seed is not None:
+            random.seed(seed)
+        if task_id == "task_easy":
+            return ScenarioGenerator._single_alert()
+        if task_id == "task_medium":
+            return ScenarioGenerator._root_cause(n_services=n_services or random.randint(3, 5))
+        # task_hard (or anything else) maps to cascade chain
+        return ScenarioGenerator._cascade_chain(chain_length=chain_length or random.randint(3, 5))
+    @staticmethod
+    def _pick_services(k: int) -> list[str]:
+        names = list(ScenarioGenerator.SERVICE_NAMES)
+        random.shuffle(names)
+        return names[:k]
+    @staticmethod
+    def _single_alert() -> Scenario:
+        # Required Task 1 scenario: "disk_full"
+        return Scenario(
+            name="disk_full",
+            kind="disk_full",
+            max_steps=3,
+            initial_health=0.55,
+            initial_alerts_internal=[
+                (
+                    Alert(
+                        id="disk-alert-1",
+                        title="Disk at 99%",
+                        severity="critical",
+                        description="Storage node nearly out of space. Writes failing intermittently.",
+                        source="storage-service",
+                    ),
+                    True,
+                )
+            ],
+            root_cause_alert_id="disk-alert-1",
+        )
+    @staticmethod
+    def _root_cause(*, n_services: int) -> Scenario:
+        # Required Task 2 scenario: "cascading_db_failure"
+        services = ScenarioGenerator._pick_services(max(3, n_services))
+        db_service = "database"
+        if db_service not in services:
+            services[0] = db_service
+        api_service = "api-gateway" if "api-gateway" in services else services[1]
+        pay_service = "payment-service" if "payment-service" in services else services[2]
+        alerts: list[Tuple[Alert, bool]] = [
+            (
+                Alert(
+                    id="db-001",
+                    title="DB connection timeout",
+                    severity="critical",
+                    description="Database pool exhausted; connections timing out. Downstream services likely impacted.",
+                    source=db_service,
+                ),
+                True,
+            ),
+            (
+                Alert(
+                    id="api-002",
+                    title="High error rate",
+                    severity="medium",
+                    description="5xx rate elevated. Errors correlate with DB timeout spikes.",
+                    source=api_service,
+                ),
+                False,
+            ),
+            (
+                Alert(
+                    id="pay-003",
+                    title="Requests failing",
+                    severity="medium",
+                    description="Payment calls failing with dependency errors (DB).",
+                    source=pay_service,
+                ),
+                False,
+            ),
+        ]
+        # Optionally add one extra noisy alert for variety.
+        if n_services >= 4:
+            noise_src = services[3]
+            alerts.append(
+                (
+                    Alert(
+                        id="aux-004",
+                        title="Cache miss rate increased",
+                        severity="low",
+                        description="Cache miss rate above baseline; could be secondary effect.",
+                        source=noise_src,
+                    ),
+                    False,
+                )
+            )
+        return Scenario(
+            name="cascading_db_failure",
+            kind="cascading_db_failure",
+            max_steps=8,
+            initial_health=0.6,
+            initial_alerts_internal=alerts,
+            root_cause_alert_id="db-001",
+        )
+    @staticmethod
+    def _cascade_chain(*, chain_length: int) -> Scenario:
+        # Required Task 3 scenario: "full_cascade_failure"
+        chain_services = ["auth-service", "user-service", "order-service", "payment-service"]
+        if chain_length != 4:
+            # Allow variable length, but keep the "auth → user → order → payment" prefix
+            extras = [s for s in ScenarioGenerator.SERVICE_NAMES if s not in chain_services]
+            random.shuffle(extras)
+            chain_services = (chain_services + extras)[: max(3, chain_length)]
+        chain_ids: list[str] = []
+        internal: list[Tuple[Alert, bool]] = []
+        for i, svc in enumerate(chain_services):
+            aid = f"svc-{i+1:03d}"
+            chain_ids.append(aid)
+            next_svc = chain_services[i + 1] if i + 1 < len(chain_services) else None
+            hint = (
+                f"Downstream impact observed: {next_svc} reporting dependency errors."
+                if next_svc
+                else "Downstream impact widespread."
+            )
+            internal.append(
+                (
+                    Alert(
+                        id=aid,
+                        title=f"{svc} failing",
+                        severity="critical" if i == 0 else "high",
+                        description=f"{svc} error spike. {hint}",
+                        source=svc,
+                    ),
+                    i == 0,  # treat first link as "root cause" internally
+                )
+            )
+        return Scenario(
+            name="full_cascade_failure",
+            kind="full_cascade_failure",
+            max_steps=max(10, len(chain_ids) * 3),
+            initial_health=0.45,
+            initial_alerts_internal=internal,
+            root_cause_alert_id=chain_ids[0],
+            cascade_chain_alert_ids=tuple(chain_ids),
+        )

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff