akashrajeev commited on
Commit
f493683
·
verified ·
1 Parent(s): 20fc6a8

Upload folder using huggingface_hub

Browse files
Dockerfile ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Root-level Dockerfile for Hugging Face Docker Spaces (default build path).
2
+ # Identical to server/Dockerfile; keep in sync when changing dependencies.
3
+ FROM python:3.11-slim
4
+
5
+ WORKDIR /app
6
+
7
+ COPY server/requirements.txt .
8
+ RUN pip install --no-cache-dir -r requirements.txt
9
+
10
+ COPY . .
11
+
12
+ EXPOSE 8000
13
+
14
+ ENV ENABLE_WEB_INTERFACE=true
15
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md CHANGED
@@ -1,11 +1,206 @@
1
- ---
2
- title: Incident Response Env
3
- emoji: 📉
4
- colorFrom: indigo
5
- colorTo: purple
6
- sdk: docker
7
- pinned: false
8
- license: mit
9
- ---
10
-
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Incident Response OpenEnv
3
+ emoji: 🚨
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: docker
7
+ app_port: 8000
8
+ pinned: false
9
+ tags:
10
+ - openenv
11
+ base_path: /web
12
+ ---
13
+
14
+ # Incident Response OpenEnv
15
+
16
+ Realistic **site reliability / incident triage** environment for [OpenEnv](https://github.com/meta-pytorch/OpenEnv): agents read firing alerts, choose a remediation **`action_type`**, target the correct **`alert_id`**, and receive graded rewards with partial credit. Three benchmark tasks (**easy -> medium -> hard**) simulate single-incident triage, root-cause among symptoms, and ordered cascade resolution.
17
+
18
+ ## Why this submission stands out
19
+
20
+ - **Real-world domain** - not a toy grid or guessing game; models must reason about dependencies and severities.
21
+ - **Full OpenEnv surface** - typed `Action` / `Observation` / `State`, `reset` / `step` / `state`, `openenv.yaml`, HTTP API, Docker.
22
+ - **Meaningful rewards** - sparse success on wrong targets, partial signals for "right direction," chain order on hard tasks.
23
+ - **Reproducible baseline** - root `inference.py` using the official OpenAI client, env vars `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, structured `[START]` / `[STEP]` / `[END]` logs only on stdout by default (no extra lines).
24
+
25
+ ## Tasks
26
+
27
+ | Task | Focus |
28
+ |---------------|---------------------------------------------|
29
+ | `task_easy` | Single disk pressure alert -> scale storage. |
30
+ | `task_medium` | Multiple alerts -> remediate DB root cause. |
31
+ | `task_hard` | Ordered `svc-001`... cascade -> full chain. |
32
+
33
+ Rewards are always in **\[0, 1]** per step; the baseline caps **episode score** at **1.0** (sum of step rewards).
34
+
35
+ ## Action & observation
36
+
37
+ **Action** (`IncidentResponseAction`): `alert_id`, `action_type`, `notes`.
38
+
39
+ **Observation**: `alerts[]` (id, title, severity, description, source), `resolved_alerts`, `system_health`, `step_number`, `message` (grader feedback), plus top-level `reward` / `done` from the HTTP wrapper.
40
+
41
+ **State** (`GET /state`): `episode_id`, `step_count`, `task_id`, `max_steps`, `total_reward`, `scenario_name`.
42
+
43
+ ## Quick start (local)
44
+
45
+ ```bash
46
+ cd incident_response_env
47
+ uv sync
48
+ uv run uvicorn server.app:app --host 0.0.0.0 --port 8000
49
+ ```
50
+
51
+ Health:
52
+
53
+ ```bash
54
+ curl -s http://127.0.0.1:8000/health
55
+ ```
56
+
57
+ Reset & step (example):
58
+
59
+ ```bash
60
+ curl -s -X POST http://127.0.0.1:8000/reset -H "Content-Type: application/json" -d "{\"task_id\":\"task_easy\"}"
61
+ curl -s -X POST http://127.0.0.1:8000/step -H "Content-Type: application/json" -d "{\"action\":{\"alert_id\":\"disk-alert-1\",\"action_type\":\"scale_up\",\"notes\":\"Relieve disk pressure\"}}"
62
+ ```
63
+
64
+ ## Baseline inference (`inference.py`)
65
+
66
+ Required for the hackathon harness (OpenAI-compatible client):
67
+
68
+ ```powershell
69
+ $env:ENV_URL = "http://127.0.0.1:8000"
70
+ $env:API_BASE_URL = "https://router.huggingface.co/v1"
71
+ $env:MODEL_NAME = "<model id>"
72
+ $env:HF_TOKEN = "<hf token>"
73
+ uv run python inference.py
74
+ ```
75
+
76
+ Optional:
77
+
78
+ - `INFERENCE_STUB=1` - run without an LLM (deterministic policy) for CI or smoke tests.
79
+ - `INFERENCE_SUMMARY=1` - print an extra `[SUMMARY]` line (omit for strict stdout parsers).
80
+
81
+ **Windows / `openenv push`:** if you see `charmap` codec errors, set UTF-8 mode before push: `set PYTHONUTF8=1` (cmd) or `$env:PYTHONUTF8="1"` (PowerShell).
82
+
83
+ Runtime: keep total wall clock **under 20 minutes**; use a small instruct model if needed.
84
+
85
+ ## Docker
86
+
87
+ ```bash
88
+ docker build -t incident-response-env -f server/Dockerfile .
89
+ docker run --rm -p 8000:8000 incident-response-env
90
+ ```
91
+
92
+ ## Phase 8 - Deploy to Hugging Face Spaces
93
+
94
+ Create a **Docker** Space on Hugging Face named e.g. `incident-response-env` (must match your repo id if you use CLI defaults).
95
+
96
+ ### Method A - OpenEnv CLI (recommended)
97
+
98
+ Current OpenEnv packages expose **`openenv push`**, not `openenv deploy` (if your course PDF says `deploy`, use **`push`** with the same repo id).
99
+
100
+ ```bash
101
+ # one-time
102
+ huggingface-cli login # paste HF token when prompted
103
+
104
+ cd incident_response_env
105
+ openenv validate
106
+ openenv push --repo-id YOUR_HF_USERNAME/incident-response-env
107
+ # Windows: try PYTHONUTF8=1 if push fails on encoding
108
+ # add --private if required; use --no-interface if you hit Gradio/UI issues
109
+ ```
110
+
111
+ `openenv push` reads **`openenv.yaml`** and uploads the environment; the repo root **`Dockerfile`** is used by HF's Docker SDK builder.
112
+
113
+ ### Method B - Manual Git push
114
+
115
+ ```bash
116
+ cd incident_response_env
117
+ git init
118
+ git remote add origin https://huggingface.co/spaces/YOUR_HF_USERNAME/incident-response-env
119
+ git add .
120
+ git commit -m "OpenEnv incident-response submission"
121
+ git push -u origin main
122
+ ```
123
+
124
+ Ensure the Space **SDK** is **Docker** on the Hugging Face UI (README front matter already sets `sdk: docker`, `app_port: 8000`).
125
+
126
+ ### After deploy
127
+
128
+ Public app URL is usually:
129
+
130
+ `https://YOUR_HF_USERNAME-incident-response-env.hf.space`
131
+
132
+ Smoke test (no trailing slash issues - use exact host Hugging Face shows):
133
+
134
+ ```bash
135
+ curl -sS https://YOUR_HF_USERNAME-incident-response-env.hf.space/health
136
+ curl -sS -X POST https://YOUR_HF_USERNAME-incident-response-env.hf.space/reset \
137
+ -H "Content-Type: application/json" -d "{}"
138
+ ```
139
+
140
+ Update **`docker_image`** in `openenv.yaml` to `YOUR_HF_USERNAME/incident-response-env` for documentation consistency.
141
+
142
+ Then run **`openenv validate --url https://...hf.space`** and your organizer's pre-submission script.
143
+
144
+ ## Validate before submit
145
+
146
+ ```bash
147
+ openenv validate
148
+ # optional: runtime check against a deployed URL
149
+ openenv validate --url https://<your-space>.hf.space
150
+ ```
151
+
152
+ ## Pre-submission checklist (Round 1)
153
+
154
+ Cross-check with the official dashboard (e.g. Scaler / Meta OpenEnv Round 1):
155
+
156
+ - [ ] **`inference.py`** at repo root; uses **`OpenAI`** client + **`API_BASE_URL`**, **`MODEL_NAME`**, **`HF_TOKEN`**
157
+ - [ ] Stdout: **`[START]`**, **`[STEP]`**, **`[END]`** only (avoid `INFERENCE_SUMMARY` for automated parsing)
158
+ - [ ] **`openenv validate`** OK; **`uv.lock`** present if required
159
+ - [ ] **Dockerfile** builds in CI
160
+ - [ ] **HF Space** up; health + reset respond
161
+ - [ ] **>= 3 tasks** with graders; rewards in **[0, 1]**
162
+ - [ ] **README** describes domain, spaces, setup (this file)
163
+ - [ ] No secrets in git; rotate any leaked tokens
164
+
165
+ ## Project layout
166
+
167
+ ```
168
+ incident_response_env/
169
+ |-- Dockerfile # HF Spaces default path (same image as server/Dockerfile)
170
+ |-- .dockerignore
171
+ |-- inference.py # Hackathon baseline (LLM + env HTTP)
172
+ |-- openenv.yaml
173
+ |-- models.py
174
+ |-- client.py # WebSocket EnvClient wrapper
175
+ |-- pyproject.toml
176
+ |-- uv.lock
177
+ `-- server/
178
+ |-- app.py # FastAPI app
179
+ |-- environment.py # reset / step / state
180
+ |-- scenarios.py
181
+ |-- graders.py
182
+ |-- Dockerfile
183
+ `-- requirements.txt
184
+ ```
185
+
186
+ ## Client (WebSocket)
187
+
188
+ ```python
189
+ from incident_response_env import IncidentResponseEnv, IncidentResponseAction
190
+
191
+ with IncidentResponseEnv(base_url="http://localhost:8000") as env:
192
+ r = env.reset(task_id="task_easy") # pass kwargs your server accepts
193
+ r = env.step(
194
+ IncidentResponseAction(
195
+ alert_id="disk-alert-1",
196
+ action_type="scale_up",
197
+ notes="Expand storage.",
198
+ )
199
+ )
200
+ ```
201
+
202
+ (See `client.py` for `_step_payload` / parsing details.)
203
+
204
+ ## License
205
+
206
+ See `LICENSE` in the repository.
__init__.py ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Incident Response Env Environment."""
8
+
9
+ from .client import IncidentResponseEnv
10
+ from .models import IncidentResponseAction, IncidentResponseObservation
11
+
12
+ __all__ = [
13
+ "IncidentResponseAction",
14
+ "IncidentResponseObservation",
15
+ "IncidentResponseEnv",
16
+ ]
client.py ADDED
@@ -0,0 +1,112 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Incident Response Env Environment Client."""
8
+
9
+ from typing import Dict
10
+
11
+ from openenv.core import EnvClient
12
+ from openenv.core.client_types import StepResult
13
+
14
+ try:
15
+ from .models import (
16
+ IncidentResponseAction,
17
+ IncidentResponseObservation,
18
+ IncidentState,
19
+ )
20
+ except ImportError: # pragma: no cover
21
+ from models import (
22
+ IncidentResponseAction,
23
+ IncidentResponseObservation,
24
+ IncidentState,
25
+ )
26
+
27
+
28
+ class IncidentResponseEnv(
29
+ EnvClient[IncidentResponseAction, IncidentResponseObservation, IncidentState]
30
+ ):
31
+ """
32
+ Client for the Incident Response Env Environment.
33
+
34
+ This client maintains a persistent WebSocket connection to the environment server,
35
+ enabling efficient multi-step interactions with lower latency.
36
+ Each client instance has its own dedicated environment session on the server.
37
+
38
+ Example:
39
+ >>> # Connect to a running server
40
+ >>> with IncidentResponseEnv(base_url="http://localhost:8000") as client:
41
+ ... result = client.reset()
42
+ ... print(result.observation.alerts)
43
+ ...
44
+ ... result = client.step(
45
+ ... IncidentResponseAction(
46
+ ... alert_id="disk-alert-1",
47
+ ... action_type="scale_up",
48
+ ... notes="Relieve disk pressure.",
49
+ ... )
50
+ ... )
51
+ ... print(result.observation.message)
52
+
53
+ Example with Docker:
54
+ >>> # Automatically start container and connect
55
+ >>> client = IncidentResponseEnv.from_docker_image("incident_response_env-env:latest")
56
+ >>> try:
57
+ ... result = client.reset()
58
+ ... result = client.step(IncidentResponseAction(message="Test"))
59
+ ... finally:
60
+ ... client.close()
61
+ """
62
+
63
+ def _step_payload(self, action: IncidentResponseAction) -> Dict:
64
+ """
65
+ Convert IncidentResponseAction to JSON payload for step message.
66
+
67
+ Args:
68
+ action: IncidentResponseAction instance
69
+
70
+ Returns:
71
+ Dictionary representation suitable for JSON encoding
72
+ """
73
+ return {
74
+ "action": {
75
+ "alert_id": getattr(action, "alert_id", ""),
76
+ "action_type": getattr(action, "action_type", "investigate"),
77
+ "notes": getattr(action, "notes", ""),
78
+ }
79
+ }
80
+
81
+ def _parse_result(self, payload: Dict) -> StepResult[IncidentResponseObservation]:
82
+ """
83
+ Parse server response into StepResult[IncidentResponseObservation].
84
+
85
+ Args:
86
+ payload: JSON response data from server
87
+
88
+ Returns:
89
+ StepResult with IncidentResponseObservation
90
+ """
91
+ obs_data = dict(payload.get("observation") or {})
92
+ obs_data.setdefault("done", payload.get("done", False))
93
+ obs_data.setdefault("reward", payload.get("reward"))
94
+ observation = IncidentResponseObservation.model_validate(obs_data)
95
+
96
+ return StepResult(
97
+ observation=observation,
98
+ reward=payload.get("reward"),
99
+ done=payload.get("done", False),
100
+ )
101
+
102
+ def _parse_state(self, payload: Dict) -> IncidentState:
103
+ """
104
+ Parse server response into IncidentState.
105
+
106
+ Args:
107
+ payload: JSON response from state request
108
+
109
+ Returns:
110
+ IncidentState for this environment
111
+ """
112
+ return IncidentState.model_validate(payload)
inference.py ADDED
@@ -0,0 +1,470 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ import time
4
+ import re
5
+ from typing import Any, Dict, List, Optional, Tuple
6
+
7
+ import requests
8
+ from openai import OpenAI
9
+
10
+ ENV_URL = os.environ.get("ENV_URL", "http://localhost:8000")
11
+ # Benchmark name for [START] line (hackathon sample uses env=<benchmark>)
12
+ BENCHMARK = os.environ.get("INCIDENT_BENCHMARK", "incident_response_env")
13
+ SUCCESS_SCORE_THRESHOLD = float(os.environ.get("SUCCESS_SCORE_THRESHOLD", "0.1"))
14
+ # Stricter bar for reporting "all tasks strong" (e.g. leaderboard psychologics).
15
+ STRICT_TASK_SCORE = float(os.environ.get("STRICT_TASK_SCORE", "0.95"))
16
+
17
+ SYSTEM_PROMPT = """You are an expert Site Reliability Engineer (SRE).
18
+ You receive a JSON object each turn (not raw alert list only). It includes task_id, step, alerts,
19
+ resolved_alert_ids, and environment_message from the simulator.
20
+
21
+ CRITICAL RULES:
22
+ - alert_id MUST be copied exactly from one of the "id" fields in the CURRENT alerts array.
23
+ - NEVER invent IDs. NEVER reuse IDs from prior tasks or examples.
24
+ - Read environment_message every step. If it says "Out of order" or reward was low, change strategy:
25
+ follow the rules for your task_id below.
26
+
27
+ TASK-SPECIFIC POLICY (use task_id from the JSON):
28
+ - task_easy: Usually one disk/storage alert. action_type scale_up on THAT alert's id.
29
+ - task_medium: Database pool / root cause (often id db-001). Remediate root before symptoms;
30
+ scale_up or fix on the DB alert id is appropriate.
31
+ - task_hard: Cascading service failure. Alert ids look like svc-001, svc-002, svc-003, ...
32
+ You MUST remediate in strict numeric order: among alerts still present, pick the id with the
33
+ **smallest** N in svc-NNN (e.g. svc-001 before svc-005). Use action_type **fix** for that
34
+ upstream failing service unless the alert text clearly indicates capacity-only (then scale_up).
35
+ Do not pick a higher-numbered svc while a lower-numbered one is still in alerts.
36
+
37
+ Available action_type values:
38
+ - scale_up, fix, restart, rollback, mitigate, remediate, isolate, block
39
+
40
+ Always provide non-empty "notes" (one sentence). When the chosen alert has a "source" field
41
+ (e.g. auth-service, database), mention that exact string in notes - it aligns with grading bonuses.
42
+
43
+ Respond ONLY as JSON:
44
+ {
45
+ "alert_id": "string (required)",
46
+ "action_type": "string (required)",
47
+ "notes": "string (required - brief, include source when present)"
48
+ }
49
+ """
50
+
51
+
52
+ def _task_hard_chain_head_id(alerts: Any) -> Optional[str]:
53
+ """Smallest svc-NNN among active alerts, for ordering hints."""
54
+ if not isinstance(alerts, list):
55
+ return None
56
+ best: Optional[Tuple[int, str]] = None
57
+ pattern = re.compile(r"^svc-(\d+)$", re.IGNORECASE)
58
+ for a in alerts:
59
+ if not isinstance(a, dict):
60
+ continue
61
+ raw = str(a.get("id", "")).strip()
62
+ m = pattern.match(raw)
63
+ if not m:
64
+ continue
65
+ n = int(m.group(1))
66
+ if best is None or n < best[0]:
67
+ best = (n, raw)
68
+ return best[1] if best else None
69
+
70
+
71
+ def _build_llm_user_payload(*, task_id: str, step: int, obs: Dict[str, Any]) -> str:
72
+ alerts = obs.get("alerts") or []
73
+ payload: Dict[str, Any] = {
74
+ "task_id": task_id,
75
+ "step": step,
76
+ "alerts": alerts,
77
+ "resolved_alert_ids": obs.get("resolved_alerts") or [],
78
+ "environment_message": obs.get("message") or "",
79
+ }
80
+ if task_id == "task_hard":
81
+ head = _task_hard_chain_head_id(alerts)
82
+ if head:
83
+ payload["cascade_next_id_hint"] = (
84
+ f"Lowest-index unresolved svc in this list is {head!r}; prefer that alert_id."
85
+ )
86
+ return json.dumps(payload, ensure_ascii=False)
87
+
88
+ _TASK_MAX_STEPS = {"task_easy": 5, "task_medium": 10, "task_hard": 20}
89
+
90
+
91
+ def _truthy_env(name: str) -> bool:
92
+ return os.environ.get(name, "").strip().lower() in ("1", "true", "yes")
93
+
94
+
95
+ def _stub_action(
96
+ task_id: str, obs: Dict[str, Any], episode_alert_ids: List[str]
97
+ ) -> Dict[str, Any]:
98
+ """Deterministic policy for local runs without an LLM (INFERENCE_STUB=1)."""
99
+ alerts = obs.get("alerts") or []
100
+ resolved = set(
101
+ str(x) for x in (obs.get("resolved_alerts") or []) if x is not None
102
+ )
103
+ active_ids = [
104
+ str(a["id"])
105
+ for a in alerts
106
+ if isinstance(a, dict) and a.get("id")
107
+ ]
108
+ workable = [i for i in episode_alert_ids if i not in resolved]
109
+
110
+ if task_id == "task_easy":
111
+ aid = (
112
+ "disk-alert-1"
113
+ if "disk-alert-1" in active_ids
114
+ else (active_ids[0] if active_ids else "")
115
+ )
116
+ return {"alert_id": aid, "action_type": "scale_up", "notes": "stub policy"}
117
+
118
+ if task_id == "task_medium":
119
+ aid = "db-001" if "db-001" in active_ids else ""
120
+ if not aid:
121
+ pick = _pick_fallback_alert_id(alerts, workable)
122
+ aid = pick or ""
123
+ return {"alert_id": aid, "action_type": "scale_up", "notes": "stub policy"}
124
+
125
+ aid = active_ids[0] if active_ids else ""
126
+ if not aid:
127
+ pick = _pick_fallback_alert_id(alerts, workable)
128
+ aid = pick or ""
129
+ return {"alert_id": aid, "action_type": "fix", "notes": "stub policy"}
130
+
131
+
132
+ def log_start(*, task: str, env: str, model: str) -> None:
133
+ print(f"[START] task={task} env={env} model={model}", flush=True)
134
+
135
+
136
+ def log_step(
137
+ *,
138
+ step: int,
139
+ action_str: str,
140
+ reward: float,
141
+ done: bool,
142
+ error: Optional[str],
143
+ ) -> None:
144
+ error_val = error if error else "null"
145
+ done_val = str(done).lower()
146
+ print(
147
+ f"[STEP] step={step} action={action_str} reward={reward:.2f} done={done_val} error={error_val}",
148
+ flush=True,
149
+ )
150
+
151
+
152
+ def log_end(
153
+ *,
154
+ success: bool,
155
+ steps: int,
156
+ score: float,
157
+ rewards: List[float],
158
+ ) -> None:
159
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
160
+ print(
161
+ f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
162
+ flush=True,
163
+ )
164
+
165
+
166
+ def _normalize_step_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
167
+ if "observation" in payload and isinstance(payload["observation"], dict):
168
+ obs = dict(payload["observation"])
169
+ if "done" in payload:
170
+ obs["done"] = payload["done"]
171
+ if "reward" in payload:
172
+ obs["reward"] = payload["reward"]
173
+ return obs
174
+ return payload
175
+
176
+
177
+ def _parse_action(text: str) -> Tuple[Dict[str, Any], str]:
178
+ raw = text.strip()
179
+ try:
180
+ data = json.loads(raw)
181
+ except Exception:
182
+ start = raw.find("{")
183
+ end = raw.rfind("}")
184
+ if start != -1 and end != -1 and end > start:
185
+ try:
186
+ data = json.loads(raw[start : end + 1])
187
+ except Exception:
188
+ data = {}
189
+ else:
190
+ data = {}
191
+
192
+ alert_id = str(data.get("alert_id", "")).strip()
193
+ action_type = str(data.get("action_type", "")).strip() or "investigate"
194
+ raw_notes = data.get("notes", data.get("reasoning", ""))
195
+ notes = str(raw_notes).strip()
196
+
197
+ action = {"alert_id": alert_id, "action_type": action_type, "notes": notes}
198
+ return action, notes
199
+
200
+
201
+ def _action_str(action: Dict[str, Any]) -> str:
202
+ return json.dumps(action, ensure_ascii=False, separators=(",", ":"))
203
+
204
+
205
+ def _alert_ids_from_obs(alerts: Any) -> List[str]:
206
+ out: List[str] = []
207
+ if not isinstance(alerts, list):
208
+ return out
209
+ for a in alerts:
210
+ if isinstance(a, dict) and a.get("id"):
211
+ out.append(str(a["id"]))
212
+ return out
213
+
214
+
215
+ def _pick_fallback_alert_id(
216
+ alerts: Any, unresolved_ordered: List[str]
217
+ ) -> Optional[str]:
218
+ """
219
+ Pick a valid unresolved id. Only considers rows in `alerts` whose id is in
220
+ unresolved_ordered - never return an id that leaked into `alerts` from elsewhere.
221
+ Prefer critical among those rows, else first matching row, else first in episode order.
222
+ """
223
+ if not unresolved_ordered:
224
+ return None
225
+ allowed = set(unresolved_ordered)
226
+ if isinstance(alerts, list) and alerts:
227
+ for a in alerts:
228
+ if not isinstance(a, dict):
229
+ continue
230
+ aid = str(a.get("id", "")).strip()
231
+ if aid not in allowed:
232
+ continue
233
+ if a.get("severity") == "critical":
234
+ return aid
235
+ for a in alerts:
236
+ if not isinstance(a, dict):
237
+ continue
238
+ aid = str(a.get("id", "")).strip()
239
+ if aid in allowed:
240
+ return aid
241
+ return unresolved_ordered[0]
242
+
243
+
244
+ def _sanitize_action(
245
+ action: Dict[str, Any],
246
+ obs: Dict[str, Any],
247
+ episode_alert_ids: List[str],
248
+ ) -> Dict[str, Any]:
249
+ """
250
+ If the model hallucinates an alert_id (e.g. disk-alert-1 from a prior task), repair.
251
+
252
+ Only ids that appeared in the initial reset for THIS episode are valid - never trust
253
+ the model to invent ids. Also avoid targeting an id already in resolved_alerts.
254
+ """
255
+ if not episode_alert_ids:
256
+ return action
257
+
258
+ alerts = obs.get("alerts", [])
259
+ resolved = set(str(x) for x in (obs.get("resolved_alerts") or []) if x is not None)
260
+ aid = str(action.get("alert_id", "")).strip()
261
+
262
+ epi_set = set(episode_alert_ids)
263
+ workable = [i for i in episode_alert_ids if i not in resolved]
264
+
265
+ ok = aid in epi_set and aid not in resolved
266
+ if ok:
267
+ return action
268
+
269
+ out = dict(action)
270
+ # Preserve reset order (important for cascade chain: svc-001 before svc-002, ...)
271
+ unresolved_for_pick = workable
272
+ chosen = _pick_fallback_alert_id(alerts, unresolved_for_pick)
273
+ if chosen is None:
274
+ return out
275
+
276
+ bad = aid or "(empty)"
277
+ note = str(out.get("notes", "")).strip()
278
+ reason = (
279
+ "not part of this episode's alerts"
280
+ if aid not in epi_set
281
+ else "already resolved"
282
+ )
283
+ repair = f"Invalid alert_id {bad!r} ({reason}); using {chosen}."
284
+ out["alert_id"] = chosen
285
+ out["notes"] = f"{repair} {note}".strip()
286
+ return out
287
+
288
+
289
+ def run_episode(
290
+ *,
291
+ task_id: str,
292
+ client: Optional[OpenAI],
293
+ model_name: str,
294
+ use_stub: bool,
295
+ ) -> float:
296
+ """Run one benchmark task; return episode score in [0, 1] ( capped sum of step rewards)."""
297
+ rewards: List[float] = []
298
+ steps_taken = 0
299
+ score = 0.0
300
+ success = False
301
+
302
+ log_start(task=task_id, env=BENCHMARK, model=model_name)
303
+
304
+ try:
305
+ reset_payload = requests.post(
306
+ f"{ENV_URL}/reset", json={"task_id": task_id}, timeout=15
307
+ ).json()
308
+ obs = _normalize_step_payload(reset_payload)
309
+ episode_alert_ids = _alert_ids_from_obs(obs.get("alerts", []))
310
+
311
+ max_loops = int(
312
+ obs.get("max_steps") or _TASK_MAX_STEPS.get(task_id, 20) or 20
313
+ )
314
+ t0 = time.time()
315
+ step = 0
316
+
317
+ while not bool(obs.get("done", False)):
318
+ if time.time() - t0 > 60 * 15:
319
+ break
320
+ if step >= max_loops:
321
+ break
322
+
323
+ step += 1
324
+ err: Optional[str] = None
325
+ reward = 0.0
326
+ done = False
327
+ action_line = "{}"
328
+
329
+ action: Dict[str, Any] = {}
330
+ try:
331
+ alerts = obs.get("alerts", [])
332
+
333
+ if use_stub:
334
+ action = _stub_action(task_id, obs, episode_alert_ids)
335
+ else:
336
+ assert client is not None
337
+ user_content = _build_llm_user_payload(
338
+ task_id=task_id, step=step, obs=obs
339
+ )
340
+ response = client.chat.completions.create(
341
+ model=model_name,
342
+ messages=[
343
+ {"role": "system", "content": SYSTEM_PROMPT},
344
+ {"role": "user", "content": user_content},
345
+ ],
346
+ temperature=0.0,
347
+ )
348
+ action, _notes = _parse_action(
349
+ response.choices[0].message.content or ""
350
+ )
351
+ if not action.get("alert_id"):
352
+ if isinstance(alerts, list) and alerts:
353
+ first = alerts[0]
354
+ if isinstance(first, dict) and "id" in first:
355
+ action["alert_id"] = first["id"]
356
+
357
+ action = _sanitize_action(action, obs, episode_alert_ids)
358
+
359
+ action_line = _action_str(action)
360
+
361
+ step_payload = requests.post(
362
+ f"{ENV_URL}/step", json={"action": action}, timeout=15
363
+ ).json()
364
+ obs = _normalize_step_payload(step_payload)
365
+
366
+ reward = float(obs.get("reward") or 0.0)
367
+ done = bool(obs.get("done", False))
368
+ rewards.append(reward)
369
+ steps_taken = step
370
+ except Exception as exc:
371
+ err = str(exc).replace("\n", " ")
372
+ rewards.append(0.0)
373
+ steps_taken = step
374
+ # Avoid empty action in logs when the LLM or HTTP provider fails (e.g. 402).
375
+ if not action:
376
+ fb_id = _pick_fallback_alert_id(
377
+ obs.get("alerts", []),
378
+ [
379
+ i
380
+ for i in episode_alert_ids
381
+ if i
382
+ not in set(
383
+ str(x)
384
+ for x in (obs.get("resolved_alerts") or [])
385
+ if x is not None
386
+ )
387
+ ],
388
+ )
389
+ if fb_id:
390
+ action = {
391
+ "alert_id": fb_id,
392
+ "action_type": "investigate",
393
+ "notes": "LLM/API error; no step sent.",
394
+ }
395
+ action_line = _action_str(action) if action else "{}"
396
+
397
+ log_step(
398
+ step=step,
399
+ action_str=action_line,
400
+ reward=reward,
401
+ done=done,
402
+ error=err,
403
+ )
404
+
405
+ if err is not None:
406
+ break
407
+ if done:
408
+ break
409
+
410
+ total = sum(rewards) if rewards else 0.0
411
+ score = min(max(total, 0.0), 1.0)
412
+ success = score >= SUCCESS_SCORE_THRESHOLD
413
+
414
+ finally:
415
+ log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
416
+
417
+ return score
418
+
419
+
420
+ def main() -> None:
421
+ use_stub = _truthy_env("INFERENCE_STUB")
422
+
423
+ if use_stub:
424
+ model_name = os.environ.get("MODEL_NAME", "stub-local")
425
+ client: Optional[OpenAI] = None
426
+ else:
427
+ required = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
428
+ missing = [k for k in required if not os.environ.get(k)]
429
+ if missing:
430
+ raise SystemExit(
431
+ "Set these environment variables before running inference.py: "
432
+ + ", ".join(missing)
433
+ + " - or set INFERENCE_STUB=1 to run against the env with a built-in policy "
434
+ "(no LLM)."
435
+ )
436
+ client = OpenAI(
437
+ api_key=os.environ["HF_TOKEN"],
438
+ base_url=os.environ["API_BASE_URL"],
439
+ )
440
+ model_name = os.environ["MODEL_NAME"]
441
+
442
+ tasks = ["task_easy", "task_medium", "task_hard"]
443
+ episode_scores: List[Tuple[str, float]] = []
444
+ for task in tasks:
445
+ ep_score = run_episode(
446
+ task_id=task,
447
+ client=client,
448
+ model_name=model_name,
449
+ use_stub=use_stub,
450
+ )
451
+ episode_scores.append((task, ep_score))
452
+
453
+ # Hackathon evaluators may parse stdout strictly ([START]/[STEP]/[END] only).
454
+ # Set INFERENCE_SUMMARY=1 for an extra aggregate line (local leaderboards).
455
+ if _truthy_env("INFERENCE_SUMMARY"):
456
+ scores_only = [s for _, s in episode_scores]
457
+ mean_score = sum(scores_only) / len(scores_only) if scores_only else 0.0
458
+ min_score = min(scores_only) if scores_only else 0.0
459
+ strict_ok = all(s >= STRICT_TASK_SCORE for s in scores_only)
460
+ parts = ",".join(f"{t}:{v:.3f}" for t, v in episode_scores)
461
+ print(
462
+ f"[SUMMARY] mean_score={mean_score:.3f} min_score={min_score:.3f} "
463
+ f"strict_all_ge_{STRICT_TASK_SCORE:g}={str(strict_ok).lower()} "
464
+ f"per_task={parts}",
465
+ flush=True,
466
+ )
467
+
468
+
469
+ if __name__ == "__main__":
470
+ main()
models.py ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the Incident Response Env Environment.
9
+
10
+ These types form the environment API contract: the agent sends an Action and receives
11
+ an Observation, and external validators can query State.
12
+ """
13
+
14
+ from __future__ import annotations
15
+
16
+ from typing import List, Literal, Optional
17
+
18
+ from openenv.core.env_server.types import Action, Observation, State
19
+ from pydantic import Field
20
+
21
+
22
+ Severity = Literal["low", "medium", "high", "critical"]
23
+
24
+
25
+ class Alert(Observation):
26
+ id: str = Field(..., description="Stable unique alert identifier")
27
+ title: str = Field(..., description="Short alert title")
28
+ severity: Severity = Field(..., description="Alert severity")
29
+ description: str = Field(..., description="Human-readable alert context")
30
+ source: str = Field(default="unknown", description="Alert source system")
31
+
32
+
33
+ class IncidentAction(Action):
34
+ alert_id: str = Field(default="", description="Which alert the agent is acting on")
35
+ action_type: str = Field(
36
+ default="investigate",
37
+ description=(
38
+ "investigate|scale_up|restart|rollback|fix|mitigate|remediate|isolate|block"
39
+ ),
40
+ )
41
+ notes: str = Field(default="", description="Optional reasoning or context for the action")
42
+
43
+
44
+ class IncidentObservation(Observation):
45
+ alerts: List[Alert] = Field(default_factory=list, description="Active alerts")
46
+ resolved_alerts: List[str] = Field(
47
+ default_factory=list, description="Alert IDs resolved so far"
48
+ )
49
+ system_health: float = Field(
50
+ default=1.0, ge=0.0, le=1.0, description="0.0–1.0 overall health"
51
+ )
52
+ step_number: int = Field(default=0, ge=0, description="Current step count")
53
+ message: str = Field(default="", description="Environment feedback to the agent")
54
+
55
+
56
+ class IncidentState(State):
57
+ task_id: str = Field(default="task_easy")
58
+ max_steps: int = Field(default=0, ge=0)
59
+ total_reward: float = Field(default=0.0)
60
+ scenario_name: str = Field(default="unknown")
61
+
62
+
63
+ # Backwards-compat aliases (older template names).
64
+ IncidentResponseAction = IncidentAction
65
+ IncidentResponseObservation = IncidentObservation
openenv.yaml ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: incident-response-env
2
+ version: "1.0.0"
3
+ description: >
4
+ Train AI agents to triage real-world server incidents.
5
+ Agent receives firing alerts and must identify root causes,
6
+ prioritize correctly, and resolve cascading failures.
7
+
8
+ tasks:
9
+ - id: task_easy
10
+ name: Single Alert Triage
11
+ difficulty: easy
12
+ max_steps: 5
13
+ - id: task_medium
14
+ name: Root Cause Identification
15
+ difficulty: medium
16
+ max_steps: 10
17
+ - id: task_hard
18
+ name: Cascading Failure Resolution
19
+ difficulty: hard
20
+ max_steps: 20
21
+
22
+ observation_space:
23
+ type: structured
24
+ description: >
25
+ Active incident state: alerts (list of {id, title, severity, description, source}),
26
+ resolved_alerts (alert IDs cleared so far), system_health (0.0-1.0), step_number,
27
+ and message (environment feedback). Reward and done are returned by the OpenEnv
28
+ step wrapper alongside the observation.
29
+
30
+ action_space:
31
+ type: structured
32
+ description: >
33
+ Structured remediation. action_type must be one of: investigate, scale_up,
34
+ restart, rollback, fix, mitigate, remediate, isolate, block (see models.IncidentAction).
35
+ fields:
36
+ - alert_id: string
37
+ - action_type: string
38
+ - notes: string
39
+ action_type_allowed:
40
+ - investigate
41
+ - scale_up
42
+ - restart
43
+ - rollback
44
+ - fix
45
+ - mitigate
46
+ - remediate
47
+ - isolate
48
+ - block
49
+
50
+ reward_range: [0.0, 1.0]
51
+ # Set to your Hugging Face Docker Space repo before deploy, e.g. username/incident-response-env
52
+ docker_image: your-hf-username/incident-response-env
openenv_incident_response_env.egg-info/PKG-INFO ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Metadata-Version: 2.4
2
+ Name: openenv-incident_response_env
3
+ Version: 0.1.0
4
+ Summary: Incident Response Env environment for OpenEnv
5
+ Requires-Python: >=3.10
6
+ Requires-Dist: openenv-core[core]>=0.2.2
7
+ Requires-Dist: openai>=1.0.0
8
+ Requires-Dist: requests>=2.32.0
9
+ Provides-Extra: dev
10
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
11
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_incident_response_env.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ README.md
2
+ pyproject.toml
3
+ ./__init__.py
4
+ ./client.py
5
+ ./inference.py
6
+ ./models.py
7
+ openenv_incident_response_env.egg-info/PKG-INFO
8
+ openenv_incident_response_env.egg-info/SOURCES.txt
9
+ openenv_incident_response_env.egg-info/dependency_links.txt
10
+ openenv_incident_response_env.egg-info/entry_points.txt
11
+ openenv_incident_response_env.egg-info/requires.txt
12
+ openenv_incident_response_env.egg-info/top_level.txt
13
+ server/__init__.py
14
+ server/app.py
15
+ server/environment.py
16
+ server/graders.py
17
+ server/incident_response_env_environment.py
18
+ server/scenarios.py
openenv_incident_response_env.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
 
 
1
+
openenv_incident_response_env.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ [console_scripts]
2
+ server = incident_response_env.server.app:main
openenv_incident_response_env.egg-info/requires.txt ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ openenv-core[core]>=0.2.2
2
+ openai>=1.0.0
3
+ requests>=2.32.0
4
+
5
+ [dev]
6
+ pytest>=8.0.0
7
+ pytest-cov>=4.0.0
openenv_incident_response_env.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ incident_response_env
pyproject.toml ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ [build-system]
8
+ requires = ["setuptools>=45", "wheel"]
9
+ build-backend = "setuptools.build_meta"
10
+
11
+ [project]
12
+ name = "openenv-incident_response_env"
13
+ version = "0.1.0"
14
+ description = "Incident Response Env environment for OpenEnv"
15
+ requires-python = ">=3.10"
16
+ dependencies = [
17
+ # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
18
+ # install from github
19
+ # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
20
+ "openenv-core[core]>=0.2.2",
21
+ # Environment-specific dependencies
22
+ # Add all dependencies needed for your environment here
23
+ # Examples:
24
+ # "numpy>=1.19.0",
25
+ # "torch>=2.0.0",
26
+ # "gymnasium>=0.29.0",
27
+ # "openspiel>=1.0.0",
28
+ # "smolagents>=1.22.0,<2",
29
+ "openai>=1.0.0",
30
+ "requests>=2.32.0",
31
+ ]
32
+
33
+ [project.optional-dependencies]
34
+ dev = [
35
+ "pytest>=8.0.0",
36
+ "pytest-cov>=4.0.0",
37
+ ]
38
+
39
+ [project.scripts]
40
+ # Server entry point - enables running via: uv run --project . server
41
+ # or: python -m incident_response_env.server.app
42
+ server = "incident_response_env.server.app:main"
43
+
44
+ [tool.setuptools]
45
+ include-package-data = true
46
+ packages = ["incident_response_env", "incident_response_env.server"]
47
+ package-dir = { "incident_response_env" = ".", "incident_response_env.server" = "server" }
server/Dockerfile ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Local / CI alternate path. Hugging Face Docker Spaces use the repo-root Dockerfile.
2
+ FROM python:3.11-slim
3
+
4
+ WORKDIR /app
5
+
6
+ COPY server/requirements.txt .
7
+ RUN pip install --no-cache-dir -r requirements.txt
8
+
9
+ COPY . .
10
+
11
+ EXPOSE 8000
12
+
13
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
server/__init__.py ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Incident Response Env environment server components."""
8
+
9
+ from .environment import IncidentResponseEnvironment
10
+
11
+ __all__ = ["IncidentResponseEnvironment"]
server/app.py ADDED
@@ -0,0 +1,152 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ FastAPI application for the Incident Response Env Environment.
9
+
10
+ This module creates an HTTP server that exposes the IncidentResponseEnvironment
11
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
12
+
13
+ Endpoints:
14
+ - POST /reset: Reset the environment
15
+ - POST /step: Execute an action
16
+ - GET /state: Get current environment state
17
+ - GET /schema: Get action/observation schemas
18
+ - WS /ws: WebSocket endpoint for persistent sessions
19
+
20
+ Usage:
21
+ # Development (with auto-reload):
22
+ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
23
+
24
+ # Production:
25
+ uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
26
+
27
+ # Or run directly:
28
+ python -m server.app
29
+ """
30
+
31
+ from __future__ import annotations
32
+
33
+ import inspect
34
+
35
+ try:
36
+ from openenv.core.env_server.http_server import create_app
37
+ except Exception as e: # pragma: no cover
38
+ raise ImportError(
39
+ "openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
40
+ ) from e
41
+
42
+ try:
43
+ # Docker/validator mode: run as `uvicorn server.app:app` from repo root.
44
+ from fastapi import FastAPI
45
+ from fastapi.routing import APIRoute
46
+ from models import (
47
+ IncidentResponseAction,
48
+ IncidentResponseObservation,
49
+ IncidentState,
50
+ )
51
+ from server.environment import IncidentResponseEnvironment
52
+ except Exception: # pragma: no cover
53
+ # Package mode: `uvicorn incident_response_env.server.app:app`
54
+ from fastapi import FastAPI
55
+ from fastapi.routing import APIRoute
56
+
57
+ from models import (
58
+ IncidentResponseAction,
59
+ IncidentResponseObservation,
60
+ IncidentState,
61
+ )
62
+ from .environment import IncidentResponseEnvironment
63
+
64
+ # OpenEnv's HTTP /reset and /step handlers invoke the factory for every request.
65
+ # A fresh Environment per request breaks episode state (each /step would hit
66
+ # scenario=None and fall back to task_easy). Use a single shared instance so
67
+ # stateless HTTP clients behave like one continuous episode.
68
+ _shared_incident_env = IncidentResponseEnvironment()
69
+
70
+
71
+ def incident_env_factory() -> IncidentResponseEnvironment:
72
+ return _shared_incident_env
73
+
74
+
75
+ # Create the app. Prefer the hackathon-style signature: create_app(factory)
76
+ sig = None
77
+ try: # pragma: no cover
78
+ sig = inspect.signature(create_app)
79
+ except Exception: # pragma: no cover
80
+ sig = None
81
+
82
+ if sig is not None and len(sig.parameters) == 1:
83
+ app = create_app(incident_env_factory) # type: ignore[misc]
84
+ else:
85
+ # Older signature used by the OpenEnv template.
86
+ app = create_app( # type: ignore[misc]
87
+ incident_env_factory,
88
+ IncidentResponseAction,
89
+ IncidentResponseObservation,
90
+ env_name="incident_response_env",
91
+ max_concurrent_envs=1,
92
+ )
93
+
94
+
95
+ def _reregister_state_route(application: FastAPI) -> None:
96
+ """OpenEnv registers GET /state with base State, which omits IncidentState fields."""
97
+ application.router.routes = [
98
+ route
99
+ for route in application.router.routes
100
+ if not (
101
+ isinstance(route, APIRoute)
102
+ and route.path == "/state"
103
+ and "GET" in route.methods
104
+ )
105
+ ]
106
+
107
+ @application.get(
108
+ "/state",
109
+ response_model=IncidentState,
110
+ tags=["State Management"],
111
+ summary="Get current environment state",
112
+ )
113
+ async def incident_state() -> IncidentState:
114
+ return _shared_incident_env.state
115
+
116
+
117
+ _reregister_state_route(app)
118
+
119
+
120
+ def main(host: str = "0.0.0.0", port: int = 8000):
121
+ """
122
+ Entry point for direct execution via uv run or python -m.
123
+
124
+ This function enables running the server without Docker:
125
+ uv run --project . server
126
+ uv run --project . server --port 8001
127
+ python -m incident_response_env.server.app
128
+
129
+ Args:
130
+ host: Host address to bind to (default: "0.0.0.0")
131
+ port: Port number to listen on (default: 8000)
132
+
133
+ For production deployments, consider using uvicorn directly with
134
+ multiple workers:
135
+ uvicorn incident_response_env.server.app:app --workers 4
136
+ """
137
+ import uvicorn
138
+
139
+ uvicorn.run(app, host=host, port=port)
140
+
141
+
142
+ if __name__ == "__main__":
143
+ import argparse
144
+
145
+ parser = argparse.ArgumentParser()
146
+ parser.add_argument("--port", type=int, default=8000)
147
+ args = parser.parse_args()
148
+ # openenv validate expects a literal `main()` substring in this file
149
+ if args.port == 8000:
150
+ main()
151
+ else:
152
+ main(port=args.port)
server/environment.py ADDED
@@ -0,0 +1,203 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Core environment logic for the Incident Response playground.
3
+
4
+ This module intentionally keeps "ground truth" (e.g., root-cause flags) internal and
5
+ never returns it in observations. Agents must infer root cause from context.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import uuid
11
+ from dataclasses import dataclass
12
+ from typing import List, Optional
13
+
14
+ from openenv.core.env_server.interfaces import Environment
15
+
16
+ try:
17
+ from ..models import Alert, IncidentAction, IncidentObservation, IncidentState
18
+ from .graders import IncidentGrader
19
+ from .scenarios import Scenario, ScenarioGenerator
20
+ except ImportError: # pragma: no cover
21
+ from models import Alert, IncidentAction, IncidentObservation, IncidentState
22
+ from server.graders import IncidentGrader
23
+ from server.scenarios import Scenario, ScenarioGenerator
24
+
25
+
26
+ @dataclass
27
+ class _InternalAlert:
28
+ alert: Alert
29
+ is_root_cause: bool
30
+
31
+
32
+ class IncidentResponseEnvironment(Environment):
33
+ # HTTP server uses a process-wide shared instance for /reset + /step; only
34
+ # one logical episode/client should drive it at a time.
35
+ SUPPORTS_CONCURRENT_SESSIONS: bool = False
36
+
37
+ def __init__(self):
38
+ self.grader = IncidentGrader()
39
+ self.scenario: Optional[Scenario] = None
40
+
41
+ self.resolved: List[str] = []
42
+ self.step_count: int = 0
43
+ self.episode_id: str = ""
44
+ self.task_id: str = "task_easy"
45
+ self.total_reward: float = 0.0
46
+ self.current_health: float = 1.0
47
+
48
+ self._alerts: List[_InternalAlert] = []
49
+
50
+ def reset(self, task_id: str = "task_easy", seed: int | None = None): # type: ignore[override]
51
+ self.scenario = ScenarioGenerator.generate(task_id, seed=seed)
52
+ self.resolved = []
53
+ self.step_count = 0
54
+ self.episode_id = str(uuid.uuid4())[:8]
55
+ self.task_id = task_id
56
+ self.total_reward = 0.0
57
+ self.current_health = self.scenario.initial_health
58
+
59
+ self._alerts = [
60
+ _InternalAlert(alert=a, is_root_cause=is_rc)
61
+ for a, is_rc in self.scenario.initial_alerts_internal
62
+ ]
63
+
64
+ return IncidentObservation(
65
+ alerts=self._get_active_alerts(),
66
+ resolved_alerts=[],
67
+ system_health=self.current_health,
68
+ step_number=0,
69
+ done=False,
70
+ reward=0.0,
71
+ message="Incident detected. Begin triage.",
72
+ )
73
+
74
+ def step(self, action: IncidentAction) -> IncidentObservation: # type: ignore[override]
75
+ if self.scenario is None:
76
+ # Be forgiving if a judge/runner forgets to call reset first.
77
+ self.reset(task_id=getattr(action, "task_id", "task_easy"))
78
+
79
+ self.step_count += 1
80
+
81
+ reward, feedback = self.grader.grade(
82
+ action=action,
83
+ scenario=self.scenario, # type: ignore[arg-type]
84
+ step=self.step_count,
85
+ resolved=self.resolved,
86
+ )
87
+ self.total_reward += reward
88
+
89
+ self._maybe_resolve(action)
90
+
91
+ done = self._episode_goal_satisfied() or (
92
+ self.scenario is not None and self.step_count >= self.scenario.max_steps
93
+ )
94
+
95
+ return IncidentObservation(
96
+ alerts=self._get_active_alerts(),
97
+ resolved_alerts=list(self.resolved),
98
+ system_health=self.current_health,
99
+ step_number=self.step_count,
100
+ done=done,
101
+ reward=reward,
102
+ message=feedback,
103
+ )
104
+
105
+ @property
106
+ def state(self) -> IncidentState: # type: ignore[override]
107
+ scenario_name = self.scenario.name if self.scenario is not None else "unknown"
108
+ max_steps = self.scenario.max_steps if self.scenario is not None else 0
109
+ return IncidentState(
110
+ episode_id=self.episode_id,
111
+ task_id=self.task_id,
112
+ step_count=self.step_count,
113
+ max_steps=max_steps,
114
+ total_reward=self.total_reward,
115
+ scenario_name=scenario_name,
116
+ )
117
+
118
+ def _update_health(self, action: IncidentAction) -> None:
119
+ # Simple deterministic health update: remediation actions improve health more.
120
+ delta = 0.02
121
+ if action.action_type in {
122
+ "scale_up",
123
+ "restart",
124
+ "rollback",
125
+ "fix",
126
+ "mitigate",
127
+ "remediate",
128
+ "isolate",
129
+ "block",
130
+ }:
131
+ delta = 0.05
132
+ self.current_health = max(0.0, min(1.0, self.current_health + delta))
133
+
134
+ def _maybe_resolve(self, action: IncidentAction) -> None:
135
+ if self.scenario is None:
136
+ return
137
+ if not action.alert_id or action.alert_id in self.resolved:
138
+ return
139
+
140
+ action_type = (action.action_type or "").lower().strip()
141
+ resolution_actions = {
142
+ "scale_up",
143
+ "restart",
144
+ "rollback",
145
+ "fix",
146
+ "mitigate",
147
+ "remediate",
148
+ "isolate",
149
+ "block",
150
+ }
151
+ if action_type not in resolution_actions:
152
+ return
153
+
154
+ # Hard task: only allow resolving the next upstream link in the chain.
155
+ if self.scenario.kind == "full_cascade_failure" and self.scenario.cascade_chain_alert_ids:
156
+ chain = list(self.scenario.cascade_chain_alert_ids)
157
+ expected_index = 0
158
+ for cid in chain:
159
+ if cid in self.resolved:
160
+ expected_index += 1
161
+ else:
162
+ break
163
+ expected_id = chain[expected_index] if expected_index < len(chain) else None
164
+ if action.alert_id != expected_id:
165
+ return
166
+
167
+ self.resolved.append(action.alert_id)
168
+ self._update_health(action)
169
+
170
+ def _get_active_alerts(self) -> List[Alert]:
171
+ # Never reveal internal root-cause flags.
172
+ active = []
173
+ for entry in self._alerts:
174
+ if entry.alert.id not in self.resolved:
175
+ active.append(entry.alert)
176
+ return active
177
+
178
+ def _all_critical_resolved(self) -> bool:
179
+ for entry in self._alerts:
180
+ if entry.alert.severity == "critical" and entry.alert.id not in self.resolved:
181
+ return False
182
+ return True
183
+
184
+ def _episode_goal_satisfied(self) -> bool:
185
+ """
186
+ Episode ends when the task's success condition is met.
187
+
188
+ Cascade (hard) tasks require every link in cascade_chain_alert_ids to be
189
+ resolved - not only severity:critical rows - so agents earn graded rewards
190
+ along the full chain and total score can reach 1.0.
191
+ """
192
+ if self.scenario is None:
193
+ return False
194
+ if (
195
+ self.scenario.kind == "full_cascade_failure"
196
+ and self.scenario.cascade_chain_alert_ids
197
+ ):
198
+ return all(
199
+ cid in self.resolved
200
+ for cid in self.scenario.cascade_chain_alert_ids
201
+ )
202
+ return self._all_critical_resolved()
203
+
server/graders.py ADDED
@@ -0,0 +1,121 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Deterministic scoring logic for the incident response tasks.
3
+
4
+ Implements the 3 required tasks for judging:
5
+ - Task 1 (easy): single obvious alert, single correct action.
6
+ - Task 2 (medium): identify root cause among symptoms, penalize wasted steps.
7
+ - Task 3 (hard): resolve a cascade chain in order.
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ from dataclasses import dataclass
13
+ from typing import List, Tuple
14
+
15
+ try:
16
+ from ..models import IncidentAction
17
+ from .scenarios import Scenario
18
+ except ImportError: # pragma: no cover
19
+ from models import IncidentAction
20
+ from server.scenarios import Scenario
21
+
22
+
23
+ @dataclass(frozen=True)
24
+ class IncidentGrader:
25
+ _RESOLUTION_ACTIONS = {
26
+ "scale_up",
27
+ "restart",
28
+ "rollback",
29
+ "fix",
30
+ "mitigate",
31
+ "remediate",
32
+ "isolate",
33
+ "block",
34
+ }
35
+
36
+ def grade(
37
+ self,
38
+ *,
39
+ action: IncidentAction,
40
+ scenario: Scenario,
41
+ step: int,
42
+ resolved: List[str],
43
+ ) -> Tuple[float, str]:
44
+ if not action.alert_id:
45
+ return 0.0, "No alert selected. Choose an alert_id to investigate or remediate."
46
+
47
+ if action.alert_id in resolved:
48
+ return 0.0, "That alert was already resolved. Pick an unresolved alert."
49
+
50
+ alert_by_id = {a.id: a for a, _ in scenario.initial_alerts_internal}
51
+ if action.alert_id not in alert_by_id:
52
+ return 0.0, "Unknown alert_id. Pick one of the active alerts."
53
+
54
+ action_type = (action.action_type or "").lower().strip()
55
+ is_resolution = action_type in self._RESOLUTION_ACTIONS
56
+
57
+ if scenario.kind == "disk_full":
58
+ # Required Task 1 grading.
59
+ if action.alert_id != "disk-alert-1":
60
+ return 0.0, "Wrong alert. Triage the disk alert."
61
+ if action_type == "scale_up":
62
+ return 1.0, "Correct: scaled storage to relieve disk pressure."
63
+ return 0.4, "Correct alert, but wrong action_type. Use scale_up."
64
+
65
+ if scenario.kind == "cascading_db_failure":
66
+ # Required Task 2 grading (meaningful reward across steps).
67
+ root_id = scenario.root_cause_alert_id or "db-001"
68
+ if action.alert_id == root_id:
69
+ reward = 1.0 if step == 1 else 0.5
70
+ feedback = "Addressed root cause." + (" Great first move." if step == 1 else "")
71
+ else:
72
+ reward = 0.1
73
+ feedback = "You addressed a symptom; root cause remains unresolved."
74
+
75
+ # End bonus: if this action (once resolved by environment) would complete all critical.
76
+ # We approximate deterministically: if action targets the root cause with a resolution action.
77
+ if is_resolution and action.alert_id == root_id:
78
+ # Count remaining critical alerts besides this one.
79
+ remaining_critical = [
80
+ a.id
81
+ for a, _ in scenario.initial_alerts_internal
82
+ if a.severity == "critical" and a.id not in resolved and a.id != action.alert_id
83
+ ]
84
+ if not remaining_critical:
85
+ reward += 0.3
86
+ feedback += " All critical alerts resolved. Bonus awarded."
87
+
88
+ return min(1.0, reward), feedback
89
+
90
+ # scenario.kind == "full_cascade_failure"
91
+ chain = list(scenario.cascade_chain_alert_ids)
92
+ if not chain:
93
+ return 0.0, "Scenario misconfigured: missing cascade chain."
94
+
95
+ # Determine expected next link in chain based on what's already resolved.
96
+ expected_index = 0
97
+ for cid in chain:
98
+ if cid in resolved:
99
+ expected_index += 1
100
+ else:
101
+ break
102
+
103
+ expected_id = chain[expected_index] if expected_index < len(chain) else None
104
+ if expected_id is None:
105
+ return 0.0, "Cascade already resolved."
106
+
107
+ if action.alert_id == expected_id:
108
+ reward = 0.25
109
+ feedback = "Correct next step in the cascade chain."
110
+ # Bonus if notes mention the correct service/source.
111
+ svc = alert_by_id[expected_id].source
112
+ if svc and svc.lower() in (action.notes or "").lower():
113
+ reward = min(1.0, reward + 0.1)
114
+ feedback += " Reasoning mentions the correct service."
115
+ # If this is the final link, cap to 1.0 total (environment accumulates).
116
+ if expected_index == len(chain) - 1:
117
+ feedback += " Chain complete."
118
+ return reward, feedback
119
+
120
+ return 0.05, "Out of order. Trace dependencies and resolve the next upstream failure first."
121
+
server/incident_response_env_environment.py ADDED
@@ -0,0 +1,104 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Incident Response Env Environment Implementation.
9
+
10
+ A simple test environment that echoes back messages sent to it.
11
+ Perfect for testing HTTP server infrastructure.
12
+ """
13
+
14
+ from uuid import uuid4
15
+
16
+ from openenv.core.env_server.interfaces import Environment
17
+ from openenv.core.env_server.types import State
18
+
19
+ try:
20
+ from ..models import IncidentResponseAction, IncidentResponseObservation
21
+ except ImportError:
22
+ from models import IncidentResponseAction, IncidentResponseObservation
23
+
24
+
25
+ class IncidentResponseEnvironment(Environment):
26
+ """
27
+ A simple echo environment that echoes back messages.
28
+
29
+ This environment is designed for testing the HTTP server infrastructure.
30
+ It maintains minimal state and simply echoes back whatever message it receives.
31
+
32
+ Example:
33
+ >>> env = IncidentResponseEnvironment()
34
+ >>> obs = env.reset()
35
+ >>> print(obs.echoed_message) # "Incident Response Env environment ready!"
36
+ >>>
37
+ >>> obs = env.step(IncidentResponseAction(message="Hello"))
38
+ >>> print(obs.echoed_message) # "Hello"
39
+ >>> print(obs.message_length) # 5
40
+ """
41
+
42
+ # Enable concurrent WebSocket sessions.
43
+ # Set to True if your environment isolates state between instances.
44
+ # When True, multiple WebSocket clients can connect simultaneously, each
45
+ # getting their own environment instance (when using factory mode in app.py).
46
+ SUPPORTS_CONCURRENT_SESSIONS: bool = True
47
+
48
+ def __init__(self):
49
+ """Initialize the incident_response_env environment."""
50
+ self._state = State(episode_id=str(uuid4()), step_count=0)
51
+ self._reset_count = 0
52
+
53
+ def reset(self) -> IncidentResponseObservation:
54
+ """
55
+ Reset the environment.
56
+
57
+ Returns:
58
+ IncidentResponseObservation with a ready message
59
+ """
60
+ self._state = State(episode_id=str(uuid4()), step_count=0)
61
+ self._reset_count += 1
62
+
63
+ return IncidentResponseObservation(
64
+ echoed_message="Incident Response Env environment ready!",
65
+ message_length=0,
66
+ done=False,
67
+ reward=0.0,
68
+ )
69
+
70
+ def step(self, action: IncidentResponseAction) -> IncidentResponseObservation: # type: ignore[override]
71
+ """
72
+ Execute a step in the environment by echoing the message.
73
+
74
+ Args:
75
+ action: IncidentResponseAction containing the message to echo
76
+
77
+ Returns:
78
+ IncidentResponseObservation with the echoed message and its length
79
+ """
80
+ self._state.step_count += 1
81
+
82
+ message = action.message
83
+ length = len(message)
84
+
85
+ # Simple reward: longer messages get higher rewards
86
+ reward = length * 0.1
87
+
88
+ return IncidentResponseObservation(
89
+ echoed_message=message,
90
+ message_length=length,
91
+ done=False,
92
+ reward=reward,
93
+ metadata={"original_message": message, "step": self._state.step_count},
94
+ )
95
+
96
+ @property
97
+ def state(self) -> State:
98
+ """
99
+ Get the current environment state.
100
+
101
+ Returns:
102
+ Current State with episode_id and step_count
103
+ """
104
+ return self._state
server/requirements.txt ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ openenv-core[core]>=0.2.2
2
+ fastapi>=0.104.0
3
+ uvicorn>=0.24.0
4
+ pydantic>=2.0.0
5
+ openai>=1.0.0
6
+ requests>=2.32.0
7
+
server/scenarios.py ADDED
@@ -0,0 +1,214 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Synthetic incident scenario generation (Tasks 1–3).
3
+
4
+ Scenarios contain internal ground truth (e.g., root-cause IDs / chain order) that
5
+ must never be returned to agents directly.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import random
11
+ from dataclasses import dataclass
12
+ from typing import Literal, Sequence, Tuple
13
+
14
+ try:
15
+ from ..models import Alert
16
+ except ImportError: # pragma: no cover
17
+ from models import Alert
18
+
19
+
20
+ ScenarioKind = Literal["disk_full", "cascading_db_failure", "full_cascade_failure"]
21
+
22
+
23
+ @dataclass(frozen=True)
24
+ class Scenario:
25
+ name: str
26
+ kind: ScenarioKind
27
+ max_steps: int
28
+ initial_health: float
29
+ initial_alerts_internal: Sequence[Tuple[Alert, bool]]
30
+
31
+ # Internal ground truth (never shown to agent)
32
+ root_cause_alert_id: str | None = None
33
+ cascade_chain_alert_ids: Sequence[str] = ()
34
+
35
+
36
+ class ScenarioGenerator:
37
+ SERVICE_NAMES = [
38
+ "auth-service",
39
+ "payment-service",
40
+ "user-db",
41
+ "order-service",
42
+ "cache-layer",
43
+ "api-gateway",
44
+ "storage-service",
45
+ "database",
46
+ "user-service",
47
+ ]
48
+
49
+ @staticmethod
50
+ def generate(
51
+ task_id: str, seed: int | None = None, *, n_services: int | None = None, chain_length: int | None = None
52
+ ) -> Scenario:
53
+ """
54
+ Produce unlimited variations via randomness.
55
+
56
+ Note: env.reset() may also seed randomness; passing seed here makes generation
57
+ self-contained for judge harnesses that call ScenarioGenerator directly.
58
+ """
59
+
60
+ if seed is not None:
61
+ random.seed(seed)
62
+
63
+ if task_id == "task_easy":
64
+ return ScenarioGenerator._single_alert()
65
+
66
+ if task_id == "task_medium":
67
+ return ScenarioGenerator._root_cause(n_services=n_services or random.randint(3, 5))
68
+
69
+ # task_hard (or anything else) maps to cascade chain
70
+ return ScenarioGenerator._cascade_chain(chain_length=chain_length or random.randint(3, 5))
71
+
72
+ @staticmethod
73
+ def _pick_services(k: int) -> list[str]:
74
+ names = list(ScenarioGenerator.SERVICE_NAMES)
75
+ random.shuffle(names)
76
+ return names[:k]
77
+
78
+ @staticmethod
79
+ def _single_alert() -> Scenario:
80
+ # Required Task 1 scenario: "disk_full"
81
+ return Scenario(
82
+ name="disk_full",
83
+ kind="disk_full",
84
+ max_steps=3,
85
+ initial_health=0.55,
86
+ initial_alerts_internal=[
87
+ (
88
+ Alert(
89
+ id="disk-alert-1",
90
+ title="Disk at 99%",
91
+ severity="critical",
92
+ description="Storage node nearly out of space. Writes failing intermittently.",
93
+ source="storage-service",
94
+ ),
95
+ True,
96
+ )
97
+ ],
98
+ root_cause_alert_id="disk-alert-1",
99
+ )
100
+
101
+ @staticmethod
102
+ def _root_cause(*, n_services: int) -> Scenario:
103
+ # Required Task 2 scenario: "cascading_db_failure"
104
+ services = ScenarioGenerator._pick_services(max(3, n_services))
105
+ db_service = "database"
106
+ if db_service not in services:
107
+ services[0] = db_service
108
+
109
+ api_service = "api-gateway" if "api-gateway" in services else services[1]
110
+ pay_service = "payment-service" if "payment-service" in services else services[2]
111
+
112
+ alerts: list[Tuple[Alert, bool]] = [
113
+ (
114
+ Alert(
115
+ id="db-001",
116
+ title="DB connection timeout",
117
+ severity="critical",
118
+ description="Database pool exhausted; connections timing out. Downstream services likely impacted.",
119
+ source=db_service,
120
+ ),
121
+ True,
122
+ ),
123
+ (
124
+ Alert(
125
+ id="api-002",
126
+ title="High error rate",
127
+ severity="medium",
128
+ description="5xx rate elevated. Errors correlate with DB timeout spikes.",
129
+ source=api_service,
130
+ ),
131
+ False,
132
+ ),
133
+ (
134
+ Alert(
135
+ id="pay-003",
136
+ title="Requests failing",
137
+ severity="medium",
138
+ description="Payment calls failing with dependency errors (DB).",
139
+ source=pay_service,
140
+ ),
141
+ False,
142
+ ),
143
+ ]
144
+
145
+ # Optionally add one extra noisy alert for variety.
146
+ if n_services >= 4:
147
+ noise_src = services[3]
148
+ alerts.append(
149
+ (
150
+ Alert(
151
+ id="aux-004",
152
+ title="Cache miss rate increased",
153
+ severity="low",
154
+ description="Cache miss rate above baseline; could be secondary effect.",
155
+ source=noise_src,
156
+ ),
157
+ False,
158
+ )
159
+ )
160
+
161
+ return Scenario(
162
+ name="cascading_db_failure",
163
+ kind="cascading_db_failure",
164
+ max_steps=8,
165
+ initial_health=0.6,
166
+ initial_alerts_internal=alerts,
167
+ root_cause_alert_id="db-001",
168
+ )
169
+
170
+ @staticmethod
171
+ def _cascade_chain(*, chain_length: int) -> Scenario:
172
+ # Required Task 3 scenario: "full_cascade_failure"
173
+ chain_services = ["auth-service", "user-service", "order-service", "payment-service"]
174
+ if chain_length != 4:
175
+ # Allow variable length, but keep the "auth → user → order → payment" prefix
176
+ extras = [s for s in ScenarioGenerator.SERVICE_NAMES if s not in chain_services]
177
+ random.shuffle(extras)
178
+ chain_services = (chain_services + extras)[: max(3, chain_length)]
179
+
180
+ chain_ids: list[str] = []
181
+ internal: list[Tuple[Alert, bool]] = []
182
+
183
+ for i, svc in enumerate(chain_services):
184
+ aid = f"svc-{i+1:03d}"
185
+ chain_ids.append(aid)
186
+ next_svc = chain_services[i + 1] if i + 1 < len(chain_services) else None
187
+ hint = (
188
+ f"Downstream impact observed: {next_svc} reporting dependency errors."
189
+ if next_svc
190
+ else "Downstream impact widespread."
191
+ )
192
+ internal.append(
193
+ (
194
+ Alert(
195
+ id=aid,
196
+ title=f"{svc} failing",
197
+ severity="critical" if i == 0 else "high",
198
+ description=f"{svc} error spike. {hint}",
199
+ source=svc,
200
+ ),
201
+ i == 0, # treat first link as "root cause" internally
202
+ )
203
+ )
204
+
205
+ return Scenario(
206
+ name="full_cascade_failure",
207
+ kind="full_cascade_failure",
208
+ max_steps=max(10, len(chain_ids) * 3),
209
+ initial_health=0.45,
210
+ initial_alerts_internal=internal,
211
+ root_cause_alert_id=chain_ids[0],
212
+ cascade_chain_alert_ids=tuple(chain_ids),
213
+ )
214
+
uv.lock ADDED
The diff for this file is too large to render. See raw diff