siddeshwar-kagatikar committed
Commit 49b9b2f · 1 Parent(s): 3eeb606

Add OpenEnv HTTP API and submission inference script

Dockerfile CHANGED
@@ -11,7 +11,7 @@ ENV HOME=/home/user \
 
 WORKDIR $HOME/app
 
-COPY --chown=user pyproject.toml README.md $HOME/app/
+COPY --chown=user pyproject.toml README.md openenv.yaml inference.py $HOME/app/
 COPY --chown=user src $HOME/app/src
 COPY --chown=user config $HOME/app/config
 COPY --chown=user datasets $HOME/app/datasets
@@ -25,4 +25,3 @@ RUN pip install --no-cache-dir --upgrade pip && \
 EXPOSE 7860
 
 CMD ["sh", "-c", "uvicorn server:app --host 0.0.0.0 --port ${PORT:-7860}"]
-
README.md CHANGED
@@ -156,6 +156,34 @@ python scripts/run_openai_baseline.py --model gpt-5-nano
 
 The script is designed to stay bounded enough for a normal benchmark pass to finish comfortably under 20 minutes on a lightweight chat model, while still using the full fixed task set. For repeatability it fixes the benchmark graph/tasks and uses deterministic decoding settings. Because remote model backends can still change over time, the output artifact also records model metadata and system fingerprints when available.
 
+## Inference Script
+
+The submission-ready inference entrypoint is the root `inference.py` file. It talks to the deployed Hugging Face Space over HTTP, uses the OpenAI client for all model calls, and emits structured stdout logs in the `[START]`, `[STEP]`, and `[END]` format.
+
+Required environment variables:
+
+- `API_BASE_URL`
+- `MODEL_NAME`
+- `HF_TOKEN`
+
+Optional environment variables:
+
+- `SPACE_URL` default: `https://siddeshwar1625-osint.hf.space`
+- `TASK_INDICES` default: `0,10,20`
+- `MAX_STEPS` default: `8`
+
+Example local test command against a running local server:
+
+```bash
+API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-5.4-mini HF_TOKEN=your_key SPACE_URL=http://127.0.0.1:7860 python inference.py
+```
+
+Example test command against the deployed Space:
+
+```bash
+API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-5.4-mini HF_TOKEN=your_key SPACE_URL=https://siddeshwar1625-osint.hf.space python inference.py
+```
+
 ## Docker And Hugging Face Space
 
 The repository is ready for a Docker-based Hugging Face Space:
@@ -179,6 +207,11 @@ The FastAPI app serves:
 - `/dashboard`: generated benchmark dashboard
 - `/api/environment`: environment metadata
 - `/healthz`: health check
+- `/openenv.yaml`: OpenEnv HTTP spec stub
+- `/openenv/tasks`: task enumeration
+- `/openenv/reset`: episode reset endpoint
+- `/openenv/step`: episode step endpoint
+- `/openenv/state/{session_id}`: current session state endpoint
 
 ## Automated Validation
 
inference.py ADDED
@@ -0,0 +1,233 @@
+from __future__ import annotations
+
+import json
+import os
+from typing import Any
+
+import requests
+from openai import OpenAI
+from requests import RequestException
+
+from osint_env.baselines.openai_runner import SYSTEM_PROMPT, build_action_tools
+
+
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "gpt-5.4-mini")
+HF_TOKEN = os.getenv("HF_TOKEN", "")
+SPACE_URL = os.getenv("SPACE_URL", "https://siddeshwar1625-osint.hf.space").rstrip("/")
+
+MAX_STEPS = int(os.getenv("MAX_STEPS", "8"))
+TEMPERATURE = float(os.getenv("TEMPERATURE", "0.0"))
+MAX_TOKENS = int(os.getenv("MAX_TOKENS", "256"))
+REQUEST_TIMEOUT = int(os.getenv("REQUEST_TIMEOUT", "90"))
+TASK_INDICES = [int(part.strip()) for part in os.getenv("TASK_INDICES", "0,10,20").split(",") if part.strip()]
+SUCCESS_SCORE_THRESHOLD = float(os.getenv("SUCCESS_SCORE_THRESHOLD", "0.67"))
+
+BENCHMARK = "osint-openenv"
+TASK_NAME = "fixed_levels_easy_mid_hard"
+
+
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+def log_step(step: int, action: dict[str, Any], reward: float, done: bool, error: str | None) -> None:
+    action_text = json.dumps(action, sort_keys=True, separators=(",", ":"))
+    error_text = "null" if error is None else json.dumps(error)
+    print(
+        f"[STEP] step={step} action={action_text} reward={reward:.4f} done={str(bool(done)).lower()} error={error_text}",
+        flush=True,
+    )
+
+
+def log_end(success: bool, steps: int, score: float, rewards: list[float]) -> None:
+    rewards_text = json.dumps([round(value, 4) for value in rewards], separators=(",", ":"))
+    print(
+        f"[END] success={str(bool(success)).lower()} steps={steps} score={score:.4f} rewards={rewards_text}",
+        flush=True,
+    )
+
+
+def _supports_reasoning_effort_in_chat_completions(model: str) -> bool:
+    model_name = str(model).strip().lower()
+    if model_name.startswith("gpt-5.4-mini"):
+        return False
+    return model_name.startswith("gpt-5")
+
+
+def _request_kwargs(messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> dict[str, Any]:
+    kwargs: dict[str, Any] = {
+        "model": MODEL_NAME,
+        "messages": messages,
+        "tools": tools,
+        "tool_choice": "required",
+        "parallel_tool_calls": False,
+    }
+    if MODEL_NAME.strip().lower().startswith("gpt-5"):
+        kwargs["max_completion_tokens"] = MAX_TOKENS
+        if _supports_reasoning_effort_in_chat_completions(MODEL_NAME):
+            kwargs["reasoning_effort"] = "none"
+    else:
+        kwargs["temperature"] = TEMPERATURE
+        kwargs["max_tokens"] = MAX_TOKENS
+    return kwargs
+
+
+def _message_text(message: Any) -> str:
+    content = getattr(message, "content", "")
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts: list[str] = []
+        for item in content:
+            if isinstance(item, dict) and item.get("type") == "text":
+                parts.append(str(item.get("text", "")))
+        return "\n".join(part for part in parts if part)
+    return str(content or "")
+
+
+def _space_get(path: str) -> dict[str, Any]:
+    response = requests.get(f"{SPACE_URL}{path}", timeout=REQUEST_TIMEOUT)
+    response.raise_for_status()
+    return response.json()
+
+
+def _space_post(path: str, payload: dict[str, Any]) -> dict[str, Any]:
+    response = requests.post(f"{SPACE_URL}{path}", json=payload, timeout=REQUEST_TIMEOUT)
+    response.raise_for_status()
+    return response.json()
+
+
+def _decode_action(tool_name: str, args: dict[str, Any]) -> dict[str, Any]:
+    if tool_name == "submit_answer":
+        return {"action_type": "ANSWER", "payload": {"answer": str(args.get("answer", "")).strip()}}
+    if tool_name == "add_edge":
+        return {
+            "action_type": "ADD_EDGE",
+            "payload": {
+                "src": str(args.get("src", "")).strip(),
+                "rel": str(args.get("rel", "")).strip(),
+                "dst": str(args.get("dst", "")).strip(),
+                "confidence": float(args.get("confidence", 1.0)),
+            },
+        }
+    return {"action_type": "CALL_TOOL", "payload": {"tool_name": tool_name, "args": dict(args)}}
+
+
+def get_model_action(client: OpenAI, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> tuple[dict[str, Any], dict[str, Any]]:
+    try:
+        completion = client.chat.completions.create(**_request_kwargs(messages, tools))
+        message = completion.choices[0].message
+        tool_calls = list(message.tool_calls or [])
+        if not tool_calls:
+            fallback_answer = _message_text(message).strip() or "unknown"
+            return {"action_type": "ANSWER", "payload": {"answer": fallback_answer}}, {
+                "role": "assistant",
+                "content": _message_text(message),
+            }
+        tool_call = tool_calls[0]
+        try:
+            args = json.loads(tool_call.function.arguments or "{}")
+        except json.JSONDecodeError:
+            args = {}
+        if not isinstance(args, dict):
+            args = {}
+        assistant_message = {
+            "role": "assistant",
+            "content": _message_text(message),
+            "tool_calls": [
+                {
+                    "id": tool_call.id,
+                    "type": "function",
+                    "function": {
+                        "name": str(tool_call.function.name),
+                        "arguments": json.dumps(args, sort_keys=True),
+                    },
+                }
+            ],
+        }
+        return _decode_action(str(tool_call.function.name), args), assistant_message
+    except Exception as exc:
+        print(f"[DEBUG] Model request failed: {exc}", flush=True)
+        return {"action_type": "ANSWER", "payload": {"answer": "unknown"}}, {"role": "assistant", "content": ""}
+
+
+def main() -> None:
+    if not HF_TOKEN:
+        raise SystemExit("HF_TOKEN is required.")
+
+    try:
+        ping = _space_get("/healthz")
+        if ping.get("status") != "ok":
+            raise SystemExit(f"Unexpected healthz payload: {ping}")
+    except RequestException as exc:
+        raise SystemExit(f"Space ping failed: {exc}") from exc
+
+    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN, timeout=REQUEST_TIMEOUT)
+    tools = build_action_tools()
+
+    history: list[str] = []
+    rewards: list[float] = []
+    task_scores: list[float] = []
+    steps_taken = 0
+
+    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+
+    for task_index in TASK_INDICES:
+        result = _space_post("/openenv/reset", {"task_index": task_index})
+        session_id = str(result["session_id"])
+        done = bool(result.get("done", False))
+        messages: list[dict[str, Any]] = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {
+                "role": "user",
+                "content": json.dumps(result["observation"], indent=2, sort_keys=True),
+            },
+        ]
+
+        for local_step in range(1, MAX_STEPS + 1):
+            if done:
+                break
+            action, assistant_message = get_model_action(client, messages, tools)
+            error = None
+            try:
+                result = _space_post(
+                    "/openenv/step",
+                    {
+                        "session_id": session_id,
+                        "action_type": action["action_type"],
+                        "payload": action["payload"],
+                    },
+                )
+            except RequestException as exc:
+                error = str(exc)
+                result = _space_get(f"/openenv/state/{session_id}")
+            reward = float(result.get("reward", 0.0) or 0.0)
+            done = bool(result.get("done", False))
+            rewards.append(reward)
+            steps_taken += 1
+            log_step(step=steps_taken, action=action, reward=reward, done=done, error=error)
+            history.append(f"step={steps_taken} task_index={task_index} reward={reward:+.4f}")
+            messages.append(assistant_message)
+            messages.append(
+                {
+                    "role": "tool",
+                    "tool_call_id": "remote_step",
+                    "content": json.dumps(result, sort_keys=True),
+                }
+            )
+            if done:
+                break
+
+        info = dict(result.get("info", {}))
+        task_answer = str(info.get("task_answer", ""))
+        agent_answer = str(info.get("agent_answer", ""))
+        task_scores.append(1.0 if agent_answer and agent_answer == task_answer else 0.0)
+
+    score = sum(task_scores) / max(1, len(task_scores))
+    success = score >= SUCCESS_SCORE_THRESHOLD
+    log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+
+if __name__ == "__main__":
+    main()
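
For orientation, the structured stdout produced by `log_start`, `log_step`, and `log_end` above looks roughly like the following (illustrative values only; the step count, rewards, and tool arguments are placeholders, while the field layout follows the f-strings in the file):

```
[START] task=fixed_levels_easy_mid_hard env=osint-openenv model=gpt-5.4-mini
[STEP] step=1 action={"action_type":"CALL_TOOL","payload":{"args":{},"tool_name":"..."}} reward=0.0500 done=false error=null
[STEP] step=2 action={"action_type":"ANSWER","payload":{"answer":"..."}} reward=1.0000 done=true error=null
[END] success=true steps=2 score=1.0000 rewards=[0.05,1.0]
```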
openenv.yaml ADDED
@@ -0,0 +1,35 @@
+name: osint-openenv
+version: 0.1.0
+description: Synthetic OSINT benchmark environment exposed over HTTP.
+transport:
+  type: http
+  base_path: /
+endpoints:
+  health:
+    method: GET
+    path: /healthz
+  metadata:
+    method: GET
+    path: /api/environment
+  tasks:
+    method: GET
+    path: /openenv/tasks
+  reset:
+    method: POST
+    path: /openenv/reset
+  step:
+    method: POST
+    path: /openenv/step
+  state:
+    method: GET
+    path: /openenv/state/{session_id}
+models:
+  action_space:
+    - CALL_TOOL
+    - ADD_EDGE
+    - ANSWER
+  observation_fields:
+    - tool_outputs
+    - graph_snapshot
+    - action_history
+    - task
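
As a quick sanity check of the spec stub above, a short sketch that parses it and prints the declared endpoints (not part of the commit; assumes PyYAML is available in the environment):

```python
# Sketch: read openenv.yaml and list the HTTP endpoints it declares.
import yaml

with open("openenv.yaml") as handle:
    spec = yaml.safe_load(handle)

for name, endpoint in spec["endpoints"].items():
    print(f"{name}: {endpoint['method']} {endpoint['path']}")

# The declared action space should match the environment: CALL_TOOL, ADD_EDGE, ANSWER.
print(spec["models"]["action_space"])
```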
server.py CHANGED
@@ -5,12 +5,22 @@ import os
 from collections import Counter
 from functools import lru_cache
 from pathlib import Path
+from threading import Lock
 from typing import Any
+from uuid import uuid4
 
-from fastapi import FastAPI
+from fastapi import FastAPI, HTTPException
 from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
 
+from osint_env.api import (
+    OpenEnvActionRequest,
+    OpenEnvObservationModel,
+    OpenEnvResetRequest,
+    OpenEnvResponseEnvelope,
+    OpenEnvTaskSummary,
+)
 from osint_env.config import clone_environment_config, load_seeding_config, load_shared_config
+from osint_env.domain.models import Action, ActionType
 from osint_env.env.environment import OSINTEnvironment
 from osint_env.eval.runner import run_evaluation
 from osint_env.llm import build_llm_client
@@ -25,6 +35,10 @@ SPACE_PORT = int(os.getenv("PORT", "7860"))
 SPACE_DASHBOARD = Path("artifacts/space_dashboard.html")
 LATEST_BASELINE_OUTPUT = Path("artifacts/baselines/openai_fixed_levels_latest.json")
 LATEST_EVALUATION_OUTPUT = Path("artifacts/latest_evaluation.json")
+OPENENV_SPEC_PATH = Path("openenv.yaml")
+
+_SESSION_LOCK = Lock()
+_SESSIONS: dict[str, OSINTEnvironment] = {}
 
 
 def _load_json(path: Path) -> dict[str, Any] | None:
@@ -59,6 +73,67 @@ def _build_environment() -> OSINTEnvironment:
     return OSINTEnvironment(env_cfg, llm=llm)
 
 
+def _serialize_observation(observation: Any) -> OpenEnvObservationModel:
+    return OpenEnvObservationModel(
+        tool_outputs=list(observation.tool_outputs),
+        graph_snapshot=dict(observation.graph_snapshot),
+        action_history=list(observation.action_history),
+        task=dict(observation.task),
+    )
+
+
+def _safe_session_info(info: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "step_count": int(info.get("step_count", 0)),
+        "total_reward": float(info.get("total_reward", 0.0)),
+        "tool_calls": int(info.get("tool_calls", 0)),
+        "redundant_tool_calls": int(info.get("redundant_tool_calls", 0)),
+        "task_answer": str(info.get("task_answer", "")),
+        "agent_answer": "" if info.get("agent_answer") is None else str(info.get("agent_answer", "")),
+        "graph_f1": float(info.get("graph_f1", 0.0)),
+        "reward_components": dict(info.get("reward_components", {})),
+    }
+
+
+def _task_summaries(env: OSINTEnvironment) -> list[OpenEnvTaskSummary]:
+    return [
+        OpenEnvTaskSummary(
+            task_id=task.task_id,
+            task_type=task.task_type,
+            question=task.question,
+            difficulty=str(task.metadata.get("difficulty", "unknown")),
+        )
+        for task in env.tasks
+    ]
+
+
+def _resolve_task_index(env: OSINTEnvironment, request: OpenEnvResetRequest) -> int:
+    if request.task_index is not None:
+        task_index = int(request.task_index)
+        if task_index < 0 or task_index >= len(env.tasks):
+            raise HTTPException(status_code=400, detail=f"Invalid task_index {task_index}")
+        return task_index
+    if request.task_id:
+        for idx, task in enumerate(env.tasks):
+            if task.task_id == request.task_id:
+                return idx
+        raise HTTPException(status_code=400, detail=f"Unknown task_id {request.task_id}")
+    return 0
+
+
+def _get_session_env(session_id: str) -> OSINTEnvironment:
+    with _SESSION_LOCK:
+        env = _SESSIONS.get(session_id)
+        if env is None:
+            raise HTTPException(status_code=404, detail=f"Unknown session_id {session_id}")
+        return env
+
+
+def _store_session(session_id: str, env: OSINTEnvironment) -> None:
+    with _SESSION_LOCK:
+        _SESSIONS[session_id] = env
+
+
 @lru_cache(maxsize=1)
 def _base_environment_snapshot() -> dict[str, Any]:
     env = _build_environment()
@@ -271,11 +346,69 @@ def healthz() -> JSONResponse:
     return JSONResponse({"status": "ok"})
 
 
+@app.get("/openenv.yaml")
+def openenv_spec() -> FileResponse:
+    return FileResponse(OPENENV_SPEC_PATH, media_type="text/yaml")
+
+
 @app.get("/api/environment")
 def environment_metadata() -> JSONResponse:
     return JSONResponse(_space_snapshot())
 
 
+@app.get("/openenv/tasks", response_model=list[OpenEnvTaskSummary])
+def openenv_tasks() -> list[OpenEnvTaskSummary]:
+    env = _build_environment()
+    return _task_summaries(env)
+
+
+@app.post("/openenv/reset", response_model=OpenEnvResponseEnvelope)
+def openenv_reset(request: OpenEnvResetRequest) -> OpenEnvResponseEnvelope:
+    env = _build_environment()
+    env._task_idx = _resolve_task_index(env, request)
+    observation = env.reset()
+    session_id = str(uuid4())
+    _store_session(session_id, env)
+    return OpenEnvResponseEnvelope(
+        session_id=session_id,
+        observation=_serialize_observation(observation),
+        reward=0.0,
+        done=False,
+        info=_safe_session_info(env._info()),
+    )
+
+
+@app.post("/openenv/step", response_model=OpenEnvResponseEnvelope)
+def openenv_step(request: OpenEnvActionRequest) -> OpenEnvResponseEnvelope:
+    env = _get_session_env(request.session_id)
+    try:
+        action_type = ActionType(str(request.action_type))
+    except ValueError as exc:
+        raise HTTPException(status_code=400, detail=f"Unsupported action_type {request.action_type}") from exc
+    observation, reward, done, info = env.step(Action(action_type, dict(request.payload)))
+    return OpenEnvResponseEnvelope(
+        session_id=request.session_id,
+        observation=_serialize_observation(observation),
+        reward=float(reward),
+        done=bool(done),
+        info=_safe_session_info(info),
+    )
+
+
+@app.get("/openenv/state/{session_id}", response_model=OpenEnvResponseEnvelope)
+def openenv_state(session_id: str) -> OpenEnvResponseEnvelope:
+    env = _get_session_env(session_id)
+    if env.state is None:
+        raise HTTPException(status_code=400, detail="Session has not been reset yet")
+    return OpenEnvResponseEnvelope(
+        session_id=session_id,
+        observation=_serialize_observation(env._observation()),
+        reward=0.0,
+        done=bool(env.state.done),
+        info=_safe_session_info(env._info()),
+    )
+
+
 @app.get("/dashboard")
 def dashboard() -> FileResponse:
     snapshot = _space_snapshot()
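
For reference, a minimal end-to-end call sequence against the endpoints added above (a sketch, not part of the commit; it assumes the server is running locally on the default port 7860 and uses only fields defined by `OpenEnvResponseEnvelope`):

```python
# Sketch: reset an episode, submit a terminal ANSWER action, then read back the state.
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local server, e.g. `uvicorn server:app --port 7860`

# POST /openenv/reset returns the envelope with a fresh session_id and the first observation.
reset = requests.post(f"{BASE_URL}/openenv/reset", json={"task_index": 0}, timeout=30).json()
session_id = reset["session_id"]
print(reset["observation"]["task"]["question"])

# POST /openenv/step applies one action for that session; ANSWER ends the episode.
step = requests.post(
    f"{BASE_URL}/openenv/step",
    json={"session_id": session_id, "action_type": "ANSWER", "payload": {"answer": "unknown"}},
    timeout=30,
).json()
print(step["reward"], step["done"])

# GET /openenv/state/{session_id} returns the current session snapshot.
state = requests.get(f"{BASE_URL}/openenv/state/{session_id}", timeout=30).json()
print(state["info"]["task_answer"], state["info"]["agent_answer"])
```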
src/osint_env/api/__init__.py ADDED
@@ -0,0 +1,15 @@
+from osint_env.api.models import (
+    OpenEnvActionRequest,
+    OpenEnvObservationModel,
+    OpenEnvResetRequest,
+    OpenEnvResponseEnvelope,
+    OpenEnvTaskSummary,
+)
+
+__all__ = [
+    "OpenEnvActionRequest",
+    "OpenEnvObservationModel",
+    "OpenEnvResetRequest",
+    "OpenEnvResponseEnvelope",
+    "OpenEnvTaskSummary",
+]
src/osint_env/api/models.py ADDED
@@ -0,0 +1,38 @@
+from __future__ import annotations
+
+from typing import Any
+
+from pydantic import BaseModel, Field
+
+
+class OpenEnvTaskSummary(BaseModel):
+    task_id: str
+    task_type: str
+    question: str
+    difficulty: str = "unknown"
+
+
+class OpenEnvObservationModel(BaseModel):
+    tool_outputs: list[dict[str, Any]]
+    graph_snapshot: dict[str, Any]
+    action_history: list[dict[str, Any]]
+    task: dict[str, Any]
+
+
+class OpenEnvResetRequest(BaseModel):
+    task_id: str | None = None
+    task_index: int | None = None
+
+
+class OpenEnvActionRequest(BaseModel):
+    session_id: str
+    action_type: str = Field(description="One of CALL_TOOL, ADD_EDGE, ANSWER.")
+    payload: dict[str, Any] = Field(default_factory=dict)
+
+
+class OpenEnvResponseEnvelope(BaseModel):
+    session_id: str
+    observation: OpenEnvObservationModel
+    reward: float
+    done: bool
+    info: dict[str, Any]
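
A brief usage sketch for the models above (illustrative only; the session id is a made-up value, and `model_dump_json()` assumes pydantic v2, which current FastAPI releases use):

```python
# Sketch: the pydantic models validate the OpenEnv HTTP payloads on both sides.
from osint_env.api import OpenEnvActionRequest, OpenEnvResetRequest

# A reset may target a task by index or by id; both fields are optional.
reset_request = OpenEnvResetRequest(task_index=0)

# A step always names its session and one of CALL_TOOL, ADD_EDGE, ANSWER.
action_request = OpenEnvActionRequest(
    session_id="example-session-id",  # hypothetical id returned by /openenv/reset
    action_type="ANSWER",
    payload={"answer": "unknown"},
)
print(reset_request.model_dump_json(), action_request.model_dump_json())
```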
src/osint_env/validation.py CHANGED
@@ -19,6 +19,7 @@ from osint_env.env.reward import compute_answer_reward
 
 README_PATH = Path("README.md")
 DOCKERFILE_PATH = Path("Dockerfile")
+OPENENV_SPEC_PATH = Path("openenv.yaml")
 SHARED_CONFIG_PATH = "datasets/fixed_levels/shared_config_fixed_levels.json"
 SEED_FILE_PATH = "datasets/fixed_levels/seed_fixed_levels.json"
 
@@ -46,15 +47,18 @@ def check_hf_space_readiness() -> ValidationResult:
     client = TestClient(app)
     health = client.get("/healthz")
     dashboard = client.get("/api/environment")
+    spec = client.get("/openenv.yaml")
     passed = all(
         [
            README_PATH.exists(),
            DOCKERFILE_PATH.exists(),
+           OPENENV_SPEC_PATH.exists(),
            has_sdk,
            has_port,
            has_openenv_tag,
            health.status_code == 200,
            dashboard.status_code == 200,
+           spec.status_code == 200,
         ]
     )
     return ValidationResult(
@@ -63,11 +67,13 @@ def check_hf_space_readiness() -> ValidationResult:
         details={
             "readme_exists": README_PATH.exists(),
             "dockerfile_exists": DOCKERFILE_PATH.exists(),
+            "openenv_spec_exists": OPENENV_SPEC_PATH.exists(),
             "has_sdk_docker": has_sdk,
             "has_app_port": has_port,
             "has_openenv_tag": has_openenv_tag,
             "healthz_status": health.status_code,
             "environment_status": dashboard.status_code,
+            "openenv_spec_status": spec.status_code,
         },
     )
 
@@ -75,6 +81,17 @@ def check_hf_space_readiness() -> ValidationResult:
 def check_openenv_spec_compliance() -> ValidationResult:
     env = _build_environment()
     obs = env.reset()
+    client = TestClient(app)
+    reset = client.post("/openenv/reset", json={"task_index": 0})
+    step = client.post(
+        "/openenv/step",
+        json={
+            "session_id": reset.json()["session_id"] if reset.status_code == 200 else "",
+            "action_type": "ANSWER",
+            "payload": {"answer": "unknown"},
+        },
+    )
+    state = client.get(f"/openenv/state/{reset.json()['session_id']}") if reset.status_code == 200 else None
     passed = all(
         [
            isinstance(env, Env),
@@ -86,6 +103,9 @@ def check_openenv_spec_compliance() -> ValidationResult:
            env.episode_max_length == env.config.max_steps,
            isinstance(obs.task, dict),
            "question" in obs.task,
+           reset.status_code == 200,
+           step.status_code == 200,
+           state is not None and state.status_code == 200,
         ]
     )
     return ValidationResult(
@@ -97,6 +117,9 @@ def check_openenv_spec_compliance() -> ValidationResult:
             "action_space": list(env.action_space),
             "episode_max_length": env.episode_max_length,
             "task_keys": sorted(obs.task.keys()),
+            "reset_status": reset.status_code,
+            "step_status": step.status_code,
+            "state_status": 0 if state is None else state.status_code,
         },
     )
 
tests/test_server.py CHANGED
@@ -24,6 +24,45 @@ def test_server_environment_metadata():
     assert "summary" in body
 
 
+def test_openenv_spec_and_tasks_endpoints():
+    spec = client.get("/openenv.yaml")
+    assert spec.status_code == 200
+    assert "reset" in spec.text
+
+    tasks = client.get("/openenv/tasks")
+    assert tasks.status_code == 200
+    body = tasks.json()
+    assert len(body) >= 3
+    assert {"task_id", "task_type", "question", "difficulty"} <= set(body[0].keys())
+
+
+def test_openenv_reset_step_and_state_cycle():
+    reset = client.post("/openenv/reset", json={"task_index": 0})
+    assert reset.status_code == 200
+    body = reset.json()
+    session_id = body["session_id"]
+    assert body["done"] is False
+    assert "question" in body["observation"]["task"]
+
+    state = client.get(f"/openenv/state/{session_id}")
+    assert state.status_code == 200
+    assert state.json()["session_id"] == session_id
+
+    step = client.post(
+        "/openenv/step",
+        json={
+            "session_id": session_id,
+            "action_type": "ANSWER",
+            "payload": {"answer": "unknown"},
+        },
+    )
+    assert step.status_code == 200
+    step_body = step.json()
+    assert step_body["session_id"] == session_id
+    assert step_body["done"] is True
+    assert "task_answer" in step_body["info"]
+
+
 def test_space_snapshot_prefers_newer_evaluation_payload(tmp_path, monkeypatch):
     baseline_path = tmp_path / "baseline.json"
     evaluation_path = tmp_path / "evaluation.json"