kdcyberdude committed
Commit 9eebce3 · verified · 1 Parent(s): b6873b7

Upload folder using huggingface_hub
BUILD_NOTES.md CHANGED
@@ -192,6 +192,24 @@ When in doubt: check the endpoint schema returned by search_endpoints() — it s
 
 ---
 
+ ### 10. HAR Is the Agent's Only API Knowledge Source — No Catalog Fallback
+
+ **Status:** Design decision, locked
+ **Detail:** The `browser_agent` tool uses **only the HAR file** to build the agent's endpoint index and embeddings. The API catalogs (`catalogs/*.json`) are used exclusively by the judge for parameter-sourcing grading — they play no role in the training loop.
+
+ If a HAR yields very few endpoints, **the HAR recording needs to be improved**, not the code. The product does not patch sparse recordings by injecting catalog data into the agent's search corpus. This is intentional: the RL challenge is for the agent to discover and use APIs it has actually observed, not a curated ground-truth list.
+
+ **What goes where:**
+
+ | Data source | Who uses it | How |
+ |---|---|---|
+ | `hars/*.har` | Agent only | `browser_agent` → `search_endpoints` semantic search |
+ | `catalogs/*.json` | Judge only | Parameter-sourcing grading (`judge.py`) |
+
+ **Do not add catalog augmentation back** to `browser_agent.py` or `search_endpoints.py` under any circumstances. If the embed cache shows a large number of entries (e.g. 503 instead of 1), it means catalog entries leaked into the agent — clear the cache and fix the source.
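One cheap way to enforce this rule in CI is to diff the agent's index keys against the HAR itself. A sketch (function names and the index shape are hypothetical; the HAR fields follow the standard HAR 1.2 `log.entries` layout):

```python
from urllib.parse import urlsplit

def har_endpoint_set(har: dict) -> set:
    """Collect the (method, path) pairs actually observed in a HAR capture."""
    observed = set()
    for entry in har["log"]["entries"]:
        req = entry["request"]
        observed.add((req["method"], urlsplit(req["url"]).path))
    return observed

def assert_har_only(index_keys, har: dict) -> None:
    """Fail loudly if the agent's endpoint index contains anything
    never observed in the HAR (i.e. a catalog leak)."""
    leaked = set(index_keys) - har_endpoint_set(har)
    assert not leaked, f"catalog entries leaked into agent index: {sorted(leaked)}"
```

Run against the embed cache after every rebuild; a single leaked catalog entry fails the check.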
+
+ ---
+
 ## Non-Issues (Resolved in Design)
 
 - ~~`store_finding` / `get_findings` tools~~ — **Removed**. Value threading happens through episode `history`.
Dockerfile CHANGED
@@ -67,6 +67,9 @@ COPY --from=builder /app/env /app/env
 # Set PATH to use the virtual environment
 ENV PATH="/app/.venv/bin:$PATH"
 
+ # Enable Gradio web UI for manual testing
+ ENV ENABLE_WEB_INTERFACE=true
+
 # Set PYTHONPATH so imports work correctly
 ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
@@ -76,5 +79,4 @@ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
 
 # Run the FastAPI server
 # The module path is constructed to work with the /app/env structure
- ENV ENABLE_WEB_INTERFACE=true
 CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -15,38 +15,47 @@ base_path: /web
 
 # HARvestGym
 
- *Core idea: Trains LLMs to reverse-engineer and complete web tasks through raw HTTP APIs. No browser. No docs. Just a URL and a task.*
-
- ### Can a small model learn to explore the API surface of any web application — and complete real tasks through those APIs, without ever opening a browser?
-
- Web applications are full of APIs. Every click in a browser triggers an HTTP call with a precise schema, a specific authentication header, an exact sequence of prerequisites. **HARvestGym trains a small model to do all of that directly** — given a task and a URL, it discovers the relevant endpoints, understands what each one needs, chains the calls in the right order, and completes the task without any browser.
 
 The model starts with nothing: no schema, no documentation, no endpoint list. It uses tools to explore — issuing requests, inspecting responses, building up its own understanding of how the application works. This is what a developer does when they reverse-engineer an API. The model learns to do the same.
 
- Given a URL and a task string, the agent must discover which endpoints exist, figure out schemas and parameter dependencies, and execute the right sequence. Zero prior knowledge.
-
- ## What the Model (Policy) Is Learning
-
- Given: a natural language task + a live web application URL. No prior knowledge of the application.
-
- The model calls `browser_agent` first — this returns the list of API endpoints the browser used to complete the task. The model now has a map: it knows what endpoints exist. What it does not know:
 
 - which of those endpoints are actually needed for this specific task
 - in what order they must be called (you cannot add to a cart before the cart exists)
 - where each required parameter value comes from
 - how to re-authenticate if a session expires mid-episode
 
- The model must learn to:
-
- 1. **Discover endpoints** — by using a browser agent tool that completes the same task in a real browser while recording all network traffic, then filtering that traffic to extract only the meaningful application API calls (stripping out CDN requests, analytics, static assets). The browser agent runs once and generates the raw discovery data; the model uses this as its starting context.
- 2. **Select the right endpoints** — from the browser agent's list, identify the subset relevant to the current task (not every observed endpoint is needed)
- 3. **Sequence calls correctly** — determine the prerequisite order (create cart → find product → add item), including calls that must happen before others even though the task description doesn't say so
- 4. **Thread parameters** — this is the hardest part. APIs form a dependency graph:
-    - Some values come from a previous response (`cart_id` from step 1 → path param in step 3)
-    - Some values come from the authentication flow (`form_key`, `Bearer token` → header in every subsequent call)
-    - Some values come from the task description (`product name` → search query → `sku` → body of add-item call)
-    - The ground truth catalog defines these relationships precisely; the model learns to navigate them
- 5. **Handle auth and errors** — detect 401 / session-expired responses, re-authenticate, and continue; interpret 4xx errors and adjust the next call accordingly
 
 ---
 
@@ -61,275 +70,138 @@ The model must learn to:
 │ ▼ │
 │ ┌────────────────────────────────────────────────────────────────┐ │
 │ │ Policy Model (RL Agent) │ │
- │ │ small model — no prior knowledge of the app │ │
 │ │ │ │
- │ │ Observation: task + history + session_state + last_result │ │
 │ │ │ │
- │ │ Step 1 ──► browser_agent(task, url) │ │
- │ │ Step 2+ ──► search_endpoints(query) │ │
- │ │ ──► curl_exec(command) │ │
- │ │ ──► search_episode_data(query) │ │
- │ │ ──► done(result) │ │
 │ └────────┬───────────────────────────────────────────────────────┘ │
 │ │ │
- │ ┌──────┴──────────────────────────────┐
- │ │ │
- │ ▼ ▼
- │ ┌─────────────────────┐ ┌─────────────────────────────────────┐
- │ │ Browser Agent │ │ Environment │
- │ │ (step 1 only) │ │ │
- │ │ │ │ • Executes curl_exec via subprocess│
- │ │ Training: │ │ • Auto-injects session cookies │
- │ │ Load pre-recorded │ │ • Smart-truncates response bodies │
- │ │ cached HAR from │ │ • Indexes full responses into │
- │ │ disk or launch │ │ per-episode BM25 + GEMMA store │
- │ │ on real browser │ │ • Manages session_state: cookies, │
- │ │ │ │ CSRF tokens, auth headers │
- │ │ Inference: │ ──────────────┬──────────────────────┘
- │ │ Launch real browser│ │
- │ │ via Playwright + │ │ HTTP calls (always live)
- │ │ bu-30b-a3b-preview │ ▼
- │ │ │ ┌─────────────────────────────────────┐
- │ │ Both paths produce: │ │ WebArena EC2 (live apps) │
- │ │ • Filtered HAR │ │ │
- │ │ • OpenAPI-like spec│ │ :7770 Shopping (Magento 2) │
- │ │ • GEMMA embeddings │ │ :7780 Shopping Admin │
- │ │ for search_ │ │ :9999 Forum (Postmill) │
- │ │ endpoints() │ │ :8888 Wikipedia (Kiwix) │
- │ └─────────────────────┘ │ :3000 Map (OpenStreetMap) │
- │ └──────────────┬──────────────────────┘
- │ │
- │ │ episode trajectory
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ Deterministic Judge │
- │ │ │
- │ │ Per-template programmatic grader: │
- │ │ • Inspects episode trajectory │
- │ │ • Optionally probes live app state │
- │ │ • Verifies parameter sourcing │
- │ │ (TASK_SPEC / PREV_CALL / │
- │ │ AUTH_FLOW / STATIC / DERIVED) │
- │ │ • Scores [0.0 → 1.0] │
- │ └──────────────┬──────────────────────┘
- │ │
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ Reward Signal │
- │ │ │
- │ │ Per-step: │
- │ │ +0.2 valid API call (2xx) │
- │ │ +0.1 new path explored │
- │ │ +0.25 correct param sourcing │
- │ │ −0.15 repeated identical call │
- │ │ −0.3 browser_agent called again │
- │ │ │
- │ │ Episode end: │
- │ │ +2.0–+5.0 task complete (easy→hard)│
- │ │ −1.5 task failed │
- │ └──────────────┬──────────────────────┘
- │ │
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ GRPO (via HF TRL) │
- │ │ │
- │ │ 8 parallel rollouts per prompt │
- │ │ Computes advantages without │
- │ │ a value function │
- │ │ Updates policy weights │
- │ └─────────────────────────────────────┘
- │ │
- │ └──► updated Policy Model
 └─────────────────────────────────────────────────────────────────────────┘
 ```
 
146
- ### Data Flow: Browser Agent → Search Index → Execution
147
-
148
- ```
149
- HAR File (cached using Browser Agent) ──► filter_har_entries()
150
-
151
-
152
- drop: CDN, analytics, static assets
153
- keep: {method, path, request_body,
154
- response_body, status_code}
155
-
156
-
157
- extract_openapi_spec()
158
- → structured endpoint catalog
159
- {path, method, params, auth, response_fields}
160
-
161
- ┌──────┴──────┐
162
- │ │
163
- ▼ ▼
164
- build_GEMMA_embeddings return summary list
165
- (search_endpoints to RL agent:
166
- index — full schemas) [GET /products,
167
- POST /guest-carts, ...]
168
-
169
-
170
- search_endpoints("create guest cart")
171
- → top-3 endpoint schemas with:
172
- • path params + sources
173
- • body params + sources
174
- • auth requirements
175
- • response field names
176
- ```
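The filtering stage of this pipeline might look like the following sketch (field names follow the HAR 1.2 layout; the static-asset extension list is illustrative, not the project's actual list):

```python
from urllib.parse import urlsplit

# Illustrative static-asset extensions; the real filter list is an assumption here.
STATIC_EXTS = (".js", ".css", ".png", ".jpg", ".svg", ".woff", ".woff2", ".ico")

def filter_har_entries(entries, app_host):
    """Keep only the application's own API calls; drop CDN, analytics,
    fonts, and static-asset requests."""
    kept = []
    for e in entries:
        req, resp = e["request"], e["response"]
        url = urlsplit(req["url"])
        if url.hostname != app_host:
            continue  # third-party host: CDN, analytics, font service
        if url.path.lower().endswith(STATIC_EXTS):
            continue  # static asset bundle
        kept.append({
            "method": req["method"],
            "path": url.path,
            "request_body": (req.get("postData") or {}).get("text"),
            "response_body": (resp.get("content") or {}).get("text"),
            "status_code": resp["status"],
        })
    return kept
```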
-
- ### Episode Response Indexing
-
- ```
- curl_exec(command)
-   │
-   ├──► subprocess: execute against live EC2
-   │
-   ├──► index_full_response()
-   │      BM25 index  ── keyword match (IDs, SKUs, tokens)
-   │      GEMMA embed ── semantic match (paraphrases)
-   │      (indexes BEFORE truncation — all items stored)
-   │
-   └──► smart_truncate()
-          non-JSON HTML   → 3,000 chars
-          JSON primitive  → never truncated
-          error (4xx/5xx) → never truncated
-          small JSON      → returned as-is
-          large array     → first 2 items shown
-                            + _list_truncated annotation
-                            + hint to call search_episode_data()
- ```
-
- ### Parameter Dependency Graph (what the judge tracks)
-
- ```
- Task: "Add 'Radiant Tee' to a guest cart"
-
- ┌─────────────────────────────────────────────────────────┐
- │ TASK_SPEC ──────────────────────────────────────────┐ │
- │ "Radiant Tee" (product name) │ │
- │ │ │ │
- │ ▼ │ │
- │ GET /rest/V1/products?name=Radiant+Tee │ │
- │ → items[0].sku = "MH01" (PREV_CALL) ──┐ │ │
- │ │ │ │
- │ POST /rest/V1/guest-carts │ │ │
- │ → body = "cart-abc123" (PREV_CALL) ──┼──┼─►│
- │ │ │ │
- │ POST /rest/V1/guest-carts/{cartId}/items │ │ │
- │ path: cartId ◄────── "cart-abc123" ───────┘ │ │
- │ body: sku ◄────── "MH01" ─────────┘ │
- │ body: qty ◄────── TASK_SPEC (quantity) │
- │ body: quote_id ◄────── DERIVED (= cartId) │
- └─────────────────────────────────────────────────────────┘
-
- Source types tracked by the judge:
-   TASK_SPEC — value stated in the task string
-   PREV_CALL — value from a prior curl response in this episode
-   AUTH_FLOW — value from a session/token auth step
-   STATIC    — fixed application constant (e.g. store_id = 1)
-   DERIVED   — computed from another param (e.g. quote_id = cart_id)
- ```
-
- ### Curriculum: Complexity Tiers
-
- ```
- Easy ──────────────────────── graduate when P(success) > 0.7
- │ Single call, no auth │
- │ Templates 1, 2 │
- │ 1 API call required │
- │ ▼
- Medium ──────────────────────── graduate when P(success) > 0.7
- │ Auth + 1–2 dependent calls │
- │ Templates 3, 4 │
- │ 2–3 API calls required │
- │ ▼
- Hard ────────────────────────── final tier
-   Multi-step chain, full auth, ID threading
-   Templates 5, 6, 7
-   4–8+ API calls required
-   Reward scaling: ×2.5 vs Easy
- ```
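The graduation rule can be sketched as a rolling success-rate gate. The 0.7 threshold comes from the tiers above; the window size and class shape are assumptions for illustration:

```python
from collections import deque

TIERS = ["easy", "medium", "hard"]

class Curriculum:
    """Promote to the next tier once rolling P(success) exceeds the threshold."""
    def __init__(self, threshold=0.7, window=50):
        self.tier_idx = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling window of episode outcomes

    @property
    def tier(self):
        return TIERS[self.tier_idx]

    def record(self, success: bool):
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        if full and sum(self.results) / len(self.results) > self.threshold:
            if self.tier_idx < len(TIERS) - 1:
                self.tier_idx += 1
                self.results.clear()  # restart the window for the new tier
```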
-
- ### The RL Agent's Tool: Browser Agent
-
- The RL agent has access to a **browser agent tool** powered by [`browser-use/bu-30b-a3b-preview`](https://huggingface.co/browser-use/bu-30b-a3b-preview) — a 30B MoE vision-language model (3B active parameters) purpose-built for web task completion, served via the [browser-use](https://github.com/browser-use/browser-use) library on Playwright. When the RL agent calls this tool with a natural language task, the browser agent:
-
- 1. Opens the target application in a real browser
- 2. Completes the task by clicking, typing, and navigating — exactly as a human would
- 3. Intercepts all HTTP traffic via Playwright network events
- 4. Returns the intercepted traffic, filtered down to only the application's own API calls
-
- The filtering step strips analytics pings, CDN requests, font loads, and JS/CSS bundles, returning only `{method, path, request_body, response_body, status_code}` tuples for the app's actual API endpoints.
-
- **Training vs. inference — what gets cached:**
-
- - The browser agent output (filtered endpoint list) is pre-computed once per task and cached. During training, the RL model receives this cached result instantly — no live browser session runs.
- - The RL agent's own `curl_exec` calls **always hit the real live WebArena server** — during both training and inference. No API response is mocked or cached.
- - At inference, the browser agent runs live to handle novel tasks or changed application state.
-
- Full architecture and code: [BROWSER_AGENT.md](BROWSER_AGENT.md)
-
- ### Ground Truth: From the Codebase, Not the Browser
 
- The browser agent shows *what* API calls happen. It does not explain *why* — specifically, it does not document where each parameter comes from or what field constraints exist. That comes from the application codebase.
 
- For each WebArena application, we perform a one-time static analysis (using a large model against the Docker image source) to produce a **ground truth API catalog**: a precise, hard-coded document specifying:
 
- ```
- endpoint: POST /rest/V1/guest-carts/{cartId}/items
- method: POST
- auth: None (guest cart)
- path_params:
-   cartId: [string] obtained from: POST /rest/V1/guest-carts response body
- body:
-   cartItem.sku: [string] the product's SKU, from: GET /rest/V1/products → items[].sku
-   cartItem.qty: [number] quantity, from: task specification
-   cartItem.quote_id: [string] same as cartId
- ```
 
- This is what the judge compares against. The ground truth defines the complete parameter relationship graph for each application.
 
- Full extraction process: [GROUND_TRUTH_EXTRACTION.md](GROUND_TRUTH_EXTRACTION.md)
 
- ### The Training Loop
-
- ```
- Task (natural language) + App URL
-         │
-         ▼
- Policy Model (sees: task + history of all prior actions/results + session_state + findings)
-   │ calls tools to explore and execute
-   ├─► browser_agent(task, url) → filtered API call list (cached during training)
-   ├─► search_endpoints(query) → full schema for a specific endpoint
-   ├─► curl_exec(command) → execute HTTP call, get {status, headers, body}
-   ├─► search_episode_data(q) → search prior response bodies in this episode
-   └─► done(result) → declare task complete
-         │
-         ▼
- Live WebArena App (EC2) ←─── real HTTP responses (always live, never mocked)
-         │
-         ▼
- Judge (compares against ground truth API catalog)
-         │
-         ▼
- Reward Signal ──► GRPO ──► updated policy
- ```
 
- ---
 
- ## Target Applications
-
- All running on a single AWS EC2 instance. Real production software, no simulation.
-
- | App | Port | URL | Software |
- | --- | --- | --- | --- |
- | Shopping | 7770 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/) | Magento 2 open-source e-commerce platform |
- | Shopping Admin | 7780 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7780/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7780/) | Magento 2 Admin backend panel for the same store |
- | Forum | 9999 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:9999/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:9999/) | Postmill open-source Reddit-like link aggregation forum |
- | Wikipedia | 8888 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:8888/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:8888/) | Kiwix read-only offline mirror of English Wikipedia |
- | Map | 3000 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:3000/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:3000/) | OpenStreetMap open-source collaborative mapping platform |
-
- Source: [WebArena environment_docker](https://github.com/web-arena-x/webarena/tree/main/environment_docker)
 
 ---
 
@@ -343,170 +215,57 @@ What the model sees at each step:
 class Observation(BaseModel):
     task: str             # Natural language task
     app_base_url: str     # Root URL of the target application
-     last_tool_result: Any # Result of last tool call:
-                           #   search_endpoints → list of endpoint schema strings
-                           #   curl_exec → {status_code, headers, body (smart-truncated)}
-                           #   search_episode_data → list of matching JSON object strings
-     history: list[dict]   # Full episode trajectory: list of {action, tool_result} pairs
-                           # from all prior steps. The model sees what it already tried,
-                           # enabling value threading (read a cart_id from step 2's response
-                           # and use it in step 5's curl call) and loop avoidance.
-     session_state: dict   # Auto-managed by environment: cookies, tokens, CSRF values
-                           # extracted from all prior HTTP Set-Cookie and response bodies
-                           # e.g. {"PHPSESSID": "abc", "form_key": "xyz", "cart_id": "123"}
     step_count: int
-     max_steps: int        # 20
- ```
-
- `session_state` is maintained by the environment. The model never parses `Set-Cookie` headers — the environment extracts tokens automatically and makes them available. The model decides *when* to authenticate and *which* session values to use; the environment handles *extraction*.
-
- **curl execution:** The agent outputs a curl command string. The environment parses it and executes it via subprocess against the live EC2 server — the agent machine never has a direct network connection to WebArena. The environment also injects cookies from `session_state` automatically before each call.
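That execution path can be sketched as follows (helper name is hypothetical; the real environment also normalizes URLs and applies timeouts before running the command via subprocess):

```python
import shlex

def inject_session(curl_cmd: str, session_state: dict) -> list:
    """Parse the agent's curl string and append a Cookie header
    built from the environment-managed session_state."""
    argv = shlex.split(curl_cmd)
    if not argv or argv[0] != "curl":
        raise ValueError("malformed curl command")  # would trigger the -0.1 penalty
    cookies = "; ".join(f"{k}={v}" for k, v in session_state.items())
    if cookies:
        argv += ["-H", f"Cookie: {cookies}"]
    return argv

# Execution against the live server would then be roughly:
# subprocess.run(inject_session(cmd, state), capture_output=True, timeout=30)
```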
-
- **Response truncation — smart array truncation, not byte cutoff:** HTTP response bodies are processed by a pure Python function before being returned to the model. Rules applied in order:
-
- 1. **Non-JSON body** (HTML, CSS, JS, plain text): truncate to 3,000 characters. HTML from form-serving pages (login, post creation) is kept longer than pure prose because CSRF tokens and `<input>` fields are embedded inside the markup and the model needs to locate them. See the [HTML / Form-Submission Handling](#html--form-submission-handling) section below for how the model is expected to work with HTML responses.
- 2. **JSON primitive** (string, number, boolean): never truncated — these are tokens, IDs, confirmations.
- 3. **Error response (4xx / 5xx)**: never truncated — the model needs every word to self-correct.
- 4. **JSON object or array with no large arrays** (< 3 dict items per array): returned as-is.
- 5. **JSON with a large array field** (≥ 3 dict items): keep first 2 items, drop the rest, and add a `_list_truncated` annotation:
-
- ```json
- {
-   "items": [
-     {"sku": "MH01", "name": "Radiant Tee", "price": 22.0},
-     {"sku": "MH02", "name": "Breathe-Easy Tank", "price": 34.0}
-   ],
-   "_list_truncated": {
-     "field": "items",
-     "shown": 2,
-     "total": 50,
-     "note": "Showing 2 of 50 items. Use search_episode_data() to find a specific item from this response."
-   }
- }
- ```
- ```
387
-
388
- **Episode response indexing:** Every `curl_exec` call indexes the full request and response bodies into a per-episode hybrid index (BM25 for keyword matching + GEMMA semantic embeddings for paraphrase handling). When a list is truncated, all items (not just the 2 shown) are indexed. The model can retrieve any specific object using `search_episode_data("keyword or natural language query")` without needing a filtered API endpoint to exist. See `TOOLS.md` for the full indexing algorithm.
389
-
390
- ### Action Space
391
-
392
- The model outputs a single tool call per step. Full technical specifications for all tools (document construction, truncation implementation, index architecture, caveats) are in `[TOOLS.md](./TOOLS.md)`.
393
-
394
-
395
- | Tool | Input | What It Does | Output |
396
- | ---------------------------- | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
397
- | `browser_agent(task, url)` | Task string + app base URL | Checks for pre-recorded HAR; if found, processes it — otherwise launches live browser to perform task and record traffic. Extracts OpenAPI-like spec, builds GEMMA embeddings for search. | Summary list of API endpoint names + methods (e.g. `GET /products`). No schemas/headers. Use `search_endpoints()` for details. |
398
- | `search_endpoints(query)` | Natural language query | Semantic search over GEMMA-embedded endpoint spec built by `browser_agent`. Returns full parameter details for matching endpoints. | Top-3 endpoint schemas (method, path, auth, params with sources, response fields) |
399
- | `curl_exec(command)` | Full curl command string | Executes HTTP call against live EC2 server, indexes full response into episode BM25 store, returns truncated observation. | `{status_code, headers, body}` — body smart-truncated; full body indexed to episode store |
400
- | `search_episode_data(query)` | Keyword or natural language query | Hybrid BM25 + GEMMA semantic search over all request/response bodies from prior `curl_exec` calls in this episode. | Top-5 JSON objects from this episode's request/response history |
401
- | `done(result?)` | Optional result string | Signals task complete, triggers judge evaluation. | Ends episode |
402
-
403
-
404
- `browser_agent` is called **exactly once per episode at step 1**. During training, it loads a cached pre-recorded HAR file(if available); at inference, it will launch a live browser session. It returns the deduplicated list of API endpoint patterns observed in the network traffic. **If called again after step 1, the call executes normally but a −0.3 penalty is applied to the reward.** `search_endpoints` then provides the full schema for any specific endpoint the model wants to call — searching the GEMMA embeddings built by `browser_agent` from the HAR data.
405
-
406
- `curl_exec` is the primary HTTP action — one string that encodes method, URL, headers, and body together, exactly as API documentation is written. This lets the model leverage its pretrained knowledge of `curl` syntax while producing calls that are self-documenting.
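The ranking inside `search_endpoints` amounts to embedding similarity over schema strings. A deterministic toy stand-in (a term-count vector replaces the GEMMA embedding, cosine similarity and the top-3 cutoff match the description):

```python
import math, re

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def embed(text, vocab):
    """Toy stand-in for a GEMMA embedding: term-count vector over a vocab."""
    toks = tokenize(text)
    return [toks.count(w) for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.hypot(*a) * math.hypot(*b)
    return num / den if den else 0.0

def search_endpoints(query, schemas, k=3):
    """Rank endpoint schema strings by similarity to the query, return top-k."""
    vocab = sorted({t for s in schemas for t in tokenize(s)} | set(tokenize(query)))
    qv = embed(query, vocab)
    return sorted(schemas, key=lambda s: -cosine(qv, embed(s, vocab)))[:k]
```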
-
- ```bash
- # Step 1 — Discover which endpoint creates a guest cart
- # (model calls search_endpoints first, sees: POST /rest/V1/guest-carts)
-
- # Step 2 — Create guest cart
- curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'
- # → body: "cart-abc123" (plain string — never truncated)
-
- # Step 3 — Find the product SKU (list response, truncated to 2 items + note)
- curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&searchCriteria[filter_groups][0][filters][0][value]=Radiant+Tee'
- # → body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}],"total_count":1}
- # (1 item — not truncated; if 200 items, all 200 indexed, 2 shown in context)
-
- # Step 4 — Add item (model reads cart-abc123 from step 2, MH01 from step 3 — all in history)
- curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' \
-   -H 'Content-Type: application/json' \
-   -d '{"cartItem":{"sku":"MH01","qty":1,"quote_id":"cart-abc123"}}'
- ```
-
- Values from prior responses (cart IDs, SKUs, tokens) are threaded directly from the growing episode history. `session_state` tokens (cookies, CSRF values) are auto-injected by the environment. If a list response was truncated and the model needs a specific item not shown in the 2-item sample, it calls `search_episode_data("Radiant Tee sku")` — all 200 items are indexed, even though only 2 were shown in context.
-
- ### Prompt Structure
-
 ```
- SYSTEM: You are an API agent. Complete the task using only the tools available:
-   browser_agent, search_endpoints, curl_exec, search_episode_data, done.
-   When a response is HTML, look for JSON data embedded in <script> tags or
-   extract values from <input> fields. CSRF tokens appear as hidden inputs:
-   <input type="hidden" name="_csrf_token" value="XYZ">
-
- TASK: Add "Radiant Tee" to a guest cart at http://ec2-16-59-2-56.../
-
- [session_state: {}]
-
- STEP 1 ACTION: browser_agent("Add Radiant Tee to a guest cart", "http://ec2-...:7770/")
- STEP 1 RESULT: {"app": "shopping", "endpoints": [
-   "POST /rest/V1/guest-carts",
-   "GET /rest/V1/products",
-   "POST /rest/V1/guest-carts/{id}/items",
-   ...
- ], "note": "Use search_endpoints() to get full schema for any of these."}
-
- STEP 2 ACTION: search_endpoints("create guest cart")
- STEP 2 RESULT: ["endpoint: POST /rest/V1/guest-carts | auth: none | returns: string (cartId)", ...]
-
- STEP 3 ACTION: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'")
- STEP 3 RESULT: {status_code: 200, body: "cart-abc123"}
 
- STEP 4 ACTION: search_endpoints("find product by name get sku")
- STEP 4 RESULT: ["endpoint: GET /rest/V1/products | query: searchCriteria filters | returns: .items[].sku .items[].name", ...]
 
- STEP 5 ACTION: curl_exec("curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&searchCriteria[filter_groups][0][filters][0][value]=Radiant+Tee'")
- STEP 5 RESULT: {status_code: 200, body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}],"total_count":1}}
 
- STEP 6 ACTION: search_endpoints("add item to guest cart cartId")
- STEP 6 RESULT: ["endpoint: POST /rest/V1/guest-carts/{cartId}/items | path: cartId from POST /rest/V1/guest-carts | body: cartItem.sku, cartItem.qty, cartItem.quote_id (same as cartId)", ...]
 
- STEP 7 ACTION: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' -H 'Content-Type: application/json' -d '{\"cartItem\":{\"sku\":\"MH01\",\"qty\":1,\"quote_id\":\"cart-abc123\"}}'")
- STEP 7 RESULT: {status_code: 200, body: {"item_id": 5, "sku": "MH01", "qty": 1}}
-
- → generate STEP 8: done("Radiant Tee added to cart")
- ```
 
- `browser_agent` at step 1 gives the model the full endpoint landscape upfront — it can see `/rest/V1/guest-carts` and `/rest/V1/products` immediately and plan the call sequence before making any HTTP calls. `search_endpoints` fills in the exact parameter schemas. Value threading (`"MH01"`, `"cart-abc123"`) happens through the growing history — if step 5 had returned 200 products truncated to 2, the model would call `search_episode_data("Radiant Tee sku")` to retrieve `MH01` from the episode index.
 
- ### Parameter Relationship Graph (What the Judge Knows)
-
- The judge holds a complete dependency map for each task:
-
- ```
- Parameter Source Types:
-   TASK_SPEC — value given directly in the task (e.g., "product #42")
-   PREV_CALL — value from a prior API response in this episode
-   AUTH_FLOW — value obtained during authentication (session token, CSRF key)
-   STATIC    — fixed value known from the application (e.g., store_id = 1)
-   DERIVED   — computed from another value (e.g., cart_id = quote_id)
- ```
-
- For each task, the judge knows which parameters fall into which category, and whether the model correctly sourced each value. This is how partial credit works — the model gets reward for correctly threading a `cart_id` even if the final call had a wrong field elsewhere.
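A sketch of how that partial credit might be computed, assuming the catalog stores an expected source type per parameter and the judge has already traced where each value in the agent's call actually came from (all names are hypothetical):

```python
# Expected sources for one endpoint, as a ground-truth catalog might record them.
EXPECTED = {
    "cartId": "PREV_CALL",
    "cartItem.sku": "PREV_CALL",
    "cartItem.qty": "TASK_SPEC",
    "cartItem.quote_id": "DERIVED",
}

def grade_param_sourcing(traced_sources, expected=EXPECTED, credit=0.25):
    """Award +0.25 per parameter whose value came from the correct source type."""
    correct = [p for p, src in traced_sources.items() if expected.get(p) == src]
    return credit * len(correct), correct
```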
 
 ### Reward Space
 
 **Per-step:**
 
- | Signal | Value | Trigger |
- | --- | --- | --- |
- | Valid API call (2xx) | +0.2 | `curl_exec` returns 2xx status |
- | New path called this episode | +0.1 | `curl_exec` normalized path not called before in this episode — discourages looping on one endpoint |
- | Correct parameter sourcing | +0.25 | judge: value in curl call came from the correct source type |
- | Session value correctly used | +0.1 | auth token/cookie present and correct in curl call |
- | Repeated identical call | −0.15 | exact duplicate curl command issued twice |
- | browser_agent called again | −0.3 | `browser_agent` called after step 1 — call executes normally, penalty applied to reward |
- | Malformed curl command | −0.1 | curl cannot be parsed or executed by the environment |
- | 4xx response (recoverable) | −0.05 | call failed but episode continues |
-
- Note: `search_endpoints`, `search_episode_data`, and `done` carry no direct per-step reward. Using `search_endpoints` to find the correct schema is indirectly rewarded by enabling correct parameter sourcing (+0.25) in the curl call that follows. `search_episode_data` is indirectly rewarded by allowing the model to retrieve the correct value to place in the next curl command.
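The per-step table can be collapsed into a single scoring function over one step's outcome. A sketch (argument names are hypothetical; the values are taken from the table):

```python
def step_reward(status_code, path, seen_paths, command, seen_commands,
                correct_sourcing=0, session_ok=False, malformed=False,
                browser_agent_repeat=False):
    """Combine the per-step signals from the reward table into one scalar."""
    if malformed:
        return -0.1                     # unparseable curl command
    r = 0.0
    if browser_agent_repeat:
        r -= 0.3                        # browser_agent called after step 1
    if command in seen_commands:
        r -= 0.15                       # exact duplicate curl command
    if 200 <= status_code < 300:
        r += 0.2                        # valid API call
        if path not in seen_paths:
            r += 0.1                    # new normalized path this episode
    elif 400 <= status_code < 500:
        r -= 0.05                       # recoverable client error
    r += 0.25 * correct_sourcing        # judge-verified parameter sources
    if session_ok:
        r += 0.1                        # auth token/cookie correctly used
    return r
```

The best possible single step (valid call, new path, one correct source, session used) scores 0.65, which is the per-step term in the ceiling argument below the episode-end table.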
 
 **Episode end:**
 
 | Outcome | Reward |
 | ----------------------------------------------------------- | ------------------------------------------ |
 | Task completed correctly | +2.0 to +5.0 (scales with difficulty tier) |
@@ -514,139 +273,148 @@ Note: `search_endpoints`, `search_episode_data`, and `done` carry no direct per-
514
  | Authentication correctly obtained (even if task fails) | +0.3 |
515
  | Timeout / task failed entirely | −1.5 |
516
 
517
-
518
  Target signal separation: successful episodes `+3` to `+7`, failed episodes `−2` to `−1`. Required for GRPO.
519
 
- > **Reward design insight:** Pure step-level rewards can teach a model to "look busy" — accumulating +0.2 (valid call) and +0.1 (new path) rewards while never converging to task completion. To prevent this, the terminal outcome reward must dominate the sum of all per-step rewards. Two mechanisms enforce this:
- >
- > 1. **Hard ceiling on step rewards per episode.** Maximum achievable per-step reward over 20 steps is bounded: `20 × (0.2 + 0.1 + 0.25 + 0.1) = 13`. But a failed episode still ends at `−1.5`, so any correct episode completion still produces a substantially better total.
- > 2. **Curriculum learning as the primary defense.** Easy tasks (Template 1: single GET, no auth) have a trivially short optimal path (2 steps). There is no room to accumulate "fake" exploration reward when the optimal episode only needs 2 calls. The model learns that the terminal reward is the only thing that matters before it encounters tasks long enough to be gamed. Medium and Hard tiers are introduced only after the model reliably solves Easy — by then the behavior pattern is already anchored. This mirrors how SWE-gym-style environments scale difficulty: start simple enough that the reward signal is unambiguous, then broaden.
- >
- > **Premature `done()` penalty:** If the judge scores the final state as incorrect (task not completed), the episode ends at `−1.5`. There is no bonus for calling `done()` early — it is strictly worse than continuing to make correct API calls. The model only benefits from calling `done()` when the task is actually complete.
-
- **Reset behavior:** `reset()` clears session state, episode history, episode BM25 index, step counter. It does not reset the remote application database. The judge evaluates relative state (did the cart contain the item?), not absolute state (is the DB row count exactly N?).

  ---

- ## HTML / Form-Submission Handling

- Not every endpoint in the target applications returns JSON. The Forum (Postmill) and Wikipedia (Kiwix) applications rely on HTML form submissions and HTML responses respectively. The agent is designed to handle both transparently.

- ### Why This Matters

- A generalizable API agent must work with the full spectrum of web interfaces, not just REST JSON endpoints. Form-based POST submissions (with CSRF tokens, multipart bodies, URL-encoded fields) are ubiquitous in real web applications. Training on them is intentional: the model learns to identify the correct request format from context rather than assuming JSON everywhere.

- ### CSRF Token Extraction

- Postmill protects state-changing routes (login, post creation) with a per-session CSRF token. This token is embedded as a hidden `<input>` field in the HTML form:

- ```html
- <input type="hidden" name="_csrf_token" value="abc123XYZ">
  ```

- **How the model handles this (no dedicated CSRF tool needed):**

- 1. The model issues a GET to the form page (e.g., `GET /login`).
- 2. The environment returns the HTML body, truncated to 3,000 characters (raised from 1,000 specifically to ensure hidden input fields near the end of small forms are included).
- 3. The model reads the `value` attribute of `input[name="_csrf_token"]` directly from the returned HTML string. HTML parsing is not required — the token appears as a predictable plain-text pattern in the markup.
- 4. The model places the extracted token into the subsequent POST body or form field.
- 5. The environment auto-extracts any `Set-Cookie` header from the login response into `session_state`, so subsequent requests are automatically authenticated.

- If the CSRF token is positioned after the 3,000-character cutoff (possible in very large rendered pages), the model can call `search_episode_data("_csrf_token")` — the full HTML body is indexed into the episode store before truncation, making the token retrievable by keyword search.

- ```bash
- # Forum login flow
- curl -X POST 'http://ec2-.../login' \
-   -H 'Content-Type: application/x-www-form-urlencoded' \
-   -d '_csrf_token=abc123XYZ&_username=user&_password=pass'
- # → 302 redirect + Set-Cookie: PHPSESSID=... (auto-injected into session_state)
-
- # Forum post creation
- curl -X POST 'http://ec2-.../f/general/submit' \
-   -H 'Content-Type: application/x-www-form-urlencoded' \
-   -d '_csrf_token=abc123XYZ&title=My+Post&body=Hello+World'
- ```

- ### Wikipedia / HTML-Only Responses

- Kiwix serves static HTML pages — there is no JSON API. The agent treats Wikipedia responses as structured text: search results appear in `<a href>` anchor tags; article content is in `<p>` tags.

- The environment wraps the truncated HTML response in a lightweight JSON envelope before returning it to the model, so the observation format is always `{status_code, headers, body}` regardless of content type:

- ```json
- {
-   "status_code": 200,
-   "headers": {"Content-Type": "text/html"},
-   "body": "<html>...<ul class='mw-search-results'><li><a href='/wiki/Mars'>Mars</a>...</ul>..."
- }
  ```

- For Template 2 ("Retrieve article summary for `{title}`"), task completion is verified by confirming the correct article URL was fetched and returned HTTP 200 — not by parsing article content. This makes the grader robust to HTML structure changes.

- ### Form vs. JSON Detection

- `curl_exec` detects whether a request is form-encoded or JSON by inspecting the `Content-Type` header in the curl command string:

- - `Content-Type: application/json` → body is JSON, response indexed as JSON
- - `Content-Type: application/x-www-form-urlencoded` or `multipart/form-data` → body is form data, response indexed as text
- - No `Content-Type` (GET requests) → response indexed based on `Content-Type` of the response

- The model is responsible for setting the correct `Content-Type` in its curl command. The system prompt includes explicit guidance on when to use each.
 
- ---

- ## Tasks

- HARvestGym trains on **7 task templates** rather than a larger flat task list. Each template is a parameterized scenario: one reward function, one ground truth catalog entry, one grader — but potentially hundreds of distinct episode variations produced by substituting different values for the template slots (`{product_name}`, `{category_name}`, etc.).

- If training goes smoothly, this can be scaled into automatic task creation that covers every variation of a task.

- **How template parameters are populated:** Before training, a one-time data prep step calls the application's own listing APIs and builds a static **parameter pool** for each template (see [parameter_pools.json](parameter_pools.json), refreshed via [scripts/build_parameter_pools.py](scripts/build_parameter_pools.py)):

- | Template slot | Source |
- | ----------------------------- | --------------------------------------------------------------- |
- | `{category_name}` | `GET /rest/V1/categories` — all leaf category names |
- | `{product_name}` | `GET /rest/V1/products?pageSize=200` — all product names + SKUs |
- | `{forum_category}` | Forum's category listing API |
- | `{title}`, `{sku}`, `{price}` | Generated or sampled from existing product names |

- Each episode samples randomly from its pool. The model never sees the pool directly — it gets the task string (e.g., `"Add 'Radiant Tee' to a guest cart"`) and must discover the correct endpoint + SKU through its own API calls.

- ### Complexity Tiers

- Templates are organized into **complexity tiers** for curriculum training — the model only graduates to harder templates once it reliably solves easier ones:

- | Tier   | Characteristic                                | API calls required |
- | ------ | --------------------------------------------- | ------------------ |
- | Easy   | Single call, no auth                          | 1                  |
- | Medium | Auth + 1–2 dependent calls                    | 2–3                |
- | Hard   | Multi-step chain with ID threading, full auth | 4–8+               |

- ### Task Templates

- | #   | Tier   | App            | Template                                               | Key Challenge                                           |
- | --- | ------ | -------------- | ------------------------------------------------------ | ------------------------------------------------------- |
- | 1   | Easy   | Shopping       | List products in category `{category_name}`            | Single GET with query params                            |
- | 2   | Easy   | Wikipedia      | Retrieve article summary for `{title}`                 | Single GET, path parameter resolution                   |
- | 3   | Medium | Shopping       | Add `{product_name}` to a guest cart                   | 2 calls: create cart → add item; ID threading           |
- | 4   | Medium | Forum          | Retrieve all posts in `{forum_category}` (authed)      | Login → extract session → GET                           |
- | 5   | Hard   | Forum          | Create a post titled `{title}` in `{category}`         | Login → extract CSRF `form_key` → POST with full schema |
- | 6   | Hard   | Shopping       | Guest checkout for `{product_name}`                    | 5+ chained calls; cart → item → shipping → payment      |
- | 7   | Hard   | Shopping Admin | Create a new product with SKU `{sku}`, price `{price}` | Admin bearer token → full Magento product schema        |

- Each task has a deterministic programmatic grader (score in `[0.0, 1.0]`):
-
- - **Easy graders**: check HTTP response body for expected values
- - **Medium graders**: probe application state after episode (e.g., fetch the cart, verify item is present)
- - **Hard graders**: verify multi-step state change in the application (e.g., post exists, checkout created)

- **On optional request parameters:** API responses and real network traffic often contain extra headers and parameters (`X-Requested-With`, `Cache-Control`, correlation IDs, etc.) that are not functionally required. The judge scores only on *required* parameters. Extra or missing optional headers or body params do not affect the reward signal.

  ---
  # HARvestGym

+ ### Can a small model learn to reverse-engineer any web application's API and complete real tasks through those APIs, without ever opening a browser?

+ Web applications are full of APIs. Every click in a browser triggers an HTTP call with a precise schema, a specific authentication header, an exact sequence of prerequisites. **HARvestGym trains a small model to do all of that directly**: given a task and a URL, it discovers the relevant endpoints, figures out what each one needs, chains the calls in the right order, and completes the task without any browser.

  The model starts with nothing: no schema, no documentation, no endpoint list. It uses tools to explore — issuing requests, inspecting responses, building up its own understanding of how the application works. This is what a developer does when they reverse-engineer an API. The model learns to do the same.

+ ---

+ ## How It Works

+ ```
+ Task + App URL
+      │
+      ▼
+ Policy Model (RL Agent)
+   small model — no prior knowledge of the app
+
+   Step 1  ──► browser_agent(task, url)   → filtered API endpoint list
+   Step 2+ ──► search_endpoints(query)    → full schema for a specific endpoint
+           ──► curl_exec(command)         → execute HTTP call, get response
+           ──► search_episode_data(query) → search prior response bodies
+           ──► done(result)               → declare task complete
+      │
+      ▼
+ Live WebArena Apps (EC2)  ←── real HTTP responses (always live, never mocked)
+      │
+      ▼
+ Deterministic Judge (compares against ground truth API catalog)
+      │
+      ▼
+ Reward Signal ──► GRPO ──► updated policy
+ ```

+ The agent calls `browser_agent` once at the start — this runs a real browser to complete the same task while recording all network traffic, then returns the filtered list of API endpoints observed. The agent now has a map of what endpoints exist. What it does *not* know:

  - which of those endpoints are actually needed for this specific task
  - in what order they must be called (you cannot add to a cart before the cart exists)
  - where each required parameter value comes from
  - how to re-authenticate if a session expires mid-episode

+ The model must learn to discover all of this on its own.

  ---
  │                                  ▼                                      │
  │ ┌────────────────────────────────────────────────────────────────┐     │
  │ │ Policy Model (RL Agent)                                        │     │
+ │ │ small model — no prior knowledge of the app                    │     │
  │ │                                                                │     │
+ │ │ Observation: task + history + session_state + last_result      │     │
  │ │                                                                │     │
+ │ │ Step 1  ──► browser_agent(task, url)                           │     │
+ │ │ Step 2+ ──► search_endpoints(query)                            │     │
+ │ │         ──► curl_exec(command)                                 │     │
+ │ │         ──► search_episode_data(query)                         │     │
+ │ │         ──► done(result)                                       │     │
  │ └────────┬───────────────────────────────────────────────────────┘     │
  │          │                                                             │
+ │   ┌──────┴──────────────────────────────┐
+ │   │                                     │
+ │   ▼                                     ▼
+ │ ┌─────────────────────┐   ┌─────────────────────────────────────┐
+ │ │ Browser Agent       │   │ Environment                         │
+ │ │ (step 1 only)       │   │                                     │
+ │ │                     │   │ • Executes curl_exec via subprocess │
+ │ │ Training:           │   │ • Auto-injects session cookies      │
+ │ │  Load pre-recorded  │   │ • Smart-truncates response bodies   │
+ │ │  cached HAR from    │   │ • Indexes full responses into       │
+ │ │  disk or launch     │   │   per-episode BM25 + GEMMA store    │
+ │ │  on real browser    │   │ • Manages session_state: cookies,   │
+ │ │                     │   │   CSRF tokens, auth headers         │
+ │ │ Inference:          │   └──────────────┬──────────────────────┘
+ │ │  Launch real browser│                  │
+ │ │  via Playwright +   │                  │ HTTP calls (always live)
+ │ │  bu-30b-a3b-preview │                  ▼
+ │ │                     │   ┌─────────────────────────────────────┐
+ │ │ Both paths produce: │   │ WebArena EC2 (live apps)            │
+ │ │ • Filtered HAR      │   │                                     │
+ │ │ • OpenAPI-like spec │   │ :7770 Shopping (Magento 2)          │
+ │ │ • GEMMA embeddings  │   │ :7780 Shopping Admin                │
+ │ │   for search_       │   │ :9999 Forum (Postmill)              │
+ │ │   endpoints()       │   │ :8888 Wikipedia (Kiwix)             │
+ │ └─────────────────────┘   │ :3000 Map (OpenStreetMap)           │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          │ episode trajectory
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ Deterministic Judge                 │
+ │                           │                                     │
+ │                           │ Per-template programmatic grader:   │
+ │                           │ • Inspects episode trajectory       │
+ │                           │ • Optionally probes live app state  │
+ │                           │ • Verifies parameter sourcing       │
+ │                           │   (TASK_SPEC / PREV_CALL /          │
+ │                           │    AUTH_FLOW / STATIC / DERIVED)    │
+ │                           │ • Scores [0.0 → 1.0]                │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ Reward Signal                       │
+ │                           │                                     │
+ │                           │ Per-step:                           │
+ │                           │  +0.2  valid API call (2xx)         │
+ │                           │  +0.1  new path explored            │
+ │                           │  +0.25 correct param sourcing       │
+ │                           │  −0.15 repeated identical call      │
+ │                           │  −0.3  browser_agent called again   │
+ │                           │                                     │
+ │                           │ Episode end:                        │
+ │                           │  +2.0–+5.0 task complete (easy→hard)│
+ │                           │  −1.5      task failed              │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ GRPO (via HF TRL)                   │
+ │                           │                                     │
+ │                           │ 8 parallel rollouts per prompt      │
+ │                           │ Computes advantages without         │
+ │                           │ a value function                    │
+ │                           │ Updates policy weights              │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          └──► updated Policy Model
  └─────────────────────────────────────────────────────────────────────────┘
  ```

+ ---

 
+ ## Target Applications

+ All running on a single AWS EC2 instance: real production software, no simulation.

+ | App            | Port | Software                                           |
+ | -------------- | ---- | -------------------------------------------------- |
+ | Shopping       | 7770 | Magento 2 — open-source e-commerce platform        |
+ | Shopping Admin | 7780 | Magento 2 Admin — backend panel for the same store |
+ | Forum          | 9999 | Postmill — open-source Reddit-like forum           |
+ | Wikipedia      | 8888 | Kiwix — read-only offline mirror of Wikipedia      |
+ | Map            | 3000 | OpenStreetMap — collaborative mapping platform     |

+ Source: [WebArena environment_docker](https://github.com/web-arena-x/webarena/tree/main/environment_docker)

+ ---

+ ## Tasks

+ HARvestGym trains on **7 task templates** across three complexity tiers. Each template is a parameterized scenario: one reward function, one ground truth catalog entry, one grader — but potentially hundreds of distinct episode variations produced by substituting different values for the template slots (`{product_name}`, `{category_name}`, etc.).

+ ### Complexity Tiers

+ | Tier   | Characteristic                                | API calls required |
+ | ------ | --------------------------------------------- | ------------------ |
+ | Easy   | Single call, no auth                          | 1                  |
+ | Medium | Auth + 1–2 dependent calls                    | 2–3                |
+ | Hard   | Multi-step chain with ID threading, full auth | 4–8+               |

+ The model only graduates to harder templates once it reliably solves easier ones.

+ ### Task Templates

+ | #   | Tier   | App            | Template                                               | Key Challenge                                           |
+ | --- | ------ | -------------- | ------------------------------------------------------ | ------------------------------------------------------- |
+ | 1   | Easy   | Shopping       | List products in category `{category_name}`            | Single GET with query params                            |
+ | 2   | Easy   | Wikipedia      | Retrieve article summary for `{title}`                 | Single GET, path parameter resolution                   |
+ | 3   | Medium | Shopping       | Add `{product_name}` to a guest cart                   | 2 calls: create cart → add item; ID threading           |
+ | 4   | Medium | Forum          | Retrieve all posts in `{forum_category}` (authed)      | Login → extract session → GET                           |
+ | 5   | Hard   | Forum          | Create a post titled `{title}` in `{category}`         | Login → extract CSRF `form_key` → POST with full schema |
+ | 6   | Hard   | Shopping       | Guest checkout for `{product_name}`                    | 5+ chained calls; cart → item → shipping → payment      |
+ | 7   | Hard   | Shopping Admin | Create a new product with SKU `{sku}`, price `{price}` | Admin bearer token → full Magento product schema        |

+ **Template parameters** are populated from a static parameter pool built by querying the live applications before training (see `parameter_pools.json`, refreshed via `scripts/build_parameter_pools.py`). Each episode samples randomly from its pool — the model never sees the pool directly; it must discover the correct values through its own API calls.
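The sampling step can be sketched as follows — the pool shape and the `instantiate` helper are hypothetical, since this README does not show the actual schema of `parameter_pools.json`:

```python
import random

# Hypothetical pool shape — the real parameter_pools.json schema is not shown here.
POOLS = {
    "medium_cart": {
        "product_name": ["Radiant Tee", "Camera Backpack", "Flannel Jacket"],
    },
}

def instantiate(template: str, pool: dict[str, list[str]], rng: random.Random) -> str:
    """Fill each {slot} in a task template with a random value from its pool."""
    return template.format(**{slot: rng.choice(values) for slot, values in pool.items()})
```

For example, `instantiate("Add '{product_name}' to a guest cart", POOLS["medium_cart"], random.Random(0))` yields one concrete episode task string.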

+ Each task has a deterministic programmatic grader (score in `[0.0, 1.0]`):
+ - **Easy graders**: check HTTP response body for expected values
+ - **Medium graders**: probe application state after episode (e.g., fetch the cart, verify item is present)
+ - **Hard graders**: verify multi-step state change in the application (e.g., post exists, checkout created)
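A medium-tier grader can be as small as a membership check over the probed cart state — a sketch with an assumed function name and binary scoring:

```python
def grade_medium_cart(cart_items: list[dict], expected_sku: str) -> float:
    """Score 1.0 if the expected SKU is present in the probed cart, else 0.0.

    Sketch only: the real judge probes the live application after the episode
    and may award partial credit for correct intermediate steps.
    """
    return 1.0 if any(item.get("sku") == expected_sku for item in cart_items) else 0.0
```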
 
  ---

  class Observation(BaseModel):
      task: str              # Natural language task
      app_base_url: str      # Root URL of the target application
+     last_tool_result: Any  # Result of last tool call
+     history: list[dict]    # Full episode trajectory: [{action, tool_result}, ...]
+     session_state: dict    # Auto-managed: cookies, tokens, CSRF values
      step_count: int
+     max_steps: int         # 20
  ```
+ `session_state` is maintained by the environment: the model decides *when* to authenticate and *which* session values to use; the environment handles *extraction* from `Set-Cookie` headers and response bodies.
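The Set-Cookie half of that extraction can be sketched with the standard library — the helper name and state shape are assumptions, not the environment's actual code:

```python
from http.cookies import SimpleCookie

def update_session_state(session_state: dict, response_headers: dict) -> dict:
    """Fold a response's Set-Cookie header into the session state.

    Hypothetical helper: sketches only the cookie half of the extraction;
    harvesting tokens/CSRF values from response bodies is omitted.
    """
    raw = response_headers.get("Set-Cookie")
    if raw:
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            session_state.setdefault("cookies", {})[name] = morsel.value
    return session_state
```

After the forum login's 302 response, a header like `Set-Cookie: PHPSESSID=...` would land under `session_state["cookies"]` and be auto-injected into later curl calls.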

+ **Response truncation** rules, applied in order:
+ 1. Non-JSON body (HTML, CSS): truncated to 3,000 characters
+ 2. JSON primitive (string, number): never truncated — these are tokens, IDs
+ 3. Error response (4xx/5xx): never truncated — the model needs every word to self-correct
+ 4. Small JSON (no large arrays): returned as-is
+ 5. Large JSON array (≥ 3 items): first 2 items shown + `_list_truncated` annotation + hint to call `search_episode_data()`

+ Every `curl_exec` call indexes the *full* response into a per-episode hybrid index (BM25 + GEMMA embeddings) *before* truncation — so all items are always retrievable even when only 2 were shown.
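The five rules above can be sketched as a single function — an assumed implementation, not the environment's actual code:

```python
import json
from typing import Any

def smart_truncate(body: str, content_type: str, status_code: int,
                   max_chars: int = 3000, keep_items: int = 2) -> str:
    """Apply the truncation rules in order (sketch under stated assumptions)."""
    if status_code >= 400:
        return body                      # rule 3: errors pass through whole
    if "json" not in content_type:
        return body[:max_chars]          # rule 1: HTML/CSS character cap
    parsed: Any = json.loads(body)
    if not isinstance(parsed, (dict, list)):
        return body                      # rule 2: primitives are tokens/IDs

    def cut(node: Any) -> Any:           # rules 4-5: only large arrays shrink
        if isinstance(node, list):
            shown = [cut(x) for x in node[:keep_items]]
            return shown + ["_list_truncated"] if len(node) > keep_items else shown
        if isinstance(node, dict):
            return {k: cut(v) for k, v in node.items()}
        return node

    return json.dumps(cut(parsed))
```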

+ ### Action Space

+ The model outputs a single tool call per step.

+ | Tool                         | Input                             | Output                                                                            |
+ | ---------------------------- | --------------------------------- | --------------------------------------------------------------------------------- |
+ | `browser_agent(task, url)`   | Task string + app base URL        | Summary list of API endpoint names + methods (e.g. `GET /products`)               |
+ | `search_endpoints(query)`    | Natural language query            | Top-3 endpoint schemas (method, path, auth, params with sources, response fields) |
+ | `curl_exec(command)`         | Full curl command string          | `{status_code, headers, body}` — body smart-truncated; full body indexed          |
+ | `search_episode_data(query)` | Keyword or natural language query | Top-5 JSON objects from this episode's request/response history                   |
+ | `done(result?)`              | Optional result string            | Ends episode, triggers judge evaluation                                           |

+ `browser_agent` is called **exactly once per episode at step 1**. Calling it again applies a −0.3 penalty. During training, it loads a cached HAR file; at inference, it launches a live browser session.
+ Full technical specifications for all tools: [`TOOLS.md`](./TOOLS.md)

  ### Reward Space

  **Per-step:**

+ | Signal                       | Value | Trigger                                                 |
+ | ---------------------------- | ----- | ------------------------------------------------------- |
+ | Valid API call (2xx)         | +0.2  | `curl_exec` returns 2xx status                          |
+ | New path called this episode | +0.1  | Normalized path not called before — discourages looping |
+ | Correct parameter sourcing   | +0.25 | Judge: value came from the correct source type          |
+ | Session value correctly used | +0.1  | Auth token/cookie present and correct in curl call      |
+ | Repeated identical call      | −0.15 | Exact duplicate curl command issued twice               |
+ | browser_agent called again   | −0.3  | `browser_agent` called after step 1                     |
+ | Malformed curl command       | −0.1  | curl cannot be parsed or executed                       |
+ | 4xx response (recoverable)   | −0.05 | Call failed but episode continues                       |

  **Episode end:**

  | Outcome | Reward |
  | ----------------------------------------------------------- | ------------------------------------------ |
  | Task completed correctly | +2.0 to +5.0 (scales with difficulty tier) |
  | Authentication correctly obtained (even if task fails) | +0.3 |
  | Timeout / task failed entirely | −1.5 |

  Target signal separation: successful episodes `+3` to `+7`, failed episodes `−2` to `−1`. Required for GRPO.

+ > **Reward design note:** Pure step-level rewards can teach a model to "look busy" — accumulating exploration rewards while never completing the task. The terminal outcome reward is designed to dominate the sum of all per-step rewards. The curriculum is the primary defense: Easy tasks have a trivially short optimal path (2 steps), so there's no room to accumulate fake exploration reward before the model learns that the terminal reward is what matters.

  ---

+ ## Key Design Decisions

+ ### Browser Agent as a Discovery Tool

+ The RL agent has access to a **browser agent tool** powered by [`bu-30b-a3b-preview`](https://huggingface.co/browser-use/bu-30b-a3b-preview) — a 30B MoE vision-language model (3B active parameters) served via the [browser-use](https://github.com/browser-use/browser-use) library on Playwright. When called, it completes the task in a real browser while intercepting all network traffic, then returns the filtered API call list.

+ **Training vs. inference:** The browser agent output is pre-computed and cached per task during training: the RL model receives it instantly, and no live browser session runs. At inference, the browser agent runs live to handle novel tasks.

+ Full details: [`BROWSER_AGENT.md`](BROWSER_AGENT.md)

+ ### Ground Truth from the Codebase, Not the Browser
+
+ The browser agent shows *what* API calls happen. It does not explain *why* — where each parameter comes from or what field constraints exist. That comes from a one-time static analysis of each WebArena application's Docker image source, producing a **ground truth API catalog**:

+ ```
+ endpoint: POST /rest/V1/guest-carts/{cartId}/items
+ path_params:
+   cartId: obtained from: POST /rest/V1/guest-carts → response body
+ body:
+   cartItem.sku: the product's SKU, from: GET /rest/V1/products → items[].sku
+   cartItem.qty: quantity, from: task specification
+   cartItem.quote_id: same as cartId
  ```

+ The judge uses this to verify not just *what* the model called, but *where each parameter value came from*. Source types: `TASK_SPEC`, `PREV_CALL`, `AUTH_FLOW`, `STATIC`, `DERIVED`. This is how partial credit works — the model gets reward for correctly threading a `cart_id` even if the final call had a wrong field elsewhere.
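A minimal version of the `PREV_CALL` check — a hypothetical judge fragment, simplified to substring matching over the episode history (the real verification is presumably stricter):

```python
def sourced_from_prev_call(value: str, history: list[dict]) -> bool:
    """Was `value` observed in an earlier tool result this episode?

    Hypothetical helper illustrating PREV_CALL sourcing; the actual judge
    and its matching rules are not shown in this README.
    """
    return any(value in str(step.get("tool_result", "")) for step in history)
```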
+ Full extraction process: [`GROUND_TRUTH_EXTRACTION.md`](GROUND_TRUTH_EXTRACTION.md)

+ ### HTML and Form-Based Applications

+ Not every endpoint returns JSON. The Forum (Postmill) relies on HTML form submissions with CSRF tokens; Wikipedia (Kiwix) serves static HTML pages. The agent handles both:

+ - **CSRF tokens**: The model GETs the form page, reads the `value` attribute of `input[name="_csrf_token"]` from the returned HTML, and places it in the subsequent POST. If the token is beyond the 3,000-character truncation point, it calls `search_episode_data("_csrf_token")` — the full HTML is indexed before truncation.
+ - **HTML-only responses**: Wikipedia responses are returned in the standard `{status_code, headers, body}` envelope. Search results appear in `<a href>` tags; article content in `<p>` tags.

+ ---

+ ## Example Episode

  ```
+ TASK: Add "Radiant Tee" to a guest cart at http://ec2-16-59-2-56.../
+
+ STEP 1: browser_agent("Add Radiant Tee to a guest cart", "http://ec2-...:7770/")
+ → {"endpoints": ["POST /rest/V1/guest-carts", "GET /rest/V1/products",
+                  "POST /rest/V1/guest-carts/{id}/items", ...]}
+
+ STEP 2: search_endpoints("create guest cart")
+ → ["endpoint: POST /rest/V1/guest-carts | auth: none | returns: string (cartId)"]
+
+ STEP 3: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'")
+ → {status_code: 200, body: "cart-abc123"}
+
+ STEP 4: search_endpoints("find product by name get sku")
+ → ["endpoint: GET /rest/V1/products | query: searchCriteria filters | returns: .items[].sku"]
+
+ STEP 5: curl_exec("curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&...[value]=Radiant+Tee'")
+ → {status_code: 200, body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}]}}
+
+ STEP 6: search_endpoints("add item to guest cart cartId")
+ → ["endpoint: POST /rest/V1/guest-carts/{cartId}/items | path: cartId from POST /rest/V1/guest-carts | body: cartItem.sku, cartItem.qty, cartItem.quote_id"]
+
+ STEP 7: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' -H 'Content-Type: application/json' -d '{\"cartItem\":{\"sku\":\"MH01\",\"qty\":1,\"quote_id\":\"cart-abc123\"}}'")
+ → {status_code: 200, body: {"item_id": 5, "sku": "MH01", "qty": 1}}
+
+ STEP 8: done("Radiant Tee added to cart")
+ ```
+ Values from prior responses (`cart-abc123`, `MH01`) are threaded directly from the growing episode history. If step 5 had returned 200 products truncated to 2, the model would call `search_episode_data("Radiant Tee sku")` to retrieve `MH01` from the episode index.
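That retrieval step can be sketched with plain keyword overlap — a simplified stand-in for the episode's real BM25 + GEMMA hybrid index, with an assumed function name:

```python
import re

def search_episode_sketch(query: str, episode_items: list[dict], k: int = 5) -> list[dict]:
    """Rank indexed response objects by keyword overlap with the query.

    Sketch only: the actual environment combines BM25 with GEMMA embeddings;
    this shows just the retrieval shape.
    """
    terms = set(re.findall(r"\w+", query.lower()))

    def overlap(item: dict) -> int:
        return len(terms & set(re.findall(r"\w+", str(item).lower())))

    return sorted(episode_items, key=overlap, reverse=True)[:k]
```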

+ ---

+ ## Setup

+ ### Prerequisites

+ - Docker installed and running
+ - Python 3.11+ with [`uv`](https://github.com/astral-sh/uv)
+ - A Hugging Face token with read access

+ ### Local Development

+ ```bash
+ # Clone and enter the project
+ git clone <your-hf-space-url>
+ cd HARvestGym
+
+ # Install dependencies
+ uv sync
+
+ # Validate the OpenEnv spec
+ openenv validate
+
+ # Build and run the Docker image
+ docker build -t harvgym .
+ docker run -p 8000:8000 harvgym
+
+ # Run the inference script
+ HF_TOKEN=hf_xxx uv run inference.py
+ ```
 
+ ### Environment Variables

+ | Variable       | Default                            | Required | Purpose                    |
+ | -------------- | ---------------------------------- | -------- | -------------------------- |
+ | `HF_TOKEN`     | —                                  | **Yes**  | HuggingFace auth token     |
+ | `API_BASE_URL` | `https://router.huggingface.co/v1` | No       | LLM API endpoint           |
+ | `MODEL_NAME`   | `google/gemma-4-31B-it`            | No       | Model for inference        |
+ | `HARVGYM_TASK` | `har_classify_easy`                | No       | Override which task to run |
 
+ ### API Endpoints

+ ```bash
+ # Reset episode
+ curl -X POST http://localhost:8000/reset
+
+ # Execute a step
+ curl -X POST http://localhost:8000/step \
+   -H "Content-Type: application/json" \
+   -d '{"tool": "browser_agent", "args": {"task": "...", "url": "..."}}'
+
+ # Get current state
+ curl http://localhost:8000/state
+ ```
 
406
  ---
407
 
408
+ ## Baseline Performance
409
+
410
+ Scores generated by running `uv run inference.py` with `google/gemma-4-31B-it` via the HuggingFace Router.
411
+
412
+ | Task | Difficulty | Score | Steps | Result | Notes |
413
+ | ---- | ---------- | ----- | ----- | ------ | ----- |
414
+ | `easy_list_pants` | Easy | **0.74** | 6 | PASS | List products in 'Pants' category |
415
+ | `medium_cart_camera_backpack` | Medium | **0.56** | 20 | PASS | Add Camera Backpack to guest cart |
416
+ | `medium_cart_flannel_jacket` | Medium | **0.60** | 20 | PASS | Add Flannel Jacket to guest cart |
417
+ | `hard_checkout_ripstop_pants` | Hard | **0.22** | 20 | FAIL | Full guest checkout (hit step limit) |
418
+ | **Overall** | — | **0.53** | — | **3/4 passed** | |
419
+
420
+ > **To regenerate:** `HF_TOKEN=hf_xxx uv run inference.py`
hars/forum.har CHANGED
The diff for this file is too large to render. See raw diff
 
hars/shopping.har CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dc116ba8f3cb52e5fe8335dcaf1eefbb88161df4d494f30832338f57bbe52ed9
- size 13392889
+ oid sha256:878c65126d999ef91d6b75438431f7c1b9164ac580140bd7ca61ef693cacd76c
+ size 115555293
hars/shopping_admin.har CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1c9d48fde1cc1f65c0e81ff9a46d1b23fece9c352b1c548de91ca848ee2411f1
3
- size 60961456
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce2209be9f3265b0a1682935171fb932c0056bc67b7517419b3ef5239c2ba2be
3
+ size 148077790
hars/wikipedia.har CHANGED
The diff for this file is too large to render. See raw diff
 
inference.py CHANGED
@@ -29,39 +29,65 @@ Usage:
29
  import asyncio
30
  import json
31
  import os
 
32
  import sys
33
  import textwrap
 
34
  from typing import Any, List, Optional
35
 
36
  from openai import OpenAI
37
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
38
  # ---------------------------------------------------------------------------
39
  # Configuration — auto-detect provider from env vars
40
  # ---------------------------------------------------------------------------
41
 
42
- _OPENROUTER_KEY = os.getenv("OPENROUTER_API_KEY")
43
- _HF_TOKEN = os.getenv("HF_TOKEN")
 
 
 
 
44
 
 
45
  if _OPENROUTER_KEY:
46
- # OpenRouter mode — great for testing with powerful models cheaply
47
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
48
  API_KEY = _OPENROUTER_KEY
49
  MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31b-it")
50
- HF_TOKEN = _HF_TOKEN # still needed for the env server itself
51
  print(f"[INFO] Provider: OpenRouter | Model: {MODEL_NAME}", flush=True)
52
- elif _HF_TOKEN:
53
  # HuggingFace Inference Router — final submission target
54
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
55
- API_KEY = _HF_TOKEN
56
- HF_TOKEN = _HF_TOKEN
57
- MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
58
  print(f"[INFO] Provider: HuggingFace | Model: {MODEL_NAME}", flush=True)
59
- else:
60
- raise ValueError(
61
- "No API key found. Set either:\n"
62
- " OPENROUTER_API_KEY=sk-or-xxx (for OpenRouter testing)\n"
63
- " HF_TOKEN=hf_xxx (for HuggingFace submission)"
64
- )
65
 
66
  # ---------------------------------------------------------------------------
67
  # Tool definitions — proper OpenAI function-calling format.
@@ -79,23 +105,22 @@ TOOLS = [
79
  "function": {
80
  "name": "browser_agent",
81
  "description": (
82
- "Discovers all available API endpoints for the target web application "
83
- "by replaying recorded HTTP traffic (HAR files) and augmenting with a "
84
- "ground-truth API catalog. Returns a structured index of endpoints with "
85
- "methods, paths, and parameter schemas. "
86
- "MUST be called exactly once at step 1 before any other tool. "
87
- "Do NOT call again after step 1."
88
  ),
89
  "parameters": {
90
  "type": "object",
91
  "properties": {
92
  "task": {
93
  "type": "string",
94
- "description": "The natural language task description (e.g. 'Add Radiant Tee to cart')",
95
  },
96
  "url": {
97
  "type": "string",
98
- "description": "Base URL of the target application (e.g. 'http://host:7770/')",
99
  },
100
  },
101
  "required": ["task", "url"],
@@ -109,20 +134,19 @@ TOOLS = [
109
  "function": {
110
  "name": "search_endpoints",
111
  "description": (
112
- "Search the discovered API endpoint catalog using a natural language query. "
113
- "Returns matching endpoint schemas including HTTP method, full path, "
114
- "required/optional parameters, authentication requirements, and example payloads. "
115
- "Use this after browser_agent to find the exact endpoint and payload structure "
116
- "before making a curl_exec call. "
117
- "Examples: 'create guest cart', 'add item to cart', 'set shipping address', "
118
- "'place order', 'get products by category'."
119
  ),
120
  "parameters": {
121
  "type": "object",
122
  "properties": {
123
  "query": {
124
  "type": "string",
125
- "description": "Natural language description of the API operation you need (e.g. 'create guest cart', 'add item to cart')",
 
 
126
  },
127
  },
128
  "required": ["query"],
@@ -136,14 +160,18 @@ TOOLS = [
136
  "function": {
137
  "name": "curl_exec",
138
  "description": (
139
- "Execute an HTTP request against the live application. "
140
- "Returns {status_code, headers, body} with the full API response. "
141
- "Session cookies and auth tokens are automatically injected do NOT "
142
- "manually set Cookie or Authorization headers. "
143
- "Use proper curl syntax with -s (silent) flag. "
144
- "Always include -H 'Content-Type: application/json' for POST/PUT requests. "
145
- "Read the response body carefully it contains IDs (cart_id, item_id, order_id) "
146
- "needed for subsequent steps."
 
 
 
 
147
  ),
148
  "parameters": {
149
  "type": "object",
@@ -151,12 +179,10 @@ TOOLS = [
151
  "command": {
152
  "type": "string",
153
  "description": (
154
- "Full curl command string. Examples:\n"
155
- " GET: curl -s -X GET 'http://host/rest/V1/categories'\n"
156
- " POST: curl -s -X POST 'http://host/rest/V1/guest-carts' -H 'Content-Type: application/json'\n"
157
- " POST with body: curl -s -X POST 'http://host/rest/V1/guest-carts/CART_ID/items' "
158
- "-H 'Content-Type: application/json' "
159
- "-d '{\"cartItem\":{\"sku\":\"MH01-XS-Black\",\"qty\":1,\"quote_id\":\"CART_ID\"}}'"
160
  ),
161
  },
162
  },
@@ -171,18 +197,21 @@ TOOLS = [
171
  "function": {
172
  "name": "search_episode_data",
173
  "description": (
174
- "Search all prior API responses collected during this episode for a specific value. "
175
- "Use when a previous curl_exec response was long/truncated and you need to find "
176
- "a specific item, ID, SKU, or field value from it. "
177
- "Examples: 'cart id from guest-carts response', 'product SKU for Radiant Tee', "
178
- "'category id for Gear'."
 
 
179
  ),
180
  "parameters": {
181
  "type": "object",
182
  "properties": {
183
  "query": {
184
  "type": "string",
185
- "description": "What value you are looking for in the episode's response history (e.g. 'cart id', 'SKU for Radiant Tee')",
 
186
  },
187
  },
188
  "required": ["query"],
@@ -196,58 +225,197 @@ TOOLS = [
196
  "function": {
197
  "name": "done",
198
  "description": (
199
- "Signal that the task is fully complete. Call this ONLY after you have "
200
- "successfully executed all required API calls and verified the outcome "
201
- "(e.g. item was added to cart, order was placed). "
202
- "Do NOT call done() as a fallback or when uncertain — it triggers final scoring."
203
  ),
204
  "parameters": {
205
  "type": "object",
206
  "properties": {
207
  "result": {
208
  "type": "string",
209
- "description": "Optional summary of what was accomplished (e.g. 'Added Radiant Tee to cart CART_ID, item_id=42')",
210
  },
211
  },
212
  "additionalProperties": False,
213
  },
214
- "strict": False, # result is optional
215
  },
216
  },
217
  ]
218
 
219
  BENCHMARK = "harvgym"
220
  MAX_STEPS = 20
221
- TEMPERATURE = 0.2 # Lower temp → more deterministic tool calls
222
- MAX_TOKENS = 1024 # More room for reasoning + JSON
223
  SUCCESS_SCORE_THRESHOLD = 0.5
224
 
225
- # Task definitions: use FIXED task descriptions so the model always knows
226
- # exactly what to do (env.reset() may randomize, but we tell it the target)
227
- TASKS = [
228
- {
229
- "task_name": "har_classify_easy",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
230
  "template_id": 1,
231
- "description": "List products in the 'Gear' category",
232
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
233
  "difficulty": "easy",
234
- },
235
- {
236
- "task_name": "har_classify_medium",
237
- "template_id": 3,
238
- "description": "Add 'Radiant Tee' (SKU: MH01-XS-Black) to a guest cart",
239
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
240
- "difficulty": "medium",
241
- },
242
- {
243
- "task_name": "har_pipeline_hard",
244
- "template_id": 6,
245
- "description": "Complete a full guest checkout for 'Radiant Tee' (SKU: MH01-XS-Black)",
246
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
247
- "difficulty": "hard",
248
- },
 
 
 
 
 
 
 
 
 
249
  ]
250
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
251
  # ---------------------------------------------------------------------------
252
  # Logging helpers (hackathon format)
253
  # ---------------------------------------------------------------------------
@@ -279,27 +447,35 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> No
279
  # ---------------------------------------------------------------------------
280
 
281
  SYSTEM_PROMPT = textwrap.dedent("""
282
- You are an API agent completing real-world tasks on a live Magento e-commerce application
283
- by calling HTTP APIs in the correct sequence.
284
 
285
  WORKFLOW:
286
- 1. Call browser_agent (step 1 only) to discover all available API endpoints.
287
- 2. Call search_endpoints to find the exact endpoint schema you need.
288
- 3. Call curl_exec to execute the HTTP request. Read the response it contains IDs for next steps.
289
- 4. Repeat steps 2-3 for each action in the task (create cart add item → set address → place order).
290
- 5. Call done() only after the task is fully accomplished.
291
-
292
- KEY FACTS about Magento REST API (http://host:7770/rest/V1/):
293
- - Guest cart flow: POST /guest-carts → returns cartId string
294
- - Add item: POST /guest-carts/{cartId}/items body: {"cartItem":{"sku":"...","qty":1,"quote_id":"{cartId}"}}
295
- - Shipping: POST /guest-carts/{cartId}/shipping-information
296
- - Place order: PUT /guest-carts/{cartId}/order
297
-
298
- RULES:
299
- - Call browser_agent exactly once at step 1.
300
- - Always call search_endpoints before curl_exec to get the correct path and payload.
301
- - Cart IDs, item IDs, and order IDs come from curl_exec responses read them carefully.
302
- - Do not call done() until the task is verified complete.
 
 
 
 
 
 
 
 
303
  """).strip()
304
 
305
 
@@ -334,12 +510,10 @@ def build_user_prompt(task_desc: str, app_base_url: str, step: int,
334
  """Build the user prompt for each step."""
335
  history_lines = []
336
  if history:
337
- # Show last 8 steps with meaningful result summaries
338
- for h in history[-8:]:
339
  result = h.get("result", {})
340
- # For curl results: show status_code + first 200 chars of body
341
  if isinstance(result, dict) and "status_code" in result:
342
- body_preview = str(result.get("body", ""))[:300]
343
  result_summary = f'status={result["status_code"]} body={body_preview}'
344
  else:
345
  result_summary = str(result)[:300]
@@ -351,18 +525,23 @@ def build_user_prompt(task_desc: str, app_base_url: str, step: int,
351
  session_str = json.dumps(session_state, indent=2)[:500] if session_state else "{}"
352
  last_result_str = _format_result_for_context(last_result)
353
 
 
 
 
 
 
354
  return textwrap.dedent(f"""
355
  TASK: {task_desc}
356
  APP URL: {app_base_url}
357
  STEP: {step}/{MAX_STEPS}
358
 
359
- SESSION STATE (cookies/tokens auto-managed):
360
  {session_str}
361
 
362
  LAST TOOL RESULT:
363
  {last_result_str}
364
 
365
- HISTORY (last {len(history_lines)} steps):
366
  {chr(10).join(history_lines) if history_lines else " (none yet)"}
367
 
368
  What is your next tool call? Output ONLY the JSON object.
@@ -384,51 +563,86 @@ def get_model_action(client: OpenAI, task_desc: str, app_base_url: str,
384
  "X-Title": "HARvestGym",
385
  }
386
 
387
- try:
388
- # Use the OpenAI tools API each tool has name + description + typed params.
389
- # tool_choice="required" forces the model to always call a tool (no free text).
390
- completion = client.chat.completions.create(
391
- model=MODEL_NAME,
392
- messages=[
393
- {"role": "system", "content": SYSTEM_PROMPT},
394
- {"role": "user", "content": user_prompt},
395
- ],
396
- tools=TOOLS,
397
- tool_choice="required",
398
- temperature=TEMPERATURE,
399
- max_tokens=MAX_TOKENS,
400
- stream=False,
401
- extra_headers=extra_headers if extra_headers else None,
402
- )
403
 
404
- choice = completion.choices[0] if completion.choices else None
405
- if choice is None:
406
- print(f"[DEBUG] Empty choices at step {step}", flush=True)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
407
  if step == 1:
408
  return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
409
- return {"tool": "done", "args": {"result": "Empty API response"}}
410
-
411
- # Native tool call response (preferred — gives us structured args directly)
412
- if choice.message.tool_calls:
413
- tc = choice.message.tool_calls[0]
414
- tool_name = tc.function.name
415
- try:
416
- args = json.loads(tc.function.arguments)
417
- except json.JSONDecodeError:
418
- args = {}
419
- print(f"[DEBUG] Tool call: {tool_name}({list(args.keys())})", flush=True)
420
- return {"tool": tool_name, "args": args}
421
 
422
- # Some providers return plain text even with tools (fallback)
423
- text = (choice.message.content or "").strip()
424
- print(f"[DEBUG] No tool_calls in response, trying text parse: {text[:100]}", flush=True)
425
- return _parse_text_fallback(text, step, task_desc, app_base_url)
426
-
427
- except Exception as exc:
428
- print(f"[DEBUG] LLM call failed at step {step}: {exc}", flush=True)
429
- if step == 1:
430
- return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
431
- return {"tool": "done", "args": {"result": f"LLM error: {exc}"}}
432
 
433
 
434
  def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str) -> dict:
@@ -451,9 +665,12 @@ def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str
451
  print(f"[DEBUG] Text fallback failed: {text[:200]}", flush=True)
452
  if step == 1:
453
  return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
454
- if "done" in text.lower():
 
 
455
  return {"tool": "done", "args": {}}
456
- return {"tool": "done", "args": {"result": f"Parse error: {text[:100]}"}}
 
457
 
458
 
459
  # ---------------------------------------------------------------------------
@@ -471,9 +688,20 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
471
  template_id = task_config["template_id"]
472
  task_description = task_config["description"]
473
  app_base_url = task_config["app_base_url"]
474
-
475
- # Pin the template via env var so reset() samples from the right pool
476
- os.environ["HARVGYM_TASK"] = task_name # use name, not int
 
 
 
 
 
 
 
 
 
 
 
477
 
478
  env = HARvestGymEnvironment()
479
 
@@ -489,11 +717,15 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
489
 
490
  try:
491
  obs = env.reset()
492
- # CRITICAL: use the env-sampled task description the judge grades exactly
493
- # what env.reset() returned (random category/product), not our hardcoded string.
494
  task_desc = obs.task or task_description
495
  base_url = obs.app_base_url or app_base_url
496
 
 
 
 
 
497
  for step in range(1, MAX_STEPS + 1):
498
  if getattr(obs, "done", False):
499
  break
@@ -523,6 +755,12 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
523
  last_result = obs.last_tool_result
524
  session_state = dict(obs.session_state or {})
525
 
 
 
 
 
 
 
526
  history.append({
527
  "step": step,
528
  "tool": tool,
@@ -534,6 +772,7 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
534
  reward = -0.1
535
  done = False
536
  error_str = str(exc)[:200]
 
537
 
538
  rewards.append(reward)
539
  steps_taken = step
@@ -546,19 +785,32 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
546
  # Reward range by design: terminal success = +2 to +5, terminal fail = -1.5
547
  # Use a generous baseline so partial credit shows up.
548
  total_reward = sum(rewards)
549
- # Normalise to [0,1]: shift by +1.5 (min), divide by max-possible per task
550
- # Template 1 max=2, Template 3 max=3.5, Template 6 max=5 → use 5.0 as ceiling
551
- score = max(0.0, min(1.0, (total_reward + 1.5) / (5.0 + 1.5)))
 
 
 
552
  success = total_reward >= 0.5 # any positive terminal reward = success
553
 
 
 
 
 
554
  except Exception as exc:
555
  error_str = str(exc)[:200]
556
  print(f"[DEBUG] Episode error: {error_str}", flush=True)
557
  finally:
 
 
 
 
558
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
559
 
560
  return {
561
  "task_name": task_name,
 
 
562
  "success": success,
563
  "steps": steps_taken,
564
  "score": score,
@@ -574,21 +826,38 @@ async def main() -> None:
574
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
575
 
576
  results = []
577
- for task_config in TASKS:
 
 
 
 
 
 
578
  result = await run_episode(task_config, client)
579
  results.append(result)
580
-
581
- # Summary
582
- print("\n[SUMMARY]", flush=True)
583
- for r in results:
584
- status = "PASS" if r["success"] else "FAIL"
585
  print(
586
- f" [{status}] {r['task_name']} — score={r['score']:.2f} steps={r['steps']}",
587
  flush=True,
588
  )
589
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
590
  overall_score = sum(r["score"] for r in results) / len(results) if results else 0.0
591
- print(f"\n overall_score={overall_score:.2f}", flush=True)
 
592
 
593
 
594
  if __name__ == "__main__":
 
29
  import asyncio
30
  import json
31
  import os
32
+ import re
33
  import sys
34
  import textwrap
35
+ from pathlib import Path
36
  from typing import Any, List, Optional
37
 
38
  from openai import OpenAI
39
 
40
+ # ---------------------------------------------------------------------------
41
+ # Verbose mode — set VERBOSE=1 for detailed per-step debugging.
42
+ # Keep disabled (default) for hackathon submission to avoid stdout noise.
43
+ # ---------------------------------------------------------------------------
44
+
45
+ VERBOSE = os.getenv("VERBOSE", "0").strip() == "1"
46
+
47
+
48
+ def vprint(*args) -> None:
49
+ """Print only when VERBOSE=1."""
50
+ if VERBOSE:
51
+ print(*args, flush=True)
52
+
53
+
54
+ def vdump(label: str, obj: Any, max_chars: int = 2000) -> None:
55
+ """Pretty-print a labelled object when verbose."""
56
+ if not VERBOSE:
57
+ return
58
+ try:
59
+ text = json.dumps(obj, indent=2)
60
+ except Exception:
61
+ text = str(obj)
62
+ if len(text) > max_chars:
63
+ text = text[:max_chars] + f"\n... [truncated {len(text)-max_chars} chars]"
64
+ print(f"\n{'─'*60}\n[VERBOSE] {label}\n{'─'*60}\n{text}\n", flush=True)
65
+
66
+
67
  # ---------------------------------------------------------------------------
68
  # Configuration — auto-detect provider from env vars
69
  # ---------------------------------------------------------------------------
70
 
71
+ HF_TOKEN = os.getenv("HF_TOKEN")
72
+ if not HF_TOKEN:
73
+ raise ValueError(
74
+ "HF_TOKEN is required but not set.\n"
75
+ "Usage: HF_TOKEN=hf_xxx uv run inference.py"
76
+ )
77
 
78
+ _OPENROUTER_KEY = os.getenv("OPENROUTER_API_KEY")
79
  if _OPENROUTER_KEY:
80
+ # OpenRouter mode — useful for local testing with alternative models
81
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
82
  API_KEY = _OPENROUTER_KEY
83
  MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31b-it")
 
84
  print(f"[INFO] Provider: OpenRouter | Model: {MODEL_NAME}", flush=True)
85
+ else:
86
  # HuggingFace Inference Router — final submission target
87
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
88
+ API_KEY = HF_TOKEN
89
+ MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31B-it")
 
90
  print(f"[INFO] Provider: HuggingFace | Model: {MODEL_NAME}", flush=True)
 
 
 
 
 
 
91
 
92
  # ---------------------------------------------------------------------------
93
  # Tool definitions — proper OpenAI function-calling format.
 
105
  "function": {
106
  "name": "browser_agent",
107
  "description": (
108
+ "Discovers API endpoints available on the target web application by "
109
+ "replaying real browser traffic recorded in HAR files. Returns a "
110
+ "structured index of observed endpoints with HTTP methods, paths, "
111
+ "request/response schemas, and headers (including any auth headers seen). "
112
+ "Call this ONCE at step 1 to build the endpoint index. Do not call again."
 
113
  ),
114
  "parameters": {
115
  "type": "object",
116
  "properties": {
117
  "task": {
118
  "type": "string",
119
+ "description": "The task you need to accomplish (used to prioritise relevant endpoints)",
120
  },
121
  "url": {
122
  "type": "string",
123
+ "description": "Base URL of the target application",
124
  },
125
  },
126
  "required": ["task", "url"],
 
134
  "function": {
135
  "name": "search_endpoints",
136
  "description": (
137
+ "Semantic search over the endpoints and it's details found by the browser_agent. "
138
+ "Returns matching endpoint schemas: HTTP method, full path, required parameters, "
139
+ "authentication requirements (bearer token, cookie, etc.), and example payloads. "
140
+ "Use this before every curl_exec call to confirm the correct endpoint shape. "
 
 
 
141
  ),
142
  "parameters": {
143
  "type": "object",
144
  "properties": {
145
  "query": {
146
  "type": "string",
147
+ "description": "Natural language description of the operation you need "
148
+ "(e.g. 'authenticate user', 'list products in category', "
149
+ "'add item to cart', 'place order')",
150
  },
151
  },
152
  "required": ["query"],
 
160
  "function": {
161
  "name": "curl_exec",
162
  "description": (
163
+ "Execute an HTTP request against the live application and return the response. "
164
+ "Response contains: status_code, headers, body. "
165
+ "For HTML pages, body is a structured summary: page title, forms with action URLs "
166
+ "and field values (product IDs, form_key, etc.), and visible text. "
167
+ "IMPORTANT: When the body shows '[Forms — N found]' with POST actions containing "
168
+ "'/checkout/cart/add/...', the 'product' field IS the product ID and the action "
169
+ "URL IS the add-to-cart endpointuse these directly without calling "
170
+ "search_episode_data again. "
171
+ "Session state (cookies, auth tokens) is automatically managed — previously "
172
+ "obtained tokens are injected into subsequent requests automatically. "
173
+ "If the response is truncated or you need a value from an earlier response, "
174
+ "use search_episode_data."
175
  ),
176
  "parameters": {
177
  "type": "object",
 
179
  "command": {
180
  "type": "string",
181
  "description": (
182
+ "Full curl command string (use -s for silent mode). "
183
+ "Include -H 'Content-Type: application/json' for POST/PUT/PATCH. "
184
+ "Example: curl -s -X POST 'http://host/api/endpoint' "
185
+ "-H 'Content-Type: application/json' -d '{\"key\":\"value\"}'"
 
 
186
  ),
187
  },
188
  },
 
197
  "function": {
198
  "name": "search_episode_data",
199
  "description": (
200
+ "Semantic search over all API responses collected during this episode. "
201
+ "Full response bodies are stored untruncated this tool finds the right "
202
+ "response and returns a compact preview with a note showing the total "
203
+ "number of matching objects (e.g. '47 items total showing first 3'). "
204
+ "Use more specific queries to drill into a particular value. "
205
+ "Examples: 'id for category Gear', 'SKU for Radiant Tee', "
206
+ "'cart id', 'authentication token', 'order id after checkout'."
207
  ),
208
  "parameters": {
209
  "type": "object",
210
  "properties": {
211
  "query": {
212
  "type": "string",
213
+ "description": "What you are looking for in the response history of the curl commands you executed "
214
+ "(e.g. 'category id for Pants', 'cart id', 'token')",
215
  },
216
  },
217
  "required": ["query"],
 
225
  "function": {
226
  "name": "done",
227
  "description": (
228
+ "Signal that the task is complete and trigger final scoring. "
229
+ "Call this immediately after the response that fulfills the task objective. "
230
+ "Do not make further API calls once the goal is met — call done() next."
 
231
  ),
232
  "parameters": {
233
  "type": "object",
234
  "properties": {
235
  "result": {
236
  "type": "string",
237
+ "description": "Brief summary of what was accomplished",
238
  },
239
  },
240
  "additionalProperties": False,
241
  },
242
+ "strict": False,
243
  },
244
  },
245
  ]
246
 
247
  BENCHMARK = "harvgym"
248
  MAX_STEPS = 20
249
+ TEMPERATURE = 0.2
250
+ MAX_TOKENS = 64000
251
  SUCCESS_SCORE_THRESHOLD = 0.5
252
 
253
+ # ---------------------------------------------------------------------------
254
+ # Task bank: 5 easy (T1: list products), 5 medium (T3: add to cart),
255
+ # 5 hard (T6: guest checkout).
256
+ #
257
+ # For hackathon submission only the first easy/medium/hard is run.
258
+ # Full evaluation runs all 15 sequentially to measure generalisation.
259
+ # ---------------------------------------------------------------------------
260
+ _SHOP = "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/"
261
+
262
+ def _load_parameter_pools_for_tasks() -> dict:
263
+ pools_path = Path(__file__).with_name("parameter_pools.json")
264
+ with open(pools_path) as f:
265
+ return json.load(f)
266
+
267
+
268
+ _TASK_PARAMETER_POOLS = _load_parameter_pools_for_tasks()
269
+
270
+
271
+ def _lookup_category_params(category_name: str) -> dict:
272
+ categories = _TASK_PARAMETER_POOLS.get("template_1", {}).get("pool", {}).get("category_name", [])
273
+ for item in categories:
274
+ if item.get("name") == category_name:
275
+ return {
276
+ "category_name": item["name"],
277
+ "category_id": item.get("category_id"),
278
+ }
279
+ raise ValueError(f"Unknown category in parameter_pools.json: {category_name}")
280
+
281
+
282
+ def _lookup_product_params(product_name: str, template_id: int) -> dict:
283
+ products = _TASK_PARAMETER_POOLS.get(f"template_{template_id}", {}).get("pool", {}).get("product_name", [])
284
+ for item in products:
285
+ if item.get("name") == product_name:
286
+ return {
287
+ "product_name": item["name"],
288
+ "sku": item.get("sku", ""),
289
+ "product_id": item.get("product_id"),
290
+ }
291
+ raise ValueError(
292
+ f"Unknown product in parameter_pools.json for template {template_id}: {product_name}"
293
+ )
294
+
295
+
296
+ def _make_easy_task(task_name: str, category_name: str) -> dict:
297
+ return {
298
+ "task_name": task_name,
299
  "template_id": 1,
 
 
300
  "difficulty": "easy",
301
+ "description": f"List products in the '{category_name}' category",
302
+ "app_base_url": _SHOP,
303
+ "task_params": _lookup_category_params(category_name),
304
+ }
305
+
306
+
307
+ def _make_product_task(task_name: str, template_id: int, difficulty: str,
308
+ description: str, product_name: str) -> dict:
309
+ return {
310
+ "task_name": task_name,
311
+ "template_id": template_id,
312
+ "difficulty": difficulty,
313
+ "description": description,
314
+ "app_base_url": _SHOP,
315
+ "task_params": _lookup_product_params(product_name, template_id),
316
+ }
317
+
318
+
319
+ TASKS_EASY = [
320
+ _make_easy_task("easy_list_pants", "Pants"),
321
+ _make_easy_task("easy_list_bags", "Bags"),
322
+ _make_easy_task("easy_list_jackets", "Jackets"),
323
+ _make_easy_task("easy_list_hoodies", "Hoodies"),
324
+ _make_easy_task("easy_list_shoes", "Shoes"),
325
  ]
326
 
327
+ TASKS_MEDIUM = [
328
+ _make_product_task(
329
+ "medium_cart_camera_backpack",
330
+ 3,
331
+ "medium",
332
+ "Add 'Camera Backpack Bagsmar DSLR Waterproof' to a guest cart",
333
+ "Camera Backpack Bagsmar DSLR Waterproof",
334
+ ),
335
+ _make_product_task(
336
+ "medium_cart_flannel_jacket",
337
+ 3,
338
+ "medium",
339
+ "Add 'Noldares Flannel Jacket For Men Plaid' to a guest cart",
340
+ "Noldares Flannel Jacket For Men Plaid",
341
+ ),
342
+ _make_product_task(
343
+ "medium_cart_champion_hoodie",
344
+ 3,
345
+ "medium",
346
+ "Add 'Champion Hoodie Big And Tall Zip Up' to a guest cart",
347
+ "Champion Hoodie Big And Tall Zip Up",
348
+ ),
349
+ _make_product_task(
350
+ "medium_cart_cargo_pants",
351
+ 3,
352
+ "medium",
353
+ "Add 'Mens Slim Fit Cargo Pants Athletic' to a guest cart",
354
+ "Mens Slim Fit Cargo Pants Athletic",
355
+ ),
356
+ _make_product_task(
357
+ "medium_cart_leather_jacket",
358
+ 3,
359
+ "medium",
360
+ "Add 'Inesver Womens Leather Jacket Open Front' to a guest cart",
361
+ "Inesver Womens Leather Jacket Open Front",
362
+ ),
363
+ ]
364
+
365
+ TASKS_HARD = [
366
+ _make_product_task(
367
+ "hard_checkout_ripstop_pants",
368
+ 6,
369
+ "hard",
370
+ "Complete a full guest checkout for 'Mens Ripstop Cargo Pants Tactical Hiking'",
371
+ "Mens Ripstop Cargo Pants Tactical Hiking",
372
+ ),
373
+ _make_product_task(
374
+ "hard_checkout_flannel_jacket",
375
+ 6,
376
+ "hard",
377
+ "Complete a full guest checkout for 'Noldares Flannel Jacket For Men Plaid'",
378
+ "Noldares Flannel Jacket For Men Plaid",
379
+ ),
380
+ _make_product_task(
381
+ "hard_checkout_champion_hoodie",
382
+ 6,
383
+ "hard",
384
+ "Complete a full guest checkout for 'Champion Hoodie Big And Tall Zip Up'",
385
+ "Champion Hoodie Big And Tall Zip Up",
386
+ ),
387
+ _make_product_task(
388
+ "hard_checkout_fleece_jacket",
389
+ 6,
390
+ "hard",
391
+ "Complete a full guest checkout for 'Womens Fleece Jacket With Hood Winter'",
392
+ "Womens Fleece Jacket With Hood Winter",
393
+ ),
394
+ _make_product_task(
395
+ "hard_checkout_totes_boots",
396
+ 6,
397
+ "hard",
398
+ "Complete a full guest checkout for 'Totes Womens Cold Weather Boots Nicole'",
399
+ "Totes Womens Cold Weather Boots Nicole",
400
+ ),
401
+ ]
402
+
403
+ # Default: first easy, first two medium, and first hard task (hackathon submission format)
404
+ TASKS = [TASKS_EASY[0], TASKS_MEDIUM[0], TASKS_MEDIUM[1], TASKS_HARD[0]]
405
+
406
+ # Set EVAL_MODE=full to run all 15; EVAL_MODE=easy/medium/hard to run only that tier; EVAL_MODE=one for a single task
407
+ _EVAL_MODE = os.getenv("EVAL_MODE", "").strip().lower()
408
+ if _EVAL_MODE == "full":
409
+ TASKS = TASKS_EASY + TASKS_MEDIUM + TASKS_HARD
410
+ elif _EVAL_MODE == "easy":
411
+ TASKS = TASKS_EASY
412
+ elif _EVAL_MODE == "one":
413
+ TASKS = [TASKS_MEDIUM[1]]
414
+ elif _EVAL_MODE == "medium":
415
+ TASKS = TASKS_MEDIUM
416
+ elif _EVAL_MODE == "hard":
417
+ TASKS = TASKS_HARD
418
+
419
  # ---------------------------------------------------------------------------
420
  # Logging helpers (hackathon format)
421
  # ---------------------------------------------------------------------------
 
447
  # ---------------------------------------------------------------------------
448
 
449
  SYSTEM_PROMPT = textwrap.dedent("""
450
+ You are an API agent. Your goal is to complete a real-world task on a live web application
451
+ by calling its HTTP APIs in the correct order using the tools provided.
452
 
453
  WORKFLOW:
454
+ 1. Call browser_agent once at step 1 to build an index of the application's endpoints.
455
+ 2. Use search_endpoints before each API call to find the correct path, method, and required parameters.
456
+ 3. Execute HTTP requests with curl_exec in the correct dependency order. Read every response
457
+ carefully: IDs, tokens, and error messages in responses are required inputs for (or
458
+ corrective signals for) subsequent calls.
459
+ 4. If a prior response contains a value you need now, use search_episode_data to retrieve it.
460
+ 5. Call done() as soon as the task objective is met.
461
+
462
+ PRINCIPLES:
463
+ - Always discover before you act: browser_agent first, then search_endpoints.
464
+ - Extract every ID, token, and key from API responses and use them in subsequent calls.
465
+ - If a request returns an auth error, find and call the auth endpoint first, then retry.
466
+ - Never fabricate IDs or values — they must come from actual API responses.
467
+ - Once the task is done, call done() immediately — do not make additional calls.
468
+ - Some tasks require a sequence of dependent API calls where the output of one call
469
+ (an ID, token, or key) is the required input to the next. Identify these dependencies
470
+ before acting: plan the call sequence, then execute step by step.
471
+ - Never call the same endpoint repeatedly hoping for a different result. If a call already
472
+ succeeded, move on to the next step. Repeating the same call wastes steps and incurs a
473
+ penalty.
474
+ - Do not brute-force or vary parameters at random. If a call fails, read the error message
475
+ in LAST TOOL RESULT, diagnose the cause logically, and use that understanding to form the
476
+ correct next request.
477
+ - If you are partway through a multi-step task and a required ID or token is missing, use
478
+ search_episode_data to retrieve it from an earlier response before making a new call.
479
  """).strip()

  """Build the user prompt for each step."""
  history_lines = []
  if history:
+     for h in history:
          result = h.get("result", {})
          if isinstance(result, dict) and "status_code" in result:
+             body_preview = str(result.get("body", ""))[:800]
              result_summary = f'status={result["status_code"]} body={body_preview}'
          else:
              result_summary = str(result)[:300]
 
  session_str = json.dumps(session_state, indent=2)[:500] if session_state else "{}"
  last_result_str = _format_result_for_context(last_result)

+ # Highlight form_key if available — it's needed for HTML form POSTs
+ form_key_hint = ""
+ if session_state.get("form_key"):
+     form_key_hint = f"\nFORM_KEY (auto-extracted, use in POST body): {session_state['form_key']}"
+
  return textwrap.dedent(f"""
      TASK: {task_desc}
      APP URL: {app_base_url}
      STEP: {step}/{MAX_STEPS}

+     SESSION STATE (cookies/tokens auto-managed):{form_key_hint}
      {session_str}

      LAST TOOL RESULT:
      {last_result_str}

+     HISTORY (all {len(history_lines)} steps so far):
      {chr(10).join(history_lines) if history_lines else " (none yet)"}

      What is your next tool call? Output ONLY the JSON object.
 
      "X-Title": "HARvestGym",
  }

+ vprint(f"\n{'═'*60}")
+ vprint(f"[VERBOSE] === LLM CALL — step {step} ===")
+ vdump("SYSTEM PROMPT", SYSTEM_PROMPT)
+ vdump("USER PROMPT", user_prompt)

 
+ # Retry loop: backs off on 429 rate limits; never calls done() on a transient error.
+ _MAX_RETRIES = 3
+ _BASE_DELAY = 3  # seconds before first retry
+ for _attempt in range(_MAX_RETRIES):
+     try:
+         completion = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": user_prompt},
+             ],
+             tools=TOOLS,
+             tool_choice="required",
+             temperature=TEMPERATURE,
+             max_tokens=MAX_TOKENS,
+             stream=False,
+             extra_headers=extra_headers if extra_headers else None,
+         )
+
+         choice = completion.choices[0] if completion.choices else None
+
+         vdump(f"RAW COMPLETION (step {step}, attempt {_attempt+1})", {
+             "finish_reason": choice.finish_reason if choice else None,
+             "usage": dict(completion.usage) if hasattr(completion, "usage") and completion.usage else None,
+             "message_content": choice.message.content if choice else None,
+             "tool_calls_count": len(choice.message.tool_calls or []) if choice else 0,
+         })
+
+         # Detect a null/empty completion (upstream rate limit without a 429 status)
+         if choice is None or (
+             choice.finish_reason is None
+             and not (choice.message.tool_calls or (choice.message.content or "").strip())
+         ):
+             wait = _BASE_DELAY * (2 ** _attempt)
+             print(f"[DEBUG] Null completion at step {step} (attempt {_attempt+1}/{_MAX_RETRIES}) — waiting {wait}s", flush=True)
+             import time; time.sleep(wait)
+             continue  # retry
+
+         # Native tool call (preferred)
+         if choice.message.tool_calls:
+             tc = choice.message.tool_calls[0]
+             tool_name = tc.function.name
+             try:
+                 args = json.loads(tc.function.arguments)
+             except json.JSONDecodeError:
+                 args = {}
+             print(f"[DEBUG] Tool call: {tool_name}({list(args.keys())})", flush=True)
+             vdump(f"TOOL CALL ARGS — {tool_name}", args)
+             return {"tool": tool_name, "args": args}
+
+         # Plain-text fallback (some providers ignore tool_choice="required")
+         text = (choice.message.content or "").strip()
+         print(f"[DEBUG] No tool_calls in response, trying text parse: {text[:100]}", flush=True)
+         vprint(f"[VERBOSE] Full text response: {text}")
+         return _parse_text_fallback(text, step, task_desc, app_base_url)
+
+     except Exception as exc:
+         exc_str = str(exc)
+         is_rate_limit = "429" in exc_str or "rate" in exc_str.lower()
+         if is_rate_limit and _attempt < _MAX_RETRIES - 1:
+             wait = _BASE_DELAY * (2 ** _attempt)
+             print(f"[DEBUG] Rate-limited at step {step} (attempt {_attempt+1}/{_MAX_RETRIES}) — waiting {wait}s then retrying", flush=True)
+             import time; time.sleep(wait)
+             continue  # retry
+         # Non-rate-limit error or exhausted retries — don't call done(), keep the episode alive
+         print(f"[DEBUG] LLM call failed at step {step} (attempt {_attempt+1}): {exc}", flush=True)
          if step == 1:
              return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+         return {"tool": "search_endpoints", "args": {"query": "available API endpoints"}}

+ # Exhausted all retries — nudge forward without ending the episode
+ print(f"[DEBUG] All {_MAX_RETRIES} retries exhausted at step {step} — nudging with search_endpoints", flush=True)
+ if step == 1:
+     return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+ return {"tool": "search_endpoints", "args": {"query": "available API endpoints"}}
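As an aside (not part of the commit): the retry schedule above, with a 3-second base delay doubling per attempt, yields waits of 3 s, 6 s, and 12 s across the three attempts. A quick check of that arithmetic:

```python
_MAX_RETRIES = 3
_BASE_DELAY = 3  # seconds before the first retry

# Exponential backoff: the delay doubles on each successive attempt.
waits = [_BASE_DELAY * (2 ** attempt) for attempt in range(_MAX_RETRIES)]
print(waits)  # [3, 6, 12]
```

Worst case the loop therefore spends 21 seconds sleeping before falling through to the search_endpoints nudge.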
 
 
 
 
 


  def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str) -> dict:
 
      print(f"[DEBUG] Text fallback failed: {text[:200]}", flush=True)
      if step == 1:
          return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+     # If the model explicitly says done, honour it — but only when the text clearly indicates it.
+     # A bare parse error should NEVER call done(), because that would trigger the judge early.
+     if re.search(r"\bdone\b", text.lower()) and len(text.strip()) < 80:
          return {"tool": "done", "args": {}}
+     # Keep the episode alive — nudge the model rather than punishing with a premature judge call.
+     return {"tool": "search_endpoints", "args": {"query": "available REST API endpoints"}}
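Editor's illustration (not commit code): the guarded done() detection above only fires on short, explicit replies, so a rambling response that merely mentions the word cannot end the episode. The same check in isolation:

```python
import re

def looks_like_done(text: str) -> bool:
    """True only for short responses that explicitly say 'done';
    a long reply containing the word does not qualify."""
    return bool(re.search(r"\bdone\b", text.lower())) and len(text.strip()) < 80

print(looks_like_done("Done."))                    # True
print(looks_like_done("I have not finished yet"))  # False
print(looks_like_done("done " * 30))               # False (too long)
```

The 80-character cutoff is the commit's heuristic line between "terse confirmation" and "explanatory text that happens to contain the word".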


  # ---------------------------------------------------------------------------

  template_id = task_config["template_id"]
  task_description = task_config["description"]
  app_base_url = task_config["app_base_url"]
+ task_params = dict(task_config.get("task_params") or {})
+
+ # Pin the exact task so env.reset() uses the intended category/product instead
+ # of sampling a random item from the template pool.
+ os.environ["HARVGYM_TASK"] = str(template_id)
+ os.environ["HARVGYM_TASK_SPEC_JSON"] = json.dumps(
+     {
+         "template_id": template_id,
+         "description": task_description,
+         "params": task_params,
+         "base_url": app_base_url,
+         "difficulty": task_config.get("difficulty", ""),
+     }
+ )

  env = HARvestGymEnvironment()

  try:
      obs = env.reset()
+     # Use the env-provided task description, which now matches the exact task
+     # spec passed in above.
      task_desc = obs.task or task_description
      base_url = obs.app_base_url or app_base_url

+     vprint(f"\n{'═'*60}")
+     vprint(f"[VERBOSE] EPISODE START — {task_name}")
+     vdump("INITIAL OBSERVATION (from env.reset)", obs.__dict__ if hasattr(obs, "__dict__") else str(obs))
+
      for step in range(1, MAX_STEPS + 1):
          if getattr(obs, "done", False):
              break
 
          last_result = obs.last_tool_result
          session_state = dict(obs.session_state or {})

+         vprint(f"\n[VERBOSE] ── step {step} result ──")
+         vdump(f"TOOL RESULT — {tool}", last_result)
+         vprint(f"[VERBOSE] reward={reward:.3f} done={done}")
+         if done:
+             vdump("FINAL OBS (done=True)", obs.__dict__ if hasattr(obs, "__dict__") else str(obs))
+
          history.append({
              "step": step,
              "tool": tool,
 
          reward = -0.1
          done = False
          error_str = str(exc)[:200]
+         vprint(f"[VERBOSE] Step {step} EXCEPTION: {exc}")

          rewards.append(reward)
          steps_taken = step
 
      # Reward range by design: terminal success = +2 to +5, terminal fail = -1.5
      # Use a generous baseline so partial credit shows up.
      total_reward = sum(rewards)
+     # Score: normalize to [0, 1] using the per-template terminal-reward ceiling.
+     # Template 1 (easy) max=2.0, Template 3 (medium) max=3.5, Template 6 (hard) max=5.0.
+     # Shift by +1.5 so that the fail reward (-1.5) maps to 0 and the max maps to 1.
+     _TEMPLATE_REWARD_CEIL = {1: 2.0, 3: 3.5, 6: 5.0}
+     _reward_ceil = _TEMPLATE_REWARD_CEIL.get(task_config.get("template_id"), 5.0)
+     score = max(0.0, min(1.0, (total_reward + 1.5) / (_reward_ceil + 1.5)))
      success = total_reward >= 0.5  # any positive terminal reward = success

+     vprint(f"\n[VERBOSE] ── episode end — {task_name} ──")
+     vprint(f"[VERBOSE] total_reward={total_reward:.3f} score={score:.3f} success={success}")
+     vprint(f"[VERBOSE] rewards per step: {[f'{r:.2f}' for r in rewards]}")
+
  except Exception as exc:
      error_str = str(exc)[:200]
      print(f"[DEBUG] Episode error: {error_str}", flush=True)
  finally:
+     try:
+         env.close()
+     except Exception as e:
+         print(f"[DEBUG] env.close() error: {e}", flush=True)
      log_end(success=success, steps=steps_taken, score=score, rewards=rewards)

  return {
      "task_name": task_name,
+     "difficulty": task_config.get("difficulty", "unknown"),
+     "description": task_config.get("description", ""),
      "success": success,
      "steps": steps_taken,
      "score": score,
 
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

  results = []
+ for i, task_config in enumerate(TASKS, 1):
+     difficulty = task_config.get("difficulty", "")
+     desc = task_config.get("description", "")
+     print(
+         f"\n{'='*70}\n[TASK {i}/{len(TASKS)}] ({difficulty.upper()}) {desc}\n{'='*70}",
+         flush=True,
+     )
      result = await run_episode(task_config, client)
      results.append(result)
+     status = "PASS" if result["success"] else "FAIL"
      print(
+         f"  [{status}] score={result['score']:.2f} steps={result['steps']}",
          flush=True,
      )

+ # Summary grouped by difficulty tier
+ print("\n" + "="*70, flush=True)
+ print("[SUMMARY]", flush=True)
+ for tier in ["easy", "medium", "hard"]:
+     tier_results = [r for r in results if r.get("difficulty") == tier]
+     if not tier_results:
+         continue
+     avg = sum(r["score"] for r in tier_results) / len(tier_results)
+     passes = sum(1 for r in tier_results if r["success"])
+     print(f"\n  {tier.upper()} ({passes}/{len(tier_results)} passed, avg score={avg:.2f}):", flush=True)
+     for r in tier_results:
+         status = "PASS" if r["success"] else "FAIL"
+         print(f"    [{status}] {r['task_name']} — score={r['score']:.2f} steps={r['steps']}", flush=True)
+
  overall_score = sum(r["score"] for r in results) / len(results) if results else 0.0
+ print(f"\n  OVERALL score={overall_score:.2f} ({sum(1 for r in results if r['success'])}/{len(results)} passed)",
+       flush=True)


  if __name__ == "__main__":
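Aside from the reviewer (not part of the commit): the per-template score normalization introduced in run_episode maps the fail reward (-1.5) to 0.0 and each template's terminal ceiling to 1.0. A standalone check of that arithmetic:

```python
_TEMPLATE_REWARD_CEIL = {1: 2.0, 3: 3.5, 6: 5.0}

def normalize_score(total_reward: float, template_id: int) -> float:
    """Shift by +1.5 so the fail reward (-1.5) maps to 0.0, then divide by
    the shifted per-template ceiling and clamp the result into [0, 1]."""
    ceil = _TEMPLATE_REWARD_CEIL.get(template_id, 5.0)
    return max(0.0, min(1.0, (total_reward + 1.5) / (ceil + 1.5)))

print(normalize_score(-1.5, 3))  # 0.0  (terminal failure)
print(normalize_score(3.5, 3))   # 1.0  (medium-template ceiling)
print(normalize_score(2.0, 1))   # 1.0  (easy-template ceiling)
```

Because the divisor is per-template, a perfect easy run and a perfect hard run both score 1.0 rather than the hard template dominating the average.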
openenv_harvestgym.egg-info/PKG-INFO CHANGED
@@ -11,6 +11,8 @@ Requires-Dist: requests>=2.31.0
  Requires-Dist: rank-bm25>=0.2.2
  Requires-Dist: openai>=1.0.0
  Requires-Dist: numpy>=1.24.0
+ Requires-Dist: beautifulsoup4>=4.14.3
+ Requires-Dist: lxml>=6.0.2
  Provides-Extra: dev
  Requires-Dist: pytest>=8.0.0; extra == "dev"
  Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_harvestgym.egg-info/SOURCES.txt CHANGED
@@ -14,6 +14,7 @@ server/models.py
  server/tools/__init__.py
  server/tools/browser_agent.py
  server/tools/curl_exec.py
+ server/tools/embed_cache.py
  server/tools/search_endpoints.py
  server/tools/search_episode_data.py
  tests/test_e2e_episode.py
openenv_harvestgym.egg-info/requires.txt CHANGED
@@ -6,6 +6,8 @@ requests>=2.31.0
  rank-bm25>=0.2.2
  openai>=1.0.0
  numpy>=1.24.0
+ beautifulsoup4>=4.14.3
+ lxml>=6.0.2

  [dev]
  pytest>=8.0.0
parameter_pools.json CHANGED
@@ -4,7 +4,7 @@
   "generated_at": "2026-04-08",
   "source": {
   "categories": "GET /rest/V1/categories/list (live EC2, port 7780)",
- "products": "GET /rest/V1/products type_id=simple + configurable (live EC2, port 7780)",
   "forums": "HTML scrape of /forums page (live EC2, port 9999) + HTTP 200 verification per slug",
   "wikipedia": "Well-known Wikipedia titles \u2014 verified by grader at runtime via HEAD /wikipedia_en.../A/{slug}",
   "admin_skus": "Generated (HAR-TEST-NNN namespace, no collision with existing catalog)",
@@ -13,10 +13,10 @@
   "grader_matching_notes": {
   "template_1": "category_id stored for grader; category_name is what appears in task string",
   "template_2": "expected_slug stored for grader (verifies HTTP 200); display title is in task string",
- "template_3": "sku stored for grader (verifies cart item); product name is in task string",
   "template_4": "forum_name must exist and return posts; no exact value matching needed",
   "template_5": "title is free-form generated; grader only checks post was created in that forum",
- "template_6": "sku stored for grader (verifies order was placed); product name is in task string",
   "template_7": "sku+price are exact \u2014 grader calls GET /rest/V1/products/{sku} to verify creation"
   }
   },
@@ -29,149 +29,37 @@
   ],
   "pool": {
   "category_name": [
- {
- "name": "Gear",
- "category_id": 3
- },
   {
   "name": "Bags",
   "category_id": 4
   },
   {
- "name": "Fitness Equipment",
- "category_id": 5
- },
- {
- "name": "Watches",
- "category_id": 6
- },
- {
- "name": "New Luma Yoga Collection",
- "category_id": 8
- },
- {
- "name": "Training",
- "category_id": 9
- },
- {
- "name": "Video Download",
- "category_id": 10
- },
- {
- "name": "Men",
- "category_id": 11
- },
- {
- "name": "Tops",
- "category_id": 12
- },
- {
- "name": "Bottoms",
- "category_id": 13
- },
- {
- "name": "Jackets",
- "category_id": 14
- },
- {
- "name": "Hoodies & Sweatshirts",
- "category_id": 15
- },
- {
- "name": "Tees",
- "category_id": 16
- },
- {
- "name": "Tanks",
- "category_id": 17
- },
- {
- "name": "Pants",
- "category_id": 18
- },
- {
- "name": "Shorts",
- "category_id": 19
- },
- {
- "name": "Women",
- "category_id": 20
- },
- {
- "name": "Tops",
- "category_id": 21
- },
- {
- "name": "Bottoms",
- "category_id": 22
- },
   {
   "name": "Jackets",
- "category_id": 23
- },
- {
- "name": "Hoodies & Sweatshirts",
- "category_id": 24
- },
- {
- "name": "Tees",
- "category_id": 25
- },
- {
- "name": "Bras & Tanks",
- "category_id": 26
- },
- {
- "name": "Pants",
- "category_id": 27
- },
- {
- "name": "Shorts",
- "category_id": 28
- },
- {
- "name": "Women Sale",
- "category_id": 30
- },
   {
- "name": "Men Sale",
- "category_id": 31
   },
   {
   "name": "Pants",
- "category_id": 32
- },
- {
- "name": "Tees",
- "category_id": 33
- },
- {
- "name": "Erin Recommends",
- "category_id": 34
- },
- {
- "name": "Performance Fabrics",
- "category_id": 35
- },
- {
- "name": "Eco Friendly",
- "category_id": 36
- },
- {
- "name": "Sale",
- "category_id": 37
   },
   {
- "name": "What's New",
- "category_id": 38
   },
   {
- "name": "Performance Sportswear New",
- "category_id": 39
   },
   {
- "name": "Eco Collection New",
- "category_id": 40
   }
   ]
   }
@@ -298,236 +186,94 @@
   "pool": {
   "product_name": [
   {
- "name": "Joust Duffle Bag",
- "sku": "24-MB01"
- },
- {
- "name": "Strive Shoulder Pack",
- "sku": "24-MB04"
- },
- {
- "name": "Crown Summit Backpack",
- "sku": "24-MB03"
- },
- {
- "name": "Wayfarer Messenger Bag",
- "sku": "24-MB05"
- },
- {
- "name": "Rival Field Messenger",
- "sku": "24-MB06"
- },
- {
- "name": "Fusion Backpack",
- "sku": "24-MB02"
- },
- {
- "name": "Impulse Duffle",
- "sku": "24-UB02"
- },
- {
- "name": "Voyage Yoga Bag",
- "sku": "24-WB01"
- },
- {
- "name": "Compete Track Tote",
- "sku": "24-WB02"
- },
- {
- "name": "Savvy Shoulder Tote",
- "sku": "24-WB05"
- },
- {
- "name": "Endeavor Daytrip Backpack",
- "sku": "24-WB06"
- },
- {
- "name": "Driven Backpack",
- "sku": "24-WB03"
- },
- {
- "name": "Overnight Duffle",
- "sku": "24-WB07"
- },
- {
- "name": "Push It Messenger Bag",
- "sku": "24-WB04"
- },
- {
- "name": "Affirm Water Bottle",
- "sku": "24-UG06"
- },
- {
- "name": "Dual Handle Cardio Ball",
- "sku": "24-UG07"
- },
- {
- "name": "Zing Jump Rope",
- "sku": "24-UG04"
- },
- {
- "name": "Pursuit Lumaflex&trade; Tone Band",
- "sku": "24-UG02"
- },
- {
- "name": "Go-Get'r Pushup Grips",
- "sku": "24-UG05"
- },
- {
- "name": "Quest Lumaflex&trade; Band",
- "sku": "24-UG01"
- },
- {
- "name": "Sprite Foam Yoga Brick",
- "sku": "24-WG084"
- },
- {
- "name": "Sprite Foam Roller",
- "sku": "24-WG088"
- },
- {
- "name": "Harmony Lumaflex&trade; Strength Band Kit",
- "sku": "24-UG03"
- },
- {
- "name": "Sprite Stasis Ball 55 cm",
- "sku": "24-WG081-gray"
- },
- {
- "name": "Sprite Stasis Ball 65 cm",
- "sku": "24-WG082-gray"
- },
- {
- "name": "Sprite Stasis Ball 75 cm",
- "sku": "24-WG083-gray"
- },
- {
- "name": "Sprite Yoga Strap 6 foot",
- "sku": "24-WG085"
- },
- {
- "name": "Sprite Yoga Strap 8 foot",
- "sku": "24-WG086"
- },
- {
- "name": "Sprite Yoga Strap 10 foot",
- "sku": "24-WG087"
- },
- {
- "name": "Aim Analog Watch",
- "sku": "24-MG04"
- },
- {
- "name": "Endurance Watch",
- "sku": "24-MG01"
- },
- {
- "name": "Summit Watch",
- "sku": "24-MG03"
- },
- {
- "name": "Cruise Dual Analog Watch",
- "sku": "24-MG05"
- },
- {
- "name": "Dash Digital Watch",
- "sku": "24-MG02"
- },
- {
- "name": "Luma Analog Watch",
- "sku": "24-WG09"
- },
- {
- "name": "Bolo Sport Watch",
- "sku": "24-WG01"
- },
- {
- "name": "Clamber Watch",
- "sku": "24-WG03"
- },
- {
- "name": "Didi Sport Watch",
- "sku": "24-WG02"
   },
   {
- "name": "Stellar Solar Jacket",
- "sku": "WJ01"
   },
   {
- "name": "Josie Yoga Jacket",
- "sku": "WJ02"
   },
   {
- "name": "Augusta Pullover Jacket",
- "sku": "WJ03"
   },
   {
- "name": "Ingrid Running Jacket",
- "sku": "WJ04"
   },
   {
- "name": "Riona Full Zip Jacket",
- "sku": "WJ05"
   },
   {
- "name": "Juno Jacket",
- "sku": "WJ06"
   },
   {
- "name": "Inez Full Zip Jacket",
- "sku": "WJ07"
   },
   {
- "name": "Adrienne Trek Jacket",
- "sku": "WJ08"
   },
   {
- "name": "Jade Yoga Jacket",
- "sku": "WJ09"
   },
   {
- "name": "Nadia Elements Shell",
- "sku": "WJ10"
   },
   {
- "name": "Neve Studio Dance Jacket",
- "sku": "WJ11"
   },
   {
- "name": "Olivia 1/4 Zip Light Jacket",
- "sku": "WJ12"
   },
   {
- "name": "Chaz Kangeroo Hoodie",
- "sku": "MH01"
   },
   {
- "name": "Teton Pullover Hoodie",
- "sku": "MH02"
   },
   {
- "name": "Bruno Compete Hoodie",
- "sku": "MH03"
   },
   {
- "name": "Frankie Sweatshirt",
- "sku": "MH04"
   },
   {
- "name": "Hollister Backyard Sweatshirt",
- "sku": "MH05"
- },
- {
- "name": "Stark Fundamental Hoodie",
- "sku": "MH06"
- },
- {
- "name": "Hero Hoodie",
- "sku": "MH07"
- },
- {
- "name": "Oslo Trek Hoodie",
- "sku": "MH08"
   }
   ]
   }
@@ -739,236 +485,94 @@
   "pool": {
   "product_name": [
   {
- "name": "Joust Duffle Bag",
- "sku": "24-MB01"
- },
- {
- "name": "Strive Shoulder Pack",
- "sku": "24-MB04"
- },
- {
- "name": "Crown Summit Backpack",
- "sku": "24-MB03"
- },
- {
- "name": "Wayfarer Messenger Bag",
- "sku": "24-MB05"
- },
- {
- "name": "Rival Field Messenger",
- "sku": "24-MB06"
- },
- {
- "name": "Fusion Backpack",
- "sku": "24-MB02"
- },
- {
- "name": "Impulse Duffle",
- "sku": "24-UB02"
- },
- {
- "name": "Voyage Yoga Bag",
- "sku": "24-WB01"
- },
- {
- "name": "Compete Track Tote",
- "sku": "24-WB02"
- },
- {
- "name": "Savvy Shoulder Tote",
- "sku": "24-WB05"
- },
- {
- "name": "Endeavor Daytrip Backpack",
- "sku": "24-WB06"
- },
- {
- "name": "Driven Backpack",
- "sku": "24-WB03"
- },
- {
- "name": "Overnight Duffle",
- "sku": "24-WB07"
- },
- {
- "name": "Push It Messenger Bag",
- "sku": "24-WB04"
- },
- {
- "name": "Affirm Water Bottle",
- "sku": "24-UG06"
- },
- {
- "name": "Dual Handle Cardio Ball",
- "sku": "24-UG07"
- },
- {
- "name": "Zing Jump Rope",
- "sku": "24-UG04"
- },
- {
- "name": "Pursuit Lumaflex&trade; Tone Band",
- "sku": "24-UG02"
- },
- {
- "name": "Go-Get'r Pushup Grips",
- "sku": "24-UG05"
- },
- {
- "name": "Quest Lumaflex&trade; Band",
- "sku": "24-UG01"
- },
- {
- "name": "Sprite Foam Yoga Brick",
- "sku": "24-WG084"
- },
- {
- "name": "Sprite Foam Roller",
- "sku": "24-WG088"
- },
- {
- "name": "Harmony Lumaflex&trade; Strength Band Kit",
- "sku": "24-UG03"
- },
- {
- "name": "Sprite Stasis Ball 55 cm",
- "sku": "24-WG081-gray"
- },
- {
- "name": "Sprite Stasis Ball 65 cm",
- "sku": "24-WG082-gray"
- },
- {
- "name": "Sprite Stasis Ball 75 cm",
- "sku": "24-WG083-gray"
- },
- {
- "name": "Sprite Yoga Strap 6 foot",
- "sku": "24-WG085"
- },
- {
- "name": "Sprite Yoga Strap 8 foot",
- "sku": "24-WG086"
- },
- {
- "name": "Sprite Yoga Strap 10 foot",
- "sku": "24-WG087"
- },
- {
- "name": "Aim Analog Watch",
- "sku": "24-MG04"
- },
- {
- "name": "Endurance Watch",
- "sku": "24-MG01"
- },
- {
- "name": "Summit Watch",
- "sku": "24-MG03"
- },
- {
- "name": "Cruise Dual Analog Watch",
- "sku": "24-MG05"
- },
- {
- "name": "Dash Digital Watch",
- "sku": "24-MG02"
- },
- {
- "name": "Luma Analog Watch",
- "sku": "24-WG09"
- },
- {
- "name": "Bolo Sport Watch",
- "sku": "24-WG01"
- },
- {
- "name": "Clamber Watch",
- "sku": "24-WG03"
- },
- {
- "name": "Didi Sport Watch",
- "sku": "24-WG02"
- },
- {
- "name": "Stellar Solar Jacket",
- "sku": "WJ01"
- },
- {
- "name": "Josie Yoga Jacket",
- "sku": "WJ02"
- },
- {
- "name": "Augusta Pullover Jacket",
- "sku": "WJ03"
   },
   {
- "name": "Ingrid Running Jacket",
- "sku": "WJ04"
   },
   {
- "name": "Riona Full Zip Jacket",
- "sku": "WJ05"
   },
   {
- "name": "Juno Jacket",
- "sku": "WJ06"
   },
   {
- "name": "Inez Full Zip Jacket",
- "sku": "WJ07"
   },
   {
- "name": "Adrienne Trek Jacket",
- "sku": "WJ08"
   },
   {
- "name": "Jade Yoga Jacket",
- "sku": "WJ09"
   },
   {
- "name": "Nadia Elements Shell",
- "sku": "WJ10"
   },
   {
- "name": "Neve Studio Dance Jacket",
- "sku": "WJ11"
   },
   {
- "name": "Olivia 1/4 Zip Light Jacket",
- "sku": "WJ12"
   },
   {
- "name": "Chaz Kangeroo Hoodie",
- "sku": "MH01"
   },
   {
- "name": "Teton Pullover Hoodie",
- "sku": "MH02"
   },
   {
- "name": "Bruno Compete Hoodie",
- "sku": "MH03"
   },
   {
- "name": "Frankie Sweatshirt",
- "sku": "MH04"
   },
   {
- "name": "Hollister Backyard Sweatshirt",
- "sku": "MH05"
   },
   {
- "name": "Stark Fundamental Hoodie",
- "sku": "MH06"
   },
   {
- "name": "Hero Hoodie",
- "sku": "MH07"
   },
   {
- "name": "Oslo Trek Hoodie",
- "sku": "MH08"
   }
   ]
   }
 
   "generated_at": "2026-04-08",
   "source": {
   "categories": "GET /rest/V1/categories/list (live EC2, port 7780)",
+ "products": "HTML scrape of search results page on live EC2 store (port 7770) \u2014 product_id is the Magento entity ID used in add-to-cart forms; sku is PROD-{product_id} as the store REST API is auth-gated",
   "forums": "HTML scrape of /forums page (live EC2, port 9999) + HTTP 200 verification per slug",
   "wikipedia": "Well-known Wikipedia titles \u2014 verified by grader at runtime via HEAD /wikipedia_en.../A/{slug}",
   "admin_skus": "Generated (HAR-TEST-NNN namespace, no collision with existing catalog)",
   "grader_matching_notes": {
   "template_1": "category_id stored for grader; category_name is what appears in task string",
   "template_2": "expected_slug stored for grader (verifies HTTP 200); display title is in task string",
+ "template_3": "product_id stored for grader (checks POST /checkout/cart/add + cart probe); product name is in task string for HTML search flow",
   "template_4": "forum_name must exist and return posts; no exact value matching needed",
   "template_5": "title is free-form generated; grader only checks post was created in that forum",
+ "template_6": "product_id stored for grader; name is in task string; checkout grader checks REST guest-cart stages OR HTML checkout flow",
   "template_7": "sku+price are exact \u2014 grader calls GET /rest/V1/products/{sku} to verify creation"
   }
   },
   ],
   "pool": {
   "category_name": [
   {
   "name": "Bags",
   "category_id": 4
   },
   {
+ "name": "Backpack",
+ "category_id": 4
   },
   {
   "name": "Jackets",
+ "category_id": 11
   },
   {
+ "name": "Hoodies",
+ "category_id": 9
   },
   {
   "name": "Pants",
+ "category_id": 13
   },
   {
+ "name": "Shoes",
+ "category_id": 3
   },
   {
+ "name": "Boots",
+ "category_id": 3
   },
   {
+ "name": "Slippers",
+ "category_id": 3
   }
   ]
   }
 
186
  "pool": {
187
  "product_name": [
188
  {
189
+ "name": "Camera Backpack Bagsmar DSLR Waterproof",
190
+ "sku": "PROD-89940",
191
+ "product_id": 89940
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
192
  },
193
  {
194
+ "name": "Totes Womens Cold Weather Boots Nicole",
195
+ "sku": "PROD-29409",
196
+ "product_id": 29409
197
  },
198
  {
199
+ "name": "Totes Womens Snow Boots Jami Lace Up",
200
+ "sku": "PROD-83651",
201
+ "product_id": 83651
202
  },
203
  {
204
+ "name": "Noldares Flannel Jacket For Men Plaid",
205
+ "sku": "PROD-59237",
206
+ "product_id": 59237
207
  },
208
  {
209
+ "name": "Inesver Womens Leather Jacket Open Front",
210
+ "sku": "PROD-30743",
211
+ "product_id": 30743
212
  },
213
  {
214
+ "name": "Womens Corduroy Coat Plaid Hoodie Long Jacket",
215
+ "sku": "PROD-13227",
216
+ "product_id": 13227
217
  },
218
  {
219
+ "name": "Womens Fleece Jacket With Hood Winter",
220
+ "sku": "PROD-60773",
221
+ "product_id": 60773
222
  },
223
  {
224
+ "name": "Champion Hoodie Big And Tall Zip Up",
225
+ "sku": "PROD-64850",
226
+ "product_id": 64850
227
  },
228
  {
229
+ "name": "Matching Couples Hoodie Set",
230
+ "sku": "PROD-60915",
231
+ "product_id": 60915
232
  },
233
  {
234
+ "name": "Mens Novelty 3D Printed Pullover Hoodie",
235
+ "sku": "PROD-62228",
236
+ "product_id": 62228
237
  },
238
  {
239
+ "name": "Mens Slim Fit Cargo Pants Athletic",
240
+ "sku": "PROD-65987",
241
+ "product_id": 65987
242
  },
243
  {
244
+ "name": "Mens Ripstop Cargo Pants Tactical Hiking",
245
+ "sku": "PROD-10245",
246
+ "product_id": 10245
247
  },
248
  {
249
+ "name": "Womens Flowy Boho Harem Pants Yoga",
250
+ "sku": "PROD-64374",
251
+ "product_id": 64374
252
  },
253
  {
254
+ "name": "Womens High Waist Harem Pants Stripe",
255
+ "sku": "PROD-61333",
256
+ "product_id": 61333
257
  },
258
  {
259
+ "name": "Shoeslocker Womens Cozy Memory Foam Slippers",
260
+ "sku": "PROD-94779",
261
+ "product_id": 94779
262
  },
263
  {
264
+ "name": "Mens Canvas Korean Fashion Casual Shoes",
265
+ "sku": "PROD-60868",
266
+ "product_id": 60868
267
  },
268
  {
269
+ "name": "Unisex Diving Shoes Ultralight Anti Slip",
270
+ "sku": "PROD-12364",
271
+ "product_id": 12364
272
  },
273
  {
274
+ "name": "Womens Loafers Fashion Retro Single Shoes",
275
+ "sku": "PROD-63738",
276
+ "product_id": 63738
 
 
 
 
 
 
 
 
 
 
 
277
  }
278
  ]
279
  }
 
  "pool": {
  "product_name": [
  {
+ "name": "Camera Backpack Bagsmar DSLR Waterproof",
+ "sku": "PROD-89940",
+ "product_id": 89940
  },
  {
+ "name": "Totes Womens Cold Weather Boots Nicole",
+ "sku": "PROD-29409",
+ "product_id": 29409
  },
  {
+ "name": "Totes Womens Snow Boots Jami Lace Up",
+ "sku": "PROD-83651",
+ "product_id": 83651
  },
  {
+ "name": "Noldares Flannel Jacket For Men Plaid",
+ "sku": "PROD-59237",
+ "product_id": 59237
  },
  {
+ "name": "Inesver Womens Leather Jacket Open Front",
+ "sku": "PROD-30743",
+ "product_id": 30743
  },
  {
+ "name": "Womens Corduroy Coat Plaid Hoodie Long Jacket",
+ "sku": "PROD-13227",
+ "product_id": 13227
  },
  {
+ "name": "Womens Fleece Jacket With Hood Winter",
+ "sku": "PROD-60773",
+ "product_id": 60773
  },
  {
+ "name": "Champion Hoodie Big And Tall Zip Up",
+ "sku": "PROD-64850",
+ "product_id": 64850
  },
  {
+ "name": "Matching Couples Hoodie Set",
+ "sku": "PROD-60915",
+ "product_id": 60915
  },
  {
+ "name": "Mens Novelty 3D Printed Pullover Hoodie",
+ "sku": "PROD-62228",
+ "product_id": 62228
  },
  {
+ "name": "Mens Slim Fit Cargo Pants Athletic",
+ "sku": "PROD-65987",
+ "product_id": 65987
  },
  {
+ "name": "Mens Ripstop Cargo Pants Tactical Hiking",
+ "sku": "PROD-10245",
+ "product_id": 10245
  },
  {
+ "name": "Womens Flowy Boho Harem Pants Yoga",
+ "sku": "PROD-64374",
+ "product_id": 64374
  },
  {
+ "name": "Womens High Waist Harem Pants Stripe",
+ "sku": "PROD-61333",
+ "product_id": 61333
  },
  {
+ "name": "Shoeslocker Womens Cozy Memory Foam Slippers",
+ "sku": "PROD-94779",
+ "product_id": 94779
  },
  {
+ "name": "Mens Canvas Korean Fashion Casual Shoes",
+ "sku": "PROD-60868",
+ "product_id": 60868
  },
  {
+ "name": "Unisex Diving Shoes Ultralight Anti Slip",
+ "sku": "PROD-12364",
+ "product_id": 12364
  },
  {
+ "name": "Womens Loafers Fashion Retro Single Shoes",
+ "sku": "PROD-63738",
+ "product_id": 63738
  }
  ]
  }
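These pool entries are what `_sample_task` draws from when it instantiates a task. A minimal sketch of that sampling step (the two-item pool slice here is illustrative, not the full catalog):

```python
import random

# Hypothetical slice of the product_name pool shown above
pool = {"product_name": [
    {"name": "Matching Couples Hoodie Set", "sku": "PROD-60915", "product_id": 60915},
    {"name": "Mens Slim Fit Cargo Pants Athletic", "sku": "PROD-65987", "product_id": 65987},
]}

# Pick one entry and expose its fields as task parameters,
# mirroring the chosen/params pattern in _sample_task
chosen = random.choice(pool["product_name"])
params = {"product_name": chosen["name"], "sku": chosen["sku"],
          "product_id": chosen["product_id"]}
```

The judge can then check both the human-readable name and the machine identifiers (`sku`, `product_id`) against the episode trajectory.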
pyproject.toml CHANGED
@@ -16,6 +16,8 @@ dependencies = [
      "rank-bm25>=0.2.2",
      "openai>=1.0.0",
      "numpy>=1.24.0",
+     "beautifulsoup4>=4.14.3",
+     "lxml>=6.0.2",
  ]
 
  [project.optional-dependencies]
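The two new dependencies point at HTML parsing for the distilled-page grading paths in judge.py. As a rough stdlib-only sketch of the kind of form extraction involved (the real code presumably uses BeautifulSoup with the lxml backend; `FormExtractor` and the exact output keys are assumptions here, chosen to match the `method`/`fields` shape the judge inspects):

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect each <form>'s method/action plus its named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {"method": (a.get("method") or "GET").upper(),
                             "action": a.get("action") or "",
                             "fields": {}}
        elif tag == "input" and self._current is not None and a.get("name"):
            self._current["fields"][a["name"]] = a.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


parser = FormExtractor()
parser.feed('<form method="post" action="/checkout/cart/add">'
            '<input name="product" value="13227"/></form>')
```

A page distilled this way yields the POST add-to-cart forms that `grade_template_1` counts as product listings.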
scripts/inspect_har_endpoints.py ADDED
@@ -0,0 +1,240 @@
+ #!/usr/bin/env python3
+ """
+ inspect_har_endpoints.py
+
+ Runs extract_openapi_spec() on every HAR file in hars/ and prints a full
+ summary of discovered endpoints — method, path, status code, auth, and a
+ snippet of the request/response body where available.
+
+ Usage:
+     python scripts/inspect_har_endpoints.py [--json]
+
+ Flags:
+     --json    Emit machine-readable JSON instead of the human-readable table
+ """
+
+ from __future__ import annotations
+
+ import json
+ import sys
+ from pathlib import Path
+
+ # ---------------------------------------------------------------------------
+ # Path setup — make the package importable without installing
+ # ---------------------------------------------------------------------------
+
+ REPO_ROOT = Path(__file__).resolve().parent.parent
+ sys.path.insert(0, str(REPO_ROOT))
+
+ from server.tools.browser_agent import extract_openapi_spec  # noqa: E402
+
+
+ # ---------------------------------------------------------------------------
+ # HAR files to inspect
+ # ---------------------------------------------------------------------------
+
+ HARS_DIR = REPO_ROOT / "hars"
+
+ HAR_FILES = {
+     "shopping": HARS_DIR / "shopping.har",
+     "shopping_admin": HARS_DIR / "shopping_admin.har",
+     "forum": HARS_DIR / "forum.har",
+     "wikipedia": HARS_DIR / "wikipedia.har",
+ }
+
+ # Fake base URLs — only used for pass-through in extract_openapi_spec
+ APP_BASE_URLS = {
+     "shopping": "http://localhost:7770",
+     "shopping_admin": "http://localhost:7780",
+     "forum": "http://localhost:9999",
+     "wikipedia": "http://localhost:8888",
+ }
+
+
+ # ---------------------------------------------------------------------------
+ # Pretty-print helpers
+ # ---------------------------------------------------------------------------
+
+ _COL_W = 80
+
+
+ def _hr(char: str = "─") -> None:
+     print(char * _COL_W)
+
+
+ def _body_snippet(value) -> str | None:
+     if value is None:
+         return None
+     if isinstance(value, str):
+         snippet = value[:120]
+     else:
+         snippet = json.dumps(value)[:120]
+     return snippet + ("…" if len(str(snippet)) >= 120 else "")
+
+
+ def _print_entry(idx: int, entry: dict) -> None:
+     auth_flag = "🔐 AUTH" if entry["auth_observed"] else "open"
+     print(f"  [{idx:>3}] {entry['method']:<7} {entry['path']}")
+     print(f"        status={entry['status_code']}  ct={entry['response_content_type'] or '—'}  {auth_flag}")
+     if entry.get("query_params"):
+         print(f"        query: {entry['query_params'][:100]}")
+     req_snippet = _body_snippet(entry.get("request_body"))
+     if req_snippet:
+         print(f"        req_body: {req_snippet}")
+     resp_snippet = _body_snippet(entry.get("response_body_sample"))
+     if resp_snippet:
+         print(f"        resp_sample: {resp_snippet}")
+
+
+ def _method_counts(entries: list[dict]) -> dict[str, int]:
+     counts: dict[str, int] = {}
+     for e in entries:
+         counts[e["method"]] = counts.get(e["method"], 0) + 1
+     return dict(sorted(counts.items()))
+
+
+ def print_app_summary(app_name: str, entries: list[dict], raw_total: int | None = None) -> None:
+     _hr("═")
+     header = f" APP: {app_name.upper()} ({len(entries)} unique API endpoints"
+     if raw_total is not None:
+         header += f" extracted from {raw_total} raw HAR entries"
+     header += ")"
+     print(header)
+     counts = _method_counts(entries)
+     print(f" Methods: {counts}")
+     auth_count = sum(1 for e in entries if e["auth_observed"])
+     print(f" Auth-required endpoints: {auth_count}/{len(entries)}")
+     _hr()
+     if not entries:
+         print("  (no API-like entries survived filtering)")
+     for i, entry in enumerate(entries, 1):
+         _print_entry(i, entry)
+     print()
+
+
+ # ---------------------------------------------------------------------------
+ # JSON mode
+ # ---------------------------------------------------------------------------
+
+ def emit_json(results: dict) -> None:
+     # Convert to a JSON-safe structure
+     output = {}
+     for app_name, entries in results.items():
+         output[app_name] = {
+             "total": len(entries),
+             "method_counts": _method_counts(entries),
+             "endpoints": entries,
+         }
+     print(json.dumps(output, indent=2))
+
+
+ # ---------------------------------------------------------------------------
+ # Verification / assertion checks
+ # ---------------------------------------------------------------------------
+
+
+ # NOTE: These HAR files are sparse — each was recorded for a narrow task
+ # scenario, not as a full API crawl. The vast majority of HAR entries are
+ # static assets (/static/ prefix) that the extractor correctly filters out.
+ # Thresholds below reflect the actual usable API surface in each file.
+ SANITY_CHECKS: dict[str, dict] = {
+     "shopping": {
+         "min_endpoints": 1,
+         "expected_methods": {"GET"},
+         "note": "Sparse HAR — only checkout success page recorded; "
+                 "213 total entries but 212 are /static/ assets.",
+     },
+     "shopping_admin": {
+         "min_endpoints": 2,
+         "expected_methods": {"GET", "POST"},
+         "note": "Sparse HAR — product save/edit + MUI JSON endpoint; "
+                 "353 total entries but 350 are /static/ assets.",
+     },
+     "forum": {
+         "min_endpoints": 2,
+         "expected_methods": {"GET", "POST"},
+         "note": "Sparse HAR — one POST submission + one forum thread GET; "
+                 "24 total entries but 22 are .js build files.",
+     },
+     "wikipedia": {
+         "min_endpoints": 0,
+         "expected_methods": set(),
+         "note": "Sparse HAR — only an article HTML page + /-/mw/ style/JS assets; "
+                 "no XHR/REST traffic recorded.",
+     },
+ }
+
+
+ def run_checks(results: dict) -> bool:
+     print("\n" + "─" * _COL_W)
+     print("SANITY CHECKS (thresholds calibrated to actual HAR content)")
+     print("─" * _COL_W)
+     all_passed = True
+     for app_name, checks in SANITY_CHECKS.items():
+         entries = results.get(app_name, [])
+         methods_found = set(e["method"] for e in entries)
+         n = len(entries)
+
+         min_ok = n >= checks["min_endpoints"]
+         exp = checks["expected_methods"]
+         methods_ok = exp.issubset(methods_found) if exp else True
+
+         status = "PASS" if (min_ok and methods_ok) else "FAIL"
+         if status == "FAIL":
+             all_passed = False
+
+         print(f" {status}  {app_name}")
+         print(f"     endpoints : {n} (min={checks['min_endpoints']}) {'✓' if min_ok else '✗'}")
+         if exp:
+             print(f"     methods   : {sorted(methods_found)} "
+                   f"(expected ⊇ {sorted(exp)}) {'✓' if methods_ok else '✗'}")
+         print(f"     note      : {checks['note']}")
+     print("─" * _COL_W)
+     print("Overall:", "ALL PASSED ✓" if all_passed else "SOME FAILED ✗")
+     return all_passed
+
+
+ # ---------------------------------------------------------------------------
+ # Main
+ # ---------------------------------------------------------------------------
+
+ def main() -> int:
+     emit_json_mode = "--json" in sys.argv
+
+     results: dict[str, list[dict]] = {}
+     raw_totals: dict[str, int] = {}
+     missing: list[str] = []
+
+     for app_name, har_path in HAR_FILES.items():
+         if not har_path.exists():
+             print(f"[WARN] HAR not found: {har_path}", file=sys.stderr)
+             missing.append(app_name)
+             results[app_name] = []
+             raw_totals[app_name] = 0
+             continue
+
+         with open(har_path) as f:
+             har_data = json.load(f)
+
+         raw_totals[app_name] = len(har_data.get("log", {}).get("entries", []))
+         entries = extract_openapi_spec(har_data, APP_BASE_URLS[app_name])
+         results[app_name] = entries
+
+     if emit_json_mode:
+         emit_json(results)
+         return 0
+
+     # Human-readable output
+     for app_name, entries in results.items():
+         print_app_summary(app_name, entries, raw_totals.get(app_name))
+
+     passed = run_checks(results)
+
+     if missing:
+         print(f"\n[WARN] Missing HAR files for: {', '.join(missing)}")
+
+     return 0 if passed else 1
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
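The sanity-check notes above describe HARs dominated by static assets (e.g. "213 total entries but 212 are /static/ assets"). A tiny illustration of that ratio on a hypothetical HAR structure (stdlib only; the URLs are made up):

```python
import json
from urllib.parse import urlparse

# Minimal hypothetical HAR: two static assets, one API call
har = {"log": {"entries": [
    {"request": {"url": "http://localhost:7770/static/app.js"}},
    {"request": {"url": "http://localhost:7770/static/style.css"}},
    {"request": {"url": "http://localhost:7770/rest/V1/products?searchCriteria[pageSize]=5"}},
]}}

entries = har["log"]["entries"]
# Split by path prefix, the same coarse filter the notes describe
static = [e for e in entries
          if urlparse(e["request"]["url"]).path.startswith("/static/")]
api = [e for e in entries if e not in static]
```

Only the `api` remainder is worth embedding for `search_endpoints`; when it is nearly empty, the recording itself is too sparse.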
server/judge.py CHANGED
@@ -140,23 +140,38 @@ def _get_curl_steps(episode: Episode):
  def grade_template_1(episode: Episode, task: Task) -> float:
      """Easy — Shopping: List products in category {category_name}"""
      category_name = task.params.get("category_name", "")
+     category_lower = category_name.lower()
 
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200:
              body = cp.response_body
+             # REST API JSON response (ideal path: /rest/V1/products)
              if isinstance(body, dict) and "items" in body:
                  items = body["items"]
                  if len(items) > 0:
-                     # Check if any item mentions the category
                      for item in items:
                          if _item_matches_category(item, category_name):
                              return 1.0
-                     # Items returned but can't verify category — partial
                      return 0.3
-             # Also check if it's a raw list
+             # Raw list
              if isinstance(body, list) and len(body) > 0:
                  return 0.3
+             # Distilled HTML page (from html_distiller) — check for search results page
+             # that contains product forms. page_type/forms/text are the distiller's keys.
+             if isinstance(body, dict) and "page_type" in body:
+                 forms = body.get("forms", [])
+                 text = body.get("text", "") or ""
+                 title = (body.get("title") or "").lower()
+                 # A search/category results page has multiple POST add-to-cart forms
+                 product_forms = [f for f in forms if f.get("method") == "POST"
+                                  and "product" in f.get("fields", {})]
+                 if product_forms:
+                     # Check that the page is about the requested category
+                     if category_lower in title or category_lower in text.lower():
+                         return 1.0
+                     # Products listed but category name not verifiable from title — partial
+                     return 0.5
 
      return 0.0
 
@@ -220,14 +235,14 @@ def grade_template_3(episode: Episode, task: Task) -> float:
      """Medium — Shopping: Add {product_name} to a guest cart"""
      product_name = task.params.get("product_name", "")
      sku = task.params.get("sku")
+     product_id = str(task.params.get("product_id", ""))
 
-     # Primary: check if add-to-cart responded with item_id
+     # Primary: REST API — check if add-to-cart responded with item_id
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200:
              body = cp.response_body
              if isinstance(body, dict) and "item_id" in body:
-                 # Verify the sku if we have it
                  if sku and body.get("sku") == sku:
                      return 1.0
                  if _fuzzy_match(str(body.get("name", "")), product_name):
@@ -235,7 +250,29 @@ def grade_template_3(episode: Episode, task: Task) -> float:
                  if body.get("item_id"):
                      return 1.0
 
-     # Try live probe
+     # Secondary: HTML form-based add-to-cart (POST to /checkout/cart/add)
+     # A 302 redirect or 200 response from this endpoint means item was accepted
+     for step in _get_curl_steps(episode):
+         cp = step.curl_parsed
+         if cp.method == "POST" and "/checkout/cart/add" in (cp.path or ""):
+             if cp.status_code in (200, 302):
+                 # Optionally verify the correct product_id was posted
+                 body_str = str(cp.body or "")
+                 correct_product = (not product_id) or (product_id in body_str)
+
+                 # Probe cart to confirm item presence
+                 probe = _judge_probe("/checkout/cart/", task.base_url)
+                 if probe and probe.status_code == 200:
+                     cart_text = (probe.body if isinstance(probe.body, str) else str(probe.body)).lower()
+                     # Cart page mentions product name or has quantity indicators
+                     if product_name.lower()[:15] in cart_text:
+                         return 1.0
+                     if "qty" in cart_text or "quantity" in cart_text or "item" in cart_text:
+                         return 0.8 if correct_product else 0.6
+                 # POST succeeded without cart confirmation
+                 return 0.7 if correct_product else 0.5
+
+     # Try live probe via REST guest-cart
      cart_id = _extract_cart_id(episode)
      if cart_id:
          probe = _judge_probe(f"/rest/V1/guest-carts/{cart_id}", task.base_url)
@@ -247,13 +284,13 @@ def grade_template_3(episode: Episode, task: Task) -> float:
                  if _fuzzy_match(str(item.get("name", "")), product_name):
                      return 1.0
              if len(items) == 0:
-                 return 0.2  # cart created, item not added
+                 return 0.2  # cart created, item not added yet
 
-     # Partial: cart was created
+     # Partial: REST cart was created
      if cart_id:
          return 0.2
 
-     # Partial: attempted cart creation
+     # Partial: attempted cart creation via REST
      if any("guest-carts" in (s.curl_parsed.path or "") and
             s.curl_parsed.method == "POST"
             for s in _get_curl_steps(episode)):
@@ -424,7 +461,7 @@ def grade_template_6(episode: Episode, task: Task) -> float:
 
 
  def _extract_admin_token(episode: Episode) -> str | None:
-     """Find admin bearer token from episode trajectory."""
+     """Find admin bearer token from shopping-admin trajectory (used by graders)."""
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200 and "integration/admin/token" in cp.path:
@@ -434,6 +471,49 @@ def _extract_admin_token(episode: Episode) -> str | None:
      return None
 
 
+ def _check_any_auth_obtained(episode: Episode) -> bool:
+     """
+     Generic check: did the agent successfully obtain ANY form of authentication?
+
+     Detects:
+     - Forum/CSRF token authentication
+     - Shopping-admin integration token
+     - Any 200 response returning a bare token string (bearer, user token, API key)
+     - Any 200 response returning a dict with a token field (access_token, id_token, etc.)
+
+     Application-agnostic — the model discovers auth endpoints via browser_agent /
+     search_endpoints; this simply rewards the intermediate step of obtaining auth.
+     """
+     # Forum/CSRF auth
+     if _check_forum_auth(episode):
+         return True
+
+     # Shopping admin token
+     if _extract_admin_token(episode):
+         return True
+
+     # Generic: any successful response that looks like it returned an auth token
+     for step in _get_curl_steps(episode):
+         cp = step.curl_parsed
+         if cp.status_code != 200:
+             continue
+         body = cp.response_body
+
+         # Plain string token (e.g. Magento user/guest tokens, API keys)
+         if isinstance(body, str):
+             stripped = body.strip().strip('"')
+             if re.fullmatch(r"[A-Za-z0-9+/=_\-\.]{20,}", stripped):
+                 return True
+
+         # Dict with a recognised token field
+         if isinstance(body, dict):
+             for k in ("token", "access_token", "id_token", "auth_token", "bearer"):
+                 if k in body and isinstance(body[k], str) and len(body[k]) > 10:
+                     return True
+
+     return False
+
+
  def _attempted_product_creation(episode: Episode, sku: str) -> bool:
      """Check if the model attempted to create a product with this SKU."""
      for step in _get_curl_steps(episode):
@@ -666,12 +746,15 @@ def evaluate(episode: Episode) -> EpisodeResult:
 
      task_score = grader(episode, task)
      param_score = verify_parameter_sourcing(episode, task)
-     auth_obtained = _check_forum_auth(episode) or bool(_extract_admin_token(episode))
+     auth_obtained = _check_any_auth_obtained(episode)
 
      # Compute reward
      reward = _score_to_reward(task_score, template_id)
 
-     # Bonus for auth obtained even on task failure
+     # Auth bonus: if the task failed but the agent successfully obtained any form
+     # of authentication (bearer token, session cookie, CSRF token, etc.), floor
+     # the reward at AUTH_BONUS. This is application-agnostic — obtaining auth is
+     # a useful intermediate skill regardless of the specific task template.
      if task_score < 0.5 and auth_obtained:
          reward = max(reward, AUTH_BONUS)
 
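The generic token heuristic in `_check_any_auth_obtained` can be exercised in isolation. A small standalone sketch of the same checks (`looks_like_token` is a hypothetical name, not part of judge.py):

```python
import re

# Same character class and minimum length as the judge's regex
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-\.]{20,}")


def looks_like_token(body) -> bool:
    """Mirror of the generic auth heuristic: a bare string of token-ish
    characters, or a dict carrying a recognised token field."""
    if isinstance(body, str):
        # Magento-style responses wrap the token in JSON string quotes
        return bool(TOKEN_RE.fullmatch(body.strip().strip('"')))
    if isinstance(body, dict):
        return any(isinstance(body.get(k), str) and len(body[k]) > 10
                   for k in ("token", "access_token", "id_token",
                             "auth_token", "bearer"))
    return False
```

Anything that passes this check after a 200 response is treated as "auth obtained" and floors the episode reward at `AUTH_BONUS` on task failure.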
server/models.py CHANGED
@@ -75,6 +75,8 @@ REWARD_NEW_PATH = 0.1 # curl path not seen before this episode
75
  REWARD_CORRECT_PARAM = 0.25 # judge: correct parameter sourcing (applied at end)
76
  REWARD_SESSION_VALUE = 0.1 # auth token/cookie correctly used
77
  PENALTY_REPEATED_CALL = -0.15 # exact duplicate curl command
 
 
78
  PENALTY_BROWSER_AGENT_AGAIN = -0.3 # browser_agent called after step 1
79
  PENALTY_MALFORMED_CURL = -0.1 # curl can't be parsed/executed
80
  PENALTY_4XX = -0.05 # recoverable HTTP error
@@ -103,14 +105,49 @@ TASK_NAME_TO_TEMPLATE = {
103
  "har_pipeline_hard": 6,
104
  }
105
 
106
- TEMPLATE_DESCRIPTIONS = {
107
- 1: "List products in category {category_name}",
108
- 2: "Retrieve the Wikipedia article for '{title}'",
109
- 3: "Add '{product_name}' to a guest cart",
110
- 4: "Retrieve all posts in the '{forum_category}' forum (you must log in first)",
111
- 5: "Create a forum post titled '{title}' in the '{category}' forum",
112
- 6: "Complete a guest checkout for '{product_name}'",
113
- 7: "Create a new product in the admin panel with SKU '{sku}' and price {price}",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
114
  }
115
 
116
 
@@ -139,7 +176,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
139
  items = pool.get("category_name", [{"name": "Gear", "category_id": 3}])
140
  chosen = random.choice(items)
141
  params = {"category_name": chosen["name"], "category_id": chosen.get("category_id")}
142
- description = TEMPLATE_DESCRIPTIONS[1].format(**params)
143
 
144
  elif template_id == 2:
145
  items = pool.get("title", [{"title": "Python (programming language)", "expected_slug": "Python_(programming_language)"}])
@@ -148,7 +185,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
148
  chosen = random.choice(items)
149
  title = chosen.get("title", chosen) if isinstance(chosen, dict) else chosen
150
  params = {"title": title, "expected_slug": chosen.get("expected_slug", title.replace(" ", "_"))}
151
- description = TEMPLATE_DESCRIPTIONS[2].format(**params)
152
 
153
  elif template_id == 3:
154
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
@@ -157,8 +194,11 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
157
  chosen = random.choice(items)
158
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
159
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
 
160
  params = {"product_name": product_name, "sku": sku}
161
- description = TEMPLATE_DESCRIPTIONS[3].format(**params)
 
 
162
 
163
  elif template_id == 4:
164
  items = pool.get("forum_category", [{"slug": "general", "name": "General"}])
@@ -167,7 +207,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
167
  chosen = random.choice(items)
168
  forum_cat = chosen.get("slug", chosen.get("name", "general")) if isinstance(chosen, dict) else chosen
169
  params = {"forum_category": forum_cat}
170
- description = TEMPLATE_DESCRIPTIONS[4].format(**params)
171
 
172
  elif template_id == 5:
173
  categories = pool.get("forum_category", [{"slug": "general"}])
@@ -180,7 +220,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
180
  chosen_title = random.choice(titles) if isinstance(titles[0], str) else random.choice(titles).get("title", "Test post")
181
  forum_cat = chosen_cat.get("slug", "general") if isinstance(chosen_cat, dict) else chosen_cat
182
  params = {"title": chosen_title, "category": forum_cat}
183
- description = TEMPLATE_DESCRIPTIONS[5].format(**params)
184
 
185
  elif template_id == 6:
186
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
@@ -189,8 +229,11 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
189
  chosen = random.choice(items)
190
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
191
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
 
192
  params = {"product_name": product_name, "sku": sku}
193
- description = TEMPLATE_DESCRIPTIONS[6].format(**params)
 
 
194
 
195
  elif template_id == 7:
196
  items = pool.get("admin_sku", [{"sku": "HAR-TEST-001", "price": "29.99"}])
@@ -200,7 +243,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
200
  sku = chosen.get("sku", "HAR-TEST-001") if isinstance(chosen, dict) else chosen
201
  price = str(chosen.get("price", "29.99")) if isinstance(chosen, dict) else "29.99"
202
  params = {"sku": sku, "price": price}
203
- description = TEMPLATE_DESCRIPTIONS[7].format(**params)
204
 
205
  else:
206
  params = {}
@@ -211,6 +254,19 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
211
  return description, params, base_url
212
 
213
 
 
 
 
 
 
 
 
 
 
 
 
 
 
214
  # ---------------------------------------------------------------------------
215
  # Environment
216
  # ---------------------------------------------------------------------------
@@ -235,6 +291,7 @@ class HARvestGymEnvironment(Environment):
235
  self._episode_store: dict = {} # embeddings, BM25 corpus, etc.
236
  self._called_paths: set = set() # for new-path reward
237
  self._last_curl_commands: list = [] # for duplicate detection
 
238
  self._step_rewards: list[float] = []
239
  self._done = False
240
 
@@ -297,6 +354,12 @@ class HARvestGymEnvironment(Environment):
297
  task_name = self._task_name
298
  if task_name in TASK_NAME_TO_TEMPLATE:
299
  return TASK_NAME_TO_TEMPLATE[task_name]
 
 
 
 
 
 
300
  # Try integer
301
  try:
302
  tid = int(task_name)
@@ -310,17 +373,30 @@ class HARvestGymEnvironment(Environment):
310
  """Reset environment: clear episode state, sample new task."""
311
  from .episode import Episode, Task
312
 
313
- template_id = self._get_template_id()
314
- description, params, base_url = _sample_task(template_id, self._parameter_pools)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
315
 
316
- meta = TEMPLATE_META[template_id]
317
  self._current_task = Task(
318
  template_id=template_id,
319
  description=description,
320
  params=params,
321
- app=meta["app"],
322
  base_url=base_url,
323
- difficulty=meta["tier"],
324
  )
325
 
326
  self._episode = Episode(task=self._current_task)
@@ -328,6 +404,7 @@ class HARvestGymEnvironment(Environment):
328
  self._episode_store = {}
329
  self._called_paths = set()
330
  self._last_curl_commands = []
 
331
  self._step_rewards = []
332
  self._done = False
333
  self._state = State(episode_id=str(uuid4()), step_count=0)
@@ -344,8 +421,8 @@ class HARvestGymEnvironment(Environment):
344
  reward=0.0,
345
  metadata={
346
  "template_id": template_id,
347
- "difficulty": meta["tier"],
348
- "app": meta["app"],
349
  },
350
  )
351
 
@@ -397,7 +474,9 @@ class HARvestGymEnvironment(Environment):
397
  headers=parsed["headers"],
398
  body=parsed["body"],
399
  status_code=resp.get("status_code", 0),
400
- response_body=resp.get("body"),
 
 
401
  response_headers=resp.get("headers", {}),
402
  )
403
  except Exception:
@@ -502,18 +581,33 @@ class HARvestGymEnvironment(Environment):
502
  reward += PENALTY_MALFORMED_CURL
503
  elif 200 <= status < 300:
504
  reward += REWARD_VALID_API_CALL
505
- # New path bonus
506
  from urllib.parse import urlparse
507
  from .tools.browser_agent import _normalise_path
508
  try:
509
- parsed_for_path = __import__("shlex").split(command)
510
- for t in parsed_for_path:
511
- if t.startswith("http"):
512
- path = _normalise_path(urlparse(t.strip("'\"")).path)
513
- if path and path not in self._called_paths:
514
- self._called_paths.add(path)
515
- reward += REWARD_NEW_PATH
 
 
 
 
 
 
 
516
  break
 
 
 
 
 
 
 
 
517
  except Exception:
518
  pass
519
  elif 400 <= status < 500:
 
75
  REWARD_CORRECT_PARAM = 0.25 # judge: correct parameter sourcing (applied at end)
76
  REWARD_SESSION_VALUE = 0.1 # auth token/cookie correctly used
77
  PENALTY_REPEATED_CALL = -0.15 # exact duplicate curl command
78
+ PENALTY_REPEATED_DIFF_PARAM_CALL = -0.05 # duplicate curl but with different parameters
79
+ PENALTY_REPEATED_PATH = -0.15 # same (method, normalised path) called more than once
80
  PENALTY_BROWSER_AGENT_AGAIN = -0.3 # browser_agent called after step 1
81
  PENALTY_MALFORMED_CURL = -0.1 # curl can't be parsed/executed
82
  PENALTY_4XX = -0.05 # recoverable HTTP error
 
105
  "har_pipeline_hard": 6,
106
  }
107
 
108
+ TEMPLATE_DESCRIPTIONS: dict[int, list[str]] = {
109
+ 1: [
110
+ "List products in category {category_name}",
111
+ "Show all products under the {category_name} category",
112
+ "Fetch the product listing for the '{category_name}' category",
113
+ "What products are available in the {category_name} category?",
114
+ ],
115
+ 2: [
116
+ "Retrieve the Wikipedia article for '{title}'",
117
+ "Fetch the Wikipedia page about '{title}'",
118
+ "Get the Wikipedia entry for '{title}'",
119
+ "Look up '{title}' on Wikipedia and return the article",
120
+ ],
121
+ 3: [
122
+ "Find '{product_name}' in the store and add it to the shopping cart",
123
+ "Add '{product_name}' to the cart",
124
+ "Shop for '{product_name}' and put it in the cart",
125
+ "I want to buy '{product_name}' — add it to my cart",
126
+ ],
127
+ 4: [
128
+ "Retrieve all posts in the '{forum_category}' forum (you must log in first)",
129
+ "Fetch the post list for the '{forum_category}' forum category",
130
+ "Get all threads in the '{forum_category}' forum section",
131
+ "List the forum posts under '{forum_category}' (authentication required)",
132
+ ],
133
+ 5: [
134
+ "Create a post titled '{title}' in the '{category}' forum. Note: authentication is required.",
135
+ "Post a new thread called '{title}' in the '{category}' forum",
136
+ "Submit a forum post with the title '{title}' to the '{category}' section",
137
+ "Publish '{title}' as a new post in the '{category}' forum",
138
+ ],
139
+ 6: [
140
+ "Complete a full guest checkout for '{product_name}'. The checkout involves multiple dependent steps — each step produces a value needed by the next. The task is complete when a confirmed order is placed.",
141
+ "Place a guest order for '{product_name}'. The process spans several API calls that build on each other; you are done when an order confirmation is received.",
142
+ "Buy '{product_name}' as a guest user and complete the checkout. Each stage of the checkout requires information returned by the previous stage.",
143
+ "Finish a guest checkout for '{product_name}'. Work through each step in sequence — the output of every step feeds into the next — until the order is confirmed.",
144
+ ],
145
+ 7: [
146
+ "Create a new product in the admin panel with SKU '{sku}' and price {price}. Admin access is required.",
147
+ "Add a product to the catalog via the admin interface: SKU '{sku}', price {price}",
148
+ "As an admin, create a new product listing with SKU '{sku}' priced at {price}",
149
+ "Use admin credentials to create a product with SKU '{sku}' and a price of {price}",
150
+ ],
151
  }
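The paraphrase pools above are consumed via a `random.choice(...).format(**params)` pattern; a self-contained sketch (pool content copied from template 1 above):

```python
import random

TEMPLATE_DESCRIPTIONS = {
    1: [
        "List products in category {category_name}",
        "Show all products under the {category_name} category",
    ],
}

params = {"category_name": "Gear"}
# Pick one paraphrase at random and fill in the sampled parameters.
description = random.choice(TEMPLATE_DESCRIPTIONS[1]).format(**params)
```

Every paraphrase in a pool embeds the same placeholders, so any choice yields a description containing the sampled parameter values.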
152
 
153
 
 
176
  items = pool.get("category_name", [{"name": "Gear", "category_id": 3}])
177
  chosen = random.choice(items)
178
  params = {"category_name": chosen["name"], "category_id": chosen.get("category_id")}
179
+ description = random.choice(TEMPLATE_DESCRIPTIONS[1]).format(**params)
180
 
181
  elif template_id == 2:
182
  items = pool.get("title", [{"title": "Python (programming language)", "expected_slug": "Python_(programming_language)"}])
 
185
  chosen = random.choice(items)
186
  title = chosen.get("title", chosen) if isinstance(chosen, dict) else chosen
187
  params = {"title": title, "expected_slug": chosen.get("expected_slug", title.replace(" ", "_"))}
188
+ description = random.choice(TEMPLATE_DESCRIPTIONS[2]).format(**params)
189
 
190
  elif template_id == 3:
191
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
 
194
  chosen = random.choice(items)
195
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
196
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
197
+ product_id = chosen.get("product_id") if isinstance(chosen, dict) else None
198
  params = {"product_name": product_name, "sku": sku}
199
+ if product_id:
200
+ params["product_id"] = product_id
201
+ description = random.choice(TEMPLATE_DESCRIPTIONS[3]).format(**params)
202
 
203
  elif template_id == 4:
204
  items = pool.get("forum_category", [{"slug": "general", "name": "General"}])
 
207
  chosen = random.choice(items)
208
  forum_cat = chosen.get("slug", chosen.get("name", "general")) if isinstance(chosen, dict) else chosen
209
  params = {"forum_category": forum_cat}
210
+ description = random.choice(TEMPLATE_DESCRIPTIONS[4]).format(**params)
211
 
212
  elif template_id == 5:
213
  categories = pool.get("forum_category", [{"slug": "general"}])
 
220
  chosen_title = random.choice(titles) if isinstance(titles[0], str) else random.choice(titles).get("title", "Test post")
221
  forum_cat = chosen_cat.get("slug", "general") if isinstance(chosen_cat, dict) else chosen_cat
222
  params = {"title": chosen_title, "category": forum_cat}
223
+ description = random.choice(TEMPLATE_DESCRIPTIONS[5]).format(**params)
224
 
225
  elif template_id == 6:
226
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
 
229
  chosen = random.choice(items)
230
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
231
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
232
+ product_id = chosen.get("product_id") if isinstance(chosen, dict) else None
233
  params = {"product_name": product_name, "sku": sku}
234
+ if product_id:
235
+ params["product_id"] = product_id
236
+ description = random.choice(TEMPLATE_DESCRIPTIONS[6]).format(**params)
237
 
238
  elif template_id == 7:
239
  items = pool.get("admin_sku", [{"sku": "HAR-TEST-001", "price": "29.99"}])
 
243
  sku = chosen.get("sku", "HAR-TEST-001") if isinstance(chosen, dict) else chosen
244
  price = str(chosen.get("price", "29.99")) if isinstance(chosen, dict) else "29.99"
245
  params = {"sku": sku, "price": price}
246
+ description = random.choice(TEMPLATE_DESCRIPTIONS[7]).format(**params)
247
 
248
  else:
249
  params = {}
 
254
  return description, params, base_url
255
 
256
 
257
+ def _load_fixed_task_from_env() -> dict | None:
258
+ """Load an exact task specification when the caller wants deterministic reset()."""
259
+ raw = os.environ.get("HARVGYM_TASK_SPEC_JSON", "").strip()
260
+ if not raw:
261
+ return None
262
+ try:
263
+ parsed = json.loads(raw)
264
+ except json.JSONDecodeError:
265
+ print("[HARvestGym] Ignoring invalid HARVGYM_TASK_SPEC_JSON", flush=True)
266
+ return None
267
+ return parsed if isinstance(parsed, dict) else None
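A caller can pin `reset()` to an exact task by exporting `HARVGYM_TASK_SPEC_JSON` before constructing the environment. A sketch of the round-trip, with the parsing path of `_load_fixed_task_from_env` inlined for illustration (the spec keys mirror what `reset()` reads):

```python
import json
import os

spec = {
    "template_id": 1,
    "params": {"category_name": "Gear", "category_id": 3},
    "description": "List products in category Gear",
}
os.environ["HARVGYM_TASK_SPEC_JSON"] = json.dumps(spec)

# Same parsing logic as the loader above, inlined:
raw = os.environ.get("HARVGYM_TASK_SPEC_JSON", "").strip()
try:
    parsed = json.loads(raw) if raw else None
except json.JSONDecodeError:
    parsed = None
fixed = parsed if isinstance(parsed, dict) else None
```

Invalid JSON or a non-dict payload degrades gracefully to `None`, so training falls back to random sampling rather than crashing.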
268
+
269
+
270
  # ---------------------------------------------------------------------------
271
  # Environment
272
  # ---------------------------------------------------------------------------
 
291
  self._episode_store: dict = {} # embeddings, BM25 corpus, etc.
292
  self._called_paths: set = set() # for new-path reward
293
  self._last_curl_commands: list = [] # for duplicate detection
294
+ self._called_methods_paths: list[tuple[str, str]] = [] # for same-path penalty
295
  self._step_rewards: list[float] = []
296
  self._done = False
297
 
 
354
  task_name = self._task_name
355
  if task_name in TASK_NAME_TO_TEMPLATE:
356
  return TASK_NAME_TO_TEMPLATE[task_name]
357
+ if task_name.startswith("easy_"):
358
+ return 1
359
+ if task_name.startswith("medium_"):
360
+ return 3
361
+ if task_name.startswith("hard_"):
362
+ return 6
363
  # Try integer
364
  try:
365
  tid = int(task_name)
 
373
  """Reset environment: clear episode state, sample new task."""
374
  from .episode import Episode, Task
375
 
376
+ fixed_task = _load_fixed_task_from_env()
377
+
378
+ if fixed_task:
379
+ template_id = int(fixed_task.get("template_id", self._get_template_id()))
380
+ meta = TEMPLATE_META.get(template_id, TEMPLATE_META[self._get_template_id()])
381
+ params = dict(fixed_task.get("params") or {})
382
+ description = fixed_task.get("description") or TEMPLATE_DESCRIPTIONS[template_id][0].format(**params)
383
+ base_url = fixed_task.get("base_url") or f"http://{EC2_HOST}:{meta['base_url_port']}/"
384
+ difficulty = fixed_task.get("difficulty") or meta["tier"]
385
+ app = fixed_task.get("app") or meta["app"]
386
+ else:
387
+ template_id = self._get_template_id()
388
+ description, params, base_url = _sample_task(template_id, self._parameter_pools)
389
+ meta = TEMPLATE_META[template_id]
390
+ difficulty = meta["tier"]
391
+ app = meta["app"]
392
 
 
393
  self._current_task = Task(
394
  template_id=template_id,
395
  description=description,
396
  params=params,
397
+ app=app,
398
  base_url=base_url,
399
+ difficulty=difficulty,
400
  )
401
 
402
  self._episode = Episode(task=self._current_task)
 
404
  self._episode_store = {}
405
  self._called_paths = set()
406
  self._last_curl_commands = []
407
+ self._called_methods_paths = []
408
  self._step_rewards = []
409
  self._done = False
410
  self._state = State(episode_id=str(uuid4()), step_count=0)
 
421
  reward=0.0,
422
  metadata={
423
  "template_id": template_id,
424
+ "difficulty": difficulty,
425
+ "app": app,
426
  },
427
  )
428
 
 
474
  headers=parsed["headers"],
475
  body=parsed["body"],
476
  status_code=resp.get("status_code", 0),
477
+ # Use _judge_body (full structured body) for judge grading;
478
+ # falls back to body (truncated) if not present
479
+ response_body=resp.get("_judge_body", resp.get("body")),
480
  response_headers=resp.get("headers", {}),
481
  )
482
  except Exception:
 
581
  reward += PENALTY_MALFORMED_CURL
582
  elif 200 <= status < 300:
583
  reward += REWARD_VALID_API_CALL
584
+ # New path bonus + same-path penalty
585
  from urllib.parse import urlparse
586
  from .tools.browser_agent import _normalise_path
587
  try:
588
+ import shlex as _shlex
589
+ # Extract HTTP method (-X flag or infer from data flags)
590
+ _tokens = _shlex.split(command)
591
+ _method = "GET"
592
+ for _i, _tok in enumerate(_tokens):
593
+ if _tok in ("-X", "--request") and _i + 1 < len(_tokens):
594
+ _method = _tokens[_i + 1].upper()
595
+ break
596
+ if _method == "GET" and any(t in ("-d", "--data", "--data-raw", "--data-binary", "--data-urlencode", "-F", "--form") for t in _tokens):
597
+ _method = "POST"
598
+ _norm_path = None
599
+ for _t in _tokens:
600
+ if _t.startswith("http"):
601
+ _norm_path = _normalise_path(urlparse(_t.strip("'\"")).path)
602
  break
603
+ if _norm_path:
604
+ _mp = (_method, _norm_path)
605
+ if _mp in self._called_methods_paths:
606
+ reward += PENALTY_REPEATED_PATH
607
+ self._called_methods_paths.append(_mp)
608
+ if _norm_path not in self._called_paths:
609
+ self._called_paths.add(_norm_path)
610
+ reward += REWARD_NEW_PATH
611
  except Exception:
612
  pass
613
  elif 400 <= status < 500:
server/tools/browser_agent.py CHANGED
@@ -7,24 +7,17 @@ calls, REST endpoints, form submissions), and builds embeddings via the
7
  HuggingFace Inference API for semantic search_endpoints().
8
 
9
  Architecture:
 
 
 
 
 
 
 
 
10
  - Embeddings are cached on disk via embed_cache.py (max 2000 entries).
11
- On the first run for an app, the API is called once. All subsequent
12
- runs (and every episode within a training run) are pure cache hits —
13
- zero API cost.
14
-
15
- - Source priority:
16
- 1. HAR file (primary) — endpoints observed from browser traffic.
17
- If HAR has < HAR_MIN_ENTRIES meaningful endpoints, it is a partial
18
- recording and we augment with the API catalog (see below).
19
- 2. API catalog (fallback) — full structured spec extracted from source
20
- code. Used ONLY when the HAR is sparse. This is equivalent to
21
- the "live browser session" described in BROWSER_AGENT.md §Stage 2.
22
-
23
- - The catalog is ALSO used by the judge for parameter-sourcing grading.
24
- It serves double duty, but the two uses are completely independent:
25
- the judge compares tool call parameters against catalog ground truth,
26
- while the agent uses catalog entries as a search corpus when HAR alone
27
- is insufficient.
28
  """
29
 
30
  from __future__ import annotations
@@ -43,11 +36,6 @@ import numpy as np
43
  # ---------------------------------------------------------------------------
44
 
45
  HARS_DIR = Path(__file__).parent.parent.parent / "hars"
46
- CATALOGS_DIR = Path(__file__).parent.parent.parent / "catalogs"
47
-
48
- # If a HAR yields fewer than this many unique endpoints it is considered a
49
- # partial recording and the API catalog is used to fill in the rest.
50
- HAR_MIN_ENTRIES = 10
51
 
52
  HAR_MAP: dict[str, str] = {
53
  ":7770": "shopping.har",
@@ -140,6 +128,11 @@ def _is_api_like(path: str, method: str, resp_ct: str, req_ct: str) -> bool:
140
  return False
141
 
142
 
 
 
 
 
 
143
  def _normalise_path(path: str) -> str:
144
  for pattern, replacement in _ID_PATTERNS:
145
  path = pattern.sub(replacement, path)
@@ -195,9 +188,14 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
195
  """
196
  Extract an OpenAPI-like spec from HAR data.
197
 
198
- Includes: REST calls, XHR/fetch, form POSTs, any JSON-responding GET.
 
 
 
199
  Excludes: static assets (JS/CSS/images/fonts), analytics, CDN.
200
  """
 
 
201
  entries = har_data.get("log", {}).get("entries", [])
202
  seen: set[str] = set()
203
  spec_entries = []
@@ -219,7 +217,10 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
219
  parsed_url = urlparse(raw_url)
220
  path = parsed_url.path
221
 
222
- if not _is_api_like(path, method, resp_ct, req_ct):
 
 
 
223
  continue
224
 
225
  path_norm = _normalise_path(path)
@@ -233,16 +234,56 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
233
  for h in req.get("headers", [])
234
  )
235
 
236
- spec_entries.append({
237
- "method": method,
238
- "path": path_norm,
239
- "query_params": parsed_url.query or None,
240
- "request_body": _extract_body(req),
241
- "status_code": resp.get("status", 0),
242
- "response_content_type": resp_ct,
243
- "response_body_sample": _truncate_response_sample(resp),
244
- "auth_observed": has_auth,
245
- })
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
246
 
247
  return spec_entries
248
 
@@ -255,16 +296,26 @@ def spec_entry_to_text(entry: dict, app_name: str) -> str:
255
  f"status: {entry['status_code']}",
256
  f"auth: {'required' if entry['auth_observed'] else 'none'}",
257
  ]
258
- if entry.get("query_params"):
259
- parts.append(f"query: {entry['query_params']}")
260
- if entry.get("request_body"):
261
- body = entry["request_body"]
262
- body_str = json.dumps(body)[:_BODY_SAMPLE_CHARS] if not isinstance(body, str) else body[:_BODY_SAMPLE_CHARS]
263
- parts.append(f"body: {body_str}")
264
- if entry.get("response_body_sample") is not None:
265
- rsp = entry["response_body_sample"]
266
- rsp_str = json.dumps(rsp)[:_BODY_SAMPLE_CHARS] if not isinstance(rsp, str) else str(rsp)[:_BODY_SAMPLE_CHARS]
267
- parts.append(f"response_sample: {rsp_str}")
 
 
 
 
 
 
 
 
 
 
268
  return " | ".join(parts)
269
 
270
 
@@ -385,57 +436,9 @@ def embed_query_via_api(query: str) -> np.ndarray | None:
385
  return _embed_with_cache([query])
386
 
387
 
388
- def catalog_to_spec_entries(app_name: str) -> list[dict]:
389
- """
390
- Load the API catalog as spec entries.
391
-
392
- Used ONLY when the HAR yields fewer than HAR_MIN_ENTRIES endpoints
393
- (i.e. it is a partial/stub recording). This is equivalent to the
394
- live-browser-session fallback described in BROWSER_AGENT.md §Stage 2.
395
-
396
- The judge uses the same catalog for parameter-sourcing grading, but
397
- the two uses are independent — the agent's search corpus and the
398
- judge's ground-truth are different concepts that happen to share the
399
- same underlying data file.
400
- """
401
- catalog_path = CATALOGS_DIR / f"{app_name}.json"
402
- if not catalog_path.exists():
403
- return []
404
- try:
405
- with open(catalog_path) as f:
406
- data = json.load(f)
407
- endpoints = data if isinstance(data, list) else data.get("endpoints", [])
408
- spec_entries = []
409
- for ep in endpoints:
410
- endpoint_str = ep.get("endpoint", "")
411
- if endpoint_str and " " in endpoint_str:
412
- method, path = endpoint_str.split(" ", 1)
413
- method = method.upper()
414
- else:
415
- path = ep.get("path", endpoint_str)
416
- method = ep.get("method", "GET").upper()
417
- if not path:
418
- continue
419
- auth = ep.get("auth", ep.get("authentication", "none"))
420
- spec_entries.append({
421
- "method": method,
422
- "path": path,
423
- "query_params": None,
424
- "request_body": ep.get("body_params") or ep.get("body"),
425
- "status_code": 200,
426
- "response_content_type": "application/json",
427
- "response_body_sample": ep.get("response_fields") or ep.get("response_sample"),
428
- "auth_observed": auth not in ("none", "None", None, ""),
429
- })
430
- return spec_entries
431
- except Exception as e:
432
- print(f"[browser_agent] Could not load catalog '{app_name}': {e}", flush=True)
433
- return []
434
-
435
-
436
  def build_endpoint_embeddings(spec_entries: list[dict], app_name: str):
437
  """
438
- Build embeddings for all spec entries (HAR-extracted + catalog fallback).
439
  Returns (embeddings_array, text_chunks).
440
  Embeddings are retrieved from or saved to the persistent cache.
441
  """
@@ -512,22 +515,6 @@ def run_browser_agent(task: str, url: str, episode_store=None) -> dict:
512
  flush=True,
513
  )
514
 
515
- # Augment with catalog when HAR is a partial recording
516
- # (The catalog = source-code-extracted API spec; serves the same role as a
517
- # live browser session when no full HAR is available.)
518
- if len(spec_entries) < HAR_MIN_ENTRIES:
519
- catalog_entries = catalog_to_spec_entries(app_name)
520
- if catalog_entries:
521
- print(
522
- f"[browser_agent] HAR sparse ({len(spec_entries)} entries < {HAR_MIN_ENTRIES}), "
523
- f"augmenting from catalog ({len(catalog_entries)} entries)",
524
- flush=True,
525
- )
526
- har_paths = {e["path"] for e in spec_entries}
527
- for ce in catalog_entries:
528
- if ce["path"] not in har_paths:
529
- spec_entries.append(ce)
530
-
531
  # Build / retrieve embeddings via cache
532
  if spec_entries and episode_store is not None:
533
  try:
@@ -547,14 +534,19 @@ def run_browser_agent(task: str, url: str, episode_store=None) -> dict:
547
  _store_empty(episode_store, app_name)
548
 
549
  summary = [{"method": e["method"], "path": e["path"]} for e in spec_entries]
 
 
550
  return {
551
  "app": app_name,
552
  "endpoints": summary,
553
  "total_endpoints": len(summary),
 
 
554
  "note": (
555
- f"Discovered {len(summary)} API endpoints from recorded traffic. "
556
- "Use search_endpoints(query) to get full schema, parameters, and auth "
557
- "details for any endpoint."
 
558
  ),
559
  }
560
 
 
7
  HuggingFace Inference API for semantic search_endpoints().
8
 
9
  Architecture:
10
+ - The HAR file is the sole source of the agent's API knowledge.
11
+ The agent discovers endpoints only from what was recorded in the HAR.
12
+ If the HAR is sparse, the browser agent recording needs to be improved —
13
+ the product does not patch this by injecting other data sources.
14
+
15
+ - The API catalog (catalogs/*.json) is used exclusively by the judge
16
+ for parameter-sourcing grading. It plays no role in the training loop.
17
+
18
  - Embeddings are cached on disk via embed_cache.py (max 2000 entries).
19
+ First run: calls HF Inference API. All subsequent episodes in the same
20
+ training run are pure cache hits — zero API cost.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
  """
22
 
23
  from __future__ import annotations
 
36
  # ---------------------------------------------------------------------------
37
 
38
  HARS_DIR = Path(__file__).parent.parent.parent / "hars"
 
 
 
 
 
39
 
40
  HAR_MAP: dict[str, str] = {
41
  ":7770": "shopping.har",
 
128
  return False
129
 
130
 
131
+ def _is_html_page(method: str, resp_ct: str) -> bool:
132
+ """Return True for HTML GET responses that may contain SSR data."""
133
+ return method == "GET" and "text/html" in resp_ct
134
+
135
+
136
  def _normalise_path(path: str) -> str:
137
  for pattern, replacement in _ID_PATTERNS:
138
  path = pattern.sub(replacement, path)
 
188
  """
189
  Extract an OpenAPI-like spec from HAR data.
190
 
191
+ Includes:
192
+ - REST calls, XHR/fetch, form POSTs, any JSON-responding GET
193
+ - HTML GET pages that have a non-empty response body (distilled via html_distiller)
194
+
195
  Excludes: static assets (JS/CSS/images/fonts), analytics, CDN.
196
  """
197
+ from .html_distiller import distill_html
198
+
199
  entries = har_data.get("log", {}).get("entries", [])
200
  seen: set[str] = set()
201
  spec_entries = []
 
217
  parsed_url = urlparse(raw_url)
218
  path = parsed_url.path
219
 
220
+ is_html = _is_html_page(method, resp_ct)
221
+ is_api = _is_api_like(path, method, resp_ct, req_ct)
222
+
223
+ if not is_api and not is_html:
224
  continue
225
 
226
  path_norm = _normalise_path(path)
 
234
  for h in req.get("headers", [])
235
  )
236
 
237
+ if is_html:
238
+ # Attempt to distil the HTML body captured in the HAR
239
+ html_body = entry.get("response", {}).get("content", {}).get("text", "") or ""
240
+ if not html_body:
241
+ # HAR was recorded without "Save response body" — still include the
242
+ # page as a stub so the agent knows the route exists
243
+ distilled = None
244
+ distilled_summary = None
245
+ else:
246
+ distilled = distill_html(html_body, base_url=raw_url)
247
+ # Build a short summary for the spec text (used for embeddings)
248
+ blob_count = len(distilled.get("data_blobs", []))
249
+ form_count = len(distilled.get("forms", []))
250
+ blob_keys = []
251
+ for b in distilled.get("data_blobs", [])[:3]:
252
+ blob_keys.extend(b.get("keys", [])[:5])
253
+ distilled_summary = {
254
+ "page_type": distilled.get("page_type"),
255
+ "title": distilled.get("title"),
256
+ "data_blobs": blob_count,
257
+ "forms": form_count,
258
+ "blob_top_keys": blob_keys[:20],
259
+ "text_preview": (distilled.get("text") or "")[:200],
260
+ }
261
+
262
+ spec_entries.append({
263
+ "method": method,
264
+ "path": path_norm,
265
+ "query_params": parsed_url.query or None,
266
+ "request_body": None,
267
+ "status_code": resp.get("status", 0),
268
+ "response_content_type": resp_ct,
269
+ "response_body_sample": distilled_summary,
270
+ "auth_observed": has_auth,
271
+ "is_html_page": True,
272
+ # Store full distilled dict so the agent can retrieve it via search_endpoints
273
+ "_distilled": distilled,
274
+ })
275
+ else:
276
+ spec_entries.append({
277
+ "method": method,
278
+ "path": path_norm,
279
+ "query_params": parsed_url.query or None,
280
+ "request_body": _extract_body(req),
281
+ "status_code": resp.get("status", 0),
282
+ "response_content_type": resp_ct,
283
+ "response_body_sample": _truncate_response_sample(resp),
284
+ "auth_observed": has_auth,
285
+ "is_html_page": False,
286
+ })
287
 
288
  return spec_entries
289
 
 
296
  f"status: {entry['status_code']}",
297
  f"auth: {'required' if entry['auth_observed'] else 'none'}",
298
  ]
299
+ if entry.get("is_html_page"):
300
+ parts.append("type: html_page")
301
+ sample = entry.get("response_body_sample") or {}
302
+ if sample.get("title"):
303
+ parts.append(f"title: {sample['title']}")
304
+ if sample.get("blob_top_keys"):
305
+ parts.append(f"data_keys: {' '.join(sample['blob_top_keys'][:15])}")
306
+ if sample.get("text_preview"):
307
+ parts.append(f"text: {sample['text_preview'][:200]}")
308
+ else:
309
+ if entry.get("query_params"):
310
+ parts.append(f"query: {entry['query_params']}")
311
+ if entry.get("request_body"):
312
+ body = entry["request_body"]
313
+ body_str = json.dumps(body)[:_BODY_SAMPLE_CHARS] if not isinstance(body, str) else body[:_BODY_SAMPLE_CHARS]
314
+ parts.append(f"body: {body_str}")
315
+ if entry.get("response_body_sample") is not None:
316
+ rsp = entry["response_body_sample"]
317
+ rsp_str = json.dumps(rsp)[:_BODY_SAMPLE_CHARS] if not isinstance(rsp, str) else str(rsp)[:_BODY_SAMPLE_CHARS]
318
+ parts.append(f"response_sample: {rsp_str}")
319
  return " | ".join(parts)
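For an HTML-page entry, the resulting embedding text looks roughly like this. This is a standalone re-implementation of the `is_html_page` branch above, with hypothetical sample values:

```python
entry = {
    "method": "GET",
    "path": "/gear.html",
    "is_html_page": True,
    "response_body_sample": {
        "title": "Gear",
        "blob_top_keys": ["items", "total_count"],
        "text_preview": "Gear category listing",
    },
}

parts = [f"{entry['method']} {entry['path']}", "type: html_page"]
sample = entry["response_body_sample"]
if sample.get("title"):
    parts.append(f"title: {sample['title']}")
if sample.get("blob_top_keys"):
    parts.append(f"data_keys: {' '.join(sample['blob_top_keys'][:15])}")
if sample.get("text_preview"):
    parts.append(f"text: {sample['text_preview'][:200]}")
text = " | ".join(parts)
```

The blob keys and text preview give the embedding model semantic signal (product names, field names) that a bare `GET /gear.html` line would lack.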
320
 
321
 
 
436
  return _embed_with_cache([query])
437
 
438
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
439
  def build_endpoint_embeddings(spec_entries: list[dict], app_name: str):
440
  """
441
+ Build embeddings for HAR-extracted spec entries.
442
  Returns (embeddings_array, text_chunks).
443
  Embeddings are retrieved from or saved to the persistent cache.
444
  """
 
515
  flush=True,
516
  )
517
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
518
  # Build / retrieve embeddings via cache
519
  if spec_entries and episode_store is not None:
520
  try:
 
534
  _store_empty(episode_store, app_name)
535
 
536
  summary = [{"method": e["method"], "path": e["path"]} for e in spec_entries]
537
+ api_count = sum(1 for e in spec_entries if not e.get("is_html_page"))
538
+ html_count = sum(1 for e in spec_entries if e.get("is_html_page"))
539
  return {
540
  "app": app_name,
541
  "endpoints": summary,
542
  "total_endpoints": len(summary),
543
+ "api_endpoints": api_count,
544
+ "html_pages": html_count,
545
  "note": (
546
+ f"Discovered {api_count} API endpoints and {html_count} HTML page(s) "
547
+ f"from recorded traffic. "
548
+ "Use search_endpoints(query) to get full schema, parameters, auth details, "
549
+ "and page content (for HTML pages: embedded data blobs, forms, CSRF tokens)."
550
  ),
551
  }
552
 
server/tools/curl_exec.py CHANGED
@@ -375,31 +375,49 @@ def curl_exec(command: str, session_state: dict, episode_store: dict,
375
  except (json.JSONDecodeError, ValueError):
376
  parsed_body = body_text
377
 
378
- # Extract tokens from body
379
- _extract_tokens_from_body(parsed_body, session_state)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
380
 
381
- # Index into episode BM25 store BEFORE truncation
382
  _index_into_episode_store(
383
  episode_store=episode_store,
384
  request_body=parsed["body"],
385
- response_body=parsed_body,
386
  url=parsed["url"],
387
  method=parsed["method"],
388
  status_code=status_code,
389
  )
390
 
391
- # Apply smart truncation
392
- if status_code >= 400:
393
- # Never truncate errors
394
- truncated_body = parsed_body
395
- else:
396
- body_for_truncation = body_text if isinstance(parsed_body, str) else json.dumps(parsed_body)
397
- truncated_body = smart_truncate(body_for_truncation, resp_ct)
398
-
399
  return {
400
  "status_code": status_code,
401
  "headers": resp_headers,
402
  "body": truncated_body,
 
 
 
403
  }
404
 
405
 
@@ -410,10 +428,18 @@ def curl_exec(command: str, session_state: dict, episode_store: dict,
410
  def _index_into_episode_store(episode_store: dict, request_body: Any,
411
  response_body: Any, url: str, method: str,
412
  status_code: int) -> None:
413
- """Index request/response into episode BM25 store for search_episode_data()."""
 
 
 
 
 
 
 
414
  if "bm25_corpus" not in episode_store:
415
  episode_store["bm25_corpus"] = []
416
  episode_store["bm25_metadata"] = []
 
417
 
418
  def _to_text(obj: Any) -> str:
419
  if obj is None:
@@ -422,13 +448,24 @@ def _index_into_episode_store(episode_store: dict, request_body: Any,
422
  return obj
423
  return json.dumps(obj)
424
 
425
- entry_text = f"url: {url} | method: {method} | status: {status_code} | " \
426
- f"request: {_to_text(request_body)} | response: {_to_text(response_body)}"
 
 
 
 
 
 
 
 
 
427
 
 
428
  episode_store["bm25_corpus"].append(entry_text)
429
  episode_store["bm25_metadata"].append({
430
  "url": url,
431
  "method": method,
432
  "status_code": status_code,
433
- "response_body": response_body,
434
  })
 
 
 
375
  except (json.JSONDecodeError, ValueError):
376
  parsed_body = body_text
377
 
378
+ # Distil HTML responses into structured compact form
379
+ is_html_response = "text/html" in resp_ct
380
+ if is_html_response and isinstance(parsed_body, str) and parsed_body:
381
+ from .html_distiller import distill_html, distill_html_compact
382
+ distilled = distill_html(parsed_body, base_url=parsed["url"])
383
+ # Auto-extract form_key from HTML forms into session_state for reuse
384
+ for form in distilled.get("forms", []):
385
+ fk = form.get("fields", {}).get("form_key")
386
+ if fk and fk != "hidden":
387
+ session_state["form_key"] = fk
388
+ break
389
+ # Store the full distilled dict (not raw HTML) for search_episode_data
390
+ raw_body_for_store = distilled
391
+ # What we return to the agent is the compact text summary
392
+ truncated_body: Any = distill_html_compact(parsed_body, base_url=parsed["url"])
393
+ else:
394
+ raw_body_for_store = parsed_body
395
+ # Extract tokens from body (only for non-HTML responses)
396
+ _extract_tokens_from_body(parsed_body, session_state)
397
+ # Apply smart truncation
398
+ if status_code >= 400:
399
+ truncated_body = parsed_body
400
+ else:
401
+ body_for_truncation = body_text if isinstance(parsed_body, str) else json.dumps(parsed_body)
402
+ truncated_body = smart_truncate(body_for_truncation, resp_ct)
403
 
404
+ # Index into episode BM25 store
405
  _index_into_episode_store(
406
  episode_store=episode_store,
407
  request_body=parsed["body"],
408
+ response_body=raw_body_for_store,
409
  url=parsed["url"],
410
  method=parsed["method"],
411
  status_code=status_code,
412
  )
413
 
 
 
 
 
 
 
 
 
414
  return {
415
  "status_code": status_code,
416
  "headers": resp_headers,
417
  "body": truncated_body,
418
+ # _judge_body: full structured body for the judge (not shown to the model)
419
+ # For HTML: the distilled dict; for JSON/text: same as body
420
+ "_judge_body": raw_body_for_store,
421
  }
422
 
423
 
 
428
  def _index_into_episode_store(episode_store: dict, request_body: Any,
429
  response_body: Any, url: str, method: str,
430
  status_code: int) -> None:
431
+ """
432
+ Index request/response into the episode store for search_episode_data().
433
+
434
+ Three parallel structures are maintained:
435
+ bm25_corpus — truncated text strings for BM25 / embedding (lean, fast)
436
+ bm25_metadata — url/method/status_code per entry (no body, saves memory)
437
+ episode_raw_bodies — {index: full_untruncated_response_body} for retrieval
438
+ """
439
  if "bm25_corpus" not in episode_store:
440
  episode_store["bm25_corpus"] = []
441
  episode_store["bm25_metadata"] = []
442
+ episode_store["episode_raw_bodies"] = {}
443
 
444
  def _to_text(obj: Any) -> str:
445
  if obj is None:
 
448
  return obj
449
  return json.dumps(obj)
450
 
451
+ # Lean text for BM25 / embedding, capped at 2000 chars so embeddings stay within
452
+ # the model's token limit without losing the key signal (url + first part of body).
453
+ # For distilled HTML (stored as a dict), serialize the distilled form — it's already
454
+ # compact (text content, blob keys, form actions) rather than raw HTML.
455
+ resp_text = _to_text(response_body)
456
+ lean_resp = resp_text[:2000] if len(resp_text) > 2000 else resp_text
457
+
458
+ entry_text = (
459
+ f"url: {url} method: {method} status: {status_code} "
460
+ f"request: {_to_text(request_body)} response: {lean_resp}"
461
+ )
462
 
463
+ idx = len(episode_store["bm25_corpus"])
464
  episode_store["bm25_corpus"].append(entry_text)
465
  episode_store["bm25_metadata"].append({
466
  "url": url,
467
  "method": method,
468
  "status_code": status_code,
 
469
  })
470
+ # Store full untruncated body keyed by index — never truncated
471
+ episode_store["episode_raw_bodies"][idx] = response_body
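The index-keyed raw-body store means a search hit can always recover the untruncated payload. A minimal sketch of write and read; the retrieval-by-index step is how a search over `bm25_corpus` would use it (shown manually here):

```python
episode_store = {"bm25_corpus": [], "bm25_metadata": [], "episode_raw_bodies": {}}

body = {"cart_id": "abc123", "items": [{"sku": "MH01", "qty": 1}]}
idx = len(episode_store["bm25_corpus"])
episode_store["bm25_corpus"].append("url: /cart method: POST status: 200 ...")
episode_store["bm25_metadata"].append(
    {"url": "/cart", "method": "POST", "status_code": 200}
)
episode_store["episode_raw_bodies"][idx] = body  # full body, never truncated

# Later: a search hit at position idx recovers the exact response body.
hit = episode_store["episode_raw_bodies"][idx]
```

Because all three structures append in lockstep, the corpus position doubles as the retrieval key, keeping the searchable text lean without losing data.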
server/tools/html_distiller.py ADDED
@@ -0,0 +1,485 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ html_distiller — technology-agnostic HTML distillation for the RL agent.
3
+
4
+ Converts an HTML response body into a compact, structured dict that the agent
5
+ and the embedding index can work with. Raw HTML is never returned as-is —
6
+ it is expensive (200 KB+) and mostly noise (CSS classes, JS bundles, nav chrome).
7
+
8
+ What is extracted (in priority order):
9
+ 1. Embedded JSON data blobs — server-injected structured data that is the
10
+ *actual payload* for SSR pages:
11
+ • <script type="application/json"> (Next.js, generic)
12
+ • <script type="text/x-magento-init"> (Magento 2)
13
+ • window.__INITIAL_STATE__ = {...} (Redux-style SSR)
14
+ • window.__NEXT_DATA__ = {...} (Next.js legacy)
15
+ • window.__nuxt__ = {...} / window.__NUXT__ = {} (Nuxt.js)
16
+ • <script id="__NEXT_DATA__"> (Next.js)
17
+ • Any <script> tag containing only valid JSON
18
+ These are technology-specific patterns, but the extraction logic is written
19
+ generically — it looks for the common conventions rather than hardcoding
20
+ Magento. A React/Next.js app will be handled by the same code path.
21
+
22
+ 2. HTML forms — discovers new POST endpoints (form.action) and captures
23
+ auth-critical fields (CSRF tokens, hidden inputs).
24
+
25
+ 3. Visible text content — the human-readable body after stripping all
26
+ scripts, styles, and nav/header/footer chrome. Capped at MAX_TEXT_CHARS.
27
+
28
+ Output schema (always a dict with the same keys — absent items are None/[]):
29
+ {
30
+ "page_type": str, # "data_page" | "form_page" | "text_page"
31
+ "title": str | None, # <title> text
32
+ "description": str | None, # <meta name="description">
33
+ "data_blobs": [ # extracted JSON payloads
34
+ {"source": str, "data": any, "keys": [str]} # keys = top-level keys
35
+ ],
36
+ "forms": [
37
+ {
38
+ "action": str, # endpoint URL (relative or absolute)
39
+ "method": str, # GET | POST
40
+ "fields": { # name → value (includes hidden inputs)
41
+ "field_name": "field_value_or_type"
42
+ }
43
+ }
44
+ ],
45
+ "text": str | None, # stripped visible text (capped)
46
+ "raw_truncated": str, # first RAW_PREVIEW_CHARS of raw HTML (fallback)
47
+ }
48
+
49
+ Usage:
50
+ from server.tools.html_distiller import distill_html
51
+
52
+ result = distill_html(html_string, base_url="http://example.com/page")
53
+ # result["data_blobs"] — structured data, e.g. product listings
54
+ # result["forms"] — form actions + CSRF tokens
55
+ # result["text"] — stripped readable text
56
+ """
57
+
58
+ from __future__ import annotations
59
+
60
+ import json
61
+ import re
62
+ from typing import Any
63
+ from urllib.parse import urljoin
64
+
65
+ try:
66
+ from bs4 import BeautifulSoup
67
+ _BS4_AVAILABLE = True
68
+ except ImportError:
69
+ _BS4_AVAILABLE = False
70
+
71
+
72
+ # ---------------------------------------------------------------------------
73
+ # Constants
74
+ # ---------------------------------------------------------------------------
75
+
76
+ MAX_TEXT_CHARS = 20000 # max chars of stripped visible text to keep
77
+ MAX_BLOB_KEYS = 40 # max top-level keys to surface from a JSON blob
78
+ MAX_BLOB_DEPTH_PREVIEW = 2 # how many levels of nesting to summarise
79
+ RAW_PREVIEW_CHARS = 1000 # fallback raw HTML preview if BS4 unavailable
80
+ MAX_BLOBS = 10 # max embedded JSON blobs to extract
81
+ MAX_FORMS = 5 # max forms to extract
82
+ MAX_ITEMS_IN_ARRAY = 3 # preview items for large arrays in blobs
83
+
84
+
85
+ # ---------------------------------------------------------------------------
86
+ # Public entry point
87
+ # ---------------------------------------------------------------------------
88
+
89
+ def distill_html(html: str, base_url: str = "") -> dict:
90
+ """
91
+ Distil an HTML page into a structured, compact representation.
92
+
93
+ Args:
94
+ html: Raw HTML string (may be very large).
95
+ base_url: The URL this page was fetched from, used to resolve
96
+ relative URLs in form actions.
97
+
98
+ Returns:
99
+ Distilled dict (see module docstring for schema).
100
+ """
101
+ if not html:
102
+ return _empty_result()
103
+
104
+ if not _BS4_AVAILABLE:
105
+ return {
106
+ **_empty_result(),
107
+ "raw_truncated": html[:RAW_PREVIEW_CHARS],
108
+ "_note": "beautifulsoup4 not installed; only raw preview returned.",
109
+ }
110
+
111
+ try:
112
+ # lxml is faster and more forgiving than html.parser for large pages
113
+ soup = BeautifulSoup(html, "lxml")
114
+ except Exception:
115
+ soup = BeautifulSoup(html, "html.parser")
116
+
117
+ title = _extract_title(soup)
118
+ description = _extract_meta_description(soup)
119
+ data_blobs = _extract_data_blobs(soup)
120
+ forms = _extract_forms(soup, base_url)
121
+ text = _extract_visible_text(soup)
122
+
123
+ # Determine page_type based on what we found
124
+ if data_blobs:
125
+ page_type = "data_page"
126
+ elif forms:
127
+ page_type = "form_page"
128
+ else:
129
+ page_type = "text_page"
130
+
131
+ return {
132
+ "page_type": page_type,
133
+ "title": title,
134
+ "description": description,
135
+ "data_blobs": data_blobs,
136
+ "forms": forms,
137
+ "text": text,
138
+ "raw_truncated": html[:RAW_PREVIEW_CHARS],
139
+ }
140
+
141
+
142
+ def distill_html_compact(html: str, base_url: str = "") -> str:
143
+ """
144
+ Return a compact text representation of the distilled HTML,
145
+ suitable for returning to the agent in curl_exec responses.
146
+
147
+ Aims for < 3000 chars while preserving all actionable information.
148
+ """
149
+ d = distill_html(html, base_url)
150
+
151
+ parts: list[str] = []
152
+
153
+ if d["title"]:
154
+ parts.append(f"[Page: {d['title']}]")
155
+
156
+ if d["description"]:
157
+ parts.append(f"[Description: {d['description']}]")
158
+
159
+ if d["data_blobs"]:
160
+ parts.append(f"[Embedded data — {len(d['data_blobs'])} block(s)]")
161
+ for i, blob in enumerate(d["data_blobs"]):
162
+ src = blob.get("source", "?")
163
+ data = blob.get("data")
164
+ preview = _compact_blob_preview(data)
165
+ parts.append(f" blob[{i}] from <{src}>: {preview}")
166
+
167
+ if d["forms"]:
168
+ parts.append(f"[Forms — {len(d['forms'])} found]")
169
+ for form in d["forms"]:
170
+ action = form["action"] or "(current page)"
171
+ method = form["method"]
172
+ fields = form["fields"]
173
+ # Strip noisy base64-encoded redirect fields; keep actionable fields only
174
+ _SKIP_FIELDS = {"uenc"}
175
+ clean_fields = {k: v for k, v in fields.items() if k not in _SKIP_FIELDS}
176
+ csrf = {k: v for k, v in clean_fields.items()
177
+ if "csrf" in k.lower() or "token" in k.lower()
178
+ or k.startswith("_")}  # note: hidden inputs store their real value, never the literal "hidden"
179
+ field_summary = ", ".join(f"{k}={repr(v)}" for k, v in list(clean_fields.items())[:6])
180
+ parts.append(f" {method} {action}")
181
+ parts.append(f" fields: {field_summary}")
182
+ if csrf:
183
+ parts.append(f" csrf/hidden: {csrf}")
184
+
185
+ if d["text"]:
186
+ parts.append(f"[Text content]\n{d['text'][:800]}")
187
+
188
+ result = "\n".join(parts)
189
+ if not result:
190
+ # Absolute fallback: raw preview
191
+ return html[:RAW_PREVIEW_CHARS]
192
+ return result
193
+
194
+
195
+ # ---------------------------------------------------------------------------
196
+ # Extraction helpers
197
+ # ---------------------------------------------------------------------------
198
+
199
+ def _extract_title(soup) -> str | None:
200
+ tag = soup.find("title")
201
+ if tag:
202
+ return tag.get_text(strip=True) or None
203
+ return None
204
+
205
+
206
+ def _extract_meta_description(soup) -> str | None:
207
+ tag = soup.find("meta", attrs={"name": "description"})
208
+ if tag and tag.get("content"):
209
+ return tag["content"].strip() or None
210
+ return None
211
+
212
+
213
+ # Patterns for window.X = {...} assignments in inline scripts.
+ # Note: the non-greedy brace match captures only flat (un-nested) JSON bodies;
+ # nested objects fail json.loads and are skipped by the caller.
214
+ _WINDOW_ASSIGN_RE = re.compile(
215
+ r'window\.__?([A-Za-z0-9_]+)__?\s*=\s*(\{.*?\}|\[.*?\])',
216
+ re.DOTALL,
217
+ )
218
+
219
+ # Known SSR data script types
220
+ _DATA_SCRIPT_TYPES = {
221
+ "application/json",
222
+ "text/x-magento-init",
223
+ "application/ld+json", # structured data / schema.org
224
+ }
225
+
226
+ # Known SSR script IDs
227
+ _DATA_SCRIPT_IDS = {
228
+ "__next_data__",
229
+ "__nuxt__",
230
+ "initial-state",
231
+ "redux-state",
232
+ "app-state",
233
+ "page-data",
234
+ "server-data",
235
+ "bootstrap-data",
236
+ }
237
+
238
+
239
+ def _try_parse_json(text: str) -> tuple[bool, Any]:
240
+ """Returns (success, parsed_value)."""
241
+ text = text.strip()
242
+ if not text:
243
+ return False, None
244
+ try:
245
+ return True, json.loads(text)
246
+ except (json.JSONDecodeError, ValueError):
247
+ return False, None
248
+
249
+
250
+ def _summarise_json_keys(obj: Any, depth: int = 0) -> list[str]:
251
+ """Return top-level keys (and one level of nested keys) for a JSON object."""
252
+ if not isinstance(obj, dict):
253
+ if isinstance(obj, list) and obj:
254
+ return _summarise_json_keys(obj[0], depth)
255
+ return []
256
+ keys = list(obj.keys())
257
+ if depth < 1:
258
+ nested = []
259
+ for k in keys[:5]:
260
+ v = obj[k]
261
+ if isinstance(v, dict):
262
+ sub = list(v.keys())[:5]
263
+ nested.append(f"{k}.{{{','.join(sub)}}}")
264
+ elif isinstance(v, list) and v and isinstance(v[0], dict):
265
+ sub = list(v[0].keys())[:4]
266
+ nested.append(f"{k}[].{{{','.join(sub)}}}")
267
+ return keys + nested
268
+ return keys
269
+
270
+
271
+ def _extract_data_blobs(soup) -> list[dict]:
272
+ """
273
+ Extract all embedded JSON data blobs from <script> tags and window.X = {...} patterns.
274
+ """
275
+ blobs: list[dict] = []
276
+ seen_sources: set[str] = set()
277
+
278
+ # 1. <script type="..."> tags with known data types
279
+ for script in soup.find_all("script"):
280
+ if len(blobs) >= MAX_BLOBS:
281
+ break
282
+
283
+ script_type = (script.get("type") or "").lower().strip()
284
+ script_id = (script.get("id") or "").lower().strip()
285
+ text = script.string or ""
286
+
287
+ source = None
288
+ if script_type in _DATA_SCRIPT_TYPES:
289
+ source = script_type
290
+ elif script_id in _DATA_SCRIPT_IDS:
291
+ source = f"id={script.get('id')}"
292
+ elif script_type in ("", "text/javascript", "module"):
293
+ # Check for window.X = {...} patterns
294
+ for m in _WINDOW_ASSIGN_RE.finditer(text):
295
+ var_name = f"window.__{m.group(1)}__"
296
+ ok, data = _try_parse_json(m.group(2))
297
+ if ok and isinstance(data, (dict, list)):
298
+ source_key = var_name
299
+ if source_key not in seen_sources:
300
+ seen_sources.add(source_key)
301
+ blobs.append({
302
+ "source": var_name,
303
+ "data": _preview_blob(data),
304
+ "keys": _summarise_json_keys(data)[:MAX_BLOB_KEYS],
305
+ })
306
+ continue # already handled window patterns above
307
+ else:
308
+ continue
309
+
310
+ if not text.strip():
311
+ continue
312
+
313
+ ok, data = _try_parse_json(text)
314
+ if not ok:
315
+ continue
316
+
317
+ # Skip tiny or trivially small blobs (no useful data)
318
+ if isinstance(data, dict) and len(data) <= 1 and not any(
319
+ isinstance(v, (dict, list)) for v in data.values()
320
+ ):
321
+ continue
322
+
323
+ source_key = f"{source}:{script_id or 'anon'}"
324
+ if source_key in seen_sources:
325
+ continue
326
+ seen_sources.add(source_key)
327
+
328
+ blobs.append({
329
+ "source": source,
330
+ "data": _preview_blob(data),
331
+ "keys": _summarise_json_keys(data)[:MAX_BLOB_KEYS],
332
+ })
333
+
334
+ return blobs
335
+
336
+
337
+ def _preview_blob(data: Any) -> Any:
338
+ """
339
+ Return a compact preview of a JSON blob — large arrays are trimmed,
340
+ deeply nested objects are summarised.
341
+ """
342
+ if isinstance(data, list):
343
+ if len(data) > MAX_ITEMS_IN_ARRAY:
344
+ return {
345
+ "sample": [_preview_blob(item) for item in data[:MAX_ITEMS_IN_ARRAY]],
346
+ "total": len(data),
347
+ "_note": f"{len(data)} items total. Use search_episode_data() for specifics.",
348
+ }
349
+ return [_preview_blob(item) for item in data]
350
+
351
+ if isinstance(data, dict):
352
+ result = {}
353
+ for k, v in list(data.items())[:MAX_BLOB_KEYS]:
354
+ if isinstance(v, list) and len(v) > MAX_ITEMS_IN_ARRAY:
355
+ result[k] = {
356
+ "sample": [_preview_blob(i) for i in v[:MAX_ITEMS_IN_ARRAY]],
357
+ "total": len(v),
358
+ "_note": f"{len(v)} items. Use search_episode_data() for specifics.",
359
+ }
360
+ elif isinstance(v, dict) and len(v) > 30:
361
+ # Only collapse very large dicts — preserve small-to-medium ones fully
362
+ # since they often contain critical IDs (e.g. product option configs)
363
+ result[k] = {
364
+ "_keys": list(v.keys())[:20],
365
+ "_note": "large nested object — call search_episode_data() for full content",
366
+ }
367
+ else:
368
+ result[k] = v
369
+ return result
370
+
371
+ return data
372
+
373
+
374
+ def _extract_forms(soup, base_url: str) -> list[dict]:
375
+ """
376
+ Extract all forms: action URL, method, and all named fields with their values.
377
+ Hidden inputs (CSRF tokens, form_key, etc.) are included.
378
+ """
379
+ forms = []
380
+ for form in soup.find_all("form")[:MAX_FORMS]:
381
+ action = form.get("action", "") or ""
382
+ if base_url and action and not action.startswith("http"):
383
+ action = urljoin(base_url, action)
384
+ method = (form.get("method") or "GET").upper()
385
+
386
+ fields: dict[str, str] = {}
387
+ for inp in form.find_all(["input", "select", "textarea"]):
388
+ name = inp.get("name")
389
+ if not name:
390
+ continue
391
+ inp_type = (inp.get("type") or "text").lower()
392
+ value = inp.get("value", "")
393
+ if inp_type == "hidden":
394
+ # Hidden inputs: store actual value (CSRF tokens etc.)
395
+ fields[name] = value
396
+ elif inp_type in ("submit", "button", "reset"):
397
+ continue
398
+ elif inp_type == "checkbox":
399
+ fields[name] = "checkbox"
400
+ elif inp_type == "radio":
401
+ if name not in fields:
402
+ fields[name] = "radio"
403
+ else:
404
+ # text, email, password, number, etc.
405
+ fields[name] = inp_type if not value else value
406
+
407
+ forms.append({
408
+ "action": action,
409
+ "method": method,
410
+ "fields": fields,
411
+ })
412
+
413
+ return forms
414
+
415
+
416
+ # Tags whose text content is irrelevant noise
417
+ _NOISE_TAGS = {
418
+ "script", "style", "noscript", "head", "meta", "link",
419
+ "header", "footer", "nav", "aside",
420
+ "svg", "path", "symbol",
421
+ "[document]",
422
+ }
423
+
424
+
425
+ def _extract_visible_text(soup) -> str | None:
426
+ """
427
+ Extract visible text content from the page.
428
+
429
+ Strips scripts, styles, navigation, and other noise.
430
+ Returns plain text, capped at MAX_TEXT_CHARS.
431
+ """
432
+ # Remove noise tags in-place
433
+ for tag in soup.find_all(_NOISE_TAGS):
434
+ tag.decompose()
435
+
436
+ # Get text from what's left — use separator so words don't jam together
437
+ text = soup.get_text(separator=" ", strip=True)
438
+
439
+ # Collapse whitespace
440
+ text = re.sub(r"\s{2,}", " ", text).strip()
441
+
442
+ if not text:
443
+ return None
444
+
445
+ return text[:MAX_TEXT_CHARS]
446
+
447
+
448
+ def _compact_blob_preview(data: Any) -> str:
449
+ """One-line preview of a JSON blob for the compact text representation."""
450
+ if data is None:
451
+ return "null"
452
+ if isinstance(data, bool):
453
+ return str(data).lower()
454
+ if isinstance(data, (int, float)):
455
+ return str(data)
456
+ if isinstance(data, str):
457
+ return repr(data[:80])
458
+ if isinstance(data, list):
459
+ total = len(data)  # data is a list here; the dict-wrapper preview is handled below
460
+ sample = data[:1]
461
+ if sample:
462
+ first_keys = list(sample[0].keys())[:4] if isinstance(sample[0], dict) else []
463
+ return f"array({total} items), first keys: {first_keys}"
464
+ return f"array({len(data)} items)"
465
+ if isinstance(data, dict):
466
+ # If it has a "total" note it's our preview wrapper
467
+ if "_note" in data and "total" in data:
468
+ sample = data.get("sample", [])
469
+ keys = list(sample[0].keys())[:4] if sample and isinstance(sample[0], dict) else []
470
+ return f"array({data['total']} items), first item keys: {keys}"
471
+ keys = list(data.keys())[:8]
472
+ return f"object({len(data)} keys): {keys}"
473
+ return str(data)[:100]
474
+
475
+
476
+ def _empty_result() -> dict:
477
+ return {
478
+ "page_type": "text_page",
479
+ "title": None,
480
+ "description": None,
481
+ "data_blobs": [],
482
+ "forms": [],
483
+ "text": None,
484
+ "raw_truncated": "",
485
+ }
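The window-assignment extraction convention described in the module docstring can be illustrated with a standalone sketch (stdlib only). The regex below is a simplified, hypothetical variant of the module's `_WINDOW_ASSIGN_RE` and shares its limitation: the non-greedy brace match only captures flat, un-nested JSON bodies.

```python
import json
import re

# Simplified window-assignment pattern: captures the variable name and a
# flat {...} or [...] literal. Nested braces defeat the non-greedy match.
WINDOW_ASSIGN_RE = re.compile(
    r'window\.([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\{.*?\}|\[.*?\])',
    re.DOTALL,
)

def extract_window_blobs(script_text: str) -> dict:
    """Return {var_name: parsed_json} for every flat window.X = {...} assignment."""
    blobs = {}
    for m in WINDOW_ASSIGN_RE.finditer(script_text):
        try:
            blobs[m.group(1)] = json.loads(m.group(2))
        except json.JSONDecodeError:
            continue  # body was not valid JSON (e.g. a JS expression); skip it
    return blobs

script = 'window.__INITIAL_STATE__ = {"cartId": 7, "currency": "USD"};'
print(extract_window_blobs(script))
# → {'__INITIAL_STATE__': {'cartId': 7, 'currency': 'USD'}}
```

A production version needs brace counting (or a tolerant JSON scanner) to handle nested state objects; this sketch only demonstrates the convention the distiller looks for.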
server/tools/search_episode_data.py CHANGED
@@ -1,87 +1,320 @@
1
  """
2
- search_episode_data tool — BM25 + semantic search over accumulated episode response data.
3
 
4
- Searches all request/response bodies from prior curl_exec calls in this episode.
 
 
5
  """
6
 
7
  from __future__ import annotations
8
 
9
  import json
 
10
  import re
11
  from typing import Any
12
 
13
 
14
- def search_episode_data(query: str, episode_store: dict) -> list[dict]:
15
- """
16
- Hybrid BM25 + keyword search over episode accumulated response bodies.
17
 
18
- Args:
19
- query: Keyword or natural language query (e.g. "Radiant Tee sku", "_csrf_token")
20
- episode_store: Per-episode store containing bm25_corpus and bm25_metadata
 
 
21
 
22
- Returns:
23
- Top-5 matching JSON objects from episode history, annotated with step info
24
  """
25
- corpus: list[str] = episode_store.get("bm25_corpus", [])
26
- metadata: list[dict] = episode_store.get("bm25_metadata", [])
 
 
27
 
28
- if not corpus:
29
- return [{"note": "No episode data yet. Make API calls with curl_exec() first."}]
30
 
31
- # Try BM25 ranking
 
 
32
  try:
33
- from rank_bm25 import BM25Okapi
 
 
34
 
35
- tokenized_corpus = [_tokenize(doc) for doc in corpus]
36
- tokenized_query = _tokenize(query)
37
- bm25 = BM25Okapi(tokenized_corpus)
38
- scores = bm25.get_scores(tokenized_query)
39
 
40
- # Get top 5 by BM25 score
 
 
41
  import numpy as np
42
- top_k = min(5, len(scores))
43
- top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
44
-
45
- results = []
46
- for idx in top_indices:
47
- if scores[idx] > 0:
48
- meta = metadata[idx]
49
- result = {
50
- "step": idx + 1,
51
- "url": meta.get("url", ""),
52
- "method": meta.get("method", ""),
53
- "status_code": meta.get("status_code", 0),
54
- "data": meta.get("response_body"),
55
- }
56
- results.append(result)
57
-
58
- if results:
59
- return results
 
 
60
 
 
61
  except ImportError:
 
 
62
  pass
63
- except Exception as e:
64
- print(f"[search_episode_data] BM25 error: {e}", flush=True)
65
 
66
- # Fallback: keyword match
67
- query_lower = query.lower()
68
- query_terms = query_lower.split()
69
- results = []
70
- for idx, doc in enumerate(corpus):
71
- if any(term in doc.lower() for term in query_terms):
72
- meta = metadata[idx]
73
- results.append({
74
- "step": idx + 1,
75
- "url": meta.get("url", ""),
76
- "method": meta.get("method", ""),
77
- "status_code": meta.get("status_code", 0),
78
- "data": meta.get("response_body"),
79
- })
80
- return results[:5] if results else [{"note": f"No results found for: {query}"}]
81
 
82
 
83
  def _tokenize(text: str) -> list[str]:
84
- """Simple whitespace + punctuation tokenizer for BM25."""
85
  text = text.lower()
86
  tokens = re.findall(r"[a-z0-9_\-\.]+", text)
87
  return tokens if tokens else [""]
 
 
 
1
  """
2
+ search_episode_data — semantic + BM25 search over accumulated episode API responses.
3
 
4
+ Each curl_exec call stores its full, untruncated response body in episode_store under
5
+ ``episode_raw_bodies``. This tool embeds those bodies (via the same HF API used by
6
+ browser_agent) and performs cosine-similarity search against the model's query, falling
7
+ back to BM25 keyword search when embeddings are unavailable.
8
+
9
+ Results are returned as compact previews so they fit in the LLM context window:
10
+ - Nested trees (e.g. category trees with children_data) are flattened to id+name pairs.
11
+ - Large item arrays are shown as a short sample with a total-count note.
12
+ - The model can issue more specific queries to drill into any result.
13
  """
14
 
15
  from __future__ import annotations
16
 
17
  import json
18
19
  import re
20
  from typing import Any
21
 
22
 
23
+ # ---------------------------------------------------------------------------
24
+ # Compact preview helpers
25
+ # ---------------------------------------------------------------------------
26
 
27
+ def _flatten_tree(obj: Any, id_key: str = "id", name_key: str = "name") -> list[dict]:
28
+ """Recursively flatten any nested tree structure into [{id, name}] pairs."""
29
+ results: list[dict] = []
30
+ if isinstance(obj, dict):
31
+ if id_key in obj and name_key in obj:
32
+ results.append({id_key: obj[id_key], name_key: obj[name_key]})
33
+ for v in obj.values():
34
+ results.extend(_flatten_tree(v, id_key, name_key))
35
+ elif isinstance(obj, list):
36
+ for item in obj:
37
+ results.extend(_flatten_tree(item, id_key, name_key))
38
+ return results
39
 
40
+
41
+ def _compact_preview(response_body: Any, max_items: int = 3) -> dict:
42
  """
43
+ Return a compact, context-friendly preview of a response body.
44
+
45
+ - Distilled HTML (has page_type key) → structured summary with forms/products.
46
+ - Nested trees with children_data → flat {id, name} list.
47
+ - Lists / items arrays → short sample + total count.
48
+ - Scalars / errors → returned as-is.
49
+ - The preview always includes a note showing how many objects exist in total.
50
+ """
51
+ if not isinstance(response_body, (dict, list)):
52
+ return {"value": response_body}
53
+
54
+ # --- distilled HTML page (from html_distiller) ---
55
+ if isinstance(response_body, dict) and "page_type" in response_body and "forms" in response_body:
56
+ result: dict = {}
57
+ if response_body.get("title"):
58
+ result["page_title"] = response_body["title"]
59
+ # Forms — most actionable: show action URL, method, and fields (strip base64 uenc)
60
+ forms = response_body.get("forms", [])
61
+ if forms:
62
+ clean_forms = []
63
+ for form in forms[:8]:
64
+ fields = {k: v for k, v in form.get("fields", {}).items()
65
+ if k not in ("uenc",) and len(str(v)) < 100}
66
+ clean_forms.append({
67
+ "action": form.get("action", ""),
68
+ "method": form.get("method", "GET"),
69
+ "fields": fields,
70
+ })
71
+ result["forms"] = clean_forms
72
+ # Data blobs — show top-level keys and compact preview of small blobs
73
+ blobs = response_body.get("data_blobs", [])
74
+ if blobs:
75
+ blob_summary = []
76
+ for blob in blobs[:3]:
77
+ data = blob.get("data")
78
+ if isinstance(data, (dict, list)):
79
+ s = json.dumps(data)
80
+ blob_summary.append({"source": blob.get("source"), "preview": s[:300]})
81
+ else:
82
+ blob_summary.append({"source": blob.get("source"), "keys": blob.get("keys", [])})
83
+ result["data_blobs"] = blob_summary
84
+ # Visible text — first 600 chars
85
+ text = response_body.get("text")
86
+ if text:
87
+ result["page_text"] = text[:600]
88
+ return result
89
+
90
+ # --- nested tree (e.g. category tree) ---
91
+ if isinstance(response_body, dict) and "children_data" in response_body:
92
+ flat = _flatten_tree(response_body)
93
+ sample = flat[:max_items]
94
+ note = (
95
+ f"Flattened tree — {len(flat)} total entries. "
96
+ f"Showing first {len(sample)}. "
97
+ "Use search_episode_data with a more specific name/id query to find a particular entry."
98
+ )
99
+ return {"entries_sample": sample, "total": len(flat), "note": note}
100
+
101
+ # --- top-level list ---
102
+ if isinstance(response_body, list):
103
+ total = len(response_body)
104
+ sample = [_pick_key_fields(i) for i in response_body[:max_items]]
105
+ note = (
106
+ f"{total} item(s) total. Showing first {len(sample)}. "
107
+ "Refine your search_episode_data query to find a specific item."
108
+ ) if total > max_items else f"{total} item(s)."
109
+ return {"items_sample": sample, "total": total, "note": note}
110
+
111
+ # --- dict with an "items" array (common paginated response) ---
112
+ if isinstance(response_body, dict) and "items" in response_body:
113
+ items = response_body.get("items", [])
114
+ total = response_body.get("total_count", len(items))
115
+ sample = [_pick_key_fields(i) for i in items[:max_items]]
116
+ note = (
117
+ f"{total} item(s) total. Showing first {len(sample)}. "
118
+ "Refine your search_episode_data query to find a specific item."
119
+ ) if len(items) > max_items else f"{len(items)} item(s)."
120
+ result = dict(response_body)
121
+ result["items"] = sample
122
+ result["_preview_note"] = note
123
+ result["total_count"] = total
124
+ return result
125
+
126
+ # --- plain dict — return as-is (usually already small) ---
127
+ return response_body
128
+
129
+
130
+ def _pick_key_fields(item: Any) -> Any:
131
+ """For list items, keep only the most useful fields to reduce context size."""
132
+ if not isinstance(item, dict):
133
+ return item
134
+ KEEP = {"id", "sku", "name", "price", "category_id", "title", "slug",
135
+ "item_id", "quote_id", "qty", "status", "order_id", "email",
136
+ "username", "token", "cartId", "cart_id"}
137
+ kept = {k: v for k, v in item.items() if k in KEEP}
138
+ return kept if kept else item # fallback: return full item if no key fields match
139
 
 
 
140
 
141
+ # ---------------------------------------------------------------------------
142
+ # Text representation for embedding / BM25
143
+ # ---------------------------------------------------------------------------
144
+
145
+ def _body_to_search_text(url: str, method: str, status_code: int,
146
+ response_body: Any) -> str:
147
+ """
148
+ Produce a searchable text string that represents a stored API response.
149
+ We embed this text so the model can find responses by semantic query.
150
+ The full body is stored separately (in episode_raw_bodies) for retrieval.
151
+ """
152
  try:
153
+ body_str = json.dumps(response_body) if not isinstance(response_body, str) else response_body
154
+ except Exception:
155
+ body_str = str(response_body)
156
+
157
+ # Truncate for embedding (model has 512-token limit; 2000 chars is ~400 tokens)
158
+ if len(body_str) > 2000:
159
+ body_str = body_str[:2000]
160
 
161
+ return f"url: {url} method: {method} status: {status_code} response: {body_str}"
 
 
 
162
 
163
+
164
+ # ---------------------------------------------------------------------------
165
+ # Semantic embedding search
166
+ # ---------------------------------------------------------------------------
167
+
168
+ def _get_episode_embeddings(episode_store: dict) -> tuple[Any, list[str]] | None:
169
+ """
170
+ Build or retrieve embeddings for all stored episode responses.
171
+
172
+ Returns (embeddings_array, text_list) or None if embeddings unavailable.
173
+ Embeddings are cached in episode_store["response_embeddings"] after first build.
174
+ New responses added since last build are embedded incrementally.
175
+ """
176
+ try:
177
  import numpy as np
178
+ from .browser_agent import _embed_with_cache
179
+ except ImportError:
180
+ return None
181
+
182
+ texts: list[str] = episode_store.get("bm25_corpus", [])
183
+ if not texts:
184
+ return None
185
+
186
+ cached_embs = episode_store.get("response_embeddings")
187
+ cached_count = len(cached_embs) if cached_embs is not None else 0
188
+
189
+ if cached_count == len(texts):
190
+ # All texts already embedded
191
+ return cached_embs, texts
192
+
193
+ # Embed any new texts added since last call
194
+ new_texts = texts[cached_count:]
195
+ new_embs = _embed_with_cache(new_texts)
196
+ if new_embs is None:
197
+ return None
198
+
199
+ if cached_embs is not None and len(cached_embs) > 0:
200
+ combined = np.vstack([cached_embs, new_embs])
201
+ else:
202
+ combined = new_embs
203
+
204
+ episode_store["response_embeddings"] = combined
205
+ return combined, texts
206
+
207
 
208
+ def _semantic_search(query: str, episode_store: dict,
209
+ top_k: int = 5) -> list[int] | None:
210
+ """
211
+ Return top_k indices ranked by cosine similarity to the query.
212
+ Returns None if embeddings are unavailable (fall back to BM25).
213
+ """
214
+ try:
215
+ import numpy as np
216
+ from .browser_agent import _embed_with_cache
217
  except ImportError:
218
+ return None
219
+
220
+ result = _get_episode_embeddings(episode_store)
221
+ if result is None:
222
+ return None
223
+
224
+ embs, _ = result
225
+ query_emb = _embed_with_cache([query])
226
+ if query_emb is None:
227
+ return None
228
+
229
+ scores = embs @ query_emb[0] # dot product = cosine sim (both L2-normalised)
230
+ top_k = min(top_k, len(scores))
231
+ return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
232
+
233
+
234
+ # ---------------------------------------------------------------------------
235
+ # BM25 fallback
236
+ # ---------------------------------------------------------------------------
237
+
238
+ def _bm25_search(query: str, corpus: list[str], top_k: int = 5) -> list[int]:
239
+ """Return top_k indices by BM25 score, or keyword-match fallback."""
240
+ try:
241
+ from rank_bm25 import BM25Okapi
242
+ import numpy as np
243
+
244
+ tokenized = [_tokenize(doc) for doc in corpus]
245
+ bm25 = BM25Okapi(tokenized)
246
+ scores = bm25.get_scores(_tokenize(query))
247
+ top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
248
+ return [i for i in top[:top_k] if scores[i] > 0]
249
+ except Exception:
250
  pass
 
 
251
 
252
+ # Keyword fallback
253
+ q_lower = query.lower()
254
+ terms = q_lower.split()
255
+ hits = [i for i, doc in enumerate(corpus) if any(t in doc.lower() for t in terms)]
256
+ return hits[:top_k]
 
 
257
 
258
 
259
  def _tokenize(text: str) -> list[str]:
 
260
  text = text.lower()
261
  tokens = re.findall(r"[a-z0-9_\-\.]+", text)
262
  return tokens if tokens else [""]
263
+
264
+
265
+ # ---------------------------------------------------------------------------
266
+ # Public API
267
+ # ---------------------------------------------------------------------------
268
+
269
+ def search_episode_data(query: str, episode_store: dict) -> list[dict]:
270
+ """
271
+ Semantic + BM25 search over all API responses collected during this episode.
272
+
273
+ Each response is stored in full (untruncated) in the episode store.
274
+ Results are returned as compact previews so they fit the LLM context window:
275
+ - Nested trees are flattened to {id, name} pairs with a total-count note.
276
+ - Large arrays show a short sample with a note like "47 items total".
277
+ - Use more specific queries to drill into a particular response.
278
+
279
+ Args:
280
+ query: Natural language or keyword query (e.g. "category id for Pants",
281
+ "cart id", "SKU for Radiant Tee", "_csrf_token").
282
+ episode_store: Per-episode mutable store populated by curl_exec.
283
+
284
+ Returns:
285
+ List of up to 5 matching results, each with:
286
+ step, url, method, status_code, data (compact preview).
287
+ """
288
+ corpus: list[str] = episode_store.get("bm25_corpus", [])
289
+ metadata: list[dict] = episode_store.get("bm25_metadata", [])
290
+
291
+ if not corpus:
292
+ return [{"note": "No episode data yet. Make API calls with curl_exec() first."}]
293
+
294
+ # Try semantic search first
295
+ indices = _semantic_search(query, episode_store, top_k=5)
296
+
297
+ # Fall back to BM25 if semantic unavailable
298
+ if indices is None:
299
+ indices = _bm25_search(query, corpus, top_k=5)
300
+
301
+ if not indices:
302
+ return [{"note": f"No results found for: {query!r}. "
303
+ "Try a different query or check your curl_exec call history."}]
304
+
305
+ results = []
306
+ for idx in indices:
307
+ if idx >= len(metadata):
308
+ continue
309
+ meta = metadata[idx]
310
+ # Full untruncated body is in episode_raw_bodies; metadata holds it too
311
+ raw_body = episode_store.get("episode_raw_bodies", {}).get(idx, meta.get("response_body"))
312
+ results.append({
313
+ "step": idx + 1,
314
+ "url": meta.get("url", ""),
315
+ "method": meta.get("method", ""),
316
+ "status_code": meta.get("status_code", 0),
317
+ "data": _compact_preview(raw_body),
318
+ })
319
+
320
+ return results
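The `children_data` flattening that `_compact_preview` applies to tree-shaped responses can be exercised standalone; the sketch below mirrors `_flatten_tree` (the category ids and names are made-up sample data, not from any real catalog):

```python
from typing import Any

def flatten_tree(obj: Any, id_key: str = "id", name_key: str = "name") -> list[dict]:
    """Recursively collect {id, name} pairs from any nested dict/list structure."""
    results: list[dict] = []
    if isinstance(obj, dict):
        if id_key in obj and name_key in obj:
            results.append({id_key: obj[id_key], name_key: obj[name_key]})
        for v in obj.values():
            results.extend(flatten_tree(v, id_key, name_key))
    elif isinstance(obj, list):
        for item in obj:
            results.extend(flatten_tree(item, id_key, name_key))
    return results

# Hypothetical category tree in the shape a children_data response takes.
tree = {
    "id": 2, "name": "Default Category",
    "children_data": [
        {"id": 20, "name": "Women",
         "children_data": [{"id": 21, "name": "Tops", "children_data": []}]},
        {"id": 11, "name": "Men", "children_data": []},
    ],
}
print(flatten_tree(tree))
# → [{'id': 2, 'name': 'Default Category'}, {'id': 20, 'name': 'Women'},
#    {'id': 21, 'name': 'Tops'}, {'id': 11, 'name': 'Men'}]
```

The flat list is what gets sampled into `entries_sample`, keeping deep trees within the context budget while preserving every id/name pair for follow-up queries.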
uv.lock CHANGED
@@ -171,6 +171,19 @@ wheels = [
171
  { url = "https://files.pythonhosted.org/packages/71/cc/18245721fa7747065ab478316c7fea7c74777d07f37ae60db2e84f8172e8/beartype-0.22.9-py3-none-any.whl", hash = "sha256:d16c9bbc61ea14637596c5f6fbff2ee99cbe3573e46a716401734ef50c3060c2", size = 1333658, upload-time = "2025-12-13T06:50:28.266Z" },
172
  ]
173
 
 
 
174
  [[package]]
175
  name = "brotli"
176
  version = "1.2.0"
@@ -1328,6 +1341,130 @@ wheels = [
1328
  { url = "https://files.pythonhosted.org/packages/81/db/e655086b7f3a705df045bf0933bdd9c2f79bb3c97bfef1384598bb79a217/keyring-25.7.0-py3-none-any.whl", hash = "sha256:be4a0b195f149690c166e850609a477c532ddbfbaed96a404d4e43f8d5e2689f", size = 39160, upload-time = "2025-11-16T16:26:08.402Z" },
1329
  ]
1330
 
 
 
1331
  [[package]]
1332
  name = "markdown-it-py"
1333
  version = "4.0.0"
@@ -1884,7 +2021,9 @@ name = "openenv-harvestgym"
1884
  version = "0.1.0"
1885
  source = { editable = "." }
1886
  dependencies = [
 
1887
  { name = "fastapi" },
 
1888
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
1889
  { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
1890
  { name = "openai" },
@@ -1907,7 +2046,9 @@ embeddings = [
1907
 
1908
  [package.metadata]
1909
  requires-dist = [
 
1910
  { name = "fastapi", specifier = ">=0.100.0" },
 
1911
  { name = "numpy", specifier = ">=1.24.0" },
1912
  { name = "openai", specifier = ">=1.0.0" },
1913
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
@@ -3373,6 +3514,15 @@ wheels = [
3373
  { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
3374
  ]
3375
 
 
3376
  [[package]]
3377
  name = "sse-starlette"
3378
  version = "3.3.4"
 
171
  { url = "https://files.pythonhosted.org/packages/71/cc/18245721fa7747065ab478316c7fea7c74777d07f37ae60db2e84f8172e8/beartype-0.22.9-py3-none-any.whl", hash = "sha256:d16c9bbc61ea14637596c5f6fbff2ee99cbe3573e46a716401734ef50c3060c2", size = 1333658, upload-time = "2025-12-13T06:50:28.266Z" },
172
  ]
173
 
174
+ [[package]]
175
+ name = "beautifulsoup4"
176
+ version = "4.14.3"
177
+ source = { registry = "https://pypi.org/simple" }
178
+ dependencies = [
179
+ { name = "soupsieve" },
180
+ { name = "typing-extensions" },
181
+ ]
182
+ sdist = { url = "https://files.pythonhosted.org/packages/c3/b0/1c6a16426d389813b48d95e26898aff79abbde42ad353958ad95cc8c9b21/beautifulsoup4-4.14.3.tar.gz", hash = "sha256:6292b1c5186d356bba669ef9f7f051757099565ad9ada5dd630bd9de5fa7fb86", size = 627737, upload-time = "2025-11-30T15:08:26.084Z" }
183
+ wheels = [
184
+ { url = "https://files.pythonhosted.org/packages/1a/39/47f9197bdd44df24d67ac8893641e16f386c984a0619ef2ee4c51fbbc019/beautifulsoup4-4.14.3-py3-none-any.whl", hash = "sha256:0918bfe44902e6ad8d57732ba310582e98da931428d231a5ecb9e7c703a735bb", size = 107721, upload-time = "2025-11-30T15:08:24.087Z" },
185
+ ]
186
+
187
  [[package]]
188
  name = "brotli"
189
  version = "1.2.0"
 
1341
  { url = "https://files.pythonhosted.org/packages/81/db/e655086b7f3a705df045bf0933bdd9c2f79bb3c97bfef1384598bb79a217/keyring-25.7.0-py3-none-any.whl", hash = "sha256:be4a0b195f149690c166e850609a477c532ddbfbaed96a404d4e43f8d5e2689f", size = 39160, upload-time = "2025-11-16T16:26:08.402Z" },
1342
  ]
1343
 
1344
+ [[package]]
1345
+ name = "lxml"
1346
+ version = "6.0.2"
1347
+ source = { registry = "https://pypi.org/simple" }
1348
+ sdist = { url = "https://files.pythonhosted.org/packages/aa/88/262177de60548e5a2bfc46ad28232c9e9cbde697bd94132aeb80364675cb/lxml-6.0.2.tar.gz", hash = "sha256:cd79f3367bd74b317dda655dc8fcfa304d9eb6e4fb06b7168c5cf27f96e0cd62", size = 4073426, upload-time = "2025-09-22T04:04:59.287Z" }
1349
+ wheels = [
1350
+ { url = "https://files.pythonhosted.org/packages/db/8a/f8192a08237ef2fb1b19733f709db88a4c43bc8ab8357f01cb41a27e7f6a/lxml-6.0.2-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e77dd455b9a16bbd2a5036a63ddbd479c19572af81b624e79ef422f929eef388", size = 8590589, upload-time = "2025-09-22T04:00:10.51Z" },
1351
+ { url = "https://files.pythonhosted.org/packages/12/64/27bcd07ae17ff5e5536e8d88f4c7d581b48963817a13de11f3ac3329bfa2/lxml-6.0.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:5d444858b9f07cefff6455b983aea9a67f7462ba1f6cbe4a21e8bf6791bf2153", size = 4629671, upload-time = "2025-09-22T04:00:15.411Z" },
1352
+ { url = "https://files.pythonhosted.org/packages/02/5a/a7d53b3291c324e0b6e48f3c797be63836cc52156ddf8f33cd72aac78866/lxml-6.0.2-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f952dacaa552f3bb8834908dddd500ba7d508e6ea6eb8c52eb2d28f48ca06a31", size = 4999961, upload-time = "2025-09-22T04:00:17.619Z" },
1353
+ { url = "https://files.pythonhosted.org/packages/f5/55/d465e9b89df1761674d8672bb3e4ae2c47033b01ec243964b6e334c6743f/lxml-6.0.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:71695772df6acea9f3c0e59e44ba8ac50c4f125217e84aab21074a1a55e7e5c9", size = 5157087, upload-time = "2025-09-22T04:00:19.868Z" },
1354
+ { url = "https://files.pythonhosted.org/packages/62/38/3073cd7e3e8dfc3ba3c3a139e33bee3a82de2bfb0925714351ad3d255c13/lxml-6.0.2-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:17f68764f35fd78d7c4cc4ef209a184c38b65440378013d24b8aecd327c3e0c8", size = 5067620, upload-time = "2025-09-22T04:00:21.877Z" },
1355
+ { url = "https://files.pythonhosted.org/packages/4a/d3/1e001588c5e2205637b08985597827d3827dbaaece16348c8822bfe61c29/lxml-6.0.2-cp310-cp310-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:058027e261afed589eddcfe530fcc6f3402d7fd7e89bfd0532df82ebc1563dba", size = 5406664, upload-time = "2025-09-22T04:00:23.714Z" },
1356
+ { url = "https://files.pythonhosted.org/packages/20/cf/cab09478699b003857ed6ebfe95e9fb9fa3d3c25f1353b905c9b73cfb624/lxml-6.0.2-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a8ffaeec5dfea5881d4c9d8913a32d10cfe3923495386106e4a24d45300ef79c", size = 5289397, upload-time = "2025-09-22T04:00:25.544Z" },
1357
+ { url = "https://files.pythonhosted.org/packages/a3/84/02a2d0c38ac9a8b9f9e5e1bbd3f24b3f426044ad618b552e9549ee91bd63/lxml-6.0.2-cp310-cp310-manylinux_2_31_armv7l.whl", hash = "sha256:f2e3b1a6bb38de0bc713edd4d612969dd250ca8b724be8d460001a387507021c", size = 4772178, upload-time = "2025-09-22T04:00:27.602Z" },
1358
+ { url = "https://files.pythonhosted.org/packages/56/87/e1ceadcc031ec4aa605fe95476892d0b0ba3b7f8c7dcdf88fdeff59a9c86/lxml-6.0.2-cp310-cp310-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:d6690ec5ec1cce0385cb20896b16be35247ac8c2046e493d03232f1c2414d321", size = 5358148, upload-time = "2025-09-22T04:00:29.323Z" },
1359
+ { url = "https://files.pythonhosted.org/packages/fe/13/5bb6cf42bb228353fd4ac5f162c6a84fd68a4d6f67c1031c8cf97e131fc6/lxml-6.0.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:f2a50c3c1d11cad0ebebbac357a97b26aa79d2bcaf46f256551152aa85d3a4d1", size = 5112035, upload-time = "2025-09-22T04:00:31.061Z" },
1360
+ { url = "https://files.pythonhosted.org/packages/e4/e2/ea0498552102e59834e297c5c6dff8d8ded3db72ed5e8aad77871476f073/lxml-6.0.2-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:3efe1b21c7801ffa29a1112fab3b0f643628c30472d507f39544fd48e9549e34", size = 4799111, upload-time = "2025-09-22T04:00:33.11Z" },
1361
+ { url = "https://files.pythonhosted.org/packages/6a/9e/8de42b52a73abb8af86c66c969b3b4c2a96567b6ac74637c037d2e3baa60/lxml-6.0.2-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:59c45e125140b2c4b33920d21d83681940ca29f0b83f8629ea1a2196dc8cfe6a", size = 5351662, upload-time = "2025-09-22T04:00:35.237Z" },
1362
+ { url = "https://files.pythonhosted.org/packages/28/a2/de776a573dfb15114509a37351937c367530865edb10a90189d0b4b9b70a/lxml-6.0.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:452b899faa64f1805943ec1c0c9ebeaece01a1af83e130b69cdefeda180bb42c", size = 5314973, upload-time = "2025-09-22T04:00:37.086Z" },
1363
+ { url = "https://files.pythonhosted.org/packages/50/a0/3ae1b1f8964c271b5eec91db2043cf8c6c0bce101ebb2a633b51b044db6c/lxml-6.0.2-cp310-cp310-win32.whl", hash = "sha256:1e786a464c191ca43b133906c6903a7e4d56bef376b75d97ccbb8ec5cf1f0a4b", size = 3611953, upload-time = "2025-09-22T04:00:39.224Z" },
1364
+ { url = "https://files.pythonhosted.org/packages/d1/70/bd42491f0634aad41bdfc1e46f5cff98825fb6185688dc82baa35d509f1a/lxml-6.0.2-cp310-cp310-win_amd64.whl", hash = "sha256:dacf3c64ef3f7440e3167aa4b49aa9e0fb99e0aa4f9ff03795640bf94531bcb0", size = 4032695, upload-time = "2025-09-22T04:00:41.402Z" },
1365
+ { url = "https://files.pythonhosted.org/packages/d2/d0/05c6a72299f54c2c561a6c6cbb2f512e047fca20ea97a05e57931f194ac4/lxml-6.0.2-cp310-cp310-win_arm64.whl", hash = "sha256:45f93e6f75123f88d7f0cfd90f2d05f441b808562bf0bc01070a00f53f5028b5", size = 3680051, upload-time = "2025-09-22T04:00:43.525Z" },
1366
+ { url = "https://files.pythonhosted.org/packages/77/d5/becbe1e2569b474a23f0c672ead8a29ac50b2dc1d5b9de184831bda8d14c/lxml-6.0.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:13e35cbc684aadf05d8711a5d1b5857c92e5e580efa9a0d2be197199c8def607", size = 8634365, upload-time = "2025-09-22T04:00:45.672Z" },
1367
+ { url = "https://files.pythonhosted.org/packages/28/66/1ced58f12e804644426b85d0bb8a4478ca77bc1761455da310505f1a3526/lxml-6.0.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3b1675e096e17c6fe9c0e8c81434f5736c0739ff9ac6123c87c2d452f48fc938", size = 4650793, upload-time = "2025-09-22T04:00:47.783Z" },
1368
+ { url = "https://files.pythonhosted.org/packages/11/84/549098ffea39dfd167e3f174b4ce983d0eed61f9d8d25b7bf2a57c3247fc/lxml-6.0.2-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:8ac6e5811ae2870953390452e3476694196f98d447573234592d30488147404d", size = 4944362, upload-time = "2025-09-22T04:00:49.845Z" },
1369
+ { url = "https://files.pythonhosted.org/packages/ac/bd/f207f16abf9749d2037453d56b643a7471d8fde855a231a12d1e095c4f01/lxml-6.0.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5aa0fc67ae19d7a64c3fe725dc9a1bb11f80e01f78289d05c6f62545affec438", size = 5083152, upload-time = "2025-09-22T04:00:51.709Z" },
1370
+ { url = "https://files.pythonhosted.org/packages/15/ae/bd813e87d8941d52ad5b65071b1affb48da01c4ed3c9c99e40abb266fbff/lxml-6.0.2-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:de496365750cc472b4e7902a485d3f152ecf57bd3ba03ddd5578ed8ceb4c5964", size = 5023539, upload-time = "2025-09-22T04:00:53.593Z" },
1371
+ { url = "https://files.pythonhosted.org/packages/02/cd/9bfef16bd1d874fbe0cb51afb00329540f30a3283beb9f0780adbb7eec03/lxml-6.0.2-cp311-cp311-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:200069a593c5e40b8f6fc0d84d86d970ba43138c3e68619ffa234bc9bb806a4d", size = 5344853, upload-time = "2025-09-22T04:00:55.524Z" },
1372
+ { url = "https://files.pythonhosted.org/packages/b8/89/ea8f91594bc5dbb879734d35a6f2b0ad50605d7fb419de2b63d4211765cc/lxml-6.0.2-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7d2de809c2ee3b888b59f995625385f74629707c9355e0ff856445cdcae682b7", size = 5225133, upload-time = "2025-09-22T04:00:57.269Z" },
1373
+ { url = "https://files.pythonhosted.org/packages/b9/37/9c735274f5dbec726b2db99b98a43950395ba3d4a1043083dba2ad814170/lxml-6.0.2-cp311-cp311-manylinux_2_31_armv7l.whl", hash = "sha256:b2c3da8d93cf5db60e8858c17684c47d01fee6405e554fb55018dd85fc23b178", size = 4677944, upload-time = "2025-09-22T04:00:59.052Z" },
1374
+ { url = "https://files.pythonhosted.org/packages/20/28/7dfe1ba3475d8bfca3878365075abe002e05d40dfaaeb7ec01b4c587d533/lxml-6.0.2-cp311-cp311-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:442de7530296ef5e188373a1ea5789a46ce90c4847e597856570439621d9c553", size = 5284535, upload-time = "2025-09-22T04:01:01.335Z" },
1375
+ { url = "https://files.pythonhosted.org/packages/e7/cf/5f14bc0de763498fc29510e3532bf2b4b3a1c1d5d0dff2e900c16ba021ef/lxml-6.0.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:2593c77efde7bfea7f6389f1ab249b15ed4aa5bc5cb5131faa3b843c429fbedb", size = 5067343, upload-time = "2025-09-22T04:01:03.13Z" },
1376
+ { url = "https://files.pythonhosted.org/packages/1c/b0/bb8275ab5472f32b28cfbbcc6db7c9d092482d3439ca279d8d6fa02f7025/lxml-6.0.2-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:3e3cb08855967a20f553ff32d147e14329b3ae70ced6edc2f282b94afbc74b2a", size = 4725419, upload-time = "2025-09-22T04:01:05.013Z" },
1377
+ { url = "https://files.pythonhosted.org/packages/25/4c/7c222753bc72edca3b99dbadba1b064209bc8ed4ad448af990e60dcce462/lxml-6.0.2-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:2ed6c667fcbb8c19c6791bbf40b7268ef8ddf5a96940ba9404b9f9a304832f6c", size = 5275008, upload-time = "2025-09-22T04:01:07.327Z" },
1378
+ { url = "https://files.pythonhosted.org/packages/6c/8c/478a0dc6b6ed661451379447cdbec77c05741a75736d97e5b2b729687828/lxml-6.0.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b8f18914faec94132e5b91e69d76a5c1d7b0c73e2489ea8929c4aaa10b76bbf7", size = 5248906, upload-time = "2025-09-22T04:01:09.452Z" },
1379
+ { url = "https://files.pythonhosted.org/packages/2d/d9/5be3a6ab2784cdf9accb0703b65e1b64fcdd9311c9f007630c7db0cfcce1/lxml-6.0.2-cp311-cp311-win32.whl", hash = "sha256:6605c604e6daa9e0d7f0a2137bdc47a2e93b59c60a65466353e37f8272f47c46", size = 3610357, upload-time = "2025-09-22T04:01:11.102Z" },
1380
+ { url = "https://files.pythonhosted.org/packages/e2/7d/ca6fb13349b473d5732fb0ee3eec8f6c80fc0688e76b7d79c1008481bf1f/lxml-6.0.2-cp311-cp311-win_amd64.whl", hash = "sha256:e5867f2651016a3afd8dd2c8238baa66f1e2802f44bc17e236f547ace6647078", size = 4036583, upload-time = "2025-09-22T04:01:12.766Z" },
1381
+ { url = "https://files.pythonhosted.org/packages/ab/a2/51363b5ecd3eab46563645f3a2c3836a2fc67d01a1b87c5017040f39f567/lxml-6.0.2-cp311-cp311-win_arm64.whl", hash = "sha256:4197fb2534ee05fd3e7afaab5d8bfd6c2e186f65ea7f9cd6a82809c887bd1285", size = 3680591, upload-time = "2025-09-22T04:01:14.874Z" },
1382
+ { url = "https://files.pythonhosted.org/packages/f3/c8/8ff2bc6b920c84355146cd1ab7d181bc543b89241cfb1ebee824a7c81457/lxml-6.0.2-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:a59f5448ba2ceccd06995c95ea59a7674a10de0810f2ce90c9006f3cbc044456", size = 8661887, upload-time = "2025-09-22T04:01:17.265Z" },
1383
+ { url = "https://files.pythonhosted.org/packages/37/6f/9aae1008083bb501ef63284220ce81638332f9ccbfa53765b2b7502203cf/lxml-6.0.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:e8113639f3296706fbac34a30813929e29247718e88173ad849f57ca59754924", size = 4667818, upload-time = "2025-09-22T04:01:19.688Z" },
1384
+ { url = "https://files.pythonhosted.org/packages/f1/ca/31fb37f99f37f1536c133476674c10b577e409c0a624384147653e38baf2/lxml-6.0.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:a8bef9b9825fa8bc816a6e641bb67219489229ebc648be422af695f6e7a4fa7f", size = 4950807, upload-time = "2025-09-22T04:01:21.487Z" },
1385
+ { url = "https://files.pythonhosted.org/packages/da/87/f6cb9442e4bada8aab5ae7e1046264f62fdbeaa6e3f6211b93f4c0dd97f1/lxml-6.0.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:65ea18d710fd14e0186c2f973dc60bb52039a275f82d3c44a0e42b43440ea534", size = 5109179, upload-time = "2025-09-22T04:01:23.32Z" },
1386
+ { url = "https://files.pythonhosted.org/packages/c8/20/a7760713e65888db79bbae4f6146a6ae5c04e4a204a3c48896c408cd6ed2/lxml-6.0.2-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c371aa98126a0d4c739ca93ceffa0fd7a5d732e3ac66a46e74339acd4d334564", size = 5023044, upload-time = "2025-09-22T04:01:25.118Z" },
1387
+ { url = "https://files.pythonhosted.org/packages/a2/b0/7e64e0460fcb36471899f75831509098f3fd7cd02a3833ac517433cb4f8f/lxml-6.0.2-cp312-cp312-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:700efd30c0fa1a3581d80a748157397559396090a51d306ea59a70020223d16f", size = 5359685, upload-time = "2025-09-22T04:01:27.398Z" },
1388
+ { url = "https://files.pythonhosted.org/packages/b9/e1/e5df362e9ca4e2f48ed6411bd4b3a0ae737cc842e96877f5bf9428055ab4/lxml-6.0.2-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c33e66d44fe60e72397b487ee92e01da0d09ba2d66df8eae42d77b6d06e5eba0", size = 5654127, upload-time = "2025-09-22T04:01:29.629Z" },
1389
+ { url = "https://files.pythonhosted.org/packages/c6/d1/232b3309a02d60f11e71857778bfcd4acbdb86c07db8260caf7d008b08f8/lxml-6.0.2-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:90a345bbeaf9d0587a3aaffb7006aa39ccb6ff0e96a57286c0cb2fd1520ea192", size = 5253958, upload-time = "2025-09-22T04:01:31.535Z" },
1390
+ { url = "https://files.pythonhosted.org/packages/35/35/d955a070994725c4f7d80583a96cab9c107c57a125b20bb5f708fe941011/lxml-6.0.2-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:064fdadaf7a21af3ed1dcaa106b854077fbeada827c18f72aec9346847cd65d0", size = 4711541, upload-time = "2025-09-22T04:01:33.801Z" },
1391
+ { url = "https://files.pythonhosted.org/packages/1e/be/667d17363b38a78c4bd63cfd4b4632029fd68d2c2dc81f25ce9eb5224dd5/lxml-6.0.2-cp312-cp312-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fbc74f42c3525ac4ffa4b89cbdd00057b6196bcefe8bce794abd42d33a018092", size = 5267426, upload-time = "2025-09-22T04:01:35.639Z" },
1392
+ { url = "https://files.pythonhosted.org/packages/ea/47/62c70aa4a1c26569bc958c9ca86af2bb4e1f614e8c04fb2989833874f7ae/lxml-6.0.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6ddff43f702905a4e32bc24f3f2e2edfe0f8fde3277d481bffb709a4cced7a1f", size = 5064917, upload-time = "2025-09-22T04:01:37.448Z" },
1393
+ { url = "https://files.pythonhosted.org/packages/bd/55/6ceddaca353ebd0f1908ef712c597f8570cc9c58130dbb89903198e441fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:6da5185951d72e6f5352166e3da7b0dc27aa70bd1090b0eb3f7f7212b53f1bb8", size = 4788795, upload-time = "2025-09-22T04:01:39.165Z" },
1394
+ { url = "https://files.pythonhosted.org/packages/cf/e8/fd63e15da5e3fd4c2146f8bbb3c14e94ab850589beab88e547b2dbce22e1/lxml-6.0.2-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:57a86e1ebb4020a38d295c04fc79603c7899e0df71588043eb218722dabc087f", size = 5676759, upload-time = "2025-09-22T04:01:41.506Z" },
1395
+ { url = "https://files.pythonhosted.org/packages/76/47/b3ec58dc5c374697f5ba37412cd2728f427d056315d124dd4b61da381877/lxml-6.0.2-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2047d8234fe735ab77802ce5f2297e410ff40f5238aec569ad7c8e163d7b19a6", size = 5255666, upload-time = "2025-09-22T04:01:43.363Z" },
1396
+ { url = "https://files.pythonhosted.org/packages/19/93/03ba725df4c3d72afd9596eef4a37a837ce8e4806010569bedfcd2cb68fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:6f91fd2b2ea15a6800c8e24418c0775a1694eefc011392da73bc6cef2623b322", size = 5277989, upload-time = "2025-09-22T04:01:45.215Z" },
1397
+ { url = "https://files.pythonhosted.org/packages/c6/80/c06de80bfce881d0ad738576f243911fccf992687ae09fd80b734712b39c/lxml-6.0.2-cp312-cp312-win32.whl", hash = "sha256:3ae2ce7d6fedfb3414a2b6c5e20b249c4c607f72cb8d2bb7cc9c6ec7c6f4e849", size = 3611456, upload-time = "2025-09-22T04:01:48.243Z" },
1398
+ { url = "https://files.pythonhosted.org/packages/f7/d7/0cdfb6c3e30893463fb3d1e52bc5f5f99684a03c29a0b6b605cfae879cd5/lxml-6.0.2-cp312-cp312-win_amd64.whl", hash = "sha256:72c87e5ee4e58a8354fb9c7c84cbf95a1c8236c127a5d1b7683f04bed8361e1f", size = 4011793, upload-time = "2025-09-22T04:01:50.042Z" },
1399
+ { url = "https://files.pythonhosted.org/packages/ea/7b/93c73c67db235931527301ed3785f849c78991e2e34f3fd9a6663ffda4c5/lxml-6.0.2-cp312-cp312-win_arm64.whl", hash = "sha256:61cb10eeb95570153e0c0e554f58df92ecf5109f75eacad4a95baa709e26c3d6", size = 3672836, upload-time = "2025-09-22T04:01:52.145Z" },
1400
+ { url = "https://files.pythonhosted.org/packages/53/fd/4e8f0540608977aea078bf6d79f128e0e2c2bba8af1acf775c30baa70460/lxml-6.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:9b33d21594afab46f37ae58dfadd06636f154923c4e8a4d754b0127554eb2e77", size = 8648494, upload-time = "2025-09-22T04:01:54.242Z" },
1401
+ { url = "https://files.pythonhosted.org/packages/5d/f4/2a94a3d3dfd6c6b433501b8d470a1960a20ecce93245cf2db1706adf6c19/lxml-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:6c8963287d7a4c5c9a432ff487c52e9c5618667179c18a204bdedb27310f022f", size = 4661146, upload-time = "2025-09-22T04:01:56.282Z" },
1402
+ { url = "https://files.pythonhosted.org/packages/25/2e/4efa677fa6b322013035d38016f6ae859d06cac67437ca7dc708a6af7028/lxml-6.0.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:1941354d92699fb5ffe6ed7b32f9649e43c2feb4b97205f75866f7d21aa91452", size = 4946932, upload-time = "2025-09-22T04:01:58.989Z" },
1403
+ { url = "https://files.pythonhosted.org/packages/ce/0f/526e78a6d38d109fdbaa5049c62e1d32fdd70c75fb61c4eadf3045d3d124/lxml-6.0.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:bb2f6ca0ae2d983ded09357b84af659c954722bbf04dea98030064996d156048", size = 5100060, upload-time = "2025-09-22T04:02:00.812Z" },
1404
+ { url = "https://files.pythonhosted.org/packages/81/76/99de58d81fa702cc0ea7edae4f4640416c2062813a00ff24bd70ac1d9c9b/lxml-6.0.2-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:eb2a12d704f180a902d7fa778c6d71f36ceb7b0d317f34cdc76a5d05aa1dd1df", size = 5019000, upload-time = "2025-09-22T04:02:02.671Z" },
1405
+ { url = "https://files.pythonhosted.org/packages/b5/35/9e57d25482bc9a9882cb0037fdb9cc18f4b79d85df94fa9d2a89562f1d25/lxml-6.0.2-cp313-cp313-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:6ec0e3f745021bfed19c456647f0298d60a24c9ff86d9d051f52b509663feeb1", size = 5348496, upload-time = "2025-09-22T04:02:04.904Z" },
1406
+ { url = "https://files.pythonhosted.org/packages/a6/8e/cb99bd0b83ccc3e8f0f528e9aa1f7a9965dfec08c617070c5db8d63a87ce/lxml-6.0.2-cp313-cp313-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:846ae9a12d54e368933b9759052d6206a9e8b250291109c48e350c1f1f49d916", size = 5643779, upload-time = "2025-09-22T04:02:06.689Z" },
1407
+ { url = "https://files.pythonhosted.org/packages/d0/34/9e591954939276bb679b73773836c6684c22e56d05980e31d52a9a8deb18/lxml-6.0.2-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef9266d2aa545d7374938fb5c484531ef5a2ec7f2d573e62f8ce722c735685fd", size = 5244072, upload-time = "2025-09-22T04:02:08.587Z" },
1408
+ { url = "https://files.pythonhosted.org/packages/8d/27/b29ff065f9aaca443ee377aff699714fcbffb371b4fce5ac4ca759e436d5/lxml-6.0.2-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:4077b7c79f31755df33b795dc12119cb557a0106bfdab0d2c2d97bd3cf3dffa6", size = 4718675, upload-time = "2025-09-22T04:02:10.783Z" },
1409
+ { url = "https://files.pythonhosted.org/packages/2b/9f/f756f9c2cd27caa1a6ef8c32ae47aadea697f5c2c6d07b0dae133c244fbe/lxml-6.0.2-cp313-cp313-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a7c5d5e5f1081955358533be077166ee97ed2571d6a66bdba6ec2f609a715d1a", size = 5255171, upload-time = "2025-09-22T04:02:12.631Z" },
1410
+ { url = "https://files.pythonhosted.org/packages/61/46/bb85ea42d2cb1bd8395484fd72f38e3389611aa496ac7772da9205bbda0e/lxml-6.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:8f8d0cbd0674ee89863a523e6994ac25fd5be9c8486acfc3e5ccea679bad2679", size = 5057175, upload-time = "2025-09-22T04:02:14.718Z" },
1411
+ { url = "https://files.pythonhosted.org/packages/95/0c/443fc476dcc8e41577f0af70458c50fe299a97bb6b7505bb1ae09aa7f9ac/lxml-6.0.2-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:2cbcbf6d6e924c28f04a43f3b6f6e272312a090f269eff68a2982e13e5d57659", size = 4785688, upload-time = "2025-09-22T04:02:16.957Z" },
1412
+ { url = "https://files.pythonhosted.org/packages/48/78/6ef0b359d45bb9697bc5a626e1992fa5d27aa3f8004b137b2314793b50a0/lxml-6.0.2-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:dfb874cfa53340009af6bdd7e54ebc0d21012a60a4e65d927c2e477112e63484", size = 5660655, upload-time = "2025-09-22T04:02:18.815Z" },
1413
+ { url = "https://files.pythonhosted.org/packages/ff/ea/e1d33808f386bc1339d08c0dcada6e4712d4ed8e93fcad5f057070b7988a/lxml-6.0.2-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:fb8dae0b6b8b7f9e96c26fdd8121522ce5de9bb5538010870bd538683d30e9a2", size = 5247695, upload-time = "2025-09-22T04:02:20.593Z" },
1414
+ { url = "https://files.pythonhosted.org/packages/4f/47/eba75dfd8183673725255247a603b4ad606f4ae657b60c6c145b381697da/lxml-6.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:358d9adae670b63e95bc59747c72f4dc97c9ec58881d4627fe0120da0f90d314", size = 5269841, upload-time = "2025-09-22T04:02:22.489Z" },
1415
+ { url = "https://files.pythonhosted.org/packages/76/04/5c5e2b8577bc936e219becb2e98cdb1aca14a4921a12995b9d0c523502ae/lxml-6.0.2-cp313-cp313-win32.whl", hash = "sha256:e8cd2415f372e7e5a789d743d133ae474290a90b9023197fd78f32e2dc6873e2", size = 3610700, upload-time = "2025-09-22T04:02:24.465Z" },
1416
+ { url = "https://files.pythonhosted.org/packages/fe/0a/4643ccc6bb8b143e9f9640aa54e38255f9d3b45feb2cbe7ae2ca47e8782e/lxml-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:b30d46379644fbfc3ab81f8f82ae4de55179414651f110a1514f0b1f8f6cb2d7", size = 4010347, upload-time = "2025-09-22T04:02:26.286Z" },
1417
+ { url = "https://files.pythonhosted.org/packages/31/ef/dcf1d29c3f530577f61e5fe2f1bd72929acf779953668a8a47a479ae6f26/lxml-6.0.2-cp313-cp313-win_arm64.whl", hash = "sha256:13dcecc9946dca97b11b7c40d29fba63b55ab4170d3c0cf8c0c164343b9bfdcf", size = 3671248, upload-time = "2025-09-22T04:02:27.918Z" },
1418
+ { url = "https://files.pythonhosted.org/packages/03/15/d4a377b385ab693ce97b472fe0c77c2b16ec79590e688b3ccc71fba19884/lxml-6.0.2-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:b0c732aa23de8f8aec23f4b580d1e52905ef468afb4abeafd3fec77042abb6fe", size = 8659801, upload-time = "2025-09-22T04:02:30.113Z" },
1419
+ { url = "https://files.pythonhosted.org/packages/c8/e8/c128e37589463668794d503afaeb003987373c5f94d667124ffd8078bbd9/lxml-6.0.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4468e3b83e10e0317a89a33d28f7aeba1caa4d1a6fd457d115dd4ffe90c5931d", size = 4659403, upload-time = "2025-09-22T04:02:32.119Z" },
1420
+ { url = "https://files.pythonhosted.org/packages/00/ce/74903904339decdf7da7847bb5741fc98a5451b42fc419a86c0c13d26fe2/lxml-6.0.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:abd44571493973bad4598a3be7e1d807ed45aa2adaf7ab92ab7c62609569b17d", size = 4966974, upload-time = "2025-09-22T04:02:34.155Z" },
1421
+ { url = "https://files.pythonhosted.org/packages/1f/d3/131dec79ce61c5567fecf82515bd9bc36395df42501b50f7f7f3bd065df0/lxml-6.0.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:370cd78d5855cfbffd57c422851f7d3864e6ae72d0da615fca4dad8c45d375a5", size = 5102953, upload-time = "2025-09-22T04:02:36.054Z" },
1422
+ { url = "https://files.pythonhosted.org/packages/3a/ea/a43ba9bb750d4ffdd885f2cd333572f5bb900cd2408b67fdda07e85978a0/lxml-6.0.2-cp314-cp314-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:901e3b4219fa04ef766885fb40fa516a71662a4c61b80c94d25336b4934b71c0", size = 5055054, upload-time = "2025-09-22T04:02:38.154Z" },
1423
+ { url = "https://files.pythonhosted.org/packages/60/23/6885b451636ae286c34628f70a7ed1fcc759f8d9ad382d132e1c8d3d9bfd/lxml-6.0.2-cp314-cp314-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:a4bf42d2e4cf52c28cc1812d62426b9503cdb0c87a6de81442626aa7d69707ba", size = 5352421, upload-time = "2025-09-22T04:02:40.413Z" },
1424
+ { url = "https://files.pythonhosted.org/packages/48/5b/fc2ddfc94ddbe3eebb8e9af6e3fd65e2feba4967f6a4e9683875c394c2d8/lxml-6.0.2-cp314-cp314-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2c7fdaa4d7c3d886a42534adec7cfac73860b89b4e5298752f60aa5984641a0", size = 5673684, upload-time = "2025-09-22T04:02:42.288Z" },
1425
+ { url = "https://files.pythonhosted.org/packages/29/9c/47293c58cc91769130fbf85531280e8cc7868f7fbb6d92f4670071b9cb3e/lxml-6.0.2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98a5e1660dc7de2200b00d53fa00bcd3c35a3608c305d45a7bbcaf29fa16e83d", size = 5252463, upload-time = "2025-09-22T04:02:44.165Z" },
1426
+ { url = "https://files.pythonhosted.org/packages/9b/da/ba6eceb830c762b48e711ded880d7e3e89fc6c7323e587c36540b6b23c6b/lxml-6.0.2-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:dc051506c30b609238d79eda75ee9cab3e520570ec8219844a72a46020901e37", size = 4698437, upload-time = "2025-09-22T04:02:46.524Z" },
1427
+ { url = "https://files.pythonhosted.org/packages/a5/24/7be3f82cb7990b89118d944b619e53c656c97dc89c28cfb143fdb7cd6f4d/lxml-6.0.2-cp314-cp314-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:8799481bbdd212470d17513a54d568f44416db01250f49449647b5ab5b5dccb9", size = 5269890, upload-time = "2025-09-22T04:02:48.812Z" },
1428
+ { url = "https://files.pythonhosted.org/packages/1b/bd/dcfb9ea1e16c665efd7538fc5d5c34071276ce9220e234217682e7d2c4a5/lxml-6.0.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9261bb77c2dab42f3ecd9103951aeca2c40277701eb7e912c545c1b16e0e4917", size = 5097185, upload-time = "2025-09-22T04:02:50.746Z" },
1429
+ { url = "https://files.pythonhosted.org/packages/21/04/a60b0ff9314736316f28316b694bccbbabe100f8483ad83852d77fc7468e/lxml-6.0.2-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:65ac4a01aba353cfa6d5725b95d7aed6356ddc0a3cd734de00124d285b04b64f", size = 4745895, upload-time = "2025-09-22T04:02:52.968Z" },
1430
+ { url = "https://files.pythonhosted.org/packages/d6/bd/7d54bd1846e5a310d9c715921c5faa71cf5c0853372adf78aee70c8d7aa2/lxml-6.0.2-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:b22a07cbb82fea98f8a2fd814f3d1811ff9ed76d0fc6abc84eb21527596e7cc8", size = 5695246, upload-time = "2025-09-22T04:02:54.798Z" },
1431
+ { url = "https://files.pythonhosted.org/packages/fd/32/5643d6ab947bc371da21323acb2a6e603cedbe71cb4c99c8254289ab6f4e/lxml-6.0.2-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:d759cdd7f3e055d6bc8d9bec3ad905227b2e4c785dc16c372eb5b5e83123f48a", size = 5260797, upload-time = "2025-09-22T04:02:57.058Z" },
1432
+ { url = "https://files.pythonhosted.org/packages/33/da/34c1ec4cff1eea7d0b4cd44af8411806ed943141804ac9c5d565302afb78/lxml-6.0.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:945da35a48d193d27c188037a05fec5492937f66fb1958c24fc761fb9d40d43c", size = 5277404, upload-time = "2025-09-22T04:02:58.966Z" },
1433
+ { url = "https://files.pythonhosted.org/packages/82/57/4eca3e31e54dc89e2c3507e1cd411074a17565fa5ffc437c4ae0a00d439e/lxml-6.0.2-cp314-cp314-win32.whl", hash = "sha256:be3aaa60da67e6153eb15715cc2e19091af5dc75faef8b8a585aea372507384b", size = 3670072, upload-time = "2025-09-22T04:03:38.05Z" },
1434
+ { url = "https://files.pythonhosted.org/packages/e3/e0/c96cf13eccd20c9421ba910304dae0f619724dcf1702864fd59dd386404d/lxml-6.0.2-cp314-cp314-win_amd64.whl", hash = "sha256:fa25afbadead523f7001caf0c2382afd272c315a033a7b06336da2637d92d6ed", size = 4080617, upload-time = "2025-09-22T04:03:39.835Z" },
1435
+ { url = "https://files.pythonhosted.org/packages/d5/5d/b3f03e22b3d38d6f188ef044900a9b29b2fe0aebb94625ce9fe244011d34/lxml-6.0.2-cp314-cp314-win_arm64.whl", hash = "sha256:063eccf89df5b24e361b123e257e437f9e9878f425ee9aae3144c77faf6da6d8", size = 3754930, upload-time = "2025-09-22T04:03:41.565Z" },
1436
+ { url = "https://files.pythonhosted.org/packages/5e/5c/42c2c4c03554580708fc738d13414801f340c04c3eff90d8d2d227145275/lxml-6.0.2-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:6162a86d86893d63084faaf4ff937b3daea233e3682fb4474db07395794fa80d", size = 8910380, upload-time = "2025-09-22T04:03:01.645Z" },
1437
+ { url = "https://files.pythonhosted.org/packages/bf/4f/12df843e3e10d18d468a7557058f8d3733e8b6e12401f30b1ef29360740f/lxml-6.0.2-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:414aaa94e974e23a3e92e7ca5b97d10c0cf37b6481f50911032c69eeb3991bba", size = 4775632, upload-time = "2025-09-22T04:03:03.814Z" },
1438
+ { url = "https://files.pythonhosted.org/packages/e4/0c/9dc31e6c2d0d418483cbcb469d1f5a582a1cd00a1f4081953d44051f3c50/lxml-6.0.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:48461bd21625458dd01e14e2c38dd0aea69addc3c4f960c30d9f59d7f93be601", size = 4975171, upload-time = "2025-09-22T04:03:05.651Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/2b/9b870c6ca24c841bdd887504808f0417aa9d8d564114689266f19ddf29c8/lxml-6.0.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:25fcc59afc57d527cfc78a58f40ab4c9b8fd096a9a3f964d2781ffb6eb33f4ed", size = 5110109, upload-time = "2025-09-22T04:03:07.452Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/0c/4f5f2a4dd319a178912751564471355d9019e220c20d7db3fb8307ed8582/lxml-6.0.2-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5179c60288204e6ddde3f774a93350177e08876eaf3ab78aa3a3649d43eb7d37", size = 5041061, upload-time = "2025-09-22T04:03:09.297Z" },
+ { url = "https://files.pythonhosted.org/packages/12/64/554eed290365267671fe001a20d72d14f468ae4e6acef1e179b039436967/lxml-6.0.2-cp314-cp314t-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:967aab75434de148ec80597b75062d8123cadf2943fb4281f385141e18b21338", size = 5306233, upload-time = "2025-09-22T04:03:11.651Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/31/1d748aa275e71802ad9722df32a7a35034246b42c0ecdd8235412c3396ef/lxml-6.0.2-cp314-cp314t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:d100fcc8930d697c6561156c6810ab4a508fb264c8b6779e6e61e2ed5e7558f9", size = 5604739, upload-time = "2025-09-22T04:03:13.592Z" },
+ { url = "https://files.pythonhosted.org/packages/8f/41/2c11916bcac09ed561adccacceaedd2bf0e0b25b297ea92aab99fd03d0fa/lxml-6.0.2-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2ca59e7e13e5981175b8b3e4ab84d7da57993eeff53c07764dcebda0d0e64ecd", size = 5225119, upload-time = "2025-09-22T04:03:15.408Z" },
+ { url = "https://files.pythonhosted.org/packages/99/05/4e5c2873d8f17aa018e6afde417c80cc5d0c33be4854cce3ef5670c49367/lxml-6.0.2-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:957448ac63a42e2e49531b9d6c0fa449a1970dbc32467aaad46f11545be9af1d", size = 4633665, upload-time = "2025-09-22T04:03:17.262Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/c9/dcc2da1bebd6275cdc723b515f93edf548b82f36a5458cca3578bc899332/lxml-6.0.2-cp314-cp314t-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b7fc49c37f1786284b12af63152fe1d0990722497e2d5817acfe7a877522f9a9", size = 5234997, upload-time = "2025-09-22T04:03:19.14Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/e2/5172e4e7468afca64a37b81dba152fc5d90e30f9c83c7c3213d6a02a5ce4/lxml-6.0.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e19e0643cc936a22e837f79d01a550678da8377d7d801a14487c10c34ee49c7e", size = 5090957, upload-time = "2025-09-22T04:03:21.436Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/b3/15461fd3e5cd4ddcb7938b87fc20b14ab113b92312fc97afe65cd7c85de1/lxml-6.0.2-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:1db01e5cf14345628e0cbe71067204db658e2fb8e51e7f33631f5f4735fefd8d", size = 4764372, upload-time = "2025-09-22T04:03:23.27Z" },
+ { url = "https://files.pythonhosted.org/packages/05/33/f310b987c8bf9e61c4dd8e8035c416bd3230098f5e3cfa69fc4232de7059/lxml-6.0.2-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:875c6b5ab39ad5291588aed6925fac99d0097af0dd62f33c7b43736043d4a2ec", size = 5634653, upload-time = "2025-09-22T04:03:25.767Z" },
+ { url = "https://files.pythonhosted.org/packages/70/ff/51c80e75e0bc9382158133bdcf4e339b5886c6ee2418b5199b3f1a61ed6d/lxml-6.0.2-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:cdcbed9ad19da81c480dfd6dd161886db6096083c9938ead313d94b30aadf272", size = 5233795, upload-time = "2025-09-22T04:03:27.62Z" },
+ { url = "https://files.pythonhosted.org/packages/56/4d/4856e897df0d588789dd844dbed9d91782c4ef0b327f96ce53c807e13128/lxml-6.0.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:80dadc234ebc532e09be1975ff538d154a7fa61ea5031c03d25178855544728f", size = 5257023, upload-time = "2025-09-22T04:03:30.056Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/85/86766dfebfa87bea0ab78e9ff7a4b4b45225df4b4d3b8cc3c03c5cd68464/lxml-6.0.2-cp314-cp314t-win32.whl", hash = "sha256:da08e7bb297b04e893d91087df19638dc7a6bb858a954b0cc2b9f5053c922312", size = 3911420, upload-time = "2025-09-22T04:03:32.198Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/1a/b248b355834c8e32614650b8008c69ffeb0ceb149c793961dd8c0b991bb3/lxml-6.0.2-cp314-cp314t-win_amd64.whl", hash = "sha256:252a22982dca42f6155125ac76d3432e548a7625d56f5a273ee78a5057216eca", size = 4406837, upload-time = "2025-09-22T04:03:34.027Z" },
+ { url = "https://files.pythonhosted.org/packages/92/aa/df863bcc39c5e0946263454aba394de8a9084dbaff8ad143846b0d844739/lxml-6.0.2-cp314-cp314t-win_arm64.whl", hash = "sha256:bb4c1847b303835d89d785a18801a883436cdfd5dc3d62947f9c49e24f0f5a2c", size = 3822205, upload-time = "2025-09-22T04:03:36.249Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/9c/780c9a8fce3f04690b374f72f41306866b0400b9d0fdf3e17aaa37887eed/lxml-6.0.2-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:e748d4cf8fef2526bb2a589a417eba0c8674e29ffcb570ce2ceca44f1e567bf6", size = 3939264, upload-time = "2025-09-22T04:04:32.892Z" },
+ { url = "https://files.pythonhosted.org/packages/f5/5a/1ab260c00adf645d8bf7dec7f920f744b032f69130c681302821d5debea6/lxml-6.0.2-pp310-pypy310_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:4ddb1049fa0579d0cbd00503ad8c58b9ab34d1254c77bc6a5576d96ec7853dba", size = 4216435, upload-time = "2025-09-22T04:04:34.907Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/37/565f3b3d7ffede22874b6d86be1a1763d00f4ea9fc5b9b6ccb11e4ec8612/lxml-6.0.2-pp310-pypy310_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:cb233f9c95f83707dae461b12b720c1af9c28c2d19208e1be03387222151daf5", size = 4325913, upload-time = "2025-09-22T04:04:37.205Z" },
+ { url = "https://files.pythonhosted.org/packages/22/ec/f3a1b169b2fb9d03467e2e3c0c752ea30e993be440a068b125fc7dd248b0/lxml-6.0.2-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bc456d04db0515ce3320d714a1eac7a97774ff0849e7718b492d957da4631dd4", size = 4269357, upload-time = "2025-09-22T04:04:39.322Z" },
+ { url = "https://files.pythonhosted.org/packages/77/a2/585a28fe3e67daa1cf2f06f34490d556d121c25d500b10082a7db96e3bcd/lxml-6.0.2-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2613e67de13d619fd283d58bda40bff0ee07739f624ffee8b13b631abf33083d", size = 4412295, upload-time = "2025-09-22T04:04:41.647Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/d9/a57dd8bcebd7c69386c20263830d4fa72d27e6b72a229ef7a48e88952d9a/lxml-6.0.2-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:24a8e756c982c001ca8d59e87c80c4d9dcd4d9b44a4cbeb8d9be4482c514d41d", size = 3516913, upload-time = "2025-09-22T04:04:43.602Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/11/29d08bc103a62c0eba8016e7ed5aeebbf1e4312e83b0b1648dd203b0e87d/lxml-6.0.2-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:1c06035eafa8404b5cf475bb37a9f6088b0aca288d4ccc9d69389750d5543700", size = 3949829, upload-time = "2025-09-22T04:04:45.608Z" },
+ { url = "https://files.pythonhosted.org/packages/12/b3/52ab9a3b31e5ab8238da241baa19eec44d2ab426532441ee607165aebb52/lxml-6.0.2-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c7d13103045de1bdd6fe5d61802565f1a3537d70cd3abf596aa0af62761921ee", size = 4226277, upload-time = "2025-09-22T04:04:47.754Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/33/1eaf780c1baad88224611df13b1c2a9dfa460b526cacfe769103ff50d845/lxml-6.0.2-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0a3c150a95fbe5ac91de323aa756219ef9cf7fde5a3f00e2281e30f33fa5fa4f", size = 4330433, upload-time = "2025-09-22T04:04:49.907Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/c1/27428a2ff348e994ab4f8777d3a0ad510b6b92d37718e5887d2da99952a2/lxml-6.0.2-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:60fa43be34f78bebb27812ed90f1925ec99560b0fa1decdb7d12b84d857d31e9", size = 4272119, upload-time = "2025-09-22T04:04:51.801Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/d0/3020fa12bcec4ab62f97aab026d57c2f0cfd480a558758d9ca233bb6a79d/lxml-6.0.2-pp311-pypy311_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:21c73b476d3cfe836be731225ec3421fa2f048d84f6df6a8e70433dff1376d5a", size = 4417314, upload-time = "2025-09-22T04:04:55.024Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/77/d7f491cbc05303ac6801651aabeb262d43f319288c1ea96c66b1d2692ff3/lxml-6.0.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:27220da5be049e936c3aca06f174e8827ca6445a4353a1995584311487fc4e3e", size = 3518768, upload-time = "2025-09-22T04:04:57.097Z" },
+ ]
+
  [[package]]
  name = "markdown-it-py"
  version = "4.0.0"
 
  version = "0.1.0"
  source = { editable = "." }
  dependencies = [
+ { name = "beautifulsoup4" },
  { name = "fastapi" },
+ { name = "lxml" },
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
  { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
  { name = "openai" },
 
 
  [package.metadata]
  requires-dist = [
+ { name = "beautifulsoup4", specifier = ">=4.14.3" },
  { name = "fastapi", specifier = ">=0.100.0" },
+ { name = "lxml", specifier = ">=6.0.2" },
  { name = "numpy", specifier = ">=1.24.0" },
  { name = "openai", specifier = ">=1.0.0" },
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
 
  { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
  ]
 
+ [[package]]
+ name = "soupsieve"
+ version = "2.8.3"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/7b/ae/2d9c981590ed9999a0d91755b47fc74f74de286b0f5cee14c9269041e6c4/soupsieve-2.8.3.tar.gz", hash = "sha256:3267f1eeea4251fb42728b6dfb746edc9acaffc4a45b27e19450b676586e8349", size = 118627, upload-time = "2026-01-20T04:27:02.457Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" },
+ ]
+
  [[package]]
  name = "sse-starlette"
  version = "3.3.4"