kdcyberdude committed
Commit 9eebce3 · verified · 1 Parent(s): b6873b7

Upload folder using huggingface_hub
BUILD_NOTES.md CHANGED
@@ -192,6 +192,24 @@ When in doubt: check the endpoint schema returned by search_endpoints() — it s
 
 ---
 
+ ### 10. HAR Is the Agent's Only API Knowledge Source — No Catalog Fallback
+
+ **Status:** Design decision, locked
+ **Detail:** The `browser_agent` tool uses **only the HAR file** to build the agent's endpoint index and embeddings. The API catalogs (`catalogs/*.json`) are used exclusively by the judge for parameter-sourcing grading — they play no role in the training loop.
+
+ If a HAR yields very few endpoints, **the HAR recording needs to be improved**, not the code. The product does not patch sparse recordings by injecting catalog data into the agent's search corpus. This is intentional: the RL challenge is for the agent to discover and use APIs it has actually observed, not a curated ground-truth list.
+
+ **What goes where:**
+
+ | Data source | Who uses it | How |
+ |---|---|---|
+ | `hars/*.har` | Agent only | `browser_agent` → `search_endpoints` semantic search |
+ | `catalogs/*.json` | Judge only | Parameter-sourcing grading (`judge.py`) |
+
+ **Do not add catalog augmentation back** to `browser_agent.py` or `search_endpoints.py` under any circumstances. If the embed cache shows a large number of entries (e.g. 503 instead of 1), it means catalog entries leaked into the agent — clear the cache and fix the source.
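One cheap way to enforce this rule in CI is to diff the agent's index keys against the HAR itself. A sketch (function names and the index shape are hypothetical; the HAR fields follow the standard HAR 1.2 `log.entries` layout):

```python
from urllib.parse import urlsplit

def har_endpoint_set(har: dict) -> set:
    """Collect the (method, path) pairs actually observed in a HAR capture."""
    observed = set()
    for entry in har["log"]["entries"]:
        req = entry["request"]
        observed.add((req["method"], urlsplit(req["url"]).path))
    return observed

def assert_har_only(index_keys, har: dict) -> None:
    """Fail loudly if the agent's endpoint index contains anything
    never observed in the HAR (i.e. a catalog leak)."""
    leaked = set(index_keys) - har_endpoint_set(har)
    assert not leaked, f"catalog entries leaked into agent index: {sorted(leaked)}"
```

Run against the embed cache after every rebuild; a single leaked catalog entry fails the check.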
+
+ ---
+
 ## Non-Issues (Resolved in Design)
 
 - ~~`store_finding` / `get_findings` tools~~ — **Removed**. Value threading happens through episode `history`.
Dockerfile CHANGED
@@ -67,6 +67,9 @@ COPY --from=builder /app/env /app/env
 # Set PATH to use the virtual environment
 ENV PATH="/app/.venv/bin:$PATH"
 
+ # Enable Gradio web UI for manual testing
+ ENV ENABLE_WEB_INTERFACE=true
+
 # Set PYTHONPATH so imports work correctly
 ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
@@ -76,5 +79,4 @@ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
 
 # Run the FastAPI server
 # The module path is constructed to work with the /app/env structure
- ENV ENABLE_WEB_INTERFACE=true
 CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -15,38 +15,47 @@ base_path: /web
 
 # HARvestGym
 
- *Core idea: Trains LLMs to reverse-engineer and complete web tasks through raw HTTP APIs. No browser. No docs. Just a URL and a task.*
-
- ### Can a small model learn to explore the API surface of any web application — and complete real tasks through those APIs, without ever opening a browser?
-
- Web applications are full of APIs. Every click in a browser triggers an HTTP call with a precise schema, a specific authentication header, an exact sequence of prerequisites. **HARvestGym trains a small model to do all of that directly** — given a task and a URL, it discovers the relevant endpoints, understands what each one needs, chains the calls in the right order, and completes the task without any browser.
 
 The model starts with nothing: no schema, no documentation, no endpoint list. It uses tools to explore — issuing requests, inspecting responses, building up its own understanding of how the application works. This is what a developer does when they reverse-engineer an API. The model learns to do the same.
 
- Given a URL and a task string, the agent must discover which endpoints exist, figure out schemas and parameter dependencies, and execute the right sequence. Zero prior knowledge.
-
- ## What the Model (Policy) Is Learning
-
- Given: a natural language task + a live web application URL. No prior knowledge of the application.
-
- The model calls `browser_agent` first — this returns the list of API endpoints the browser used to complete the task. The model now has a map: it knows what endpoints exist. What it does not know:
 
 - which of those endpoints are actually needed for this specific task
 - in what order they must be called (you cannot add to a cart before the cart exists)
 - where each required parameter value comes from
 - how to re-authenticate if a session expires mid-episode
 
- The model must learn to:
-
- 1. **Discover endpoints** — by using a browser agent tool that completes the same task in a real browser while recording all network traffic, then filtering that traffic to extract only the meaningful application API calls (stripping out CDN requests, analytics, static assets). The browser agent runs once and generates the raw discovery data; the model uses this as its starting context.
- 2. **Select the right endpoints** — from the browser agent's list, identify the subset relevant to the current task (not every observed endpoint is needed)
- 3. **Sequence calls correctly** — determine the prerequisite order (create cart → find product → add item), including calls that must happen before others even though the task description doesn't say so
- 4. **Thread parameters** — this is the hardest part. APIs form a dependency graph:
-    - Some values come from a previous response (`cart_id` from step 1 → path param in step 3)
-    - Some values come from the authentication flow (`form_key`, `Bearer token` → header in every subsequent call)
-    - Some values come from the task description (`product name` → search query → `sku` → body of add-item call)
-    - The ground truth catalog defines these relationships precisely; the model learns to navigate them
- 5. **Handle auth and errors** — detect 401 / session-expired responses, re-authenticate, and continue; interpret 4xx errors and adjust the next call accordingly
 
 ---
 
@@ -61,275 +70,138 @@ The model must learn to:
 │ ▼ │
 │ ┌────────────────────────────────────────────────────────────────┐ │
 │ │ Policy Model (RL Agent) │ │
- │ │ small model — no prior knowledge of the app │ │
 │ │ │ │
- │ │ Observation: task + history + session_state + last_result │ │
 │ │ │ │
- │ │ Step 1 ──► browser_agent(task, url) │ │
- │ │ Step 2+ ──► search_endpoints(query) │ │
- │ │ ──► curl_exec(command) │ │
- │ │ ──► search_episode_data(query) │ │
- │ │ ──► done(result) │ │
 │ └────────┬───────────────────────────────────────────────────────┘ │
 │ │ │
- │ ┌──────┴──────────────────────────────┐
- │ │ │
- │ ▼ ▼
- │ ┌─────────────────────┐ ┌─────────────────────────────────────┐
- │ │ Browser Agent │ │ Environment │
- │ │ (step 1 only) │ │ │
- │ │ │ │ • Executes curl_exec via subprocess│
- │ │ Training: │ │ • Auto-injects session cookies │
- │ │ Load pre-recorded │ │ • Smart-truncates response bodies │
- │ │ cached HAR from │ │ • Indexes full responses into │
- │ │ disk or launch │ │ per-episode BM25 + GEMMA store │
- │ │ on real browser │ │ • Manages session_state: cookies, │
- │ │ │ │ CSRF tokens, auth headers │
- │ │ Inference: │ ──────────────┬──────────────────────┘
- │ │ Launch real browser│ │
- │ │ via Playwright + │ │ HTTP calls (always live)
- │ │ bu-30b-a3b-preview │ ▼
- │ │ │ ┌─────────────────────────────────────┐
- │ │ Both paths produce: │ │ WebArena EC2 (live apps) │
- │ │ • Filtered HAR │ │ │
- │ │ • OpenAPI-like spec│ │ :7770 Shopping (Magento 2) │
- │ │ • GEMMA embeddings │ │ :7780 Shopping Admin │
- │ │ for search_ │ │ :9999 Forum (Postmill) │
- │ │ endpoints() │ │ :8888 Wikipedia (Kiwix) │
- │ └─────────────────────┘ │ :3000 Map (OpenStreetMap) │
- │ └──────────────┬──────────────────────┘
- │ │
- │ │ episode trajectory
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ Deterministic Judge │
- │ │ │
- │ │ Per-template programmatic grader: │
- │ │ • Inspects episode trajectory │
- │ │ • Optionally probes live app state │
- │ │ • Verifies parameter sourcing │
- │ │ (TASK_SPEC / PREV_CALL / │
- │ │ AUTH_FLOW / STATIC / DERIVED) │
- │ │ • Scores [0.0 → 1.0] │
- │ └──────────────┬──────────────────────┘
- │ │
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ Reward Signal │
- │ │ │
- │ │ Per-step: │
- │ │ +0.2 valid API call (2xx) │
- │ │ +0.1 new path explored │
- │ │ +0.25 correct param sourcing │
- │ │ −0.15 repeated identical call │
- │ │ −0.3 browser_agent called again │
- │ │ │
- │ │ Episode end: │
- │ │ +2.0–+5.0 task complete (easy→hard)│
- │ │ −1.5 task failed │
- │ └──────────────┬──────────────────────┘
- │ │
- │ ▼
- │ ┌─────────────────────────────────────┐
- │ │ GRPO (via HF TRL) │
- │ │ │
- │ │ 8 parallel rollouts per prompt │
- │ │ Computes advantages without │
- │ │ a value function │
- │ │ Updates policy weights │
- │ └─────────────────────────────────────┘
- │ │
- │ └──► updated Policy Model
 └─────────────────────────────────────────────────────────────────────────┘
 ```
 
146
- ### Data Flow: Browser Agent → Search Index → Execution
147
-
148
- ```
149
- HAR File (cached using Browser Agent) ──► filter_har_entries()
150
-
151
-
152
- drop: CDN, analytics, static assets
153
- keep: {method, path, request_body,
154
- response_body, status_code}
155
-
156
-
157
- extract_openapi_spec()
158
- → structured endpoint catalog
159
- {path, method, params, auth, response_fields}
160
-
161
- ┌──────┴──────┐
162
- │ │
163
- ▼ ▼
164
- build_GEMMA_embeddings return summary list
165
- (search_endpoints to RL agent:
166
- index — full schemas) [GET /products,
167
- POST /guest-carts, ...]
168
-
169
-
170
- search_endpoints("create guest cart")
171
- → top-3 endpoint schemas with:
172
- • path params + sources
173
- • body params + sources
174
- • auth requirements
175
- • response field names
176
- ```
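The filtering stage of this pipeline might look like the following sketch (field names follow the HAR 1.2 layout; the static-asset extension list is illustrative, not the project's actual list):

```python
from urllib.parse import urlsplit

# Illustrative static-asset extensions; the real filter list is an assumption here.
STATIC_EXTS = (".js", ".css", ".png", ".jpg", ".svg", ".woff", ".woff2", ".ico")

def filter_har_entries(entries, app_host):
    """Keep only the application's own API calls; drop CDN, analytics,
    fonts, and static-asset requests."""
    kept = []
    for e in entries:
        req, resp = e["request"], e["response"]
        url = urlsplit(req["url"])
        if url.hostname != app_host:
            continue  # third-party host: CDN, analytics, font service
        if url.path.lower().endswith(STATIC_EXTS):
            continue  # static asset bundle
        kept.append({
            "method": req["method"],
            "path": url.path,
            "request_body": (req.get("postData") or {}).get("text"),
            "response_body": (resp.get("content") or {}).get("text"),
            "status_code": resp["status"],
        })
    return kept
```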
-
- ### Episode Response Indexing
-
- ```
- curl_exec(command)
-   │
-   ├──► subprocess: execute against live EC2
-   │
-   ├──► index_full_response()
-   │      BM25 index  ── keyword match (IDs, SKUs, tokens)
-   │      GEMMA embed ── semantic match (paraphrases)
-   │      (indexes BEFORE truncation — all items stored)
-   │
-   └──► smart_truncate()
-          non-JSON HTML   → 3,000 chars
-          JSON primitive  → never truncated
-          error (4xx/5xx) → never truncated
-          small JSON      → returned as-is
-          large array     → first 2 items shown
-                            + _list_truncated annotation
-                            + hint to call search_episode_data()
- ```
-
- ### Parameter Dependency Graph (what the judge tracks)
-
- ```
- Task: "Add 'Radiant Tee' to a guest cart"
-
- ┌─────────────────────────────────────────────────────────┐
- │ TASK_SPEC ──────────────────────────────────────────┐ │
- │ "Radiant Tee" (product name) │ │
- │ │ │ │
- │ ▼ │ │
- │ GET /rest/V1/products?name=Radiant+Tee │ │
- │ → items[0].sku = "MH01" (PREV_CALL) ──┐ │ │
- │ │ │ │
- │ POST /rest/V1/guest-carts │ │ │
- │ → body = "cart-abc123" (PREV_CALL) ──┼──┼─►│
- │ │ │ │
- │ POST /rest/V1/guest-carts/{cartId}/items │ │ │
- │ path: cartId ◄────── "cart-abc123" ───────┘ │ │
- │ body: sku ◄────── "MH01" ─────────┘ │
- │ body: qty ◄────── TASK_SPEC (quantity) │
- │ body: quote_id ◄────── DERIVED (= cartId) │
- └─────────────────────────────────────────────────────────┘
-
- Source types tracked by the judge:
-   TASK_SPEC — value stated in the task string
-   PREV_CALL — value from a prior curl response in this episode
-   AUTH_FLOW — value from a session/token auth step
-   STATIC    — fixed application constant (e.g. store_id = 1)
-   DERIVED   — computed from another param (e.g. quote_id = cart_id)
- ```
-
- ### Curriculum: Complexity Tiers
-
- ```
- Easy ──────────────────────── graduate when P(success) > 0.7
- │ Single call, no auth │
- │ Templates 1, 2 │
- │ 1 API call required │
- │ ▼
- Medium ──────────────────────── graduate when P(success) > 0.7
- │ Auth + 1–2 dependent calls │
- │ Templates 3, 4 │
- │ 2–3 API calls required │
- │ ▼
- Hard ────────────────────────── final tier
-   Multi-step chain, full auth, ID threading
-   Templates 5, 6, 7
-   4–8+ API calls required
-   Reward scaling: ×2.5 vs Easy
- ```
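The graduation rule can be sketched as a rolling success-rate gate. The 0.7 threshold comes from the tiers above; the window size and class shape are assumptions for illustration:

```python
from collections import deque

TIERS = ["easy", "medium", "hard"]

class Curriculum:
    """Promote to the next tier once rolling P(success) exceeds the threshold."""
    def __init__(self, threshold=0.7, window=50):
        self.tier_idx = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling window of episode outcomes

    @property
    def tier(self):
        return TIERS[self.tier_idx]

    def record(self, success: bool):
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        if full and sum(self.results) / len(self.results) > self.threshold:
            if self.tier_idx < len(TIERS) - 1:
                self.tier_idx += 1
                self.results.clear()  # restart the window for the new tier
```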
-
- ### The RL Agent's Tool: Browser Agent
-
- The RL agent has access to a **browser agent tool** powered by [`browser-use/bu-30b-a3b-preview`](https://huggingface.co/browser-use/bu-30b-a3b-preview) — a 30B MoE vision-language model (3B active parameters) purpose-built for web task completion, served via the [browser-use](https://github.com/browser-use/browser-use) library on Playwright. When the RL agent calls this tool with a natural language task, the browser agent:
-
- 1. Opens the target application in a real browser
- 2. Completes the task by clicking, typing, and navigating — exactly as a human would
- 3. Intercepts all HTTP traffic via Playwright network events
- 4. Returns the intercepted traffic, filtered down to only the application's own API calls
-
- The filtering step strips analytics pings, CDN requests, font loads, and JS/CSS bundles, returning only `{method, path, request_body, response_body, status_code}` tuples for the app's actual API endpoints.
-
- **Training vs. inference — what gets cached:**
-
- - The browser agent output (filtered endpoint list) is pre-computed once per task and cached. During training, the RL model receives this cached result instantly — no live browser session runs.
- - The RL agent's own `curl_exec` calls **always hit the real live WebArena server** — during both training and inference. No API response is mocked or cached.
- - At inference, the browser agent runs live to handle novel tasks or changed application state.
-
- Full architecture and code: [BROWSER_AGENT.md](BROWSER_AGENT.md)
-
- ### Ground Truth: From the Codebase, Not the Browser
 
- The browser agent shows *what* API calls happen. It does not explain *why* — specifically, it does not document where each parameter comes from or what field constraints exist. That comes from the application codebase.
 
- For each WebArena application, we perform a one-time static analysis (using a large model against the Docker image source) to produce a **ground truth API catalog**: a precise, hard-coded document specifying:
 
- ```
- endpoint: POST /rest/V1/guest-carts/{cartId}/items
- method: POST
- auth: None (guest cart)
- path_params:
-   cartId: [string] obtained from: POST /rest/V1/guest-carts response body
- body:
-   cartItem.sku: [string] the product's SKU, from: GET /rest/V1/products → items[].sku
-   cartItem.qty: [number] quantity, from: task specification
-   cartItem.quote_id: [string] same as cartId
- ```
 
- This is what the judge compares against. The ground truth defines the complete parameter relationship graph for each application.
 
- Full extraction process: [GROUND_TRUTH_EXTRACTION.md](GROUND_TRUTH_EXTRACTION.md)
 
- ### The Training Loop
-
- ```
- Task (natural language) + App URL
-         │
-         ▼
- Policy Model (sees: task + history of all prior actions/results + session_state + findings)
-   │ calls tools to explore and execute
-   ├─► browser_agent(task, url) → filtered API call list (cached during training)
-   ├─► search_endpoints(query) → full schema for a specific endpoint
-   ├─► curl_exec(command) → execute HTTP call, get {status, headers, body}
-   ├─► search_episode_data(q) → search prior response bodies in this episode
-   └─► done(result) → declare task complete
-         │
-         ▼
- Live WebArena App (EC2) ←─── real HTTP responses (always live, never mocked)
-         │
-         ▼
- Judge (compares against ground truth API catalog)
-         │
-         ▼
- Reward Signal ──► GRPO ──► updated policy
- ```
 
- ---
 
- ## Target Applications
-
- All running on a single AWS EC2 instance. Real production software, no simulation.
-
- | App | Port | URL | Software |
- | --- | --- | --- | --- |
- | Shopping | 7770 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/) | Magento 2 open-source e-commerce platform |
- | Shopping Admin | 7780 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7780/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7780/) | Magento 2 Admin backend panel for the same store |
- | Forum | 9999 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:9999/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:9999/) | Postmill open-source Reddit-like link aggregation forum |
- | Wikipedia | 8888 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:8888/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:8888/) | Kiwix read-only offline mirror of English Wikipedia |
- | Map | 3000 | [http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:3000/](http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:3000/) | OpenStreetMap open-source collaborative mapping platform |
-
- Source: [WebArena environment_docker](https://github.com/web-arena-x/webarena/tree/main/environment_docker)
 
 ---
 
@@ -343,170 +215,57 @@ What the model sees at each step:
 class Observation(BaseModel):
     task: str             # Natural language task
     app_base_url: str     # Root URL of the target application
-     last_tool_result: Any # Result of last tool call:
-                           #   search_endpoints → list of endpoint schema strings
-                           #   curl_exec → {status_code, headers, body (smart-truncated)}
-                           #   search_episode_data → list of matching JSON object strings
-     history: list[dict]   # Full episode trajectory: list of {action, tool_result} pairs
-                           # from all prior steps. The model sees what it already tried,
-                           # enabling value threading (read a cart_id from step 2's response
-                           # and use it in step 5's curl call) and loop avoidance.
-     session_state: dict   # Auto-managed by environment: cookies, tokens, CSRF values
-                           # extracted from all prior HTTP Set-Cookie and response bodies
-                           # e.g. {"PHPSESSID": "abc", "form_key": "xyz", "cart_id": "123"}
     step_count: int
-     max_steps: int        # 20
- ```
-
- `session_state` is maintained by the environment. The model never parses `Set-Cookie` headers — the environment extracts tokens automatically and makes them available. The model decides *when* to authenticate and *which* session values to use; the environment handles *extraction*.
-
- **curl execution:** The agent outputs a curl command string. The environment parses it and executes it via subprocess against the live EC2 server — the agent machine never has a direct network connection to WebArena. The environment also injects cookies from `session_state` automatically before each call.
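That execution path can be sketched as follows (helper name is hypothetical; the real environment also normalizes URLs and applies timeouts before running the command via subprocess):

```python
import shlex

def inject_session(curl_cmd: str, session_state: dict) -> list:
    """Parse the agent's curl string and append a Cookie header
    built from the environment-managed session_state."""
    argv = shlex.split(curl_cmd)
    if not argv or argv[0] != "curl":
        raise ValueError("malformed curl command")  # would trigger the -0.1 penalty
    cookies = "; ".join(f"{k}={v}" for k, v in session_state.items())
    if cookies:
        argv += ["-H", f"Cookie: {cookies}"]
    return argv

# Execution against the live server would then be roughly:
# subprocess.run(inject_session(cmd, state), capture_output=True, timeout=30)
```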
-
- **Response truncation — smart array truncation, not byte cutoff:** HTTP response bodies are processed by a pure Python function before being returned to the model. Rules applied in order:
-
- 1. **Non-JSON body** (HTML, CSS, JS, plain text): truncate to 3,000 characters. HTML from form-serving pages (login, post creation) is kept longer than pure prose because CSRF tokens and `<input>` fields are embedded inside the markup and the model needs to locate them. See the [HTML / Form-Submission Handling](#html--form-submission-handling) section below for how the model is expected to work with HTML responses.
- 2. **JSON primitive** (string, number, boolean): never truncated — these are tokens, IDs, confirmations.
- 3. **Error response (4xx / 5xx)**: never truncated — the model needs every word to self-correct.
- 4. **JSON object or array with no large arrays** (< 3 dict items per array): returned as-is.
- 5. **JSON with a large array field** (≥ 3 dict items): keep first 2 items, drop the rest, and add a `_list_truncated` annotation:
-
- ```json
- {
-   "items": [
-     {"sku": "MH01", "name": "Radiant Tee", "price": 22.0},
-     {"sku": "MH02", "name": "Breathe-Easy Tank", "price": 34.0}
-   ],
-   "_list_truncated": {
-     "field": "items",
-     "shown": 2,
-     "total": 50,
-     "note": "Showing 2 of 50 items. Use search_episode_data() to find a specific item from this response."
-   }
- }
- ```
- ```
387
-
388
- **Episode response indexing:** Every `curl_exec` call indexes the full request and response bodies into a per-episode hybrid index (BM25 for keyword matching + GEMMA semantic embeddings for paraphrase handling). When a list is truncated, all items (not just the 2 shown) are indexed. The model can retrieve any specific object using `search_episode_data("keyword or natural language query")` without needing a filtered API endpoint to exist. See `TOOLS.md` for the full indexing algorithm.
389
-
390
- ### Action Space
391
-
392
- The model outputs a single tool call per step. Full technical specifications for all tools (document construction, truncation implementation, index architecture, caveats) are in `[TOOLS.md](./TOOLS.md)`.
393
-
394
-
395
- | Tool | Input | What It Does | Output |
396
- | ---------------------------- | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
397
- | `browser_agent(task, url)` | Task string + app base URL | Checks for pre-recorded HAR; if found, processes it — otherwise launches live browser to perform task and record traffic. Extracts OpenAPI-like spec, builds GEMMA embeddings for search. | Summary list of API endpoint names + methods (e.g. `GET /products`). No schemas/headers. Use `search_endpoints()` for details. |
398
- | `search_endpoints(query)` | Natural language query | Semantic search over GEMMA-embedded endpoint spec built by `browser_agent`. Returns full parameter details for matching endpoints. | Top-3 endpoint schemas (method, path, auth, params with sources, response fields) |
399
- | `curl_exec(command)` | Full curl command string | Executes HTTP call against live EC2 server, indexes full response into episode BM25 store, returns truncated observation. | `{status_code, headers, body}` — body smart-truncated; full body indexed to episode store |
400
- | `search_episode_data(query)` | Keyword or natural language query | Hybrid BM25 + GEMMA semantic search over all request/response bodies from prior `curl_exec` calls in this episode. | Top-5 JSON objects from this episode's request/response history |
401
- | `done(result?)` | Optional result string | Signals task complete, triggers judge evaluation. | Ends episode |
402
-
403
-
404
- `browser_agent` is called **exactly once per episode at step 1**. During training, it loads a cached pre-recorded HAR file(if available); at inference, it will launch a live browser session. It returns the deduplicated list of API endpoint patterns observed in the network traffic. **If called again after step 1, the call executes normally but a −0.3 penalty is applied to the reward.** `search_endpoints` then provides the full schema for any specific endpoint the model wants to call — searching the GEMMA embeddings built by `browser_agent` from the HAR data.
405
-
406
- `curl_exec` is the primary HTTP action — one string that encodes method, URL, headers, and body together, exactly as API documentation is written. This lets the model leverage its pretrained knowledge of `curl` syntax while producing calls that are self-documenting.
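The ranking inside `search_endpoints` amounts to embedding similarity over schema strings. A deterministic toy stand-in (a term-count vector replaces the GEMMA embedding, cosine similarity and the top-3 cutoff match the description):

```python
import math, re

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def embed(text, vocab):
    """Toy stand-in for a GEMMA embedding: term-count vector over a vocab."""
    toks = tokenize(text)
    return [toks.count(w) for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.hypot(*a) * math.hypot(*b)
    return num / den if den else 0.0

def search_endpoints(query, schemas, k=3):
    """Rank endpoint schema strings by similarity to the query, return top-k."""
    vocab = sorted({t for s in schemas for t in tokenize(s)} | set(tokenize(query)))
    qv = embed(query, vocab)
    return sorted(schemas, key=lambda s: -cosine(qv, embed(s, vocab)))[:k]
```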
-
- ```bash
- # Step 1 — Discover which endpoint creates a guest cart
- # (model calls search_endpoints first, sees: POST /rest/V1/guest-carts)
-
- # Step 2 — Create guest cart
- curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'
- # → body: "cart-abc123" (plain string — never truncated)
-
- # Step 3 — Find the product SKU (list response, truncated to 2 items + note)
- curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&searchCriteria[filter_groups][0][filters][0][value]=Radiant+Tee'
- # → body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}],"total_count":1}
- # (1 item — not truncated; if 200 items, all 200 indexed, 2 shown in context)
-
- # Step 4 — Add item (model reads cart-abc123 from step 2, MH01 from step 3 — all in history)
- curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' \
-   -H 'Content-Type: application/json' \
-   -d '{"cartItem":{"sku":"MH01","qty":1,"quote_id":"cart-abc123"}}'
- ```
-
- Values from prior responses (cart IDs, SKUs, tokens) are threaded directly from the growing episode history. `session_state` tokens (cookies, CSRF values) are auto-injected by the environment. If a list response was truncated and the model needs a specific item not shown in the 2-item sample, it calls `search_episode_data("Radiant Tee sku")` — all 200 items are indexed, even though only 2 were shown in context.
-
- ### Prompt Structure
-
 ```
- SYSTEM: You are an API agent. Complete the task using only the tools available:
-   browser_agent, search_endpoints, curl_exec, search_episode_data, done.
-   When a response is HTML, look for JSON data embedded in <script> tags or
-   extract values from <input> fields. CSRF tokens appear as hidden inputs:
-   <input type="hidden" name="_csrf_token" value="XYZ">
-
- TASK: Add "Radiant Tee" to a guest cart at http://ec2-16-59-2-56.../
-
- [session_state: {}]
-
- STEP 1 ACTION: browser_agent("Add Radiant Tee to a guest cart", "http://ec2-...:7770/")
- STEP 1 RESULT: {"app": "shopping", "endpoints": [
-   "POST /rest/V1/guest-carts",
-   "GET /rest/V1/products",
-   "POST /rest/V1/guest-carts/{id}/items",
-   ...
- ], "note": "Use search_endpoints() to get full schema for any of these."}
-
- STEP 2 ACTION: search_endpoints("create guest cart")
- STEP 2 RESULT: ["endpoint: POST /rest/V1/guest-carts | auth: none | returns: string (cartId)", ...]
-
- STEP 3 ACTION: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'")
- STEP 3 RESULT: {status_code: 200, body: "cart-abc123"}
 
- STEP 4 ACTION: search_endpoints("find product by name get sku")
- STEP 4 RESULT: ["endpoint: GET /rest/V1/products | query: searchCriteria filters | returns: .items[].sku .items[].name", ...]
 
- STEP 5 ACTION: curl_exec("curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&searchCriteria[filter_groups][0][filters][0][value]=Radiant+Tee'")
- STEP 5 RESULT: {status_code: 200, body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}],"total_count":1}}
 
- STEP 6 ACTION: search_endpoints("add item to guest cart cartId")
- STEP 6 RESULT: ["endpoint: POST /rest/V1/guest-carts/{cartId}/items | path: cartId from POST /rest/V1/guest-carts | body: cartItem.sku, cartItem.qty, cartItem.quote_id (same as cartId)", ...]
 
- STEP 7 ACTION: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' -H 'Content-Type: application/json' -d '{\"cartItem\":{\"sku\":\"MH01\",\"qty\":1,\"quote_id\":\"cart-abc123\"}}'")
- STEP 7 RESULT: {status_code: 200, body: {"item_id": 5, "sku": "MH01", "qty": 1}}
-
- → generate STEP 8: done("Radiant Tee added to cart")
- ```
 
- `browser_agent` at step 1 gives the model the full endpoint landscape upfront — it can see `/rest/V1/guest-carts` and `/rest/V1/products` immediately and plan the call sequence before making any HTTP calls. `search_endpoints` fills in the exact parameter schemas. Value threading (`"MH01"`, `"cart-abc123"`) happens through the growing history — if step 5 had returned 200 products truncated to 2, the model would call `search_episode_data("Radiant Tee sku")` to retrieve `MH01` from the episode index.
 
- ### Parameter Relationship Graph (What the Judge Knows)
-
- The judge holds a complete dependency map for each task:
-
- ```
- Parameter Source Types:
-   TASK_SPEC — value given directly in the task (e.g., "product #42")
-   PREV_CALL — value from a prior API response in this episode
-   AUTH_FLOW — value obtained during authentication (session token, CSRF key)
-   STATIC    — fixed value known from the application (e.g., store_id = 1)
-   DERIVED   — computed from another value (e.g., cart_id = quote_id)
- ```
-
- For each task, the judge knows which parameters fall into which category, and whether the model correctly sourced each value. This is how partial credit works — the model gets reward for correctly threading a `cart_id` even if the final call had a wrong field elsewhere.
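A sketch of how that partial credit might be computed, assuming the catalog stores an expected source type per parameter and the judge has already traced where each value in the agent's call actually came from (all names are hypothetical):

```python
# Expected sources for one endpoint, as a ground-truth catalog might record them.
EXPECTED = {
    "cartId": "PREV_CALL",
    "cartItem.sku": "PREV_CALL",
    "cartItem.qty": "TASK_SPEC",
    "cartItem.quote_id": "DERIVED",
}

def grade_param_sourcing(traced_sources, expected=EXPECTED, credit=0.25):
    """Award +0.25 per parameter whose value came from the correct source type."""
    correct = [p for p, src in traced_sources.items() if expected.get(p) == src]
    return credit * len(correct), correct
```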
 
 ### Reward Space
 
 **Per-step:**
 
- | Signal | Value | Trigger |
- | --- | --- | --- |
- | Valid API call (2xx) | +0.2 | `curl_exec` returns 2xx status |
- | New path called this episode | +0.1 | `curl_exec` normalized path not called before in this episode — discourages looping on one endpoint |
- | Correct parameter sourcing | +0.25 | judge: value in curl call came from the correct source type |
- | Session value correctly used | +0.1 | auth token/cookie present and correct in curl call |
- | Repeated identical call | −0.15 | exact duplicate curl command issued twice |
- | browser_agent called again | −0.3 | `browser_agent` called after step 1 — call executes normally, penalty applied to reward |
- | Malformed curl command | −0.1 | curl cannot be parsed or executed by the environment |
- | 4xx response (recoverable) | −0.05 | call failed but episode continues |
-
- Note: `search_endpoints`, `search_episode_data`, and `done` carry no direct per-step reward. Using `search_endpoints` to find the correct schema is indirectly rewarded by enabling correct parameter sourcing (+0.25) in the curl call that follows. `search_episode_data` is indirectly rewarded by allowing the model to retrieve the correct value to place in the next curl command.
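The per-step table can be collapsed into a single scoring function over one step's outcome. A sketch (argument names are hypothetical; the values are taken from the table):

```python
def step_reward(status_code, path, seen_paths, command, seen_commands,
                correct_sourcing=0, session_ok=False, malformed=False,
                browser_agent_repeat=False):
    """Combine the per-step signals from the reward table into one scalar."""
    if malformed:
        return -0.1                     # unparseable curl command
    r = 0.0
    if browser_agent_repeat:
        r -= 0.3                        # browser_agent called after step 1
    if command in seen_commands:
        r -= 0.15                       # exact duplicate curl command
    if 200 <= status_code < 300:
        r += 0.2                        # valid API call
        if path not in seen_paths:
            r += 0.1                    # new normalized path this episode
    elif 400 <= status_code < 500:
        r -= 0.05                       # recoverable client error
    r += 0.25 * correct_sourcing        # judge-verified parameter sources
    if session_ok:
        r += 0.1                        # auth token/cookie correctly used
    return r
```

The best possible single step (valid call, new path, one correct source, session used) scores 0.65, which is the per-step term in the ceiling argument below the episode-end table.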
 
 **Episode end:**
 
 | Outcome | Reward |
 | ----------------------------------------------------------- | ------------------------------------------ |
 | Task completed correctly | +2.0 to +5.0 (scales with difficulty tier) |
@@ -514,139 +273,148 @@ Note: `search_endpoints`, `search_episode_data`, and `done` carry no direct per-
514
  | Authentication correctly obtained (even if task fails) | +0.3 |
515
  | Timeout / task failed entirely | −1.5 |
516
 
517
-
518
  Target signal separation: successful episodes `+3` to `+7`, failed episodes `−2` to `−1`. Required for GRPO.
519
 
- > **Reward design insight:** Pure step-level rewards can teach a model to "look busy" — accumulating +0.2 (valid call) and +0.1 (new path) rewards while never converging to task completion. To prevent this, the terminal outcome reward must dominate the sum of all per-step rewards. Two mechanisms enforce this:
- >
- > 1. **Hard ceiling on step rewards per episode.** Maximum achievable per-step reward over 20 steps is bounded: `20 × (0.2 + 0.1 + 0.25 + 0.1) = 13`. But a failed episode still ends at `−1.5`, so any correct episode completion still produces a substantially better total.
- > 2. **Curriculum learning as the primary defense.** Easy tasks (Template 1: single GET, no auth) have a trivially short optimal path (2 steps). There is no room to accumulate "fake" exploration reward when the optimal episode only needs 2 calls. The model learns that the terminal reward is the only thing that matters before it encounters tasks long enough to be gamed. Medium and Hard tiers are introduced only after the model reliably solves Easy — by then the behavior pattern is already anchored. This mirrors how SWE-gym-style environments scale difficulty: start simple enough that the reward signal is unambiguous, then broaden.
- >
- > **Premature `done()` penalty:** If the judge scores the final state as incorrect (task not completed), the episode ends at `−1.5`. There is no bonus for calling `done()` early — it is strictly worse than continuing to make correct API calls. The model only benefits from calling `done()` when the task is actually complete.
-
- **Reset behavior:** `reset()` clears session state, episode history, episode BM25 index, step counter. It does not reset the remote application database. The judge evaluates relative state (did the cart contain the item?), not absolute state (is the DB row count exactly N?).

  ---

- ## HTML / Form-Submission Handling

- Not every endpoint in the target applications returns JSON. The Forum (Postmill) and Wikipedia (Kiwix) applications rely on HTML form submissions and HTML responses respectively. The agent is designed to handle both transparently.

- ### Why This Matters

- A generalizable API agent must work with the full spectrum of web interfaces, not just REST JSON endpoints. Form-based POST submissions (with CSRF tokens, multipart bodies, URL-encoded fields) are ubiquitous in real web applications. Training on them is intentional: the model learns to identify the correct request format from context rather than assuming JSON everywhere.

- ### CSRF Token Extraction

- Postmill protects state-changing routes (login, post creation) with a per-session CSRF token. This token is embedded as a hidden `<input>` field in the HTML form:

- ```html
- <input type="hidden" name="_csrf_token" value="abc123XYZ">
  ```

- **How the model handles this (no dedicated CSRF tool needed):**

- 1. The model issues a GET to the form page (e.g., `GET /login`).
- 2. The environment returns the HTML body, truncated to 3,000 characters (raised from 1,000 specifically to ensure hidden input fields near the end of small forms are included).
- 3. The model reads the `value` attribute of `input[name="_csrf_token"]` directly from the returned HTML string. HTML parsing is not required — the token appears as a predictable plain-text pattern in the markup.
- 4. The model places the extracted token into the subsequent POST body or form field.
- 5. The environment auto-extracts any `Set-Cookie` header from the login response into `session_state`, so subsequent requests are automatically authenticated.

- If the CSRF token is positioned after the 3,000-character cutoff (possible in very large rendered pages), the model can call `search_episode_data("_csrf_token")` — the full HTML body is indexed into the episode store before truncation, making the token retrievable by keyword search.

- ```bash
- # Forum login flow
- curl -X POST 'http://ec2-.../login' \
-   -H 'Content-Type: application/x-www-form-urlencoded' \
-   -d '_csrf_token=abc123XYZ&_username=user&_password=pass'
- # → 302 redirect + Set-Cookie: PHPSESSID=... (auto-injected into session_state)
-
- # Forum post creation
- curl -X POST 'http://ec2-.../f/general/submit' \
-   -H 'Content-Type: application/x-www-form-urlencoded' \
-   -d '_csrf_token=abc123XYZ&title=My+Post&body=Hello+World'
- ```

- ### Wikipedia / HTML-Only Responses

- Kiwix serves static HTML pages — there is no JSON API. The agent treats Wikipedia responses as structured text: search results appear in `<a href>` anchor tags; article content is in `<p>` tags.

- The environment wraps the truncated HTML response in a lightweight JSON envelope before returning it to the model, so the observation format is always `{status_code, headers, body}` regardless of content type:

- ```json
- {
-   "status_code": 200,
-   "headers": {"Content-Type": "text/html"},
-   "body": "<html>...<ul class='mw-search-results'><li><a href='/wiki/Mars'>Mars</a>...</ul>..."
- }
  ```

- For Template 2 ("Retrieve article summary for `{title}`"), task completion is verified by confirming the correct article URL was fetched and returned HTTP 200 — not by parsing article content. This makes the grader robust to HTML structure changes.

- ### Form vs. JSON Detection

- `curl_exec` detects whether a request is form-encoded or JSON by inspecting the `Content-Type` header in the curl command string:

- - `Content-Type: application/json` → body is JSON, response indexed as JSON
- - `Content-Type: application/x-www-form-urlencoded` or `multipart/form-data` → body is form data, response indexed as text
- - No `Content-Type` (GET requests) → response indexed based on `Content-Type` of the response

- The model is responsible for setting the correct `Content-Type` in its curl command. The system prompt includes explicit guidance on when to use each.
 
- ---

- ## Tasks

- HARvestGym trains on **7 task templates** rather than a larger flat task list. Each template is a parameterized scenario: one reward function, one ground truth catalog entry, one grader — but potentially hundreds of distinct episode variations produced by substituting different values for the template slots (`{product_name}`, `{category_name}`, etc.).

- If training goes smoothly, this can be scaled into automatic task creation that covers every variation of a task.

- **How template parameters are populated:** Before training, a one-time data prep step calls the application's own listing APIs and builds a static **parameter pool** for each template (see [parameter_pools.json](parameter_pools.json), refreshed via [scripts/build_parameter_pools.py](scripts/build_parameter_pools.py)):

- | Template slot | Source |
- | ----------------------------- | --------------------------------------------------------------- |
- | `{category_name}` | `GET /rest/V1/categories` — all leaf category names |
- | `{product_name}` | `GET /rest/V1/products?pageSize=200` — all product names + SKUs |
- | `{forum_category}` | Forum's category listing API |
- | `{title}`, `{sku}`, `{price}` | Generated or sampled from existing product names |

- Each episode samples randomly from its pool. The model never sees the pool directly — it gets the task string (e.g., `"Add 'Radiant Tee' to a guest cart"`) and must discover the correct endpoint + SKU through its own API calls.

- ### Complexity Tiers

- Templates are organized into **complexity tiers** for curriculum training — the model only graduates to harder templates once it reliably solves easier ones:

- | Tier   | Characteristic                                | API calls required |
- | ------ | --------------------------------------------- | ------------------ |
- | Easy   | Single call, no auth                          | 1                  |
- | Medium | Auth + 1–2 dependent calls                    | 2–3                |
- | Hard   | Multi-step chain with ID threading, full auth | 4–8+               |

- ### Task Templates

- | #   | Tier   | App            | Template                                               | Key Challenge                                           |
- | --- | ------ | -------------- | ------------------------------------------------------ | ------------------------------------------------------- |
- | 1   | Easy   | Shopping       | List products in category `{category_name}`            | Single GET with query params                            |
- | 2   | Easy   | Wikipedia      | Retrieve article summary for `{title}`                 | Single GET, path parameter resolution                   |
- | 3   | Medium | Shopping       | Add `{product_name}` to a guest cart                   | 2 calls: create cart → add item; ID threading           |
- | 4   | Medium | Forum          | Retrieve all posts in `{forum_category}` (authed)      | Login → extract session → GET                           |
- | 5   | Hard   | Forum          | Create a post titled `{title}` in `{category}`         | Login → extract CSRF `form_key` → POST with full schema |
- | 6   | Hard   | Shopping       | Guest checkout for `{product_name}`                    | 5+ chained calls; cart → item → shipping → payment      |
- | 7   | Hard   | Shopping Admin | Create a new product with SKU `{sku}`, price `{price}` | Admin bearer token → full Magento product schema        |

- Each task has a deterministic programmatic grader (score in `[0.0, 1.0]`):
-
- - **Easy graders**: check HTTP response body for expected values
- - **Medium graders**: probe application state after episode (e.g., fetch the cart, verify item is present)
- - **Hard graders**: verify multi-step state change in the application (e.g., post exists, checkout created)

- **On optional request parameters:** API responses and real network traffic often contain extra headers and parameters (`X-Requested-With`, `Cache-Control`, correlation IDs, etc.) that are not functionally required. The judge scores only on *required* parameters. Extra or missing optional headers or body params do not affect the reward signal.

  ---
  # HARvestGym

+ ### Can a small model learn to reverse-engineer any web application's API and complete real tasks through those APIs, without ever opening a browser?

+ Web applications are full of APIs. Every click in a browser triggers an HTTP call with a precise schema, a specific authentication header, an exact sequence of prerequisites. **HARvestGym trains a small model to do all of that directly**: given a task and a URL, it discovers the relevant endpoints, figures out what each one needs, chains the calls in the right order, and completes the task without any browser.

  The model starts with nothing: no schema, no documentation, no endpoint list. It uses tools to explore — issuing requests, inspecting responses, building up its own understanding of how the application works. This is what a developer does when they reverse-engineer an API. The model learns to do the same.

+ ---

+ ## How It Works

+ ```
+ Task + App URL
+      │
+      ▼
+ Policy Model (RL Agent)
+   small model — no prior knowledge of the app
+
+   Step 1  ──► browser_agent(task, url)   → filtered API endpoint list
+   Step 2+ ──► search_endpoints(query)    → full schema for a specific endpoint
+           ──► curl_exec(command)         → execute HTTP call, get response
+           ──► search_episode_data(query) → search prior response bodies
+           ──► done(result)               → declare task complete
+      │
+      ▼
+ Live WebArena Apps (EC2)  ←── real HTTP responses (always live, never mocked)
+      │
+      ▼
+ Deterministic Judge (compares against ground truth API catalog)
+      │
+      ▼
+ Reward Signal ──► GRPO ──► updated policy
+ ```

+ The agent calls `browser_agent` once at the start — this runs a real browser to complete the same task while recording all network traffic, then returns the filtered list of API endpoints observed. The agent now has a map of what endpoints exist. What it does *not* know:

  - which of those endpoints are actually needed for this specific task
  - in what order they must be called (you cannot add to a cart before the cart exists)
  - where each required parameter value comes from
  - how to re-authenticate if a session expires mid-episode

+ The model must learn to discover all of this on its own.

  ---
  │                                  ▼                                      │
  │ ┌────────────────────────────────────────────────────────────────┐     │
  │ │ Policy Model (RL Agent)                                        │     │
+ │ │ small model — no prior knowledge of the app                    │     │
  │ │                                                                │     │
+ │ │ Observation: task + history + session_state + last_result      │     │
  │ │                                                                │     │
+ │ │ Step 1  ──► browser_agent(task, url)                           │     │
+ │ │ Step 2+ ──► search_endpoints(query)                            │     │
+ │ │         ──► curl_exec(command)                                 │     │
+ │ │         ──► search_episode_data(query)                         │     │
+ │ │         ──► done(result)                                       │     │
  │ └────────┬───────────────────────────────────────────────────────┘     │
  │          │                                                             │
+ │   ┌──────┴──────────────────────────────┐
+ │   │                                     │
+ │   ▼                                     ▼
+ │ ┌─────────────────────┐   ┌─────────────────────────────────────┐
+ │ │ Browser Agent       │   │ Environment                         │
+ │ │ (step 1 only)       │   │                                     │
+ │ │                     │   │ • Executes curl_exec via subprocess │
+ │ │ Training:           │   │ • Auto-injects session cookies      │
+ │ │  Load pre-recorded  │   │ • Smart-truncates response bodies   │
+ │ │  cached HAR from    │   │ • Indexes full responses into       │
+ │ │  disk or launch     │   │   per-episode BM25 + GEMMA store    │
+ │ │  on real browser    │   │ • Manages session_state: cookies,   │
+ │ │                     │   │   CSRF tokens, auth headers         │
+ │ │ Inference:          │   └──────────────┬──────────────────────┘
+ │ │  Launch real browser│                  │
+ │ │  via Playwright +   │                  │ HTTP calls (always live)
+ │ │  bu-30b-a3b-preview │                  ▼
+ │ │                     │   ┌─────────────────────────────────────┐
+ │ │ Both paths produce: │   │ WebArena EC2 (live apps)            │
+ │ │ • Filtered HAR      │   │                                     │
+ │ │ • OpenAPI-like spec │   │ :7770 Shopping (Magento 2)          │
+ │ │ • GEMMA embeddings  │   │ :7780 Shopping Admin                │
+ │ │   for search_       │   │ :9999 Forum (Postmill)              │
+ │ │   endpoints()       │   │ :8888 Wikipedia (Kiwix)             │
+ │ └─────────────────────┘   │ :3000 Map (OpenStreetMap)           │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          │ episode trajectory
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ Deterministic Judge                 │
+ │                           │                                     │
+ │                           │ Per-template programmatic grader:   │
+ │                           │ • Inspects episode trajectory       │
+ │                           │ • Optionally probes live app state  │
+ │                           │ • Verifies parameter sourcing       │
+ │                           │   (TASK_SPEC / PREV_CALL /          │
+ │                           │    AUTH_FLOW / STATIC / DERIVED)    │
+ │                           │ • Scores [0.0 → 1.0]                │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ Reward Signal                       │
+ │                           │                                     │
+ │                           │ Per-step:                           │
+ │                           │  +0.2  valid API call (2xx)         │
+ │                           │  +0.1  new path explored            │
+ │                           │  +0.25 correct param sourcing       │
+ │                           │  −0.15 repeated identical call      │
+ │                           │  −0.3  browser_agent called again   │
+ │                           │                                     │
+ │                           │ Episode end:                        │
+ │                           │  +2.0–+5.0 task complete (easy→hard)│
+ │                           │  −1.5      task failed              │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          ▼
+ │                           ┌─────────────────────────────────────┐
+ │                           │ GRPO (via HF TRL)                   │
+ │                           │                                     │
+ │                           │ 8 parallel rollouts per prompt      │
+ │                           │ Computes advantages without         │
+ │                           │ a value function                    │
+ │                           │ Updates policy weights              │
+ │                           └──────────────┬──────────────────────┘
+ │                                          │
+ │                                          └──► updated Policy Model
  └─────────────────────────────────────────────────────────────────────────┘
  ```

+ ---

 
+ ## Target Applications

+ All running on a single AWS EC2 instance: real production software, no simulation.

+ | App            | Port | Software                                           |
+ | -------------- | ---- | -------------------------------------------------- |
+ | Shopping       | 7770 | Magento 2 — open-source e-commerce platform        |
+ | Shopping Admin | 7780 | Magento 2 Admin — backend panel for the same store |
+ | Forum          | 9999 | Postmill — open-source Reddit-like forum           |
+ | Wikipedia      | 8888 | Kiwix — read-only offline mirror of Wikipedia      |
+ | Map            | 3000 | OpenStreetMap — collaborative mapping platform     |

+ Source: [WebArena environment_docker](https://github.com/web-arena-x/webarena/tree/main/environment_docker)

+ ---

+ ## Tasks

+ HARvestGym trains on **7 task templates** across three complexity tiers. Each template is a parameterized scenario: one reward function, one ground truth catalog entry, one grader — but potentially hundreds of distinct episode variations produced by substituting different values for the template slots (`{product_name}`, `{category_name}`, etc.).

+ ### Complexity Tiers

+ | Tier   | Characteristic                                | API calls required |
+ | ------ | --------------------------------------------- | ------------------ |
+ | Easy   | Single call, no auth                          | 1                  |
+ | Medium | Auth + 1–2 dependent calls                    | 2–3                |
+ | Hard   | Multi-step chain with ID threading, full auth | 4–8+               |

+ The model only graduates to harder templates once it reliably solves easier ones.

+ ### Task Templates

+ | #   | Tier   | App            | Template                                               | Key Challenge                                           |
+ | --- | ------ | -------------- | ------------------------------------------------------ | ------------------------------------------------------- |
+ | 1   | Easy   | Shopping       | List products in category `{category_name}`            | Single GET with query params                            |
+ | 2   | Easy   | Wikipedia      | Retrieve article summary for `{title}`                 | Single GET, path parameter resolution                   |
+ | 3   | Medium | Shopping       | Add `{product_name}` to a guest cart                   | 2 calls: create cart → add item; ID threading           |
+ | 4   | Medium | Forum          | Retrieve all posts in `{forum_category}` (authed)      | Login → extract session → GET                           |
+ | 5   | Hard   | Forum          | Create a post titled `{title}` in `{category}`         | Login → extract CSRF `form_key` → POST with full schema |
+ | 6   | Hard   | Shopping       | Guest checkout for `{product_name}`                    | 5+ chained calls; cart → item → shipping → payment      |
+ | 7   | Hard   | Shopping Admin | Create a new product with SKU `{sku}`, price `{price}` | Admin bearer token → full Magento product schema        |

+ **Template parameters** are populated from a static parameter pool built by querying the live applications before training (see `parameter_pools.json`, refreshed via `scripts/build_parameter_pools.py`). Each episode samples randomly from its pool — the model never sees the pool directly; it must discover the correct values through its own API calls.
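The sampling step can be sketched as follows — the pool shape and the `instantiate` helper are hypothetical, since this README does not show the actual schema of `parameter_pools.json`:

```python
import random

# Hypothetical pool shape — the real parameter_pools.json schema is not shown here.
POOLS = {
    "medium_cart": {
        "product_name": ["Radiant Tee", "Camera Backpack", "Flannel Jacket"],
    },
}

def instantiate(template: str, pool: dict[str, list[str]], rng: random.Random) -> str:
    """Fill each {slot} in a task template with a random value from its pool."""
    return template.format(**{slot: rng.choice(values) for slot, values in pool.items()})
```

For example, `instantiate("Add '{product_name}' to a guest cart", POOLS["medium_cart"], random.Random(0))` yields one concrete episode task string.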

+ Each task has a deterministic programmatic grader (score in `[0.0, 1.0]`):
+ - **Easy graders**: check HTTP response body for expected values
+ - **Medium graders**: probe application state after episode (e.g., fetch the cart, verify item is present)
+ - **Hard graders**: verify multi-step state change in the application (e.g., post exists, checkout created)
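A medium-tier grader can be as small as a membership check over the probed cart state — a sketch with an assumed function name and binary scoring:

```python
def grade_medium_cart(cart_items: list[dict], expected_sku: str) -> float:
    """Score 1.0 if the expected SKU is present in the probed cart, else 0.0.

    Sketch only: the real judge probes the live application after the episode
    and may award partial credit for correct intermediate steps.
    """
    return 1.0 if any(item.get("sku") == expected_sku for item in cart_items) else 0.0
```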
 
  ---

  class Observation(BaseModel):
      task: str              # Natural language task
      app_base_url: str      # Root URL of the target application
+     last_tool_result: Any  # Result of last tool call
+     history: list[dict]    # Full episode trajectory: [{action, tool_result}, ...]
+     session_state: dict    # Auto-managed: cookies, tokens, CSRF values
      step_count: int
+     max_steps: int         # 20
  ```
+ `session_state` is maintained by the environment: the model decides *when* to authenticate and *which* session values to use; the environment handles *extraction* from `Set-Cookie` headers and response bodies.
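The Set-Cookie half of that extraction can be sketched with the standard library — the helper name and state shape are assumptions, not the environment's actual code:

```python
from http.cookies import SimpleCookie

def update_session_state(session_state: dict, response_headers: dict) -> dict:
    """Fold a response's Set-Cookie header into the session state.

    Hypothetical helper: sketches only the cookie half of the extraction;
    harvesting tokens/CSRF values from response bodies is omitted.
    """
    raw = response_headers.get("Set-Cookie")
    if raw:
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            session_state.setdefault("cookies", {})[name] = morsel.value
    return session_state
```

After the forum login's 302 response, a header like `Set-Cookie: PHPSESSID=...` would land under `session_state["cookies"]` and be auto-injected into later curl calls.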

+ **Response truncation** rules, applied in order:
+ 1. Non-JSON body (HTML, CSS): truncated to 3,000 characters
+ 2. JSON primitive (string, number): never truncated — these are tokens, IDs
+ 3. Error response (4xx/5xx): never truncated — the model needs every word to self-correct
+ 4. Small JSON (no large arrays): returned as-is
+ 5. Large JSON array (≥ 3 items): first 2 items shown + `_list_truncated` annotation + hint to call `search_episode_data()`

+ Every `curl_exec` call indexes the *full* response into a per-episode hybrid index (BM25 + GEMMA embeddings) *before* truncation — so all items are always retrievable even when only 2 were shown.
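The five rules above can be sketched as a single function — an assumed implementation, not the environment's actual code:

```python
import json
from typing import Any

def smart_truncate(body: str, content_type: str, status_code: int,
                   max_chars: int = 3000, keep_items: int = 2) -> str:
    """Apply the truncation rules in order (sketch under stated assumptions)."""
    if status_code >= 400:
        return body                      # rule 3: errors pass through whole
    if "json" not in content_type:
        return body[:max_chars]          # rule 1: HTML/CSS character cap
    parsed: Any = json.loads(body)
    if not isinstance(parsed, (dict, list)):
        return body                      # rule 2: primitives are tokens/IDs

    def cut(node: Any) -> Any:           # rules 4-5: only large arrays shrink
        if isinstance(node, list):
            shown = [cut(x) for x in node[:keep_items]]
            return shown + ["_list_truncated"] if len(node) > keep_items else shown
        if isinstance(node, dict):
            return {k: cut(v) for k, v in node.items()}
        return node

    return json.dumps(cut(parsed))
```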

+ ### Action Space

+ The model outputs a single tool call per step.

+ | Tool                         | Input                             | Output                                                                            |
+ | ---------------------------- | --------------------------------- | --------------------------------------------------------------------------------- |
+ | `browser_agent(task, url)`   | Task string + app base URL        | Summary list of API endpoint names + methods (e.g. `GET /products`)               |
+ | `search_endpoints(query)`    | Natural language query            | Top-3 endpoint schemas (method, path, auth, params with sources, response fields) |
+ | `curl_exec(command)`         | Full curl command string          | `{status_code, headers, body}` — body smart-truncated; full body indexed          |
+ | `search_episode_data(query)` | Keyword or natural language query | Top-5 JSON objects from this episode's request/response history                   |
+ | `done(result?)`              | Optional result string            | Ends episode, triggers judge evaluation                                           |

+ `browser_agent` is called **exactly once per episode at step 1**. Calling it again applies a −0.3 penalty. During training, it loads a cached HAR file; at inference, it launches a live browser session.
+ Full technical specifications for all tools: [`TOOLS.md`](./TOOLS.md)

  ### Reward Space

  **Per-step:**

+ | Signal                       | Value | Trigger                                                 |
+ | ---------------------------- | ----- | ------------------------------------------------------- |
+ | Valid API call (2xx)         | +0.2  | `curl_exec` returns 2xx status                          |
+ | New path called this episode | +0.1  | Normalized path not called before — discourages looping |
+ | Correct parameter sourcing   | +0.25 | Judge: value came from the correct source type          |
+ | Session value correctly used | +0.1  | Auth token/cookie present and correct in curl call      |
+ | Repeated identical call      | −0.15 | Exact duplicate curl command issued twice               |
+ | browser_agent called again   | −0.3  | `browser_agent` called after step 1                     |
+ | Malformed curl command       | −0.1  | curl cannot be parsed or executed                       |
+ | 4xx response (recoverable)   | −0.05 | Call failed but episode continues                       |

  **Episode end:**

  | Outcome | Reward |
  | ----------------------------------------------------------- | ------------------------------------------ |
  | Task completed correctly | +2.0 to +5.0 (scales with difficulty tier) |
  | Authentication correctly obtained (even if task fails) | +0.3 |
  | Timeout / task failed entirely | −1.5 |

  Target signal separation: successful episodes `+3` to `+7`, failed episodes `−2` to `−1`. Required for GRPO.

+ > **Reward design note:** Pure step-level rewards can teach a model to "look busy" — accumulating exploration rewards while never completing the task. The terminal outcome reward is designed to dominate the sum of all per-step rewards. The curriculum is the primary defense: Easy tasks have a trivially short optimal path (2 steps), so there's no room to accumulate fake exploration reward before the model learns that the terminal reward is what matters.

  ---

+ ## Key Design Decisions

+ ### Browser Agent as a Discovery Tool

+ The RL agent has access to a **browser agent tool** powered by [`bu-30b-a3b-preview`](https://huggingface.co/browser-use/bu-30b-a3b-preview) — a 30B MoE vision-language model (3B active parameters) served via the [browser-use](https://github.com/browser-use/browser-use) library on Playwright. When called, it completes the task in a real browser while intercepting all network traffic, then returns the filtered API call list.

+ **Training vs. inference:** The browser agent output is pre-computed and cached per task during training: the RL model receives it instantly, and no live browser session runs. At inference, the browser agent runs live to handle novel tasks.

+ Full details: [`BROWSER_AGENT.md`](BROWSER_AGENT.md)

+ ### Ground Truth from the Codebase, Not the Browser
+
+ The browser agent shows *what* API calls happen. It does not explain *why* — where each parameter comes from or what field constraints exist. That comes from a one-time static analysis of each WebArena application's Docker image source, producing a **ground truth API catalog**:

+ ```
+ endpoint: POST /rest/V1/guest-carts/{cartId}/items
+ path_params:
+   cartId: obtained from: POST /rest/V1/guest-carts → response body
+ body:
+   cartItem.sku: the product's SKU, from: GET /rest/V1/products → items[].sku
+   cartItem.qty: quantity, from: task specification
+   cartItem.quote_id: same as cartId
  ```

+ The judge uses this to verify not just *what* the model called, but *where each parameter value came from*. Source types: `TASK_SPEC`, `PREV_CALL`, `AUTH_FLOW`, `STATIC`, `DERIVED`. This is how partial credit works — the model gets reward for correctly threading a `cart_id` even if the final call had a wrong field elsewhere.
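A minimal version of the `PREV_CALL` check — a hypothetical judge fragment, simplified to substring matching over the episode history (the real verification is presumably stricter):

```python
def sourced_from_prev_call(value: str, history: list[dict]) -> bool:
    """Was `value` observed in an earlier tool result this episode?

    Hypothetical helper illustrating PREV_CALL sourcing; the actual judge
    and its matching rules are not shown in this README.
    """
    return any(value in str(step.get("tool_result", "")) for step in history)
```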
+ Full extraction process: [`GROUND_TRUTH_EXTRACTION.md`](GROUND_TRUTH_EXTRACTION.md)

+ ### HTML and Form-Based Applications

+ Not every endpoint returns JSON. The Forum (Postmill) relies on HTML form submissions with CSRF tokens; Wikipedia (Kiwix) serves static HTML pages. The agent handles both:

+ - **CSRF tokens**: The model GETs the form page, reads the `value` attribute of `input[name="_csrf_token"]` from the returned HTML, and places it in the subsequent POST. If the token is beyond the 3,000-character truncation point, it calls `search_episode_data("_csrf_token")` — the full HTML is indexed before truncation.
+ - **HTML-only responses**: Wikipedia responses are returned in the standard `{status_code, headers, body}` envelope. Search results appear in `<a href>` tags; article content in `<p>` tags.

+ ---

+ ## Example Episode

  ```
+ TASK: Add "Radiant Tee" to a guest cart at http://ec2-16-59-2-56.../
+
+ STEP 1: browser_agent("Add Radiant Tee to a guest cart", "http://ec2-...:7770/")
+ → {"endpoints": ["POST /rest/V1/guest-carts", "GET /rest/V1/products",
+                  "POST /rest/V1/guest-carts/{id}/items", ...]}
+
+ STEP 2: search_endpoints("create guest cart")
+ → ["endpoint: POST /rest/V1/guest-carts | auth: none | returns: string (cartId)"]
+
+ STEP 3: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts' -H 'Content-Type: application/json'")
+ → {status_code: 200, body: "cart-abc123"}
+
+ STEP 4: search_endpoints("find product by name get sku")
+ → ["endpoint: GET /rest/V1/products | query: searchCriteria filters | returns: .items[].sku"]
+
+ STEP 5: curl_exec("curl 'http://ec2-.../rest/V1/products?searchCriteria[filter_groups][0][filters][0][field]=name&...[value]=Radiant+Tee'")
+ → {status_code: 200, body: {"items":[{"sku":"MH01","name":"Radiant Tee","price":22.0}]}}
+
+ STEP 6: search_endpoints("add item to guest cart cartId")
+ → ["endpoint: POST /rest/V1/guest-carts/{cartId}/items | path: cartId from POST /rest/V1/guest-carts | body: cartItem.sku, cartItem.qty, cartItem.quote_id"]
+
+ STEP 7: curl_exec("curl -X POST 'http://ec2-.../rest/V1/guest-carts/cart-abc123/items' -H 'Content-Type: application/json' -d '{\"cartItem\":{\"sku\":\"MH01\",\"qty\":1,\"quote_id\":\"cart-abc123\"}}'")
+ → {status_code: 200, body: {"item_id": 5, "sku": "MH01", "qty": 1}}
+
+ STEP 8: done("Radiant Tee added to cart")
+ ```
+ Values from prior responses (`cart-abc123`, `MH01`) are threaded directly from the growing episode history. If step 5 had returned 200 products truncated to 2, the model would call `search_episode_data("Radiant Tee sku")` to retrieve `MH01` from the episode index.
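That retrieval step can be sketched with plain keyword overlap — a simplified stand-in for the episode's real BM25 + GEMMA hybrid index, with an assumed function name:

```python
import re

def search_episode_sketch(query: str, episode_items: list[dict], k: int = 5) -> list[dict]:
    """Rank indexed response objects by keyword overlap with the query.

    Sketch only: the actual environment combines BM25 with GEMMA embeddings;
    this shows just the retrieval shape.
    """
    terms = set(re.findall(r"\w+", query.lower()))

    def overlap(item: dict) -> int:
        return len(terms & set(re.findall(r"\w+", str(item).lower())))

    return sorted(episode_items, key=overlap, reverse=True)[:k]
```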

+ ---

+ ## Setup

+ ### Prerequisites

+ - Docker installed and running
+ - Python 3.11+ with [`uv`](https://github.com/astral-sh/uv)
+ - A Hugging Face token with read access

+ ### Local Development

+ ```bash
+ # Clone and enter the project
+ git clone <your-hf-space-url>
+ cd HARvestGym
+
+ # Install dependencies
+ uv sync
+
+ # Validate the OpenEnv spec
+ openenv validate
+
+ # Build and run the Docker image
+ docker build -t harvgym .
+ docker run -p 8000:8000 harvgym
+
+ # Run the inference script
+ HF_TOKEN=hf_xxx uv run inference.py
+ ```
 
+ ### Environment Variables

+ | Variable       | Default                            | Required | Purpose                    |
+ | -------------- | ---------------------------------- | -------- | -------------------------- |
+ | `HF_TOKEN`     | —                                  | **Yes**  | HuggingFace auth token     |
+ | `API_BASE_URL` | `https://router.huggingface.co/v1` | No       | LLM API endpoint           |
+ | `MODEL_NAME`   | `google/gemma-4-31B-it`            | No       | Model for inference        |
+ | `HARVGYM_TASK` | `har_classify_easy`                | No       | Override which task to run |
 
+ ### API Endpoints

+ ```bash
+ # Reset episode
+ curl -X POST http://localhost:8000/reset
+
+ # Execute a step
+ curl -X POST http://localhost:8000/step \
+   -H "Content-Type: application/json" \
+   -d '{"tool": "browser_agent", "args": {"task": "...", "url": "..."}}'
+
+ # Get current state
+ curl http://localhost:8000/state
+ ```
 
406
  ---
407
 
408
+ ## Baseline Performance
409
+
410
+ Scores generated by running `uv run inference.py` with `google/gemma-4-31B-it` via the HuggingFace Router.
411
+
412
+ | Task | Difficulty | Score | Steps | Result | Notes |
413
+ | ---- | ---------- | ----- | ----- | ------ | ----- |
414
+ | `easy_list_pants` | Easy | **0.74** | 6 | PASS | List products in 'Pants' category |
415
+ | `medium_cart_camera_backpack` | Medium | **0.56** | 20 | PASS | Add Camera Backpack to guest cart |
416
+ | `medium_cart_flannel_jacket` | Medium | **0.60** | 20 | PASS | Add Flannel Jacket to guest cart |
417
+ | `hard_checkout_ripstop_pants` | Hard | **0.22** | 20 | FAIL | Full guest checkout (hit step limit) |
418
+ | **Overall** | — | **0.53** | — | **3/4 passed** | |
419
+
420
+ > **To regenerate:** `HF_TOKEN=hf_xxx uv run inference.py`
hars/forum.har CHANGED
The diff for this file is too large to render. See raw diff
 
hars/shopping.har CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dc116ba8f3cb52e5fe8335dcaf1eefbb88161df4d494f30832338f57bbe52ed9
- size 13392889
+ oid sha256:878c65126d999ef91d6b75438431f7c1b9164ac580140bd7ca61ef693cacd76c
+ size 115555293
hars/shopping_admin.har CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1c9d48fde1cc1f65c0e81ff9a46d1b23fece9c352b1c548de91ca848ee2411f1
3
- size 60961456
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce2209be9f3265b0a1682935171fb932c0056bc67b7517419b3ef5239c2ba2be
3
+ size 148077790
hars/wikipedia.har CHANGED
The diff for this file is too large to render. See raw diff
 
inference.py CHANGED
@@ -29,39 +29,65 @@ Usage:
29
  import asyncio
30
  import json
31
  import os
 
32
  import sys
33
  import textwrap
 
34
  from typing import Any, List, Optional
35
 
36
  from openai import OpenAI
37
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
38
  # ---------------------------------------------------------------------------
39
  # Configuration — auto-detect provider from env vars
40
  # ---------------------------------------------------------------------------
41
 
42
- _OPENROUTER_KEY = os.getenv("OPENROUTER_API_KEY")
43
- _HF_TOKEN = os.getenv("HF_TOKEN")
 
 
 
 
44
 
 
45
  if _OPENROUTER_KEY:
46
- # OpenRouter mode — great for testing with powerful models cheaply
47
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
48
  API_KEY = _OPENROUTER_KEY
49
  MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31b-it")
50
- HF_TOKEN = _HF_TOKEN # still needed for the env server itself
51
  print(f"[INFO] Provider: OpenRouter | Model: {MODEL_NAME}", flush=True)
52
- elif _HF_TOKEN:
53
  # HuggingFace Inference Router — final submission target
54
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
55
- API_KEY = _HF_TOKEN
56
- HF_TOKEN = _HF_TOKEN
57
- MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
58
  print(f"[INFO] Provider: HuggingFace | Model: {MODEL_NAME}", flush=True)
59
- else:
60
- raise ValueError(
61
- "No API key found. Set either:\n"
62
- " OPENROUTER_API_KEY=sk-or-xxx (for OpenRouter testing)\n"
63
- " HF_TOKEN=hf_xxx (for HuggingFace submission)"
64
- )
65
 
66
  # ---------------------------------------------------------------------------
67
  # Tool definitions — proper OpenAI function-calling format.
@@ -79,23 +105,22 @@ TOOLS = [
79
  "function": {
80
  "name": "browser_agent",
81
  "description": (
82
- "Discovers all available API endpoints for the target web application "
83
- "by replaying recorded HTTP traffic (HAR files) and augmenting with a "
84
- "ground-truth API catalog. Returns a structured index of endpoints with "
85
- "methods, paths, and parameter schemas. "
86
- "MUST be called exactly once at step 1 before any other tool. "
87
- "Do NOT call again after step 1."
88
  ),
89
  "parameters": {
90
  "type": "object",
91
  "properties": {
92
  "task": {
93
  "type": "string",
94
- "description": "The natural language task description (e.g. 'Add Radiant Tee to cart')",
95
  },
96
  "url": {
97
  "type": "string",
98
- "description": "Base URL of the target application (e.g. 'http://host:7770/')",
99
  },
100
  },
101
  "required": ["task", "url"],
@@ -109,20 +134,19 @@ TOOLS = [
109
  "function": {
110
  "name": "search_endpoints",
111
  "description": (
112
- "Search the discovered API endpoint catalog using a natural language query. "
113
- "Returns matching endpoint schemas including HTTP method, full path, "
114
- "required/optional parameters, authentication requirements, and example payloads. "
115
- "Use this after browser_agent to find the exact endpoint and payload structure "
116
- "before making a curl_exec call. "
117
- "Examples: 'create guest cart', 'add item to cart', 'set shipping address', "
118
- "'place order', 'get products by category'."
119
  ),
120
  "parameters": {
121
  "type": "object",
122
  "properties": {
123
  "query": {
124
  "type": "string",
125
- "description": "Natural language description of the API operation you need (e.g. 'create guest cart', 'add item to cart')",
 
 
126
  },
127
  },
128
  "required": ["query"],
@@ -136,14 +160,18 @@ TOOLS = [
136
  "function": {
137
  "name": "curl_exec",
138
  "description": (
139
- "Execute an HTTP request against the live application. "
140
- "Returns {status_code, headers, body} with the full API response. "
141
- "Session cookies and auth tokens are automatically injected do NOT "
142
- "manually set Cookie or Authorization headers. "
143
- "Use proper curl syntax with -s (silent) flag. "
144
- "Always include -H 'Content-Type: application/json' for POST/PUT requests. "
145
- "Read the response body carefully it contains IDs (cart_id, item_id, order_id) "
146
- "needed for subsequent steps."
 
 
 
 
147
  ),
148
  "parameters": {
149
  "type": "object",
@@ -151,12 +179,10 @@ TOOLS = [
151
  "command": {
152
  "type": "string",
153
  "description": (
154
- "Full curl command string. Examples:\n"
155
- " GET: curl -s -X GET 'http://host/rest/V1/categories'\n"
156
- " POST: curl -s -X POST 'http://host/rest/V1/guest-carts' -H 'Content-Type: application/json'\n"
157
- " POST with body: curl -s -X POST 'http://host/rest/V1/guest-carts/CART_ID/items' "
158
- "-H 'Content-Type: application/json' "
159
- "-d '{\"cartItem\":{\"sku\":\"MH01-XS-Black\",\"qty\":1,\"quote_id\":\"CART_ID\"}}'"
160
  ),
161
  },
162
  },
@@ -171,18 +197,21 @@ TOOLS = [
171
  "function": {
172
  "name": "search_episode_data",
173
  "description": (
174
- "Search all prior API responses collected during this episode for a specific value. "
175
- "Use when a previous curl_exec response was long/truncated and you need to find "
176
- "a specific item, ID, SKU, or field value from it. "
177
- "Examples: 'cart id from guest-carts response', 'product SKU for Radiant Tee', "
178
- "'category id for Gear'."
 
 
179
  ),
180
  "parameters": {
181
  "type": "object",
182
  "properties": {
183
  "query": {
184
  "type": "string",
185
- "description": "What value you are looking for in the episode's response history (e.g. 'cart id', 'SKU for Radiant Tee')",
 
186
  },
187
  },
188
  "required": ["query"],
@@ -196,58 +225,197 @@ TOOLS = [
196
  "function": {
197
  "name": "done",
198
  "description": (
199
- "Signal that the task is fully complete. Call this ONLY after you have "
200
- "successfully executed all required API calls and verified the outcome "
201
- "(e.g. item was added to cart, order was placed). "
202
- "Do NOT call done() as a fallback or when uncertain — it triggers final scoring."
203
  ),
204
  "parameters": {
205
  "type": "object",
206
  "properties": {
207
  "result": {
208
  "type": "string",
209
- "description": "Optional summary of what was accomplished (e.g. 'Added Radiant Tee to cart CART_ID, item_id=42')",
210
  },
211
  },
212
  "additionalProperties": False,
213
  },
214
- "strict": False, # result is optional
215
  },
216
  },
217
  ]
218
 
219
  BENCHMARK = "harvgym"
220
  MAX_STEPS = 20
221
- TEMPERATURE = 0.2 # Lower temp → more deterministic tool calls
222
- MAX_TOKENS = 1024 # More room for reasoning + JSON
223
  SUCCESS_SCORE_THRESHOLD = 0.5
224
 
225
- # Task definitions: use FIXED task descriptions so the model always knows
226
- # exactly what to do (env.reset() may randomize, but we tell it the target)
227
- TASKS = [
228
- {
229
- "task_name": "har_classify_easy",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
230
  "template_id": 1,
231
- "description": "List products in the 'Gear' category",
232
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
233
  "difficulty": "easy",
234
- },
235
- {
236
- "task_name": "har_classify_medium",
237
- "template_id": 3,
238
- "description": "Add 'Radiant Tee' (SKU: MH01-XS-Black) to a guest cart",
239
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
240
- "difficulty": "medium",
241
- },
242
- {
243
- "task_name": "har_pipeline_hard",
244
- "template_id": 6,
245
- "description": "Complete a full guest checkout for 'Radiant Tee' (SKU: MH01-XS-Black)",
246
- "app_base_url": "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/",
247
- "difficulty": "hard",
248
- },
 
 
 
 
 
 
 
 
 
249
  ]
250
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
251
  # ---------------------------------------------------------------------------
252
  # Logging helpers (hackathon format)
253
  # ---------------------------------------------------------------------------
@@ -279,27 +447,35 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> No
279
  # ---------------------------------------------------------------------------
280
 
281
  SYSTEM_PROMPT = textwrap.dedent("""
282
- You are an API agent completing real-world tasks on a live Magento e-commerce application
283
- by calling HTTP APIs in the correct sequence.
284
 
285
  WORKFLOW:
286
- 1. Call browser_agent (step 1 only) to discover all available API endpoints.
287
- 2. Call search_endpoints to find the exact endpoint schema you need.
288
- 3. Call curl_exec to execute the HTTP request. Read the response it contains IDs for next steps.
289
- 4. Repeat steps 2-3 for each action in the task (create cart add item → set address → place order).
290
- 5. Call done() only after the task is fully accomplished.
291
-
292
- KEY FACTS about Magento REST API (http://host:7770/rest/V1/):
293
- - Guest cart flow: POST /guest-carts → returns cartId string
294
- - Add item: POST /guest-carts/{cartId}/items body: {"cartItem":{"sku":"...","qty":1,"quote_id":"{cartId}"}}
295
- - Shipping: POST /guest-carts/{cartId}/shipping-information
296
- - Place order: PUT /guest-carts/{cartId}/order
297
-
298
- RULES:
299
- - Call browser_agent exactly once at step 1.
300
- - Always call search_endpoints before curl_exec to get the correct path and payload.
301
- - Cart IDs, item IDs, and order IDs come from curl_exec responses read them carefully.
302
- - Do not call done() until the task is verified complete.
 
 
 
 
 
 
 
 
303
  """).strip()
304
 
305
 
@@ -334,12 +510,10 @@ def build_user_prompt(task_desc: str, app_base_url: str, step: int,
334
  """Build the user prompt for each step."""
335
  history_lines = []
336
  if history:
337
- # Show last 8 steps with meaningful result summaries
338
- for h in history[-8:]:
339
  result = h.get("result", {})
340
- # For curl results: show status_code + first 200 chars of body
341
  if isinstance(result, dict) and "status_code" in result:
342
- body_preview = str(result.get("body", ""))[:300]
343
  result_summary = f'status={result["status_code"]} body={body_preview}'
344
  else:
345
  result_summary = str(result)[:300]
@@ -351,18 +525,23 @@ def build_user_prompt(task_desc: str, app_base_url: str, step: int,
351
  session_str = json.dumps(session_state, indent=2)[:500] if session_state else "{}"
352
  last_result_str = _format_result_for_context(last_result)
353
 
 
 
 
 
 
354
  return textwrap.dedent(f"""
355
  TASK: {task_desc}
356
  APP URL: {app_base_url}
357
  STEP: {step}/{MAX_STEPS}
358
 
359
- SESSION STATE (cookies/tokens auto-managed):
360
  {session_str}
361
 
362
  LAST TOOL RESULT:
363
  {last_result_str}
364
 
365
- HISTORY (last {len(history_lines)} steps):
366
  {chr(10).join(history_lines) if history_lines else " (none yet)"}
367
 
368
  What is your next tool call? Output ONLY the JSON object.
@@ -384,51 +563,86 @@ def get_model_action(client: OpenAI, task_desc: str, app_base_url: str,
384
  "X-Title": "HARvestGym",
385
  }
386
 
387
- try:
388
- # Use the OpenAI tools API each tool has name + description + typed params.
389
- # tool_choice="required" forces the model to always call a tool (no free text).
390
- completion = client.chat.completions.create(
391
- model=MODEL_NAME,
392
- messages=[
393
- {"role": "system", "content": SYSTEM_PROMPT},
394
- {"role": "user", "content": user_prompt},
395
- ],
396
- tools=TOOLS,
397
- tool_choice="required",
398
- temperature=TEMPERATURE,
399
- max_tokens=MAX_TOKENS,
400
- stream=False,
401
- extra_headers=extra_headers if extra_headers else None,
402
- )
403
 
404
- choice = completion.choices[0] if completion.choices else None
405
- if choice is None:
406
- print(f"[DEBUG] Empty choices at step {step}", flush=True)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
407
  if step == 1:
408
  return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
409
- return {"tool": "done", "args": {"result": "Empty API response"}}
410
-
411
- # Native tool call response (preferred — gives us structured args directly)
412
- if choice.message.tool_calls:
413
- tc = choice.message.tool_calls[0]
414
- tool_name = tc.function.name
415
- try:
416
- args = json.loads(tc.function.arguments)
417
- except json.JSONDecodeError:
418
- args = {}
419
- print(f"[DEBUG] Tool call: {tool_name}({list(args.keys())})", flush=True)
420
- return {"tool": tool_name, "args": args}
421
 
422
- # Some providers return plain text even with tools (fallback)
423
- text = (choice.message.content or "").strip()
424
- print(f"[DEBUG] No tool_calls in response, trying text parse: {text[:100]}", flush=True)
425
- return _parse_text_fallback(text, step, task_desc, app_base_url)
426
-
427
- except Exception as exc:
428
- print(f"[DEBUG] LLM call failed at step {step}: {exc}", flush=True)
429
- if step == 1:
430
- return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
431
- return {"tool": "done", "args": {"result": f"LLM error: {exc}"}}
432
 
433
 
434
  def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str) -> dict:
@@ -451,9 +665,12 @@ def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str
451
  print(f"[DEBUG] Text fallback failed: {text[:200]}", flush=True)
452
  if step == 1:
453
  return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
454
- if "done" in text.lower():
 
 
455
  return {"tool": "done", "args": {}}
456
- return {"tool": "done", "args": {"result": f"Parse error: {text[:100]}"}}
 
457
 
458
 
459
  # ---------------------------------------------------------------------------
@@ -471,9 +688,20 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
471
  template_id = task_config["template_id"]
472
  task_description = task_config["description"]
473
  app_base_url = task_config["app_base_url"]
474
-
475
- # Pin the template via env var so reset() samples from the right pool
476
- os.environ["HARVGYM_TASK"] = task_name # use name, not int
 
 
 
 
 
 
 
 
 
 
 
477
 
478
  env = HARvestGymEnvironment()
479
 
@@ -489,11 +717,15 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
489
 
490
  try:
491
  obs = env.reset()
492
- # CRITICAL: use the env-sampled task description the judge grades exactly
493
- # what env.reset() returned (random category/product), not our hardcoded string.
494
  task_desc = obs.task or task_description
495
  base_url = obs.app_base_url or app_base_url
496
 
 
 
 
 
497
  for step in range(1, MAX_STEPS + 1):
498
  if getattr(obs, "done", False):
499
  break
@@ -523,6 +755,12 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
523
  last_result = obs.last_tool_result
524
  session_state = dict(obs.session_state or {})
525
 
 
 
 
 
 
 
526
  history.append({
527
  "step": step,
528
  "tool": tool,
@@ -534,6 +772,7 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
534
  reward = -0.1
535
  done = False
536
  error_str = str(exc)[:200]
 
537
 
538
  rewards.append(reward)
539
  steps_taken = step
@@ -546,19 +785,32 @@ async def run_episode(task_config: dict, client: OpenAI) -> dict:
546
  # Reward range by design: terminal success = +2 to +5, terminal fail = -1.5
547
  # Use a generous baseline so partial credit shows up.
548
  total_reward = sum(rewards)
549
- # Normalise to [0,1]: shift by +1.5 (min), divide by max-possible per task
550
- # Template 1 max=2, Template 3 max=3.5, Template 6 max=5 → use 5.0 as ceiling
551
- score = max(0.0, min(1.0, (total_reward + 1.5) / (5.0 + 1.5)))
 
 
 
552
  success = total_reward >= 0.5 # any positive terminal reward = success
553
 
 
 
 
 
554
  except Exception as exc:
555
  error_str = str(exc)[:200]
556
  print(f"[DEBUG] Episode error: {error_str}", flush=True)
557
  finally:
 
 
 
 
558
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
559
 
560
  return {
561
  "task_name": task_name,
 
 
562
  "success": success,
563
  "steps": steps_taken,
564
  "score": score,
@@ -574,21 +826,38 @@ async def main() -> None:
574
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
575
 
576
  results = []
577
- for task_config in TASKS:
 
 
 
 
 
 
578
  result = await run_episode(task_config, client)
579
  results.append(result)
580
-
581
- # Summary
582
- print("\n[SUMMARY]", flush=True)
583
- for r in results:
584
- status = "PASS" if r["success"] else "FAIL"
585
  print(
586
- f" [{status}] {r['task_name']} — score={r['score']:.2f} steps={r['steps']}",
587
  flush=True,
588
  )
589
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
590
  overall_score = sum(r["score"] for r in results) / len(results) if results else 0.0
591
- print(f"\n overall_score={overall_score:.2f}", flush=True)
 
592
 
593
 
594
  if __name__ == "__main__":
 
29
  import asyncio
30
  import json
31
  import os
32
+ import re
33
  import sys
34
  import textwrap
35
+ from pathlib import Path
36
  from typing import Any, List, Optional
37
 
38
  from openai import OpenAI
39
 
40
+ # ---------------------------------------------------------------------------
41
+ # Verbose mode — set VERBOSE=1 for detailed per-step debugging.
42
+ # Keep disabled (default) for hackathon submission to avoid stdout noise.
43
+ # ---------------------------------------------------------------------------
44
+
45
+ VERBOSE = os.getenv("VERBOSE", "0").strip() == "1"
46
+
47
+
48
+ def vprint(*args) -> None:
49
+ """Print only when VERBOSE=1."""
50
+ if VERBOSE:
51
+ print(*args, flush=True)
52
+
53
+
54
+ def vdump(label: str, obj: Any, max_chars: int = 2000) -> None:
55
+ """Pretty-print a labelled object when verbose."""
56
+ if not VERBOSE:
57
+ return
58
+ try:
59
+ text = json.dumps(obj, indent=2)
60
+ except Exception:
61
+ text = str(obj)
62
+ if len(text) > max_chars:
63
+ text = text[:max_chars] + f"\n... [truncated {len(text)-max_chars} chars]"
64
+ print(f"\n{'─'*60}\n[VERBOSE] {label}\n{'─'*60}\n{text}\n", flush=True)
65
+
66
+
67
  # ---------------------------------------------------------------------------
68
  # Configuration — auto-detect provider from env vars
69
  # ---------------------------------------------------------------------------
70
 
71
+ HF_TOKEN = os.getenv("HF_TOKEN")
72
+ if not HF_TOKEN:
73
+ raise ValueError(
74
+ "HF_TOKEN is required but not set.\n"
75
+ "Usage: HF_TOKEN=hf_xxx uv run inference.py"
76
+ )
77
 
78
+ _OPENROUTER_KEY = os.getenv("OPENROUTER_API_KEY")
79
  if _OPENROUTER_KEY:
80
+ # OpenRouter mode — useful for local testing with alternative models
81
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
82
  API_KEY = _OPENROUTER_KEY
83
  MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31b-it")
 
84
  print(f"[INFO] Provider: OpenRouter | Model: {MODEL_NAME}", flush=True)
85
+ else:
86
  # HuggingFace Inference Router — final submission target
87
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
88
+ API_KEY = HF_TOKEN
89
+ MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31B-it")
 
90
  print(f"[INFO] Provider: HuggingFace | Model: {MODEL_NAME}", flush=True)
 
 
 
 
 
 
91
 
92
  # ---------------------------------------------------------------------------
93
  # Tool definitions — proper OpenAI function-calling format.
 
105
  "function": {
106
  "name": "browser_agent",
107
  "description": (
108
+ "Discovers API endpoints available on the target web application by "
109
+ "replaying real browser traffic recorded in HAR files. Returns a "
110
+ "structured index of observed endpoints with HTTP methods, paths, "
111
+ "request/response schemas, and headers (including any auth headers seen). "
112
+ "Call this ONCE at step 1 to build the endpoint index. Do not call again."
 
113
  ),
114
  "parameters": {
115
  "type": "object",
116
  "properties": {
117
  "task": {
118
  "type": "string",
119
+ "description": "The task you need to accomplish (used to prioritise relevant endpoints)",
120
  },
121
  "url": {
122
  "type": "string",
123
+ "description": "Base URL of the target application",
124
  },
125
  },
126
  "required": ["task", "url"],
 
134
  "function": {
135
  "name": "search_endpoints",
136
  "description": (
137
+ "Semantic search over the endpoints and it's details found by the browser_agent. "
138
+ "Returns matching endpoint schemas: HTTP method, full path, required parameters, "
139
+ "authentication requirements (bearer token, cookie, etc.), and example payloads. "
140
+ "Use this before every curl_exec call to confirm the correct endpoint shape. "
 
 
 
141
  ),
142
  "parameters": {
143
  "type": "object",
144
  "properties": {
145
  "query": {
146
  "type": "string",
147
+ "description": "Natural language description of the operation you need "
148
+ "(e.g. 'authenticate user', 'list products in category', "
149
+ "'add item to cart', 'place order')",
150
  },
151
  },
152
  "required": ["query"],
 
160
  "function": {
161
  "name": "curl_exec",
162
  "description": (
163
+ "Execute an HTTP request against the live application and return the response. "
164
+ "Response contains: status_code, headers, body. "
165
+ "For HTML pages, body is a structured summary: page title, forms with action URLs "
166
+ "and field values (product IDs, form_key, etc.), and visible text. "
167
+ "IMPORTANT: When the body shows '[Forms — N found]' with POST actions containing "
168
+ "'/checkout/cart/add/...', the 'product' field IS the product ID and the action "
169
+ "URL IS the add-to-cart endpointuse these directly without calling "
170
+ "search_episode_data again. "
171
+ "Session state (cookies, auth tokens) is automatically managed — previously "
172
+ "obtained tokens are injected into subsequent requests automatically. "
173
+ "If the response is truncated or you need a value from an earlier response, "
174
+ "use search_episode_data."
175
  ),
176
  "parameters": {
177
  "type": "object",
 
179
  "command": {
180
  "type": "string",
181
  "description": (
182
+ "Full curl command string (use -s for silent mode). "
183
+ "Include -H 'Content-Type: application/json' for POST/PUT/PATCH. "
184
+ "Example: curl -s -X POST 'http://host/api/endpoint' "
185
+ "-H 'Content-Type: application/json' -d '{\"key\":\"value\"}'"
 
 
186
  ),
187
  },
188
  },
 
197
  "function": {
198
  "name": "search_episode_data",
199
  "description": (
200
+ "Semantic search over all API responses collected during this episode. "
201
+ "Full response bodies are stored untruncated this tool finds the right "
202
+ "response and returns a compact preview with a note showing the total "
203
+ "number of matching objects (e.g. '47 items total showing first 3'). "
204
+ "Use more specific queries to drill into a particular value. "
205
+ "Examples: 'id for category Gear', 'SKU for Radiant Tee', "
206
+ "'cart id', 'authentication token', 'order id after checkout'."
207
  ),
208
  "parameters": {
209
  "type": "object",
210
  "properties": {
211
  "query": {
212
  "type": "string",
213
+ "description": "What you are looking for in the response history of the curl commands you executed "
214
+ "(e.g. 'category id for Pants', 'cart id', 'token')",
215
  },
216
  },
217
  "required": ["query"],
 
225
  "function": {
226
  "name": "done",
227
  "description": (
228
+ "Signal that the task is complete and trigger final scoring. "
229
+ "Call this immediately after the response that fulfills the task objective. "
230
+ "Do not make further API calls once the goal is met — call done() next."
 
231
  ),
232
  "parameters": {
233
  "type": "object",
234
  "properties": {
235
  "result": {
236
  "type": "string",
237
+ "description": "Brief summary of what was accomplished",
238
  },
239
  },
240
  "additionalProperties": False,
241
  },
242
+ "strict": False,
243
  },
244
  },
245
  ]
246
 
247
  BENCHMARK = "harvgym"
248
  MAX_STEPS = 20
249
+ TEMPERATURE = 0.2
250
+ MAX_TOKENS = 64000
251
  SUCCESS_SCORE_THRESHOLD = 0.5
252
 
253
+ # ---------------------------------------------------------------------------
254
+ # Task bank: 5 easy (T1: list products), 5 medium (T3: add to cart),
255
+ # 5 hard (T6: guest checkout).
256
+ #
257
+ # For hackathon submission only the first easy/medium/hard is run.
258
+ # Full evaluation runs all 15 sequentially to measure generalisation.
259
+ # ---------------------------------------------------------------------------
260
+ _SHOP = "http://ec2-16-59-2-56.us-east-2.compute.amazonaws.com:7770/"
261
+
262
+ def _load_parameter_pools_for_tasks() -> dict:
263
+ pools_path = Path(__file__).with_name("parameter_pools.json")
264
+ with open(pools_path) as f:
265
+ return json.load(f)
266
+
267
+
268
+ _TASK_PARAMETER_POOLS = _load_parameter_pools_for_tasks()
269
+
270
+
271
+ def _lookup_category_params(category_name: str) -> dict:
272
+ categories = _TASK_PARAMETER_POOLS.get("template_1", {}).get("pool", {}).get("category_name", [])
273
+ for item in categories:
274
+ if item.get("name") == category_name:
275
+ return {
276
+ "category_name": item["name"],
277
+ "category_id": item.get("category_id"),
278
+ }
279
+ raise ValueError(f"Unknown category in parameter_pools.json: {category_name}")
280
+
281
+
282
+ def _lookup_product_params(product_name: str, template_id: int) -> dict:
283
+ products = _TASK_PARAMETER_POOLS.get(f"template_{template_id}", {}).get("pool", {}).get("product_name", [])
284
+ for item in products:
285
+ if item.get("name") == product_name:
286
+ return {
287
+ "product_name": item["name"],
288
+ "sku": item.get("sku", ""),
289
+ "product_id": item.get("product_id"),
290
+ }
291
+ raise ValueError(
292
+ f"Unknown product in parameter_pools.json for template {template_id}: {product_name}"
293
+ )
294
+
295
+
296
+ def _make_easy_task(task_name: str, category_name: str) -> dict:
297
+ return {
298
+ "task_name": task_name,
299
  "template_id": 1,
 
 
300
  "difficulty": "easy",
301
+ "description": f"List products in the '{category_name}' category",
302
+ "app_base_url": _SHOP,
303
+ "task_params": _lookup_category_params(category_name),
304
+ }
305
+
306
+
307
+ def _make_product_task(task_name: str, template_id: int, difficulty: str,
308
+ description: str, product_name: str) -> dict:
309
+ return {
310
+ "task_name": task_name,
311
+ "template_id": template_id,
312
+ "difficulty": difficulty,
313
+ "description": description,
314
+ "app_base_url": _SHOP,
315
+ "task_params": _lookup_product_params(product_name, template_id),
316
+ }
317
+
318
+
319
+ TASKS_EASY = [
320
+ _make_easy_task("easy_list_pants", "Pants"),
321
+ _make_easy_task("easy_list_bags", "Bags"),
322
+ _make_easy_task("easy_list_jackets", "Jackets"),
323
+ _make_easy_task("easy_list_hoodies", "Hoodies"),
324
+ _make_easy_task("easy_list_shoes", "Shoes"),
325
  ]
326
 
327
+ TASKS_MEDIUM = [
328
+ _make_product_task(
329
+ "medium_cart_camera_backpack",
330
+ 3,
331
+ "medium",
332
+ "Add 'Camera Backpack Bagsmar DSLR Waterproof' to a guest cart",
333
+ "Camera Backpack Bagsmar DSLR Waterproof",
334
+ ),
335
+ _make_product_task(
336
+ "medium_cart_flannel_jacket",
337
+ 3,
338
+ "medium",
339
+ "Add 'Noldares Flannel Jacket For Men Plaid' to a guest cart",
340
+ "Noldares Flannel Jacket For Men Plaid",
341
+ ),
342
+ _make_product_task(
343
+ "medium_cart_champion_hoodie",
344
+ 3,
345
+ "medium",
346
+ "Add 'Champion Hoodie Big And Tall Zip Up' to a guest cart",
347
+ "Champion Hoodie Big And Tall Zip Up",
348
+ ),
349
+ _make_product_task(
350
+ "medium_cart_cargo_pants",
351
+ 3,
352
+ "medium",
353
+ "Add 'Mens Slim Fit Cargo Pants Athletic' to a guest cart",
354
+ "Mens Slim Fit Cargo Pants Athletic",
355
+ ),
356
+ _make_product_task(
357
+ "medium_cart_leather_jacket",
358
+ 3,
359
+ "medium",
360
+ "Add 'Inesver Womens Leather Jacket Open Front' to a guest cart",
361
+ "Inesver Womens Leather Jacket Open Front",
362
+ ),
363
+ ]
364
+
365
+ TASKS_HARD = [
366
+ _make_product_task(
367
+ "hard_checkout_ripstop_pants",
368
+ 6,
369
+ "hard",
370
+ "Complete a full guest checkout for 'Mens Ripstop Cargo Pants Tactical Hiking'",
371
+ "Mens Ripstop Cargo Pants Tactical Hiking",
372
+ ),
373
+ _make_product_task(
374
+ "hard_checkout_flannel_jacket",
375
+ 6,
376
+ "hard",
377
+ "Complete a full guest checkout for 'Noldares Flannel Jacket For Men Plaid'",
378
+ "Noldares Flannel Jacket For Men Plaid",
379
+ ),
380
+ _make_product_task(
381
+ "hard_checkout_champion_hoodie",
382
+ 6,
383
+ "hard",
384
+ "Complete a full guest checkout for 'Champion Hoodie Big And Tall Zip Up'",
385
+ "Champion Hoodie Big And Tall Zip Up",
386
+ ),
387
+ _make_product_task(
388
+ "hard_checkout_fleece_jacket",
389
+ 6,
390
+ "hard",
391
+ "Complete a full guest checkout for 'Womens Fleece Jacket With Hood Winter'",
392
+ "Womens Fleece Jacket With Hood Winter",
393
+ ),
394
+ _make_product_task(
395
+ "hard_checkout_totes_boots",
396
+ 6,
397
+ "hard",
398
+ "Complete a full guest checkout for 'Totes Womens Cold Weather Boots Nicole'",
399
+ "Totes Womens Cold Weather Boots Nicole",
400
+ ),
401
+ ]
402
+
403
+ # Default: first easy, first two medium, and first hard task (hackathon submission format)
404
+ TASKS = [TASKS_EASY[0], TASKS_MEDIUM[0], TASKS_MEDIUM[1], TASKS_HARD[0]]
405
+
406
+ # Set EVAL_MODE=full to run all 15; EVAL_MODE=easy/medium/hard to run only that tier; EVAL_MODE=one for a single task
407
+ _EVAL_MODE = os.getenv("EVAL_MODE", "").strip().lower()
408
+ if _EVAL_MODE == "full":
409
+ TASKS = TASKS_EASY + TASKS_MEDIUM + TASKS_HARD
410
+ elif _EVAL_MODE == "easy":
411
+ TASKS = TASKS_EASY
412
+ elif _EVAL_MODE == "one":
413
+ TASKS = [TASKS_MEDIUM[1]]
414
+ elif _EVAL_MODE == "medium":
415
+ TASKS = TASKS_MEDIUM
416
+ elif _EVAL_MODE == "hard":
417
+ TASKS = TASKS_HARD
418
+
419
  # ---------------------------------------------------------------------------
420
  # Logging helpers (hackathon format)
421
  # ---------------------------------------------------------------------------
 
447
  # ---------------------------------------------------------------------------
448
 
449
  SYSTEM_PROMPT = textwrap.dedent("""
450
+ You are an API agent. Your goal is to complete a real-world task on a live web application
451
+ by calling its HTTP APIs in the correct order using the tools provided.
452
 
453
  WORKFLOW:
454
+ 1. Call browser_agent once at step 1 to build an index of the application's endpoints.
455
+ 2. Use search_endpoints before each API call to find the correct path, method, and required parameters.
456
+ 3. Execute HTTP requests with curl_exec in the correct dependency order. Read every response
457
+ carefully: IDs, tokens, and error messages in responses are required inputs for (or
458
+ corrective signals for) subsequent calls.
459
+ 4. If a prior response contains a value you need now, use search_episode_data to retrieve it.
460
+ 5. Call done() as soon as the task objective is met.
461
+
462
+ PRINCIPLES:
463
+ - Always discover before you act: browser_agent first, then search_endpoints.
464
+ - Extract every ID, token, and key from API responses and use them in subsequent calls.
465
+ - If a request returns an auth error, find and call the auth endpoint first, then retry.
466
+ - Never fabricate IDs or values — they must come from actual API responses.
467
+ - Once the task is done, call done() immediately — do not make additional calls.
468
+ - Some tasks require a sequence of dependent API calls where the output of one call
469
+ (an ID, token, or key) is the required input to the next. Identify these dependencies
470
+ before acting: plan the call sequence, then execute step by step.
471
+ - Never call the same endpoint repeatedly hoping for a different result. If a call already
472
+ succeeded, move on to the next step. Repeating the same call wastes steps and incurs a
473
+ penalty.
474
+ - Do not brute-force or vary parameters at random. If a call fails, read the error message
475
+ in LAST TOOL RESULT, diagnose the cause logically, and use that understanding to form the
476
+ correct next request.
477
+ - If you are partway through a multi-step task and a required ID or token is missing, use
478
+ search_episode_data to retrieve it from an earlier response before making a new call.
479
  """).strip()

  """Build the user prompt for each step."""
  history_lines = []
  if history:
+     for h in history:
          result = h.get("result", {})
          if isinstance(result, dict) and "status_code" in result:
+             body_preview = str(result.get("body", ""))[:800]
              result_summary = f'status={result["status_code"]} body={body_preview}'
          else:
              result_summary = str(result)[:300]
 
  session_str = json.dumps(session_state, indent=2)[:500] if session_state else "{}"
  last_result_str = _format_result_for_context(last_result)

+ # Highlight form_key if available — it's needed for HTML form POSTs
+ form_key_hint = ""
+ if session_state.get("form_key"):
+     form_key_hint = f"\nFORM_KEY (auto-extracted, use in POST body): {session_state['form_key']}"
+
  return textwrap.dedent(f"""
      TASK: {task_desc}
      APP URL: {app_base_url}
      STEP: {step}/{MAX_STEPS}

+     SESSION STATE (cookies/tokens auto-managed):{form_key_hint}
      {session_str}

      LAST TOOL RESULT:
      {last_result_str}

+     HISTORY (all {len(history_lines)} steps so far):
      {chr(10).join(history_lines) if history_lines else " (none yet)"}

      What is your next tool call? Output ONLY the JSON object.
 
      "X-Title": "HARvestGym",
  }

+ vprint(f"\n{'═'*60}")
+ vprint(f"[VERBOSE] === LLM CALL — step {step} ===")
+ vdump("SYSTEM PROMPT", SYSTEM_PROMPT)
+ vdump("USER PROMPT", user_prompt)

 
+ # Retry loop: backs off on 429 rate limits; never calls done() on a transient error.
+ _MAX_RETRIES = 3
+ _BASE_DELAY = 3  # seconds before first retry
+ for _attempt in range(_MAX_RETRIES):
+     try:
+         completion = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": user_prompt},
+             ],
+             tools=TOOLS,
+             tool_choice="required",
+             temperature=TEMPERATURE,
+             max_tokens=MAX_TOKENS,
+             stream=False,
+             extra_headers=extra_headers if extra_headers else None,
+         )
+
+         choice = completion.choices[0] if completion.choices else None
+
+         vdump(f"RAW COMPLETION (step {step}, attempt {_attempt+1})", {
+             "finish_reason": choice.finish_reason if choice else None,
+             "usage": dict(completion.usage) if hasattr(completion, "usage") and completion.usage else None,
+             "message_content": choice.message.content if choice else None,
+             "tool_calls_count": len(choice.message.tool_calls or []) if choice else 0,
+         })
+
+         # Detect a null/empty completion (upstream rate limit without a 429 status)
+         if choice is None or (
+             choice.finish_reason is None
+             and not (choice.message.tool_calls or (choice.message.content or "").strip())
+         ):
+             wait = _BASE_DELAY * (2 ** _attempt)
+             print(f"[DEBUG] Null completion at step {step} (attempt {_attempt+1}/{_MAX_RETRIES}) — waiting {wait}s", flush=True)
+             import time; time.sleep(wait)
+             continue  # retry
+
+         # Native tool call (preferred)
+         if choice.message.tool_calls:
+             tc = choice.message.tool_calls[0]
+             tool_name = tc.function.name
+             try:
+                 args = json.loads(tc.function.arguments)
+             except json.JSONDecodeError:
+                 args = {}
+             print(f"[DEBUG] Tool call: {tool_name}({list(args.keys())})", flush=True)
+             vdump(f"TOOL CALL ARGS — {tool_name}", args)
+             return {"tool": tool_name, "args": args}
+
+         # Plain-text fallback (some providers ignore tool_choice="required")
+         text = (choice.message.content or "").strip()
+         print(f"[DEBUG] No tool_calls in response, trying text parse: {text[:100]}", flush=True)
+         vprint(f"[VERBOSE] Full text response: {text}")
+         return _parse_text_fallback(text, step, task_desc, app_base_url)
+
+     except Exception as exc:
+         exc_str = str(exc)
+         is_rate_limit = "429" in exc_str or "rate" in exc_str.lower()
+         if is_rate_limit and _attempt < _MAX_RETRIES - 1:
+             wait = _BASE_DELAY * (2 ** _attempt)
+             print(f"[DEBUG] Rate-limited at step {step} (attempt {_attempt+1}/{_MAX_RETRIES}) — waiting {wait}s then retrying", flush=True)
+             import time; time.sleep(wait)
+             continue  # retry
+         # Non-rate-limit error or exhausted retries — don't call done(), keep the episode alive
+         print(f"[DEBUG] LLM call failed at step {step} (attempt {_attempt+1}): {exc}", flush=True)
          if step == 1:
              return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+         return {"tool": "search_endpoints", "args": {"query": "available API endpoints"}}

+ # Exhausted all retries — nudge forward without ending the episode
+ print(f"[DEBUG] All {_MAX_RETRIES} retries exhausted at step {step} — nudging with search_endpoints", flush=True)
+ if step == 1:
+     return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+ return {"tool": "search_endpoints", "args": {"query": "available API endpoints"}}
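As an aside (not part of the commit): the retry schedule above, with a 3-second base delay doubling per attempt, yields waits of 3 s, 6 s, and 12 s across the three attempts. A quick check of that arithmetic:

```python
_MAX_RETRIES = 3
_BASE_DELAY = 3  # seconds before the first retry

# Exponential backoff: the delay doubles on each successive attempt.
waits = [_BASE_DELAY * (2 ** attempt) for attempt in range(_MAX_RETRIES)]
print(waits)  # [3, 6, 12]
```

Worst case the loop therefore spends 21 seconds sleeping before falling through to the search_endpoints nudge.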
 
 
 
 
 


  def _parse_text_fallback(text: str, step: int, task_desc: str, app_base_url: str) -> dict:
 
      print(f"[DEBUG] Text fallback failed: {text[:200]}", flush=True)
      if step == 1:
          return {"tool": "browser_agent", "args": {"task": task_desc, "url": app_base_url}}
+     # If the model explicitly says done, honour it — but only when the text clearly indicates it.
+     # A bare parse error should NEVER call done(), because that would trigger the judge early.
+     if re.search(r"\bdone\b", text.lower()) and len(text.strip()) < 80:
          return {"tool": "done", "args": {}}
+     # Keep the episode alive — nudge the model rather than punishing with a premature judge call.
+     return {"tool": "search_endpoints", "args": {"query": "available REST API endpoints"}}
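Editor's illustration (not commit code): the guarded done() detection above only fires on short, explicit replies, so a rambling response that merely mentions the word cannot end the episode. The same check in isolation:

```python
import re

def looks_like_done(text: str) -> bool:
    """True only for short responses that explicitly say 'done';
    a long reply containing the word does not qualify."""
    return bool(re.search(r"\bdone\b", text.lower())) and len(text.strip()) < 80

print(looks_like_done("Done."))                    # True
print(looks_like_done("I have not finished yet"))  # False
print(looks_like_done("done " * 30))               # False (too long)
```

The 80-character cutoff is the commit's heuristic line between "terse confirmation" and "explanatory text that happens to contain the word".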


  # ---------------------------------------------------------------------------

  template_id = task_config["template_id"]
  task_description = task_config["description"]
  app_base_url = task_config["app_base_url"]
+ task_params = dict(task_config.get("task_params") or {})
+
+ # Pin the exact task so env.reset() uses the intended category/product instead
+ # of sampling a random item from the template pool.
+ os.environ["HARVGYM_TASK"] = str(template_id)
+ os.environ["HARVGYM_TASK_SPEC_JSON"] = json.dumps(
+     {
+         "template_id": template_id,
+         "description": task_description,
+         "params": task_params,
+         "base_url": app_base_url,
+         "difficulty": task_config.get("difficulty", ""),
+     }
+ )

  env = HARvestGymEnvironment()

  try:
      obs = env.reset()
+     # Use the env-provided task description, which now matches the exact task
+     # spec passed in above.
      task_desc = obs.task or task_description
      base_url = obs.app_base_url or app_base_url

+     vprint(f"\n{'═'*60}")
+     vprint(f"[VERBOSE] EPISODE START — {task_name}")
+     vdump("INITIAL OBSERVATION (from env.reset)", obs.__dict__ if hasattr(obs, "__dict__") else str(obs))
+
      for step in range(1, MAX_STEPS + 1):
          if getattr(obs, "done", False):
              break
 
          last_result = obs.last_tool_result
          session_state = dict(obs.session_state or {})

+         vprint(f"\n[VERBOSE] ── step {step} result ──")
+         vdump(f"TOOL RESULT — {tool}", last_result)
+         vprint(f"[VERBOSE] reward={reward:.3f} done={done}")
+         if done:
+             vdump("FINAL OBS (done=True)", obs.__dict__ if hasattr(obs, "__dict__") else str(obs))
+
          history.append({
              "step": step,
              "tool": tool,
 
          reward = -0.1
          done = False
          error_str = str(exc)[:200]
+         vprint(f"[VERBOSE] Step {step} EXCEPTION: {exc}")

          rewards.append(reward)
          steps_taken = step
 
      # Reward range by design: terminal success = +2 to +5, terminal fail = -1.5
      # Use a generous baseline so partial credit shows up.
      total_reward = sum(rewards)
+     # Score: normalize to [0, 1] using the per-template terminal-reward ceiling.
+     # Template 1 (easy) max=2.0, Template 3 (medium) max=3.5, Template 6 (hard) max=5.0.
+     # Shift by +1.5 so that the fail reward (-1.5) maps to 0 and the max maps to 1.
+     _TEMPLATE_REWARD_CEIL = {1: 2.0, 3: 3.5, 6: 5.0}
+     _reward_ceil = _TEMPLATE_REWARD_CEIL.get(task_config.get("template_id"), 5.0)
+     score = max(0.0, min(1.0, (total_reward + 1.5) / (_reward_ceil + 1.5)))
      success = total_reward >= 0.5  # any positive terminal reward = success

+     vprint(f"\n[VERBOSE] ── episode end — {task_name} ──")
+     vprint(f"[VERBOSE] total_reward={total_reward:.3f} score={score:.3f} success={success}")
+     vprint(f"[VERBOSE] rewards per step: {[f'{r:.2f}' for r in rewards]}")
+
  except Exception as exc:
      error_str = str(exc)[:200]
      print(f"[DEBUG] Episode error: {error_str}", flush=True)
  finally:
+     try:
+         env.close()
+     except Exception as e:
+         print(f"[DEBUG] env.close() error: {e}", flush=True)
      log_end(success=success, steps=steps_taken, score=score, rewards=rewards)

  return {
      "task_name": task_name,
+     "difficulty": task_config.get("difficulty", "unknown"),
+     "description": task_config.get("description", ""),
      "success": success,
      "steps": steps_taken,
      "score": score,
 
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

  results = []
+ for i, task_config in enumerate(TASKS, 1):
+     difficulty = task_config.get("difficulty", "")
+     desc = task_config.get("description", "")
+     print(
+         f"\n{'='*70}\n[TASK {i}/{len(TASKS)}] ({difficulty.upper()}) {desc}\n{'='*70}",
+         flush=True,
+     )
      result = await run_episode(task_config, client)
      results.append(result)
+     status = "PASS" if result["success"] else "FAIL"
      print(
+         f"  [{status}] score={result['score']:.2f} steps={result['steps']}",
          flush=True,
      )

+ # Summary grouped by difficulty tier
+ print("\n" + "="*70, flush=True)
+ print("[SUMMARY]", flush=True)
+ for tier in ["easy", "medium", "hard"]:
+     tier_results = [r for r in results if r.get("difficulty") == tier]
+     if not tier_results:
+         continue
+     avg = sum(r["score"] for r in tier_results) / len(tier_results)
+     passes = sum(1 for r in tier_results if r["success"])
+     print(f"\n  {tier.upper()} ({passes}/{len(tier_results)} passed, avg score={avg:.2f}):", flush=True)
+     for r in tier_results:
+         status = "PASS" if r["success"] else "FAIL"
+         print(f"    [{status}] {r['task_name']} — score={r['score']:.2f} steps={r['steps']}", flush=True)
+
  overall_score = sum(r["score"] for r in results) / len(results) if results else 0.0
+ print(f"\n  OVERALL score={overall_score:.2f} ({sum(1 for r in results if r['success'])}/{len(results)} passed)",
+       flush=True)


  if __name__ == "__main__":
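Aside from the reviewer (not part of the commit): the per-template score normalization introduced in run_episode maps the fail reward (-1.5) to 0.0 and each template's terminal ceiling to 1.0. A standalone check of that arithmetic:

```python
_TEMPLATE_REWARD_CEIL = {1: 2.0, 3: 3.5, 6: 5.0}

def normalize_score(total_reward: float, template_id: int) -> float:
    """Shift by +1.5 so the fail reward (-1.5) maps to 0.0, then divide by
    the shifted per-template ceiling and clamp the result into [0, 1]."""
    ceil = _TEMPLATE_REWARD_CEIL.get(template_id, 5.0)
    return max(0.0, min(1.0, (total_reward + 1.5) / (ceil + 1.5)))

print(normalize_score(-1.5, 3))  # 0.0  (terminal failure)
print(normalize_score(3.5, 3))   # 1.0  (medium-template ceiling)
print(normalize_score(2.0, 1))   # 1.0  (easy-template ceiling)
```

Because the divisor is per-template, a perfect easy run and a perfect hard run both score 1.0 rather than the hard template dominating the average.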
openenv_harvestgym.egg-info/PKG-INFO CHANGED
@@ -11,6 +11,8 @@ Requires-Dist: requests>=2.31.0
  Requires-Dist: rank-bm25>=0.2.2
  Requires-Dist: openai>=1.0.0
  Requires-Dist: numpy>=1.24.0
+ Requires-Dist: beautifulsoup4>=4.14.3
+ Requires-Dist: lxml>=6.0.2
  Provides-Extra: dev
  Requires-Dist: pytest>=8.0.0; extra == "dev"
  Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_harvestgym.egg-info/SOURCES.txt CHANGED
@@ -14,6 +14,7 @@ server/models.py
  server/tools/__init__.py
  server/tools/browser_agent.py
  server/tools/curl_exec.py
+ server/tools/embed_cache.py
  server/tools/search_endpoints.py
  server/tools/search_episode_data.py
  tests/test_e2e_episode.py
openenv_harvestgym.egg-info/requires.txt CHANGED
@@ -6,6 +6,8 @@ requests>=2.31.0
  rank-bm25>=0.2.2
  openai>=1.0.0
  numpy>=1.24.0
+ beautifulsoup4>=4.14.3
+ lxml>=6.0.2

  [dev]
  pytest>=8.0.0
parameter_pools.json CHANGED
@@ -4,7 +4,7 @@
   "generated_at": "2026-04-08",
   "source": {
   "categories": "GET /rest/V1/categories/list (live EC2, port 7780)",
- "products": "GET /rest/V1/products type_id=simple + configurable (live EC2, port 7780)",
   "forums": "HTML scrape of /forums page (live EC2, port 9999) + HTTP 200 verification per slug",
   "wikipedia": "Well-known Wikipedia titles \u2014 verified by grader at runtime via HEAD /wikipedia_en.../A/{slug}",
   "admin_skus": "Generated (HAR-TEST-NNN namespace, no collision with existing catalog)",
@@ -13,10 +13,10 @@
   "grader_matching_notes": {
   "template_1": "category_id stored for grader; category_name is what appears in task string",
   "template_2": "expected_slug stored for grader (verifies HTTP 200); display title is in task string",
- "template_3": "sku stored for grader (verifies cart item); product name is in task string",
   "template_4": "forum_name must exist and return posts; no exact value matching needed",
   "template_5": "title is free-form generated; grader only checks post was created in that forum",
- "template_6": "sku stored for grader (verifies order was placed); product name is in task string",
   "template_7": "sku+price are exact \u2014 grader calls GET /rest/V1/products/{sku} to verify creation"
   }
   },
@@ -29,149 +29,37 @@
   ],
   "pool": {
   "category_name": [
- {
- "name": "Gear",
- "category_id": 3
- },
   {
   "name": "Bags",
   "category_id": 4
   },
   {
- "name": "Fitness Equipment",
- "category_id": 5
- },
- {
- "name": "Watches",
- "category_id": 6
- },
- {
- "name": "New Luma Yoga Collection",
- "category_id": 8
- },
- {
- "name": "Training",
- "category_id": 9
- },
- {
- "name": "Video Download",
- "category_id": 10
- },
- {
- "name": "Men",
- "category_id": 11
- },
- {
- "name": "Tops",
- "category_id": 12
- },
- {
- "name": "Bottoms",
- "category_id": 13
- },
- {
- "name": "Jackets",
- "category_id": 14
- },
- {
- "name": "Hoodies & Sweatshirts",
- "category_id": 15
- },
- {
- "name": "Tees",
- "category_id": 16
- },
- {
- "name": "Tanks",
- "category_id": 17
- },
- {
- "name": "Pants",
- "category_id": 18
- },
- {
- "name": "Shorts",
- "category_id": 19
- },
- {
- "name": "Women",
- "category_id": 20
- },
- {
- "name": "Tops",
- "category_id": 21
- },
- {
- "name": "Bottoms",
- "category_id": 22
- },
   {
   "name": "Jackets",
- "category_id": 23
- },
- {
- "name": "Hoodies & Sweatshirts",
- "category_id": 24
- },
- {
- "name": "Tees",
- "category_id": 25
- },
- {
- "name": "Bras & Tanks",
- "category_id": 26
- },
- {
- "name": "Pants",
- "category_id": 27
- },
- {
- "name": "Shorts",
- "category_id": 28
- },
- {
- "name": "Women Sale",
- "category_id": 30
- },
   {
- "name": "Men Sale",
- "category_id": 31
   },
   {
   "name": "Pants",
- "category_id": 32
- },
- {
- "name": "Tees",
- "category_id": 33
- },
- {
- "name": "Erin Recommends",
- "category_id": 34
- },
- {
- "name": "Performance Fabrics",
- "category_id": 35
- },
- {
- "name": "Eco Friendly",
- "category_id": 36
- },
- {
- "name": "Sale",
- "category_id": 37
   },
   {
- "name": "What's New",
- "category_id": 38
   },
   {
- "name": "Performance Sportswear New",
- "category_id": 39
   },
   {
- "name": "Eco Collection New",
- "category_id": 40
   }
   ]
   }
@@ -298,236 +186,94 @@
   "pool": {
   "product_name": [
   {
- "name": "Joust Duffle Bag",
- "sku": "24-MB01"
- },
- {
- "name": "Strive Shoulder Pack",
- "sku": "24-MB04"
- },
- {
- "name": "Crown Summit Backpack",
- "sku": "24-MB03"
- },
- {
- "name": "Wayfarer Messenger Bag",
- "sku": "24-MB05"
- },
- {
- "name": "Rival Field Messenger",
- "sku": "24-MB06"
- },
- {
- "name": "Fusion Backpack",
- "sku": "24-MB02"
- },
- {
- "name": "Impulse Duffle",
- "sku": "24-UB02"
- },
- {
- "name": "Voyage Yoga Bag",
- "sku": "24-WB01"
- },
- {
- "name": "Compete Track Tote",
- "sku": "24-WB02"
- },
- {
- "name": "Savvy Shoulder Tote",
- "sku": "24-WB05"
- },
- {
- "name": "Endeavor Daytrip Backpack",
- "sku": "24-WB06"
- },
- {
- "name": "Driven Backpack",
- "sku": "24-WB03"
- },
- {
- "name": "Overnight Duffle",
- "sku": "24-WB07"
- },
- {
- "name": "Push It Messenger Bag",
- "sku": "24-WB04"
- },
- {
- "name": "Affirm Water Bottle",
- "sku": "24-UG06"
- },
- {
- "name": "Dual Handle Cardio Ball",
- "sku": "24-UG07"
- },
- {
- "name": "Zing Jump Rope",
- "sku": "24-UG04"
- },
- {
- "name": "Pursuit Lumaflex&trade; Tone Band",
- "sku": "24-UG02"
- },
- {
- "name": "Go-Get'r Pushup Grips",
- "sku": "24-UG05"
- },
- {
- "name": "Quest Lumaflex&trade; Band",
- "sku": "24-UG01"
- },
- {
- "name": "Sprite Foam Yoga Brick",
- "sku": "24-WG084"
- },
- {
- "name": "Sprite Foam Roller",
- "sku": "24-WG088"
- },
- {
- "name": "Harmony Lumaflex&trade; Strength Band Kit",
- "sku": "24-UG03"
- },
- {
- "name": "Sprite Stasis Ball 55 cm",
- "sku": "24-WG081-gray"
- },
- {
- "name": "Sprite Stasis Ball 65 cm",
- "sku": "24-WG082-gray"
- },
- {
- "name": "Sprite Stasis Ball 75 cm",
- "sku": "24-WG083-gray"
- },
- {
- "name": "Sprite Yoga Strap 6 foot",
- "sku": "24-WG085"
- },
- {
- "name": "Sprite Yoga Strap 8 foot",
- "sku": "24-WG086"
- },
- {
- "name": "Sprite Yoga Strap 10 foot",
- "sku": "24-WG087"
- },
- {
- "name": "Aim Analog Watch",
- "sku": "24-MG04"
- },
- {
- "name": "Endurance Watch",
- "sku": "24-MG01"
- },
- {
- "name": "Summit Watch",
- "sku": "24-MG03"
- },
- {
- "name": "Cruise Dual Analog Watch",
- "sku": "24-MG05"
- },
- {
- "name": "Dash Digital Watch",
- "sku": "24-MG02"
- },
- {
- "name": "Luma Analog Watch",
- "sku": "24-WG09"
- },
- {
- "name": "Bolo Sport Watch",
- "sku": "24-WG01"
- },
- {
- "name": "Clamber Watch",
- "sku": "24-WG03"
- },
- {
- "name": "Didi Sport Watch",
- "sku": "24-WG02"
   },
   {
- "name": "Stellar Solar Jacket",
- "sku": "WJ01"
   },
   {
- "name": "Josie Yoga Jacket",
- "sku": "WJ02"
   },
   {
- "name": "Augusta Pullover Jacket",
- "sku": "WJ03"
   },
   {
- "name": "Ingrid Running Jacket",
- "sku": "WJ04"
   },
   {
- "name": "Riona Full Zip Jacket",
- "sku": "WJ05"
   },
   {
- "name": "Juno Jacket",
- "sku": "WJ06"
   },
   {
- "name": "Inez Full Zip Jacket",
- "sku": "WJ07"
   },
   {
- "name": "Adrienne Trek Jacket",
- "sku": "WJ08"
   },
   {
- "name": "Jade Yoga Jacket",
- "sku": "WJ09"
   },
   {
- "name": "Nadia Elements Shell",
- "sku": "WJ10"
   },
   {
- "name": "Neve Studio Dance Jacket",
- "sku": "WJ11"
   },
   {
- "name": "Olivia 1/4 Zip Light Jacket",
- "sku": "WJ12"
   },
   {
- "name": "Chaz Kangeroo Hoodie",
- "sku": "MH01"
   },
   {
- "name": "Teton Pullover Hoodie",
- "sku": "MH02"
   },
   {
- "name": "Bruno Compete Hoodie",
- "sku": "MH03"
   },
   {
- "name": "Frankie Sweatshirt",
- "sku": "MH04"
   },
   {
- "name": "Hollister Backyard Sweatshirt",
- "sku": "MH05"
- },
- {
- "name": "Stark Fundamental Hoodie",
- "sku": "MH06"
- },
- {
- "name": "Hero Hoodie",
- "sku": "MH07"
- },
- {
- "name": "Oslo Trek Hoodie",
- "sku": "MH08"
   }
   ]
   }
@@ -739,236 +485,94 @@
   "pool": {
   "product_name": [
   {
- "name": "Joust Duffle Bag",
- "sku": "24-MB01"
- },
- {
- "name": "Strive Shoulder Pack",
- "sku": "24-MB04"
- },
- {
- "name": "Crown Summit Backpack",
- "sku": "24-MB03"
- },
- {
- "name": "Wayfarer Messenger Bag",
- "sku": "24-MB05"
- },
- {
- "name": "Rival Field Messenger",
- "sku": "24-MB06"
- },
- {
- "name": "Fusion Backpack",
- "sku": "24-MB02"
- },
- {
- "name": "Impulse Duffle",
- "sku": "24-UB02"
- },
- {
- "name": "Voyage Yoga Bag",
- "sku": "24-WB01"
- },
- {
- "name": "Compete Track Tote",
- "sku": "24-WB02"
- },
- {
- "name": "Savvy Shoulder Tote",
- "sku": "24-WB05"
- },
- {
- "name": "Endeavor Daytrip Backpack",
- "sku": "24-WB06"
- },
- {
- "name": "Driven Backpack",
- "sku": "24-WB03"
- },
- {
- "name": "Overnight Duffle",
- "sku": "24-WB07"
- },
- {
- "name": "Push It Messenger Bag",
- "sku": "24-WB04"
- },
- {
- "name": "Affirm Water Bottle",
- "sku": "24-UG06"
- },
- {
- "name": "Dual Handle Cardio Ball",
- "sku": "24-UG07"
- },
- {
- "name": "Zing Jump Rope",
- "sku": "24-UG04"
- },
- {
- "name": "Pursuit Lumaflex&trade; Tone Band",
- "sku": "24-UG02"
- },
- {
- "name": "Go-Get'r Pushup Grips",
- "sku": "24-UG05"
- },
- {
- "name": "Quest Lumaflex&trade; Band",
- "sku": "24-UG01"
- },
- {
- "name": "Sprite Foam Yoga Brick",
- "sku": "24-WG084"
- },
- {
- "name": "Sprite Foam Roller",
- "sku": "24-WG088"
- },
- {
- "name": "Harmony Lumaflex&trade; Strength Band Kit",
- "sku": "24-UG03"
- },
- {
- "name": "Sprite Stasis Ball 55 cm",
- "sku": "24-WG081-gray"
- },
- {
- "name": "Sprite Stasis Ball 65 cm",
- "sku": "24-WG082-gray"
- },
- {
- "name": "Sprite Stasis Ball 75 cm",
- "sku": "24-WG083-gray"
- },
- {
- "name": "Sprite Yoga Strap 6 foot",
- "sku": "24-WG085"
- },
- {
- "name": "Sprite Yoga Strap 8 foot",
- "sku": "24-WG086"
- },
- {
- "name": "Sprite Yoga Strap 10 foot",
- "sku": "24-WG087"
- },
- {
- "name": "Aim Analog Watch",
- "sku": "24-MG04"
- },
- {
- "name": "Endurance Watch",
- "sku": "24-MG01"
- },
- {
- "name": "Summit Watch",
- "sku": "24-MG03"
- },
- {
- "name": "Cruise Dual Analog Watch",
- "sku": "24-MG05"
- },
- {
- "name": "Dash Digital Watch",
- "sku": "24-MG02"
- },
- {
- "name": "Luma Analog Watch",
- "sku": "24-WG09"
- },
- {
- "name": "Bolo Sport Watch",
- "sku": "24-WG01"
- },
- {
- "name": "Clamber Watch",
- "sku": "24-WG03"
- },
- {
- "name": "Didi Sport Watch",
- "sku": "24-WG02"
- },
- {
- "name": "Stellar Solar Jacket",
- "sku": "WJ01"
- },
- {
- "name": "Josie Yoga Jacket",
- "sku": "WJ02"
- },
- {
- "name": "Augusta Pullover Jacket",
- "sku": "WJ03"
   },
   {
- "name": "Ingrid Running Jacket",
- "sku": "WJ04"
   },
   {
- "name": "Riona Full Zip Jacket",
- "sku": "WJ05"
   },
   {
- "name": "Juno Jacket",
- "sku": "WJ06"
   },
   {
- "name": "Inez Full Zip Jacket",
- "sku": "WJ07"
   },
   {
- "name": "Adrienne Trek Jacket",
- "sku": "WJ08"
   },
   {
- "name": "Jade Yoga Jacket",
- "sku": "WJ09"
   },
   {
- "name": "Nadia Elements Shell",
- "sku": "WJ10"
   },
   {
- "name": "Neve Studio Dance Jacket",
- "sku": "WJ11"
   },
   {
- "name": "Olivia 1/4 Zip Light Jacket",
- "sku": "WJ12"
   },
   {
- "name": "Chaz Kangeroo Hoodie",
- "sku": "MH01"
   },
   {
- "name": "Teton Pullover Hoodie",
- "sku": "MH02"
   },
   {
- "name": "Bruno Compete Hoodie",
- "sku": "MH03"
   },
   {
- "name": "Frankie Sweatshirt",
- "sku": "MH04"
   },
   {
- "name": "Hollister Backyard Sweatshirt",
- "sku": "MH05"
   },
   {
- "name": "Stark Fundamental Hoodie",
- "sku": "MH06"
   },
   {
- "name": "Hero Hoodie",
- "sku": "MH07"
   },
   {
- "name": "Oslo Trek Hoodie",
- "sku": "MH08"
   }
   ]
   }
 
   "generated_at": "2026-04-08",
   "source": {
   "categories": "GET /rest/V1/categories/list (live EC2, port 7780)",
+ "products": "HTML scrape of search results page on live EC2 store (port 7770) \u2014 product_id is the Magento entity ID used in add-to-cart forms; sku is PROD-{product_id} as the store REST API is auth-gated",
   "forums": "HTML scrape of /forums page (live EC2, port 9999) + HTTP 200 verification per slug",
   "wikipedia": "Well-known Wikipedia titles \u2014 verified by grader at runtime via HEAD /wikipedia_en.../A/{slug}",
   "admin_skus": "Generated (HAR-TEST-NNN namespace, no collision with existing catalog)",
   "grader_matching_notes": {
   "template_1": "category_id stored for grader; category_name is what appears in task string",
   "template_2": "expected_slug stored for grader (verifies HTTP 200); display title is in task string",
+ "template_3": "product_id stored for grader (checks POST /checkout/cart/add + cart probe); product name is in task string for HTML search flow",
   "template_4": "forum_name must exist and return posts; no exact value matching needed",
   "template_5": "title is free-form generated; grader only checks post was created in that forum",
+ "template_6": "product_id stored for grader; name is in task string; checkout grader checks REST guest-cart stages OR HTML checkout flow",
   "template_7": "sku+price are exact \u2014 grader calls GET /rest/V1/products/{sku} to verify creation"
   }
   },
   ],
   "pool": {
   "category_name": [
   {
   "name": "Bags",
   "category_id": 4
   },
   {
+ "name": "Backpack",
+ "category_id": 4
   },
   {
   "name": "Jackets",
+ "category_id": 11
   },
   {
+ "name": "Hoodies",
+ "category_id": 9
   },
   {
   "name": "Pants",
+ "category_id": 13
   },
   {
+ "name": "Shoes",
+ "category_id": 3
   },
   {
+ "name": "Boots",
+ "category_id": 3
   },
   {
+ "name": "Slippers",
+ "category_id": 3
   }
   ]
   }
 
186
  "pool": {
187
  "product_name": [
188
  {
189
+ "name": "Camera Backpack Bagsmar DSLR Waterproof",
190
+ "sku": "PROD-89940",
191
+ "product_id": 89940
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
192
  },
193
  {
194
+ "name": "Totes Womens Cold Weather Boots Nicole",
195
+ "sku": "PROD-29409",
196
+ "product_id": 29409
197
  },
198
  {
199
+ "name": "Totes Womens Snow Boots Jami Lace Up",
200
+ "sku": "PROD-83651",
201
+ "product_id": 83651
202
  },
203
  {
204
+ "name": "Noldares Flannel Jacket For Men Plaid",
205
+ "sku": "PROD-59237",
206
+ "product_id": 59237
207
  },
208
  {
209
+ "name": "Inesver Womens Leather Jacket Open Front",
210
+ "sku": "PROD-30743",
211
+ "product_id": 30743
212
  },
213
  {
214
+ "name": "Womens Corduroy Coat Plaid Hoodie Long Jacket",
215
+ "sku": "PROD-13227",
216
+ "product_id": 13227
217
  },
218
  {
219
+ "name": "Womens Fleece Jacket With Hood Winter",
220
+ "sku": "PROD-60773",
221
+ "product_id": 60773
222
  },
223
  {
224
+ "name": "Champion Hoodie Big And Tall Zip Up",
225
+ "sku": "PROD-64850",
226
+ "product_id": 64850
227
  },
228
  {
229
+ "name": "Matching Couples Hoodie Set",
230
+ "sku": "PROD-60915",
231
+ "product_id": 60915
232
  },
233
  {
234
+ "name": "Mens Novelty 3D Printed Pullover Hoodie",
235
+ "sku": "PROD-62228",
236
+ "product_id": 62228
237
  },
238
  {
239
+ "name": "Mens Slim Fit Cargo Pants Athletic",
240
+ "sku": "PROD-65987",
241
+ "product_id": 65987
242
  },
243
  {
244
+ "name": "Mens Ripstop Cargo Pants Tactical Hiking",
245
+ "sku": "PROD-10245",
246
+ "product_id": 10245
247
  },
248
  {
249
+ "name": "Womens Flowy Boho Harem Pants Yoga",
250
+ "sku": "PROD-64374",
251
+ "product_id": 64374
252
  },
253
  {
254
+ "name": "Womens High Waist Harem Pants Stripe",
255
+ "sku": "PROD-61333",
256
+ "product_id": 61333
257
  },
258
  {
259
+ "name": "Shoeslocker Womens Cozy Memory Foam Slippers",
260
+ "sku": "PROD-94779",
261
+ "product_id": 94779
262
  },
263
  {
264
+ "name": "Mens Canvas Korean Fashion Casual Shoes",
265
+ "sku": "PROD-60868",
266
+ "product_id": 60868
267
  },
268
  {
269
+ "name": "Unisex Diving Shoes Ultralight Anti Slip",
270
+ "sku": "PROD-12364",
271
+ "product_id": 12364
272
  },
273
  {
274
+ "name": "Womens Loafers Fashion Retro Single Shoes",
275
+ "sku": "PROD-63738",
276
+ "product_id": 63738
 
 
 
 
 
 
 
 
 
 
 
277
  }
278
  ]
279
  }
 
  "pool": {
  "product_name": [
  {
+ "name": "Camera Backpack Bagsmar DSLR Waterproof",
+ "sku": "PROD-89940",
+ "product_id": 89940
  },
  {
+ "name": "Totes Womens Cold Weather Boots Nicole",
+ "sku": "PROD-29409",
+ "product_id": 29409
  },
  {
+ "name": "Totes Womens Snow Boots Jami Lace Up",
+ "sku": "PROD-83651",
+ "product_id": 83651
  },
  {
+ "name": "Noldares Flannel Jacket For Men Plaid",
+ "sku": "PROD-59237",
+ "product_id": 59237
  },
  {
+ "name": "Inesver Womens Leather Jacket Open Front",
+ "sku": "PROD-30743",
+ "product_id": 30743
  },
  {
+ "name": "Womens Corduroy Coat Plaid Hoodie Long Jacket",
+ "sku": "PROD-13227",
+ "product_id": 13227
  },
  {
+ "name": "Womens Fleece Jacket With Hood Winter",
+ "sku": "PROD-60773",
+ "product_id": 60773
  },
  {
+ "name": "Champion Hoodie Big And Tall Zip Up",
+ "sku": "PROD-64850",
+ "product_id": 64850
  },
  {
+ "name": "Matching Couples Hoodie Set",
+ "sku": "PROD-60915",
+ "product_id": 60915
  },
  {
+ "name": "Mens Novelty 3D Printed Pullover Hoodie",
+ "sku": "PROD-62228",
+ "product_id": 62228
  },
  {
+ "name": "Mens Slim Fit Cargo Pants Athletic",
+ "sku": "PROD-65987",
+ "product_id": 65987
  },
  {
+ "name": "Mens Ripstop Cargo Pants Tactical Hiking",
+ "sku": "PROD-10245",
+ "product_id": 10245
  },
  {
+ "name": "Womens Flowy Boho Harem Pants Yoga",
+ "sku": "PROD-64374",
+ "product_id": 64374
  },
  {
+ "name": "Womens High Waist Harem Pants Stripe",
+ "sku": "PROD-61333",
+ "product_id": 61333
  },
  {
+ "name": "Shoeslocker Womens Cozy Memory Foam Slippers",
+ "sku": "PROD-94779",
+ "product_id": 94779
  },
  {
+ "name": "Mens Canvas Korean Fashion Casual Shoes",
+ "sku": "PROD-60868",
+ "product_id": 60868
  },
  {
+ "name": "Unisex Diving Shoes Ultralight Anti Slip",
+ "sku": "PROD-12364",
+ "product_id": 12364
  },
  {
+ "name": "Womens Loafers Fashion Retro Single Shoes",
+ "sku": "PROD-63738",
+ "product_id": 63738
  }
  ]
  }
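These pool entries are what `_sample_task` draws from when it instantiates a task. A minimal sketch of that sampling step (the two-item pool slice here is illustrative, not the full catalog):

```python
import random

# Hypothetical slice of the product_name pool shown above
pool = {"product_name": [
    {"name": "Matching Couples Hoodie Set", "sku": "PROD-60915", "product_id": 60915},
    {"name": "Mens Slim Fit Cargo Pants Athletic", "sku": "PROD-65987", "product_id": 65987},
]}

# Pick one entry and expose its fields as task parameters,
# mirroring the chosen/params pattern in _sample_task
chosen = random.choice(pool["product_name"])
params = {"product_name": chosen["name"], "sku": chosen["sku"],
          "product_id": chosen["product_id"]}
```

The judge can then check both the human-readable name and the machine identifiers (`sku`, `product_id`) against the episode trajectory.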
pyproject.toml CHANGED
@@ -16,6 +16,8 @@ dependencies = [
      "rank-bm25>=0.2.2",
      "openai>=1.0.0",
      "numpy>=1.24.0",
+     "beautifulsoup4>=4.14.3",
+     "lxml>=6.0.2",
  ]
 
  [project.optional-dependencies]
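The two new dependencies point at HTML parsing for the distilled-page grading paths in judge.py. As a rough stdlib-only sketch of the kind of form extraction involved (the real code presumably uses BeautifulSoup with the lxml backend; `FormExtractor` and the exact output keys are assumptions here, chosen to match the `method`/`fields` shape the judge inspects):

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect each <form>'s method/action plus its named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {"method": (a.get("method") or "GET").upper(),
                             "action": a.get("action") or "",
                             "fields": {}}
        elif tag == "input" and self._current is not None and a.get("name"):
            self._current["fields"][a["name"]] = a.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


parser = FormExtractor()
parser.feed('<form method="post" action="/checkout/cart/add">'
            '<input name="product" value="13227"/></form>')
```

A page distilled this way yields the POST add-to-cart forms that `grade_template_1` counts as product listings.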
scripts/inspect_har_endpoints.py ADDED
@@ -0,0 +1,240 @@
+ #!/usr/bin/env python3
+ """
+ inspect_har_endpoints.py
+
+ Runs extract_openapi_spec() on every HAR file in hars/ and prints a full
+ summary of discovered endpoints — method, path, status code, auth, and a
+ snippet of the request/response body where available.
+
+ Usage:
+     python scripts/inspect_har_endpoints.py [--json]
+
+ Flags:
+     --json    Emit machine-readable JSON instead of the human-readable table
+ """
+
+ from __future__ import annotations
+
+ import json
+ import sys
+ from pathlib import Path
+
+ # ---------------------------------------------------------------------------
+ # Path setup — make the package importable without installing
+ # ---------------------------------------------------------------------------
+
+ REPO_ROOT = Path(__file__).resolve().parent.parent
+ sys.path.insert(0, str(REPO_ROOT))
+
+ from server.tools.browser_agent import extract_openapi_spec  # noqa: E402
+
+
+ # ---------------------------------------------------------------------------
+ # HAR files to inspect
+ # ---------------------------------------------------------------------------
+
+ HARS_DIR = REPO_ROOT / "hars"
+
+ HAR_FILES = {
+     "shopping": HARS_DIR / "shopping.har",
+     "shopping_admin": HARS_DIR / "shopping_admin.har",
+     "forum": HARS_DIR / "forum.har",
+     "wikipedia": HARS_DIR / "wikipedia.har",
+ }
+
+ # Fake base URLs — only used for pass-through in extract_openapi_spec
+ APP_BASE_URLS = {
+     "shopping": "http://localhost:7770",
+     "shopping_admin": "http://localhost:7780",
+     "forum": "http://localhost:9999",
+     "wikipedia": "http://localhost:8888",
+ }
+
+
+ # ---------------------------------------------------------------------------
+ # Pretty-print helpers
+ # ---------------------------------------------------------------------------
+
+ _COL_W = 80
+
+
+ def _hr(char: str = "─") -> None:
+     print(char * _COL_W)
+
+
+ def _body_snippet(value) -> str | None:
+     if value is None:
+         return None
+     if isinstance(value, str):
+         snippet = value[:120]
+     else:
+         snippet = json.dumps(value)[:120]
+     return snippet + ("…" if len(str(snippet)) >= 120 else "")
+
+
+ def _print_entry(idx: int, entry: dict) -> None:
+     auth_flag = "🔐 AUTH" if entry["auth_observed"] else "open"
+     print(f"  [{idx:>3}] {entry['method']:<7} {entry['path']}")
+     print(f"        status={entry['status_code']}  ct={entry['response_content_type'] or '—'}  {auth_flag}")
+     if entry.get("query_params"):
+         print(f"        query: {entry['query_params'][:100]}")
+     req_snippet = _body_snippet(entry.get("request_body"))
+     if req_snippet:
+         print(f"        req_body: {req_snippet}")
+     resp_snippet = _body_snippet(entry.get("response_body_sample"))
+     if resp_snippet:
+         print(f"        resp_sample: {resp_snippet}")
+
+
+ def _method_counts(entries: list[dict]) -> dict[str, int]:
+     counts: dict[str, int] = {}
+     for e in entries:
+         counts[e["method"]] = counts.get(e["method"], 0) + 1
+     return dict(sorted(counts.items()))
+
+
+ def print_app_summary(app_name: str, entries: list[dict], raw_total: int | None = None) -> None:
+     _hr("═")
+     header = f" APP: {app_name.upper()} ({len(entries)} unique API endpoints"
+     if raw_total is not None:
+         header += f" extracted from {raw_total} raw HAR entries"
+     header += ")"
+     print(header)
+     counts = _method_counts(entries)
+     print(f" Methods: {counts}")
+     auth_count = sum(1 for e in entries if e["auth_observed"])
+     print(f" Auth-required endpoints: {auth_count}/{len(entries)}")
+     _hr()
+     if not entries:
+         print("  (no API-like entries survived filtering)")
+     for i, entry in enumerate(entries, 1):
+         _print_entry(i, entry)
+     print()
+
+
+ # ---------------------------------------------------------------------------
+ # JSON mode
+ # ---------------------------------------------------------------------------
+
+ def emit_json(results: dict) -> None:
+     # Convert to a JSON-safe structure
+     output = {}
+     for app_name, entries in results.items():
+         output[app_name] = {
+             "total": len(entries),
+             "method_counts": _method_counts(entries),
+             "endpoints": entries,
+         }
+     print(json.dumps(output, indent=2))
+
+
+ # ---------------------------------------------------------------------------
+ # Verification / assertion checks
+ # ---------------------------------------------------------------------------
+
+
+ # NOTE: These HAR files are sparse — each was recorded for a narrow task
+ # scenario, not as a full API crawl. The vast majority of HAR entries are
+ # static assets (/static/ prefix) that the extractor correctly filters out.
+ # Thresholds below reflect the actual usable API surface in each file.
+ SANITY_CHECKS: dict[str, dict] = {
+     "shopping": {
+         "min_endpoints": 1,
+         "expected_methods": {"GET"},
+         "note": "Sparse HAR — only checkout success page recorded; "
+                 "213 total entries but 212 are /static/ assets.",
+     },
+     "shopping_admin": {
+         "min_endpoints": 2,
+         "expected_methods": {"GET", "POST"},
+         "note": "Sparse HAR — product save/edit + MUI JSON endpoint; "
+                 "353 total entries but 350 are /static/ assets.",
+     },
+     "forum": {
+         "min_endpoints": 2,
+         "expected_methods": {"GET", "POST"},
+         "note": "Sparse HAR — one POST submission + one forum thread GET; "
+                 "24 total entries but 22 are .js build files.",
+     },
+     "wikipedia": {
+         "min_endpoints": 0,
+         "expected_methods": set(),
+         "note": "Sparse HAR — only an article HTML page + /-/mw/ style/JS assets; "
+                 "no XHR/REST traffic recorded.",
+     },
+ }
+
+
+ def run_checks(results: dict) -> bool:
+     print("\n" + "─" * _COL_W)
+     print("SANITY CHECKS (thresholds calibrated to actual HAR content)")
+     print("─" * _COL_W)
+     all_passed = True
+     for app_name, checks in SANITY_CHECKS.items():
+         entries = results.get(app_name, [])
+         methods_found = set(e["method"] for e in entries)
+         n = len(entries)
+
+         min_ok = n >= checks["min_endpoints"]
+         exp = checks["expected_methods"]
+         methods_ok = exp.issubset(methods_found) if exp else True
+
+         status = "PASS" if (min_ok and methods_ok) else "FAIL"
+         if status == "FAIL":
+             all_passed = False
+
+         print(f" {status}  {app_name}")
+         print(f"     endpoints : {n} (min={checks['min_endpoints']}) {'✓' if min_ok else '✗'}")
+         if exp:
+             print(f"     methods   : {sorted(methods_found)} "
+                   f"(expected ⊇ {sorted(exp)}) {'✓' if methods_ok else '✗'}")
+         print(f"     note      : {checks['note']}")
+     print("─" * _COL_W)
+     print("Overall:", "ALL PASSED ✓" if all_passed else "SOME FAILED ✗")
+     return all_passed
+
+
+ # ---------------------------------------------------------------------------
+ # Main
+ # ---------------------------------------------------------------------------
+
+ def main() -> int:
+     emit_json_mode = "--json" in sys.argv
+
+     results: dict[str, list[dict]] = {}
+     raw_totals: dict[str, int] = {}
+     missing: list[str] = []
+
+     for app_name, har_path in HAR_FILES.items():
+         if not har_path.exists():
+             print(f"[WARN] HAR not found: {har_path}", file=sys.stderr)
+             missing.append(app_name)
+             results[app_name] = []
+             raw_totals[app_name] = 0
+             continue
+
+         with open(har_path) as f:
+             har_data = json.load(f)
+
+         raw_totals[app_name] = len(har_data.get("log", {}).get("entries", []))
+         entries = extract_openapi_spec(har_data, APP_BASE_URLS[app_name])
+         results[app_name] = entries
+
+     if emit_json_mode:
+         emit_json(results)
+         return 0
+
+     # Human-readable output
+     for app_name, entries in results.items():
+         print_app_summary(app_name, entries, raw_totals.get(app_name))
+
+     passed = run_checks(results)
+
+     if missing:
+         print(f"\n[WARN] Missing HAR files for: {', '.join(missing)}")
+
+     return 0 if passed else 1
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
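The sanity-check notes above describe HARs dominated by static assets (e.g. "213 total entries but 212 are /static/ assets"). A tiny illustration of that ratio on a hypothetical HAR structure (stdlib only; the URLs are made up):

```python
import json
from urllib.parse import urlparse

# Minimal hypothetical HAR: two static assets, one API call
har = {"log": {"entries": [
    {"request": {"url": "http://localhost:7770/static/app.js"}},
    {"request": {"url": "http://localhost:7770/static/style.css"}},
    {"request": {"url": "http://localhost:7770/rest/V1/products?searchCriteria[pageSize]=5"}},
]}}

entries = har["log"]["entries"]
# Split by path prefix, the same coarse filter the notes describe
static = [e for e in entries
          if urlparse(e["request"]["url"]).path.startswith("/static/")]
api = [e for e in entries if e not in static]
```

Only the `api` remainder is worth embedding for `search_endpoints`; when it is nearly empty, the recording itself is too sparse.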
server/judge.py CHANGED
@@ -140,23 +140,38 @@ def _get_curl_steps(episode: Episode):
  def grade_template_1(episode: Episode, task: Task) -> float:
      """Easy — Shopping: List products in category {category_name}"""
      category_name = task.params.get("category_name", "")
+     category_lower = category_name.lower()
 
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200:
              body = cp.response_body
+             # REST API JSON response (ideal path: /rest/V1/products)
              if isinstance(body, dict) and "items" in body:
                  items = body["items"]
                  if len(items) > 0:
-                     # Check if any item mentions the category
                      for item in items:
                          if _item_matches_category(item, category_name):
                              return 1.0
-                     # Items returned but can't verify category — partial
                      return 0.3
-             # Also check if it's a raw list
+             # Raw list
              if isinstance(body, list) and len(body) > 0:
                  return 0.3
+             # Distilled HTML page (from html_distiller) — check for search results page
+             # that contains product forms. page_type/forms/text are the distiller's keys.
+             if isinstance(body, dict) and "page_type" in body:
+                 forms = body.get("forms", [])
+                 text = body.get("text", "") or ""
+                 title = (body.get("title") or "").lower()
+                 # A search/category results page has multiple POST add-to-cart forms
+                 product_forms = [f for f in forms if f.get("method") == "POST"
+                                  and "product" in f.get("fields", {})]
+                 if product_forms:
+                     # Check that the page is about the requested category
+                     if category_lower in title or category_lower in text.lower():
+                         return 1.0
+                     # Products listed but category name not verifiable from title — partial
+                     return 0.5
 
      return 0.0
 
@@ -220,14 +235,14 @@ def grade_template_3(episode: Episode, task: Task) -> float:
      """Medium — Shopping: Add {product_name} to a guest cart"""
      product_name = task.params.get("product_name", "")
      sku = task.params.get("sku")
+     product_id = str(task.params.get("product_id", ""))
 
-     # Primary: check if add-to-cart responded with item_id
+     # Primary: REST API — check if add-to-cart responded with item_id
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200:
              body = cp.response_body
              if isinstance(body, dict) and "item_id" in body:
-                 # Verify the sku if we have it
                  if sku and body.get("sku") == sku:
                      return 1.0
                  if _fuzzy_match(str(body.get("name", "")), product_name):
@@ -235,7 +250,29 @@ def grade_template_3(episode: Episode, task: Task) -> float:
                  if body.get("item_id"):
                      return 1.0
 
-     # Try live probe
+     # Secondary: HTML form-based add-to-cart (POST to /checkout/cart/add)
+     # A 302 redirect or 200 response from this endpoint means item was accepted
+     for step in _get_curl_steps(episode):
+         cp = step.curl_parsed
+         if cp.method == "POST" and "/checkout/cart/add" in (cp.path or ""):
+             if cp.status_code in (200, 302):
+                 # Optionally verify the correct product_id was posted
+                 body_str = str(cp.body or "")
+                 correct_product = (not product_id) or (product_id in body_str)
+
+                 # Probe cart to confirm item presence
+                 probe = _judge_probe("/checkout/cart/", task.base_url)
+                 if probe and probe.status_code == 200:
+                     cart_text = (probe.body if isinstance(probe.body, str) else str(probe.body)).lower()
+                     # Cart page mentions product name or has quantity indicators
+                     if product_name.lower()[:15] in cart_text:
+                         return 1.0
+                     if "qty" in cart_text or "quantity" in cart_text or "item" in cart_text:
+                         return 0.8 if correct_product else 0.6
+                 # POST succeeded without cart confirmation
+                 return 0.7 if correct_product else 0.5
+
+     # Try live probe via REST guest-cart
      cart_id = _extract_cart_id(episode)
      if cart_id:
          probe = _judge_probe(f"/rest/V1/guest-carts/{cart_id}", task.base_url)
@@ -247,13 +284,13 @@ def grade_template_3(episode: Episode, task: Task) -> float:
                  if _fuzzy_match(str(item.get("name", "")), product_name):
                      return 1.0
              if len(items) == 0:
-                 return 0.2  # cart created, item not added
+                 return 0.2  # cart created, item not added yet
 
-     # Partial: cart was created
+     # Partial: REST cart was created
      if cart_id:
          return 0.2
 
-     # Partial: attempted cart creation
+     # Partial: attempted cart creation via REST
      if any("guest-carts" in (s.curl_parsed.path or "") and
             s.curl_parsed.method == "POST"
             for s in _get_curl_steps(episode)):
@@ -424,7 +461,7 @@ def grade_template_6(episode: Episode, task: Task) -> float:
 
 
  def _extract_admin_token(episode: Episode) -> str | None:
-     """Find admin bearer token from episode trajectory."""
+     """Find admin bearer token from shopping-admin trajectory (used by graders)."""
      for step in _get_curl_steps(episode):
          cp = step.curl_parsed
          if cp.status_code == 200 and "integration/admin/token" in cp.path:
@@ -434,6 +471,49 @@ def _extract_admin_token(episode: Episode) -> str | None:
      return None
 
 
+ def _check_any_auth_obtained(episode: Episode) -> bool:
+     """
+     Generic check: did the agent successfully obtain ANY form of authentication?
+
+     Detects:
+     - Forum/CSRF token authentication
+     - Shopping-admin integration token
+     - Any 200 response returning a bare token string (bearer, user token, API key)
+     - Any 200 response returning a dict with a token field (access_token, id_token, etc.)
+
+     Application-agnostic — the model discovers auth endpoints via browser_agent /
+     search_endpoints; this simply rewards the intermediate step of obtaining auth.
+     """
+     # Forum/CSRF auth
+     if _check_forum_auth(episode):
+         return True
+
+     # Shopping admin token
+     if _extract_admin_token(episode):
+         return True
+
+     # Generic: any successful response that looks like it returned an auth token
+     for step in _get_curl_steps(episode):
+         cp = step.curl_parsed
+         if cp.status_code != 200:
+             continue
+         body = cp.response_body
+
+         # Plain string token (e.g. Magento user/guest tokens, API keys)
+         if isinstance(body, str):
+             stripped = body.strip().strip('"')
+             if re.fullmatch(r"[A-Za-z0-9+/=_\-\.]{20,}", stripped):
+                 return True
+
+         # Dict with a recognised token field
+         if isinstance(body, dict):
+             for k in ("token", "access_token", "id_token", "auth_token", "bearer"):
+                 if k in body and isinstance(body[k], str) and len(body[k]) > 10:
+                     return True
+
+     return False
+
+
  def _attempted_product_creation(episode: Episode, sku: str) -> bool:
      """Check if the model attempted to create a product with this SKU."""
      for step in _get_curl_steps(episode):
@@ -666,12 +746,15 @@ def evaluate(episode: Episode) -> EpisodeResult:
 
      task_score = grader(episode, task)
      param_score = verify_parameter_sourcing(episode, task)
-     auth_obtained = _check_forum_auth(episode) or bool(_extract_admin_token(episode))
+     auth_obtained = _check_any_auth_obtained(episode)
 
      # Compute reward
      reward = _score_to_reward(task_score, template_id)
 
-     # Bonus for auth obtained even on task failure
+     # Auth bonus: if the task failed but the agent successfully obtained any form
+     # of authentication (bearer token, session cookie, CSRF token, etc.), floor
+     # the reward at AUTH_BONUS. This is application-agnostic — obtaining auth is
+     # a useful intermediate skill regardless of the specific task template.
      if task_score < 0.5 and auth_obtained:
          reward = max(reward, AUTH_BONUS)
 
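The generic token heuristic in `_check_any_auth_obtained` can be exercised in isolation. A small standalone sketch of the same checks (`looks_like_token` is a hypothetical name, not part of judge.py):

```python
import re

# Same character class and minimum length as the judge's regex
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-\.]{20,}")


def looks_like_token(body) -> bool:
    """Mirror of the generic auth heuristic: a bare string of token-ish
    characters, or a dict carrying a recognised token field."""
    if isinstance(body, str):
        # Magento-style responses wrap the token in JSON string quotes
        return bool(TOKEN_RE.fullmatch(body.strip().strip('"')))
    if isinstance(body, dict):
        return any(isinstance(body.get(k), str) and len(body[k]) > 10
                   for k in ("token", "access_token", "id_token",
                             "auth_token", "bearer"))
    return False
```

Anything that passes this check after a 200 response is treated as "auth obtained" and floors the episode reward at `AUTH_BONUS` on task failure.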
server/models.py CHANGED
@@ -75,6 +75,8 @@ REWARD_NEW_PATH = 0.1 # curl path not seen before this episode
75
  REWARD_CORRECT_PARAM = 0.25 # judge: correct parameter sourcing (applied at end)
76
  REWARD_SESSION_VALUE = 0.1 # auth token/cookie correctly used
77
  PENALTY_REPEATED_CALL = -0.15 # exact duplicate curl command
 
 
78
  PENALTY_BROWSER_AGENT_AGAIN = -0.3 # browser_agent called after step 1
79
  PENALTY_MALFORMED_CURL = -0.1 # curl can't be parsed/executed
80
  PENALTY_4XX = -0.05 # recoverable HTTP error
@@ -103,14 +105,49 @@ TASK_NAME_TO_TEMPLATE = {
103
  "har_pipeline_hard": 6,
104
  }
105
 
106
- TEMPLATE_DESCRIPTIONS = {
107
- 1: "List products in category {category_name}",
108
- 2: "Retrieve the Wikipedia article for '{title}'",
109
- 3: "Add '{product_name}' to a guest cart",
110
- 4: "Retrieve all posts in the '{forum_category}' forum (you must log in first)",
111
- 5: "Create a forum post titled '{title}' in the '{category}' forum",
112
- 6: "Complete a guest checkout for '{product_name}'",
113
- 7: "Create a new product in the admin panel with SKU '{sku}' and price {price}",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
114
  }
115
 
116
 
@@ -139,7 +176,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
139
  items = pool.get("category_name", [{"name": "Gear", "category_id": 3}])
140
  chosen = random.choice(items)
141
  params = {"category_name": chosen["name"], "category_id": chosen.get("category_id")}
142
- description = TEMPLATE_DESCRIPTIONS[1].format(**params)
143
 
144
  elif template_id == 2:
145
  items = pool.get("title", [{"title": "Python (programming language)", "expected_slug": "Python_(programming_language)"}])
@@ -148,7 +185,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
148
  chosen = random.choice(items)
149
  title = chosen.get("title", chosen) if isinstance(chosen, dict) else chosen
150
  params = {"title": title, "expected_slug": chosen.get("expected_slug", title.replace(" ", "_"))}
151
- description = TEMPLATE_DESCRIPTIONS[2].format(**params)
152
 
153
  elif template_id == 3:
154
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
@@ -157,8 +194,11 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
157
  chosen = random.choice(items)
158
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
159
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
 
160
  params = {"product_name": product_name, "sku": sku}
161
- description = TEMPLATE_DESCRIPTIONS[3].format(**params)
 
 
162
 
163
  elif template_id == 4:
164
  items = pool.get("forum_category", [{"slug": "general", "name": "General"}])
@@ -167,7 +207,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
167
  chosen = random.choice(items)
168
  forum_cat = chosen.get("slug", chosen.get("name", "general")) if isinstance(chosen, dict) else chosen
169
  params = {"forum_category": forum_cat}
170
- description = TEMPLATE_DESCRIPTIONS[4].format(**params)
171
 
172
  elif template_id == 5:
173
  categories = pool.get("forum_category", [{"slug": "general"}])
@@ -180,7 +220,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
180
  chosen_title = random.choice(titles) if isinstance(titles[0], str) else random.choice(titles).get("title", "Test post")
181
  forum_cat = chosen_cat.get("slug", "general") if isinstance(chosen_cat, dict) else chosen_cat
182
  params = {"title": chosen_title, "category": forum_cat}
183
- description = TEMPLATE_DESCRIPTIONS[5].format(**params)
184
 
185
  elif template_id == 6:
186
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
@@ -189,8 +229,11 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
189
  chosen = random.choice(items)
190
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
191
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
 
192
  params = {"product_name": product_name, "sku": sku}
193
- description = TEMPLATE_DESCRIPTIONS[6].format(**params)
 
 
194
 
195
  elif template_id == 7:
196
  items = pool.get("admin_sku", [{"sku": "HAR-TEST-001", "price": "29.99"}])
@@ -200,7 +243,7 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
200
  sku = chosen.get("sku", "HAR-TEST-001") if isinstance(chosen, dict) else chosen
201
  price = str(chosen.get("price", "29.99")) if isinstance(chosen, dict) else "29.99"
202
  params = {"sku": sku, "price": price}
203
- description = TEMPLATE_DESCRIPTIONS[7].format(**params)
204
 
205
  else:
206
  params = {}
@@ -211,6 +254,19 @@ def _sample_task(template_id: int, parameter_pools: dict) -> tuple[str, dict, st
211
  return description, params, base_url
212
 
213
 
 
 
 
 
 
 
 
 
 
 
 
 
 
214
  # ---------------------------------------------------------------------------
215
  # Environment
216
  # ---------------------------------------------------------------------------
@@ -235,6 +291,7 @@ class HARvestGymEnvironment(Environment):
235
  self._episode_store: dict = {} # embeddings, BM25 corpus, etc.
236
  self._called_paths: set = set() # for new-path reward
237
  self._last_curl_commands: list = [] # for duplicate detection
 
238
  self._step_rewards: list[float] = []
239
  self._done = False
240
 
@@ -297,6 +354,12 @@ class HARvestGymEnvironment(Environment):
297
  task_name = self._task_name
298
  if task_name in TASK_NAME_TO_TEMPLATE:
299
  return TASK_NAME_TO_TEMPLATE[task_name]
 
 
 
 
 
 
300
  # Try integer
301
  try:
302
  tid = int(task_name)
@@ -310,17 +373,30 @@ class HARvestGymEnvironment(Environment):
310
  """Reset environment: clear episode state, sample new task."""
311
  from .episode import Episode, Task
312
 
313
- template_id = self._get_template_id()
314
- description, params, base_url = _sample_task(template_id, self._parameter_pools)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
315
 
316
- meta = TEMPLATE_META[template_id]
317
  self._current_task = Task(
318
  template_id=template_id,
319
  description=description,
320
  params=params,
321
- app=meta["app"],
322
  base_url=base_url,
323
- difficulty=meta["tier"],
324
  )
325
 
326
  self._episode = Episode(task=self._current_task)
@@ -328,6 +404,7 @@ class HARvestGymEnvironment(Environment):
328
  self._episode_store = {}
329
  self._called_paths = set()
330
  self._last_curl_commands = []
 
331
  self._step_rewards = []
332
  self._done = False
333
  self._state = State(episode_id=str(uuid4()), step_count=0)
@@ -344,8 +421,8 @@ class HARvestGymEnvironment(Environment):
344
  reward=0.0,
345
  metadata={
346
  "template_id": template_id,
347
- "difficulty": meta["tier"],
348
- "app": meta["app"],
349
  },
350
  )
351
 
@@ -397,7 +474,9 @@ class HARvestGymEnvironment(Environment):
397
  headers=parsed["headers"],
398
  body=parsed["body"],
399
  status_code=resp.get("status_code", 0),
400
- response_body=resp.get("body"),
 
 
401
  response_headers=resp.get("headers", {}),
402
  )
403
  except Exception:
@@ -502,18 +581,33 @@ class HARvestGymEnvironment(Environment):
502
  reward += PENALTY_MALFORMED_CURL
503
  elif 200 <= status < 300:
504
  reward += REWARD_VALID_API_CALL
505
- # New path bonus
506
  from urllib.parse import urlparse
507
  from .tools.browser_agent import _normalise_path
508
  try:
509
- parsed_for_path = __import__("shlex").split(command)
510
- for t in parsed_for_path:
511
- if t.startswith("http"):
512
- path = _normalise_path(urlparse(t.strip("'\"")).path)
513
- if path and path not in self._called_paths:
514
- self._called_paths.add(path)
515
- reward += REWARD_NEW_PATH
 
 
 
 
 
 
 
516
  break
 
 
 
 
 
 
 
 
517
  except Exception:
518
  pass
519
  elif 400 <= status < 500:
 
75
  REWARD_CORRECT_PARAM = 0.25 # judge: correct parameter sourcing (applied at end)
76
  REWARD_SESSION_VALUE = 0.1 # auth token/cookie correctly used
77
  PENALTY_REPEATED_CALL = -0.15 # exact duplicate curl command
78
+ PENALTY_REPEATED_DIFF_PARAM_CALL = -0.05 # duplicate curl but with different parameters
79
+ PENALTY_REPEATED_PATH = -0.15 # same (method, normalised path) called more than once
80
  PENALTY_BROWSER_AGENT_AGAIN = -0.3 # browser_agent called after step 1
81
  PENALTY_MALFORMED_CURL = -0.1 # curl can't be parsed/executed
82
  PENALTY_4XX = -0.05 # recoverable HTTP error
 
105
  "har_pipeline_hard": 6,
106
  }
107
 
108
+ TEMPLATE_DESCRIPTIONS: dict[int, list[str]] = {
109
+ 1: [
110
+ "List products in category {category_name}",
111
+ "Show all products under the {category_name} category",
112
+ "Fetch the product listing for the '{category_name}' category",
113
+ "What products are available in the {category_name} category?",
114
+ ],
115
+ 2: [
116
+ "Retrieve the Wikipedia article for '{title}'",
117
+ "Fetch the Wikipedia page about '{title}'",
118
+ "Get the Wikipedia entry for '{title}'",
119
+ "Look up '{title}' on Wikipedia and return the article",
120
+ ],
121
+ 3: [
122
+ "Find '{product_name}' in the store and add it to the shopping cart",
123
+ "Add '{product_name}' to the cart",
124
+ "Shop for '{product_name}' and put it in the cart",
125
+ "I want to buy '{product_name}' — add it to my cart",
126
+ ],
127
+ 4: [
128
+ "Retrieve all posts in the '{forum_category}' forum (you must log in first)",
129
+ "Fetch the post list for the '{forum_category}' forum category",
130
+ "Get all threads in the '{forum_category}' forum section",
131
+ "List the forum posts under '{forum_category}' (authentication required)",
132
+ ],
133
+ 5: [
134
+ "Create a post titled '{title}' in the '{category}' forum. Note: authentication is required.",
135
+ "Post a new thread called '{title}' in the '{category}' forum",
136
+ "Submit a forum post with the title '{title}' to the '{category}' section",
137
+ "Publish '{title}' as a new post in the '{category}' forum",
138
+ ],
139
+ 6: [
140
+ "Complete a full guest checkout for '{product_name}'. The checkout involves multiple dependent steps — each step produces a value needed by the next. The task is complete when a confirmed order is placed.",
141
+ "Place a guest order for '{product_name}'. The process spans several API calls that build on each other; you are done when an order confirmation is received.",
142
+ "Buy '{product_name}' as a guest user and complete the checkout. Each stage of the checkout requires information returned by the previous stage.",
143
+ "Finish a guest checkout for '{product_name}'. Work through each step in sequence — the output of every step feeds into the next — until the order is confirmed.",
144
+ ],
145
+ 7: [
146
+ "Create a new product in the admin panel with SKU '{sku}' and price {price}. Admin access is required.",
147
+ "Add a product to the catalog via the admin interface: SKU '{sku}', price {price}",
148
+ "As an admin, create a new product listing with SKU '{sku}' priced at {price}",
149
+ "Use admin credentials to create a product with SKU '{sku}' and a price of {price}",
150
+ ],
151
  }
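The paraphrase pools above are consumed via a `random.choice(...).format(**params)` pattern; a self-contained sketch (pool content copied from template 1 above):

```python
import random

TEMPLATE_DESCRIPTIONS = {
    1: [
        "List products in category {category_name}",
        "Show all products under the {category_name} category",
    ],
}

params = {"category_name": "Gear"}
# Pick one paraphrase at random and fill in the sampled parameters.
description = random.choice(TEMPLATE_DESCRIPTIONS[1]).format(**params)
```

Every paraphrase in a pool embeds the same placeholders, so any choice yields a description containing the sampled parameter values.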
152
 
153
 
 
176
  items = pool.get("category_name", [{"name": "Gear", "category_id": 3}])
177
  chosen = random.choice(items)
178
  params = {"category_name": chosen["name"], "category_id": chosen.get("category_id")}
179
+ description = random.choice(TEMPLATE_DESCRIPTIONS[1]).format(**params)
180
 
181
  elif template_id == 2:
182
  items = pool.get("title", [{"title": "Python (programming language)", "expected_slug": "Python_(programming_language)"}])
 
185
  chosen = random.choice(items)
186
  title = chosen.get("title", chosen) if isinstance(chosen, dict) else chosen
187
  params = {"title": title, "expected_slug": chosen.get("expected_slug", title.replace(" ", "_"))}
188
+ description = random.choice(TEMPLATE_DESCRIPTIONS[2]).format(**params)
189
 
190
  elif template_id == 3:
191
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
 
194
  chosen = random.choice(items)
195
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
196
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
197
+ product_id = chosen.get("product_id") if isinstance(chosen, dict) else None
198
  params = {"product_name": product_name, "sku": sku}
199
+ if product_id:
200
+ params["product_id"] = product_id
201
+ description = random.choice(TEMPLATE_DESCRIPTIONS[3]).format(**params)
202
 
203
  elif template_id == 4:
204
  items = pool.get("forum_category", [{"slug": "general", "name": "General"}])
 
207
  chosen = random.choice(items)
208
  forum_cat = chosen.get("slug", chosen.get("name", "general")) if isinstance(chosen, dict) else chosen
209
  params = {"forum_category": forum_cat}
210
+ description = random.choice(TEMPLATE_DESCRIPTIONS[4]).format(**params)
211
 
212
  elif template_id == 5:
213
  categories = pool.get("forum_category", [{"slug": "general"}])
 
220
  chosen_title = random.choice(titles) if isinstance(titles[0], str) else random.choice(titles).get("title", "Test post")
221
  forum_cat = chosen_cat.get("slug", "general") if isinstance(chosen_cat, dict) else chosen_cat
222
  params = {"title": chosen_title, "category": forum_cat}
223
+ description = random.choice(TEMPLATE_DESCRIPTIONS[5]).format(**params)
224
 
225
  elif template_id == 6:
226
  items = pool.get("product_name", [{"name": "Radiant Tee", "sku": "MH01"}])
 
229
  chosen = random.choice(items)
230
  product_name = chosen.get("name", chosen) if isinstance(chosen, dict) else chosen
231
  sku = chosen.get("sku", "") if isinstance(chosen, dict) else ""
232
+ product_id = chosen.get("product_id") if isinstance(chosen, dict) else None
233
  params = {"product_name": product_name, "sku": sku}
234
+ if product_id:
235
+ params["product_id"] = product_id
236
+ description = random.choice(TEMPLATE_DESCRIPTIONS[6]).format(**params)
237
 
238
  elif template_id == 7:
239
  items = pool.get("admin_sku", [{"sku": "HAR-TEST-001", "price": "29.99"}])
 
243
  sku = chosen.get("sku", "HAR-TEST-001") if isinstance(chosen, dict) else chosen
244
  price = str(chosen.get("price", "29.99")) if isinstance(chosen, dict) else "29.99"
245
  params = {"sku": sku, "price": price}
246
+ description = random.choice(TEMPLATE_DESCRIPTIONS[7]).format(**params)
247
 
248
  else:
249
  params = {}
 
254
  return description, params, base_url
255
 
256
 
257
+ def _load_fixed_task_from_env() -> dict | None:
258
+ """Load an exact task specification when the caller wants deterministic reset()."""
259
+ raw = os.environ.get("HARVGYM_TASK_SPEC_JSON", "").strip()
260
+ if not raw:
261
+ return None
262
+ try:
263
+ parsed = json.loads(raw)
264
+ except json.JSONDecodeError:
265
+ print("[HARvestGym] Ignoring invalid HARVGYM_TASK_SPEC_JSON", flush=True)
266
+ return None
267
+ return parsed if isinstance(parsed, dict) else None
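A caller can pin `reset()` to an exact task by exporting `HARVGYM_TASK_SPEC_JSON` before constructing the environment. A sketch of the round-trip, with the parsing path of `_load_fixed_task_from_env` inlined for illustration (the spec keys mirror what `reset()` reads):

```python
import json
import os

spec = {
    "template_id": 1,
    "params": {"category_name": "Gear", "category_id": 3},
    "description": "List products in category Gear",
}
os.environ["HARVGYM_TASK_SPEC_JSON"] = json.dumps(spec)

# Same parsing logic as the loader above, inlined:
raw = os.environ.get("HARVGYM_TASK_SPEC_JSON", "").strip()
try:
    parsed = json.loads(raw) if raw else None
except json.JSONDecodeError:
    parsed = None
fixed = parsed if isinstance(parsed, dict) else None
```

Invalid JSON or a non-dict payload degrades gracefully to `None`, so training falls back to random sampling rather than crashing.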
268
+
269
+
270
  # ---------------------------------------------------------------------------
271
  # Environment
272
  # ---------------------------------------------------------------------------
 
291
  self._episode_store: dict = {} # embeddings, BM25 corpus, etc.
292
  self._called_paths: set = set() # for new-path reward
293
  self._last_curl_commands: list = [] # for duplicate detection
294
+ self._called_methods_paths: list[tuple[str, str]] = [] # for same-path penalty
295
  self._step_rewards: list[float] = []
296
  self._done = False
297
 
 
354
  task_name = self._task_name
355
  if task_name in TASK_NAME_TO_TEMPLATE:
356
  return TASK_NAME_TO_TEMPLATE[task_name]
357
+ if task_name.startswith("easy_"):
358
+ return 1
359
+ if task_name.startswith("medium_"):
360
+ return 3
361
+ if task_name.startswith("hard_"):
362
+ return 6
363
  # Try integer
364
  try:
365
  tid = int(task_name)
 
373
  """Reset environment: clear episode state, sample new task."""
374
  from .episode import Episode, Task
375
 
376
+ fixed_task = _load_fixed_task_from_env()
377
+
378
+ if fixed_task:
379
+ template_id = int(fixed_task.get("template_id", self._get_template_id()))
380
+ meta = TEMPLATE_META.get(template_id, TEMPLATE_META[self._get_template_id()])
381
+ params = dict(fixed_task.get("params") or {})
382
+ description = fixed_task.get("description") or TEMPLATE_DESCRIPTIONS[template_id][0].format(**params)
383
+ base_url = fixed_task.get("base_url") or f"http://{EC2_HOST}:{meta['base_url_port']}/"
384
+ difficulty = fixed_task.get("difficulty") or meta["tier"]
385
+ app = fixed_task.get("app") or meta["app"]
386
+ else:
387
+ template_id = self._get_template_id()
388
+ description, params, base_url = _sample_task(template_id, self._parameter_pools)
389
+ meta = TEMPLATE_META[template_id]
390
+ difficulty = meta["tier"]
391
+ app = meta["app"]
392
 
 
393
  self._current_task = Task(
394
  template_id=template_id,
395
  description=description,
396
  params=params,
397
+ app=app,
398
  base_url=base_url,
399
+ difficulty=difficulty,
400
  )
401
 
402
  self._episode = Episode(task=self._current_task)
 
404
  self._episode_store = {}
405
  self._called_paths = set()
406
  self._last_curl_commands = []
407
+ self._called_methods_paths = []
408
  self._step_rewards = []
409
  self._done = False
410
  self._state = State(episode_id=str(uuid4()), step_count=0)
 
421
  reward=0.0,
422
  metadata={
423
  "template_id": template_id,
424
+ "difficulty": difficulty,
425
+ "app": app,
426
  },
427
  )
428
 
 
474
  headers=parsed["headers"],
475
  body=parsed["body"],
476
  status_code=resp.get("status_code", 0),
477
+ # Use _judge_body (full structured body) for judge grading;
478
+ # falls back to body (truncated) if not present
479
+ response_body=resp.get("_judge_body", resp.get("body")),
480
  response_headers=resp.get("headers", {}),
481
  )
482
  except Exception:
 
581
  reward += PENALTY_MALFORMED_CURL
582
  elif 200 <= status < 300:
583
  reward += REWARD_VALID_API_CALL
584
+ # New path bonus + same-path penalty
585
  from urllib.parse import urlparse
586
  from .tools.browser_agent import _normalise_path
587
  try:
588
+ import shlex as _shlex
589
+ # Extract HTTP method (-X flag or infer from data flags)
590
+ _tokens = _shlex.split(command)
591
+ _method = "GET"
592
+ for _i, _tok in enumerate(_tokens):
593
+ if _tok in ("-X", "--request") and _i + 1 < len(_tokens):
594
+ _method = _tokens[_i + 1].upper()
595
+ break
596
+ if _method == "GET" and any(t in ("-d", "--data", "--data-raw", "--data-binary", "--data-urlencode", "-F", "--form") for t in _tokens):
597
+ _method = "POST"
598
+ _norm_path = None
599
+ for _t in _tokens:
600
+ if _t.startswith("http"):
601
+ _norm_path = _normalise_path(urlparse(_t.strip("'\"")).path)
602
  break
603
+ if _norm_path:
604
+ _mp = (_method, _norm_path)
605
+ if _mp in self._called_methods_paths:
606
+ reward += PENALTY_REPEATED_PATH
607
+ self._called_methods_paths.append(_mp)
608
+ if _norm_path not in self._called_paths:
609
+ self._called_paths.add(_norm_path)
610
+ reward += REWARD_NEW_PATH
611
  except Exception:
612
  pass
613
  elif 400 <= status < 500:
server/tools/browser_agent.py CHANGED
@@ -7,24 +7,17 @@ calls, REST endpoints, form submissions), and builds embeddings via the
7
  HuggingFace Inference API for semantic search_endpoints().
8
 
9
  Architecture:
 
 
 
 
 
 
 
 
10
  - Embeddings are cached on disk via embed_cache.py (max 2000 entries).
11
- On the first run for an app, the API is called once. All subsequent
12
- runs (and every episode within a training run) are pure cache hits —
13
- zero API cost.
14
-
15
- - Source priority:
16
- 1. HAR file (primary) — endpoints observed from browser traffic.
17
- If HAR has < HAR_MIN_ENTRIES meaningful endpoints, it is a partial
18
- recording and we augment with the API catalog (see below).
19
- 2. API catalog (fallback) — full structured spec extracted from source
20
- code. Used ONLY when the HAR is sparse. This is equivalent to
21
- the "live browser session" described in BROWSER_AGENT.md §Stage 2.
22
-
23
- - The catalog is ALSO used by the judge for parameter-sourcing grading.
24
- It serves double duty, but the two uses are completely independent:
25
- the judge compares tool call parameters against catalog ground truth,
26
- while the agent uses catalog entries as a search corpus when HAR alone
27
- is insufficient.
28
  """
29
 
30
  from __future__ import annotations
@@ -43,11 +36,6 @@ import numpy as np
43
  # ---------------------------------------------------------------------------
44
 
45
  HARS_DIR = Path(__file__).parent.parent.parent / "hars"
46
- CATALOGS_DIR = Path(__file__).parent.parent.parent / "catalogs"
47
-
48
- # If a HAR yields fewer than this many unique endpoints it is considered a
49
- # partial recording and the API catalog is used to fill in the rest.
50
- HAR_MIN_ENTRIES = 10
51
 
52
  HAR_MAP: dict[str, str] = {
53
  ":7770": "shopping.har",
@@ -140,6 +128,11 @@ def _is_api_like(path: str, method: str, resp_ct: str, req_ct: str) -> bool:
140
  return False
141
 
142
 
 
 
 
 
 
143
  def _normalise_path(path: str) -> str:
144
  for pattern, replacement in _ID_PATTERNS:
145
  path = pattern.sub(replacement, path)
@@ -195,9 +188,14 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
195
  """
196
  Extract an OpenAPI-like spec from HAR data.
197
 
198
- Includes: REST calls, XHR/fetch, form POSTs, any JSON-responding GET.
 
 
 
199
  Excludes: static assets (JS/CSS/images/fonts), analytics, CDN.
200
  """
 
 
201
  entries = har_data.get("log", {}).get("entries", [])
202
  seen: set[str] = set()
203
  spec_entries = []
@@ -219,7 +217,10 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
219
  parsed_url = urlparse(raw_url)
220
  path = parsed_url.path
221
 
222
- if not _is_api_like(path, method, resp_ct, req_ct):
 
 
 
223
  continue
224
 
225
  path_norm = _normalise_path(path)
@@ -233,16 +234,56 @@ def extract_openapi_spec(har_data: dict, app_base_url: str) -> list[dict]:
233
  for h in req.get("headers", [])
234
  )
235
 
236
- spec_entries.append({
237
- "method": method,
238
- "path": path_norm,
239
- "query_params": parsed_url.query or None,
240
- "request_body": _extract_body(req),
241
- "status_code": resp.get("status", 0),
242
- "response_content_type": resp_ct,
243
- "response_body_sample": _truncate_response_sample(resp),
244
- "auth_observed": has_auth,
245
- })
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
246
 
247
  return spec_entries
248
 
@@ -255,16 +296,26 @@ def spec_entry_to_text(entry: dict, app_name: str) -> str:
255
  f"status: {entry['status_code']}",
256
  f"auth: {'required' if entry['auth_observed'] else 'none'}",
257
  ]
258
- if entry.get("query_params"):
259
- parts.append(f"query: {entry['query_params']}")
260
- if entry.get("request_body"):
261
- body = entry["request_body"]
262
- body_str = json.dumps(body)[:_BODY_SAMPLE_CHARS] if not isinstance(body, str) else body[:_BODY_SAMPLE_CHARS]
263
- parts.append(f"body: {body_str}")
264
- if entry.get("response_body_sample") is not None:
265
- rsp = entry["response_body_sample"]
266
- rsp_str = json.dumps(rsp)[:_BODY_SAMPLE_CHARS] if not isinstance(rsp, str) else str(rsp)[:_BODY_SAMPLE_CHARS]
267
- parts.append(f"response_sample: {rsp_str}")
 
 
 
 
 
 
 
 
 
 
268
  return " | ".join(parts)
269
 
270
 
@@ -385,57 +436,9 @@ def embed_query_via_api(query: str) -> np.ndarray | None:
385
  return _embed_with_cache([query])
386
 
387
 
388
- def catalog_to_spec_entries(app_name: str) -> list[dict]:
389
- """
390
- Load the API catalog as spec entries.
391
-
392
- Used ONLY when the HAR yields fewer than HAR_MIN_ENTRIES endpoints
393
- (i.e. it is a partial/stub recording). This is equivalent to the
394
- live-browser-session fallback described in BROWSER_AGENT.md §Stage 2.
395
-
396
- The judge uses the same catalog for parameter-sourcing grading, but
397
- the two uses are independent — the agent's search corpus and the
398
- judge's ground-truth are different concepts that happen to share the
399
- same underlying data file.
400
- """
401
- catalog_path = CATALOGS_DIR / f"{app_name}.json"
402
- if not catalog_path.exists():
403
- return []
404
- try:
405
- with open(catalog_path) as f:
406
- data = json.load(f)
407
- endpoints = data if isinstance(data, list) else data.get("endpoints", [])
408
- spec_entries = []
409
- for ep in endpoints:
410
- endpoint_str = ep.get("endpoint", "")
411
- if endpoint_str and " " in endpoint_str:
412
- method, path = endpoint_str.split(" ", 1)
413
- method = method.upper()
414
- else:
415
- path = ep.get("path", endpoint_str)
416
- method = ep.get("method", "GET").upper()
417
- if not path:
418
- continue
419
- auth = ep.get("auth", ep.get("authentication", "none"))
420
- spec_entries.append({
421
- "method": method,
422
- "path": path,
423
- "query_params": None,
424
- "request_body": ep.get("body_params") or ep.get("body"),
425
- "status_code": 200,
426
- "response_content_type": "application/json",
427
- "response_body_sample": ep.get("response_fields") or ep.get("response_sample"),
428
- "auth_observed": auth not in ("none", "None", None, ""),
429
- })
430
- return spec_entries
431
- except Exception as e:
432
- print(f"[browser_agent] Could not load catalog '{app_name}': {e}", flush=True)
433
- return []
434
-
435
-
436
  def build_endpoint_embeddings(spec_entries: list[dict], app_name: str):
437
  """
438
- Build embeddings for all spec entries (HAR-extracted + catalog fallback).
439
  Returns (embeddings_array, text_chunks).
440
  Embeddings are retrieved from or saved to the persistent cache.
441
  """
@@ -512,22 +515,6 @@ def run_browser_agent(task: str, url: str, episode_store=None) -> dict:
512
  flush=True,
513
  )
514
 
515
- # Augment with catalog when HAR is a partial recording
516
- # (The catalog = source-code-extracted API spec; serves the same role as a
517
- # live browser session when no full HAR is available.)
518
- if len(spec_entries) < HAR_MIN_ENTRIES:
519
- catalog_entries = catalog_to_spec_entries(app_name)
520
- if catalog_entries:
521
- print(
522
- f"[browser_agent] HAR sparse ({len(spec_entries)} entries < {HAR_MIN_ENTRIES}), "
523
- f"augmenting from catalog ({len(catalog_entries)} entries)",
524
- flush=True,
525
- )
526
- har_paths = {e["path"] for e in spec_entries}
527
- for ce in catalog_entries:
528
- if ce["path"] not in har_paths:
529
- spec_entries.append(ce)
530
-
531
  # Build / retrieve embeddings via cache
532
  if spec_entries and episode_store is not None:
533
  try:
@@ -547,14 +534,19 @@ def run_browser_agent(task: str, url: str, episode_store=None) -> dict:
547
  _store_empty(episode_store, app_name)
548
 
549
  summary = [{"method": e["method"], "path": e["path"]} for e in spec_entries]
 
 
550
  return {
551
  "app": app_name,
552
  "endpoints": summary,
553
  "total_endpoints": len(summary),
 
 
554
  "note": (
555
- f"Discovered {len(summary)} API endpoints from recorded traffic. "
556
- "Use search_endpoints(query) to get full schema, parameters, and auth "
557
- "details for any endpoint."
 
558
  ),
559
  }
560
 
 
7
  HuggingFace Inference API for semantic search_endpoints().
8
 
9
  Architecture:
10
+ - The HAR file is the sole source of the agent's API knowledge.
11
+ The agent discovers endpoints only from what was recorded in the HAR.
12
+ If the HAR is sparse, the browser agent recording needs to be improved —
13
+ the product does not patch this by injecting other data sources.
14
+
15
+ - The API catalog (catalogs/*.json) is used exclusively by the judge
16
+ for parameter-sourcing grading. It plays no role in the training loop.
17
+
18
  - Embeddings are cached on disk via embed_cache.py (max 2000 entries).
19
+ First run: calls HF Inference API. All subsequent episodes in the same
20
+ training run are pure cache hits — zero API cost.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
  """
22
 
23
  from __future__ import annotations
 
36
  # ---------------------------------------------------------------------------
37
 
38
  HARS_DIR = Path(__file__).parent.parent.parent / "hars"
 
 
 
 
 
39
 
40
  HAR_MAP: dict[str, str] = {
41
  ":7770": "shopping.har",
 
128
  return False
129
 
130
 
131
+ def _is_html_page(method: str, resp_ct: str) -> bool:
132
+ """Return True for HTML GET responses that may contain SSR data."""
133
+ return method == "GET" and "text/html" in resp_ct
134
+
135
+
136
  def _normalise_path(path: str) -> str:
137
  for pattern, replacement in _ID_PATTERNS:
138
  path = pattern.sub(replacement, path)
 
188
  """
189
  Extract an OpenAPI-like spec from HAR data.
190
 
191
+ Includes:
192
+ - REST calls, XHR/fetch, form POSTs, any JSON-responding GET
193
+ - HTML GET pages that have a non-empty response body (distilled via html_distiller)
194
+
195
  Excludes: static assets (JS/CSS/images/fonts), analytics, CDN.
196
  """
197
+ from .html_distiller import distill_html
198
+
199
  entries = har_data.get("log", {}).get("entries", [])
200
  seen: set[str] = set()
201
  spec_entries = []
 
217
  parsed_url = urlparse(raw_url)
218
  path = parsed_url.path
219
 
220
+ is_html = _is_html_page(method, resp_ct)
221
+ is_api = _is_api_like(path, method, resp_ct, req_ct)
222
+
223
+ if not is_api and not is_html:
224
  continue
225
 
226
  path_norm = _normalise_path(path)
 
234
  for h in req.get("headers", [])
235
  )
236
 
237
+ if is_html:
238
+ # Attempt to distil the HTML body captured in the HAR
239
+ html_body = entry.get("response", {}).get("content", {}).get("text", "") or ""
240
+ if not html_body:
241
+ # HAR was recorded without "Save response body" — still include the
242
+ # page as a stub so the agent knows the route exists
243
+ distilled = None
244
+ distilled_summary = None
245
+ else:
246
+ distilled = distill_html(html_body, base_url=raw_url)
247
+ # Build a short summary for the spec text (used for embeddings)
248
+ blob_count = len(distilled.get("data_blobs", []))
249
+ form_count = len(distilled.get("forms", []))
250
+ blob_keys = []
251
+ for b in distilled.get("data_blobs", [])[:3]:
252
+ blob_keys.extend(b.get("keys", [])[:5])
253
+ distilled_summary = {
254
+ "page_type": distilled.get("page_type"),
255
+ "title": distilled.get("title"),
256
+ "data_blobs": blob_count,
257
+ "forms": form_count,
258
+ "blob_top_keys": blob_keys[:20],
259
+ "text_preview": (distilled.get("text") or "")[:200],
260
+ }
261
+
262
+ spec_entries.append({
263
+ "method": method,
264
+ "path": path_norm,
265
+ "query_params": parsed_url.query or None,
266
+ "request_body": None,
267
+ "status_code": resp.get("status", 0),
268
+ "response_content_type": resp_ct,
269
+ "response_body_sample": distilled_summary,
270
+ "auth_observed": has_auth,
271
+ "is_html_page": True,
272
+ # Store full distilled dict so the agent can retrieve it via search_endpoints
273
+ "_distilled": distilled,
274
+ })
275
+ else:
276
+ spec_entries.append({
277
+ "method": method,
278
+ "path": path_norm,
279
+ "query_params": parsed_url.query or None,
280
+ "request_body": _extract_body(req),
281
+ "status_code": resp.get("status", 0),
282
+ "response_content_type": resp_ct,
283
+ "response_body_sample": _truncate_response_sample(resp),
284
+ "auth_observed": has_auth,
285
+ "is_html_page": False,
286
+ })
287
 
288
  return spec_entries
289
 
 
296
  f"status: {entry['status_code']}",
297
  f"auth: {'required' if entry['auth_observed'] else 'none'}",
298
  ]
299
+ if entry.get("is_html_page"):
300
+ parts.append("type: html_page")
301
+ sample = entry.get("response_body_sample") or {}
302
+ if sample.get("title"):
303
+ parts.append(f"title: {sample['title']}")
304
+ if sample.get("blob_top_keys"):
305
+ parts.append(f"data_keys: {' '.join(sample['blob_top_keys'][:15])}")
306
+ if sample.get("text_preview"):
307
+ parts.append(f"text: {sample['text_preview'][:200]}")
308
+ else:
309
+ if entry.get("query_params"):
310
+ parts.append(f"query: {entry['query_params']}")
311
+ if entry.get("request_body"):
312
+ body = entry["request_body"]
313
+ body_str = json.dumps(body)[:_BODY_SAMPLE_CHARS] if not isinstance(body, str) else body[:_BODY_SAMPLE_CHARS]
314
+ parts.append(f"body: {body_str}")
315
+ if entry.get("response_body_sample") is not None:
316
+ rsp = entry["response_body_sample"]
317
+ rsp_str = json.dumps(rsp)[:_BODY_SAMPLE_CHARS] if not isinstance(rsp, str) else str(rsp)[:_BODY_SAMPLE_CHARS]
318
+ parts.append(f"response_sample: {rsp_str}")
319
  return " | ".join(parts)
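For an HTML-page entry, the resulting embedding text looks roughly like this. This is a standalone re-implementation of the `is_html_page` branch above, with hypothetical sample values:

```python
entry = {
    "method": "GET",
    "path": "/gear.html",
    "is_html_page": True,
    "response_body_sample": {
        "title": "Gear",
        "blob_top_keys": ["items", "total_count"],
        "text_preview": "Gear category listing",
    },
}

parts = [f"{entry['method']} {entry['path']}", "type: html_page"]
sample = entry["response_body_sample"]
if sample.get("title"):
    parts.append(f"title: {sample['title']}")
if sample.get("blob_top_keys"):
    parts.append(f"data_keys: {' '.join(sample['blob_top_keys'][:15])}")
if sample.get("text_preview"):
    parts.append(f"text: {sample['text_preview'][:200]}")
text = " | ".join(parts)
```

The blob keys and text preview give the embedding model semantic signal (product names, field names) that a bare `GET /gear.html` line would lack.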
320
 
321
 
 
436
  return _embed_with_cache([query])
437
 
438
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
439
  def build_endpoint_embeddings(spec_entries: list[dict], app_name: str):
440
  """
441
+ Build embeddings for HAR-extracted spec entries.
442
  Returns (embeddings_array, text_chunks).
443
  Embeddings are retrieved from or saved to the persistent cache.
444
  """
 
515
  flush=True,
516
  )
517
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
518
  # Build / retrieve embeddings via cache
519
  if spec_entries and episode_store is not None:
520
  try:
 
534
  _store_empty(episode_store, app_name)
535
 
536
  summary = [{"method": e["method"], "path": e["path"]} for e in spec_entries]
537
+ api_count = sum(1 for e in spec_entries if not e.get("is_html_page"))
538
+ html_count = sum(1 for e in spec_entries if e.get("is_html_page"))
539
  return {
540
  "app": app_name,
541
  "endpoints": summary,
542
  "total_endpoints": len(summary),
543
+ "api_endpoints": api_count,
544
+ "html_pages": html_count,
545
  "note": (
546
+ f"Discovered {api_count} API endpoints and {html_count} HTML page(s) "
547
+ f"from recorded traffic. "
548
+ "Use search_endpoints(query) to get full schema, parameters, auth details, "
549
+ "and page content (for HTML pages: embedded data blobs, forms, CSRF tokens)."
550
  ),
551
  }
552
 
server/tools/curl_exec.py CHANGED
@@ -375,31 +375,49 @@ def curl_exec(command: str, session_state: dict, episode_store: dict,
375
  except (json.JSONDecodeError, ValueError):
376
  parsed_body = body_text
377
 
378
- # Extract tokens from body
379
- _extract_tokens_from_body(parsed_body, session_state)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
380
 
381
- # Index into episode BM25 store BEFORE truncation
382
  _index_into_episode_store(
383
  episode_store=episode_store,
384
  request_body=parsed["body"],
385
- response_body=parsed_body,
386
  url=parsed["url"],
387
  method=parsed["method"],
388
  status_code=status_code,
389
  )
390
 
391
- # Apply smart truncation
392
- if status_code >= 400:
393
- # Never truncate errors
394
- truncated_body = parsed_body
395
- else:
396
- body_for_truncation = body_text if isinstance(parsed_body, str) else json.dumps(parsed_body)
397
- truncated_body = smart_truncate(body_for_truncation, resp_ct)
398
-
399
  return {
400
  "status_code": status_code,
401
  "headers": resp_headers,
402
  "body": truncated_body,
 
 
 
403
  }
404
 
405
 
@@ -410,10 +428,18 @@ def curl_exec(command: str, session_state: dict, episode_store: dict,
410
  def _index_into_episode_store(episode_store: dict, request_body: Any,
411
  response_body: Any, url: str, method: str,
412
  status_code: int) -> None:
413
- """Index request/response into episode BM25 store for search_episode_data()."""
 
 
 
 
 
 
 
414
  if "bm25_corpus" not in episode_store:
415
  episode_store["bm25_corpus"] = []
416
  episode_store["bm25_metadata"] = []
 
417
 
418
  def _to_text(obj: Any) -> str:
419
  if obj is None:
@@ -422,13 +448,24 @@ def _index_into_episode_store(episode_store: dict, request_body: Any,
422
  return obj
423
  return json.dumps(obj)
424
 
425
- entry_text = f"url: {url} | method: {method} | status: {status_code} | " \
426
- f"request: {_to_text(request_body)} | response: {_to_text(response_body)}"
 
 
 
 
 
 
 
 
 
427
 
 
428
  episode_store["bm25_corpus"].append(entry_text)
429
  episode_store["bm25_metadata"].append({
430
  "url": url,
431
  "method": method,
432
  "status_code": status_code,
433
- "response_body": response_body,
434
  })
 
 
 
375
  except (json.JSONDecodeError, ValueError):
376
  parsed_body = body_text
377
 
378
+ # Distil HTML responses into structured compact form
379
+ is_html_response = "text/html" in resp_ct
380
+ if is_html_response and isinstance(parsed_body, str) and parsed_body:
381
+ from .html_distiller import distill_html, distill_html_compact
382
+ distilled = distill_html(parsed_body, base_url=parsed["url"])
383
+ # Auto-extract form_key from HTML forms into session_state for reuse
384
+ for form in distilled.get("forms", []):
385
+ fk = form.get("fields", {}).get("form_key")
386
+ if fk and fk != "hidden":
387
+ session_state["form_key"] = fk
388
+ break
389
+ # Store the full distilled dict (not raw HTML) for search_episode_data
390
+ raw_body_for_store = distilled
391
+ # What we return to the agent is the compact text summary
392
+ truncated_body: Any = distill_html_compact(parsed_body, base_url=parsed["url"])
393
+ else:
394
+ raw_body_for_store = parsed_body
395
+ # Extract tokens from body (only for non-HTML responses)
396
+ _extract_tokens_from_body(parsed_body, session_state)
397
+ # Apply smart truncation
398
+ if status_code >= 400:
399
+ truncated_body = parsed_body
400
+ else:
401
+ body_for_truncation = body_text if isinstance(parsed_body, str) else json.dumps(parsed_body)
402
+ truncated_body = smart_truncate(body_for_truncation, resp_ct)
403
 
404
+ # Index into episode BM25 store
405
  _index_into_episode_store(
406
  episode_store=episode_store,
407
  request_body=parsed["body"],
408
+ response_body=raw_body_for_store,
409
  url=parsed["url"],
410
  method=parsed["method"],
411
  status_code=status_code,
412
  )
413
 
 
 
 
 
 
 
 
 
414
  return {
415
  "status_code": status_code,
416
  "headers": resp_headers,
417
  "body": truncated_body,
418
+ # _judge_body: full structured body for the judge (not shown to the model)
419
+ # For HTML: the distilled dict; for JSON/text: same as body
420
+ "_judge_body": raw_body_for_store,
421
  }
422
 
423
 
 
428
  def _index_into_episode_store(episode_store: dict, request_body: Any,
429
  response_body: Any, url: str, method: str,
430
  status_code: int) -> None:
431
+ """
432
+ Index request/response into the episode store for search_episode_data().
433
+
434
+ Three parallel structures are maintained:
435
+ bm25_corpus — truncated text strings for BM25 / embedding (lean, fast)
436
+ bm25_metadata — url/method/status_code per entry (no body, saves memory)
437
+ episode_raw_bodies — {index: full_untruncated_response_body} for retrieval
438
+ """
439
  if "bm25_corpus" not in episode_store:
440
  episode_store["bm25_corpus"] = []
441
  episode_store["bm25_metadata"] = []
442
+ episode_store["episode_raw_bodies"] = {}
443
 
444
  def _to_text(obj: Any) -> str:
445
  if obj is None:
 
448
  return obj
449
  return json.dumps(obj)
450
 
451
+ # Lean text for BM25 / embedding, capped at 2000 chars so embeddings stay within
452
+ # the model's token limit without losing the key signal (url + first part of body).
453
+ # For distilled HTML (stored as a dict), serialize the distilled form — it's already
454
+ # compact (text content, blob keys, form actions) rather than raw HTML.
455
+ resp_text = _to_text(response_body)
456
+ lean_resp = resp_text[:2000] if len(resp_text) > 2000 else resp_text
457
+
458
+ entry_text = (
459
+ f"url: {url} method: {method} status: {status_code} "
460
+ f"request: {_to_text(request_body)} response: {lean_resp}"
461
+ )
462
 
463
+ idx = len(episode_store["bm25_corpus"])
464
  episode_store["bm25_corpus"].append(entry_text)
465
  episode_store["bm25_metadata"].append({
466
  "url": url,
467
  "method": method,
468
  "status_code": status_code,
 
469
  })
470
+ # Store full untruncated body keyed by index — never truncated
471
+ episode_store["episode_raw_bodies"][idx] = response_body
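The index-keyed raw-body store means a search hit can always recover the untruncated payload. A minimal sketch of write and read; the retrieval-by-index step is how a search over `bm25_corpus` would use it (shown manually here):

```python
episode_store = {"bm25_corpus": [], "bm25_metadata": [], "episode_raw_bodies": {}}

body = {"cart_id": "abc123", "items": [{"sku": "MH01", "qty": 1}]}
idx = len(episode_store["bm25_corpus"])
episode_store["bm25_corpus"].append("url: /cart method: POST status: 200 ...")
episode_store["bm25_metadata"].append(
    {"url": "/cart", "method": "POST", "status_code": 200}
)
episode_store["episode_raw_bodies"][idx] = body  # full body, never truncated

# Later: a search hit at position idx recovers the exact response body.
hit = episode_store["episode_raw_bodies"][idx]
```

Because all three structures append in lockstep, the corpus position doubles as the retrieval key, keeping the searchable text lean without losing data.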
server/tools/html_distiller.py ADDED
@@ -0,0 +1,485 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ html_distiller — technology-agnostic HTML distillation for the RL agent.
3
+
4
+ Converts an HTML response body into a compact, structured dict that the agent
5
+ and the embedding index can work with. Raw HTML is never returned as-is —
6
+ it is expensive (200 KB+) and mostly noise (CSS classes, JS bundles, nav chrome).
7
+
8
+ What is extracted (in priority order):
9
+ 1. Embedded JSON data blobs — server-injected structured data that is the
10
+ *actual payload* for SSR pages:
11
+ • <script type="application/json"> (Next.js, generic)
12
+ • <script type="text/x-magento-init"> (Magento 2)
13
+ • window.__INITIAL_STATE__ = {...} (Redux-style SSR)
14
+ • window.__NEXT_DATA__ = {...} (Next.js legacy)
15
+ • window.__nuxt__ = {...} / window.__NUXT__ = {} (Nuxt.js)
16
+ • <script id="__NEXT_DATA__"> (Next.js)
17
+ • Any <script> tag containing only valid JSON
18
+ These are technology-specific patterns, but the extraction logic is written
19
+ generically — it looks for the common conventions rather than hardcoding
20
+ Magento. A React/Next.js app will be handled by the same code path.
21
+
22
+ 2. HTML forms — discovers new POST endpoints (form.action) and captures
23
+ auth-critical fields (CSRF tokens, hidden inputs).
24
+
25
+ 3. Visible text content — the human-readable body after stripping all
26
+ scripts, styles, and nav/header/footer chrome. Capped at MAX_TEXT_CHARS.
27
+
28
+ Output schema (always a dict with the same keys — absent items are None/[]):
29
+ {
30
+ "page_type": str, # "data_page" | "form_page" | "text_page"
31
+ "title": str | None, # <title> text
32
+ "description": str | None, # <meta name="description">
33
+ "data_blobs": [ # extracted JSON payloads
34
+ {"source": str, "data": any, "keys": [str]} # keys = top-level keys
35
+ ],
36
+ "forms": [
37
+ {
38
+ "action": str, # endpoint URL (relative or absolute)
39
+ "method": str, # GET | POST
40
+ "fields": { # name → value (includes hidden inputs)
41
+ "field_name": "field_value_or_type"
42
+ }
43
+ }
44
+ ],
45
+ "text": str | None, # stripped visible text (capped)
46
+ "raw_truncated": str, # first RAW_PREVIEW_CHARS of raw HTML (fallback)
47
+ }
48
+
49
+ Usage:
50
+ from server.tools.html_distiller import distill_html
51
+
52
+ result = distill_html(html_string, base_url="http://example.com/page")
53
+ # result["data_blobs"] — structured data, e.g. product listings
54
+ # result["forms"] — form actions + CSRF tokens
55
+ # result["text"] — stripped readable text
56
+ """
57
+
58
+ from __future__ import annotations
59
+
60
+ import json
61
+ import re
62
+ from typing import Any
63
+ from urllib.parse import urljoin
64
+
65
+ try:
66
+ from bs4 import BeautifulSoup
67
+ _BS4_AVAILABLE = True
68
+ except ImportError:
69
+ _BS4_AVAILABLE = False
70
+
71
+
72
+ # ---------------------------------------------------------------------------
73
+ # Constants
74
+ # ---------------------------------------------------------------------------
75
+
76
+ MAX_TEXT_CHARS = 20000 # max chars of stripped visible text to keep
77
+ MAX_BLOB_KEYS = 40 # max top-level keys to surface from a JSON blob
78
+ MAX_BLOB_DEPTH_PREVIEW = 2 # how many levels of nesting to summarise
79
+ RAW_PREVIEW_CHARS = 1000 # fallback raw HTML preview if BS4 unavailable
80
+ MAX_BLOBS = 10 # max embedded JSON blobs to extract
81
+ MAX_FORMS = 5 # max forms to extract
82
+ MAX_ITEMS_IN_ARRAY = 3 # preview items for large arrays in blobs
83
+
84
+
85
+ # ---------------------------------------------------------------------------
86
+ # Public entry point
87
+ # ---------------------------------------------------------------------------
88
+
89
+ def distill_html(html: str, base_url: str = "") -> dict:
90
+ """
91
+ Distil an HTML page into a structured, compact representation.
92
+
93
+ Args:
94
+ html: Raw HTML string (may be very large).
95
+ base_url: The URL this page was fetched from, used to resolve
96
+ relative URLs in form actions.
97
+
98
+ Returns:
99
+ Distilled dict (see module docstring for schema).
100
+ """
101
+ if not html:
102
+ return _empty_result()
103
+
104
+ if not _BS4_AVAILABLE:
105
+ return {
106
+ **_empty_result(),
107
+ "raw_truncated": html[:RAW_PREVIEW_CHARS],
108
+ "_note": "beautifulsoup4 not installed; only raw preview returned.",
109
+ }
110
+
111
+ try:
112
+ # lxml is faster and more forgiving than html.parser for large pages
113
+ soup = BeautifulSoup(html, "lxml")
114
+ except Exception:
115
+ soup = BeautifulSoup(html, "html.parser")
116
+
117
+ title = _extract_title(soup)
118
+ description = _extract_meta_description(soup)
119
+ data_blobs = _extract_data_blobs(soup)
120
+ forms = _extract_forms(soup, base_url)
121
+ text = _extract_visible_text(soup)
122
+
123
+ # Determine page_type based on what we found
124
+ if data_blobs:
125
+ page_type = "data_page"
126
+ elif forms:
127
+ page_type = "form_page"
128
+ else:
129
+ page_type = "text_page"
130
+
131
+ return {
132
+ "page_type": page_type,
133
+ "title": title,
134
+ "description": description,
135
+ "data_blobs": data_blobs,
136
+ "forms": forms,
137
+ "text": text,
138
+ "raw_truncated": html[:RAW_PREVIEW_CHARS],
139
+ }
140
+
141
+
142
+ def distill_html_compact(html: str, base_url: str = "") -> str:
143
+ """
144
+ Return a compact text representation of the distilled HTML,
145
+ suitable for returning to the agent in curl_exec responses.
146
+
147
+ Aims for < 3000 chars while preserving all actionable information.
148
+ """
149
+ d = distill_html(html, base_url)
150
+
151
+ parts: list[str] = []
152
+
153
+ if d["title"]:
154
+ parts.append(f"[Page: {d['title']}]")
155
+
156
+ if d["description"]:
157
+ parts.append(f"[Description: {d['description']}]")
158
+
159
+ if d["data_blobs"]:
160
+ parts.append(f"[Embedded data — {len(d['data_blobs'])} block(s)]")
161
+ for i, blob in enumerate(d["data_blobs"]):
162
+ src = blob.get("source", "?")
163
+ data = blob.get("data")
164
+ preview = _compact_blob_preview(data)
165
+ parts.append(f" blob[{i}] from <{src}>: {preview}")
166
+
167
+ if d["forms"]:
168
+ parts.append(f"[Forms — {len(d['forms'])} found]")
169
+ for form in d["forms"]:
170
+ action = form["action"] or "(current page)"
171
+ method = form["method"]
172
+ fields = form["fields"]
173
+ # Strip noisy base64-encoded redirect fields; keep actionable fields only
174
+ _SKIP_FIELDS = {"uenc"}
175
+ clean_fields = {k: v for k, v in fields.items() if k not in _SKIP_FIELDS}
176
+ csrf = {k: v for k, v in clean_fields.items()
177
+ if "csrf" in k.lower() or "token" in k.lower()
178
+ or k.startswith("_")}  # note: hidden inputs store their real value, never the literal "hidden"
179
+ field_summary = ", ".join(f"{k}={repr(v)}" for k, v in list(clean_fields.items())[:6])
180
+ parts.append(f" {method} {action}")
181
+ parts.append(f" fields: {field_summary}")
182
+ if csrf:
183
+ parts.append(f" csrf/hidden: {csrf}")
184
+
185
+ if d["text"]:
186
+ parts.append(f"[Text content]\n{d['text'][:800]}")
187
+
188
+ result = "\n".join(parts)
189
+ if not result:
190
+ # Absolute fallback: raw preview
191
+ return html[:RAW_PREVIEW_CHARS]
192
+ return result
193
+
194
+
195
+ # ---------------------------------------------------------------------------
196
+ # Extraction helpers
197
+ # ---------------------------------------------------------------------------
198
+
199
+ def _extract_title(soup) -> str | None:
200
+ tag = soup.find("title")
201
+ if tag:
202
+ return tag.get_text(strip=True) or None
203
+ return None
204
+
205
+
206
+ def _extract_meta_description(soup) -> str | None:
207
+ tag = soup.find("meta", attrs={"name": "description"})
208
+ if tag and tag.get("content"):
209
+ return tag["content"].strip() or None
210
+ return None
211
+
212
+
213
+ # Patterns for window.X = {...} assignments in inline scripts.
+ # Note: the non-greedy brace match captures only flat (un-nested) JSON bodies;
+ # nested objects fail json.loads and are skipped by the caller.
214
+ _WINDOW_ASSIGN_RE = re.compile(
215
+ r'window\.__?([A-Za-z0-9_]+)__?\s*=\s*(\{.*?\}|\[.*?\])',
216
+ re.DOTALL,
217
+ )
218
+
219
+ # Known SSR data script types
220
+ _DATA_SCRIPT_TYPES = {
221
+ "application/json",
222
+ "text/x-magento-init",
223
+ "application/ld+json", # structured data / schema.org
224
+ }
225
+
226
+ # Known SSR script IDs
227
+ _DATA_SCRIPT_IDS = {
228
+ "__next_data__",
229
+ "__nuxt__",
230
+ "initial-state",
231
+ "redux-state",
232
+ "app-state",
233
+ "page-data",
234
+ "server-data",
235
+ "bootstrap-data",
236
+ }
237
+
238
+
239
+ def _try_parse_json(text: str) -> tuple[bool, Any]:
240
+ """Returns (success, parsed_value)."""
241
+ text = text.strip()
242
+ if not text:
243
+ return False, None
244
+ try:
245
+ return True, json.loads(text)
246
+ except (json.JSONDecodeError, ValueError):
247
+ return False, None
248
+
249
+
250
+ def _summarise_json_keys(obj: Any, depth: int = 0) -> list[str]:
251
+ """Return top-level keys (and one level of nested keys) for a JSON object."""
252
+ if not isinstance(obj, dict):
253
+ if isinstance(obj, list) and obj:
254
+ return _summarise_json_keys(obj[0], depth)
255
+ return []
256
+ keys = list(obj.keys())
257
+ if depth < 1:
258
+ nested = []
259
+ for k in keys[:5]:
260
+ v = obj[k]
261
+ if isinstance(v, dict):
262
+ sub = list(v.keys())[:5]
263
+ nested.append(f"{k}.{{{','.join(sub)}}}")
264
+ elif isinstance(v, list) and v and isinstance(v[0], dict):
265
+ sub = list(v[0].keys())[:4]
266
+ nested.append(f"{k}[].{{{','.join(sub)}}}")
267
+ return keys + nested
268
+ return keys
269
+
270
+
271
+ def _extract_data_blobs(soup) -> list[dict]:
272
+ """
273
+ Extract all embedded JSON data blobs from <script> tags and window.X = {...} patterns.
274
+ """
275
+ blobs: list[dict] = []
276
+ seen_sources: set[str] = set()
277
+
278
+ # 1. <script type="..."> tags with known data types
279
+ for script in soup.find_all("script"):
280
+ if len(blobs) >= MAX_BLOBS:
281
+ break
282
+
283
+ script_type = (script.get("type") or "").lower().strip()
284
+ script_id = (script.get("id") or "").lower().strip()
285
+ text = script.string or ""
286
+
287
+ source = None
288
+ if script_type in _DATA_SCRIPT_TYPES:
289
+ source = script_type
290
+ elif script_id in _DATA_SCRIPT_IDS:
291
+ source = f"id={script.get('id')}"
292
+ elif script_type in ("", "text/javascript", "module"):
293
+ # Check for window.X = {...} patterns
294
+ for m in _WINDOW_ASSIGN_RE.finditer(text):
295
+ var_name = f"window.__{m.group(1)}__"
296
+ ok, data = _try_parse_json(m.group(2))
297
+ if ok and isinstance(data, (dict, list)):
298
+ source_key = var_name
299
+ if source_key not in seen_sources:
300
+ seen_sources.add(source_key)
301
+ blobs.append({
302
+ "source": var_name,
303
+ "data": _preview_blob(data),
304
+ "keys": _summarise_json_keys(data)[:MAX_BLOB_KEYS],
305
+ })
306
+ continue # already handled window patterns above
307
+ else:
308
+ continue
309
+
310
+ if not text.strip():
311
+ continue
312
+
313
+ ok, data = _try_parse_json(text)
314
+ if not ok:
315
+ continue
316
+
317
+ # Skip tiny or trivially small blobs (no useful data)
318
+ if isinstance(data, dict) and len(data) <= 1 and not any(
319
+ isinstance(v, (dict, list)) for v in data.values()
320
+ ):
321
+ continue
322
+
323
+ source_key = f"{source}:{script_id or 'anon'}"
324
+ if source_key in seen_sources:
325
+ continue
326
+ seen_sources.add(source_key)
327
+
328
+ blobs.append({
329
+ "source": source,
330
+ "data": _preview_blob(data),
331
+ "keys": _summarise_json_keys(data)[:MAX_BLOB_KEYS],
332
+ })
333
+
334
+ return blobs
335
+
336
+
337
+ def _preview_blob(data: Any) -> Any:
338
+ """
339
+ Return a compact preview of a JSON blob — large arrays are trimmed,
340
+ deeply nested objects are summarised.
341
+ """
342
+ if isinstance(data, list):
343
+ if len(data) > MAX_ITEMS_IN_ARRAY:
344
+ return {
345
+ "sample": [_preview_blob(item) for item in data[:MAX_ITEMS_IN_ARRAY]],
346
+ "total": len(data),
347
+ "_note": f"{len(data)} items total. Use search_episode_data() for specifics.",
348
+ }
349
+ return [_preview_blob(item) for item in data]
350
+
351
+ if isinstance(data, dict):
352
+ result = {}
353
+ for k, v in list(data.items())[:MAX_BLOB_KEYS]:
354
+ if isinstance(v, list) and len(v) > MAX_ITEMS_IN_ARRAY:
355
+ result[k] = {
356
+ "sample": [_preview_blob(i) for i in v[:MAX_ITEMS_IN_ARRAY]],
357
+ "total": len(v),
358
+ "_note": f"{len(v)} items. Use search_episode_data() for specifics.",
359
+ }
360
+ elif isinstance(v, dict) and len(v) > 30:
361
+ # Only collapse very large dicts — preserve small-to-medium ones fully
362
+ # since they often contain critical IDs (e.g. product option configs)
363
+ result[k] = {
364
+ "_keys": list(v.keys())[:20],
365
+ "_note": "large nested object — call search_episode_data() for full content",
366
+ }
367
+ else:
368
+ result[k] = v
369
+ return result
370
+
371
+ return data
372
+
373
+
374
+ def _extract_forms(soup, base_url: str) -> list[dict]:
375
+ """
376
+ Extract all forms: action URL, method, and all named fields with their values.
377
+ Hidden inputs (CSRF tokens, form_key, etc.) are included.
378
+ """
379
+ forms = []
380
+ for form in soup.find_all("form")[:MAX_FORMS]:
381
+ action = form.get("action", "") or ""
382
+ if base_url and action and not action.startswith("http"):
383
+ action = urljoin(base_url, action)
384
+ method = (form.get("method") or "GET").upper()
385
+
386
+ fields: dict[str, str] = {}
387
+ for inp in form.find_all(["input", "select", "textarea"]):
388
+ name = inp.get("name")
389
+ if not name:
390
+ continue
391
+ inp_type = (inp.get("type") or "text").lower()
392
+ value = inp.get("value", "")
393
+ if inp_type == "hidden":
394
+ # Hidden inputs: store actual value (CSRF tokens etc.)
395
+ fields[name] = value
396
+ elif inp_type in ("submit", "button", "reset"):
397
+ continue
398
+ elif inp_type == "checkbox":
399
+ fields[name] = "checkbox"
400
+ elif inp_type == "radio":
401
+ if name not in fields:
402
+ fields[name] = "radio"
403
+ else:
404
+ # text, email, password, number, etc.
405
+ fields[name] = inp_type if not value else value
406
+
407
+ forms.append({
408
+ "action": action,
409
+ "method": method,
410
+ "fields": fields,
411
+ })
412
+
413
+ return forms
414
+
415
+
416
+ # Tags whose text content is irrelevant noise
417
+ _NOISE_TAGS = {
418
+ "script", "style", "noscript", "head", "meta", "link",
419
+ "header", "footer", "nav", "aside",
420
+ "svg", "path", "symbol",
421
+ "[document]",
422
+ }
423
+
424
+
425
+ def _extract_visible_text(soup) -> str | None:
426
+ """
427
+ Extract visible text content from the page.
428
+
429
+ Strips scripts, styles, navigation, and other noise.
430
+ Returns plain text, capped at MAX_TEXT_CHARS.
431
+ """
432
+ # Remove noise tags in-place
433
+ for tag in soup.find_all(_NOISE_TAGS):
434
+ tag.decompose()
435
+
436
+ # Get text from what's left — use separator so words don't jam together
437
+ text = soup.get_text(separator=" ", strip=True)
438
+
439
+ # Collapse whitespace
440
+ text = re.sub(r"\s{2,}", " ", text).strip()
441
+
442
+ if not text:
443
+ return None
444
+
445
+ return text[:MAX_TEXT_CHARS]
446
+
447
+
448
+ def _compact_blob_preview(data: Any) -> str:
449
+ """One-line preview of a JSON blob for the compact text representation."""
450
+ if data is None:
451
+ return "null"
452
+ if isinstance(data, bool):
453
+ return str(data).lower()
454
+ if isinstance(data, (int, float)):
455
+ return str(data)
456
+ if isinstance(data, str):
457
+ return repr(data[:80])
458
+ if isinstance(data, list):
459
+ total = len(data)  # data is a list here; the dict-wrapper preview is handled below
460
+ sample = data[:1]
461
+ if sample:
462
+ first_keys = list(sample[0].keys())[:4] if isinstance(sample[0], dict) else []
463
+ return f"array({total} items), first keys: {first_keys}"
464
+ return f"array({len(data)} items)"
465
+ if isinstance(data, dict):
466
+ # If it has a "total" note it's our preview wrapper
467
+ if "_note" in data and "total" in data:
468
+ sample = data.get("sample", [])
469
+ keys = list(sample[0].keys())[:4] if sample and isinstance(sample[0], dict) else []
470
+ return f"array({data['total']} items), first item keys: {keys}"
471
+ keys = list(data.keys())[:8]
472
+ return f"object({len(data)} keys): {keys}"
473
+ return str(data)[:100]
474
+
475
+
476
+ def _empty_result() -> dict:
477
+ return {
478
+ "page_type": "text_page",
479
+ "title": None,
480
+ "description": None,
481
+ "data_blobs": [],
482
+ "forms": [],
483
+ "text": None,
484
+ "raw_truncated": "",
485
+ }
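The window-assignment extraction convention described in the module docstring can be illustrated with a standalone sketch (stdlib only). The regex below is a simplified, hypothetical variant of the module's `_WINDOW_ASSIGN_RE` and shares its limitation: the non-greedy brace match only captures flat, un-nested JSON bodies.

```python
import json
import re

# Simplified window-assignment pattern: captures the variable name and a
# flat {...} or [...] literal. Nested braces defeat the non-greedy match.
WINDOW_ASSIGN_RE = re.compile(
    r'window\.([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\{.*?\}|\[.*?\])',
    re.DOTALL,
)

def extract_window_blobs(script_text: str) -> dict:
    """Return {var_name: parsed_json} for every flat window.X = {...} assignment."""
    blobs = {}
    for m in WINDOW_ASSIGN_RE.finditer(script_text):
        try:
            blobs[m.group(1)] = json.loads(m.group(2))
        except json.JSONDecodeError:
            continue  # body was not valid JSON (e.g. a JS expression); skip it
    return blobs

script = 'window.__INITIAL_STATE__ = {"cartId": 7, "currency": "USD"};'
print(extract_window_blobs(script))
# → {'__INITIAL_STATE__': {'cartId': 7, 'currency': 'USD'}}
```

A production version needs brace counting (or a tolerant JSON scanner) to handle nested state objects; this sketch only demonstrates the convention the distiller looks for.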
server/tools/search_episode_data.py CHANGED
@@ -1,87 +1,320 @@
1
  """
2
- search_episode_data tool — BM25 + semantic search over accumulated episode response data.
3
 
4
- Searches all request/response bodies from prior curl_exec calls in this episode.
 
 
5
  """
6
 
7
  from __future__ import annotations
8
 
9
  import json
 
10
  import re
11
  from typing import Any
12
 
13
 
14
- def search_episode_data(query: str, episode_store: dict) -> list[dict]:
15
- """
16
- Hybrid BM25 + keyword search over episode accumulated response bodies.
17
 
18
- Args:
19
- query: Keyword or natural language query (e.g. "Radiant Tee sku", "_csrf_token")
20
- episode_store: Per-episode store containing bm25_corpus and bm25_metadata
 
 
21
 
22
- Returns:
23
- Top-5 matching JSON objects from episode history, annotated with step info
24
  """
25
- corpus: list[str] = episode_store.get("bm25_corpus", [])
26
- metadata: list[dict] = episode_store.get("bm25_metadata", [])
 
 
27
 
28
- if not corpus:
29
- return [{"note": "No episode data yet. Make API calls with curl_exec() first."}]
30
 
31
- # Try BM25 ranking
 
 
32
  try:
33
- from rank_bm25 import BM25Okapi
 
 
34
 
35
- tokenized_corpus = [_tokenize(doc) for doc in corpus]
36
- tokenized_query = _tokenize(query)
37
- bm25 = BM25Okapi(tokenized_corpus)
38
- scores = bm25.get_scores(tokenized_query)
39
 
40
- # Get top 5 by BM25 score
 
 
41
  import numpy as np
42
- top_k = min(5, len(scores))
43
- top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
44
-
45
- results = []
46
- for idx in top_indices:
47
- if scores[idx] > 0:
48
- meta = metadata[idx]
49
- result = {
50
- "step": idx + 1,
51
- "url": meta.get("url", ""),
52
- "method": meta.get("method", ""),
53
- "status_code": meta.get("status_code", 0),
54
- "data": meta.get("response_body"),
55
- }
56
- results.append(result)
57
-
58
- if results:
59
- return results
 
 
60
 
 
61
  except ImportError:
 
 
62
  pass
63
- except Exception as e:
64
- print(f"[search_episode_data] BM25 error: {e}", flush=True)
65
 
66
- # Fallback: keyword match
67
- query_lower = query.lower()
68
- query_terms = query_lower.split()
69
- results = []
70
- for idx, doc in enumerate(corpus):
71
- if any(term in doc.lower() for term in query_terms):
72
- meta = metadata[idx]
73
- results.append({
74
- "step": idx + 1,
75
- "url": meta.get("url", ""),
76
- "method": meta.get("method", ""),
77
- "status_code": meta.get("status_code", 0),
78
- "data": meta.get("response_body"),
79
- })
80
- return results[:5] if results else [{"note": f"No results found for: {query}"}]
81
 
82
 
83
  def _tokenize(text: str) -> list[str]:
84
- """Simple whitespace + punctuation tokenizer for BM25."""
85
  text = text.lower()
86
  tokens = re.findall(r"[a-z0-9_\-\.]+", text)
87
  return tokens if tokens else [""]
 
 
 
1
  """
2
+ search_episode_data — semantic + BM25 search over accumulated episode API responses.
3
 
4
+ Each curl_exec call stores its full, untruncated response body in episode_store under
5
+ ``episode_raw_bodies``. This tool embeds those bodies (via the same HF API used by
6
+ browser_agent) and performs cosine-similarity search against the model's query, falling
7
+ back to BM25 keyword search when embeddings are unavailable.
8
+
9
+ Results are returned as compact previews so they fit in the LLM context window:
10
+ - Nested trees (e.g. category trees with children_data) are flattened to id+name pairs.
11
+ - Large item arrays are shown as a short sample with a total-count note.
12
+ - The model can issue more specific queries to drill into any result.
13
  """
14
 
15
  from __future__ import annotations
16
 
17
  import json
18
19
  import re
20
  from typing import Any
21
 
22
 
23
+ # ---------------------------------------------------------------------------
24
+ # Compact preview helpers
25
+ # ---------------------------------------------------------------------------
26
 
27
+ def _flatten_tree(obj: Any, id_key: str = "id", name_key: str = "name") -> list[dict]:
28
+ """Recursively flatten any nested tree structure into [{id, name}] pairs."""
29
+ results: list[dict] = []
30
+ if isinstance(obj, dict):
31
+ if id_key in obj and name_key in obj:
32
+ results.append({id_key: obj[id_key], name_key: obj[name_key]})
33
+ for v in obj.values():
34
+ results.extend(_flatten_tree(v, id_key, name_key))
35
+ elif isinstance(obj, list):
36
+ for item in obj:
37
+ results.extend(_flatten_tree(item, id_key, name_key))
38
+ return results
39
 
40
+
41
+ def _compact_preview(response_body: Any, max_items: int = 3) -> dict:
42
  """
43
+ Return a compact, context-friendly preview of a response body.
44
+
45
+ - Distilled HTML (has page_type key) → structured summary with forms/products.
46
+ - Nested trees with children_data → flat {id, name} list.
47
+ - Lists / items arrays → short sample + total count.
48
+ - Scalars / errors → returned as-is.
49
+ - The preview always includes a note showing how many objects exist in total.
50
+ """
51
+ if not isinstance(response_body, (dict, list)):
52
+ return {"value": response_body}
53
+
54
+ # --- distilled HTML page (from html_distiller) ---
55
+ if isinstance(response_body, dict) and "page_type" in response_body and "forms" in response_body:
56
+ result: dict = {}
57
+ if response_body.get("title"):
58
+ result["page_title"] = response_body["title"]
59
+ # Forms — most actionable: show action URL, method, and fields (strip base64 uenc)
60
+ forms = response_body.get("forms", [])
61
+ if forms:
62
+ clean_forms = []
63
+ for form in forms[:8]:
64
+ fields = {k: v for k, v in form.get("fields", {}).items()
65
+ if k not in ("uenc",) and len(str(v)) < 100}
66
+ clean_forms.append({
67
+ "action": form.get("action", ""),
68
+ "method": form.get("method", "GET"),
69
+ "fields": fields,
70
+ })
71
+ result["forms"] = clean_forms
72
+ # Data blobs — show top-level keys and compact preview of small blobs
73
+ blobs = response_body.get("data_blobs", [])
74
+ if blobs:
75
+ blob_summary = []
76
+ for blob in blobs[:3]:
77
+ data = blob.get("data")
78
+ if isinstance(data, (dict, list)):
79
+ s = json.dumps(data)
80
+ blob_summary.append({"source": blob.get("source"), "preview": s[:300]})
81
+ else:
82
+ blob_summary.append({"source": blob.get("source"), "keys": blob.get("keys", [])})
83
+ result["data_blobs"] = blob_summary
84
+ # Visible text — first 600 chars
85
+ text = response_body.get("text")
86
+ if text:
87
+ result["page_text"] = text[:600]
88
+ return result
89
+
90
+ # --- nested tree (e.g. category tree) ---
91
+ if isinstance(response_body, dict) and "children_data" in response_body:
92
+ flat = _flatten_tree(response_body)
93
+ sample = flat[:max_items]
94
+ note = (
95
+ f"Flattened tree — {len(flat)} total entries. "
96
+ f"Showing first {len(sample)}. "
97
+ "Use search_episode_data with a more specific name/id query to find a particular entry."
98
+ )
99
+ return {"entries_sample": sample, "total": len(flat), "note": note}
100
+
101
+ # --- top-level list ---
102
+ if isinstance(response_body, list):
103
+ total = len(response_body)
104
+ sample = [_pick_key_fields(i) for i in response_body[:max_items]]
105
+ note = (
106
+ f"{total} item(s) total. Showing first {len(sample)}. "
107
+ "Refine your search_episode_data query to find a specific item."
108
+ ) if total > max_items else f"{total} item(s)."
109
+ return {"items_sample": sample, "total": total, "note": note}
110
+
111
+ # --- dict with an "items" array (common paginated response) ---
112
+ if isinstance(response_body, dict) and "items" in response_body:
113
+ items = response_body.get("items", [])
114
+ total = response_body.get("total_count", len(items))
115
+ sample = [_pick_key_fields(i) for i in items[:max_items]]
116
+ note = (
117
+ f"{total} item(s) total. Showing first {len(sample)}. "
118
+ "Refine your search_episode_data query to find a specific item."
119
+ ) if len(items) > max_items else f"{len(items)} item(s)."
120
+ result = dict(response_body)
121
+ result["items"] = sample
122
+ result["_preview_note"] = note
123
+ result["total_count"] = total
124
+ return result
125
+
126
+ # --- plain dict — return as-is (usually already small) ---
127
+ return response_body
128
+
129
+
130
+ def _pick_key_fields(item: Any) -> Any:
131
+ """For list items, keep only the most useful fields to reduce context size."""
132
+ if not isinstance(item, dict):
133
+ return item
134
+ KEEP = {"id", "sku", "name", "price", "category_id", "title", "slug",
135
+ "item_id", "quote_id", "qty", "status", "order_id", "email",
136
+ "username", "token", "cartId", "cart_id"}
137
+ kept = {k: v for k, v in item.items() if k in KEEP}
138
+ return kept if kept else item # fallback: return full item if no key fields match
139
 
 
 
140
 
141
+ # ---------------------------------------------------------------------------
142
+ # Text representation for embedding / BM25
143
+ # ---------------------------------------------------------------------------
144
+
145
+ def _body_to_search_text(url: str, method: str, status_code: int,
146
+ response_body: Any) -> str:
147
+ """
148
+ Produce a searchable text string that represents a stored API response.
149
+ We embed this text so the model can find responses by semantic query.
150
+ The full body is stored separately (in episode_raw_bodies) for retrieval.
151
+ """
152
  try:
153
+ body_str = json.dumps(response_body) if not isinstance(response_body, str) else response_body
154
+ except Exception:
155
+ body_str = str(response_body)
156
+
157
+ # Truncate for embedding (model has 512-token limit; 2000 chars is ~400 tokens)
158
+ if len(body_str) > 2000:
159
+ body_str = body_str[:2000]
160
 
161
+ return f"url: {url} method: {method} status: {status_code} response: {body_str}"
 
 
 
162
 
163
+
164
+ # ---------------------------------------------------------------------------
165
+ # Semantic embedding search
166
+ # ---------------------------------------------------------------------------
167
+
168
+ def _get_episode_embeddings(episode_store: dict) -> tuple[Any, list[str]] | None:
169
+ """
170
+ Build or retrieve embeddings for all stored episode responses.
171
+
172
+ Returns (embeddings_array, text_list) or None if embeddings unavailable.
173
+ Embeddings are cached in episode_store["response_embeddings"] after first build.
174
+ New responses added since last build are embedded incrementally.
175
+ """
176
+ try:
177
  import numpy as np
178
+ from .browser_agent import _embed_with_cache
179
+ except ImportError:
180
+ return None
181
+
182
+ texts: list[str] = episode_store.get("bm25_corpus", [])
183
+ if not texts:
184
+ return None
185
+
186
+ cached_embs = episode_store.get("response_embeddings")
187
+ cached_count = len(cached_embs) if cached_embs is not None else 0
188
+
189
+ if cached_count == len(texts):
190
+ # All texts already embedded
191
+ return cached_embs, texts
192
+
193
+ # Embed any new texts added since last call
194
+ new_texts = texts[cached_count:]
195
+ new_embs = _embed_with_cache(new_texts)
196
+ if new_embs is None:
197
+ return None
198
+
199
+ if cached_embs is not None and len(cached_embs) > 0:
200
+ combined = np.vstack([cached_embs, new_embs])
201
+ else:
202
+ combined = new_embs
203
+
204
+ episode_store["response_embeddings"] = combined
205
+ return combined, texts
206
+
207
 
208
+ def _semantic_search(query: str, episode_store: dict,
209
+ top_k: int = 5) -> list[int] | None:
210
+ """
211
+ Return top_k indices ranked by cosine similarity to the query.
212
+ Returns None if embeddings are unavailable (fall back to BM25).
213
+ """
214
+ try:
215
+ import numpy as np
216
+ from .browser_agent import _embed_with_cache
217
  except ImportError:
218
+ return None
219
+
220
+ result = _get_episode_embeddings(episode_store)
221
+ if result is None:
222
+ return None
223
+
224
+ embs, _ = result
225
+ query_emb = _embed_with_cache([query])
226
+ if query_emb is None:
227
+ return None
228
+
229
+ scores = embs @ query_emb[0] # dot product = cosine sim (both L2-normalised)
230
+ top_k = min(top_k, len(scores))
231
+ return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
232
+
233
+
234
+ # ---------------------------------------------------------------------------
235
+ # BM25 fallback
236
+ # ---------------------------------------------------------------------------
237
+
238
+ def _bm25_search(query: str, corpus: list[str], top_k: int = 5) -> list[int]:
239
+ """Return top_k indices by BM25 score, or keyword-match fallback."""
240
+ try:
241
+ from rank_bm25 import BM25Okapi
242
+ import numpy as np
243
+
244
+ tokenized = [_tokenize(doc) for doc in corpus]
245
+ bm25 = BM25Okapi(tokenized)
246
+ scores = bm25.get_scores(_tokenize(query))
247
+ top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
248
+ return [i for i in top[:top_k] if scores[i] > 0]
249
+ except Exception:
250
  pass
 
 
251
 
252
+ # Keyword fallback
253
+ q_lower = query.lower()
254
+ terms = q_lower.split()
255
+ hits = [i for i, doc in enumerate(corpus) if any(t in doc.lower() for t in terms)]
256
+ return hits[:top_k]
 
 
257
 
258
 
259
  def _tokenize(text: str) -> list[str]:
 
260
  text = text.lower()
261
  tokens = re.findall(r"[a-z0-9_\-\.]+", text)
262
  return tokens if tokens else [""]
263
+
264
+
265
+ # ---------------------------------------------------------------------------
266
+ # Public API
267
+ # ---------------------------------------------------------------------------
268
+
269
+ def search_episode_data(query: str, episode_store: dict) -> list[dict]:
270
+ """
271
+ Semantic + BM25 search over all API responses collected during this episode.
272
+
273
+ Each response is stored in full (untruncated) in the episode store.
274
+ Results are returned as compact previews so they fit the LLM context window:
275
+ - Nested trees are flattened to {id, name} pairs with a total-count note.
276
+ - Large arrays show a short sample with a note like "47 items total".
277
+ - Use more specific queries to drill into a particular response.
278
+
279
+ Args:
280
+ query: Natural language or keyword query (e.g. "category id for Pants",
281
+ "cart id", "SKU for Radiant Tee", "_csrf_token").
282
+ episode_store: Per-episode mutable store populated by curl_exec.
283
+
284
+ Returns:
285
+ List of up to 5 matching results, each with:
286
+ step, url, method, status_code, data (compact preview).
287
+ """
288
+ corpus: list[str] = episode_store.get("bm25_corpus", [])
289
+ metadata: list[dict] = episode_store.get("bm25_metadata", [])
290
+
291
+ if not corpus:
292
+ return [{"note": "No episode data yet. Make API calls with curl_exec() first."}]
293
+
294
+ # Try semantic search first
295
+ indices = _semantic_search(query, episode_store, top_k=5)
296
+
297
+ # Fall back to BM25 if semantic unavailable
298
+ if indices is None:
299
+ indices = _bm25_search(query, corpus, top_k=5)
300
+
301
+ if not indices:
302
+ return [{"note": f"No results found for: {query!r}. "
303
+ "Try a different query or check your curl_exec call history."}]
304
+
305
+ results = []
306
+ for idx in indices:
307
+ if idx >= len(metadata):
308
+ continue
309
+ meta = metadata[idx]
310
+ # Full untruncated body is in episode_raw_bodies; metadata holds it too
311
+ raw_body = episode_store.get("episode_raw_bodies", {}).get(idx, meta.get("response_body"))
312
+ results.append({
313
+ "step": idx + 1,
314
+ "url": meta.get("url", ""),
315
+ "method": meta.get("method", ""),
316
+ "status_code": meta.get("status_code", 0),
317
+ "data": _compact_preview(raw_body),
318
+ })
319
+
320
+ return results
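The `children_data` flattening that `_compact_preview` applies to tree-shaped responses can be exercised standalone; the sketch below mirrors `_flatten_tree` (the category ids and names are made-up sample data, not from any real catalog):

```python
from typing import Any

def flatten_tree(obj: Any, id_key: str = "id", name_key: str = "name") -> list[dict]:
    """Recursively collect {id, name} pairs from any nested dict/list structure."""
    results: list[dict] = []
    if isinstance(obj, dict):
        if id_key in obj and name_key in obj:
            results.append({id_key: obj[id_key], name_key: obj[name_key]})
        for v in obj.values():
            results.extend(flatten_tree(v, id_key, name_key))
    elif isinstance(obj, list):
        for item in obj:
            results.extend(flatten_tree(item, id_key, name_key))
    return results

# Hypothetical category tree in the shape a children_data response takes.
tree = {
    "id": 2, "name": "Default Category",
    "children_data": [
        {"id": 20, "name": "Women",
         "children_data": [{"id": 21, "name": "Tops", "children_data": []}]},
        {"id": 11, "name": "Men", "children_data": []},
    ],
}
print(flatten_tree(tree))
# → [{'id': 2, 'name': 'Default Category'}, {'id': 20, 'name': 'Women'},
#    {'id': 21, 'name': 'Tops'}, {'id': 11, 'name': 'Men'}]
```

The flat list is what gets sampled into `entries_sample`, keeping deep trees within the context budget while preserving every id/name pair for follow-up queries.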
uv.lock CHANGED
@@ -171,6 +171,19 @@ wheels = [
171
  { url = "https://files.pythonhosted.org/packages/71/cc/18245721fa7747065ab478316c7fea7c74777d07f37ae60db2e84f8172e8/beartype-0.22.9-py3-none-any.whl", hash = "sha256:d16c9bbc61ea14637596c5f6fbff2ee99cbe3573e46a716401734ef50c3060c2", size = 1333658, upload-time = "2025-12-13T06:50:28.266Z" },
172
  ]
173
 
 
 
174
  [[package]]
175
  name = "brotli"
176
  version = "1.2.0"
@@ -1328,6 +1341,130 @@ wheels = [
1328
  { url = "https://files.pythonhosted.org/packages/81/db/e655086b7f3a705df045bf0933bdd9c2f79bb3c97bfef1384598bb79a217/keyring-25.7.0-py3-none-any.whl", hash = "sha256:be4a0b195f149690c166e850609a477c532ddbfbaed96a404d4e43f8d5e2689f", size = 39160, upload-time = "2025-11-16T16:26:08.402Z" },
1329
  ]
1330
 
 
 
1331
  [[package]]
1332
  name = "markdown-it-py"
1333
  version = "4.0.0"
@@ -1884,7 +2021,9 @@ name = "openenv-harvestgym"
1884
  version = "0.1.0"
1885
  source = { editable = "." }
1886
  dependencies = [
 
1887
  { name = "fastapi" },
 
1888
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
1889
  { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
1890
  { name = "openai" },
@@ -1907,7 +2046,9 @@ embeddings = [
1907
 
1908
  [package.metadata]
1909
  requires-dist = [
 
1910
  { name = "fastapi", specifier = ">=0.100.0" },
 
1911
  { name = "numpy", specifier = ">=1.24.0" },
1912
  { name = "openai", specifier = ">=1.0.0" },
1913
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
@@ -3373,6 +3514,15 @@ wheels = [
3373
  { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
3374
  ]
3375
 
 
3376
  [[package]]
3377
  name = "sse-starlette"
3378
  version = "3.3.4"
 
171
  { url = "https://files.pythonhosted.org/packages/71/cc/18245721fa7747065ab478316c7fea7c74777d07f37ae60db2e84f8172e8/beartype-0.22.9-py3-none-any.whl", hash = "sha256:d16c9bbc61ea14637596c5f6fbff2ee99cbe3573e46a716401734ef50c3060c2", size = 1333658, upload-time = "2025-12-13T06:50:28.266Z" },
172
  ]
173
 
174
+ [[package]]
175
+ name = "beautifulsoup4"
176
+ version = "4.14.3"
177
+ source = { registry = "https://pypi.org/simple" }
178
+ dependencies = [
179
+ { name = "soupsieve" },
180
+ { name = "typing-extensions" },
181
+ ]
182
+ sdist = { url = "https://files.pythonhosted.org/packages/c3/b0/1c6a16426d389813b48d95e26898aff79abbde42ad353958ad95cc8c9b21/beautifulsoup4-4.14.3.tar.gz", hash = "sha256:6292b1c5186d356bba669ef9f7f051757099565ad9ada5dd630bd9de5fa7fb86", size = 627737, upload-time = "2025-11-30T15:08:26.084Z" }
183
+ wheels = [
184
+ { url = "https://files.pythonhosted.org/packages/1a/39/47f9197bdd44df24d67ac8893641e16f386c984a0619ef2ee4c51fbbc019/beautifulsoup4-4.14.3-py3-none-any.whl", hash = "sha256:0918bfe44902e6ad8d57732ba310582e98da931428d231a5ecb9e7c703a735bb", size = 107721, upload-time = "2025-11-30T15:08:24.087Z" },
185
+ ]
186
+
187
  [[package]]
188
  name = "brotli"
189
  version = "1.2.0"
 
1341
  { url = "https://files.pythonhosted.org/packages/81/db/e655086b7f3a705df045bf0933bdd9c2f79bb3c97bfef1384598bb79a217/keyring-25.7.0-py3-none-any.whl", hash = "sha256:be4a0b195f149690c166e850609a477c532ddbfbaed96a404d4e43f8d5e2689f", size = 39160, upload-time = "2025-11-16T16:26:08.402Z" },
1342
  ]
1343
 
1344
+ [[package]]
1345
+ name = "lxml"
1346
+ version = "6.0.2"
1347
+ source = { registry = "https://pypi.org/simple" }
1348
+ sdist = { url = "https://files.pythonhosted.org/packages/aa/88/262177de60548e5a2bfc46ad28232c9e9cbde697bd94132aeb80364675cb/lxml-6.0.2.tar.gz", hash = "sha256:cd79f3367bd74b317dda655dc8fcfa304d9eb6e4fb06b7168c5cf27f96e0cd62", size = 4073426, upload-time = "2025-09-22T04:04:59.287Z" }
1349
+ wheels = [
1350
+ { url = "https://files.pythonhosted.org/packages/db/8a/f8192a08237ef2fb1b19733f709db88a4c43bc8ab8357f01cb41a27e7f6a/lxml-6.0.2-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e77dd455b9a16bbd2a5036a63ddbd479c19572af81b624e79ef422f929eef388", size = 8590589, upload-time = "2025-09-22T04:00:10.51Z" },
1351
+ { url = "https://files.pythonhosted.org/packages/12/64/27bcd07ae17ff5e5536e8d88f4c7d581b48963817a13de11f3ac3329bfa2/lxml-6.0.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:5d444858b9f07cefff6455b983aea9a67f7462ba1f6cbe4a21e8bf6791bf2153", size = 4629671, upload-time = "2025-09-22T04:00:15.411Z" },
1352
+ { url = "https://files.pythonhosted.org/packages/02/5a/a7d53b3291c324e0b6e48f3c797be63836cc52156ddf8f33cd72aac78866/lxml-6.0.2-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f952dacaa552f3bb8834908dddd500ba7d508e6ea6eb8c52eb2d28f48ca06a31", size = 4999961, upload-time = "2025-09-22T04:00:17.619Z" },
1353
+ { url = "https://files.pythonhosted.org/packages/f5/55/d465e9b89df1761674d8672bb3e4ae2c47033b01ec243964b6e334c6743f/lxml-6.0.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:71695772df6acea9f3c0e59e44ba8ac50c4f125217e84aab21074a1a55e7e5c9", size = 5157087, upload-time = "2025-09-22T04:00:19.868Z" },
1354
+ { url = "https://files.pythonhosted.org/packages/62/38/3073cd7e3e8dfc3ba3c3a139e33bee3a82de2bfb0925714351ad3d255c13/lxml-6.0.2-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:17f68764f35fd78d7c4cc4ef209a184c38b65440378013d24b8aecd327c3e0c8", size = 5067620, upload-time = "2025-09-22T04:00:21.877Z" },
1355
+ { url = "https://files.pythonhosted.org/packages/4a/d3/1e001588c5e2205637b08985597827d3827dbaaece16348c8822bfe61c29/lxml-6.0.2-cp310-cp310-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:058027e261afed589eddcfe530fcc6f3402d7fd7e89bfd0532df82ebc1563dba", size = 5406664, upload-time = "2025-09-22T04:00:23.714Z" },
1356
+ { url = "https://files.pythonhosted.org/packages/20/cf/cab09478699b003857ed6ebfe95e9fb9fa3d3c25f1353b905c9b73cfb624/lxml-6.0.2-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a8ffaeec5dfea5881d4c9d8913a32d10cfe3923495386106e4a24d45300ef79c", size = 5289397, upload-time = "2025-09-22T04:00:25.544Z" },
1357
+ { url = "https://files.pythonhosted.org/packages/a3/84/02a2d0c38ac9a8b9f9e5e1bbd3f24b3f426044ad618b552e9549ee91bd63/lxml-6.0.2-cp310-cp310-manylinux_2_31_armv7l.whl", hash = "sha256:f2e3b1a6bb38de0bc713edd4d612969dd250ca8b724be8d460001a387507021c", size = 4772178, upload-time = "2025-09-22T04:00:27.602Z" },
1358
+ { url = "https://files.pythonhosted.org/packages/56/87/e1ceadcc031ec4aa605fe95476892d0b0ba3b7f8c7dcdf88fdeff59a9c86/lxml-6.0.2-cp310-cp310-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:d6690ec5ec1cce0385cb20896b16be35247ac8c2046e493d03232f1c2414d321", size = 5358148, upload-time = "2025-09-22T04:00:29.323Z" },
1359
+ { url = "https://files.pythonhosted.org/packages/fe/13/5bb6cf42bb228353fd4ac5f162c6a84fd68a4d6f67c1031c8cf97e131fc6/lxml-6.0.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:f2a50c3c1d11cad0ebebbac357a97b26aa79d2bcaf46f256551152aa85d3a4d1", size = 5112035, upload-time = "2025-09-22T04:00:31.061Z" },
1360
+ { url = "https://files.pythonhosted.org/packages/e4/e2/ea0498552102e59834e297c5c6dff8d8ded3db72ed5e8aad77871476f073/lxml-6.0.2-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:3efe1b21c7801ffa29a1112fab3b0f643628c30472d507f39544fd48e9549e34", size = 4799111, upload-time = "2025-09-22T04:00:33.11Z" },
1361
+ { url = "https://files.pythonhosted.org/packages/6a/9e/8de42b52a73abb8af86c66c969b3b4c2a96567b6ac74637c037d2e3baa60/lxml-6.0.2-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:59c45e125140b2c4b33920d21d83681940ca29f0b83f8629ea1a2196dc8cfe6a", size = 5351662, upload-time = "2025-09-22T04:00:35.237Z" },
1362
+ { url = "https://files.pythonhosted.org/packages/28/a2/de776a573dfb15114509a37351937c367530865edb10a90189d0b4b9b70a/lxml-6.0.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:452b899faa64f1805943ec1c0c9ebeaece01a1af83e130b69cdefeda180bb42c", size = 5314973, upload-time = "2025-09-22T04:00:37.086Z" },
1363
+ { url = "https://files.pythonhosted.org/packages/50/a0/3ae1b1f8964c271b5eec91db2043cf8c6c0bce101ebb2a633b51b044db6c/lxml-6.0.2-cp310-cp310-win32.whl", hash = "sha256:1e786a464c191ca43b133906c6903a7e4d56bef376b75d97ccbb8ec5cf1f0a4b", size = 3611953, upload-time = "2025-09-22T04:00:39.224Z" },
1364
+ { url = "https://files.pythonhosted.org/packages/d1/70/bd42491f0634aad41bdfc1e46f5cff98825fb6185688dc82baa35d509f1a/lxml-6.0.2-cp310-cp310-win_amd64.whl", hash = "sha256:dacf3c64ef3f7440e3167aa4b49aa9e0fb99e0aa4f9ff03795640bf94531bcb0", size = 4032695, upload-time = "2025-09-22T04:00:41.402Z" },
1365
+ { url = "https://files.pythonhosted.org/packages/d2/d0/05c6a72299f54c2c561a6c6cbb2f512e047fca20ea97a05e57931f194ac4/lxml-6.0.2-cp310-cp310-win_arm64.whl", hash = "sha256:45f93e6f75123f88d7f0cfd90f2d05f441b808562bf0bc01070a00f53f5028b5", size = 3680051, upload-time = "2025-09-22T04:00:43.525Z" },
1366
+ { url = "https://files.pythonhosted.org/packages/77/d5/becbe1e2569b474a23f0c672ead8a29ac50b2dc1d5b9de184831bda8d14c/lxml-6.0.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:13e35cbc684aadf05d8711a5d1b5857c92e5e580efa9a0d2be197199c8def607", size = 8634365, upload-time = "2025-09-22T04:00:45.672Z" },
1367
+ { url = "https://files.pythonhosted.org/packages/28/66/1ced58f12e804644426b85d0bb8a4478ca77bc1761455da310505f1a3526/lxml-6.0.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3b1675e096e17c6fe9c0e8c81434f5736c0739ff9ac6123c87c2d452f48fc938", size = 4650793, upload-time = "2025-09-22T04:00:47.783Z" },
1368
+ { url = "https://files.pythonhosted.org/packages/11/84/549098ffea39dfd167e3f174b4ce983d0eed61f9d8d25b7bf2a57c3247fc/lxml-6.0.2-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:8ac6e5811ae2870953390452e3476694196f98d447573234592d30488147404d", size = 4944362, upload-time = "2025-09-22T04:00:49.845Z" },
1369
+ { url = "https://files.pythonhosted.org/packages/ac/bd/f207f16abf9749d2037453d56b643a7471d8fde855a231a12d1e095c4f01/lxml-6.0.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5aa0fc67ae19d7a64c3fe725dc9a1bb11f80e01f78289d05c6f62545affec438", size = 5083152, upload-time = "2025-09-22T04:00:51.709Z" },
1370
+ { url = "https://files.pythonhosted.org/packages/15/ae/bd813e87d8941d52ad5b65071b1affb48da01c4ed3c9c99e40abb266fbff/lxml-6.0.2-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:de496365750cc472b4e7902a485d3f152ecf57bd3ba03ddd5578ed8ceb4c5964", size = 5023539, upload-time = "2025-09-22T04:00:53.593Z" },
1371
+ { url = "https://files.pythonhosted.org/packages/02/cd/9bfef16bd1d874fbe0cb51afb00329540f30a3283beb9f0780adbb7eec03/lxml-6.0.2-cp311-cp311-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:200069a593c5e40b8f6fc0d84d86d970ba43138c3e68619ffa234bc9bb806a4d", size = 5344853, upload-time = "2025-09-22T04:00:55.524Z" },
1372
+ { url = "https://files.pythonhosted.org/packages/b8/89/ea8f91594bc5dbb879734d35a6f2b0ad50605d7fb419de2b63d4211765cc/lxml-6.0.2-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7d2de809c2ee3b888b59f995625385f74629707c9355e0ff856445cdcae682b7", size = 5225133, upload-time = "2025-09-22T04:00:57.269Z" },
1373
+ { url = "https://files.pythonhosted.org/packages/b9/37/9c735274f5dbec726b2db99b98a43950395ba3d4a1043083dba2ad814170/lxml-6.0.2-cp311-cp311-manylinux_2_31_armv7l.whl", hash = "sha256:b2c3da8d93cf5db60e8858c17684c47d01fee6405e554fb55018dd85fc23b178", size = 4677944, upload-time = "2025-09-22T04:00:59.052Z" },
1374
+ { url = "https://files.pythonhosted.org/packages/20/28/7dfe1ba3475d8bfca3878365075abe002e05d40dfaaeb7ec01b4c587d533/lxml-6.0.2-cp311-cp311-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:442de7530296ef5e188373a1ea5789a46ce90c4847e597856570439621d9c553", size = 5284535, upload-time = "2025-09-22T04:01:01.335Z" },
1375
+ { url = "https://files.pythonhosted.org/packages/e7/cf/5f14bc0de763498fc29510e3532bf2b4b3a1c1d5d0dff2e900c16ba021ef/lxml-6.0.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:2593c77efde7bfea7f6389f1ab249b15ed4aa5bc5cb5131faa3b843c429fbedb", size = 5067343, upload-time = "2025-09-22T04:01:03.13Z" },
1376
+ { url = "https://files.pythonhosted.org/packages/1c/b0/bb8275ab5472f32b28cfbbcc6db7c9d092482d3439ca279d8d6fa02f7025/lxml-6.0.2-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:3e3cb08855967a20f553ff32d147e14329b3ae70ced6edc2f282b94afbc74b2a", size = 4725419, upload-time = "2025-09-22T04:01:05.013Z" },
1377
+ { url = "https://files.pythonhosted.org/packages/25/4c/7c222753bc72edca3b99dbadba1b064209bc8ed4ad448af990e60dcce462/lxml-6.0.2-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:2ed6c667fcbb8c19c6791bbf40b7268ef8ddf5a96940ba9404b9f9a304832f6c", size = 5275008, upload-time = "2025-09-22T04:01:07.327Z" },
1378
+ { url = "https://files.pythonhosted.org/packages/6c/8c/478a0dc6b6ed661451379447cdbec77c05741a75736d97e5b2b729687828/lxml-6.0.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b8f18914faec94132e5b91e69d76a5c1d7b0c73e2489ea8929c4aaa10b76bbf7", size = 5248906, upload-time = "2025-09-22T04:01:09.452Z" },
1379
+ { url = "https://files.pythonhosted.org/packages/2d/d9/5be3a6ab2784cdf9accb0703b65e1b64fcdd9311c9f007630c7db0cfcce1/lxml-6.0.2-cp311-cp311-win32.whl", hash = "sha256:6605c604e6daa9e0d7f0a2137bdc47a2e93b59c60a65466353e37f8272f47c46", size = 3610357, upload-time = "2025-09-22T04:01:11.102Z" },
1380
+ { url = "https://files.pythonhosted.org/packages/e2/7d/ca6fb13349b473d5732fb0ee3eec8f6c80fc0688e76b7d79c1008481bf1f/lxml-6.0.2-cp311-cp311-win_amd64.whl", hash = "sha256:e5867f2651016a3afd8dd2c8238baa66f1e2802f44bc17e236f547ace6647078", size = 4036583, upload-time = "2025-09-22T04:01:12.766Z" },
1381
+ { url = "https://files.pythonhosted.org/packages/ab/a2/51363b5ecd3eab46563645f3a2c3836a2fc67d01a1b87c5017040f39f567/lxml-6.0.2-cp311-cp311-win_arm64.whl", hash = "sha256:4197fb2534ee05fd3e7afaab5d8bfd6c2e186f65ea7f9cd6a82809c887bd1285", size = 3680591, upload-time = "2025-09-22T04:01:14.874Z" },
1382
+ { url = "https://files.pythonhosted.org/packages/f3/c8/8ff2bc6b920c84355146cd1ab7d181bc543b89241cfb1ebee824a7c81457/lxml-6.0.2-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:a59f5448ba2ceccd06995c95ea59a7674a10de0810f2ce90c9006f3cbc044456", size = 8661887, upload-time = "2025-09-22T04:01:17.265Z" },
1383
+ { url = "https://files.pythonhosted.org/packages/37/6f/9aae1008083bb501ef63284220ce81638332f9ccbfa53765b2b7502203cf/lxml-6.0.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:e8113639f3296706fbac34a30813929e29247718e88173ad849f57ca59754924", size = 4667818, upload-time = "2025-09-22T04:01:19.688Z" },
1384
+ { url = "https://files.pythonhosted.org/packages/f1/ca/31fb37f99f37f1536c133476674c10b577e409c0a624384147653e38baf2/lxml-6.0.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:a8bef9b9825fa8bc816a6e641bb67219489229ebc648be422af695f6e7a4fa7f", size = 4950807, upload-time = "2025-09-22T04:01:21.487Z" },
1385
+ { url = "https://files.pythonhosted.org/packages/da/87/f6cb9442e4bada8aab5ae7e1046264f62fdbeaa6e3f6211b93f4c0dd97f1/lxml-6.0.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:65ea18d710fd14e0186c2f973dc60bb52039a275f82d3c44a0e42b43440ea534", size = 5109179, upload-time = "2025-09-22T04:01:23.32Z" },
1386
+ { url = "https://files.pythonhosted.org/packages/c8/20/a7760713e65888db79bbae4f6146a6ae5c04e4a204a3c48896c408cd6ed2/lxml-6.0.2-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c371aa98126a0d4c739ca93ceffa0fd7a5d732e3ac66a46e74339acd4d334564", size = 5023044, upload-time = "2025-09-22T04:01:25.118Z" },
1387
+ { url = "https://files.pythonhosted.org/packages/a2/b0/7e64e0460fcb36471899f75831509098f3fd7cd02a3833ac517433cb4f8f/lxml-6.0.2-cp312-cp312-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:700efd30c0fa1a3581d80a748157397559396090a51d306ea59a70020223d16f", size = 5359685, upload-time = "2025-09-22T04:01:27.398Z" },
1388
+ { url = "https://files.pythonhosted.org/packages/b9/e1/e5df362e9ca4e2f48ed6411bd4b3a0ae737cc842e96877f5bf9428055ab4/lxml-6.0.2-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c33e66d44fe60e72397b487ee92e01da0d09ba2d66df8eae42d77b6d06e5eba0", size = 5654127, upload-time = "2025-09-22T04:01:29.629Z" },
1389
+ { url = "https://files.pythonhosted.org/packages/c6/d1/232b3309a02d60f11e71857778bfcd4acbdb86c07db8260caf7d008b08f8/lxml-6.0.2-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:90a345bbeaf9d0587a3aaffb7006aa39ccb6ff0e96a57286c0cb2fd1520ea192", size = 5253958, upload-time = "2025-09-22T04:01:31.535Z" },
1390
+ { url = "https://files.pythonhosted.org/packages/35/35/d955a070994725c4f7d80583a96cab9c107c57a125b20bb5f708fe941011/lxml-6.0.2-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:064fdadaf7a21af3ed1dcaa106b854077fbeada827c18f72aec9346847cd65d0", size = 4711541, upload-time = "2025-09-22T04:01:33.801Z" },
1391
+ { url = "https://files.pythonhosted.org/packages/1e/be/667d17363b38a78c4bd63cfd4b4632029fd68d2c2dc81f25ce9eb5224dd5/lxml-6.0.2-cp312-cp312-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fbc74f42c3525ac4ffa4b89cbdd00057b6196bcefe8bce794abd42d33a018092", size = 5267426, upload-time = "2025-09-22T04:01:35.639Z" },
1392
+ { url = "https://files.pythonhosted.org/packages/ea/47/62c70aa4a1c26569bc958c9ca86af2bb4e1f614e8c04fb2989833874f7ae/lxml-6.0.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6ddff43f702905a4e32bc24f3f2e2edfe0f8fde3277d481bffb709a4cced7a1f", size = 5064917, upload-time = "2025-09-22T04:01:37.448Z" },
1393
+ { url = "https://files.pythonhosted.org/packages/bd/55/6ceddaca353ebd0f1908ef712c597f8570cc9c58130dbb89903198e441fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:6da5185951d72e6f5352166e3da7b0dc27aa70bd1090b0eb3f7f7212b53f1bb8", size = 4788795, upload-time = "2025-09-22T04:01:39.165Z" },
1394
+ { url = "https://files.pythonhosted.org/packages/cf/e8/fd63e15da5e3fd4c2146f8bbb3c14e94ab850589beab88e547b2dbce22e1/lxml-6.0.2-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:57a86e1ebb4020a38d295c04fc79603c7899e0df71588043eb218722dabc087f", size = 5676759, upload-time = "2025-09-22T04:01:41.506Z" },
1395
+ { url = "https://files.pythonhosted.org/packages/76/47/b3ec58dc5c374697f5ba37412cd2728f427d056315d124dd4b61da381877/lxml-6.0.2-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2047d8234fe735ab77802ce5f2297e410ff40f5238aec569ad7c8e163d7b19a6", size = 5255666, upload-time = "2025-09-22T04:01:43.363Z" },
1396
+ { url = "https://files.pythonhosted.org/packages/19/93/03ba725df4c3d72afd9596eef4a37a837ce8e4806010569bedfcd2cb68fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:6f91fd2b2ea15a6800c8e24418c0775a1694eefc011392da73bc6cef2623b322", size = 5277989, upload-time = "2025-09-22T04:01:45.215Z" },
1397
+ { url = "https://files.pythonhosted.org/packages/c6/80/c06de80bfce881d0ad738576f243911fccf992687ae09fd80b734712b39c/lxml-6.0.2-cp312-cp312-win32.whl", hash = "sha256:3ae2ce7d6fedfb3414a2b6c5e20b249c4c607f72cb8d2bb7cc9c6ec7c6f4e849", size = 3611456, upload-time = "2025-09-22T04:01:48.243Z" },
1398
+ { url = "https://files.pythonhosted.org/packages/f7/d7/0cdfb6c3e30893463fb3d1e52bc5f5f99684a03c29a0b6b605cfae879cd5/lxml-6.0.2-cp312-cp312-win_amd64.whl", hash = "sha256:72c87e5ee4e58a8354fb9c7c84cbf95a1c8236c127a5d1b7683f04bed8361e1f", size = 4011793, upload-time = "2025-09-22T04:01:50.042Z" },
1399
+ { url = "https://files.pythonhosted.org/packages/ea/7b/93c73c67db235931527301ed3785f849c78991e2e34f3fd9a6663ffda4c5/lxml-6.0.2-cp312-cp312-win_arm64.whl", hash = "sha256:61cb10eeb95570153e0c0e554f58df92ecf5109f75eacad4a95baa709e26c3d6", size = 3672836, upload-time = "2025-09-22T04:01:52.145Z" },
1400
+ { url = "https://files.pythonhosted.org/packages/53/fd/4e8f0540608977aea078bf6d79f128e0e2c2bba8af1acf775c30baa70460/lxml-6.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:9b33d21594afab46f37ae58dfadd06636f154923c4e8a4d754b0127554eb2e77", size = 8648494, upload-time = "2025-09-22T04:01:54.242Z" },
1401
+ { url = "https://files.pythonhosted.org/packages/5d/f4/2a94a3d3dfd6c6b433501b8d470a1960a20ecce93245cf2db1706adf6c19/lxml-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:6c8963287d7a4c5c9a432ff487c52e9c5618667179c18a204bdedb27310f022f", size = 4661146, upload-time = "2025-09-22T04:01:56.282Z" },
1402
+ { url = "https://files.pythonhosted.org/packages/25/2e/4efa677fa6b322013035d38016f6ae859d06cac67437ca7dc708a6af7028/lxml-6.0.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:1941354d92699fb5ffe6ed7b32f9649e43c2feb4b97205f75866f7d21aa91452", size = 4946932, upload-time = "2025-09-22T04:01:58.989Z" },
1403
+ { url = "https://files.pythonhosted.org/packages/ce/0f/526e78a6d38d109fdbaa5049c62e1d32fdd70c75fb61c4eadf3045d3d124/lxml-6.0.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:bb2f6ca0ae2d983ded09357b84af659c954722bbf04dea98030064996d156048", size = 5100060, upload-time = "2025-09-22T04:02:00.812Z" },
1404
+ { url = "https://files.pythonhosted.org/packages/81/76/99de58d81fa702cc0ea7edae4f4640416c2062813a00ff24bd70ac1d9c9b/lxml-6.0.2-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:eb2a12d704f180a902d7fa778c6d71f36ceb7b0d317f34cdc76a5d05aa1dd1df", size = 5019000, upload-time = "2025-09-22T04:02:02.671Z" },
1405
+ { url = "https://files.pythonhosted.org/packages/b5/35/9e57d25482bc9a9882cb0037fdb9cc18f4b79d85df94fa9d2a89562f1d25/lxml-6.0.2-cp313-cp313-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:6ec0e3f745021bfed19c456647f0298d60a24c9ff86d9d051f52b509663feeb1", size = 5348496, upload-time = "2025-09-22T04:02:04.904Z" },
1406
+ { url = "https://files.pythonhosted.org/packages/a6/8e/cb99bd0b83ccc3e8f0f528e9aa1f7a9965dfec08c617070c5db8d63a87ce/lxml-6.0.2-cp313-cp313-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:846ae9a12d54e368933b9759052d6206a9e8b250291109c48e350c1f1f49d916", size = 5643779, upload-time = "2025-09-22T04:02:06.689Z" },
1407
+ { url = "https://files.pythonhosted.org/packages/d0/34/9e591954939276bb679b73773836c6684c22e56d05980e31d52a9a8deb18/lxml-6.0.2-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef9266d2aa545d7374938fb5c484531ef5a2ec7f2d573e62f8ce722c735685fd", size = 5244072, upload-time = "2025-09-22T04:02:08.587Z" },
1408
+ { url = "https://files.pythonhosted.org/packages/8d/27/b29ff065f9aaca443ee377aff699714fcbffb371b4fce5ac4ca759e436d5/lxml-6.0.2-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:4077b7c79f31755df33b795dc12119cb557a0106bfdab0d2c2d97bd3cf3dffa6", size = 4718675, upload-time = "2025-09-22T04:02:10.783Z" },
1409
+ { url = "https://files.pythonhosted.org/packages/2b/9f/f756f9c2cd27caa1a6ef8c32ae47aadea697f5c2c6d07b0dae133c244fbe/lxml-6.0.2-cp313-cp313-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a7c5d5e5f1081955358533be077166ee97ed2571d6a66bdba6ec2f609a715d1a", size = 5255171, upload-time = "2025-09-22T04:02:12.631Z" },
1410
+ { url = "https://files.pythonhosted.org/packages/61/46/bb85ea42d2cb1bd8395484fd72f38e3389611aa496ac7772da9205bbda0e/lxml-6.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:8f8d0cbd0674ee89863a523e6994ac25fd5be9c8486acfc3e5ccea679bad2679", size = 5057175, upload-time = "2025-09-22T04:02:14.718Z" },
1411
+ { url = "https://files.pythonhosted.org/packages/95/0c/443fc476dcc8e41577f0af70458c50fe299a97bb6b7505bb1ae09aa7f9ac/lxml-6.0.2-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:2cbcbf6d6e924c28f04a43f3b6f6e272312a090f269eff68a2982e13e5d57659", size = 4785688, upload-time = "2025-09-22T04:02:16.957Z" },
1412
+ { url = "https://files.pythonhosted.org/packages/48/78/6ef0b359d45bb9697bc5a626e1992fa5d27aa3f8004b137b2314793b50a0/lxml-6.0.2-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:dfb874cfa53340009af6bdd7e54ebc0d21012a60a4e65d927c2e477112e63484", size = 5660655, upload-time = "2025-09-22T04:02:18.815Z" },
1413
+ { url = "https://files.pythonhosted.org/packages/ff/ea/e1d33808f386bc1339d08c0dcada6e4712d4ed8e93fcad5f057070b7988a/lxml-6.0.2-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:fb8dae0b6b8b7f9e96c26fdd8121522ce5de9bb5538010870bd538683d30e9a2", size = 5247695, upload-time = "2025-09-22T04:02:20.593Z" },
1414
+ { url = "https://files.pythonhosted.org/packages/4f/47/eba75dfd8183673725255247a603b4ad606f4ae657b60c6c145b381697da/lxml-6.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:358d9adae670b63e95bc59747c72f4dc97c9ec58881d4627fe0120da0f90d314", size = 5269841, upload-time = "2025-09-22T04:02:22.489Z" },
1415
+ { url = "https://files.pythonhosted.org/packages/76/04/5c5e2b8577bc936e219becb2e98cdb1aca14a4921a12995b9d0c523502ae/lxml-6.0.2-cp313-cp313-win32.whl", hash = "sha256:e8cd2415f372e7e5a789d743d133ae474290a90b9023197fd78f32e2dc6873e2", size = 3610700, upload-time = "2025-09-22T04:02:24.465Z" },
1416
+ { url = "https://files.pythonhosted.org/packages/fe/0a/4643ccc6bb8b143e9f9640aa54e38255f9d3b45feb2cbe7ae2ca47e8782e/lxml-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:b30d46379644fbfc3ab81f8f82ae4de55179414651f110a1514f0b1f8f6cb2d7", size = 4010347, upload-time = "2025-09-22T04:02:26.286Z" },
1417
+ { url = "https://files.pythonhosted.org/packages/31/ef/dcf1d29c3f530577f61e5fe2f1bd72929acf779953668a8a47a479ae6f26/lxml-6.0.2-cp313-cp313-win_arm64.whl", hash = "sha256:13dcecc9946dca97b11b7c40d29fba63b55ab4170d3c0cf8c0c164343b9bfdcf", size = 3671248, upload-time = "2025-09-22T04:02:27.918Z" },
1418
+ { url = "https://files.pythonhosted.org/packages/03/15/d4a377b385ab693ce97b472fe0c77c2b16ec79590e688b3ccc71fba19884/lxml-6.0.2-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:b0c732aa23de8f8aec23f4b580d1e52905ef468afb4abeafd3fec77042abb6fe", size = 8659801, upload-time = "2025-09-22T04:02:30.113Z" },
1419
+ { url = "https://files.pythonhosted.org/packages/c8/e8/c128e37589463668794d503afaeb003987373c5f94d667124ffd8078bbd9/lxml-6.0.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4468e3b83e10e0317a89a33d28f7aeba1caa4d1a6fd457d115dd4ffe90c5931d", size = 4659403, upload-time = "2025-09-22T04:02:32.119Z" },
1420
+ { url = "https://files.pythonhosted.org/packages/00/ce/74903904339decdf7da7847bb5741fc98a5451b42fc419a86c0c13d26fe2/lxml-6.0.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:abd44571493973bad4598a3be7e1d807ed45aa2adaf7ab92ab7c62609569b17d", size = 4966974, upload-time = "2025-09-22T04:02:34.155Z" },
1421
+ { url = "https://files.pythonhosted.org/packages/1f/d3/131dec79ce61c5567fecf82515bd9bc36395df42501b50f7f7f3bd065df0/lxml-6.0.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:370cd78d5855cfbffd57c422851f7d3864e6ae72d0da615fca4dad8c45d375a5", size = 5102953, upload-time = "2025-09-22T04:02:36.054Z" },
1422
+ { url = "https://files.pythonhosted.org/packages/3a/ea/a43ba9bb750d4ffdd885f2cd333572f5bb900cd2408b67fdda07e85978a0/lxml-6.0.2-cp314-cp314-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:901e3b4219fa04ef766885fb40fa516a71662a4c61b80c94d25336b4934b71c0", size = 5055054, upload-time = "2025-09-22T04:02:38.154Z" },
1423
+ { url = "https://files.pythonhosted.org/packages/60/23/6885b451636ae286c34628f70a7ed1fcc759f8d9ad382d132e1c8d3d9bfd/lxml-6.0.2-cp314-cp314-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:a4bf42d2e4cf52c28cc1812d62426b9503cdb0c87a6de81442626aa7d69707ba", size = 5352421, upload-time = "2025-09-22T04:02:40.413Z" },
1424
+ { url = "https://files.pythonhosted.org/packages/48/5b/fc2ddfc94ddbe3eebb8e9af6e3fd65e2feba4967f6a4e9683875c394c2d8/lxml-6.0.2-cp314-cp314-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2c7fdaa4d7c3d886a42534adec7cfac73860b89b4e5298752f60aa5984641a0", size = 5673684, upload-time = "2025-09-22T04:02:42.288Z" },
1425
+ { url = "https://files.pythonhosted.org/packages/29/9c/47293c58cc91769130fbf85531280e8cc7868f7fbb6d92f4670071b9cb3e/lxml-6.0.2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98a5e1660dc7de2200b00d53fa00bcd3c35a3608c305d45a7bbcaf29fa16e83d", size = 5252463, upload-time = "2025-09-22T04:02:44.165Z" },
1426
+ { url = "https://files.pythonhosted.org/packages/9b/da/ba6eceb830c762b48e711ded880d7e3e89fc6c7323e587c36540b6b23c6b/lxml-6.0.2-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:dc051506c30b609238d79eda75ee9cab3e520570ec8219844a72a46020901e37", size = 4698437, upload-time = "2025-09-22T04:02:46.524Z" },
1427
+ { url = "https://files.pythonhosted.org/packages/a5/24/7be3f82cb7990b89118d944b619e53c656c97dc89c28cfb143fdb7cd6f4d/lxml-6.0.2-cp314-cp314-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:8799481bbdd212470d17513a54d568f44416db01250f49449647b5ab5b5dccb9", size = 5269890, upload-time = "2025-09-22T04:02:48.812Z" },
1428
+ { url = "https://files.pythonhosted.org/packages/1b/bd/dcfb9ea1e16c665efd7538fc5d5c34071276ce9220e234217682e7d2c4a5/lxml-6.0.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9261bb77c2dab42f3ecd9103951aeca2c40277701eb7e912c545c1b16e0e4917", size = 5097185, upload-time = "2025-09-22T04:02:50.746Z" },
1429
+ { url = "https://files.pythonhosted.org/packages/21/04/a60b0ff9314736316f28316b694bccbbabe100f8483ad83852d77fc7468e/lxml-6.0.2-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:65ac4a01aba353cfa6d5725b95d7aed6356ddc0a3cd734de00124d285b04b64f", size = 4745895, upload-time = "2025-09-22T04:02:52.968Z" },
1430
+ { url = "https://files.pythonhosted.org/packages/d6/bd/7d54bd1846e5a310d9c715921c5faa71cf5c0853372adf78aee70c8d7aa2/lxml-6.0.2-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:b22a07cbb82fea98f8a2fd814f3d1811ff9ed76d0fc6abc84eb21527596e7cc8", size = 5695246, upload-time = "2025-09-22T04:02:54.798Z" },
1431
+ { url = "https://files.pythonhosted.org/packages/fd/32/5643d6ab947bc371da21323acb2a6e603cedbe71cb4c99c8254289ab6f4e/lxml-6.0.2-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:d759cdd7f3e055d6bc8d9bec3ad905227b2e4c785dc16c372eb5b5e83123f48a", size = 5260797, upload-time = "2025-09-22T04:02:57.058Z" },
1432
+ { url = "https://files.pythonhosted.org/packages/33/da/34c1ec4cff1eea7d0b4cd44af8411806ed943141804ac9c5d565302afb78/lxml-6.0.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:945da35a48d193d27c188037a05fec5492937f66fb1958c24fc761fb9d40d43c", size = 5277404, upload-time = "2025-09-22T04:02:58.966Z" },
1433
+ { url = "https://files.pythonhosted.org/packages/82/57/4eca3e31e54dc89e2c3507e1cd411074a17565fa5ffc437c4ae0a00d439e/lxml-6.0.2-cp314-cp314-win32.whl", hash = "sha256:be3aaa60da67e6153eb15715cc2e19091af5dc75faef8b8a585aea372507384b", size = 3670072, upload-time = "2025-09-22T04:03:38.05Z" },
1434
+ { url = "https://files.pythonhosted.org/packages/e3/e0/c96cf13eccd20c9421ba910304dae0f619724dcf1702864fd59dd386404d/lxml-6.0.2-cp314-cp314-win_amd64.whl", hash = "sha256:fa25afbadead523f7001caf0c2382afd272c315a033a7b06336da2637d92d6ed", size = 4080617, upload-time = "2025-09-22T04:03:39.835Z" },
1435
+ { url = "https://files.pythonhosted.org/packages/d5/5d/b3f03e22b3d38d6f188ef044900a9b29b2fe0aebb94625ce9fe244011d34/lxml-6.0.2-cp314-cp314-win_arm64.whl", hash = "sha256:063eccf89df5b24e361b123e257e437f9e9878f425ee9aae3144c77faf6da6d8", size = 3754930, upload-time = "2025-09-22T04:03:41.565Z" },
1436
+ { url = "https://files.pythonhosted.org/packages/5e/5c/42c2c4c03554580708fc738d13414801f340c04c3eff90d8d2d227145275/lxml-6.0.2-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:6162a86d86893d63084faaf4ff937b3daea233e3682fb4474db07395794fa80d", size = 8910380, upload-time = "2025-09-22T04:03:01.645Z" },
1437
+ { url = "https://files.pythonhosted.org/packages/bf/4f/12df843e3e10d18d468a7557058f8d3733e8b6e12401f30b1ef29360740f/lxml-6.0.2-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:414aaa94e974e23a3e92e7ca5b97d10c0cf37b6481f50911032c69eeb3991bba", size = 4775632, upload-time = "2025-09-22T04:03:03.814Z" },
1438
+ { url = "https://files.pythonhosted.org/packages/e4/0c/9dc31e6c2d0d418483cbcb469d1f5a582a1cd00a1f4081953d44051f3c50/lxml-6.0.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:48461bd21625458dd01e14e2c38dd0aea69addc3c4f960c30d9f59d7f93be601", size = 4975171, upload-time = "2025-09-22T04:03:05.651Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/2b/9b870c6ca24c841bdd887504808f0417aa9d8d564114689266f19ddf29c8/lxml-6.0.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:25fcc59afc57d527cfc78a58f40ab4c9b8fd096a9a3f964d2781ffb6eb33f4ed", size = 5110109, upload-time = "2025-09-22T04:03:07.452Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/0c/4f5f2a4dd319a178912751564471355d9019e220c20d7db3fb8307ed8582/lxml-6.0.2-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5179c60288204e6ddde3f774a93350177e08876eaf3ab78aa3a3649d43eb7d37", size = 5041061, upload-time = "2025-09-22T04:03:09.297Z" },
+ { url = "https://files.pythonhosted.org/packages/12/64/554eed290365267671fe001a20d72d14f468ae4e6acef1e179b039436967/lxml-6.0.2-cp314-cp314t-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:967aab75434de148ec80597b75062d8123cadf2943fb4281f385141e18b21338", size = 5306233, upload-time = "2025-09-22T04:03:11.651Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/31/1d748aa275e71802ad9722df32a7a35034246b42c0ecdd8235412c3396ef/lxml-6.0.2-cp314-cp314t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:d100fcc8930d697c6561156c6810ab4a508fb264c8b6779e6e61e2ed5e7558f9", size = 5604739, upload-time = "2025-09-22T04:03:13.592Z" },
+ { url = "https://files.pythonhosted.org/packages/8f/41/2c11916bcac09ed561adccacceaedd2bf0e0b25b297ea92aab99fd03d0fa/lxml-6.0.2-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2ca59e7e13e5981175b8b3e4ab84d7da57993eeff53c07764dcebda0d0e64ecd", size = 5225119, upload-time = "2025-09-22T04:03:15.408Z" },
+ { url = "https://files.pythonhosted.org/packages/99/05/4e5c2873d8f17aa018e6afde417c80cc5d0c33be4854cce3ef5670c49367/lxml-6.0.2-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:957448ac63a42e2e49531b9d6c0fa449a1970dbc32467aaad46f11545be9af1d", size = 4633665, upload-time = "2025-09-22T04:03:17.262Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/c9/dcc2da1bebd6275cdc723b515f93edf548b82f36a5458cca3578bc899332/lxml-6.0.2-cp314-cp314t-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b7fc49c37f1786284b12af63152fe1d0990722497e2d5817acfe7a877522f9a9", size = 5234997, upload-time = "2025-09-22T04:03:19.14Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/e2/5172e4e7468afca64a37b81dba152fc5d90e30f9c83c7c3213d6a02a5ce4/lxml-6.0.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e19e0643cc936a22e837f79d01a550678da8377d7d801a14487c10c34ee49c7e", size = 5090957, upload-time = "2025-09-22T04:03:21.436Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/b3/15461fd3e5cd4ddcb7938b87fc20b14ab113b92312fc97afe65cd7c85de1/lxml-6.0.2-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:1db01e5cf14345628e0cbe71067204db658e2fb8e51e7f33631f5f4735fefd8d", size = 4764372, upload-time = "2025-09-22T04:03:23.27Z" },
+ { url = "https://files.pythonhosted.org/packages/05/33/f310b987c8bf9e61c4dd8e8035c416bd3230098f5e3cfa69fc4232de7059/lxml-6.0.2-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:875c6b5ab39ad5291588aed6925fac99d0097af0dd62f33c7b43736043d4a2ec", size = 5634653, upload-time = "2025-09-22T04:03:25.767Z" },
+ { url = "https://files.pythonhosted.org/packages/70/ff/51c80e75e0bc9382158133bdcf4e339b5886c6ee2418b5199b3f1a61ed6d/lxml-6.0.2-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:cdcbed9ad19da81c480dfd6dd161886db6096083c9938ead313d94b30aadf272", size = 5233795, upload-time = "2025-09-22T04:03:27.62Z" },
+ { url = "https://files.pythonhosted.org/packages/56/4d/4856e897df0d588789dd844dbed9d91782c4ef0b327f96ce53c807e13128/lxml-6.0.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:80dadc234ebc532e09be1975ff538d154a7fa61ea5031c03d25178855544728f", size = 5257023, upload-time = "2025-09-22T04:03:30.056Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/85/86766dfebfa87bea0ab78e9ff7a4b4b45225df4b4d3b8cc3c03c5cd68464/lxml-6.0.2-cp314-cp314t-win32.whl", hash = "sha256:da08e7bb297b04e893d91087df19638dc7a6bb858a954b0cc2b9f5053c922312", size = 3911420, upload-time = "2025-09-22T04:03:32.198Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/1a/b248b355834c8e32614650b8008c69ffeb0ceb149c793961dd8c0b991bb3/lxml-6.0.2-cp314-cp314t-win_amd64.whl", hash = "sha256:252a22982dca42f6155125ac76d3432e548a7625d56f5a273ee78a5057216eca", size = 4406837, upload-time = "2025-09-22T04:03:34.027Z" },
+ { url = "https://files.pythonhosted.org/packages/92/aa/df863bcc39c5e0946263454aba394de8a9084dbaff8ad143846b0d844739/lxml-6.0.2-cp314-cp314t-win_arm64.whl", hash = "sha256:bb4c1847b303835d89d785a18801a883436cdfd5dc3d62947f9c49e24f0f5a2c", size = 3822205, upload-time = "2025-09-22T04:03:36.249Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/9c/780c9a8fce3f04690b374f72f41306866b0400b9d0fdf3e17aaa37887eed/lxml-6.0.2-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:e748d4cf8fef2526bb2a589a417eba0c8674e29ffcb570ce2ceca44f1e567bf6", size = 3939264, upload-time = "2025-09-22T04:04:32.892Z" },
+ { url = "https://files.pythonhosted.org/packages/f5/5a/1ab260c00adf645d8bf7dec7f920f744b032f69130c681302821d5debea6/lxml-6.0.2-pp310-pypy310_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:4ddb1049fa0579d0cbd00503ad8c58b9ab34d1254c77bc6a5576d96ec7853dba", size = 4216435, upload-time = "2025-09-22T04:04:34.907Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/37/565f3b3d7ffede22874b6d86be1a1763d00f4ea9fc5b9b6ccb11e4ec8612/lxml-6.0.2-pp310-pypy310_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:cb233f9c95f83707dae461b12b720c1af9c28c2d19208e1be03387222151daf5", size = 4325913, upload-time = "2025-09-22T04:04:37.205Z" },
+ { url = "https://files.pythonhosted.org/packages/22/ec/f3a1b169b2fb9d03467e2e3c0c752ea30e993be440a068b125fc7dd248b0/lxml-6.0.2-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bc456d04db0515ce3320d714a1eac7a97774ff0849e7718b492d957da4631dd4", size = 4269357, upload-time = "2025-09-22T04:04:39.322Z" },
+ { url = "https://files.pythonhosted.org/packages/77/a2/585a28fe3e67daa1cf2f06f34490d556d121c25d500b10082a7db96e3bcd/lxml-6.0.2-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2613e67de13d619fd283d58bda40bff0ee07739f624ffee8b13b631abf33083d", size = 4412295, upload-time = "2025-09-22T04:04:41.647Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/d9/a57dd8bcebd7c69386c20263830d4fa72d27e6b72a229ef7a48e88952d9a/lxml-6.0.2-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:24a8e756c982c001ca8d59e87c80c4d9dcd4d9b44a4cbeb8d9be4482c514d41d", size = 3516913, upload-time = "2025-09-22T04:04:43.602Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/11/29d08bc103a62c0eba8016e7ed5aeebbf1e4312e83b0b1648dd203b0e87d/lxml-6.0.2-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:1c06035eafa8404b5cf475bb37a9f6088b0aca288d4ccc9d69389750d5543700", size = 3949829, upload-time = "2025-09-22T04:04:45.608Z" },
+ { url = "https://files.pythonhosted.org/packages/12/b3/52ab9a3b31e5ab8238da241baa19eec44d2ab426532441ee607165aebb52/lxml-6.0.2-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c7d13103045de1bdd6fe5d61802565f1a3537d70cd3abf596aa0af62761921ee", size = 4226277, upload-time = "2025-09-22T04:04:47.754Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/33/1eaf780c1baad88224611df13b1c2a9dfa460b526cacfe769103ff50d845/lxml-6.0.2-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0a3c150a95fbe5ac91de323aa756219ef9cf7fde5a3f00e2281e30f33fa5fa4f", size = 4330433, upload-time = "2025-09-22T04:04:49.907Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/c1/27428a2ff348e994ab4f8777d3a0ad510b6b92d37718e5887d2da99952a2/lxml-6.0.2-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:60fa43be34f78bebb27812ed90f1925ec99560b0fa1decdb7d12b84d857d31e9", size = 4272119, upload-time = "2025-09-22T04:04:51.801Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/d0/3020fa12bcec4ab62f97aab026d57c2f0cfd480a558758d9ca233bb6a79d/lxml-6.0.2-pp311-pypy311_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:21c73b476d3cfe836be731225ec3421fa2f048d84f6df6a8e70433dff1376d5a", size = 4417314, upload-time = "2025-09-22T04:04:55.024Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/77/d7f491cbc05303ac6801651aabeb262d43f319288c1ea96c66b1d2692ff3/lxml-6.0.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:27220da5be049e936c3aca06f174e8827ca6445a4353a1995584311487fc4e3e", size = 3518768, upload-time = "2025-09-22T04:04:57.097Z" },
+ ]
+
  [[package]]
  name = "markdown-it-py"
  version = "4.0.0"
 
  version = "0.1.0"
  source = { editable = "." }
  dependencies = [
+ { name = "beautifulsoup4" },
  { name = "fastapi" },
+ { name = "lxml" },
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
  { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
  { name = "openai" },
 
 
  [package.metadata]
  requires-dist = [
+ { name = "beautifulsoup4", specifier = ">=4.14.3" },
  { name = "fastapi", specifier = ">=0.100.0" },
+ { name = "lxml", specifier = ">=6.0.2" },
  { name = "numpy", specifier = ">=1.24.0" },
  { name = "openai", specifier = ">=1.0.0" },
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
 
  { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
  ]
 
+ [[package]]
+ name = "soupsieve"
+ version = "2.8.3"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/7b/ae/2d9c981590ed9999a0d91755b47fc74f74de286b0f5cee14c9269041e6c4/soupsieve-2.8.3.tar.gz", hash = "sha256:3267f1eeea4251fb42728b6dfb746edc9acaffc4a45b27e19450b676586e8349", size = 118627, upload-time = "2026-01-20T04:27:02.457Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" },
+ ]
+
  [[package]]
  name = "sse-starlette"
  version = "3.3.4"