Ground Truth Extraction
Extract the API catalog from each live WebArena container by connecting to EC2 from Cursor, entering each container, and running Claude Code with the prompts below.
Container names confirmed from the running EC2 instance:
| App | Container name | Image |
|---|---|---|
| Shopping | shopping | shopping_final_0712 |
| Shopping Admin | shopping_admin | shopping_admin_final_0719 |
| Forum | forum | postmill-populated-exposed-withimg |
| Wikipedia | kiwix33 | ghcr.io/kiwix/kiwix-serve:3.3.0 |
| Map (web) | openstreetmap-website-web-1 | openstreetmap-website-web |
GitLab (container gitlab) is skipped for now; it is intermittently returning 502 errors.
Wikipedia (Kiwix) serves a static ZIM file; there is no source code to analyze. Its catalog entry is hardcoded at the bottom of this file.
Connection workflow
Cursor → Remote SSH → EC2 host → Dev Containers extension → Attach to Running Container → pick app container → paste prompt into Cursor sidebar
Step 1 – Connect from Cursor to EC2
Cmd+Shift+P → Remote-SSH: Connect to Host → ubuntu@<EC2_IP>
Step 2 – Attach to a container
With the Remote-SSH window open: Cmd+Shift+P → Dev Containers: Attach to Running Container → select the container (e.g. shopping).
Cursor opens a new window with the container's filesystem as the workspace. The full source code is now loaded and indexed; no copying needed.
Step 3 – Paste the prompt into the Cursor sidebar
Open the AI chat sidebar, paste the prompt for that container (see sections below), and run it. Cursor's AI has the full codebase in context and will write api_catalog.json into the workspace (inside the container).
Step 4 – Copy the output back
The file written inside the container can be downloaded via File → Download from Cursor's remote explorer, or via scp from the EC2 host after the fact.
Repeat for each container: shopping, shopping_admin, forum, openstreetmap-website-web-1.
Expected output format
The catalog captures the entire API surface found in the codebase: REST, GraphQL, WebSocket, and form submissions. The api_type field distinguishes them. Do not restrict to only the endpoints used by the 7 training tasks; document everything.
[
{
"api_type": "rest",
"endpoint": "POST /rest/V1/guest-carts/{cartId}/items",
"auth": "none",
"path_params": {
"cartId": {
"type": "string",
"source": "PREV_CALL",
"from_endpoint": "POST /rest/V1/guest-carts",
"from_field": ".body",
"notes": "entire response body is the cartId string"
}
},
"body_params": {
"cartItem.sku": { "type": "string", "source": "PREV_CALL", "from_endpoint": "GET /rest/V1/products", "from_field": ".items[0].sku" },
"cartItem.qty": { "type": "number", "source": "TASK_SPEC" },
"cartItem.quote_id": { "type": "string", "source": "DERIVED", "same_as": "cartId" }
},
"response_key_fields": []
},
{
"api_type": "graphql",
"endpoint": "POST /graphql",
"operation_name": "GetProducts",
"operation_type": "query",
"auth": "none",
"variables": {
"search": { "type": "String", "source": "TASK_SPEC" },
"pageSize": { "type": "Int", "source": "STATIC", "value": 20 }
},
"response_key_fields": [".products.items[].sku", ".products.items[].name"]
},
{
"api_type": "websocket",
"endpoint": "ws:///realtime",
"auth": "session_cookie",
"notes": "describe message protocol; include event names and payload shapes"
},
{
"api_type": "form",
"endpoint": "POST /submission/create",
"auth": "session_cookie+csrf",
"content_type": "application/x-www-form-urlencoded",
"form_params": {
"_token": { "type": "string", "source": "AUTH_FLOW", "notes": "hidden input in page HTML" },
"title": { "type": "string", "source": "TASK_SPEC" },
"url": { "type": "string", "source": "TASK_SPEC", "notes": "optional if body provided" }
},
"response_key_fields": []
}
]
**api_type:** rest | graphql | websocket | form
**source:**
- TASK_SPEC – given in the task description
- PREV_CALL – from a prior response this episode; specify from_endpoint + from_field
- AUTH_FLOW – token / cookie / CSRF from the login flow
- STATIC – hardcoded in the app; document the actual value
- DERIVED – aliased from another value (e.g. quote_id = cartId)
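For concreteness, here is a hypothetical sketch (not part of the repo) of how a consumer of the catalog might resolve these source types at episode time, using a simple jq-style path walker for PREV_CALL fields. All names here are illustrative:

```python
import re

def jq_get(body, path):
    """Walk a simple jq-style path like '.items[0].sku' into parsed JSON."""
    if path in ("", ".", ".body"):   # '.body' convention: the whole response
        return body
    cur = body
    for key, idx in re.findall(r"\.([A-Za-z_]\w*)|\[(\d+)\]", path):
        cur = cur[key] if key else cur[int(idx)]
    return cur

def resolve_param(name, spec, task_spec, auth, prev_responses, resolved):
    """Resolve one catalog param spec to a concrete value.

    task_spec      -- values given in the task description
    auth           -- tokens/cookies captured during the login flow
    prev_responses -- endpoint -> parsed response body from this episode
    resolved       -- params already resolved (for DERIVED aliases)
    """
    src = spec["source"]
    if src == "STATIC":
        return spec["value"]
    if src == "TASK_SPEC":
        return task_spec[name]
    if src == "AUTH_FLOW":
        return auth[name]
    if src == "DERIVED":
        return resolved[spec["same_as"]]
    if src == "PREV_CALL":
        return jq_get(prev_responses[spec["from_endpoint"]], spec["from_field"])
    raise ValueError(f"unknown source: {src!r}")

# Example mirroring the guest-cart entry above:
prev = {"POST /rest/V1/guest-carts": "abc123",
        "GET /rest/V1/products": {"items": [{"sku": "24-MB01"}]}}
cart_id = resolve_param(
    "cartId",
    {"source": "PREV_CALL", "from_endpoint": "POST /rest/V1/guest-carts",
     "from_field": ".body"},
    {}, {}, prev, {})
quote_id = resolve_param(
    "cartItem.quote_id", {"source": "DERIVED", "same_as": "cartId"},
    {}, {}, prev, {"cartId": cart_id})
# cart_id == "abc123" and quote_id aliases it
```

The path walker only handles dotted keys and numeric indices, which is enough for the from_field examples in this file.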
App 1 – Shopping and Shopping Admin (Magento 2)
Container: shopping
Source root: /var/www/magento2/ (confirmed: the WebArena README runs docker exec shopping /var/www/magento2/bin/magento ...)
Attach to the shopping container in Cursor, open /var/www/magento2/ as the workspace, then paste this prompt into the sidebar:
You are working inside a Magento 2 codebase. Start by exploring the directory structure to orient yourself.
Your job: produce a COMPLETE api_catalog.json covering ALL APIs exposed by this Magento 2
installation, not just a subset. Document every endpoint you find, regardless of whether
it is used in a specific task or not. The goal is a full map of the application's API surface.
API types to scan for (all of them):
1. REST endpoints – declared in webapi.xml files. These are the primary API.
2. GraphQL – Magento has a full GraphQL API parallel to REST. Find the .graphqls schema files
and document every query and mutation.
3. WebSockets – Magento does not typically use WebSockets, but check. If none found, note it.
4. Admin AJAX endpoints – controllers under adminhtml/ that handle JSON AJAX requests.
These are separate from the REST API.
For REST and Admin AJAX endpoints, produce:
{
"api_type": "rest",
"endpoint": "METHOD /path/{template}",
"auth": "none | bearer_token | admin_bearer_token | session_cookie",
"path_params": { "<name>": { "type": "...", "source": "...", "from_endpoint": "...", "from_field": "...", "notes": "..." } },
"query_params": { ... },
"body_params": { ... },
"response_key_fields": ["jq paths that downstream calls will consume"]
}
For GraphQL queries/mutations, produce:
{
"api_type": "graphql",
"endpoint": "POST /graphql",
"operation_name": "...",
"operation_type": "query | mutation | subscription",
"auth": "none | bearer_token",
"variables": { "<name>": { "type": "...", "source": "...", "notes": "..." } },
"response_key_fields": ["jq paths that downstream calls will consume"]
}
Source types: TASK_SPEC | PREV_CALL | AUTH_FLOW | STATIC | DERIVED
- PREV_CALL: must include from_endpoint and from_field (jq path into that response)
- AUTH_FLOW: any token/cookie obtained during login
- STATIC: include the actual static value
Rules:
- Document REQUIRED parameters only. Skip X-Requested-With, Cache-Control, correlation IDs.
- For guest-cart: POST /rest/V1/guest-carts returns a plain quoted string; that IS the cartId.
- quote_id in the add-item body equals cartId; mark it DERIVED.
- For searchCriteria filter params, document the exact query string structure.
- For GraphQL: read ALL .graphqls files to find every query, mutation, subscription.
Write the output to api_catalog.json at the root of the codebase.
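As a side note on the searchCriteria rule in the prompt above: Magento flattens searchCriteria into bracketed query-string keys. A small sketch of constructing one (the helper name is ours, not Magento's; all filters go into one filter group for simplicity):

```python
from urllib.parse import urlencode

def search_criteria(filters=None, page_size=None):
    """Flatten Magento-style searchCriteria into bracketed query keys.

    filters is a list of (field, value, condition_type) tuples. Magento
    ORs filters within a group and ANDs across groups; this sketch uses
    a single group.
    """
    params = {}
    for i, (field, value, cond) in enumerate(filters or []):
        base = f"searchCriteria[filterGroups][0][filters][{i}]"
        params[f"{base}[field]"] = field
        params[f"{base}[value]"] = value
        params[f"{base}[conditionType]"] = cond
    if page_size is not None:
        params["searchCriteria[pageSize]"] = page_size
    return urlencode(params)

qs = search_criteria([("name", "%hoodie%", "like")], page_size=1)
# brackets arrive percent-encoded, e.g. searchCriteria%5BpageSize%5D=1
```

This matches the encoded form used in the smoke test later in this file (searchCriteria%5BpageSize%5D=1).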
App 2 – Forum (Postmill / Symfony)
Container: forum
Source root: /var/www/html/ (confirmed: docker exec forum find / -name "composer.json" returned /var/www/html/composer.json)
Attach to the forum container in Cursor, open /var/www/html/ as the workspace, then paste this prompt into the sidebar:
You are working inside a Postmill forum codebase (PHP / Symfony). Start by exploring the
directory structure to orient yourself.
Your job: produce a COMPLETE api_catalog.json covering ALL HTTP endpoints exposed by this
Postmill installation: every route, every form action, every AJAX endpoint.
API types to scan for:
1. Form submissions (POST, application/x-www-form-urlencoded) – the primary interaction pattern
2. JSON AJAX endpoints – controllers that return JsonResponse
3. REST-style endpoints – if any exist under /api/
4. WebSockets – Postmill does not typically use WebSockets, but check for any Mercure
or Pusher integration. If none, note it.
For form submissions and JSON endpoints, produce:
{
"api_type": "form" | "rest",
"endpoint": "METHOD /path/{template}",
"auth": "none | session_cookie | session_cookie+csrf",
"content_type": "application/x-www-form-urlencoded | application/json | multipart/form-data",
"path_params": { "<name>": { "type": "...", "source": "...", "from_endpoint": "...", "from_field": "...", "notes": "..." } },
"query_params": { ... },
"form_params": { ... }, // use this for form submissions
"body_params": { ... }, // use this for JSON body
"response_key_fields": ["what downstream calls consume from this response"]
}
Source types: TASK_SPEC | PREV_CALL | AUTH_FLOW | STATIC | DERIVED
Postmill-specific notes:
- Login is a form POST. Find the exact CSRF token field name in the security config or login form type.
- All write operations (create post, vote, comment) require session_cookie+csrf.
- The community slug / post ID in path templates come from TASK_SPEC or PREV_CALL.
- Read every FormType class to get the exact field names for each form.
- For CSRF tokens in forms: source is AUTH_FLOW (extracted from the page HTML before submit).
Write the output to api_catalog.json at the root of the codebase.
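As background for the CSRF note in the prompt above: the token can be scraped from the login page's hidden inputs before submitting. A minimal sketch using only the standard library (the field name and HTML fragment are illustrative):

```python
from html.parser import HTMLParser

class HiddenInputs(HTMLParser):
    """Collect all <input type="hidden"> name/value pairs from a page."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")

def hidden_fields(page_html):
    parser = HiddenInputs()
    parser.feed(page_html)
    return parser.fields

page = ('<form action="/login_check" method="post">'
        '<input type="hidden" name="_csrf_token" value="abc123"></form>')
hidden_fields(page)  # {'_csrf_token': 'abc123'}
```

The returned dict can be merged into the form POST body alongside _username and _password.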
App 3 – Map (OpenStreetMap / Rails)
Container: openstreetmap-website-web-1
Source root: /app (confirmed: docker exec openstreetmap-website-web-1 ls /app shows Gemfile, app/, config/, db/, etc.)
Attach to the openstreetmap-website-web-1 container in Cursor, open /app as the workspace, then paste this prompt into the sidebar:
You are working inside an OpenStreetMap Rails codebase. Start by exploring the directory
structure to orient yourself.
Your job: produce a COMPLETE api_catalog.json covering ALL HTTP endpoints exposed by this
OpenStreetMap installation: every route, every API endpoint, every format variant.
API types to scan for:
1. REST API under /api/0.6/ – the main machine-readable API (XML and JSON variants)
2. Search / geocoding – how place searches are handled (may proxy to Nominatim, or local)
3. Web interface endpoints – HTML controllers, but also any that return JSON
4. OAuth endpoints – any OAuth 1.0 or 2.0 flows
5. WebSockets – unlikely, but check for ActionCable or similar integration
For each endpoint, produce:
{
"api_type": "rest" | "form" | "websocket",
"endpoint": "METHOD /path/{template}",
"auth": "none | oauth | session_cookie",
"format_variants": [".json", ".xml"], // if the endpoint supports multiple formats via extension
"path_params": { "<name>": { "type": "...", "source": "...", "from_endpoint": "...", "from_field": "...", "notes": "..." } },
"query_params": { ... },
"body_params": { ... },
"response_key_fields": ["XPath or jq paths downstream calls consume"]
}
Source types: TASK_SPEC | PREV_CALL | AUTH_FLOW | STATIC | DERIVED
OpenStreetMap-specific notes:
- The /api/0.6/ endpoints return XML by default; .json suffix returns JSON. Document both variants.
- Node, way, relation IDs are integers; source is TASK_SPEC for direct tasks, PREV_CALL when
they come from a search result.
- The search endpoint may be /search or proxied through Nominatim; read the routes and
controllers carefully to find where geographic searches are handled.
- Read ALL controller files, not just the api/ subdirectory. There may be JSON endpoints in
the main web controllers too.
Start with the routes file (e.g. config/routes.rb) to get the complete route list, then read each controller. Write the output to api_catalog.json at the root of the codebase.
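To illustrate the XML-by-default point above, a short sketch of pulling node IDs and way tags out of an /api/0.6/map-shaped response with the standard library. The XML fragment is hand-written for illustration, not captured from the live server:

```python
import xml.etree.ElementTree as ET

# Minimal hand-written fragment in the shape /api/0.6/map returns.
sample = """<osm version="0.6" generator="OpenStreetMap server">
  <node id="1" lat="51.5007" lon="-0.1246" version="1"/>
  <node id="2" lat="51.5010" lon="-0.1250" version="3"/>
  <way id="10" version="2">
    <nd ref="1"/><nd ref="2"/>
    <tag k="name" v="Example Street"/>
  </way>
</osm>"""

root = ET.fromstring(sample)
node_ids = [int(n.get("id")) for n in root.findall("node")]
way_tags = {w.get("id"): {t.get("k"): t.get("v") for t in w.findall("tag")}
            for w in root.findall("way")}
node_ids            # [1, 2]
way_tags["10"]      # {'name': 'Example Street'}
```

The same element names apply to the single-object calls (GET /api/0.6/node/{id}, etc.), so one parser covers both.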
App 4 – Wikipedia (Kiwix): no extraction needed
Kiwix serves a static ZIM file at /data/wikipedia_en_all_maxi_2022-05.zim (confirmed: docker exec kiwix33 find / -name "*.zim" returned that path). There is no application source code to analyze; kiwix-serve is a C++ binary, not a web framework. The catalog entry is hardcoded below.
Hardcoded catalog entry (save as catalogs/wikipedia.json):
{
"_meta": {
"generated": "2026-04-08",
"source": "hardcoded; kiwix-serve binary serves a static ZIM file; no application source to analyze",
"zim_file": "/data/wikipedia_en_all_maxi_2022-05.zim",
"search_response": "HTML only; GET /search returns an HTML page; agent must parse <a href> links for article URLs",
"article_page": "GET /wikipedia_en_all_maxi_2022-05/A/{title} returns the HTML article",
"websockets": "none"
},
"endpoints": [
{
"api_type": "rest",
"endpoint": "GET /search",
"auth": "none",
"query_params": {
"pattern": {
"type": "string",
"source": "TASK_SPEC",
"notes": "the search query, URL-encoded"
},
"books.name": {
"type": "string",
"source": "STATIC",
"value": "wikipedia_en_all_maxi_2022-05",
"notes": "selects which ZIM book to search"
}
},
"response_key_fields": [],
"notes": "IMPORTANT: response is HTML, not JSON. Parse <a href> anchor links matching /wikipedia_en_all_maxi_2022-05/A/... to extract article slugs."
},
{
"api_type": "rest",
"endpoint": "GET /wikipedia_en_all_maxi_2022-05/A/{article_title}",
"auth": "none",
"path_params": {
"article_title": {
"type": "string",
"source": "PREV_CALL",
"from_endpoint": "GET /search",
"from_field": "href attribute of first search result <a> tag",
"notes": "URL-encoded article slug, e.g. Albert_Einstein. Extract from the href on the search results HTML page."
}
},
"response_key_fields": [],
"notes": "Returns full HTML article page. HTTP 200 when article exists, 404 when not found."
}
]
}
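Since the /search response is HTML rather than JSON, the agent has to scrape article links per the notes above. A minimal sketch (the HTML fragment is illustrative; real kiwix-serve markup may differ):

```python
import re

BOOK = "wikipedia_en_all_maxi_2022-05"

def article_slugs(search_html):
    """Pull article slugs out of kiwix-serve search-result links.

    Matches hrefs of the form /<book>/A/<slug>. The fragment below is
    hand-written for illustration, not captured from a live server.
    """
    pattern = rf'href="/{re.escape(BOOK)}/A/([^"#?]+)"'
    return re.findall(pattern, search_html)

page = ('<a href="/wikipedia_en_all_maxi_2022-05/A/Albert_Einstein">Albert Einstein</a>'
        '<a href="/wikipedia_en_all_maxi_2022-05/A/Albert_Einstein_House">House</a>')
article_slugs(page)  # ['Albert_Einstein', 'Albert_Einstein_House']
```

The first slug in the list corresponds to "href attribute of first search result" in the catalog's from_field.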
Validation – smoke-test each catalog entry
After Claude Code writes api_catalog.json for each app, validate a few key entries against the live server before committing:
EC2="ec2-16-59-2-56.us-east-2.compute.amazonaws.com"
# Shopping: guest cart + add item
CART=$(curl -s -X POST http://$EC2:7770/rest/V1/guest-carts \
-H "Content-Type: application/json" | tr -d '"')
echo "cart_id: $CART"
# Get admin token first (product listing requires auth)
ADMIN_TOKEN=$(curl -s -X POST http://$EC2:7770/rest/V1/integration/admin/token \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"admin1234"}' | tr -d '"')
SKU=$(curl -s "http://$EC2:7770/rest/V1/products?searchCriteria%5BpageSize%5D=1" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['items'][0]['sku'])")
echo "sku: $SKU"
curl -s -X POST http://$EC2:7770/rest/V1/guest-carts/$CART/items \
-H "Content-Type: application/json" \
-d "{\"cartItem\":{\"sku\":\"$SKU\",\"qty\":1,\"quote_id\":\"$CART\"}}" | python3 -m json.tool
If the response is HTTP 200 with item details, the catalog entries for tasks 3–5 are correct.
Or use the automated validator (requires pip install requests):
python3 validate_catalog.py --host ec2-16-59-2-56.us-east-2.compute.amazonaws.com --all
Final structure
All source roots confirmed by running commands on the live EC2 instance:
| Container | Source root | How confirmed |
|---|---|---|
| shopping | /var/www/magento2/ | WebArena README docker exec commands |
| shopping_admin | /var/www/magento2/ | Same |
| forum | /var/www/html/ | find / -name "composer.json" returned /var/www/html/composer.json |
| openstreetmap-website-web-1 | /app | ls /app shows Gemfile, app/, config/, db/ |
| kiwix33 | N/A | Binary server; data at /data/wikipedia_en_all_maxi_2022-05.zim |
After running the AI on each container, download api_catalog.json via Cursor's remote explorer (Right-click → Download) and save locally as:
catalogs/
shopping.json – from shopping container
shopping_admin.json – from shopping_admin container
forum.json – from forum container
osm.json – from openstreetmap-website-web-1 container
wikipedia.json – hardcoded above (no container needed)
These five files are committed to the repo and loaded by the judge at startup. They are never regenerated during training.
Catalog status – live endpoint verification (2026-04-08)
All five catalogs have been extracted and are committed. Below is the result of live testing against ec2-16-59-2-56.us-east-2.compute.amazonaws.com.
Summary table
| Catalog | Endpoints | JSON valid | Structure | Live test | Notes |
|---|---|---|---|---|---|
| shopping.json | 502 | ✅ | ✅ | ✅ PASS | See details below |
| shopping_admin.json | 552 | ✅ | ✅ | ✅ PASS | See details below |
| forum.json | 91 | ✅ | ✅ | ⚠️ WARN | Login not verified (see below) |
| osm.json | 217 | ✅ | ✅ | ✅ PASS | See details below |
| wikipedia.json | 2 | ✅ | ✅ | ⚠️ WARN | Search returns HTML not JSON (corrected) |
Shopping (port 7770) – PASS
Auth: POST /rest/V1/integration/admin/token with admin/admin1234 returns a JWT bearer token. ✅
Guest cart: POST /rest/V1/guest-carts returns a plain quoted string; confirmed this is the cartId. ✅
Product listing: GET /rest/V1/products?searchCriteria[pageSize]=N requires a bearer token and returns full product JSON with total_count: 104368. ✅
Add to cart: POST /rest/V1/guest-carts/{cartId}/items with {sku, qty, quote_id} returns item detail at HTTP 200. ✅
Key finding: GET /rest/V1/products without auth returns HTTP 401 ("consumer isn't authorized"). The catalog documents auth: "admin_bearer_token" for this endpoint, which is correct.
Shopping Admin (port 7780)
Shopping Admin uses the same Magento 2 REST API as Shopping but is accessed on port 7780 with admin credentials. The shopping_admin.json catalog documents the same REST surface with admin-scoped auth. The admin UI itself is a browser-based SPA; its internal AJAX endpoints are documented in the catalog under admin_ajax type entries.
Forum (port 9999)
Homepage: HTTP 200. ✅
Login form structure: confirmed via HTML inspection. Form action is POST /login_check. CSRF field is _csrf_token (Symfony token, not form_key). ✅
Login result: POST /login_check with MarvelsGrantMan136/test1234 redirects to / (homepage); login successful. ✅
(The original password notarobot from WebArena defaults was stale; the correct password on this instance is test1234.)
Catalog correctness: The forum.json catalog correctly documents:
- POST /login_check with _csrf_token, _username, _password
- All write endpoints require session_cookie+csrf
- route_name field on each entry (extra metadata, not used by judge)
OSM / Map (port 3000)
Capabilities: GET /api/0.6/capabilities returns XML. ✅
Map bbox: GET /api/0.6/map?bbox=-0.1,51.5,0.1,51.6 returns valid OSM XML with <osm> root. ✅
Search finding: GET /search?query=... returns an HTML page (HTTP 200), not JSON. The actual geocoding is dispatched client-side to sub-endpoints:
- POST /geocoder/search_osm_nominatim – Nominatim-backed search
- POST /geocoder/search_latlon – coordinate-based search
- POST /geocoder/search_osm_nominatim_reverse – reverse geocode
Search param name: The catalog documents query as the query param name. Confirmed: GET /search?query=New+York returns HTTP 200 (HTML).