
Feature Demo: F001 — Core Environment Loop

Generated: 2026-03-24T21:36:32Z
Context source: spec + discovery only (implementation not read)
Feature entry: FEATURES.json (F001)


What This Feature Does

F001 turns the SQL environment from a non-functional loop into a usable episode flow: an agent can reset into a question, explore schema/data with structured actions, run SQL safely, and terminate with an answer or budget exhaustion.

From a user perspective, this should feel predictable and teachable: fast query feedback, clear errors when a query/action is invalid, and clean episode boundaries.
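The episode flow described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the project's implementation: the class name `ToyEpisode`, the default budget, the observation keys other than `done`, and the choice that invalid actions do not consume budget are all hypothetical. Only the action names and the two termination conditions (answer or budget exhaustion) come from this document.

```python
# Exploration action names as they appear in this document.
EXPLORE_ACTIONS = {"DESCRIBE", "SAMPLE", "QUERY"}


class ToyEpisode:
    """Hypothetical stand-in for the environment's episode state."""

    def __init__(self, budget: int = 3) -> None:
        self.budget = budget  # assumed step budget; the real value is not stated here
        self.steps = 0
        self.done = False

    def step(self, action_type: str) -> dict:
        """Apply one action and return a toy observation."""
        if self.done:
            raise RuntimeError("episode is finished; reset before stepping again")
        if action_type == "ANSWER":
            self.done = True  # answering always ends the episode
            return {"done": True, "reason": "answered"}
        if action_type not in EXPLORE_ACTIONS:
            # Invalid actions yield an error observation (assumed not to cost budget).
            return {"done": False, "error": f"unknown action: {action_type}"}
        self.steps += 1
        if self.steps >= self.budget:
            self.done = True  # budget exhaustion also ends the episode
            return {"done": True, "reason": "budget_exhausted"}
        return {"done": False, "remaining": self.budget - self.steps}


episode = ToyEpisode(budget=2)
print(episode.step("DESCRIBE"))  # {'done': False, 'remaining': 1}
print(episode.step("QUERY"))     # {'done': True, 'reason': 'budget_exhausted'}
```

Keeping termination in one place makes the episode boundary easy to reason about: every path that sets `done` is visible in `step`.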


What Is Already Proven

Verified in This Demo Run

  • Server startup works locally via uv run uvicorn server.app:app --host 127.0.0.1 --port 8011 (startup/shutdown logs captured).
  • The environment currently fails at /reset in this workspace because the required Spider DB file is missing (FileNotFoundError for student_assessment).
  • Downloader CLI is present and runnable (--help works).
  • Downloader input hardening rejects unsafe DB identifiers (e.g. ../bad).
  • Full local test suite passes (25 passed).

Previously Verified Evidence

  • specs/FEATURES.json (features[].id == F001) records verification evidence: uv run pytest tests/ -v, 25/25 passed, verifier approved at 2026-03-24T21:27:31Z.
  • specs/F001-IMPLEMENTATION_SPEC.md Section 10 states user-value behavior for reset/step lifecycle and structured actions.

What Still Needs User Verification

  • Provision data/databases/student_assessment/student_assessment.sqlite successfully in your environment.
  • Re-run live /reset and /step API calls after DB provisioning to confirm end-to-end episode behavior (DESCRIBE/SAMPLE/QUERY/ANSWER).

Quickstart / Verification Steps

Run these commands to see the feature in action. The server command runs in the foreground, so run the others from a second terminal:

uv run uvicorn server.app:app --host 127.0.0.1 --port 8011
uv run python scripts/download_spider_databases.py --db-id student_assessment
uv run pytest tests/ -v

If /reset fails because the database is missing, complete the DB download/provisioning first, then retry the API calls.


Live Local Proof

Start the Environment Server

This confirms the feature surface is exposed on a local API endpoint.

uv run uvicorn server.app:app --host 127.0.0.1 --port 8011
INFO:     Started server process [26402]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8011 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [26402]

The demo harness stopped the command after its 8000 ms timeout, which accounts for the shutdown lines in the log.

The API process starts successfully and advertises the expected local URL.

Attempt Reset Without Database Provisioning (Proof Boundary)

This shows the current environment boundary in this workspace: reset cannot complete until DB assets are present.

uv run python - <<'PY'
import asyncio

import httpx
from server.app import app

# Call the ASGI app in-process; no running server is needed.
transport = httpx.ASGITransport(app=app)

async def main():
    async with httpx.AsyncClient(transport=transport, base_url="http://local") as client:
        try:
            await client.post("/reset", json={})
        except Exception as exc:
            # Print the exception type and message raised by the app.
            print(type(exc).__name__)
            print(str(exc))

asyncio.run(main())
PY
Loaded tokenizer: mistralai/Mistral-7B-Instruct-v0.1
FileNotFoundError
Database 'student_assessment' not found in /Users/hjerp/Projects/sql-env-F001-core-environment-loop/data/databases

The failure is explicit and actionable: the message names the missing database and the directory that was searched, rather than surfacing as an opaque error.
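A check that produces this kind of message can be sketched in a few lines. This is an illustration only, assuming the `<root>/<db_id>/<db_id>.sqlite` layout mentioned in this document; the function name `resolve_db_path` is hypothetical and the real lookup may differ.

```python
from pathlib import Path


def resolve_db_path(db_id: str, root: Path) -> Path:
    """Return <root>/<db_id>/<db_id>.sqlite, or raise if the file is missing.

    Hypothetical helper: the layout is taken from this document, the
    function itself is not from the project.
    """
    candidate = root / db_id / f"{db_id}.sqlite"
    if not candidate.is_file():
        # Name both the database and the searched directory so the fix is obvious.
        raise FileNotFoundError(f"Database '{db_id}' not found in {root}")
    return candidate


try:
    print(resolve_db_path("student_assessment", Path("data/databases")))
except FileNotFoundError as exc:
    print(exc)
```

Running this before starting the server gives a quick yes/no on whether provisioning succeeded.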


Existing Evidence

  • Verification record source: specs/FEATURES.json, field features[F001].verification_evidence.
  • Verification spec source: specs/F001-VERIFICATION_SPEC.md (unit/integration/API/E2E scenarios and edge-case checklist).

Manual Verification Checklist

  1. Download/provision Spider DB files so student_assessment.sqlite exists under data/databases/student_assessment/.
  2. Start server: uv run uvicorn server.app:app --host 127.0.0.1 --port 8011.
  3. POST /reset and confirm done=false, question present, and schema table names visible.
  4. POST /step with DESCRIBE and QUERY actions; confirm step/budget updates and readable results.
  5. POST an invalid QUERY (non-SELECT) and verify that the observation carries a clear error.
  6. POST ANSWER and verify that the episode terminates with done=true and check the reward that is returned.
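For the non-SELECT check in the list above, it can help to have a mental model of what a SELECT-only guard does. The sketch below is an assumption about how such a guard might look, not the environment's actual code: the function name `run_select_only`, the error strings, and the `max_rows` cap are hypothetical. It uses only the standard-library `sqlite3` module.

```python
import sqlite3


def run_select_only(conn: sqlite3.Connection, sql: str, max_rows: int = 20) -> dict:
    """Run a single SELECT and return rows, or an error observation (hypothetical)."""
    statement = sql.strip().rstrip(";").strip()
    if not statement.upper().startswith("SELECT"):
        return {"error": "Only SELECT statements are allowed."}
    if ";" in statement:
        # Crude multi-statement check; it also rejects semicolons inside string literals.
        return {"error": "Only one statement per query is allowed."}
    try:
        rows = conn.execute(statement).fetchmany(max_rows)
    except sqlite3.Error as exc:
        # Report SQL problems as data instead of raising, so the episode can continue.
        return {"error": f"SQL error: {exc}"}
    return {"rows": rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Ada')")
print(run_select_only(conn, "SELECT name FROM students"))  # {'rows': [('Ada',)]}
print(run_select_only(conn, "DROP TABLE students"))        # {'error': 'Only SELECT statements are allowed.'}
```

A prefix check like this is deliberately simple and also rejects valid read-only queries that start with `WITH`. Opening the database read-only (for example `sqlite3.connect("file:path.sqlite?mode=ro", uri=True)`) is a stronger safeguard that can be layered on top.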

Edge Cases Exercised

Unsafe Database Identifier Rejected

uv run python scripts/download_spider_databases.py --db-id "../bad"
ValueError: Invalid db_id. Only letters, numbers, and underscores are allowed.

This confirms input hardening against path-traversal style DB IDs.
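An allowlist check consistent with that error message could look like the following. This is a minimal sketch, assuming the rule is exactly "letters, numbers, and underscores"; the name `validate_db_id` and the use of a regular expression are assumptions, since the downloader's source was not read.

```python
import re

# Allowlist: one or more ASCII letters, digits, or underscores.
_DB_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_db_id(db_id: str) -> str:
    """Return db_id unchanged if it is safe to use as a directory name (hypothetical)."""
    # fullmatch rejects anything with '/', '.', whitespace, or an empty string.
    if not _DB_ID_PATTERN.fullmatch(db_id):
        raise ValueError("Invalid db_id. Only letters, numbers, and underscores are allowed.")
    return db_id


print(validate_db_id("student_assessment"))  # student_assessment
try:
    validate_db_id("../bad")
except ValueError as exc:
    print(exc)
```

An allowlist is preferable to stripping `..` or `/` from the input, because it does not depend on enumerating every dangerous sequence.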

Upstream Database URL Failure Is Surfaced Clearly

uv run python scripts/download_spider_databases.py --db-id student_assessment
RuntimeError: Failed to download 'student_assessment' from Spider raw URL: HTTP Error 404: Not Found

This demonstrates an explicit failure mode for data provisioning when upstream URL resolution fails.
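One way to get an error of this shape is to catch the transport-level exception and re-raise it with the database name attached. The sketch below assumes a standard-library `urllib` download; the function name `fetch_db_bytes`, the injectable `opener` parameter, and the example URL are hypothetical and exist only so the example runs without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_db_bytes(db_id: str, url: str, opener: Callable = urllib.request.urlopen) -> bytes:
    """Download a database file, wrapping network errors in a readable message (hypothetical)."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as exc:  # HTTPError is a subclass of URLError
        raise RuntimeError(
            f"Failed to download '{db_id}' from Spider raw URL: {exc}"
        ) from exc


def fake_404(url: str):
    """Stand-in opener that simulates an upstream 404 without touching the network."""
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


try:
    fetch_db_bytes("student_assessment", "https://example.invalid/db.sqlite", opener=fake_404)
except RuntimeError as exc:
    print(exc)
```

Chaining with `from exc` keeps the original traceback available for debugging while the top-level message stays short.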


Test Evidence (Optional)

Supplementary proof that the feature works correctly across scenarios.

| Test Suite | Tests | Status |
| --- | --- | --- |
| Smoke / contract regression (tests/test_smoke.py) | 25 | All passed |

Representative command:

uv run pytest tests/ -v
============================= test session starts ==============================
...
collected 25 items
...
============================== 25 passed in 6.27s ==============================

Feature Links

  • Implementation spec: specs/F001-IMPLEMENTATION_SPEC.md
  • Verification spec: specs/F001-VERIFICATION_SPEC.md

Demo generated by feature-demo agent. Re-run with /feature-demo F001 to refresh.