Feature Demo: F001 — Core Environment Loop
Generated: 2026-03-24T21:36:32Z
Context source: spec + discovery only (implementation not read)
Feature entry: FEATURES.json (F001)
What This Feature Does
F001 turns the SQL environment from a non-functional loop into a usable episode flow: an agent can reset into a question, explore schema/data with structured actions, run SQL safely, and terminate with an answer or budget exhaustion.
From a user perspective, this should feel predictable and teachable: fast query feedback, clear errors when a query/action is invalid, and clean episode boundaries.
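The episode flow described above can be pictured with a small toy loop. This is a minimal sketch, not the project's implementation (which this demo did not read): the class name `ToyEpisode`, the `(action_type, argument)` call shape, the observation keys, and the in-memory `students` table are all assumptions made for illustration, and SQL safety checks are left out to keep the loop short.

```python
import sqlite3


class ToyEpisode:
    """Hypothetical stand-in for the environment; all names here are assumptions."""

    def __init__(self, question, answer, budget=3):
        self.question, self.answer, self.budget = question, answer, budget
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
        self.conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
        self.steps_left = 0
        self.done = True  # no active episode until reset() is called

    def reset(self):
        # Episode start: hand back the question and the visible table names.
        self.steps_left, self.done = self.budget, False
        rows = self.conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        return {"question": self.question, "tables": [r[0] for r in rows], "done": False}

    def step(self, action_type, argument):
        if self.done:
            raise RuntimeError("no active episode; call reset() first")
        obs = {"result": None, "error": None, "reward": 0.0}
        if action_type == "ANSWER":
            # Answering terminates the episode and is the only rewarded action.
            self.done = True
            obs["reward"] = 1.0 if argument == self.answer else 0.0
        elif action_type == "QUERY":
            self.steps_left -= 1
            try:
                obs["result"] = self.conn.execute(argument).fetchall()
            except sqlite3.Error as exc:
                # Invalid SQL becomes an error in the observation, not a crash.
                obs["error"] = str(exc)
        else:
            self.steps_left -= 1
            obs["error"] = f"unknown action type: {action_type}"
        if self.steps_left <= 0:
            # Budget exhaustion also ends the episode.
            self.done = True
        obs["done"], obs["steps_left"] = self.done, self.steps_left
        return obs
```

In this sketch an episode ends in exactly two ways, an `ANSWER` action or the step budget reaching zero, which mirrors the two termination paths named in the description.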
What Is Already Proven
Verified in This Demo Run
- Server startup works locally via `uv run uvicorn server.app:app --host 127.0.0.1 --port 8011` (startup/shutdown logs captured).
- The environment currently fails at `/reset` in this workspace because the required Spider DB file is missing (`FileNotFoundError` for `student_assessment`).
- Downloader CLI is present and runnable (`--help` works).
- Downloader input hardening rejects unsafe DB identifiers (e.g. `../bad`).
- Full local test suite passes (`25 passed`).
Previously Verified Evidence
- `specs/FEATURES.json` (`features[].id == F001`) records verification evidence: `uv run pytest tests/ -v`, 25/25 passed, verifier `approved` at `2026-03-24T21:27:31Z`.
- `specs/F001-IMPLEMENTATION_SPEC.md` Section 10 states user-value behavior for reset/step lifecycle and structured actions.
What Still Needs User Verification
- Provision `data/databases/student_assessment/student_assessment.sqlite` successfully in your environment.
- Re-run live `/reset` and `/step` API calls after DB provisioning to confirm end-to-end episode behavior (DESCRIBE/SAMPLE/QUERY/ANSWER).
Quickstart / Verification Steps
Run these commands to see the feature in action:
uv run uvicorn server.app:app --host 127.0.0.1 --port 8011
uv run python scripts/download_spider_databases.py --db-id student_assessment
uv run pytest tests/ -v
If /reset fails with missing DB, complete the DB download/provisioning first, then retry API interactions.
Live Local Proof
Start the Environment Server
This confirms the feature surface is exposed on a local API endpoint.
uv run uvicorn server.app:app --host 127.0.0.1 --port 8011
INFO: Started server process [26402]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8011 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [26402]
Note: the demo tool terminated the command after its 8000 ms timeout, which is the likely cause of the shutdown lines in the log.
The API process starts successfully and advertises the expected local URL.
Attempt Reset Without Database Provisioning (Proof Boundary)
This shows the current environment boundary in this workspace: reset cannot complete until DB assets are present.
uv run python - <<'PY'
import asyncio

import httpx
from server.app import app

# Call the ASGI app in-process, so no running server is needed.
transport = httpx.ASGITransport(app=app)

async def main():
    async with httpx.AsyncClient(transport=transport, base_url="http://local") as client:
        try:
            await client.post('/reset', json={})
        except Exception as exc:
            # Print the exception type and message raised during reset.
            print(type(exc).__name__)
            print(str(exc))

asyncio.run(main())
PY
Loaded tokenizer: mistralai/Mistral-7B-Instruct-v0.1
FileNotFoundError
Database 'student_assessment' not found in /Users/hjerp/Projects/sql-env-F001-core-environment-loop/data/databases
The failure is explicit and actionable: the exception names the missing database and the directory that was searched, rather than surfacing as an opaque error.
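A lookup that fails in this way can be sketched as follows. This is an illustration under stated assumptions, not the code behind `/reset`: the helper name `resolve_db_path` and its signature are hypothetical, the `<root>/<db_id>/<db_id>.sqlite` layout is taken from the path this document asks you to provision, and the message format is modeled on the output shown above.

```python
from pathlib import Path


def resolve_db_path(db_id: str, root: Path) -> Path:
    """Return <root>/<db_id>/<db_id>.sqlite, or raise FileNotFoundError if absent.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    candidate = root / db_id / f"{db_id}.sqlite"
    if not candidate.is_file():
        # Name both the database and the directory searched so the fix is obvious.
        raise FileNotFoundError(f"Database '{db_id}' not found in {root}")
    return candidate
```

Checking for the file before opening it matters with SQLite in particular, because `sqlite3.connect` on a missing path creates a new empty database by default instead of failing.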
Existing Evidence
- Verification record source: `specs/FEATURES.json` → `features[F001].verification_evidence`.
- Verification spec source: `specs/F001-VERIFICATION_SPEC.md` (unit/integration/API/E2E scenarios and edge-case checklist).
Manual Verification Checklist
- Download/provision Spider DB files so `student_assessment.sqlite` exists under `data/databases/student_assessment/`.
- Start server: `uv run uvicorn server.app:app --host 127.0.0.1 --port 8011`.
- POST `/reset` and confirm `done=false`, question present, and schema table names visible.
- POST `/step` with `DESCRIBE` and `QUERY` actions; confirm step/budget updates and readable results.
- POST invalid `QUERY` (non-SELECT) and verify clear error in observation.
- POST `ANSWER` and verify terminal `done=true` with reward behavior.
Edge Cases Exercised
Unsafe Database Identifier Rejected
uv run python scripts/download_spider_databases.py --db-id "../bad"
ValueError: Invalid db_id. Only letters, numbers, and underscores are allowed.
This confirms input hardening against path-traversal style DB IDs.
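The rule stated in the error message (letters, numbers, and underscores only) can be expressed as an allow-list pattern. This is a minimal sketch of that kind of check, assuming a regular-expression approach; the function name `validate_db_id` and the pattern are illustrative and may differ from what the downloader script does.

```python
import re

# Letters, digits, and underscores only, matching the error message above.
_DB_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_db_id(db_id: str) -> str:
    """Return db_id unchanged if it is safe to use as a directory name."""
    if not _DB_ID_PATTERN.fullmatch(db_id):
        raise ValueError("Invalid db_id. Only letters, numbers, and underscores are allowed.")
    return db_id
```

An allow-list is used here instead of stripping `..` or `/` because it rejects every unexpected character at once, and `fullmatch` avoids the trailing-newline loophole that `$` has with `re.match`.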
Upstream Database URL Failure Is Surfaced Clearly
uv run python scripts/download_spider_databases.py --db-id student_assessment
RuntimeError: Failed to download 'student_assessment' from Spider raw URL: HTTP Error 404: Not Found
This demonstrates an explicit failure mode for data provisioning when upstream URL resolution fails.
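One way to produce an error of this shape is to catch the standard library's network exceptions and re-raise them with the database name attached. The sketch below assumes `urllib` is the HTTP client, which this demo did not confirm; `download_bytes` and its injectable `opener` parameter are hypothetical and exist only so the failure path can be exercised offline.

```python
import urllib.error
import urllib.request


def download_bytes(db_id: str, url: str, opener=urllib.request.urlopen) -> bytes:
    """Fetch url and return its body, re-raising network errors with context.

    Hypothetical helper; `opener` is injectable so failures can be tested offline.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # HTTPError subclasses URLError, so 404s and connection failures both land here.
        raise RuntimeError(f"Failed to download '{db_id}' from Spider raw URL: {exc}") from exc
```

Chaining with `from exc` keeps the original traceback available while the top-level message stays short enough to read in a terminal.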
Test Evidence (Optional)
Supplementary proof that the feature works correctly across scenarios.
| Test Suite | Tests | Status |
|---|---|---|
| Smoke / contract regression (tests/test_smoke.py) | 25 | All passed |
Representative command:
uv run pytest tests/ -v
============================= test session starts ==============================
...
collected 25 items
...
============================== 25 passed in 6.27s ==============================
Feature Links
- Implementation spec: `specs/F001-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F001-VERIFICATION_SPEC.md`
Demo generated by feature-demo agent. Re-run with /feature-demo F001 to refresh.