Verification Specification

Feature: F007
Generated from: specs/F007-VERIFICATION_INPUT.json
Generated: 2026-03-27

1. Unit Tests

Dockerfile Validation

Test	Description	Input	Expected	Category
test_dockerfile_exists	Dockerfile exists at server/Dockerfile	N/A	File exists	happy
test_dockerfile_has_base_image_arg	BASE_IMAGE ARG is declared	Parse Dockerfile	`ARG BASE_IMAGE` present	happy
test_dockerfile_port_env_variable	PORT env var with fallback to 8000	Parse Dockerfile	`ENV PORT` or CMD reads `$PORT`	happy
test_dockerfile_cmd_uses_port_env	CMD respects PORT env override	Set `PORT=7860`	Server binds to 7860	happy
test_dockerfile_non_root_user	Container runs as non-root user	Parse Dockerfile	`USER appuser` or equivalent non-root USER directive	security
test_dockerfile_copies_databases	Spider databases are bundled	Parse Dockerfile	COPY instruction includes `data/databases/`	happy
test_dockerfile_healthcheck	Health check endpoint configured	Parse Dockerfile	HEALTHCHECK directive present	happy
test_dockerfile_no_dev_dependencies	No test/dev packages in final image	Inspect final stage	No pytest, ruff, etc.	edge

Run: uv run pytest tests/unit/test_dockerfile.py -v
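The static checks in the table above could be sketched roughly as follows. This is a minimal illustration, not the project's actual test code: the function name `check_dockerfile`, the result keys, and the `SAMPLE` Dockerfile are all hypothetical, and the sketch does not handle line continuations or multi-stage builds.

```python
import re


def check_dockerfile(text: str) -> dict:
    """Run static checks on Dockerfile text; each key mirrors one row of the table."""
    lines = [ln.strip() for ln in text.splitlines()]
    instructions = [ln for ln in lines if ln and not ln.startswith("#")]

    def has(pattern: str) -> bool:
        # True if any instruction line starts with the given pattern.
        return any(re.match(pattern, ln, re.IGNORECASE) for ln in instructions)

    # The last USER directive decides which user the container runs as.
    users = [
        ln.split(maxsplit=1)[1]
        for ln in instructions
        if re.match(r"USER\s+\S", ln, re.IGNORECASE)
    ]
    non_root = bool(users) and users[-1].split(":")[0] not in {"root", "0"}

    return {
        "base_image_arg": has(r"ARG\s+BASE_IMAGE\b"),
        "non_root_user": non_root,
        "copies_databases": has(r"COPY\s.*data/databases"),
        "healthcheck": has(r"HEALTHCHECK\s"),
    }


# Hypothetical Dockerfile content, used only to exercise the checks.
SAMPLE = """\
ARG BASE_IMAGE=python:3.12-slim
FROM ${BASE_IMAGE}
COPY data/databases/ /app/env/data/databases/
RUN useradd --create-home appuser
USER appuser
HEALTHCHECK CMD curl -f http://localhost:8000/health || exit 1
"""

results = check_dockerfile(SAMPLE)
```

In a pytest module, each key of `results` would map naturally onto one test function, so a failure points at a single Dockerfile requirement.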

openenv.yaml Manifest

Test	Description	Input	Expected	Category
test_manifest_exists	openenv.yaml exists at project root	N/A	File exists	happy
test_manifest_spec_version	spec_version field equals 1	Parse YAML	`spec_version: 1`	happy
test_manifest_name	name field is sql_env	Parse YAML	`name: sql_env`	happy
test_manifest_type_space	type field is 'space'	Parse YAML	`type: space`	happy
test_manifest_runtime_fastapi	runtime field is 'fastapi'	Parse YAML	`runtime: fastapi`	happy
test_manifest_app_entrypoint	app field points to valid module	Parse YAML	`app: server.app:app`	happy
test_manifest_port	port field is 8000	Parse YAML	`port: 8000`	happy
test_manifest_no_extra_fields	No unrecognized fields	Parse YAML	Only spec_version, name, type, runtime, app, port	edge
test_manifest_missing_required_field	Missing field produces validation error	Remove `name`	Validation error	error

Run: uv run pytest tests/unit/test_manifest.py -v
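As an illustration of the manifest checks above, the sketch below validates an already-parsed manifest against the expected values from the table. It assumes the YAML has been loaded into a plain dict (for example with PyYAML's `yaml.safe_load`); `EXPECTED` and `validate_manifest` are hypothetical names, and this is not the schema that `openenv validate` applies.

```python
# Expected field values, taken from the table above.
EXPECTED = {
    "spec_version": 1,
    "name": "sql_env",
    "type": "space",
    "runtime": "fastapi",
    "app": "server.app:app",
    "port": 8000,
}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in EXPECTED:
        if key not in manifest:
            errors.append(f"missing required field: {key}")
    for key in manifest:
        if key not in EXPECTED:
            errors.append(f"unrecognized field: {key}")
    for key, expected in EXPECTED.items():
        if key in manifest and manifest[key] != expected:
            errors.append(f"{key}: expected {expected!r}, got {manifest[key]!r}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a single test run report every manifest issue at once.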

Blog Outline (docs/blog-outline.md)

Test	Description	Input	Expected	Category
test_blog_outline_exists	Blog outline file exists	N/A	File at `docs/blog-outline.md`	happy
test_blog_has_hook_section	Hook section present	Parse markdown	Section heading for hook/intro	happy
test_blog_has_problem_section	Problem section present	Parse markdown	Section about static benchmarks	happy
test_blog_has_solution_section	Solution/architecture section present	Parse markdown	Section about SQLEnv architecture	happy
test_blog_has_results_placeholder	Results placeholder for F006	Parse markdown	Placeholder text for training results	happy
test_blog_has_try_it_section	Try-it-yourself section with links	Parse markdown	Links to HF Space, notebook, GitHub	happy
test_blog_links_not_broken	All links in blog are valid or marked placeholder	Parse markdown	No dead internal links	edge
test_blog_minimum_length	Blog outline has substantive content	Parse markdown	At least 200 words	edge

Run: uv run pytest tests/unit/test_blog_outline.py -v
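One possible way to implement the section and length checks is shown below. The keyword lists in `REQUIRED` are assumptions chosen for illustration; the spec only says which sections must exist, not how their headings are worded.

```python
import re

# Hypothetical heading keywords for each required section.
REQUIRED = {
    "hook": ("hook", "intro"),
    "problem": ("problem",),
    "solution": ("solution", "architecture"),
    "results": ("results",),
    "try_it": ("try it", "try-it"),
}


def outline_headings(markdown: str) -> list:
    """Return the text of every ATX heading (lines starting with 1-6 '#')."""
    pattern = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(markdown)]


def missing_sections(markdown: str, required: dict = REQUIRED) -> list:
    """Return the names of required sections that no heading mentions."""
    headings = [h.lower() for h in outline_headings(markdown)]
    return [
        name
        for name, keywords in required.items()
        if not any(k in h for h in headings for k in keywords)
    ]


def word_count(markdown: str) -> int:
    """Count word-like tokens; markup characters are ignored."""
    return len(re.findall(r"\b\w+\b", markdown))
```

A test for `test_blog_minimum_length` would then assert `word_count(text) >= 200`, matching the threshold in the table.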

Training Notebook (notebooks/train_grpo.ipynb)

Test	Description	Input	Expected	Category
test_notebook_exists	Notebook file exists	N/A	File at `notebooks/train_grpo.ipynb`	happy
test_notebook_valid_json	Notebook is valid JSON / ipynb format	Parse file	Valid nbformat structure	happy
test_notebook_has_setup_cell	Setup cell with pip install	Inspect cells	Cell containing `pip install`	happy
test_notebook_has_connect_cell	Connect cell using SQLEnvClient	Inspect cells	Cell importing/using SQLEnvClient	happy
test_notebook_has_train_cell	Training cell with GRPO loop	Inspect cells	Cell with training logic	happy
test_notebook_has_eval_cell	Evaluation cell for held-out questions	Inspect cells	Cell with evaluation logic	happy
test_notebook_has_plot_cell	Plotting cell with matplotlib	Inspect cells	Cell importing matplotlib and plotting	happy
test_notebook_colab_compatible	Colab badge or runtime metadata	Inspect metadata	`colab` in metadata or Colab badge in first cell	happy
test_notebook_no_hardcoded_paths	No absolute local paths	Inspect all cells	No `/Users/`, `/home/`, `C:\\` paths	edge
test_notebook_cells_ordered	Setup before connect before train	Inspect cell order	Correct logical ordering	edge
test_notebook_empty_outputs	Notebook shipped with cleared outputs	Inspect cells	All `outputs` arrays empty	edge

Run: uv run pytest tests/unit/test_notebook.py -v
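The edge-case rows for cleared outputs and hard-coded paths can be checked directly on the notebook JSON, without executing anything. The sketch below is an assumption about how such a check might look; `notebook_problems` is a hypothetical helper, and the path pattern accepts any Windows drive letter rather than only `C:`.

```python
import json
import re

# Matches /Users/, /home/, or a Windows drive prefix such as C:\
ABSOLUTE_PATH = re.compile(r"/Users/|/home/|[A-Za-z]:\\")


def cell_source(cell: dict) -> str:
    """The notebook format stores source as either a string or a list of lines."""
    src = cell.get("source", "")
    return src if isinstance(src, str) else "".join(src)


def notebook_problems(raw: str) -> list:
    """Return problems found in a notebook given as raw JSON text."""
    nb = json.loads(raw)
    problems = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") == "code" and cell.get("outputs"):
            problems.append(f"cell {i}: outputs not cleared")
        if ABSOLUTE_PATH.search(cell_source(cell)):
            problems.append(f"cell {i}: hard-coded absolute path")
    return problems
```

Because `json.loads` raises on malformed input, the same call also covers the basic "valid JSON" requirement; full schema validation would still need the `nbformat` package.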

2. Integration Tests

Flow: Local Docker Build and Run

Step	Action	Expected	Verification
1	`docker build -t sql-env:test -f server/Dockerfile .`	Build succeeds with exit code 0	Check exit code
2	`docker run -d -p 8000:8000 --name sql-env-test sql-env:test`	Container starts	Container running (`docker ps`)
3	Wait for health check (up to 30s)	`/health` returns 200	`curl -f http://localhost:8000/health`
4	Connect WebSocket client, call reset	Episode starts, observation returned	Valid SQLObservation JSON
5	Send DESCRIBE action via WebSocket	Column info returned	Non-empty result field
6	Send ANSWER action via WebSocket	Episode ends, reward returned	`done: true`, reward is numeric
7	Stop container	Container stops cleanly	`docker stop sql-env-test` exits 0

Run: uv run pytest tests/integration/test_docker_local.py -v
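Step 3 of the flow above (wait up to 30 seconds for `/health`) is a polling loop. A minimal sketch using only the standard library is given below; `http_ok` and `wait_for_health` are hypothetical helper names, and the `probe` parameter exists only so the loop can be exercised without a running container.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as "not ready".
        return False


def wait_for_health(
    url: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll url until probe succeeds or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

A fixture could call `wait_for_health("http://localhost:8000/health")` right after `docker run` and fail the test early if it returns `False`.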

Flow: PORT Override for HF Spaces

Step	Action	Expected	Verification
1	`docker run -d -p 7860:7860 -e PORT=7860 --name sql-env-port sql-env:test`	Container starts on port 7860	Container running
2	`curl -f http://localhost:7860/health`	Health check passes	HTTP 200
3	Port 8000 is NOT listening	No response on 8000	`curl` fails on port 8000

Run: uv run pytest tests/integration/test_port_override.py -v
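Step 3 above needs a way to assert that nothing answers on a port. One way to sketch that with a plain TCP connect is shown here; `is_listening` is a hypothetical helper. Since the `docker run` command in step 1 publishes only 7860, a host-side probe of 8000 says nothing about what the server binds inside the container, so a stricter test might also run the probe from within the container.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

In the port-override test this would read `assert is_listening(7860)` followed by `assert not is_listening(8000)`.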

Flow: Database Bundling Verification

Step	Action	Expected	Verification
1	Build Docker image	Build succeeds	Exit code 0
2	`docker run --rm sql-env:test ls /app/env/data/databases/`	Spider databases present	At least one database directory listed
3	`docker run --rm sql-env:test find /app/env/data/databases/ -name "*.sqlite"`	SQLite files present	At least one .sqlite file found
4	Start container and reset episode	Episode loads a bundled database	No "database not found" error

Run: uv run pytest tests/integration/test_db_bundling.py -v
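The same presence check that steps 2 and 3 perform with `ls` and `find` can be sketched in Python, for example to verify the local `data/databases/` directory before building the image. The helper names are hypothetical; the header constant is the 16-byte magic string that every SQLite 3 database file starts with.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def find_sqlite_files(root: Path) -> list:
    """Return all *.sqlite files under root, sorted for stable output."""
    return sorted(p for p in root.rglob("*.sqlite") if p.is_file())


def looks_like_sqlite(path: Path) -> bool:
    """Check the file header instead of trusting the extension."""
    with path.open("rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

Checking the header catches a failure mode that `find -name "*.sqlite"` alone would miss, such as a placeholder or truncated file with the right extension.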

3. API Tests

No new API endpoints are introduced by F007. The existing /health, WebSocket, and REST endpoints from prior features are tested via integration tests above.

4. E2E Tests

Scenario: Judge Experience -- Visit HF Space and Play Episode

Setup: Docker container running (locally simulating HF Space)

Actions:

1. Open health endpoint URL -- confirm service is up
2. Connect via WebSocket
3. Call reset -- receive initial observation with question and schema
4. Call step with DESCRIBE action -- receive column details
5. Call step with QUERY action -- receive query results
6. Call step with ANSWER action -- receive terminal observation with reward

Expected: Full episode completes without errors; reward is 0.0 or 1.0

Run: uv run pytest tests/e2e/test_judge_experience.py -v
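The expected outcome of this scenario can be expressed as a check over the sequence of observations collected during the episode. The sketch below assumes each observation has been decoded into a dict with `done` and `reward` keys, as the integration flow suggests; the actual SQLObservation shape may differ, and `check_episode` is a hypothetical helper.

```python
def check_episode(observations: list) -> list:
    """Validate a reset observation followed by one observation per step.

    Returns a list of problems; an empty list means the episode looks valid.
    """
    if len(observations) < 2:
        return ["episode needs a reset observation and at least one step"]

    problems = []
    *earlier, final = observations
    for i, obs in enumerate(earlier):
        if obs.get("done"):
            problems.append(f"observation {i} ended the episode early")

    if final.get("done") is not True:
        problems.append("final observation is not terminal")

    reward = final.get("reward")
    # Reject booleans explicitly, since True == 1.0 in Python.
    if isinstance(reward, bool) or reward not in (0.0, 1.0):
        problems.append(f"unexpected terminal reward: {reward!r}")
    return problems
```

The E2E test would append the result of `reset` and of each `step` call to a list and then assert that `check_episode` returns no problems.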

Scenario: Notebook Cell Sequence Validation

Setup: Notebook file at notebooks/train_grpo.ipynb

Actions:

1. Parse notebook JSON
2. Validate each cell type and content markers in order:
   - Cell with pip install (setup)
   - Cell with SQLEnvClient (connect)
   - Cell with training loop keywords: grpo, train, optimizer (train)
   - Cell with eval or accuracy or held-out (evaluate)
   - Cell with matplotlib or plt. (plot)
3. Validate no syntax errors in code cells (compile check)

Expected: All five cell categories present in correct order; no syntax errors

Run: uv run pytest tests/e2e/test_notebook_validation.py -v
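The ordering and compile checks described in this scenario might be sketched as follows. The marker patterns come from the action list above; matching is case-insensitive and each category is searched only after the cell that satisfied the previous one, which is one reasonable reading of "in order". `locate_in_order` and `syntax_errors` are hypothetical names.

```python
import re
from typing import Optional

# (category, regex) pairs in the required order, taken from the scenario above.
MARKERS = [
    ("setup", r"pip install"),
    ("connect", r"SQLEnvClient"),
    ("train", r"grpo|train|optimizer"),
    ("evaluate", r"eval|accuracy|held-out"),
    ("plot", r"matplotlib|plt\."),
]


def locate_in_order(sources: list) -> dict:
    """Map each category to the index of its first matching cell, or None."""
    found = {}
    start = 0
    for name, pattern in MARKERS:
        regex = re.compile(pattern, re.IGNORECASE)
        index: Optional[int] = next(
            (i for i in range(start, len(sources)) if regex.search(sources[i])),
            None,
        )
        found[name] = index
        if index is not None:
            start = index + 1  # later categories must come after this cell
    return found


def syntax_errors(sources: list) -> list:
    """Compile each code cell without executing it."""
    errors = []
    for i, src in enumerate(sources):
        # Drop shell escapes and IPython magics, which are not valid Python.
        code = "\n".join(
            ln for ln in src.splitlines() if not ln.lstrip().startswith(("!", "%"))
        )
        try:
            compile(code, f"<cell {i}>", "exec")
        except SyntaxError as exc:
            errors.append(f"cell {i}: {exc.msg}")
    return errors
```

A category mapped to `None` means it is either absent or appears only before an earlier category, both of which the scenario treats as failures.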

Scenario: README Has Competition-Ready Content

Setup: README.md at project root

Actions:

1. Verify README contains project description
2. Verify README contains quickstart / getting started section
3. Verify README contains link to HF Space (or placeholder)
4. Verify README contains link to training notebook
5. Verify README contains architecture or how-it-works section

Expected: All five content sections present

Run: uv run pytest tests/e2e/test_readme_completeness.py -v

5. Edge Cases Checklist

Dockerfile builds on CPU-only machine (no CUDA dependencies in final image)
Container memory stays under HF Spaces free tier limit (~16GB)
PORT env variable with non-numeric value handled gracefully
PORT env variable with value 0 or negative handled gracefully
Missing data/databases/ directory causes clear error at startup, not silent failure
openenv.yaml with wrong spec_version is rejected by openenv validate
Blog outline contains no TODO/FIXME/placeholder markers except the results section
Notebook code cells have no import errors when dependencies are installed
Notebook does not require GPU (runs on Colab free tier CPU)
Container starts within 60 seconds (reasonable cold start)
Docker image size is under 2GB (reasonable for free tier)
.dockerignore excludes test files, .git, `__pycache__`, .env
Non-root user can read database files (file permissions correct)
Container handles SIGTERM gracefully (clean shutdown)
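For the two PORT items in the checklist above, "handled gracefully" is open to interpretation. The sketch below assumes it means falling back to 8000 when PORT is unset and failing fast with a clear message when the value is invalid; an alternative design would fall back to the default in both cases. `resolve_port` is a hypothetical helper, not the server's actual startup code.

```python
from typing import Mapping

DEFAULT_PORT = 8000  # fallback named in the Dockerfile tests


def resolve_port(env: Mapping[str, str]) -> int:
    """Read PORT from an environment mapping and validate it."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port
```

Passing `os.environ` as the mapping at startup keeps the function easy to test with plain dicts.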

6. Evidence Requirements

Category	Evidence Type	Example
Unit tests	pytest output	`X passed`
Integration	pytest + docker logs	`Container healthy, episode complete`
Dockerfile	docker build output	`Successfully built <hash>`
Port override	curl output	`HTTP 200 on port 7860`
Database bundling	docker run output	`find` lists .sqlite files
Blog outline	File exists + content check	`5 sections present`
Notebook	nbformat validation	`Valid ipynb, 5+ cells in order`
README	Content grep	`All required sections present`
E2E	Full episode log	`reset -> steps -> answer, reward=1.0`
Image size	docker images output	`< 2GB`

7. External Deployment Prerequisites and Remediation

Use this checklist when deployment verification fails with external auth/access errors.

GHCR Base Image Access (`403 Forbidden`)

1. Authenticate Docker to GHCR:
   - `echo "$GITHUB_TOKEN" | docker login ghcr.io -u <github-username> --password-stdin`
2. Ensure `GITHUB_TOKEN` has package read access (the `read:packages` scope on a classic personal access token) for `ghcr.io/meta-pytorch/openenv-base`.
3. Retry the build using an explicit lowercase tag:
   - `uv run openenv build -t openenv-sql-env-f007-hf-submission`

Hugging Face Push Readiness

1. Authenticate the Hugging Face CLI:
   - `huggingface-cli login`
2. Confirm the target Space repo exists and the token has write access.
3. Run the push:
   - `uv run openenv push`

Verification Outcome Rules for External Failures

If local tests pass but GHCR/HF auth fails, record the status as `partial verification (external blocker)` and include the exact remediation commands above.
Do not mark the verifier result as `approved` until at least one authenticated build+push attempt is documented.
Record authenticated evidence in `specs/F007-DEMO.md` under `## Live Local Proof`, with separate `Authenticated Build Evidence` and `Hugging Face Push Evidence` subsections containing raw command output.
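The outcome rules above amount to a small decision function. The sketch below is one way to encode them; the `failed` and `pending evidence` labels are assumptions added to make the function total, since the rules themselves name only the partial and approved outcomes.

```python
def verification_status(
    local_tests_pass: bool,
    external_auth_ok: bool,
    authenticated_attempt_documented: bool,
) -> str:
    """Map the three verification facts onto a status label."""
    if not local_tests_pass:
        return "failed"  # hypothetical label, not named in the rules
    if not external_auth_ok:
        return "partial verification (external blocker)"
    if not authenticated_attempt_documented:
        return "pending evidence"  # hypothetical label, not named in the rules
    return "approved"
```

For example, a run where pytest passes but `docker login ghcr.io` returns `403 Forbidden` would be recorded as a partial verification rather than a failure.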