Verification Specification
Feature: F007
Generated from: specs/F007-VERIFICATION_INPUT.json
Generated: 2026-03-27
1. Unit Tests
Dockerfile Validation
| Test | Description | Input | Expected | Category |
|---|---|---|---|---|
| test_dockerfile_exists | Dockerfile exists at server/Dockerfile | N/A | File exists | happy |
| test_dockerfile_has_base_image_arg | BASE_IMAGE ARG is declared | Parse Dockerfile | ARG BASE_IMAGE present | happy |
| test_dockerfile_port_env_variable | PORT env var with fallback to 8000 | Parse Dockerfile | ENV PORT or CMD reads $PORT | happy |
| test_dockerfile_cmd_uses_port_env | CMD respects PORT env override | Set PORT=7860 | Server binds to 7860 | happy |
| test_dockerfile_non_root_user | Container runs as non-root user | Parse Dockerfile | USER appuser or equivalent non-root USER directive | security |
| test_dockerfile_copies_databases | Spider databases are bundled | Parse Dockerfile | COPY instruction includes data/databases/ | happy |
| test_dockerfile_healthcheck | Health check endpoint configured | Parse Dockerfile | HEALTHCHECK directive present | happy |
| test_dockerfile_no_dev_dependencies | No test/dev packages in final image | Inspect final stage | No pytest, ruff, etc. | edge |
Run: uv run pytest tests/unit/test_dockerfile.py -v
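A minimal sketch of how the static Dockerfile checks above might be implemented. The helper name `dockerfile_checks` and the sample Dockerfile fragment are illustrative assumptions; the real tests would read `server/Dockerfile` from disk.

```python
import re

def dockerfile_checks(text: str) -> dict:
    """Static checks from the table above, run against Dockerfile source text."""
    return {
        "has_base_image_arg": bool(re.search(r"^ARG\s+BASE_IMAGE", text, re.M)),
        "reads_port_env": "$PORT" in text or bool(re.search(r"^ENV\s+PORT", text, re.M)),
        # Any USER directive naming a non-root user satisfies the check.
        "non_root_user": any(
            m.group(1) not in ("root", "0")
            for m in re.finditer(r"^USER\s+(\S+)", text, re.M)
        ),
        "copies_databases": bool(re.search(r"^COPY\s+.*data/databases/", text, re.M)),
        "has_healthcheck": bool(re.search(r"^HEALTHCHECK\b", text, re.M)),
    }

# Illustrative fragment only; the real test reads server/Dockerfile instead.
sample = """\
ARG BASE_IMAGE=python:3.11-slim
FROM ${BASE_IMAGE}
ENV PORT=8000
COPY data/databases/ /app/env/data/databases/
HEALTHCHECK CMD curl -f http://localhost:${PORT}/health
USER appuser
CMD ["sh", "-c", "uvicorn server.app:app --port $PORT"]
"""
print(dockerfile_checks(sample))
```

Each key maps one-to-one onto a `test_dockerfile_*` row, so a failing check points directly at the offending directive.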
openenv.yaml Manifest
| Test | Description | Input | Expected | Category |
|---|---|---|---|---|
| test_manifest_exists | openenv.yaml exists at project root | N/A | File exists | happy |
| test_manifest_spec_version | spec_version field equals 1 | Parse YAML | spec_version: 1 | happy |
| test_manifest_name | name field is sql_env | Parse YAML | name: sql_env | happy |
| test_manifest_type_space | type field is 'space' | Parse YAML | type: space | happy |
| test_manifest_runtime_fastapi | runtime field is 'fastapi' | Parse YAML | runtime: fastapi | happy |
| test_manifest_app_entrypoint | app field points to valid module | Parse YAML | app: server.app:app | happy |
| test_manifest_port | port field is 8000 | Parse YAML | port: 8000 | happy |
| test_manifest_no_extra_fields | No unrecognized fields | Parse YAML | Only spec_version, name, type, runtime, app, port | edge |
| test_manifest_missing_required_field | Missing field produces validation error | Remove name | Validation error | error |
Run: uv run pytest tests/unit/test_manifest.py -v
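The manifest checks above can be sketched as a single validator over the parsed mapping. The `REQUIRED` values come straight from the table; the function name is an assumption, and in the real test the dict would come from `yaml.safe_load` on `openenv.yaml`.

```python
# Expected field values, taken from the manifest test table above.
REQUIRED = {
    "spec_version": 1,
    "name": "sql_env",
    "type": "space",
    "runtime": "fastapi",
    "app": "server.app:app",
    "port": 8000,
}

def validate_manifest(data: dict) -> list:
    """Return a list of validation errors; an empty list means the manifest passes."""
    errors = [f"missing required field: {k}" for k in REQUIRED if k not in data]
    errors += [
        f"unexpected value for {k}: {data[k]!r}"
        for k, v in REQUIRED.items() if k in data and data[k] != v
    ]
    errors += [f"unrecognized field: {k}" for k in data if k not in REQUIRED]
    return errors

good = dict(REQUIRED)
print(validate_manifest(good))  # empty list: valid manifest
```

Removing `name` or adding an unknown key should each produce exactly one error, which covers the `edge` and `error` rows.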
Blog Outline (docs/blog-outline.md)
| Test | Description | Input | Expected | Category |
|---|---|---|---|---|
| test_blog_outline_exists | Blog outline file exists | N/A | File at docs/blog-outline.md | happy |
| test_blog_has_hook_section | Hook section present | Parse markdown | Section heading for hook/intro | happy |
| test_blog_has_problem_section | Problem section present | Parse markdown | Section about static benchmarks | happy |
| test_blog_has_solution_section | Solution/architecture section present | Parse markdown | Section about SQLEnv architecture | happy |
| test_blog_has_results_placeholder | Results placeholder for F006 | Parse markdown | Placeholder text for training results | happy |
| test_blog_has_try_it_section | Try-it-yourself section with links | Parse markdown | Links to HF Space, notebook, GitHub | happy |
| test_blog_links_not_broken | All links in blog are valid or marked placeholder | Parse markdown | No dead internal links | edge |
| test_blog_minimum_length | Blog outline has substantive content | Parse markdown | At least 200 words | edge |
Run: uv run pytest tests/unit/test_blog_outline.py -v
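A sketch of the outline checks, assuming the blog sections are markdown headings. The keyword lists and the helper name `outline_report` are assumptions chosen to match the test rows; the sample text is illustrative and deliberately short.

```python
import re

def outline_report(md: str) -> dict:
    """Check the blog outline for the five required sections and minimum length."""
    headings = [h.lower() for h in re.findall(r"^#+\s*(.+)$", md, re.M)]
    words = len(re.sub(r"^#+\s*", "", md, flags=re.M).split())

    def has(*keywords):
        return any(k in h for h in headings for k in keywords)

    return {
        "hook": has("hook", "intro"),
        "problem": has("problem"),
        "solution": has("solution", "architecture"),
        "results": has("result"),
        "try_it": has("try"),
        "min_length_ok": words >= 200,
    }

# Illustrative outline; the real test reads docs/blog-outline.md.
sample = """# Hook
Why another SQL benchmark?
# The Problem
Static benchmarks saturate.
# Solution: SQLEnv Architecture
An interactive environment.
# Results (placeholder)
TODO after F006 training runs.
# Try It Yourself
Links to the Space, notebook, and repo.
"""
print(outline_report(sample))
```

On this short sample every section check passes but `min_length_ok` is false, which is exactly the failure mode `test_blog_minimum_length` guards against.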
Training Notebook (notebooks/train_grpo.ipynb)
| Test | Description | Input | Expected | Category |
|---|---|---|---|---|
| test_notebook_exists | Notebook file exists | N/A | File at notebooks/train_grpo.ipynb | happy |
| test_notebook_valid_json | Notebook is valid JSON / ipynb format | Parse file | Valid nbformat structure | happy |
| test_notebook_has_setup_cell | Setup cell with pip install | Inspect cells | Cell containing pip install | happy |
| test_notebook_has_connect_cell | Connect cell using SQLEnvClient | Inspect cells | Cell importing/using SQLEnvClient | happy |
| test_notebook_has_train_cell | Training cell with GRPO loop | Inspect cells | Cell with training logic | happy |
| test_notebook_has_eval_cell | Evaluation cell for held-out questions | Inspect cells | Cell with evaluation logic | happy |
| test_notebook_has_plot_cell | Plotting cell with matplotlib | Inspect cells | Cell importing matplotlib and plotting | happy |
| test_notebook_colab_compatible | Colab badge or runtime metadata | Inspect metadata | colab in metadata or Colab badge in first cell | happy |
| test_notebook_no_hardcoded_paths | No absolute local paths | Inspect all cells | No /Users/, /home/, C:\\ paths | edge |
| test_notebook_cells_ordered | Setup before connect before train | Inspect cell order | Correct logical ordering | edge |
| test_notebook_empty_outputs | Notebook shipped with cleared outputs | Inspect cells | All outputs arrays empty | edge |
Run: uv run pytest tests/unit/test_notebook.py -v
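The cell-ordering check can be sketched directly against the raw ipynb dict. The marker strings below mirror the test table but are assumptions about what the cells will actually contain; the real test would load `notebooks/train_grpo.ipynb` with `json.load` or `nbformat.read`.

```python
# Content markers, one per required cell, in required order (assumed).
EXPECTED_ORDER = ["pip install", "SQLEnvClient", "train", "eval", "matplotlib"]

def marker_positions(nb: dict) -> list:
    """Index of the first code cell containing each marker, in EXPECTED_ORDER order."""
    sources = ["".join(c.get("source", [])) for c in nb.get("cells", [])
               if c.get("cell_type") == "code"]
    positions = []
    for marker in EXPECTED_ORDER:
        hits = [i for i, src in enumerate(sources) if marker in src]
        positions.append(hits[0] if hits else -1)
    return positions

def cells_ordered(nb: dict) -> bool:
    """True when all five markers are present and appear in the required order."""
    pos = marker_positions(nb)
    return all(p >= 0 for p in pos) and pos == sorted(pos)

# Minimal illustrative notebook dict (not the real notebook).
nb = {"cells": [
    {"cell_type": "code", "source": ["!pip install sql-env\n"]},
    {"cell_type": "code", "source": ["from sql_env import SQLEnvClient\n"]},
    {"cell_type": "code", "source": ["# GRPO train loop\n"]},
    {"cell_type": "code", "source": ["# eval on held-out questions\n"]},
    {"cell_type": "code", "source": ["import matplotlib.pyplot as plt\n"]},
]}
print(cells_ordered(nb))
```

Keyword markers are a pragmatic, if brittle, proxy for cell intent; tightening them to regexes is a straightforward follow-up if false positives appear.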
2. Integration Tests
Flow: Local Docker Build and Run
| Step | Action | Expected | Verification |
|---|---|---|---|
| 1 | docker build -t sql-env:test -f server/Dockerfile . | Build succeeds with exit code 0 | Check exit code |
| 2 | docker run -d -p 8000:8000 --name sql-env-test sql-env:test | Container starts | Container running (docker ps) |
| 3 | Wait for health check (up to 30s) | /health returns 200 | curl -f http://localhost:8000/health |
| 4 | Connect WebSocket client, call reset | Episode starts, observation returned | Valid SQLObservation JSON |
| 5 | Send DESCRIBE action via WebSocket | Column info returned | Non-empty result field |
| 6 | Send ANSWER action via WebSocket | Episode ends, reward returned | done: true, reward is numeric |
| 7 | Stop container | Container stops cleanly | docker stop sql-env-test exits 0 |
Run: uv run pytest tests/integration/test_docker_local.py -v
Flow: PORT Override for HF Spaces
| Step | Action | Expected | Verification |
|---|---|---|---|
| 1 | docker run -d -p 7860:7860 -e PORT=7860 --name sql-env-port sql-env:test | Container starts on port 7860 | Container running |
| 2 | curl -f http://localhost:7860/health | Health check passes | HTTP 200 |
| 3 | Port 8000 is NOT listening | No response on 8000 | curl fails on port 8000 |
Run: uv run pytest tests/integration/test_port_override.py -v
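The server entrypoint needs to resolve PORT defensively to satisfy both this flow and the non-numeric/negative edge cases in the checklist below. A sketch, with `resolve_port` as an assumed helper name (in the real entrypoint the mapping would be `os.environ`):

```python
def resolve_port(env: dict, default: int = 8000) -> int:
    """Resolve the listen port from PORT, falling back to the default on bad values."""
    raw = env.get("PORT", "")
    try:
        port = int(raw)
    except (TypeError, ValueError):
        return default  # unset or non-numeric, e.g. PORT=abc
    # Reject 0, negatives, and out-of-range ports rather than crash at bind time.
    return port if 0 < port < 65536 else default

print(resolve_port({"PORT": "7860"}))  # HF Spaces override
print(resolve_port({}))                # local default
```

This guarantees the container either binds the override (7860 on HF Spaces) or falls back to 8000, never dying on a malformed value.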
Flow: Database Bundling Verification
| Step | Action | Expected | Verification |
|---|---|---|---|
| 1 | Build Docker image | Build succeeds | Exit code 0 |
| 2 | docker run --rm sql-env:test ls /app/env/data/databases/ | Spider databases present | At least one database directory listed |
| 3 | docker run --rm sql-env:test find /app/env/data/databases/ -name "*.sqlite" | SQLite files present | At least one .sqlite file found |
| 4 | Start container and reset episode | Episode loads a bundled database | No "database not found" error |
Run: uv run pytest tests/integration/test_db_bundling.py -v
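The same check, done from Python at startup, doubles as the "clear error, not silent failure" guard from the edge-case checklist. A sketch (helper name and the `concert_singer` sample database are assumptions; the real path is `/app/env/data/databases/`):

```python
from pathlib import Path
import tempfile

def bundled_databases(root: Path) -> list:
    """Return all bundled SQLite files, or fail loudly if the directory is absent."""
    if not root.is_dir():
        raise FileNotFoundError(f"databases directory missing: {root}")
    return sorted(root.rglob("*.sqlite"))

# Demonstrate on a throwaway directory; the real test points at the image path.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "databases" / "concert_singer"
    db_dir.mkdir(parents=True)
    (db_dir / "concert_singer.sqlite").touch()
    found = bundled_databases(Path(tmp) / "databases")
    print([p.name for p in found])
```

Raising `FileNotFoundError` at startup surfaces a missing COPY instruction immediately instead of deferring the failure to the first `reset`.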
3. API Tests
No new API endpoints are introduced by F007. The existing /health, WebSocket, and REST endpoints from prior features are tested via integration tests above.
4. E2E Tests
Scenario: Judge Experience -- Visit HF Space and Play Episode
Setup: Docker container running (locally simulating HF Space)
Actions:
- Open health endpoint URL -- confirm service is up
- Connect via WebSocket
- Call `reset` -- receive initial observation with question and schema
- Call `step` with DESCRIBE action -- receive column details
- Call `step` with QUERY action -- receive query results
- Call `step` with ANSWER action -- receive terminal observation with reward
Expected: Full episode completes without errors; reward is 0.0 or 1.0
Run: uv run pytest tests/e2e/test_judge_experience.py -v
Scenario: Notebook Cell Sequence Validation
Setup: Notebook file at notebooks/train_grpo.ipynb
Actions:
- Parse notebook JSON
- Validate each cell type and content markers in order:
  - Cell with `pip install` (setup)
  - Cell with `SQLEnvClient` (connect)
  - Cell with training loop keywords: `grpo`, `train`, `optimizer` (train)
  - Cell with `eval` or `accuracy` or `held-out` (evaluate)
  - Cell with `matplotlib` or `plt.` (plot)
- Validate no syntax errors in code cells (compile check)
Expected: All five cell categories present in correct order; no syntax errors
Run: uv run pytest tests/e2e/test_notebook_validation.py -v
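The compile check in this scenario can lean on Python's builtin `compile`, skipping IPython-only lines. The helper name and the sample notebook dict are illustrative; the real test parses `notebooks/train_grpo.ipynb`.

```python
def syntax_errors(nb: dict) -> list:
    """Compile each code cell; return messages for cells that fail to parse."""
    errors = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = "".join(cell.get("source", []))
        # Shell escapes and magics (e.g. `!pip install`, `%matplotlib`) are not
        # valid Python syntax, so strip them before compiling.
        src = "\n".join(line for line in src.splitlines()
                        if not line.lstrip().startswith(("!", "%")))
        try:
            compile(src, f"<cell {i}>", "exec")
        except SyntaxError as exc:
            errors.append(f"cell {i}: {exc.msg}")
    return errors

# Illustrative notebook with one deliberately broken cell.
nb = {"cells": [
    {"cell_type": "code", "source": ["!pip install sql-env\n"]},
    {"cell_type": "code", "source": ["x = 1\n"]},
    {"cell_type": "code", "source": ["def broken(:\n"]},
]}
print(syntax_errors(nb))
```

Note this catches syntax errors only; import errors still require the dependencies-installed check from the edge-case list.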
Scenario: README Has Competition-Ready Content
Setup: README.md at project root
Actions:
- Verify README contains project description
- Verify README contains quickstart / getting started section
- Verify README contains link to HF Space (or placeholder)
- Verify README contains link to training notebook
- Verify README contains architecture or how-it-works section
Expected: All five content sections present
Run: uv run pytest tests/e2e/test_readme_completeness.py -v
5. Edge Cases Checklist
- Dockerfile builds on CPU-only machine (no CUDA dependencies in final image)
- Container memory stays under HF Spaces free tier limit (~16GB)
- PORT env variable with non-numeric value handled gracefully
- PORT env variable with value 0 or negative handled gracefully
- Missing data/databases/ directory causes clear error at startup, not silent failure
- openenv.yaml with wrong spec_version is rejected by openenv validate
- Blog outline contains no TODO/FIXME/placeholder markers except the results section
- Notebook code cells have no import errors when dependencies are installed
- Notebook does not require GPU (runs on Colab free tier CPU)
- Container starts within 60 seconds (reasonable cold start)
- Docker image size is under 2GB (reasonable for free tier)
- .dockerignore excludes test files, .git, __pycache__, .env
- Non-root user can read database files (file permissions correct)
- Container handles SIGTERM gracefully (clean shutdown)
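The SIGTERM item above might be implemented as a shutdown flag the server loop checks between requests. A POSIX-oriented sketch; the class name is an assumption, and in the real FastAPI container uvicorn already installs similar handlers, so this illustrates the mechanism rather than the exact code.

```python
import signal

class GracefulShutdown:
    """Flip a flag on SIGTERM so the server can finish in-flight work and exit."""

    def __init__(self):
        self.requested = False
        # docker stop sends SIGTERM, then SIGKILL after a grace period.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.requested = True

shutdown = GracefulShutdown()
signal.raise_signal(signal.SIGTERM)  # simulate `docker stop` in-process
print(shutdown.requested)
```

Because `docker stop` escalates to SIGKILL after its grace period (10s by default), the cleanup triggered by this flag must finish well inside that window.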
6. Evidence Requirements
| Category | Evidence Type | Example |
|---|---|---|
| Unit tests | pytest output | X passed |
| Integration | pytest + docker logs | Container healthy, episode complete |
| Dockerfile | docker build output | Successfully built <hash> |
| Port override | curl output | HTTP 200 on port 7860 |
| Database bundling | docker exec output | ls shows .sqlite files |
| Blog outline | File exists + content check | 5 sections present |
| Notebook | nbformat validation | Valid ipynb, 5+ cells in order |
| README | Content grep | All required sections present |
| E2E | Full episode log | reset -> steps -> answer, reward=1.0 |
| Image size | docker images output | < 2GB |
7. External Deployment Prerequisites and Remediation
Use this checklist when deployment verification fails with external auth/access errors.
GHCR Base Image Access (403 Forbidden)
- Authenticate Docker to GHCR: `echo "$GITHUB_TOKEN" | docker login ghcr.io -u <github-username> --password-stdin`
- Ensure `GITHUB_TOKEN` has package read scope for `ghcr.io/meta-pytorch/openenv-base`.
- Retry the build using the explicit lowercase tag: `uv run openenv build -t openenv-sql-env-f007-hf-submission`
Hugging Face Push Readiness
- Authenticate the Hugging Face CLI: `huggingface-cli login`
- Confirm the target Space repo exists and the token has write access.
- Run the push: `uv run openenv push`
Verification Outcome Rules for External Failures
- If local tests pass but GHCR/HF auth fails, record status as partial verification (external blocker) and include the exact remediation commands above.
- Do not mark the verifier result as `approved` until at least one authenticated build+push attempt is documented.
- Record authenticated evidence in `specs/F007-DEMO.md` under `## Live Local Proof`, with separate `Authenticated Build Evidence` and `Hugging Face Push Evidence` subsections containing raw command output.