# Verification Specification

**Feature:** F007
**Generated from:** specs/F007-VERIFICATION_INPUT.json
**Generated:** 2026-03-27

---

## 1. Unit Tests

### Dockerfile Validation

| Test | Description | Input | Expected | Category |
|------|-------------|-------|----------|----------|
| test_dockerfile_exists | Dockerfile exists at server/Dockerfile | N/A | File exists | happy |
| test_dockerfile_has_base_image_arg | BASE_IMAGE ARG is declared | Parse Dockerfile | `ARG BASE_IMAGE` present | happy |
| test_dockerfile_port_env_variable | PORT env var with fallback to 8000 | Parse Dockerfile | `ENV PORT` or CMD reads `$PORT` | happy |
| test_dockerfile_cmd_uses_port_env | CMD respects PORT env override | Set `PORT=7860` | Server binds to 7860 | happy |
| test_dockerfile_non_root_user | Container runs as non-root user | Parse Dockerfile | `USER appuser` or equivalent non-root USER directive | security |
| test_dockerfile_copies_databases | Spider databases are bundled | Parse Dockerfile | COPY instruction includes `data/databases/` | happy |
| test_dockerfile_healthcheck | Health check endpoint configured | Parse Dockerfile | HEALTHCHECK directive present | happy |
| test_dockerfile_no_dev_dependencies | No test/dev packages in final image | Inspect final stage | No pytest, ruff, etc. | edge |

**Run:** `uv run pytest tests/unit/test_dockerfile.py -v`

### openenv.yaml Manifest

| Test | Description | Input | Expected | Category |
|------|-------------|-------|----------|----------|
| test_manifest_exists | openenv.yaml exists at project root | N/A | File exists | happy |
| test_manifest_spec_version | spec_version field equals 1 | Parse YAML | `spec_version: 1` | happy |
| test_manifest_name | name field is sql_env | Parse YAML | `name: sql_env` | happy |
| test_manifest_type_space | type field is 'space' | Parse YAML | `type: space` | happy |
| test_manifest_runtime_fastapi | runtime field is 'fastapi' | Parse YAML | `runtime: fastapi` | happy |
| test_manifest_app_entrypoint | app field points to valid module | Parse YAML | `app: server.app:app` | happy |
| test_manifest_port | port field is 8000 | Parse YAML | `port: 8000` | happy |
| test_manifest_no_extra_fields | No unrecognized fields | Parse YAML | Only spec_version, name, type, runtime, app, port | edge |
| test_manifest_missing_required_field | Missing field produces validation error | Remove `name` | Validation error | error |

**Run:** `uv run pytest tests/unit/test_manifest.py -v`

### Blog Outline (docs/blog-outline.md)

| Test | Description | Input | Expected | Category |
|------|-------------|-------|----------|----------|
| test_blog_outline_exists | Blog outline file exists | N/A | File at `docs/blog-outline.md` | happy |
| test_blog_has_hook_section | Hook section present | Parse markdown | Section heading for hook/intro | happy |
| test_blog_has_problem_section | Problem section present | Parse markdown | Section about static benchmarks | happy |
| test_blog_has_solution_section | Solution/architecture section present | Parse markdown | Section about SQLEnv architecture | happy |
| test_blog_has_results_placeholder | Results placeholder for F006 | Parse markdown | Placeholder text for training results | happy |
| test_blog_has_try_it_section | Try-it-yourself section with links | Parse markdown | Links to HF Space, notebook, GitHub | happy |
| test_blog_links_not_broken | All links in blog are valid or marked placeholder | Parse markdown | No dead internal links | edge |
| test_blog_minimum_length | Blog outline has substantive content | Parse markdown | At least 200 words | edge |

**Run:** `uv run pytest tests/unit/test_blog_outline.py -v`

### Training Notebook (notebooks/train_grpo.ipynb)

| Test | Description | Input | Expected | Category |
|------|-------------|-------|----------|----------|
| test_notebook_exists | Notebook file exists | N/A | File at `notebooks/train_grpo.ipynb` | happy |
| test_notebook_valid_json | Notebook is valid JSON / ipynb format | Parse file | Valid nbformat structure | happy |
| test_notebook_has_setup_cell | Setup cell with pip install | Inspect cells | Cell containing `pip install` | happy |
| test_notebook_has_connect_cell | Connect cell using SQLEnvClient | Inspect cells | Cell importing/using SQLEnvClient | happy |
| test_notebook_has_train_cell | Training cell with GRPO loop | Inspect cells | Cell with training logic | happy |
| test_notebook_has_eval_cell | Evaluation cell for held-out questions | Inspect cells | Cell with evaluation logic | happy |
| test_notebook_has_plot_cell | Plotting cell with matplotlib | Inspect cells | Cell importing matplotlib and plotting | happy |
| test_notebook_colab_compatible | Colab badge or runtime metadata | Inspect metadata | `colab` in metadata or Colab badge in first cell | happy |
| test_notebook_no_hardcoded_paths | No absolute local paths | Inspect all cells | No `/Users/`, `/home/`, `C:\\` paths | edge |
| test_notebook_cells_ordered | Setup before connect before train | Inspect cell order | Correct logical ordering | edge |
| test_notebook_empty_outputs | Notebook shipped with cleared outputs | Inspect cells | All `outputs` arrays empty | edge |

**Run:** `uv run pytest tests/unit/test_notebook.py -v`

---

## 2. Integration Tests

### Flow: Local Docker Build and Run

| Step | Action | Expected | Verification |
|------|--------|----------|--------------|
| 1 | `docker build -t sql-env:test -f server/Dockerfile .` | Build succeeds with exit code 0 | Check exit code |
| 2 | `docker run -d -p 8000:8000 --name sql-env-test sql-env:test` | Container starts | Container running (`docker ps`) |
| 3 | Wait for health check (up to 30s) | `/health` returns 200 | `curl -f http://localhost:8000/health` |
| 4 | Connect WebSocket client, call reset | Episode starts, observation returned | Valid SQLObservation JSON |
| 5 | Send DESCRIBE action via WebSocket | Column info returned | Non-empty result field |
| 6 | Send ANSWER action via WebSocket | Episode ends, reward returned | `done: true`, reward is numeric |
| 7 | Stop container | Container stops cleanly | `docker stop sql-env-test` exits 0 |

**Run:** `uv run pytest tests/integration/test_docker_local.py -v`

### Flow: PORT Override for HF Spaces

| Step | Action | Expected | Verification |
|------|--------|----------|--------------|
| 1 | `docker run -d -p 7860:7860 -e PORT=7860 --name sql-env-port sql-env:test` | Container starts on port 7860 | Container running |
| 2 | `curl -f http://localhost:7860/health` | Health check passes | HTTP 200 |
| 3 | Port 8000 is NOT listening | No response on 8000 | `curl` fails on port 8000 |

**Run:** `uv run pytest tests/integration/test_port_override.py -v`

### Flow: Database Bundling Verification

| Step | Action | Expected | Verification |
|------|--------|----------|--------------|
| 1 | Build Docker image | Build succeeds | Exit code 0 |
| 2 | `docker run --rm sql-env:test ls /app/env/data/databases/` | Spider databases present | At least one database directory listed |
| 3 | `docker run --rm sql-env:test find /app/env/data/databases/ -name "*.sqlite"` | SQLite files present | At least one .sqlite file found |
| 4 | Start container and reset episode | Episode loads a bundled database | No "database not found" error |

**Run:** `uv run pytest tests/integration/test_db_bundling.py -v`

---

## 3. API Tests

No new API endpoints are introduced by F007. The existing `/health`, WebSocket, and REST endpoints from prior features are covered by the integration tests above.

---

## 4. E2E Tests

### Scenario: Judge Experience -- Visit HF Space and Play Episode

**Setup:** Docker container running (locally simulating HF Space)

**Actions:**

1. Open the health endpoint URL -- confirm the service is up
2. Connect via WebSocket
3. Call `reset` -- receive initial observation with question and schema
4. Call `step` with DESCRIBE action -- receive column details
5. Call `step` with QUERY action -- receive query results
6. Call `step` with ANSWER action -- receive terminal observation with reward

**Expected:** Full episode completes without errors; reward is 0.0 or 1.0

**Run:** `uv run pytest tests/e2e/test_judge_experience.py -v`

### Scenario: Notebook Cell Sequence Validation

**Setup:** Notebook file at `notebooks/train_grpo.ipynb`

**Actions:**

1. Parse notebook JSON
2. Validate each cell type and content markers in order:
   - Cell with `pip install` (setup)
   - Cell with `SQLEnvClient` (connect)
   - Cell with training loop keywords: `grpo`, `train`, `optimizer` (train)
   - Cell with `eval`, `accuracy`, or `held-out` (evaluate)
   - Cell with `matplotlib` or `plt.` (plot)
3. Validate no syntax errors in code cells (compile check)

**Expected:** All five cell categories present in correct order; no syntax errors

**Run:** `uv run pytest tests/e2e/test_notebook_validation.py -v`

### Scenario: README Has Competition-Ready Content

**Setup:** README.md at project root

**Actions:**

1. Verify README contains a project description
2. Verify README contains a quickstart / getting started section
3. Verify README contains a link to the HF Space (or a placeholder)
4. Verify README contains a link to the training notebook
5. Verify README contains an architecture or how-it-works section

**Expected:** All five content sections present

**Run:** `uv run pytest tests/e2e/test_readme_completeness.py -v`

---

## 5. Edge Cases Checklist

- [ ] Dockerfile builds on a CPU-only machine (no CUDA dependencies in final image)
- [ ] Container memory stays under the HF Spaces free tier limit (~16GB)
- [ ] PORT env variable with a non-numeric value is handled gracefully
- [ ] PORT env variable with a value of 0 or negative is handled gracefully
- [ ] Missing data/databases/ directory causes a clear error at startup, not a silent failure
- [ ] openenv.yaml with the wrong spec_version is rejected by `openenv validate`
- [ ] Blog outline contains no TODO/FIXME/placeholder markers except the results section
- [ ] Notebook code cells have no import errors when dependencies are installed
- [ ] Notebook does not require a GPU (runs on Colab free tier CPU)
- [ ] Container starts within 60 seconds (reasonable cold start)
- [ ] Docker image size is under 2GB (reasonable for free tier)
- [ ] .dockerignore excludes test files, `.git`, `__pycache__`, `.env`
- [ ] Non-root user can read database files (file permissions correct)
- [ ] Container handles SIGTERM gracefully (clean shutdown)

---

## 6. Evidence Requirements

| Category | Evidence Type | Example |
|----------|---------------|---------|
| Unit tests | pytest output | `X passed` |
| Integration | pytest + docker logs | `Container healthy, episode complete` |
| Dockerfile | docker build output | `Successfully built` |
| Port override | curl output | `HTTP 200 on port 7860` |
| Database bundling | docker exec output | `ls` shows .sqlite files |
| Blog outline | File exists + content check | `5 sections present` |
| Notebook | nbformat validation | `Valid ipynb, 5+ cells in order` |
| README | Content grep | `All required sections present` |
| E2E | Full episode log | `reset -> steps -> answer, reward=1.0` |
| Image size | docker images output | `< 2GB` |

---

## 7. External Deployment Prerequisites and Remediation

Use this checklist when deployment verification fails with external auth/access errors.

### GHCR Base Image Access (`403 Forbidden`)

1. Authenticate Docker to GHCR (substitute your GitHub username for `USERNAME`):
   - `echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin`
2. Ensure `GITHUB_TOKEN` has package read scope for `ghcr.io/meta-pytorch/openenv-base`.
3. Retry the build using the explicit lowercase tag:
   - `uv run openenv build -t openenv-sql-env-f007-hf-submission`

### Hugging Face Push Readiness

1. Authenticate the Hugging Face CLI:
   - `huggingface-cli login`
2. Confirm the target Space repo exists and the token has write access.
3. Run the push:
   - `uv run openenv push`

### Verification Outcome Rules for External Failures

- If local tests pass but GHCR/HF auth fails, record status as **partial verification** (external blocker) and include the exact remediation commands above.
- Do not mark the verifier result as `approved` until at least one authenticated build+push attempt is documented.
- Record authenticated evidence in `specs/F007-DEMO.md` under `## Live Local Proof`, with separate `Authenticated Build Evidence` and `Hugging Face Push Evidence` subsections containing raw command output.
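As an illustration of the Dockerfile validation tests in Section 1, the structural checks (BASE_IMAGE ARG, non-root USER, HEALTHCHECK, bundled databases) can be expressed as simple text parsing. This is a hedged sketch, not the actual `tests/unit/test_dockerfile.py` implementation; the helper name `check_dockerfile` and its heuristics are illustrative assumptions.

```python
import re

def check_dockerfile(text: str) -> dict:
    """Boolean checks over a Dockerfile's text, mirroring the Section 1 table.

    Illustrative sketch only; the real tests live in tests/unit/test_dockerfile.py.
    """
    # Drop blank lines and comments before inspecting instructions.
    lines = [l.strip() for l in text.splitlines()
             if l.strip() and not l.strip().startswith("#")]
    # The last USER directive wins, as it does in a real Dockerfile.
    user_lines = [l.split(None, 1)[1] for l in lines if l.upper().startswith("USER ")]
    return {
        "has_base_image_arg": any(re.match(r"ARG\s+BASE_IMAGE\b", l) for l in lines),
        "non_root_user": bool(user_lines) and user_lines[-1] not in ("root", "0"),
        "has_healthcheck": any(l.upper().startswith("HEALTHCHECK") for l in lines),
        "copies_databases": any(l.upper().startswith("COPY") and "data/databases" in l
                                for l in lines),
    }
```

Each pytest case in the table would then assert on a single key of this dict.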
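The openenv.yaml manifest tests in Section 1 all reduce to comparing a parsed mapping against one expected shape. A minimal sketch, assuming the YAML has already been loaded into a dict (e.g. via `yaml.safe_load`); the `validate_manifest` helper and its error strings are illustrative, and the expected values come straight from the table.

```python
# Expected manifest contents, taken from the Section 1 table.
REQUIRED = {
    "spec_version": 1,
    "name": "sql_env",
    "type": "space",
    "runtime": "fastapi",
    "app": "server.app:app",
    "port": 8000,
}

def validate_manifest(manifest: dict) -> list:
    """Return a list of validation errors; an empty list means the manifest is valid."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in manifest:
            errors.append(f"missing required field: {key}")
        elif manifest[key] != expected:
            errors.append(f"unexpected value for {key}: {manifest[key]!r}")
    # test_manifest_no_extra_fields: anything outside the known keys is an error.
    for key in manifest:
        if key not in REQUIRED:
            errors.append(f"unrecognized field: {key}")
    return errors
```

`test_manifest_missing_required_field` would delete `name` and assert the returned list is non-empty.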
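The notebook cell-sequence scenario in Section 4 can be sketched as a pure function over the parsed `.ipynb` dict. The marker lists are taken from the scenario's steps; the function name `cell_order_ok` and the "find each category at or after the previous one" heuristic are illustrative assumptions, not the shipped `tests/e2e/test_notebook_validation.py`.

```python
# Ordered (category, marker substrings) pairs mirroring the Section 4 scenario.
CELL_MARKERS = [
    ("setup", ("pip install",)),
    ("connect", ("SQLEnvClient",)),
    ("train", ("grpo", "train", "optimizer")),
    ("evaluate", ("eval", "accuracy", "held-out")),
    ("plot", ("matplotlib", "plt.")),
]

def cell_order_ok(nb: dict) -> bool:
    """True if each marker category matches a code cell at or after the previous match."""
    sources = ["".join(cell.get("source", [])).lower()
               for cell in nb.get("cells", [])
               if cell.get("cell_type") == "code"]
    pos = 0
    for _category, markers in CELL_MARKERS:
        for i in range(pos, len(sources)):
            if any(m.lower() in sources[i] for m in markers):
                pos = i
                break
        else:
            return False  # category never found after the previous one
    return True
```

The syntax-error step of the scenario would then `compile()` each code cell's source separately.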
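The PORT handling covered by `test_dockerfile_port_env_variable`, the PORT-override integration flow, and the Section 5 edge cases (non-numeric, zero, or negative values) amounts to one small resolution rule: read `$PORT`, fall back to 8000 on anything invalid. A sketch of that rule, assuming the server resolves the port in Python; the helper name `resolve_port` is illustrative.

```python
import os

def resolve_port(env=None, default=8000):
    """Resolve the bind port from $PORT, falling back to the default on
    missing, non-numeric, or out-of-range values (Section 5 edge cases)."""
    env = os.environ if env is None else env
    try:
        port = int(env.get("PORT", ""))
    except ValueError:
        return default  # non-numeric PORT is handled gracefully
    # Reject 0, negatives, and values above the TCP port range.
    return port if 0 < port < 65536 else default
```

On HF Spaces the platform sets `PORT=7860`, so the same image serves both the local default and the Space.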