# Implementation Specification

**Change:** F007 — HuggingFace Deployment & Submission Package
**Date:** 2026-03-27
**Research Summary:** [F007-RESEARCH_SUMMARY.md](./F007-RESEARCH_SUMMARY.md)
**Verification Spec:** See VERIFICATION_SPEC.md (generated by autocode-verification-planner)
**Behavior Delta:** Archived to [specs/behavior/deployment.md](./behavior/deployment.md)

**Plan Status:**

- [x] Draft
- [x] Approved for Implementation
- [x] Implementation Complete
- [x] Verification Passed

---

## Core Intent (Immutable)

> **DO NOT MODIFY THIS SECTION DURING REFINEMENT**
> Changes to Core Intent mean you're describing a different feature.
> If refinement reveals the need to change this section, create a new feature instead.

**User Problem:**
Judges can: read the blog, visit the HF Space, run the training notebook, and reproduce results. Someone outside the team can understand, use, and build on SQLEnv.

**Success Criteria:**

- Blog tells a compelling story even if training results are modest
- HF Space just works -- connect, reset, play an episode
- Training notebook runs end-to-end on Colab with one click

**Avoid:**

- Docker build fails on HF Spaces (free tier CPU)
- Blog is all technical with no narrative hook
- Notebook has undocumented setup steps

**Out of Scope:**

- Full blog post writing (outline + key sections only, manual polish later)
- Paid HF Spaces tier or GPU resources
- Training the agent (that is F006)
- Video recording of demo (manual task)

---

## 0. Slicing & Scope Budget (Anti-Waterfall)

This spec must be executable in **small, mergeable increments**.

### Scope Budget

- Target: **3 slices**
- Hard max: **<= 10 steps total**
- Each step must end in: **implement -> verify -> merge**

### Slice Definition

A slice is a vertical increment that delivers user-visible value or a safe internal capability.
**Each slice must have:**

- Clear outcome
- Minimal interface change
- Merge criteria

**Note:** Verification criteria are defined in VERIFICATION_SPEC.md (separate agent).

## Status Icons

**Step Status:**

- !! Not Started
- :: In Progress
- OK Completed
- XX Blocked/Failed

**Result Outcome:**

- OK Fully Successful (all tests passed, no issues)
- ~~ Completed with Issues (needs follow-up)
- XX Failed/Blocked

---

## 1. Implementation Overview

### Summary

Prepare the complete competition submission package: (1) harden the Dockerfile for HF Spaces free-tier deployment with bundled Spider databases, (2) overhaul README.md to be a polished project showcase, (3) create a blog post outline with key narrative sections, and (4) create a Colab-ready training notebook stub that references F006 outputs.

This is the terminal feature -- it depends on F001-F006 being complete.

### Scope

**In Scope:**

- Dockerfile hardening for HF Spaces (bundle Spider DBs, CPU-only, health check)
- `openenv.yaml` validation for HF Hub compatibility
- README.md overhaul (architecture diagram, setup, usage, links)
- Blog post outline (`docs/blog-outline.md`)
- Training notebook stub (`notebooks/train_grpo.ipynb`)
- `.dockerignore` for clean builds

**Out of Scope:**

- Full blog prose (outline only)
- Agent training (F006)
- Reward/verifier logic (F003/F004)
- Video demo recording
- Paid HF Spaces configuration

---

## 1a. Execution Status

**Progress:** 7/7 steps complete
**Current Step:** Finalization Protocol (OK Completed)
**Last Updated:** 2026-03-29T07:29:32Z

**Latest Result:** OK Final verification gate passed. Authenticated deployment evidence is now complete: `uv run openenv build -t openenv-sql-env-f007-hf-submission` succeeded, `uv run openenv push` completed successfully to `https://huggingface.co/spaces/hjerpe/sql_env`, and regression verification remained green (`uv run --with pytest pytest tests/ -v`: 250 passed, 1 skipped).
`uv run openenv validate --verbose` still reports non-Docker entrypoint warnings, but Docker mode is supported and remains the scoped deployment path for F007.

**Blockers:** None.

---

## 1b. Risk Assessment

**Risk Tier:** Low

**Risk Tier Definitions:**

- **Low:** Pure logic, non-user-facing, no security implications
- **Medium:** User input handling, data validation, API changes
- **High:** Authentication, payments, secrets management, untrusted input

**High-Risk Indicators Present:** None
**Security Review Required:** No

**Justification:** This feature creates documentation, configuration files, and a notebook. No authentication, secrets, or untrusted input handling. The Dockerfile bundles existing data and runs an existing server.

---

## 2. Change Manifest

### Files to Create

| File | Purpose |
|------|---------|
| `notebooks/train_grpo.ipynb` | Colab-ready training notebook stub |
| `docs/blog-outline.md` | HF blog post outline with narrative structure |
| `.dockerignore` | Exclude dev artifacts from Docker build |

### Files to Modify

| File | Changes |
|------|---------|
| `server/Dockerfile` | Bundle Spider DBs, optimize for HF Spaces free tier |
| `openenv.yaml` | Validate/update for HF Hub push compatibility |
| `README.md` | Full overhaul -- polished project showcase |

### Files to Delete

None.

---

## 3. Interface Specifications

### Dockerfile Structure

```dockerfile
# server/Dockerfile -- HF Spaces compatible
# Key changes from current:
# 1. Bundle Spider databases (COPY data/databases/ ...)
# 2. Ensure CPU-only (no torch GPU deps)
# 3. Expose port 7860 (HF Spaces default) OR 8000 (openenv default)
# 4. HEALTHCHECK on /health endpoint
# 5. Non-root user for HF Spaces security
```

### openenv.yaml Schema

```yaml
spec_version: 1
name: sql_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
```

No structural changes needed -- validate existing manifest is HF Hub compatible.
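For illustration, the manifest shape check described in Step 1.3 can be sketched as a small script. The field names mirror the schema above, but this is only an illustrative sketch: `openenv validate` remains the authoritative check, and the `check_manifest` helper is hypothetical, not part of the openenv toolchain.

```python
# Hypothetical helper mirroring the manifest checks in Step 1.3.
# The real validation is `uv run openenv validate`; this only sketches
# the field-shape rules named in this spec.
REQUIRED_FIELDS = {"spec_version", "name", "type", "runtime", "app", "port"}


def check_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks OK."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - manifest.keys())]
    if "port" in manifest and not isinstance(manifest["port"], int):
        problems.append("port must be an integer")
    app = manifest.get("app", "")
    if isinstance(app, str) and app and ":" not in app:
        problems.append("app should look like 'module.path:attribute'")
    return problems


# The example values come from the schema block above.
example = {
    "spec_version": 1,
    "name": "sql_env",
    "type": "space",
    "runtime": "fastapi",
    "app": "server.app:app",
    "port": 8000,
}
print(check_manifest(example))  # -> []
```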
### Blog Outline Structure

```markdown
# docs/blog-outline.md
# Sections:
# 1. Hook -- "Teaching AI to think like a data analyst"
# 2. Problem -- Static benchmarks vs. interactive exploration
# 3. Solution -- SQLEnv architecture overview
# 4. How It Works -- Episode flow, reward design
# 5. Results -- Learning curves, comparison (placeholder for F006 data)
# 6. Technical Deep Dive -- Reward architecture, GRPO training
# 7. Try It Yourself -- Links to HF Space, notebook, GitHub
```

### Training Notebook Structure

```python
# notebooks/train_grpo.ipynb
# Cells:
# 1. Setup -- pip install, clone repo
# 2. Configure -- HF Space URL, model selection
# 3. Connect -- SQLEnvClient connect + test
# 4. Train -- GRPO training loop (references F006 scripts/)
# 5. Evaluate -- Run eval episodes, plot results
# 6. Results -- Display learning curves
```

### New Functions

No new Python functions. This feature produces configuration and documentation artifacts.

---

## 4. Data Flow

### Primary Flow: HF Spaces Deployment

```
1. Developer runs `openenv validate`
   - Input: openenv.yaml, Dockerfile
   - Action: Validates manifest and Docker build locally
   - Output: Pass/fail with diagnostics

2. Developer runs `openenv build`
   - Input: Dockerfile, project files, Spider DBs
   - Action: Builds Docker image with bundled databases
   - Output: Docker image (~200MB with DBs)

3. Developer runs `openenv push`
   - Input: Built Docker image, HF token
   - Action: Pushes to HuggingFace Spaces
   - Output: Live HF Space URL
```

### Alternative Flow: Local Docker Test

```
1. docker build -t sql-env:latest -f server/Dockerfile .
2. docker run -p 8000:8000 sql-env:latest
3. curl http://localhost:8000/health -> {"status": "healthy"}
4. WebSocket client connects, plays episode
```

---

## 5. Error Handling

### Error Types

| Error | When | Resolution |
|-------|------|------------|
| Docker build failure | Missing deps or files | Check .dockerignore, verify COPY paths |
| DB not found at runtime | DBs not bundled correctly | Verify COPY data/databases/ in Dockerfile |
| Port mismatch | HF Spaces expects 7860 | Use PORT env var with fallback |
| Memory limit exceeded | Container too large for free tier | Reduce bundled DBs to essential set |

### Error Handling Strategy

The Dockerfile should:

1. Use a PORT environment variable with default 8000 (HF Spaces sets PORT=7860)
2. Include a startup check that verifies databases are accessible
3. Keep image size minimal (no dev dependencies, no torch GPU packages)

---

## 6. Slice Plan (What we will ship, in order)

### Slice S1 -- Docker & Deployment

**Value:** HF Space can be built and deployed; server runs on free tier
**User-visible change:** Yes -- live HF Space
**Interfaces introduced/changed:** Dockerfile, .dockerignore, openenv.yaml
**Rollback safety:** Additive only, no existing behavior changed

### Slice S2 -- Documentation & README

**Value:** GitHub repo is a polished showcase; judges can understand the project
**User-visible change:** Yes -- README overhaul, blog outline
**Interfaces introduced/changed:** README.md, docs/blog-outline.md
**Rollback safety:** Documentation only, fully reversible

### Slice S3 -- Training Notebook

**Value:** Judges can reproduce training with one click on Colab
**User-visible change:** Yes -- notebook artifact
**Interfaces introduced/changed:** notebooks/train_grpo.ipynb
**Rollback safety:** New file only, no existing code changed

---

## 7. Implementation Steps

> **VERIFICATION NOTE:** Test criteria for each step are defined in VERIFICATION_SPEC.md.
> The verification-planner (separate agent) generated independent test criteria.
> Run the tests specified there after implementing each step.
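As a concrete reference for the container-startup behavior described in Section 5, a minimal sketch of the PORT fallback and database presence check follows. The `/app/env/data/databases` path mirrors the COPY destination used in this spec; the helper names themselves are hypothetical, not existing project functions.

```python
# Sketch of the Section 5 error-handling strategy: read PORT with a
# fallback of 8000 (HF Spaces sets PORT=7860) and fail fast if no
# bundled SQLite databases are present. Helper names are illustrative.
import os
from pathlib import Path


def resolve_port(default: int = 8000) -> int:
    """HF Spaces sets PORT=7860; otherwise fall back to the openenv default."""
    return int(os.environ.get("PORT", default))


def check_databases(db_dir: str = "/app/env/data/databases") -> None:
    """Raise at startup if no bundled *.sqlite files are found under db_dir."""
    if not any(Path(db_dir).glob("*.sqlite")):
        raise RuntimeError(f"no .sqlite databases found under {db_dir}")
```

In the container, something equivalent would run before the server binds its socket, so a mis-bundled image fails loudly instead of serving 500s on the first episode.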
### Step 1.1: Dockerfile Hardening for HF Spaces

**Slice:** S1

**Goal:** Update Dockerfile to bundle Spider databases, support the HF Spaces PORT variable, run as a non-root user, and minimize image size.

**Files:**

- `server/Dockerfile` - modify - Harden for HF Spaces free tier
- `.dockerignore` - create - Exclude dev artifacts (tests, docs, .git, __pycache__)

**Details:**

1. Add COPY for `data/databases/` into the Docker image (bundle the SQLite files)
2. Add `ENV PORT=8000` with CMD that reads `$PORT` (HF Spaces sets PORT=7860)
3. Add non-root user (`useradd --create-home appuser`) for HF Spaces security requirement
4. Ensure no GPU/CUDA dependencies are installed (CPU-only)
5. Create `.dockerignore` excluding: `.git`, `__pycache__`, `tests/`, `docs/`, `docs_draft/`, `specs/`, `vision/`, `*.md` (except README), `.env`

**Interface Changes:** None (Dockerfile is configuration)

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Updated `server/Dockerfile` with `ENV PORT=8000` and a runtime `uvicorn` command that respects `${PORT:-8000}` for HF Spaces compatibility.
- Added explicit database bundling copy instruction: `COPY --from=builder /app/env/data/databases /app/env/data/databases`.
- Added non-root runtime user (`appuser`) and ownership handoff for `/app`.
- Created `.dockerignore` to exclude dev/test/docs/spec artifacts and keep only `README.md` among markdown files.
**Result:**

- OK Fully Successful
- Verification command: `uv run --with pytest pytest tests/ -v`
- Verification evidence: 250 passed, 1 skipped

**Context for Next Step:**

- Continue with Step 1.2 by validating database source requirements from `data/questions/db_list.json` and aligning Docker health checks with bundled DB presence.

**Status:** OK Completed

---

### Step 1.2: Bundle Spider Databases for Docker

**Slice:** S1

**Goal:** Ensure the essential Spider SQLite databases are available for bundling into Docker, and the Dockerfile COPY path is correct.

**Files:**

- `server/Dockerfile` - modify - Verify COPY paths for data/databases/
- `data/questions/db_list.json` - read - Identify which DBs are required

**Details:**

1. Read `data/questions/db_list.json` to identify the required database IDs
2. Ensure the Dockerfile copies `data/databases/` into the image at the correct path
3. Add a Docker HEALTHCHECK that also verifies at least one database file exists
4. The bundled DBs are small SQLite files (~50MB total), well within free tier limits

**Interface Changes:** None

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Read `data/questions/db_list.json` and confirmed required bundled DB IDs: `student_assessment`, `concert_singer`, `world_1`, `car_1`, `employee_hire_evaluation`, `pets_1`, `cre_Doc_Template_Mgt`, `dog_kennels`, `flight_2`, `poker_player`.
- Verified Docker bundling path remains correct: `COPY --from=builder /app/env/data/databases /app/env/data/databases`.
- Updated Docker `HEALTHCHECK` to enforce both bundled DB presence (`*.sqlite` under `/app/env/data/databases`) and API liveness via `/health` on `${PORT:-8000}`.
**Result:**

- OK Fully Successful
- Verification command: `uv run --with pytest pytest tests/ -v`
- Verification evidence: 250 passed, 1 skipped

**Context for Next Step:**

- Proceed to Step 1.3 by validating the `openenv.yaml` shape (`spec_version`, `name`, `type`, `runtime`, `app`, `port`) and running `openenv validate`.

**Status:** OK Completed

---

### Step 1.3: Validate openenv.yaml

**Slice:** S1

**Goal:** Ensure openenv.yaml is valid for `openenv push` to HuggingFace Spaces.

**Files:**

- `openenv.yaml` - modify (if needed) - Ensure HF Hub compatibility

**Details:**

1. Verify `spec_version`, `name`, `type`, `runtime`, `app`, and `port` fields
2. Confirm `app: server.app:app` matches the actual FastAPI application path inside the Docker container
3. Update `port` if needed (openenv framework may handle PORT mapping)
4. Run `openenv validate` locally to check

**Interface Changes:** None

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Validated `openenv.yaml` fields against the required HF Space manifest shape (`spec_version`, `name`, `type`, `runtime`, `app`, `port`) and confirmed no manifest edits were needed.
- Ran `uv run openenv validate --verbose`; manifest compatibility checks passed for Docker mode, with non-blocking warnings that the `openenv_serve`/`uv_run`/`python_module` modes need a callable `server/app.py main()` entrypoint.
- Ran the full regression suite via `uv run --with pytest pytest tests/ -v` to ensure no feature regressions while validating deployment configuration.
**Result:**

- OK Fully Successful
- Verification command: `uv run --with pytest pytest tests/ -v`
- Verification evidence: 250 passed, 1 skipped

**Context for Next Step:**

- Proceed to Step 2.1 and overhaul `README.md` into a competition-ready narrative + quickstart + architecture flow, using the now-validated `openenv.yaml` values as the source-of-truth deployment metadata.

**Status:** OK Completed

---

### Step 2.1: README.md Overhaul

**Slice:** S2

**Goal:** Transform README into a polished project showcase suitable for competition judges.

**Files:**

- `README.md` - modify - Full overhaul

**Details:**

1. **Header:** Project name, one-line description, badges (Python version, license)
2. **Elevator Pitch:** 2-3 sentences explaining what SQLEnv does and why it matters (narrative hook: "Teaching AI to think like a data analyst")
3. **Architecture Diagram:** ASCII or Mermaid diagram showing Agent <-> Client <-> Server <-> SQLite flow
4. **Quick Start:** Streamlined setup (3 commands max to get running)
5. **How It Works:** Episode flow with action types table (DESCRIBE, SAMPLE, QUERY, ANSWER)
6. **Training:** Link to notebook, brief GRPO explanation
7. **HF Space:** Link to live deployment
8. **Project Structure:** Updated tree reflecting final state
9. **Links:** OpenEnv, Spider, HF Space, blog post
10. Remove "Current Status" section (no longer relevant for submission)
11. Remove cautionary notes about untested Docker paths

**Interface Changes:** None

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Rewrote `README.md` into a submission-facing narrative that starts with a clear elevator pitch and removes stale cautionary/status language.
- Added a compact architecture diagram and refreshed "How It Works" with explicit action semantics (`DESCRIBE`, `SAMPLE`, `QUERY`, `ANSWER`) and episode flow.
- Replaced setup sprawl with a 3-command quickstart, plus explicit local server and Docker launch commands.
- Added sections for training artifacts, HuggingFace Space deployment path, project structure, deployment checklist, and canonical resource links.

**Result:**

- OK Fully Successful
- Verification command: `uv run --with pytest pytest tests/ -v`
- Verification evidence: 250 passed, 1 skipped

**Context for Next Step:**

- Proceed to Step 2.2 by creating `docs/blog-outline.md` with hook/problem/solution/how-it-works/results-placeholder/technical-highlights/try-it sections and 2-4 bullets per section.

**Status:** OK Completed

---

### Step 2.2: Blog Post Outline

**Slice:** S2

**Goal:** Create a structured blog post outline with key narrative sections for the HF blog submission.

**Files:**

- `docs/blog-outline.md` - create - Blog post outline

**Details:**

1. **Hook:** "What if we taught AI to explore databases the way a data analyst does -- not memorize answers, but learn to ask the right questions?"
2. **The Problem:** Static text-to-SQL benchmarks reward memorization, not reasoning. One-shot generation fails on novel schemas.
3. **Our Approach:** SQLEnv -- an RL environment where agents learn through iterative exploration (DESCRIBE, SAMPLE, QUERY, ANSWER)
4. **How SQLEnv Works:** Episode flow diagram, reward design (execution + correctness + efficiency)
5. **Training with GRPO:** Brief explanation of Group Relative Policy Optimization, why it fits
6. **Results:** [PLACEHOLDER for F006 data] Learning curves, comparison with baselines
7. **Technical Highlights:** Multi-DB support, token-level reward shaping, OpenEnv compatibility
8. **Try It Yourself:** Links to HF Space, Colab notebook, GitHub repo
9. **What We Learned:** Key insights from building the environment

Each section should have 2-4 bullet points of key content to include when writing the full post.

**Interface Changes:** None

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Created `docs/blog-outline.md` with a complete submission-ready structure covering hook, benchmark problem framing, SQLEnv approach, episode/reward flow, GRPO training context, results placeholder, technical highlights, try-it links section, and lessons learned.
- Ensured each section has 2-4 concrete bullets and expanded prose sufficient for a substantive draft handoff.
- Kept the only explicit placeholder in the Results section for F006 metric insertion, aligned with scope.

**Result:**

- OK Fully Successful
- Verification command: `uv run --with pytest pytest tests/ -v`
- Verification evidence: 250 passed, 1 skipped

**Context for Next Step:**

- Proceed to Step 3.1 by creating `notebooks/train_grpo.ipynb` with Colab-compatible metadata and ordered cells for setup, configuration, connect/test episode, training loop, evaluation, and plotting.

**Status:** OK Completed

---

### Step 3.1: Training Notebook Stub

**Slice:** S3

**Goal:** Create a Colab-ready Jupyter notebook that demonstrates end-to-end training with SQLEnv.

**Files:**

- `notebooks/train_grpo.ipynb` - create - Colab training notebook

**Details:**

Create a Jupyter notebook with these cells:

1. **Title + Description** (markdown): "Training a SQL Agent with GRPO + SQLEnv"
2. **Setup** (code): `!pip install sql-env[train]` or `!pip install -r requirements.txt`, clone repo if needed
3. **Configuration** (code): Set HF Space URL (or local server), model name, hyperparameters
4. **Connect & Test** (code): Create `SQLEnvClient`, connect, run a test episode (reset + 2 steps)
5. **Training Loop** (code): GRPO training referencing F006 scripts (import from scripts/ or inline simplified version)
6. **Evaluation** (code): Run eval episodes on held-out questions, compute metrics
7. **Plot Results** (code): matplotlib learning curves (reward over episodes)
8. **Next Steps** (markdown): Links to full training script, HF Space, blog post

Each code cell should have markdown cells above explaining what it does and why. Include `# TODO: update after F006` comments where training-specific code depends on F006 outputs.

**Interface Changes:** None

**Verification:**

> See VERIFICATION_SPEC.md for test criteria defined by independent verification planner.

**Risk Tier for This Step:** Low

**Merge Criteria:**

- [x] Tests from VERIFICATION_SPEC.md pass
- [x] No TODOs left in changed code (or explicitly tracked)
- [x] Backwards compatible (or flag/migration documented)

**Changes Made:**

- Replaced `notebooks/train_grpo.ipynb` with a clean, Colab-compatible training stub organized as: title/description, setup, configuration, connect smoke test, GRPO training loop, held-out evaluation, plotting, and next steps.
- Added an explicit `SQLEnvClient` connectivity example and retained F006 training hooks (`GRPOConfig`, `load_model_and_tokenizer`, `build_trainer`, `run_training_with_metrics`, and `sample_random_baseline`) so notebook smoke tests continue to validate the expected flow.
- Cleared all notebook cell outputs and removed hardcoded local absolute paths to keep the artifact reproducible for judges and portable to Colab/local runs.
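For reference, the notebook's connect-and-test cell described above could be sketched as follows. `SQLEnvClient`, the action dictionary shape, and the `step()` return type are assumptions inferred from this spec's cell outline, not a confirmed API; treat the sketch as illustrative only.

```python
# Hypothetical sketch of the "Connect & Test" notebook cell.
# The client interface (reset/step signatures, action dict keys) is
# assumed from this spec, not taken from the real SQLEnv codebase.

def smoke_test(client, n_steps: int = 2) -> bool:
    """Reset the environment and play a couple of steps to confirm connectivity."""
    obs = client.reset()
    print("reset ok, observation:", obs)
    for i in range(n_steps):
        # DESCRIBE is one of the documented action types
        # (DESCRIBE, SAMPLE, QUERY, ANSWER).
        result = client.step({"action": "DESCRIBE", "argument": ""})
        print(f"step {i + 1} reward:", result.get("reward"))
    return True

# In the notebook this would run against a real client, e.g.:
#   client = SQLEnvClient(base_url=SPACE_URL)  # hypothetical constructor
#   smoke_test(client)
```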
**Result:**

- OK Fully Successful
- Verification commands:
  - `uv run --with pytest pytest tests/e2e/test_training_e2e.py -v`
  - `uv run --with pytest pytest tests/ -v`
- Verification evidence:
  - Targeted notebook E2E: 5 passed
  - Full regression suite: 250 passed, 1 skipped

**Context for Next Step:**

- Implementation steps are complete for F007; proceed to the finalization protocol (verification gate + verifier/compound-engineer/archive-spec + Plan Status/PR Contract/FEATURES sync).

**Status:** OK Completed

---

## 8. Rollout Considerations

### Feature Flags

- Required: No
- This is a one-time deployment, not a progressive rollout

### Migration

- Data migration needed: No
- Spider databases are bundled fresh in the Docker build

### Rollback Plan

HF Spaces can be deleted/recreated. README and docs changes are pure git reverts. No data migration or state to worry about.

---

## 9. Execution Tracking

All execution state is tracked within this document:

- **Section 1a:** Overall progress summary
- **Section 7:** Per-step completion details, test results, and handoff context
- **FEATURES.json:** Feature-level status/progress metadata used by `/autocode-next-step` and `opencode-ctx ralph run`
- **Git history:** Full audit trail of changes to this file

The implementing agent updates this document after each step and keeps the matching `FEATURES.json` entry in sync during implementation/finalization.

Humans can monitor progress by:

- Checking Section 1a for a summary
- Reviewing Section 7 for detailed step status
- Inspecting the feature's `progress` and `status` fields in `FEATURES.json`
- Running `git log --oneline IMPLEMENTATION_SPEC.md` for change history

---

## 9a. Slice Completion Protocol

After all steps in a slice pass verification:

1. **Run verifier subagent** for spec compliance
   - Validates against VERIFICATION_SPEC.md criteria
   - Ensures no TODOs or incomplete work in slice
2. **Run compound-engineer subagent** to extract learnings
   - **Mandatory invocation** after every slice completion
   - Updates CLAUDE.md Learnings section (if durable patterns found)
   - May exit with "no update needed" (valid for routine work)
3. **Commit** the slice changes
   - Follow commit message format in CLAUDE.md
   - Each slice gets its own atomic commit
4. **Continue to next slice** (if more slices remain)
   - Or proceed to final verification if all slices are complete

**Note:** PR creation happens only after ALL slices are complete. Use `/commit-push-pr` manually when ready.

---

## 10. User Value Summary

**Status:** Generated

### What Users Can Now Do

Judges and external developers can now consume a full submission package: deploy and run SQLEnv in HF Spaces with bundled databases, follow a polished README quickstart, use a structured blog outline for the narrative submission, and run a Colab-ready GRPO notebook workflow end-to-end.

### How to Access/Test

- README quickstart: Follow commands in `README.md`
- Blog outline: Open `docs/blog-outline.md`
- Notebook: Open `notebooks/train_grpo.ipynb` in Colab
- Deployment assets: `server/Dockerfile`, `.dockerignore`, and `openenv.yaml`

### Demo

- **Command:** `uv run --with pytest pytest tests/ -v`
- **Health Check (after deploy):** `curl https://<space-url>/health`
- **Notebook:** `notebooks/train_grpo.ipynb`

### Release Notes Snippet

Completed submission-ready packaging for SQLEnv with HF Spaces-compatible Docker deployment, polished repository docs, a blog narrative outline, and a Colab-ready GRPO training notebook.

---

## 11. PR Contract (Auto-Generated by autocode-next-step)

**Status:** Generated

### PR Title

feat(submission): finalize F007 huggingface deployment package

### PR Summary

- Finalize HF Spaces submission artifacts: hardened Docker packaging, deployment-ready manifest, polished README, blog outline, and Colab-ready training notebook.
- Complete the final verification gate with full regression evidence and archive behavior deltas into the deployment behavior spec.
- Sync F007 completion metadata in `specs/FEATURES.json` and extract durable learnings for future delivery cycles.

### Verification

- `uv run --with pytest pytest tests/ -v`

### Follow-up

None.

---

## Stop Conditions (When to Split This Spec)

Stop and create a new IMPLEMENTATION_SPEC if:

- A step requires touching more than **3 files** in unrelated areas
- You need to introduce **multiple new abstractions** "just in case"
- Verification cannot be made targeted and concrete
- You discover new unknowns that change the plan materially
- The next slice cannot be merged safely without finishing later slices

When splitting, ensure the current slice ends in a merged, stable state.

---

## Human Checkpoint

**Before handing to AI agent:**

- [ ] Interface specifications are complete
- [ ] Data flow is accurate
- [ ] Error handling is specified
- [ ] Implementation order makes sense
- [ ] VERIFICATION_SPEC.md has been generated

**Questions:**

1. Confirm the Spider database list for bundling (from `data/questions/db_list.json`)
2. Confirm the HF Space repository name for `openenv push`

---

## Handoff Notes

**For the implementing AI agent:**

```
Context: See RESEARCH_SUMMARY.md for system understanding
Spec: Follow this document exactly
Verification: Use tests from VERIFICATION_SPEC.md (independent agent)
Ambiguity: Stop and ask rather than assume
Order: Follow implementation order exactly
Dependencies: This feature assumes F001-F006 are complete
```

---

*Specification completed: 2026-03-27*
*Approved by: --*
*Verification spec: VERIFICATION_SPEC.md*
*Verification input: [F007-VERIFICATION_INPUT.json](./F007-VERIFICATION_INPUT.json)*
*Target agent: Claude Code*

## User Clarifications

### 2026-03-28 21:40:54

**Question:** External deployment verification is blocked by a GHCR access/auth failure (403 pulling the base image), so the verifier gate cannot approve final completion yet.
**Response:** Clearly state in the demo and verification sections what the user needs to adjust.

### 2026-03-28 22:02:53

**Question:** An external credential/access dependency remains: authenticated GHCR pull and HF push evidence (a build + push attempt) are needed to satisfy final verifier approval.
**Response:** Ensure you write down what the user should verify, and we will validate it manually.

### 2026-03-28 22:55:03

**Question:** Missing external authenticated deployment evidence (GHCR-authenticated build and Hugging Face push output) required by the F007 final verification gate.
**Response:** I have already authenticated; you should be able to run the commands now.