
Feature Demo: F007 - Hugging Face Deployment & Submission

Generated: 2026-03-29T07:33:23Z
Context source: spec + discovery only (implementation not read)
Feature entry: FEATURES.json #F007


What This Feature Does

F007 packages SQLEnv so a judge can consume it end-to-end: discover the project from the README, run or visit the deployed Hugging Face Space, and use the training notebook workflow.

From a user's perspective, the core value is trust and usability: the deployment assets validate, build, and push cleanly, and someone outside the team can run the submission package.


What Is Already Proven

Verified in This Demo Run

  • Ran deployment validation locally with uv run openenv validate --verbose.
  • Built deployment image locally with uv run openenv build -t openenv-sql-env-f007-hf-submission.
  • Ran authenticated deployment push with uv run openenv push to https://huggingface.co/spaces/hjerpe/sql_env.
  • Ran notebook/training E2E checks (tests/e2e/test_training_e2e.py): 5 passed.
  • Ran full regression suite: 250 passed, 1 skipped.

Previously Verified Evidence

  • specs/FEATURES.json → verification_evidence for F007: 250/250 tests passed, verifier approved.
  • specs/F007-IMPLEMENTATION_SPEC.md (Section 1a) records authenticated build + push completion evidence.

What Still Needs User Verification

  • Open the live Space in a browser and manually run a reset/step/answer episode flow.
  • Open notebooks/train_grpo.ipynb in Colab and execute cells in order on a clean runtime.

Quickstart / Verification Steps

Run these commands to see the feature in action:

uv run openenv validate --verbose
uv run openenv build -t openenv-sql-env-f007-hf-submission
uv run openenv push

Prerequisite: an authenticated Hugging Face CLI/account with write access to the target Space.
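The three Quickstart commands depend on each other. There is little point building after a failed validation, or pushing after a failed build. The minimal Python sketch below runs them in order and stops at the first failure. `run_quickstart` and its `runner` parameter are hypothetical helpers for illustration and are not part of the openenv CLI.

```python
import subprocess
from typing import Callable, Sequence

# The three Quickstart commands from this demo, in the order they must run.
QUICKSTART_STEPS: list[list[str]] = [
    ["uv", "run", "openenv", "validate", "--verbose"],
    ["uv", "run", "openenv", "build", "-t", "openenv-sql-env-f007-hf-submission"],
    ["uv", "run", "openenv", "push"],
]


def run_quickstart(
    steps: Sequence[Sequence[str]] = QUICKSTART_STEPS,
    runner: Callable[[list[str]], object] = subprocess.run,
) -> list[str]:
    """Run each step in order, stopping at the first non-zero exit code.

    Hypothetical helper (not part of the openenv CLI). `runner` defaults to
    subprocess.run and may be replaced by a fake in tests. Returns the
    openenv subcommands that completed, e.g. ["validate", "build", "push"].
    """
    completed: list[str] = []
    for cmd in steps:
        result = runner(list(cmd))
        if result.returncode != 0:
            # Fail fast: never build after a failed validate, never push after a failed build.
            raise RuntimeError(
                f"step failed with exit code {result.returncode}: {' '.join(cmd)}"
            )
        completed.append(cmd[3])  # index 3 is the openenv subcommand name
    return completed
```

Calling `run_quickstart()` with no arguments executes the real commands, so it needs the same authenticated Hugging Face setup as the manual run. In a shell, chaining the three commands with `&&` gives the same fail-fast behavior.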


Live Local Proof

Validate Deployment Configuration

This confirms which deployment modes are supported, reporting Docker alongside each non-Docker mode.

uv run openenv validate --verbose
[OK] sql-env-F007-huggingface-deployment-submission: Ready for multi-mode deployment

Supported deployment modes:
  [YES] docker
  [YES] openenv_serve
  [YES] uv_run
  [YES] python_module

What to notice: all four deployment modes are supported.
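To check the mode list automatically (for example, in CI), you can parse the verbose output. The minimal sketch below assumes the `[STATUS] mode_name` line format shown above stays stable. This run only shows `[YES]`, so the `[NO]` marker for an unsupported mode is an assumption. The helper names are hypothetical.

```python
import re

# One "[STATUS] mode_name" entry per line, as printed under
# "Supported deployment modes:" (format assumed from the output above).
MODE_LINE = re.compile(r"^\s*\[(\w+)\]\s+(\S+)\s*$")

EXPECTED_MODES = {"docker", "openenv_serve", "uv_run", "python_module"}


def parse_modes(output: str) -> dict[str, bool]:
    """Map each listed deployment mode to True if its status is YES."""
    modes: dict[str, bool] = {}
    for line in output.splitlines():
        match = MODE_LINE.match(line)
        if match:
            status, mode = match.groups()
            modes[mode] = status == "YES"
    return modes


def all_modes_supported(output: str) -> bool:
    """True only if every expected mode is listed and marked YES."""
    modes = parse_modes(output)
    return EXPECTED_MODES.issubset(modes) and all(modes[m] for m in EXPECTED_MODES)
```

The `[OK] ...: Ready for multi-mode deployment` summary line does not match the pattern, because text follows the name, so only the per-mode lines are collected.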

Build the Hugging Face Deployment Image

uv run openenv build -t openenv-sql-env-f007-hf-submission
Building Docker image for: sql-env-F007-huggingface-deployment-submission
...
#18 naming to docker.io/library/openenv-sql-env-f007-hf-submission done
✓ Docker build successful

Done!

What to notice: image build completed successfully with the expected tag.
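One way to double-check the build before pushing is to ask Docker for the tag directly. `docker image inspect` exits non-zero when an image does not exist locally, so the exit code alone answers the question. The `image_exists` helper below is a hypothetical illustration, not part of openenv.

```python
import subprocess
from typing import Callable

# Tag used by the build command in this demo.
IMAGE_TAG = "openenv-sql-env-f007-hf-submission"


def image_exists(tag: str = IMAGE_TAG, runner: Callable[..., object] = subprocess.run) -> bool:
    """Return True if `docker image inspect` finds the tag locally.

    Hypothetical helper. The inspect output is discarded; only the exit
    code matters (non-zero means no such image).
    """
    result = runner(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```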

Push to Hugging Face Space

uv run openenv push
✓ Authenticated as: hjerpe
Creating/verifying space: hjerpe/sql_env
✓ Space hjerpe/sql_env is ready
Uploading files to hjerpe/sql_env...
✓ Upload completed successfully
Space URL: https://huggingface.co/spaces/hjerpe/sql_env

✓ Deployment complete!
Visit your space at: https://huggingface.co/spaces/hjerpe/sql_env

What to notice: authenticated push succeeded and produced a live Space URL.
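A successful push only means the files were uploaded. The Space still has to build on Hugging Face before it can serve episodes. The sketch below polls the Space's runtime stage with `huggingface_hub`'s `HfApi.get_space_runtime`. The helper names, the polling interval, and the set of stages treated as terminal failures are assumptions, not something this demo verified.

```python
import time
from typing import Any, Callable, Optional

SPACE_ID = "hjerpe/sql_env"
# Stages treated as terminal failures here. This set is an assumption based on
# huggingface_hub's SpaceStage values, not an exhaustive list.
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def _default_api() -> Any:
    from huggingface_hub import HfApi  # imported lazily so tests can inject a fake
    return HfApi()


def space_stage(repo_id: str = SPACE_ID, api: Optional[Any] = None) -> str:
    """Return the Space's runtime stage as a plain string, e.g. "BUILDING"."""
    api = api if api is not None else _default_api()
    stage = api.get_space_runtime(repo_id).stage
    return getattr(stage, "value", stage)  # unwrap the enum if one is returned


def wait_until_running(
    repo_id: str = SPACE_ID,
    api: Optional[Any] = None,
    max_polls: int = 40,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the Space reports RUNNING. Return False on a failure stage or after max_polls."""
    api = api if api is not None else _default_api()
    for _ in range(max_polls):
        stage = space_stage(repo_id, api)
        if stage == "RUNNING":
            return True
        if stage in FAILED_STAGES:
            return False
        sleep(poll_s)
    return False
```

A `True` result only says the Space container is up. The manual reset/step/answer check below is still needed to confirm the environment behaves correctly.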


Existing Evidence

  • The verification spec's target command (uv run --with pytest pytest tests/ -v) was re-run in this demo and passed (250 passed, 1 skipped).
  • The F007 entry in specs/FEATURES.json had already recorded verifier approval before this refresh.

Manual Verification Checklist

  1. Open https://huggingface.co/spaces/hjerpe/sql_env.
  2. Confirm the app loads without startup errors.
  3. Start an episode (reset), then run at least one exploration step.
  4. Submit an answer action and confirm terminal response/reward appears.
  5. Open notebooks/train_grpo.ipynb in Colab and run setup + connect + one training/eval pass.

Edge Cases Exercised

All deployment modes pass validation

uv run openenv validate --verbose
Supported deployment modes:
  [YES] docker
  [YES] openenv_serve
  [YES] uv_run
  [YES] python_module

This matters because all four modes pass cleanly, leaving no warnings or caveats for the submission reviewer.

Verification-spec command drift (error case)

uv run --with pytest pytest tests/e2e/test_readme_completeness.py -v
ERROR: file or directory not found: tests/e2e/test_readme_completeness.py
collected 0 items
============================ no tests ran in 0.00s ============================

This matters because it reveals a spec-to-repo mismatch: the verification spec names a test file that does not exist in the repository, so the verification artifacts should be corrected.
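You can catch this kind of drift before running pytest by scanning the spec text for test-file paths and checking each one on disk. The sketch below assumes the spec refers to tests as repo-relative paths of the form `tests/.../*.py`. The `missing_test_paths` helper is hypothetical.

```python
import re
from pathlib import Path

# Repo-relative pytest targets such as tests/e2e/test_training_e2e.py
# (assumed to be how the verification spec refers to test files).
TEST_PATH = re.compile(r"tests/[\w/.-]+\.py")


def missing_test_paths(spec_text: str, repo_root: Path) -> list[str]:
    """Return test files referenced in spec_text that do not exist under repo_root."""
    referenced = sorted(set(TEST_PATH.findall(spec_text)))
    return [path for path in referenced if not (repo_root / path).is_file()]
```

Against this repository, `missing_test_paths(Path("specs/F007-VERIFICATION_SPEC.md").read_text(), Path("."))` would be expected to report the same missing file as the pytest error above, provided the spec uses that path form.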

Notebook pipeline smoke validation still passes

uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
collected 5 items
...
============================== 5 passed in 11.33s ==============================

This confirms the training notebook path still has executable smoke coverage.


Test Evidence (Optional)

Supplementary test results supporting the demo runs above.

| Test Suite | Tests | Status |
|---|---|---|
| Full regression (uv run --with pytest pytest tests/ -v) | 251 collected | 250 passed, 1 skipped |
| Training E2E (tests/e2e/test_training_e2e.py) | 5 | All passed |
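You can cross-check these rows mechanically. Pytest's final summary line lists a count per outcome, and those counts should add up to the number collected (250 + 1 = 251 for the full run). The minimal sketch below assumes the standard pytest summary wording. The helper names are hypothetical.

```python
import re

# "<count> <outcome>" pairs from pytest's final summary line.
SUMMARY_ITEM = re.compile(
    r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed|deselected|warnings?)"
)


def parse_summary(line: str) -> dict[str, int]:
    """Map each outcome word in the summary line to its count."""
    return {word: int(count) for count, word in SUMMARY_ITEM.findall(line)}


def accounts_for_all(line: str, collected: int) -> bool:
    """True if the per-outcome counts add up to the collected total.

    Warnings are excluded because they are not test outcomes.
    """
    counts = parse_summary(line)
    total = sum(n for word, n in counts.items() if not word.startswith("warning"))
    return total == collected
```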

Feature Links

  • Implementation spec: specs/F007-IMPLEMENTATION_SPEC.md
  • Verification spec: specs/F007-VERIFICATION_SPEC.md

Demo generated by the feature-demo agent. Re-run with /feature-demo F007 to refresh.