sql_env/docs/exploration/f007-prelaunch-checklist.md

F007 Pre-Launch Checklist (temp, 2026-04-12)

Scope: verify the HF Space deployment is real and usable before the blog post goes live today. Delete this file after launch.


TL;DR — what to do in the next ~60 min

| # | Action | Time | Value | Do it? |
|---|--------|------|-------|--------|
| 1 | Open Space in browser, confirm it loads | 2 min | Critical — judges will click the link first | YES |
| 2 | Hit /health and /docs | 1 min | Critical — proves server is up | YES |
| 3 | Run one full episode via /web UI | 5 min | Critical — proves action space works end-to-end | YES |
| 4 | Fix stale docs/competition-deliverables.md status | 3 min | High — doc claims "Not started", Space is live | YES |
| 5 | Python client smoke test against live Space | 10 min | High — proves programmatic access (the thing the blog promises) | YES |
| 6 | Pull registry.hf.space/... Docker image and run locally | 10 min | Medium — nice to have, judges rarely do this | If time |
| 7 | pip install from Space URL | 5 min | Medium — validates pyproject.toml inside Space | If time |
| 8 | Concurrency audit (SUPPORTS_CONCURRENT_SESSIONS) | 15 min | Low for launch, high for anyone retraining | Skip today, file issue |
| 9 | TRL environment_factory wrapper | — | — | Already done (see below) |

Recommendation: Do 1–5 before publishing. Skip 6–8. Item 9 is already in the repo.


About TRL (already integrated — do not re-research)

TRL = Hugging Face's transformers-based RL library. Its GRPOTrainer accepts an environment_factory=MyEnvClass argument and runs the multi-turn tool-calling loop automatically: generate → parse tool call → call your env → feed result back → repeat. No custom rollout_func needed.

We already implement this. training/trl_adapter.py::SQLEnvTRL is a TRL-native environment class with:

  • reset(**kwargs) — reads question_text from the dataset column to route to the correct database
  • Named tool methods with docstrings: describe(table_name), sample(table_name), query(sql), answer(value) — not a generic step()
  • sql_env_reward_func as the reward function
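The shape of such an adapter can be illustrated with a self-contained toy (hypothetical: the class and method names mirror the bullets above, backed by a throwaway in-memory SQLite table; this is not the repo's SQLEnvTRL):

```python
# Toy illustration of the "named tool methods + reward" env shape.
# Everything here is invented for the sketch, including the table.
import sqlite3

class ToySQLEnv:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
        self.db.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")
        self.expected = 2          # ground-truth answer for the toy question
        self.reward, self.done = 0.0, False

    def reset(self, **kwargs):
        # In the real adapter, question_text from the dataset column
        # routes to the correct database.
        self.reward, self.done = 0.0, False
        return "Tables: users. Question: how many users are there?"

    def describe(self, table_name: str) -> str:
        """Return the column layout of a table."""
        cols = self.db.execute(f"PRAGMA table_info({table_name})").fetchall()
        return ", ".join(f"{c[1]} {c[2]}" for c in cols)

    def sample(self, table_name: str) -> str:
        """Return up to 5 sample rows from a table."""
        return repr(self.db.execute(
            f"SELECT * FROM {table_name} LIMIT 5").fetchall())

    def query(self, sql: str) -> str:
        """Run SQL and return the rows (a real env would restrict to SELECT)."""
        return repr(self.db.execute(sql).fetchall())

    def answer(self, value) -> str:
        """Submit the final answer; sets reward and ends the episode."""
        self.reward = 1.0 if int(value) == self.expected else 0.0
        self.done = True
        return f"reward={self.reward}"
```

The point is the pattern, not the SQL: each tool is a named method with a docstring TRL can surface to the model, and the reward lives on the instance.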

notebooks/train_grpo.ipynb cell 16 passes it directly:

trainer = build_trainer(
    ...
    reward_funcs=[sql_env_reward_func],
    environment_factory=SQLEnvTRL,
    ...
)
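For intuition, the multi-turn loop this wires up looks roughly like the sketch below (invented names, not TRL's actual internals):

```python
# Illustrative only: the generate -> tool call -> env -> feed-back loop
# described above. "generate" and the call dict shape are made up.
def run_episode(generate, env, max_turns=8):
    """generate(transcript) returns {"tool": name, "args": [...]};
    env exposes reset(), named tool methods, and a .reward attribute."""
    transcript = [env.reset()]
    for _ in range(max_turns):
        call = generate(transcript)                  # model picks a tool
        result = getattr(env, call["tool"])(*call["args"])
        transcript.append(result)                    # feed result back
        if call["tool"] == "answer":                 # terminal action
            break
    return transcript, env.reward
```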

The Setup cell pins trl>=0.29.0 and transformers from main specifically because environment_factory requires transformers ≥5.2. Our v1/v2 runs used this path.

One nuance (intentional design): SQLEnvTRL.__init__ instantiates a local in-process SQLEnvironment, not a WebSocket client to https://hjerpe-sql-env.hf.space. Reasons:

  • Training opens N parallel sessions (one per generation). The hosted Space defaults to 1 concurrent session — see SUPPORTS_CONCURRENT_SESSIONS in the TRL↔OpenEnv docs.
  • Local is faster, no network hops, no rate limits.
  • The hosted Space is for judges (clicking /web) and external users consuming the env via pip/Docker. Training correctly bypasses it.

Implication for the blog: you can claim "TRL-native integration via environment_factory" factually. It's already true and the notebook proves it.

What's still on the post-launch list is the Space-side concurrency config (item 8), not the adapter. Without SUPPORTS_CONCURRENT_SESSIONS=True on the server, an external user trying to retrain against the hosted Space would hit the 1-session cap. This does not affect our own training (we use local).


1. Browser smoke test (2 min) — CRITICAL

How:

open https://huggingface.co/spaces/hjerpe/sql_env

(or paste the URL into a browser manually)

What to check:

  • Space status is Running (green), not Building / Sleeping / Error
  • README renders with a clear one-liner of what the env does
  • No red error banner at the top

What you're validating: that HF Spaces successfully built our image and the container is alive. If it's sleeping, the first visit wakes it (~30s cold start). Warm it up now and leave the tab open so blog readers don't hit a cold Space.

If it's broken: open the "Logs" tab on the Space page → look for the Docker build error → fix locally → re-push with uv run openenv push.


2. Health + API docs (1 min) — CRITICAL

How:

curl -sS https://hjerpe-sql-env.hf.space/health
curl -sS https://hjerpe-sql-env.hf.space/docs | head -20   # should be HTML
open https://hjerpe-sql-env.hf.space/docs                  # visual check

What to check:

  • /health returns HTTP 200 and a JSON body (e.g. {"status":"ok"})
  • /docs Swagger page lists /reset, /step, /ws endpoints
  • /step request schema mentions our SQLAction fields (action_type, argument) with values DESCRIBE, SAMPLE, QUERY, ANSWER

What you're validating: the FastAPI server inside the container is up and the OpenAPI schema published to the Space matches our local SQLAction model. If schemas drift, clients break.

If it's broken: usually means the Dockerfile picked up a stale version of sql_env/models.py. Rebuild and push: uv run openenv build -t … then uv run openenv push.
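The /step schema check can also be scripted against /openapi.json (FastAPI's default machine-readable schema path; the expected field names are the ones this checklist lists). A sketch:

```python
# Sketch: detect drift between the checklist's expected SQLAction
# fields and the Space's published OpenAPI spec.
EXPECTED_FIELDS = {"action_type", "argument"}

def missing_fields(openapi: dict, expected=frozenset(EXPECTED_FIELDS)) -> set:
    """Return expected field names absent from every schema component."""
    found = set()
    for schema in openapi.get("components", {}).get("schemas", {}).values():
        found |= set(schema.get("properties", {}))
    return set(expected) - found

# Live usage (network):
#   import json, urllib.request
#   spec = json.load(urllib.request.urlopen(
#       "https://hjerpe-sql-env.hf.space/openapi.json", timeout=10))
#   print(missing_fields(spec))   # empty set means no drift
```

An empty result means the published schema still declares the fields our client sends.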


3. One full episode via the built-in web UI (5 min) — CRITICAL

How:

open https://hjerpe-sql-env.hf.space/web

OpenEnv ships a /web interactive UI on every env. Walk through one full episode:

  1. Click Reset — a schema hint + question prompt should appear
  2. Enter action DESCRIBE with argument = a table name from the reset output
  3. Enter action SAMPLE with a table name (confirm 5 sample rows come back)
  4. Enter action QUERY with a valid SELECT ... (confirm rows return)
  5. Enter action ANSWER with your final answer (confirm reward + done=true)

What to check:

  • Each step returns a new observation without error
  • Terminal ANSWER produces a reward (even 0.0 is fine — we're testing plumbing, not correctness)
  • Screenshot the final screen — free blog content

What you're validating: the end-to-end action space a judge will exercise. This is the happy path judges will take.

If any step errors, do not publish the blog until it's fixed.


4. Fix stale deliverables doc (3 min) — HIGH

docs/competition-deliverables.md line 30 says:

Status: Not started (no Dockerfile yet)

This is wrong. F007 demo (specs/F007-DEMO.md) shows a successful authenticated push to https://huggingface.co/spaces/hjerpe/sql_env on 2026-03-29. Update to:

Status: Live at https://huggingface.co/spaces/hjerpe/sql_env — manual episode flow verified 2026-04-12.

Also update the open items list at the bottom — "Deploy HuggingFace Space" should be checked off.


5. Python client smoke test (10 min) — HIGH

How:

First find the actual client class name and action constructor args — our client module may not match the generic OpenEnv template:

rg -n "class SQLEnv\b|base_url" -g '!**/tests/**' -g '!**/docs/**' .
rg -n "class SQLAction|action_type|argument" sql_env/models.py

Then create a throwaway script scratch_hf_smoke.py in the repo root:

from sql_env.client import SQLEnv          # adjust after grep above
from sql_env.models import SQLAction

URL = "https://hjerpe-sql-env.hf.space"

with SQLEnv(base_url=URL).sync() as env:
    r = env.reset()
    print("RESET:", r.observation)

    # Pick any table name from the schema hint in r.observation
    r = env.step(SQLAction(action_type="DESCRIBE", argument="<table>"))
    print("DESCRIBE:", r.observation)

    r = env.step(SQLAction(action_type="QUERY", argument="SELECT 1"))
    print("QUERY:", r.observation)

    r = env.step(SQLAction(action_type="ANSWER", argument="<answer>"))
    print("ANSWER reward=", r.reward, "done=", r.done)

Run:

uv run python scratch_hf_smoke.py

What to check:

  • No connection / WebSocket handshake errors
  • r.observation is a populated dict/string at each step
  • Final step: r.reward is a float and r.done == True
  • Delete scratch_hf_smoke.py after — do not commit it

What you're validating: that a blog reader copy-pasting our snippet against the live Space actually gets a working client. If this fails while the /web UI (step 3) works, the problem is likely client-side model drift — check that our shipped sql_env/models.py matches what the server inside the Space expects.


6. Docker image pull (10 min) — IF TIME

This is the pattern every OpenEnv env on the hub ships. It's how external users run our env locally for training (no rate limits, full concurrency).

How — option A: pull the pre-built image from HF registry

docker pull --platform linux/amd64 registry.hf.space/hjerpe-sql_env:latest
docker run -d --name sqlenv-smoke -p 8001:8000 --platform linux/amd64 \
  registry.hf.space/hjerpe-sql_env:latest

# Wait ~5s for uvicorn to boot
sleep 5
curl -sS http://0.0.0.0:8001/health
open http://0.0.0.0:8001/docs

# Clean up
docker stop sqlenv-smoke && docker rm sqlenv-smoke
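The fixed `sleep 5` above is a guess at boot time; a small polling helper is more robust (a sketch; same port and URL as the commands above):

```shell
# Poll a health URL until it answers or a timeout elapses, instead of
# guessing the boot time with a fixed sleep.
wait_for_health() {
  url="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage after docker run:
#   wait_for_health http://0.0.0.0:8001/health
```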

How — option B: rebuild locally from our repo (same image the Space runs)

uv run openenv validate --verbose          # dry-run config check
uv run openenv build -t openenv-sql-env:local
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://0.0.0.0:8001/health
docker stop sqlenv-local && docker rm sqlenv-local

What to check:

  • Image pulls without auth (Space is public)
  • Container starts, /health returns 200
  • /docs renders Swagger on localhost
  • No --platform warnings on Apple Silicon (the Space is linux/amd64, which runs under Rosetta on M-series Macs — slow but functional)

What you're validating: the reproducibility story. A broken image here means the blog's "clone and train" path is dead. Judges rarely click this, but any serious user will.


7. pip install from Space (5 min) — IF TIME

How:

First check the package name declared inside the pushed Space:

curl -sS https://huggingface.co/spaces/hjerpe/sql_env/raw/main/pyproject.toml \
  | grep -E '^name'

Then install it into a throwaway venv:

uv venv /tmp/sqlenv-pip-test
source /tmp/sqlenv-pip-test/bin/activate

# Replace "openenv-sql-env" with whatever name the pyproject.toml above shows
pip install "openenv-sql-env @ git+https://huggingface.co/spaces/hjerpe/sql_env"

python -c "from sql_env.client import SQLEnv; print('OK:', SQLEnv)"

deactivate && rm -rf /tmp/sqlenv-pip-test

What to check:

  • pip install resolves without dependency errors
  • The client class imports from the installed wheel

What you're validating: the pyproject.toml we pushed into the Space actually declares the package correctly. This is the install method TRL documents: pip install "<pkg> @ git+https://huggingface.co/spaces/<space>".


8. Concurrency audit — POST-LAUNCH

How:

rg -n "SUPPORTS_CONCURRENT_SESSIONS|max_concurrent_envs|create_app\(" sql_env/server/

Expected result today: no matches (the flag is not set), which means the Space defaults to 1 concurrent WebSocket session. Per the OpenEnv↔TRL docs, any training run with num_generations > 1 against the hosted Space will hit capacity errors.

Fix (post-launch): in sql_env/server/app.py (or wherever create_app(...) is called):

SUPPORTS_CONCURRENT_SESSIONS = True

app = create_app(
    create_sql_environment,
    SQLAction,
    SQLObservation,
    max_concurrent_envs=64,   # ≥ TRL's generation_batch_size
)

Then uv run openenv build -t ... and uv run openenv push again.

Why it's not a launch blocker: the blog does not ask readers to train against the hosted Space. Our own training uses the in-process SQLEnvironment via SQLEnvTRL (not the WebSocket client), so we never hit this limit. Only matters if an external user wants to run GRPOTrainer against https://hjerpe-sql-env.hf.space directly. File as a GitHub issue after the blog ships.


9. TRL environment_factory wrapper — DONE

Already implemented in training/trl_adapter.py::SQLEnvTRL and wired into notebooks/train_grpo.ipynb cell 16. See the TRL section at the top of this document for details. No action.



Appendix A: Republish the Space from scratch (reference)

Only run these if step 1–3 show the Space is broken and a rebuild+push is needed. Otherwise skip β€” the current Space is already live.

Prereqs (one-time):

uv sync                                            # project deps
hf auth login                                      # HuggingFace CLI auth
# (token with write access to hjerpe/sql_env)

Validate + build + push:

# 1. Dry-run config check — confirms the openenv manifest, Dockerfile
#    and server entrypoint agree
uv run openenv validate --verbose

# 2. Build the Docker image locally (same image HF Spaces will run)
uv run openenv build -t openenv-sql-env:local

# 3. Optional: smoke-test the local image before pushing
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://0.0.0.0:8001/health
docker stop sqlenv-local && docker rm sqlenv-local

# 4. Push to the Space — creates hjerpe/sql_env if it doesn't exist,
#    uploads files, and triggers the Space's own Docker build
uv run openenv push
# expected tail:
#   ✓ Authenticated as: hjerpe
#   ✓ Space hjerpe/sql_env is ready
#   ✓ Upload completed successfully
#   Space URL: https://huggingface.co/spaces/hjerpe/sql_env

After push: the Space rebuilds its own Docker image on HF's infra (takes 2–5 min). Watch the build logs in the browser at https://huggingface.co/spaces/hjerpe/sql_env → "Logs" tab. When it turns green, re-run steps 1–5 at the top of this doc to verify.

Files that must exist for openenv push to work (already in the repo):

  • openenv.yaml — manifest with name, version, description
  • sql_env/server/Dockerfile — FastAPI + uvicorn container
  • sql_env/server/app.py — create_app(...) entrypoint
  • sql_env/models.py — SQLAction / SQLObservation Pydantic models
  • pyproject.toml — pip-installable package metadata
  • README.md — Space landing page (HF renders it on the Space page)

If any of these drifts out of sync, openenv validate --verbose will flag it before you push.


Appendix B: Research finding — dangling legacy reward module

Finding: training/rewards.py (151 lines) is legacy dead code from the pre-F010 rollout-based architecture. It is not used by the production training path and can be deleted post-launch.

Evidence:

  • Module docstring (lines 1–5): "Reward callables for TRL GRPO training. These helpers consume rollout metadata..." — this is the OLD pattern where reward functions parsed kwargs['metadata'] from TRL rollouts instead of reading env.reward from environment instances.
  • Internal helper _extract_metadata_rows() (line 41): "TRL can pass rollout metadata in different shapes depending on wrapper code." — explicit confirmation this is replay-based reward parsing.
  • Functions exposed: reward_correctness, reward_progress, reward_operational.
  • Zero production imports. rg 'from.*training\.rewards|training\.rewards\.reward_' returns exactly one hit: tests/unit/test_rewards.py. No script, notebook, or other module in training/ imports it.
  • The real training path uses sql_env_reward_func in training/trl_adapter.py, which reads env.reward directly from SQLEnvTRL instances. This is the environment_factory pattern mandated by F010 and documented as the correct choice (see specs/F010-IMPLEMENTATION_SPEC.md:173 and the user's own memory note: "Use environment_factory or rollout_func, not replay-based reward parsing").
  • Notebook train_grpo.ipynb cell 16: reward_funcs=[sql_env_reward_func] — pulls from trl_adapter, not rewards.py.

The only rollout matches in training/ are harmless:

  • training/prompts.py:1 — docstring mentions "GRPO training rollouts"
  • training/rewards.py — the legacy module itself
  • notebooks/train_grpo.ipynb cell 16 — a local variable before_rollouts = sample_random_baseline(...) that has nothing to do with TRL's rollout_func

Recommendation (post-launch, low priority):

  1. Delete training/rewards.py
  2. Delete tests/unit/test_rewards.py
  3. Confirm uv run pytest tests/ -v still passes
  4. Commit with message: refactor: remove legacy rollout-metadata reward module superseded by F010 environment_factory

Why not today: zero risk on launch (nothing imports it in production), and deleting files during blog-publish day is the wrong kind of churn. File as a post-launch cleanup.


Post-launch cleanup

  • Delete this file
  • File issue for item 8 (Space concurrency)
  • Delete training/rewards.py + tests/unit/test_rewards.py (see Appendix B)
  • Update docs/competition-deliverables.md open-items list