sql_env/docs/exploration/f007-prelaunch-checklist.md

F007 Pre-Launch Checklist (temp, 2026-04-12)

Scope: verify the HF Space deployment is real and usable before the blog post goes live today. Delete this file after launch.


TL;DR — what to do in the next ~60 min

| # | Action | Time | Value | Do it? |
|---|--------|------|-------|--------|
| 1 | Open Space in browser, confirm it loads | 2 min | Critical — judges will click the link first | YES |
| 2 | Hit /health and /docs | 1 min | Critical — proves server is up | YES |
| 3 | Run one full episode via /web UI | 5 min | Critical — proves action space works end-to-end | YES |
| 4 | Fix stale docs/competition-deliverables.md status | 3 min | High — doc claims "Not started", Space is live | YES |
| 5 | Python client smoke test against live Space | 10 min | High — proves programmatic access (the thing the blog promises) | YES |
| 6 | Pull registry.hf.space/... Docker image and run locally | 10 min | Medium — nice to have, judges rarely do this | If time |
| 7 | pip install from Space URL | 5 min | Medium — validates pyproject.toml inside Space | If time |
| 8 | Concurrency audit (SUPPORTS_CONCURRENT_SESSIONS) | 15 min | Low for launch, high for anyone retraining | Skip today, file issue |
| 9 | TRL environment_factory wrapper | — | — | Already done (see below) |

Recommendation: Do 1–5 before publishing. Skip 6–8. Item 9 is already in the repo.


About TRL (already integrated — do not re-research)

TRL = Hugging Face's transformers-based RL library. Its GRPOTrainer accepts an environment_factory=MyEnvClass argument and runs the multi-turn tool-calling loop automatically: generate → parse tool call → call your env → feed result back → repeat. No custom rollout_func needed.

We already implement this. training/trl_adapter.py::SQLEnvTRL is a TRL-native environment class with:

  • reset(**kwargs) — reads question_text from the dataset column to route to the correct database
  • Named tool methods with docstrings: describe(table_name), sample(table_name), query(sql), answer(value) — not a generic step()
  • sql_env_reward_func as the reward function
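The shape of such an adapter can be illustrated with a self-contained toy (hypothetical: the class and method names mirror the bullets above, backed by a throwaway in-memory SQLite table; this is not the repo's SQLEnvTRL):

```python
# Toy illustration of the "named tool methods + reward" env shape.
# Everything here is invented for the sketch, including the table.
import sqlite3

class ToySQLEnv:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
        self.db.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")
        self.expected = 2          # ground-truth answer for the toy question
        self.reward, self.done = 0.0, False

    def reset(self, **kwargs):
        # In the real adapter, question_text from the dataset column
        # routes to the correct database.
        self.reward, self.done = 0.0, False
        return "Tables: users. Question: how many users are there?"

    def describe(self, table_name: str) -> str:
        """Return the column layout of a table."""
        cols = self.db.execute(f"PRAGMA table_info({table_name})").fetchall()
        return ", ".join(f"{c[1]} {c[2]}" for c in cols)

    def sample(self, table_name: str) -> str:
        """Return up to 5 sample rows from a table."""
        return repr(self.db.execute(
            f"SELECT * FROM {table_name} LIMIT 5").fetchall())

    def query(self, sql: str) -> str:
        """Run SQL and return the rows (a real env would restrict to SELECT)."""
        return repr(self.db.execute(sql).fetchall())

    def answer(self, value) -> str:
        """Submit the final answer; sets reward and ends the episode."""
        self.reward = 1.0 if int(value) == self.expected else 0.0
        self.done = True
        return f"reward={self.reward}"
```

The point is the pattern, not the SQL: each tool is a named method with a docstring TRL can surface to the model, and the reward lives on the instance.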

notebooks/train_grpo.ipynb cell 16 passes it directly:

trainer = build_trainer(
    ...
    reward_funcs=[sql_env_reward_func],
    environment_factory=SQLEnvTRL,
    ...
)
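For intuition, the multi-turn loop this wires up looks roughly like the sketch below (invented names, not TRL's actual internals):

```python
# Illustrative only: the generate -> tool call -> env -> feed-back loop
# described above. "generate" and the call dict shape are made up.
def run_episode(generate, env, max_turns=8):
    """generate(transcript) returns {"tool": name, "args": [...]};
    env exposes reset(), named tool methods, and a .reward attribute."""
    transcript = [env.reset()]
    for _ in range(max_turns):
        call = generate(transcript)                  # model picks a tool
        result = getattr(env, call["tool"])(*call["args"])
        transcript.append(result)                    # feed result back
        if call["tool"] == "answer":                 # terminal action
            break
    return transcript, env.reward
```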

The Setup cell pins trl>=0.29.0 and transformers from main specifically because environment_factory requires transformers ≥5.2. Our v1/v2 runs used this path.

One nuance (intentional design): SQLEnvTRL.__init__ instantiates a local in-process SQLEnvironment, not a WebSocket client to https://hjerpe-sql-env.hf.space. Reasons:

  • Training opens N parallel sessions (one per generation). The hosted Space defaults to 1 concurrent session — see SUPPORTS_CONCURRENT_SESSIONS in the TRL↔OpenEnv docs.
  • Local is faster, no network hops, no rate limits.
  • The hosted Space is for judges (clicking /web) and external users consuming the env via pip/Docker. Training correctly bypasses it.

Implication for the blog: you can claim "TRL-native integration via environment_factory" factually. It's already true and the notebook proves it.

What's still on the post-launch list is the Space-side concurrency config (item 8), not the adapter. Without SUPPORTS_CONCURRENT_SESSIONS=True on the server, an external user trying to retrain against the hosted Space would hit the 1-session cap. This does not affect our own training (we use local).


1. Browser smoke test (2 min) — CRITICAL

How:

open https://huggingface.co/spaces/hjerpe/sql_env

(or paste the URL into a browser manually)

What to check:

  • Space status is Running (green), not Building / Sleeping / Error
  • README renders with a clear one-liner of what the env does
  • No red error banner at the top

What you're validating: that HF Spaces successfully built our image and the container is alive. If it's sleeping, the first visit wakes it (~30s cold start). Warm it up now and leave the tab open so blog readers don't hit a cold Space.

If it's broken: open the "Logs" tab on the Space page → look for the Docker build error → fix locally → re-push with uv run openenv push.


2. Health + API docs (1 min) — CRITICAL

How:

curl -sS https://hjerpe-sql-env.hf.space/health
curl -sS https://hjerpe-sql-env.hf.space/docs | head -20   # should be HTML
open https://hjerpe-sql-env.hf.space/docs                  # visual check

What to check:

  • /health returns HTTP 200 and a JSON body (e.g. {"status":"ok"})
  • /docs Swagger page lists /reset, /step, /ws endpoints
  • /step request schema mentions our SQLAction fields (action_type, argument) with values DESCRIBE, SAMPLE, QUERY, ANSWER

What you're validating: the FastAPI server inside the container is up and the OpenAPI schema published to the Space matches our local SQLAction model. If schemas drift, clients break.

If it's broken: usually means the Dockerfile picked up a stale version of sql_env/models.py. Rebuild and push: uv run openenv build -t … then uv run openenv push.
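The /step schema check can also be scripted against /openapi.json (FastAPI's default machine-readable schema path; the expected field names are the ones this checklist lists). A sketch:

```python
# Sketch: detect drift between the checklist's expected SQLAction
# fields and the Space's published OpenAPI spec.
EXPECTED_FIELDS = {"action_type", "argument"}

def missing_fields(openapi: dict, expected=frozenset(EXPECTED_FIELDS)) -> set:
    """Return expected field names absent from every schema component."""
    found = set()
    for schema in openapi.get("components", {}).get("schemas", {}).values():
        found |= set(schema.get("properties", {}))
    return set(expected) - found

# Live usage (network):
#   import json, urllib.request
#   spec = json.load(urllib.request.urlopen(
#       "https://hjerpe-sql-env.hf.space/openapi.json", timeout=10))
#   print(missing_fields(spec))   # empty set means no drift
```

An empty result means the published schema still declares the fields our client sends.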


3. One full episode via the built-in web UI (5 min) — CRITICAL

How:

open https://hjerpe-sql-env.hf.space/web

OpenEnv ships a /web interactive UI on every env. Walk through one full episode:

  1. Click Reset — a schema hint + question prompt should appear
  2. Enter action DESCRIBE with argument = a table name from the reset output
  3. Enter action SAMPLE with a table name (confirm 5 sample rows come back)
  4. Enter action QUERY with a valid SELECT ... (confirm rows return)
  5. Enter action ANSWER with your final answer (confirm reward + done=true)

What to check:

  • Each step returns a new observation without error
  • Terminal ANSWER produces a reward (even 0.0 is fine — we're testing plumbing, not correctness)
  • Screenshot the final screen — free blog content

What you're validating: the end-to-end action space a judge will exercise. This is the happy path judges will take.

If any step errors, do not publish the blog until it's fixed.


4. Fix stale deliverables doc (3 min) — HIGH

docs/competition-deliverables.md line 30 says:

Status: Not started (no Dockerfile yet)

This is wrong. F007 demo (specs/F007-DEMO.md) shows a successful authenticated push to https://huggingface.co/spaces/hjerpe/sql_env on 2026-03-29. Update to:

Status: Live at https://huggingface.co/spaces/hjerpe/sql_env — manual episode flow verified 2026-04-12.

Also update the open items list at the bottom — "Deploy HuggingFace Space" should be checked off.


5. Python client smoke test (10 min) — HIGH

How:

First find the actual client class name and action constructor args — our client module may not match the generic OpenEnv template:

rg -n "class SQLEnv\b|base_url" -g '!**/tests/**' -g '!**/docs/**' .
rg -n "class SQLAction|action_type|argument" sql_env/models.py

Then create a throwaway script scratch_hf_smoke.py in the repo root:

from sql_env.client import SQLEnv          # adjust after grep above
from sql_env.models import SQLAction

URL = "https://hjerpe-sql-env.hf.space"

with SQLEnv(base_url=URL).sync() as env:
    r = env.reset()
    print("RESET:", r.observation)

    # Pick any table name from the schema hint in r.observation
    r = env.step(SQLAction(action_type="DESCRIBE", argument="<table>"))
    print("DESCRIBE:", r.observation)

    r = env.step(SQLAction(action_type="QUERY", argument="SELECT 1"))
    print("QUERY:", r.observation)

    r = env.step(SQLAction(action_type="ANSWER", argument="<answer>"))
    print("ANSWER reward=", r.reward, "done=", r.done)

Run:

uv run python scratch_hf_smoke.py

What to check:

  • No connection / WebSocket handshake errors
  • r.observation is a populated dict/string at each step
  • Final step: r.reward is a float and r.done == True
  • Delete scratch_hf_smoke.py after — do not commit it

What you're validating: that a blog reader copy-pasting our snippet against the live Space actually gets a working client. If this fails while the /web UI (step 3) works, the problem is likely client-side model drift — check that our shipped sql_env/models.py matches what the server inside the Space expects.


6. Docker image pull (10 min) — IF TIME

This is the pattern every OpenEnv env on the hub ships. It's how external users run our env locally for training (no rate limits, full concurrency).

How — option A: pull the pre-built image from HF registry

docker pull --platform linux/amd64 registry.hf.space/hjerpe-sql_env:latest
docker run -d --name sqlenv-smoke -p 8001:8000 --platform linux/amd64 \
  registry.hf.space/hjerpe-sql_env:latest

# Wait ~5s for uvicorn to boot
sleep 5
curl -sS http://0.0.0.0:8001/health
open http://0.0.0.0:8001/docs

# Clean up
docker stop sqlenv-smoke && docker rm sqlenv-smoke
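The fixed `sleep 5` above is a guess at boot time; a small polling helper is more robust (a sketch; same port and URL as the commands above):

```shell
# Poll a health URL until it answers or a timeout elapses, instead of
# guessing the boot time with a fixed sleep.
wait_for_health() {
  url="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage after docker run:
#   wait_for_health http://0.0.0.0:8001/health
```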

How — option B: rebuild locally from our repo (same image the Space runs)

uv run openenv validate --verbose          # dry-run config check
uv run openenv build -t openenv-sql-env:local
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://0.0.0.0:8001/health
docker stop sqlenv-local && docker rm sqlenv-local

What to check:

  • Image pulls without auth (Space is public)
  • Container starts, /health returns 200
  • /docs renders Swagger on localhost
  • No --platform warnings on Apple Silicon (the Space is linux/amd64, which runs under Rosetta on M-series Macs — slow but functional)

What you're validating: the reproducibility story. A broken image here means the blog's "clone and train" path is dead. Judges rarely click this, but any serious user will.


7. pip install from Space (5 min) — IF TIME

How:

First check the package name declared inside the pushed Space:

curl -sS https://huggingface.co/spaces/hjerpe/sql_env/raw/main/pyproject.toml \
  | grep -E '^name'

Then install it into a throwaway venv:

uv venv /tmp/sqlenv-pip-test
source /tmp/sqlenv-pip-test/bin/activate

# Replace "openenv-sql-env" with whatever name the pyproject.toml above shows
pip install "openenv-sql-env @ git+https://huggingface.co/spaces/hjerpe/sql_env"

python -c "from sql_env.client import SQLEnv; print('OK:', SQLEnv)"

deactivate && rm -rf /tmp/sqlenv-pip-test

What to check:

  • pip install resolves without dependency errors
  • The client class imports from the installed wheel

What you're validating: the pyproject.toml we pushed into the Space actually declares the package correctly. This is the install method TRL documents: pip install "<pkg> @ git+https://huggingface.co/spaces/<space>".


8. Concurrency audit — POST-LAUNCH

How:

rg -n "SUPPORTS_CONCURRENT_SESSIONS|max_concurrent_envs|create_app\(" sql_env/server/

Expected result today: no matches (the flag is not set), which means the Space defaults to 1 concurrent WebSocket session. Per the OpenEnv↔TRL docs, any training run with num_generations > 1 against the hosted Space will hit capacity errors.

Fix (post-launch): in sql_env/server/app.py (or wherever create_app(...) is called):

SUPPORTS_CONCURRENT_SESSIONS = True

app = create_app(
    create_sql_environment,
    SQLAction,
    SQLObservation,
    max_concurrent_envs=64,   # ≥ TRL's generation_batch_size
)

Then uv run openenv build -t ... and uv run openenv push again.

Why it's not a launch blocker: the blog does not ask readers to train against the hosted Space. Our own training uses the in-process SQLEnvironment via SQLEnvTRL (not the WebSocket client), so we never hit this limit. Only matters if an external user wants to run GRPOTrainer against https://hjerpe-sql-env.hf.space directly. File as a GitHub issue after the blog ships.


9. TRL environment_factory wrapper — DONE

Already implemented in training/trl_adapter.py::SQLEnvTRL and wired into notebooks/train_grpo.ipynb cell 16. See the TRL section at the top of this document for details. No action.



Appendix A: Republish the Space from scratch (reference)

Only run these if step 1–3 show the Space is broken and a rebuild+push is needed. Otherwise skip β€” the current Space is already live.

Prereqs (one-time):

uv sync                                            # project deps
hf auth login                                      # HuggingFace CLI auth
# (token with write access to hjerpe/sql_env)

Validate + build + push:

# 1. Dry-run config check — confirms the openenv manifest, Dockerfile
#    and server entrypoint agree
uv run openenv validate --verbose

# 2. Build the Docker image locally (same image HF Spaces will run)
uv run openenv build -t openenv-sql-env:local

# 3. Optional: smoke-test the local image before pushing
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://0.0.0.0:8001/health
docker stop sqlenv-local && docker rm sqlenv-local

# 4. Push to the Space — creates hjerpe/sql_env if it doesn't exist,
#    uploads files, and triggers the Space's own Docker build
uv run openenv push
# expected tail:
#   ✓ Authenticated as: hjerpe
#   ✓ Space hjerpe/sql_env is ready
#   ✓ Upload completed successfully
#   Space URL: https://huggingface.co/spaces/hjerpe/sql_env

After push: the Space rebuilds its own Docker image on HF's infra (takes 2–5 min). Watch the build logs in the browser at https://huggingface.co/spaces/hjerpe/sql_env → "Logs" tab. When it turns green, re-run steps 1–5 at the top of this doc to verify.

Files that must exist for openenv push to work (already in the repo):

  • openenv.yaml — manifest with name, version, description
  • sql_env/server/Dockerfile — FastAPI + uvicorn container
  • sql_env/server/app.py — create_app(...) entrypoint
  • sql_env/models.py — SQLAction / SQLObservation Pydantic models
  • pyproject.toml — pip-installable package metadata
  • README.md — Space landing page (HF renders it on the Space page)

If any of these drifts out of sync, openenv validate --verbose will flag it before you push.


Appendix B: Research finding — dangling legacy reward module

Finding: training/rewards.py (151 lines) is legacy dead code from the pre-F010 rollout-based architecture. It is not used by the production training path and can be deleted post-launch.

Evidence:

  • Module docstring (lines 1–5): "Reward callables for TRL GRPO training. These helpers consume rollout metadata..." — this is the OLD pattern where reward functions parsed kwargs['metadata'] from TRL rollouts instead of reading env.reward from environment instances.
  • Internal helper _extract_metadata_rows() (line 41): "TRL can pass rollout metadata in different shapes depending on wrapper code." — explicit confirmation this is replay-based reward parsing.
  • Functions exposed: reward_correctness, reward_progress, reward_operational.
  • Zero production imports. rg 'from.*training\.rewards|training\.rewards\.reward_' returns exactly one hit: tests/unit/test_rewards.py. No script, notebook, or other module in training/ imports it.
  • The real training path uses sql_env_reward_func in training/trl_adapter.py, which reads env.reward directly from SQLEnvTRL instances. This is the environment_factory pattern mandated by F010 and documented as the correct choice (see specs/F010-IMPLEMENTATION_SPEC.md:173 and the user's own memory note: "Use environment_factory or rollout_func, not replay-based reward parsing").
  • Notebook train_grpo.ipynb cell 16: reward_funcs=[sql_env_reward_func] — pulls from trl_adapter, not rewards.py.

The only rollout matches in training/ are harmless:

  • training/prompts.py:1 — docstring mentions "GRPO training rollouts"
  • training/rewards.py — the legacy module itself
  • notebooks/train_grpo.ipynb cell 16 — a local variable before_rollouts = sample_random_baseline(...) that has nothing to do with TRL's rollout_func

Recommendation (post-launch, low priority):

  1. Delete training/rewards.py
  2. Delete tests/unit/test_rewards.py
  3. Confirm uv run pytest tests/ -v still passes
  4. Commit with message: refactor: remove legacy rollout-metadata reward module superseded by F010 environment_factory

Why not today: zero risk on launch (nothing imports it in production), and deleting files during blog-publish day is the wrong kind of churn. File as a post-launch cleanup.


Post-launch cleanup

  • Delete this file
  • File issue for item 8 (Space concurrency)
  • Delete training/rewards.py + tests/unit/test_rewards.py (see Appendix B)
  • Update docs/competition-deliverables.md open-items list