F007 Pre-Launch Checklist (temp, 2026-04-12)
Scope: verify the HF Space deployment is real and usable before the blog post goes live today. Delete this file after launch.
TL;DR: what to do in the next ~60 min
| # | Action | Time | Value | Do it? |
|---|---|---|---|---|
| 1 | Open Space in browser, confirm it loads | 2 min | Critical - judges will click the link first | YES |
| 2 | Hit /health and /docs | 1 min | Critical - proves server is up | YES |
| 3 | Run one full episode via /web UI | 5 min | Critical - proves action space works end-to-end | YES |
| 4 | Fix stale docs/competition-deliverables.md status | 3 min | High - doc claims "Not started", Space is live | YES |
| 5 | Python client smoke test against live Space | 10 min | High - proves programmatic access (the thing the blog promises) | YES |
| 6 | Pull registry.hf.space/... Docker image and run locally | 10 min | Medium - nice to have, judges rarely do this | If time |
| 7 | pip install from Space URL | 5 min | Medium - validates pyproject.toml inside Space | If time |
| 8 | Concurrency audit (SUPPORTS_CONCURRENT_SESSIONS) | 15 min | Low for launch, High for anyone retraining | Skip today, file issue |
| 9 | TRL environment_factory wrapper | - | - | Already done (see below) |
Recommendation: Do 1–5 before publishing. Skip 6–8. Item 9 is already in the repo.
About TRL (already integrated - do not re-research)
TRL = Hugging Face's transformers-based RL library. Its GRPOTrainer
accepts an environment_factory=MyEnvClass argument and runs the multi-turn
tool-calling loop automatically: generate → parse tool call → call your env →
feed result back → repeat. No custom rollout_func needed.
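As a mental model only, that loop can be sketched with a stub env and a hard-coded "policy" (StubEnv, rollout, and the tool names here are illustrative - real TRL handles generation, parsing, and batching itself):

```python
# Toy sketch of the multi-turn tool-calling loop GRPOTrainer runs internally
# when given environment_factory. Illustrative only, not TRL's actual code.
class StubEnv:
    def reset(self):
        return "schema hint + question"

    def query(self, sql):
        return [("answer_row",)]

    def answer(self, value):
        # terminal tool: score the final answer
        self.reward = 1.0 if value == "answer_row" else 0.0
        return "done"

def rollout(env, policy):
    obs = env.reset()
    done = False
    while not done:
        tool, arg = policy(obs)        # generate -> parse tool call
        obs = getattr(env, tool)(arg)  # call the env, feed result back
        done = tool == "answer"        # terminal tool ends the episode
    return env.reward

# A hard-coded action sequence standing in for the model:
steps = iter([("query", "SELECT 1"), ("answer", "answer_row")])
reward = rollout(StubEnv(), lambda obs: next(steps))
```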
We already implement this. training/trl_adapter.py::SQLEnvTRL is a
TRL-native environment class with:
- `reset(**kwargs)` - reads `question_text` from the dataset column to route to the correct database
- Named tool methods with docstrings: `describe(table_name)`, `sample(table_name)`, `query(sql)`, `answer(value)` - not a generic `step()`
- `sql_env_reward_func` as the reward function
notebooks/train_grpo.ipynb cell 16 passes it directly:
```python
trainer = build_trainer(
    ...
    reward_funcs=[sql_env_reward_func],
    environment_factory=SQLEnvTRL,
    ...
)
```
The Setup cell pins trl>=0.29.0 and transformers from main specifically
because environment_factory requires transformers >= 5.2. Our v1/v2 runs used
this path.
One nuance (intentional design): SQLEnvTRL.__init__ instantiates a
local in-process SQLEnvironment, not a WebSocket client to
https://hjerpe-sql-env.hf.space. Reasons:
- Training opens N parallel sessions (one per generation). The hosted Space defaults to 1 concurrent session - see `SUPPORTS_CONCURRENT_SESSIONS` in the TRL→OpenEnv docs.
- Local is faster: no network hops, no rate limits.
- The hosted Space is for judges (clicking `/web`) and external users consuming the env via pip/Docker. Training correctly bypasses it.
Implication for the blog: you can claim "TRL-native integration via
environment_factory" factually. It's already true and the notebook proves it.
What's still on the post-launch list is the Space-side concurrency config
(item 8), not the adapter. Without SUPPORTS_CONCURRENT_SESSIONS=True on the
server, an external user trying to retrain against the hosted Space would hit
the 1-session cap. This does not affect our own training (we use local).
1. Browser smoke test (2 min) - CRITICAL
How:
open https://huggingface.co/spaces/hjerpe/sql_env
(or paste the URL into a browser manually)
What to check:
- Space status is Running (green), not Building / Sleeping / Error
- README renders with a clear one-liner of what the env does
- No red error banner at the top
What you're validating: that HF Spaces successfully built our image and the container is alive. If it's sleeping, the first visit wakes it (~30s cold start). Warm it up now and leave the tab open so blog readers don't hit a cold Space.
If it's broken: open the "Logs" tab on the Space page → look for the
Docker build error → fix locally → re-push with uv run openenv push.
2. Health + API docs (1 min) - CRITICAL
How:
```shell
curl -sS https://hjerpe-sql-env.hf.space/health
curl -sS https://hjerpe-sql-env.hf.space/docs | head -20  # should be HTML
open https://hjerpe-sql-env.hf.space/docs                 # visual check
```
What to check:
- `/health` returns HTTP 200 and a JSON body (e.g. `{"status":"ok"}`)
- `/docs` Swagger page lists `/reset`, `/step`, `/ws` endpoints
- The `/step` request schema mentions our `SQLAction` fields (`action_type`, `argument`) with values `DESCRIBE`, `SAMPLE`, `QUERY`, `ANSWER`
What you're validating: the FastAPI server inside the container is up
and the OpenAPI schema published to the Space matches our local SQLAction
model. If schemas drift, clients break.
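Drift can also be caught programmatically. A minimal sketch: diff the expected `SQLAction` fields against the schema FastAPI publishes at `/openapi.json`. Assumptions: the component is named `SQLAction`, and the `sample` dict below is an illustrative stand-in - in practice, fetch the real payload with `curl -sS https://hjerpe-sql-env.hf.space/openapi.json`.

```python
# Sketch: diff expected SQLAction fields against an OpenAPI schema dict.
EXPECTED_FIELDS = {"action_type", "argument"}
EXPECTED_ACTIONS = {"DESCRIBE", "SAMPLE", "QUERY", "ANSWER"}

def schema_drift(openapi: dict) -> list[str]:
    """Return human-readable drift findings (empty list = no drift)."""
    schema = openapi["components"]["schemas"]["SQLAction"]
    props = schema.get("properties", {})
    problems = []
    if set(props) != EXPECTED_FIELDS:
        problems.append(f"fields differ: {sorted(props)}")
    enum = set(props.get("action_type", {}).get("enum", []))
    if enum and enum != EXPECTED_ACTIONS:
        problems.append(f"action_type enum differs: {sorted(enum)}")
    return problems

# Illustrative stand-in for the live /openapi.json payload:
sample = {"components": {"schemas": {"SQLAction": {"properties": {
    "action_type": {"enum": ["DESCRIBE", "SAMPLE", "QUERY", "ANSWER"]},
    "argument": {"type": "string"},
}}}}}
findings = schema_drift(sample)  # empty list when the schema matches
```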
If it's broken: usually means the Dockerfile picked up a stale version
of sql_env/models.py. Rebuild and push: uv run openenv build -t … then
uv run openenv push.
3. One full episode via the built-in web UI (5 min) - CRITICAL
How:
open https://hjerpe-sql-env.hf.space/web
OpenEnv ships a /web interactive UI on every env. Walk through one full
episode:
- Click Reset → a schema hint + question prompt should appear
- Enter action `DESCRIBE` with argument = a table name from the reset output
- Enter action `SAMPLE` with a table name (confirm 5 sample rows come back)
- Enter action `QUERY` with a valid `SELECT ...` (confirm rows return)
- Enter action `ANSWER` with your final answer (confirm reward + done=true)
What to check:
- Each step returns a new observation without error
- Terminal `ANSWER` produces a reward (even 0.0 is fine - we're testing plumbing, not correctness)
- Screenshot the final screen - free blog content
What you're validating: the end-to-end action space a judge will exercise. This is our judges' happiest path.
If any step errors, do not publish the blog until it's fixed.
4. Fix stale deliverables doc (3 min) - HIGH
docs/competition-deliverables.md line 30 says:
Status: Not started (no Dockerfile yet)
This is wrong. F007 demo (specs/F007-DEMO.md) shows a successful authenticated
push to https://huggingface.co/spaces/hjerpe/sql_env on 2026-03-29. Update to:
Status: Live at https://huggingface.co/spaces/hjerpe/sql_env - manual episode flow verified 2026-04-12.
Also update the open items list at the bottom: "Deploy HuggingFace Space" should be checked off.
5. Python client smoke test (10 min) - HIGH
How:
First find the actual client class name and action constructor args β our client module may not match the generic OpenEnv template:
```shell
rg -n "class SQLEnv\b|base_url" -g '!**/tests/**' -g '!**/docs/**' .
rg -n "class SQLAction|action_type|argument" sql_env/models.py
```
Then create a throwaway script scratch_hf_smoke.py in the repo root:
```python
from sql_env.client import SQLEnv  # adjust after grep above
from sql_env.models import SQLAction

URL = "https://hjerpe-sql-env.hf.space"

with SQLEnv(base_url=URL).sync() as env:
    r = env.reset()
    print("RESET:", r.observation)
    # Pick any table name from the schema hint in r.observation
    r = env.step(SQLAction(action_type="DESCRIBE", argument="<table>"))
    print("DESCRIBE:", r.observation)
    r = env.step(SQLAction(action_type="QUERY", argument="SELECT 1"))
    print("QUERY:", r.observation)
    r = env.step(SQLAction(action_type="ANSWER", argument="<answer>"))
    print("ANSWER reward=", r.reward, "done=", r.done)
```
Run:
uv run python scratch_hf_smoke.py
What to check:
- No connection / WebSocket handshake errors
-
r.observationis a populated dict/string at each step - Final step:
r.rewardis a float andr.done == True - Delete
scratch_hf_smoke.pyafter β do not commit it
What you're validating: that a blog reader copy-pasting our snippet
against the live Space actually gets a working client. If this fails and
the /web UI (step 3) works, the problem is likely client-side model
drift - check that our shipped sql_env/models.py matches what the server
inside the Space expects.
6. Docker image pull (10 min) - IF TIME
This is the pattern every OpenEnv env on the hub ships. It's how external users run our env locally for training (no rate limits, full concurrency).
How - option A: pull the pre-built image from HF registry
```shell
docker pull --platform linux/amd64 registry.hf.space/hjerpe-sql_env:latest
docker run -d --name sqlenv-smoke -p 8001:8000 --platform linux/amd64 \
  registry.hf.space/hjerpe-sql_env:latest
# Wait ~5s for uvicorn to boot
sleep 5
curl -sS http://127.0.0.1:8001/health
open http://127.0.0.1:8001/docs
# Clean up
docker stop sqlenv-smoke && docker rm sqlenv-smoke
```
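The fixed sleep 5 above is fragile on slow machines; a poll loop is sturdier. A minimal sketch (`wait_until` is our own helper, not part of OpenEnv):

```python
import time
from typing import Callable

def wait_until(check: Callable[[], bool], tries: int = 30, delay: float = 1.0) -> bool:
    """Poll `check` until it returns True; give up after `tries` attempts."""
    for _ in range(tries):
        if check():
            return True
        time.sleep(delay)
    return False
```

Usage would be something like `wait_until(lambda: probe("http://127.0.0.1:8001/health"))`, where `probe` wraps `urllib.request.urlopen` in a try/except that returns False - the connection is refused until uvicorn binds, and `urlopen` raises rather than returning.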
How - option B: rebuild locally from our repo (same image the Space runs)
```shell
uv run openenv validate --verbose   # dry-run config check
uv run openenv build -t openenv-sql-env:local
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://127.0.0.1:8001/health
docker stop sqlenv-local && docker rm sqlenv-local
```
What to check:
- Image pulls without auth (Space is public)
- Container starts, `/health` returns 200
- `/docs` renders Swagger on localhost
- No `--platform` warnings on Apple Silicon (the Space is `linux/amd64`, which runs under Rosetta on M-series Macs - slow but functional)
What you're validating: the reproducibility story. A broken image here means the blog's "clone and train" path is dead. Judges rarely click this, but any serious user will.
7. pip install from Space (5 min) - IF TIME
How:
First check the package name declared inside the pushed Space:
```shell
curl -sS https://huggingface.co/spaces/hjerpe/sql_env/raw/main/pyproject.toml \
  | grep -E '^name'
```
Then install it into a throwaway venv:
```shell
uv venv /tmp/sqlenv-pip-test
source /tmp/sqlenv-pip-test/bin/activate
# Replace "openenv-sql-env" with whatever name the pyproject.toml above shows
pip install "openenv-sql-env @ git+https://huggingface.co/spaces/hjerpe/sql_env"
python -c "from sql_env.client import SQLEnv; print('OK:', SQLEnv)"
deactivate && rm -rf /tmp/sqlenv-pip-test
```
What to check:
- `pip install` resolves without dependency errors
- The client class imports from the installed wheel
What you're validating: the pyproject.toml we pushed into the Space
actually declares the package correctly. This is the install method TRL
documents: pip install "<pkg> @ git+https://huggingface.co/spaces/<space>".
8. Concurrency audit - POST-LAUNCH
How:
rg -n "SUPPORTS_CONCURRENT_SESSIONS|max_concurrent_envs|create_app\(" sql_env/server/
Expected result today: no matches (the flag is not set), which means the
Space defaults to 1 concurrent WebSocket session. Per the OpenEnv→TRL
docs, any training run with num_generations > 1 against the hosted Space
will hit capacity errors.
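As a toy illustration of that failure mode (our own sketch, not OpenEnv's actual server code): a gate capped at one session rejects the second concurrent open, which is roughly what a num_generations > 1 run would see against the hosted Space.

```python
# Toy model of a per-server session cap. Illustrative only.
class SessionCapError(RuntimeError):
    pass

class SessionGate:
    def __init__(self, max_sessions: int = 1):
        self.max_sessions = max_sessions
        self.active = 0

    def open(self):
        if self.active >= self.max_sessions:
            raise SessionCapError(f"capacity {self.max_sessions} reached")
        self.active += 1

    def close(self):
        self.active -= 1

gate = SessionGate(max_sessions=1)  # today's hosted-Space default
gate.open()                         # generation 1: fine
try:
    gate.open()                     # generation 2: rejected
    second_ok = True
except SessionCapError:
    second_ok = False
```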
Fix (post-launch): in sql_env/server/app.py (or wherever
create_app(...) is called):
```python
SUPPORTS_CONCURRENT_SESSIONS = True

app = create_app(
    create_sql_environment,
    SQLAction,
    SQLObservation,
    max_concurrent_envs=64,  # >= TRL's generation_batch_size
)
```
Then uv run openenv build -t ... and uv run openenv push again.
Why it's not a launch blocker: the blog does not ask readers to train
against the hosted Space. Our own training uses the in-process
SQLEnvironment via SQLEnvTRL (not the WebSocket client), so we never hit
this limit. Only matters if an external user wants to run GRPOTrainer
against https://hjerpe-sql-env.hf.space directly. File as a GitHub issue
after the blog ships.
9. TRL environment_factory wrapper - DONE
Already implemented in training/trl_adapter.py::SQLEnvTRL and wired into
notebooks/train_grpo.ipynb cell 16. See the TRL section at the top of this
document for details. No action.
Appendix A: Republish the Space from scratch (reference)
Only run these if steps 1–3 show the Space is broken and a rebuild+push is needed. Otherwise skip - the current Space is already live.
Prereqs (one-time):
```shell
uv sync        # project deps
hf auth login  # HuggingFace CLI auth
               # (token with write access to hjerpe/sql_env)
```
Validate + build + push:
```shell
# 1. Dry-run config check - confirms the openenv manifest, Dockerfile
#    and server entrypoint agree
uv run openenv validate --verbose

# 2. Build the Docker image locally (same image HF Spaces will run)
uv run openenv build -t openenv-sql-env:local

# 3. Optional: smoke-test the local image before pushing
docker run -d --name sqlenv-local -p 8001:8000 openenv-sql-env:local
curl -sS http://127.0.0.1:8001/health
docker stop sqlenv-local && docker rm sqlenv-local

# 4. Push to the Space - creates hjerpe/sql_env if it doesn't exist,
#    uploads files, and triggers the Space's own Docker build
uv run openenv push
# expected tail:
#   ✓ Authenticated as: hjerpe
#   ✓ Space hjerpe/sql_env is ready
#   ✓ Upload completed successfully
#   Space URL: https://huggingface.co/spaces/hjerpe/sql_env
```
After push: the Space rebuilds its own Docker image on HF's infra (takes
2–5 min). Watch the build logs in the browser at
https://huggingface.co/spaces/hjerpe/sql_env → "Logs" tab. When it turns
green, re-run steps 1–5 at the top of this doc to verify.
Files that must exist for openenv push to work (already in the repo):
- `openenv.yaml` - manifest with name, version, description
- `sql_env/server/Dockerfile` - FastAPI + uvicorn container
- `sql_env/server/app.py` - `create_app(...)` entrypoint
- `sql_env/models.py` - `SQLAction`/`SQLObservation` Pydantic models
- `pyproject.toml` - pip-installable package metadata
- `README.md` - Space landing page (HF renders it on the Space page)
If any of these drifts out of sync, openenv validate --verbose will flag
it before you push.
Appendix B: Research finding - dangling legacy reward module
Finding: training/rewards.py (151 lines) is legacy dead code from the
pre-F010 rollout-based architecture. It is not used by the production
training path and can be deleted post-launch.
Evidence:
- Module docstring (lines 1-5): "Reward callables for TRL GRPO training. These helpers consume rollout metadata..." - this is the OLD pattern where reward functions parsed `kwargs['metadata']` from TRL rollouts instead of reading `env.reward` from environment instances.
- Internal helper `_extract_metadata_rows()` (line 41): "TRL can pass rollout metadata in different shapes depending on wrapper code." - explicit confirmation this is replay-based reward parsing.
- Functions exposed: `reward_correctness`, `reward_progress`, `reward_operational`.
- Zero production imports. `rg 'from.*training\.rewards|training\.rewards\.reward_'` returns exactly one hit: `tests/unit/test_rewards.py`. No script, notebook, or other module in `training/` imports it.
- The real training path uses `sql_env_reward_func` in `training/trl_adapter.py`, which reads `env.reward` directly from `SQLEnvTRL` instances. This is the `environment_factory` pattern mandated by F010 and documented as the correct choice (see `specs/F010-IMPLEMENTATION_SPEC.md:173` and the user's own memory note: "Use environment_factory or rollout_func, not replay-based reward parsing").
- Notebook `train_grpo.ipynb` cell 16: `reward_funcs=[sql_env_reward_func]` - pulls from `trl_adapter`, not `rewards.py`.
The only rollout matches in training/ are harmless:
- `training/prompts.py:1` - docstring mentions "GRPO training rollouts"
- `training/rewards.py` - the legacy module itself
- `notebooks/train_grpo.ipynb` cell 16 - a local variable `before_rollouts = sample_random_baseline(...)` that has nothing to do with TRL's `rollout_func`
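To cross-check the rg evidence without regex false positives, a short AST-based scan can be run over each `.py` file's source (the `imports_module` helper is our own sketch):

```python
import ast

def imports_module(source: str, module: str) -> bool:
    """True if `source` imports `module`, a submodule, or a member of it."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. `import training.rewards`
            if any(a.name == module or a.name.startswith(module + ".")
                   for a in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            mod = node.module or ""
            # e.g. `from training.rewards import reward_progress`
            if mod == module or mod.startswith(module + "."):
                return True
            # e.g. `from training import rewards`
            if any(f"{mod}.{a.name}" == module for a in node.names):
                return True
    return False
```

In practice you would loop over `Path(".").rglob("*.py")`, call `imports_module(p.read_text(), "training.rewards")`, and expect hits only in `tests/unit/test_rewards.py`.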
Recommendation (post-launch, low priority):
- Delete `training/rewards.py`
- Delete `tests/unit/test_rewards.py`
- Confirm `uv run pytest tests/ -v` still passes
- Commit with message: `refactor: remove legacy rollout-metadata reward module superseded by F010 environment_factory`
Why not today: zero risk on launch (nothing imports it in production), and deleting files during blog-publish day is the wrong kind of churn. File as a post-launch cleanup.
Post-launch cleanup
- Delete this file
- File issue for item 8 (Space concurrency)
- Delete `training/rewards.py` + `tests/unit/test_rewards.py` (see Appendix B)
- Update `docs/competition-deliverables.md` open-items list