hjerpe committed on
Commit a001a97 · verified · 1 Parent(s): a19eef8

Upload folder using huggingface_hub

DATA_LICENSE ADDED
@@ -0,0 +1,19 @@
+ Data License Notice
+
+ Data in data/ is adapted from the Spider dataset (Yu et al., 2018),
+ distributed under CC BY-SA 4.0.
+
+ We retrieved question/SQL pairs from the xlangai/spider HuggingFace mirror
+ and SQLite databases from the taoyds/spider GitHub mirror, then curated a
+ 10-database subset, derived gold answers by executing the gold SQL, and
+ generated SFT trajectories from those artifacts.
+
+ Derived data in data/ is shared under CC BY-SA 4.0.
+ Software code is licensed separately under MIT (see LICENSE).
+
+ References:
+ - Spider dataset: https://yale-lily.github.io/spider
+ - Yu et al. (2018). Spider: A Large-Scale Human-Labeled Dataset for Complex
+   and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP.
+ - xlangai/spider on HuggingFace: https://huggingface.co/datasets/xlangai/spider
+ - taoyds/spider on GitHub: https://github.com/taoyds/spider
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Adam Hjerpe
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -5,168 +5,146 @@ colorFrom: blue
  colorTo: green
  sdk: docker
  app_port: 8000
- pinned: false
  base_path: /web
  ---

- # SQLEnv: Teaching Agents to Explore Databases

  ![Python](https://img.shields.io/badge/python-3.12-blue.svg)
  ![License](https://img.shields.io/badge/license-MIT-green.svg)

- SQLEnv is an interactive RL environment for text-to-SQL reasoning. Instead of producing one-shot SQL, agents learn to think like data analysts: inspect schema, sample rows, run exploratory queries, and submit a final answer with confidence.

- Built for the [OpenEnv Challenge](https://github.com/meta-pytorch/OpenEnv), this project packages environment runtime, dense rewards, evaluation, and training hooks so others can reproduce results and iterate quickly.

- **[Read the blog post](https://hjerpe-sqlenv-blog.static.hf.space)** | **[Source code](https://github.com/hjerpe/sqlenv)**

  ## Quick Start

- Run these three commands to install, validate, and smoke-test the environment:
-
  ```bash
  uv sync
- uv run openenv validate --verbose
  uv run pytest tests/ -v
  ```

- Local server run:

  ```bash
  uv run uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
  ```

- Docker run:

  ```bash
- docker build -t sql-env:latest -f server/Dockerfile .
- docker run -p 8000:8000 sql-env:latest
- ```
-
- ## Why SQLEnv
-
- Static text-to-SQL benchmarks reward final outputs, not reasoning quality. SQLEnv turns SQL generation into an interactive decision process with feedback at each step, making it suitable for RL training and behavior analysis.
-
- ## Architecture
-
- ```text
- +-------------+    WebSocket     +----------------------+   SQLite
- | RL Agent    | <--------------> | SQLEnvClient         | <---------------+
- | (GRPO/TRL)  |                  | (client.py)          |                 |
- +-------------+                  +----------+-----------+                 |
-                                  HTTP/WebSocket                          |
-                                             |                            |
-                                             v                            |
-                                  +--------------------------+            |
-                                  | FastAPI Server           |            |
-                                  | (server.app:app)         |            |
-                                  +------------+-------------+            |
-                                               |                          |
-                                               v                          |
-                                  +--------------------------+            |
-                                  | SQLEnvironment           |------------+
-                                  | step/reset/reward/verify |
-                                  +--------------------------+
  ```

  ## How It Works

- Each episode begins with a natural language question mapped to a hidden Spider database. The agent acts through four environment actions:

- | Action | Purpose | Typical Output |
- |--------|---------|----------------|
- | `DESCRIBE table_name` | Inspect schema and column metadata | Column names, types, row count |
- | `SAMPLE table_name` | Inspect representative rows | Small row sample |
- | `QUERY sql_string` | Execute read-only SQL in sandbox | Query result rows or SQL error |
- | `ANSWER value` | Submit final answer | Terminal reward and completion |

- Episode flow:
- 1. `reset()` returns question context and available tables.
- 2. `step()` executes one exploration action at a time.
- 3. `ANSWER` ends the episode with correctness-based terminal reward.

- ## Train an Agent

- The environment exposes four tools (`describe`, `sample`, `query`, `answer`) that TRL's GRPOTrainer discovers automatically. The model learns to call these tools through GRPO — no custom rollout code needed.

- ### Local test (Docker, CPU)

- Verify the training pipeline end-to-end in about 3 minutes:

  ```bash
  docker build -f Dockerfile.test -t sqlenv-test .
  docker run --rm sqlenv-test
  ```

- This runs 2 training steps with `configs/test_cpu.json` and prints per-step loss, reward, tool call frequency, and model completions.
-
- ### Colab training (GPU)

- Open the notebook and select a GPU runtime (L4 recommended):

- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hjerpe/sql-env/blob/main/notebooks/train_grpo.ipynb)
-
- The notebook uses `configs/colab_l4.json` settings: batch size 4, 4 generations per prompt, bf16 precision. Live reward plots and execution traces update during training.
-
- ### What the model sees
-
- Each episode, TRL injects tool schemas into the prompt. The model generates structured tool calls:

  ```
- <tool_call>{"name": "describe", "arguments": {"table_name": "employee"}}</tool_call>
- ```

- TRL parses this, calls `env.describe(table_name="employee")`, and appends the result. The model can then call more tools or submit an answer. Rewards accumulate from each interaction.

- ### Configuration

- Training configs live in `configs/`:
- - `test_cpu.json` — 2 steps, 256 tokens, budget 3 (local validation)
- - `colab_l4.json` — full epoch, 512 tokens, budget 10, bf16 (L4 GPU)

- ## HuggingFace Space

- - Live Space: `https://huggingface.co/spaces/<your-org-or-user>/sql-env` (update after push)
- - Health check: `curl https://<space-url>/health`
- - Deploy command: `uv run openenv push`

  ## Project Structure

- ```text
- sql-env/
- |- __init__.py
- |- client.py
- |- models.py
- |- openenv.yaml
- |- server/
- |  |- app.py
- |  |- sql_environment.py
- |  |- reward.py
- |  |- verifier.py
- |  `- Dockerfile
- |- data/
- |  |- databases/
- |  `- questions/
- |- training/
- |- evaluation/
- |- notebooks/
- |  `- train_grpo.ipynb
- |- specs/
- |- docs/
- `- tests/
  ```

- ## Deployment Checklist

- 1. `uv run openenv validate --verbose`
- 2. `uv run openenv build`
- 3. `uv run openenv push`
- 4. Verify `/health` and run one full episode through the client.

- ## Links

- - OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- - OpenEnv docs: https://meta-pytorch.org/OpenEnv/
- - Spider dataset: https://huggingface.co/datasets/xlangai/spider
- - TRL OpenEnv docs: https://huggingface.co/docs/trl/openenv
- - Verification plan: `specs/F007-VERIFICATION_SPEC.md`
  colorTo: green
  sdk: docker
  app_port: 8000
+ pinned: true
  base_path: /web
  ---

+ # SQLEnv: Teaching Small Models to Explore Databases

  ![Python](https://img.shields.io/badge/python-3.12-blue.svg)
  ![License](https://img.shields.io/badge/license-MIT-green.svg)
+ ![Data](https://img.shields.io/badge/data-CC%20BY--SA%204.0-orange.svg)

+ SQLEnv is an RL environment for training small language models to answer questions about SQL databases through iterative exploration. Instead of producing one-shot SQL from a fully visible schema, the agent discovers the schema step by step using four tools: DESCRIBE, SAMPLE, QUERY, and ANSWER.

+ Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and trained with [TRL](https://huggingface.co/docs/trl)'s GRPO implementation. A 0.6B parameter model trained in this environment goes from 0% to ~30% accuracy on a curated Spider subset, learning to explore schemas, recover from SQL errors, and format answers correctly.

+ **[Blog post](https://hjerpe-sqlenv-blog.static.hf.space)** | **[Live environment](https://huggingface.co/spaces/hjerpe/sql_env)** | **[Training notebook](notebooks/train_grpo.ipynb)**

  ## Quick Start

  ```bash
  uv sync
  uv run pytest tests/ -v
  ```

+ Run the environment locally:

  ```bash
  uv run uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
  ```

+ Or with Docker:

  ```bash
+ docker build -t sqlenv:latest -f server/Dockerfile .
+ docker run -p 8000:8000 sqlenv:latest
  ```

  ## How It Works

+ Each episode starts with a natural-language question and a list of table names. The schema (columns, types, relationships) is hidden. The agent uses four actions to explore:

+ | Action | Purpose |
+ |--------|---------|
+ | `DESCRIBE table` | Reveal column names, types, and row count |
+ | `SAMPLE table` | Preview representative rows |
+ | `QUERY sql` | Execute read-only SQL |
+ | `ANSWER value` | Submit a final answer (ends episode) |

+ The environment provides dense reward at each step (operational feedback + progress toward the answer) and a terminal reward for correctness (+1.0 correct, 0.0 wrong). See the [blog post](https://hjerpe-sqlenv-blog.static.hf.space) for details on the reward architecture.
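
How the dense and terminal components add up over an episode can be sketched with the reward trace from the showcase notebook's oracle episode. The values below are illustrative numbers taken from that trace, not the `reward.py` implementation:

```python
# Per-step rewards from one oracle episode (showcase notebook):
# two DESCRIBE steps, one QUERY step, then a correct ANSWER.
step_rewards = [0.015, 0.015, 0.15, 1.0]

exploration = sum(step_rewards[:-1])  # dense per-step signal
terminal = step_rewards[-1]           # correctness reward (+1.0 or 0.0)
total = exploration + terminal

print(f"Total reward: {total:.3f}")         # 1.180
print(f"  Exploration: {exploration:.3f}")  # 0.180
print(f"  Terminal:    {terminal:.3f}")     # 1.000
```

The split mirrors the notebook's summary line, where exploration rewards stay small relative to the terminal correctness reward.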
 
 
 

+ ```python
+ from server.sql_environment import SQLEnvironment, SQLAction
+
+ env = SQLEnvironment(questions_path="data/questions/questions_train.json",
+                      db_dir="data/databases", tokenizer=tok)
+ obs = env.reset(seed=42)
+ obs = env.step(SQLAction(action_type="DESCRIBE", argument="employee"))
+ obs = env.step(SQLAction(action_type="QUERY", argument="SELECT COUNT(*) FROM employee"))
+ obs = env.step(SQLAction(action_type="ANSWER", argument="10"))
+ # obs.done=True, obs.reward=1.0
+ ```

+ ## Training

+ We train [Qwen3-0.6B](https://arxiv.org/abs/2505.09388) using [GRPO](https://arxiv.org/abs/2402.03300) (from DeepSeekMath) through TRL's `environment_factory`. The full pipeline (SFT warmup + two-phase GRPO) runs in ~5 hours on a single Colab L4.
+
+ **Notebooks:**
+ - **[train_grpo.ipynb](notebooks/train_grpo.ipynb)** runs the full SFT + GRPO pipeline
+ - **[compare_methods.ipynb](notebooks/compare_methods.ipynb)** evaluates base vs trained models
+ - **[showcase_sqlenv.ipynb](notebooks/showcase_sqlenv.ipynb)** lets you explore the environment interactively
+
+ **Local test (CPU, ~3 min):**

  ```bash
  docker build -f Dockerfile.test -t sqlenv-test .
  docker run --rm sqlenv-test
  ```

+ ## Evaluation

+ All evaluation runs through the Green Agent evaluator:

+ ```python
+ from sql_env.evaluation import evaluate, RandomPolicy, OraclePolicy
+
+ result = evaluate(env, policy, n_episodes=50, seed=0)
+ print(f"Accuracy: {result.success_rate:.1%}, Reward: {result.avg_reward:.3f}")
  ```
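
A policy here is anything with a `select_action(obs)` method, the interface the showcase notebook's evaluation loop calls. A minimal hand-rolled policy might look like the sketch below; `SQLAction` is a simplified stand-in for the project's model class, and the strategy itself is purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class SQLAction:
    action_type: str  # DESCRIBE | SAMPLE | QUERY | ANSWER
    argument: str

class DescribeFirstPolicy:
    """Toy policy: describe every known table once, then answer blindly."""

    def __init__(self, tables):
        self.pending = list(tables)

    def select_action(self, obs):
        # obs is ignored here; a real policy would read the latest result.
        if self.pending:
            return SQLAction("DESCRIBE", self.pending.pop(0))
        return SQLAction("ANSWER", "")  # ANSWER is terminal and ends the episode

policy = DescribeFirstPolicy(["students", "courses"])
a1 = policy.select_action(obs=None)
a2 = policy.select_action(obs=None)
a3 = policy.select_action(obs=None)
print(a1.action_type, a1.argument)  # DESCRIBE students
print(a3.action_type)               # ANSWER
```

Any object with this shape should be usable where `RandomPolicy` or `OraclePolicy` is passed to `evaluate`.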
 
 

+ Results on our curated 10-database Spider subset (N=50, 2 runs):
+
+ | Method | Accuracy | Parse Rate | Avg Steps |
+ |--------|----------|------------|-----------|
+ | Zero-shot | 0% | 24-28% | 10.8-12.4 |
+ | 1-shot | 0-2% | 16-17% | 14.0-14.8 |
+ | 3-shot | 0% | 19-20% | 13.8-14.8 |
+ | GRPO v1 (2 epochs) | 28-30% | 95-100% | 3.5-4.0 |
+ | GRPO v2 (4 epochs) | 24-32% | 87-95% | 3.5-4.0 |

+ This evaluation is not comparable to the official Spider leaderboard, which uses different scoring, full-schema input, and a broader database set. See the [blog post](https://hjerpe-sqlenv-blog.static.hf.space) for detailed analysis.

+ ## Data

+ 676 questions (473 train, 203 eval) across 10 Spider databases with difficulty labels, plus 120 multi-turn SFT warmup trajectories generated from gold SQL. See [docs/data-sources.md](docs/data-sources.md) for full details on provenance, curation, and regeneration.

+ Data in `data/` is adapted from [Spider](https://yale-lily.github.io/spider) (Yu et al., 2018) and shared under CC BY-SA 4.0. See [DATA_LICENSE](DATA_LICENSE).

  ## Project Structure

+ ```
+ sqlenv/
+ ├── __init__.py, client.py, models.py  # Core types and client
+ ├── server/
+ │   ├── app.py              # FastAPI server
+ │   ├── sql_environment.py  # Environment implementation
+ │   ├── reward.py           # Three-layer reward function
+ │   ├── verifier.py         # Answer verification
+ │   └── Dockerfile          # HF Spaces deployment
+ ├── evaluation/             # Green Agent evaluator, policies
+ ├── training/               # TRL adapter, data loading
+ ├── scripts/                # Data curation, SFT generation
+ ├── notebooks/              # Training, evaluation, showcase
+ ├── data/
+ │   ├── databases/          # 10 Spider SQLite databases
+ │   ├── questions/          # Train/eval question sets
+ │   └── sft/                # SFT warmup trajectories
+ ├── configs/                # Training configurations
+ ├── tests/                  # Unit and integration tests
+ └── docs/
+     ├── data-sources.md     # Data provenance
+     └── ARCHITECTURE.md     # System architecture
  ```

+ ## References

+ - Yu et al. (2018). [Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task](https://yale-lily.github.io/spider). EMNLP.
+ - Shao et al. (2024). [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/2402.03300). (GRPO algorithm)
+ - Ng, Harada, Russell (1999). [Policy Invariance Under Reward Transformations](https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/NgHaradaRussell-shaping-ICML1999.pdf). ICML.
+ - [OpenEnv framework](https://github.com/meta-pytorch/OpenEnv)
+ - [TRL OpenEnv docs](https://huggingface.co/docs/trl/openenv)

+ ## License

+ Code: [MIT](LICENSE). Data: [CC BY-SA 4.0](DATA_LICENSE).
docs/ARCHITECTURE.md CHANGED
@@ -2,7 +2,7 @@

  > Last updated: 2026-03-29

- System map for SQLEnv an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.

  **Goals:**
  - Show how components connect (system map + key flows)
@@ -11,8 +11,8 @@ System map for SQLEnv — an RL environment where agents learn interactive SQL e
  - Keep invariants legible (what must stay true)

  **Non-goals:**
- - CLI reference (see `docs/RUNBOOK.md`)
- - Per-feature implementation details (link to specs)

  ---

@@ -45,7 +45,7 @@ System map for SQLEnv — an RL environment where agents learn interactive SQL e
  ────────── │ DESCRIBE → PRAGMA │
  +──────────────+ │ SAMPLE → SELECT N │
  │ evaluate() │──> env.reset/step │ QUERY → SQL exec │
- │ policies │ │ ANSWER → verifier │
  │ .py │ +────────┬───────────────+
  +──────────────+ │
  │ v
@@ -267,7 +267,7 @@ class EpisodeContext:
      cumulative_new_info_reward: float = 0.0
  ```

- **POMDP design:** The agent sees `SQLObservation`; the server holds `EpisodeContext`. The agent never sees gold answers, progress scores, or the full database. This separation forces exploration.

  ---

@@ -348,7 +348,7 @@ except ImportError:
  | `QUESTIONS_PATH` | No | `data/questions/student_assessment.json` | Questions JSON |
  | `DB_DIR` | No | `data/databases/` | SQLite database directory |
  | `TOKENIZER_NAME` | No | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer |
- | `PORT` | No | `8000` | Server port (HF Spaces uses 7860) |

  ---

@@ -431,7 +431,7 @@ uv run openenv build # build Docker image
  uv run openenv push # push to HF Spaces
  ```

- The Dockerfile uses multi-stage build with `openenv-base`, runs as non-root `appuser`, bundles Spider databases, and exposes `PORT` (default 7860 on HF Spaces).

  ---

@@ -451,7 +451,7 @@ The Dockerfile uses multi-stage build with `openenv-base`, runs as non-root `app
  |------|------------|
  | Episode | One question-answering session: reset -> N steps -> terminal |
  | Action type | One of: DESCRIBE, SAMPLE, QUERY, ANSWER |
- | POMDP | Partially observable MDP agent acts under uncertainty |
  | Spider | Academic text-to-SQL benchmark dataset (10 DBs used) |
  | OpenEnv | Meta's RL environment framework (Environment, EnvClient) |
  | Green Agent | OpenEnv's evaluation wrapper pattern |
@@ -464,10 +464,6 @@ The Dockerfile uses multi-stage build with `openenv-base`, runs as non-root `app

  ## References

- - Docs index: `docs/README.md`
- - Operations: `docs/RUNBOOK.md`
- - Vision: `vision/VISION.md`
- - Feature specs: `specs/FEATURES.json`
  - OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
  - Spider dataset: https://huggingface.co/datasets/xlangai/spider
  - TRL OpenEnv docs: https://huggingface.co/docs/trl/openenv
 

  > Last updated: 2026-03-29

+ System map for SQLEnv, an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.

  **Goals:**
  - Show how components connect (system map + key flows)

  - Keep invariants legible (what must stay true)

  **Non-goals:**
+ - Exhaustive API reference
+ - Training hyperparameter tuning guide

  ---

  ────────── │ DESCRIBE → PRAGMA │
  +──────────────+ │ SAMPLE → SELECT N │
  │ evaluate() │──> env.reset/step │ QUERY → SQL exec │
+ │ policies │ │ ANSWER → verifier │
  │ .py │ +────────┬───────────────+
  +──────────────+ │
  │ v

      cumulative_new_info_reward: float = 0.0
  ```

+ **POMDP design:** The agent sees `SQLObservation`. The server holds `EpisodeContext`. The agent never sees gold answers, progress scores, or the full database. This separation forces exploration.
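
The split can be illustrated with two toy dataclasses. Apart from `cumulative_new_info_reward`, which appears in the `EpisodeContext` snippet above, the field names here are illustrative rather than the project's exact models:

```python
from dataclasses import dataclass, fields

@dataclass
class SQLObservation:  # what the agent sees (client side)
    question: str
    result: str
    reward: float
    done: bool

@dataclass
class EpisodeContext:  # what only the server holds
    gold_answer: str
    cumulative_new_info_reward: float = 0.0

# The agent-visible surface never includes the gold answer.
agent_visible = {f.name for f in fields(SQLObservation)}
assert "gold_answer" not in agent_visible
print(sorted(agent_visible))  # ['done', 'question', 'result', 'reward']
```

Keeping the two types in separate modules (client models vs server state) is one way to make this invariant hard to violate accidentally.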

  ---

  | `QUESTIONS_PATH` | No | `data/questions/student_assessment.json` | Questions JSON |
  | `DB_DIR` | No | `data/databases/` | SQLite database directory |
  | `TOKENIZER_NAME` | No | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer |
+ | `PORT` | No | `8000` | Server port |

  ---

  uv run openenv push # push to HF Spaces
  ```

+ The Dockerfile uses multi-stage build with `openenv-base`, runs as non-root `appuser`, bundles Spider databases, and exposes port 8000.

  ---

  |------|------------|
  | Episode | One question-answering session: reset -> N steps -> terminal |
  | Action type | One of: DESCRIBE, SAMPLE, QUERY, ANSWER |
+ | POMDP | Partially observable MDP. Agent acts under uncertainty |
  | Spider | Academic text-to-SQL benchmark dataset (10 DBs used) |
  | OpenEnv | Meta's RL environment framework (Environment, EnvClient) |
  | Green Agent | OpenEnv's evaluation wrapper pattern |

  ## References

  - OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
  - Spider dataset: https://huggingface.co/datasets/xlangai/spider
  - TRL OpenEnv docs: https://huggingface.co/docs/trl/openenv
docs/data-sources.md CHANGED
@@ -14,7 +14,7 @@ so a fresh clone works offline after `uv sync`.
  | DB allowlist | `data/questions/db_list.json` | hand-curated subset | 10 db_ids |
  | SFT trajectories | `data/sft/sft_trajectories.json` | generated from gold SQL | 120 trajectories |

- Total: ~676 questions across 10 Spider databases, plus 120 multi-turn SFT
  warmup trajectories.

  ## Upstream: Spider
@@ -26,7 +26,7 @@ gold SQL query, and a target database. We use two mirrors:

  1. **Questions** via HuggingFace Datasets: [`xlangai/spider`](https://huggingface.co/datasets/xlangai/spider)
     — loaded with `datasets.load_dataset("xlangai/spider", split=...)` in
-    `scripts/download_spider_data.py`.
  2. **SQLite databases** via the Spider GitHub mirror:
     - `https://raw.githubusercontent.com/taoyds/spider/master/database/{db_id}/{db_id}.sqlite`
     - Fallback: the official Google Drive archive
@@ -56,8 +56,8 @@ database. This prevents train/eval leakage at the schema level:
    dog_kennels, employee_hire_evaluation, flight_2, student_assessment`
  - **Eval databases** (4): `flight_2, pets_1, poker_player, world_1`

- `flight_2` appears in both; other eval DBs are schemas the model never
- saw during training. `sql_env.training.data_loading.validate_no_data_leak`
  asserts zero question-text overlap between the two files at load time.

  ## Question files
@@ -86,10 +86,13 @@ with this shape (actual sample from `car_1` train):
  | train | 473 | 435 | 32 | 6 |
  | eval | 203 | 185 | 18 | 0 |

- The easy-heavy distribution is deliberate for the 0.6B capacity ceiling
- (see `docs/blog-material.md` "The 0.6B Capacity Ceiling"). Medium and
- hard questions are kept in the mix for Phase 2 exposure but are not where
- this model size gains accuracy.

  ### Curation pipeline

@@ -126,18 +129,26 @@ one, runs the real `SQLEnvironment` programmatically:
  3. `answer(gold_answer)` — terminal step

  The captured sequence becomes an assistant-labelled trajectory. This is
- **not synthetic text** the assistant turns wrap the actual environment
- responses the model will see at training and inference time, which is
- what lets GRPO's KL anchor point align with real env output.

- The 120-count is smaller than the 473 training questions because SFT
- samples a subset that exercises each database and difficulty bucket;
- see `scripts/generate_sft_data.py` for the selection logic.

  Why multi-turn matters: an earlier per-turn SFT (347 single-turn
- examples) taught the model to always call `describe` and nothing else.
- Multi-turn teaches the full `describe query answer` sequence. See
- `docs/blog-material.md` "Multi-Turn SFT Why It's Critical".

  ## How to regenerate from scratch

@@ -146,8 +157,8 @@ Multi-turn teaches the full `describe → query → answer` sequence. See
  uv run python scripts/download_spider_databases.py --db-id all

  # 2. Raw Spider questions (via HF Datasets)
- uv run python scripts/download_spider_data.py --db-id all --split train
- uv run python scripts/download_spider_data.py --db-id all --split validation

  # 3. Curate into questions_train.json / questions_eval.json
  uv run python scripts/curate_questions.py
@@ -164,7 +175,7 @@ snapshot.
  ## What we deliberately do not use

  - **BIRD** (Li et al., 2023) — larger, harder text-to-SQL benchmark. Out
-   of scope for a 0.6B model; revisit for a larger-model follow-up.
  - **WikiSQL** — single-table only, doesn't exercise the multi-turn
    exploration the environment is built for.
  - **Synthetic LLM-generated questions** — we want Spider's human-written
 
  | DB allowlist | `data/questions/db_list.json` | hand-curated subset | 10 db_ids |
  | SFT trajectories | `data/sft/sft_trajectories.json` | generated from gold SQL | 120 trajectories |

+ Total: 676 questions across 10 Spider databases, plus 120 multi-turn SFT
  warmup trajectories.

  ## Upstream: Spider

  1. **Questions** via HuggingFace Datasets: [`xlangai/spider`](https://huggingface.co/datasets/xlangai/spider)
     — loaded with `datasets.load_dataset("xlangai/spider", split=...)` in
+    `scripts/download_spider_questions.py`.
  2. **SQLite databases** via the Spider GitHub mirror:
     - `https://raw.githubusercontent.com/taoyds/spider/master/database/{db_id}/{db_id}.sqlite`
     - Fallback: the official Google Drive archive

    dog_kennels, employee_hire_evaluation, flight_2, student_assessment`
  - **Eval databases** (4): `flight_2, pets_1, poker_player, world_1`

+ `flight_2` appears in both. The other eval DBs are schemas the model
+ never saw during training. `sql_env.training.data_loading.validate_no_data_leak`
  asserts zero question-text overlap between the two files at load time.
 
63
  ## Question files
 
86
  | train | 473 | 435 | 32 | 6 |
87
  | eval | 203 | 185 | 18 | 0 |
88
 
89
+ The easy-heavy distribution is deliberate for the 0.6B capacity ceiling.
90
+ Extended GRPO training on harder questions produced identical accuracy,
91
+ which indicates the ceiling comes from pretraining knowledge rather than
92
+ training budget. Medium and hard questions stay in the mix for Phase 2
93
+ exposure but are not where this model size gains accuracy. See the
94
+ "Limitations at 0.6B Parameters" section of the
95
+ [blog post](https://hjerpe-sqlenv-blog.static.hf.space).
96
 
97
  ### Curation pipeline
98
 
 
129
  3. `answer(gold_answer)` — terminal step
130
 
131
  The captured sequence becomes an assistant-labelled trajectory. This is
132
+ **not synthetic text**. The assistant turns wrap the actual environment
133
+ responses the model will see at training and inference time, so the
134
+ SFT-warmed reference policy already expects real env output when GRPO
135
+ takes over.
136
 
137
+ SFT uses 120 trajectories rather than one per training question. The
138
+ subset is chosen to cover each database and difficulty bucket. See
139
+ `scripts/generate_sft_data.py` for the selection logic.
140
 
141
  Why multi-turn matters: an earlier per-turn SFT (347 single-turn
142
+ examples) taught the model to always call `describe`. Half those
143
+ examples were describe calls, so the model learned "when asked a
144
+ question, call describe." Under a KL penalty during GRPO, every rollout
145
+ stayed identical, the advantage between rollouts was zero, and no policy
146
+ gradient could form. Multi-turn SFT (120 full trajectories trained with
147
+ `assistant_only_loss`) instead teaches the full
148
+ `describe → query → answer` sequence as a coherent strategy, which GRPO
149
+ then refines into error recovery and answer formatting. See the
150
+ "Training" section of the
151
+ [blog post](https://hjerpe-sqlenv-blog.static.hf.space).
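
Concretely, one such multi-turn trajectory has roughly the shape below in chat-message form; this is a hypothetical example, and the exact fields in `sft_trajectories.json` may differ. With `assistant_only_loss`, only the assistant turns contribute to the SFT loss, while the environment-output turns are context only:

```python
# Hypothetical trajectory: assistant turns carry actions, the other turns
# carry real environment output captured by replaying the gold SQL.
trajectory = [
    {"role": "user", "content": "Q: How many employees are there? Tables: employee"},
    {"role": "assistant", "content": "DESCRIBE employee"},          # loss computed
    {"role": "user", "content": "Columns: id INTEGER, name TEXT"},  # env output, no loss
    {"role": "assistant", "content": "QUERY SELECT COUNT(*) FROM employee"},
    {"role": "user", "content": "1. 10"},
    {"role": "assistant", "content": "ANSWER 10"},
]

# Only the assistant turns would be supervised under assistant_only_loss:
trainable = [m["content"] for m in trajectory if m["role"] == "assistant"]
print(len(trainable))  # 3
```

Because every supervised turn is conditioned on genuine environment responses, the warmed-up model is not surprised by real env output when GRPO rollouts begin.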

  ## How to regenerate from scratch

  uv run python scripts/download_spider_databases.py --db-id all

  # 2. Raw Spider questions (via HF Datasets)
+ uv run python scripts/download_spider_questions.py --db-id all --split train
+ uv run python scripts/download_spider_questions.py --db-id all --split validation

  # 3. Curate into questions_train.json / questions_eval.json
  uv run python scripts/curate_questions.py

  ## What we deliberately do not use

  - **BIRD** (Li et al., 2023) — larger, harder text-to-SQL benchmark. Out
+   of scope for a 0.6B model. Revisit for a larger-model follow-up.
  - **WikiSQL** — single-table only, doesn't exercise the multi-turn
    exploration the environment is built for.
  - **Synthetic LLM-generated questions** — we want Spider's human-written
notebooks/showcase_sqlenv.ipynb CHANGED
@@ -29,40 +29,83 @@
29
  },
30
  {
31
  "cell_type": "code",
32
- "execution_count": 1,
33
  "metadata": {},
34
- "outputs": [
35
- {
36
- "name": "stdout",
37
- "output_type": "stream",
38
- "text": [
39
- "Project root: /Users/hjerp/Projects/sql-env\n"
40
- ]
41
- }
42
- ],
43
  "source": [
44
  "import os\n",
 
45
  "import sys\n",
46
  "from pathlib import Path\n",
47
  "\n",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
48
  "\n",
49
- "def find_project_root() -> Path:\n",
50
- " \"\"\"Walk up from CWD until pyproject.toml is found.\"\"\"\n",
51
- " for parent in [Path.cwd(), *Path.cwd().parents]:\n",
52
- " if (parent / \"pyproject.toml\").exists():\n",
53
- " return parent\n",
54
- " raise FileNotFoundError(\"Could not locate project root (no pyproject.toml found)\")\n",
 
 
 
 
 
55
  "\n",
 
 
56
  "\n",
57
- "PROJECT_ROOT = find_project_root()\n",
58
- "os.chdir(PROJECT_ROOT)\n",
59
  "if str(PROJECT_ROOT) not in sys.path:\n",
60
  " sys.path.insert(0, str(PROJECT_ROOT))\n",
61
  "\n",
62
- "# In Colab, uncomment:\n",
63
- "# !pip install -q git+https://github.com/hjerpe/sql-env.git\n",
64
- "# !python scripts/download_spider_databases.py\n",
65
- "\n",
66
  "print(f\"Project root: {PROJECT_ROOT}\")"
67
  ]
68
  },
@@ -553,62 +596,19 @@
553
  },
554
  {
555
  "cell_type": "code",
556
- "execution_count": 12,
557
  "metadata": {},
558
- "outputs": [
559
- {
560
- "name": "stdout",
561
- "output_type": "stream",
562
- "text": [
563
- "Q: List the id of students who registered some courses and the number of their registered courses?\n",
564
- "\n",
565
- " Step 1: DESCRIBE\n",
566
- " Action: student_course_registrations\n",
567
- " Result: Table 'Student_Course_Registrations' columns:\n",
568
- "- student_id: INTEGER\n",
569
- "- course_id: INTEGER\n",
570
- "- registration_date: DATETIME\n",
571
- "Row count: 9\n",
572
- " Reward: +0.0150\n",
573
- "\n",
574
- " Step 2: DESCRIBE\n",
575
- " Action: students\n",
576
- " Result: Table 'Students' columns:\n",
577
- "- student_id: INTEGER\n",
578
- "- student_details: VARCHAR(255)\n",
579
- "Row count: 8\n",
580
- " Reward: +0.0150\n",
581
- "\n",
582
- " Step 3: QUERY\n",
583
- " SQL:\n",
584
- " SELECT T1.student_id , count(*) \n",
585
- " FROM students AS T1 \n",
586
- " JOIN student_course_registrations AS T2 \n",
587
- " ON T1.student_id = T2.student_id \n",
588
- " GROUP BY T1.student_id\n",
589
- " Result: 1. 111 | 1\n",
590
- "2. 121 | 2\n",
591
- "3. 131 | 1\n",
592
- "4. 141 | 2\n",
593
- "5. 151 | 1\n",
594
- "6. 161 | 1\n",
595
- "7. 171 | 1\n",
596
- " Reward: +0.1500\n",
597
- "\n",
598
- " Step 4: ANSWER\n",
599
- " Action: [[111, 1], [121, 2], [131, 1], [141, 2], [151, 1], [161, 1], [171, 1]]\n",
600
- " Result: Answer submitted: correct.\n",
601
- " Reward: +1.0000\n",
602
- "\n",
603
- "Total reward: 1.180\n",
604
- " Exploration (L1+L2): 0.180 (3 steps)\n",
605
- " Terminal (L3): 1.000\n"
606
- ]
607
- }
608
- ],
609
  "source": [
610
  "import re\n",
611
  "\n",
 
 
 
 
 
 
 
612
  "\n",
613
  "def format_sql(sql):\n",
614
  " \"\"\"Simple SQL formatter for display.\"\"\"\n",
@@ -617,14 +617,58 @@
617
  " return formatted\n",
618
  "\n",
619
  "\n",
620
- "# Run one oracle episode and show per-step rewards\n",
621
  "obs = env.reset(seed=0)\n",
622
  "oracle = OraclePolicy(questions)\n",
623
  "step_rewards = []\n",
 
624
  "\n",
625
  "print(f\"Q: {obs.question}\\n\")\n",
626
  "while not obs.done:\n",
627
  " action = oracle.select_action(obs)\n",
 
 
 
 
628
  " obs = env.step(action)\n",
629
  " reward = obs.reward or 0.0\n",
630
  " step_rewards.append(reward)\n",
@@ -640,7 +684,7 @@
640
  " print(f\" Result: {obs.result}\")\n",
641
  " if obs.error:\n",
642
  " print(f\" Error: {obs.error}\")\n",
643
- " print(f\" Reward: {reward:+.4f}\")\n",
644
  " print()\n",
645
  "\n",
646
  "exploration = sum(step_rewards[:-1]) if len(step_rewards) > 1 else 0.0\n",
@@ -655,9 +699,17 @@
655
  "cell_type": "markdown",
656
  "metadata": {},
657
  "source": [
658
- "## 8. Connect to a Deployed Space\n",
  "\n",
660
- "The same environment runs as a Docker container on HuggingFace Spaces. The `SQLEnvClient` connects via WebSocket and provides the same `reset()`/`step()` interface."
661
  ]
662
  },
663
  {
@@ -666,25 +718,77 @@
666
  "metadata": {},
667
  "outputs": [],
668
  "source": [
669
- "# Uncomment to connect to a running Space:\n",
670
- "#\n",
671
- "# from sql_env.client import SQLEnvClient\n",
672
- "# from sql_env.models import SQLAction\n",
673
- "#\n",
674
- "# client = SQLEnvClient(base_url=\"wss://your-space.hf.space\")\n",
675
- "# client.connect()\n",
676
- "# result = client.reset(seed=42)\n",
677
- "# obs = result.observation\n",
678
- "# print(\"Question:\", obs.question)\n",
679
- "# print(\"Schema:\", obs.schema_info)\n",
680
- "#\n",
681
- "# # Same actions work over the wire:\n",
682
- "# step = client.step(SQLAction(action_type=\"DESCRIBE\", argument=\"employees\"))\n",
683
- "# print(\"Result:\", step.observation.result)\n",
684
- "#\n",
685
- "# client.close()\n",
686
- "\n",
687
- "print(\"Uncomment the cell above and set your HF Space URL to connect remotely.\")"
688
  ]
689
  },
690
  {
 
29
  },
30
  {
31
  "cell_type": "code",
32
+ "execution_count": null,
33
  "metadata": {},
34
+ "outputs": [],
35
  "source": [
36
  "import os\n",
37
+ "import subprocess\n",
38
  "import sys\n",
39
  "from pathlib import Path\n",
40
  "\n",
41
+ "IN_COLAB = \"google.colab\" in sys.modules\n",
42
+ "\n",
43
+ "if IN_COLAB:\n",
44
+ " # Colab: clone the repo, install the package, fetch Spider databases.\n",
45
+ " # Requires a GITHUB_TOKEN in Colab userdata if the repo is private.\n",
46
+ " from google.colab import userdata\n",
47
+ "\n",
48
+ " token = userdata.get(\"GITHUB_TOKEN\")\n",
49
+ " BRANCH = \"main\" # @param {type:\"string\"}\n",
50
+ " repo_url = f\"https://{token}@github.com/hjerpe/sql-env.git\" if token else \"https://github.com/hjerpe/sql-env.git\"\n",
51
+ "\n",
52
+ " if Path(\"sql-env\").exists():\n",
53
+ " subprocess.check_call([\"git\", \"-C\", \"sql-env\", \"pull\", \"-q\"])\n",
54
+ " else:\n",
55
+ " subprocess.check_call([\"git\", \"clone\", \"-q\", \"-b\", BRANCH, repo_url])\n",
56
+ " os.chdir(\"sql-env\")\n",
57
+ "\n",
58
+ " print(\"Colab detected: installing dependencies...\")\n",
59
+ " subprocess.check_call(\n",
60
+ " [sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"--upgrade\", \"pip\"]\n",
61
+ " )\n",
62
+ " subprocess.check_call(\n",
63
+ " [\n",
64
+ " sys.executable,\n",
65
+ " \"-m\",\n",
66
+ " \"pip\",\n",
67
+ " \"install\",\n",
68
+ " \"-q\",\n",
69
+ " \"--no-deps\",\n",
70
+ " \"--force-reinstall\",\n",
71
+ " \".\",\n",
72
+ " ]\n",
73
+ " )\n",
74
+ " subprocess.check_call(\n",
75
+ " [\n",
76
+ " sys.executable,\n",
77
+ " \"-m\",\n",
78
+ " \"pip\",\n",
79
+ " \"install\",\n",
80
+ " \"-q\",\n",
81
+ " \"openenv-core[core]>=0.2.1\",\n",
82
+ " \"pydantic>=2.0.0\",\n",
83
+ " \"jmespath\",\n",
84
+ " ]\n",
85
+ " )\n",
86
+ " # Download Spider SQLite databases the notebook reads from\n",
87
+ " subprocess.check_call(\n",
88
+ " [sys.executable, \"scripts/download_spider_databases.py\", \"--db-id\", \"all\"]\n",
89
+ " )\n",
90
  "\n",
91
+ " PROJECT_ROOT = Path.cwd()\n",
92
+ "else:\n",
93
+ " # Local: walk up from CWD to find the project root\n",
94
+ " def find_project_root() -> Path:\n",
95
+ " \"\"\"Walk up from CWD until pyproject.toml is found.\"\"\"\n",
96
+ " for parent in [Path.cwd(), *Path.cwd().parents]:\n",
97
+ " if (parent / \"pyproject.toml\").exists():\n",
98
+ " return parent\n",
99
+ " raise FileNotFoundError(\n",
100
+ " \"Could not locate project root (no pyproject.toml found)\"\n",
101
+ " )\n",
102
  "\n",
103
+ " PROJECT_ROOT = find_project_root()\n",
104
+ " os.chdir(PROJECT_ROOT)\n",
105
  "\n",
 
 
106
  "if str(PROJECT_ROOT) not in sys.path:\n",
107
  " sys.path.insert(0, str(PROJECT_ROOT))\n",
108
  "\n",
 
 
 
 
109
  "print(f\"Project root: {PROJECT_ROOT}\")"
110
  ]
111
  },
 
596
  },
597
  {
598
  "cell_type": "code",
599
+ "execution_count": null,
600
  "metadata": {},
601
+ "outputs": [],
602
  "source": [
603
  "import re\n",
604
  "\n",
605
+ "from server.reward import (\n",
606
+ " _EXEC_OK_REWARD,\n",
607
+ " _NEW_INFO_REWARD,\n",
608
+ " _REPEAT_PENALTY,\n",
609
+ " _STEP_COST,\n",
610
+ ")\n",
611
+ "\n",
612
  "\n",
613
  "def format_sql(sql):\n",
614
  " \"\"\"Simple SQL formatter for display.\"\"\"\n",
 
617
  " return formatted\n",
618
  "\n",
619
  "\n",
620
+ "def explain_reward(action_type, error, is_repeat_query, total_reward):\n",
621
+ " \"\"\"Decompose a step reward into labeled components.\n",
622
+ "\n",
623
+ " Layer 1 components (step_cost, exec_ok, new_info, repeat_penalty) are\n",
624
+ " deterministic from action type + state, so we reconstruct them exactly\n",
625
+ " from the reward constants imported above. Layer 2 (progress delta on\n",
626
+ " QUERY) and Layer 3 (terminal on ANSWER) are not exposed in the\n",
627
+ " observation, so we recover them as 'total reward minus L1 sum' and\n",
628
+ " label them accordingly. The clip range [-0.10, +0.15] may adjust the\n",
629
+ " final value — any residual after layer reconstruction is labeled\n",
630
+ " 'clip_adjust'.\n",
631
+ " \"\"\"\n",
632
+ " at = action_type.upper()\n",
633
+ " parts = [(\"step_cost\", -_STEP_COST)] # always applied\n",
634
+ "\n",
635
+ " if error:\n",
636
+ " pass # no exec_ok when the action errored\n",
637
+ " elif at == \"QUERY\" and is_repeat_query:\n",
638
+ " parts.append((\"repeat_penalty\", -_REPEAT_PENALTY))\n",
639
+ " else:\n",
640
+ " parts.append((\"exec_ok\", +_EXEC_OK_REWARD))\n",
641
+ " if at == \"QUERY\":\n",
642
+ " parts.append((\"new_info\", +_NEW_INFO_REWARD))\n",
643
+ "\n",
644
+ " l1_sum = sum(v for _, v in parts)\n",
645
+ " remainder = total_reward - l1_sum\n",
646
+ "\n",
647
+ " if abs(remainder) > 1e-9:\n",
648
+ " if at == \"ANSWER\":\n",
649
+ " parts.append((\"terminal\", remainder))\n",
650
+ " elif at == \"QUERY\":\n",
651
+ " parts.append((\"layer2_progress\", remainder))\n",
652
+ " else:\n",
653
+ " parts.append((\"clip_adjust\", remainder))\n",
654
+ "\n",
655
+ " labels = \" + \".join(f\"{name}({v:+.3f})\" for name, v in parts)\n",
656
+ " return f\"{labels} = {total_reward:+.4f}\"\n",
657
+ "\n",
658
+ "\n",
659
+ "# Run one oracle episode and show per-step rewards with component breakdown\n",
660
  "obs = env.reset(seed=0)\n",
661
  "oracle = OraclePolicy(questions)\n",
662
  "step_rewards = []\n",
663
+ "seen_queries: set[str] = set()\n",
664
  "\n",
665
  "print(f\"Q: {obs.question}\\n\")\n",
666
  "while not obs.done:\n",
667
  " action = oracle.select_action(obs)\n",
668
+ " is_repeat = action.action_type.upper() == \"QUERY\" and action.argument in seen_queries\n",
669
+ " if action.action_type.upper() == \"QUERY\":\n",
670
+ " seen_queries.add(action.argument)\n",
671
+ "\n",
672
  " obs = env.step(action)\n",
673
  " reward = obs.reward or 0.0\n",
674
  " step_rewards.append(reward)\n",
 
684
  " print(f\" Result: {obs.result}\")\n",
685
  " if obs.error:\n",
686
  " print(f\" Error: {obs.error}\")\n",
687
+ " print(f\" Reward: {explain_reward(action.action_type, obs.error, is_repeat, reward)}\")\n",
688
  " print()\n",
689
  "\n",
690
  "exploration = sum(step_rewards[:-1]) if len(step_rewards) > 1 else 0.0\n",
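The episode summary above reconciles exploration and terminal rewards against the per-step values. As a quick standalone arithmetic check, here is that decomposition with the numbers from the example oracle run (values copied from that run's printed output, not recomputed from `server.reward`):

```python
# Per-step rewards from the example oracle episode: two DESCRIBE steps
# (+0.015 each), one successful QUERY (+0.15, at the top of the clip
# range), and a correct ANSWER (+1.0).
step_rewards = [0.015, 0.015, 0.15, 1.0]

exploration = sum(step_rewards[:-1])  # Layer 1 + Layer 2 components
terminal = step_rewards[-1]           # Layer 3 terminal reward

print(f"Total reward: {exploration + terminal:.3f}")  # Total reward: 1.180
print(f"Exploration: {exploration:.3f}")              # Exploration: 0.180
print(f"Terminal: {terminal:.3f}")                    # Terminal: 1.000
```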
 
699
  "cell_type": "markdown",
700
  "metadata": {},
701
  "source": [
702
+ "## 8. Same Environment, Over the Wire\n",
703
+ "\n",
704
+ "The same `SQLEnvironment` runs as a Docker container on HuggingFace Spaces:\n",
705
+ "[**huggingface.co/spaces/hjerpe/sql_env**](https://huggingface.co/spaces/hjerpe/sql_env).\n",
706
+ "`SQLEnvClient` connects via WebSocket and provides the same `reset()` /\n",
707
+ "`step()` interface we used above — same action space, same observation shape,\n",
708
+ "same reward model. The only difference is that the SQLite database and\n",
709
+ "reward computation now live on a remote container instead of in this\n",
710
+ "Python process.\n",
711
  "\n",
712
+ "The cell below drives one full episode against the live Space."
713
  ]
714
  },
715
  {
 
718
  "metadata": {},
719
  "outputs": [],
720
  "source": [
721
+ "from sql_env.client import SQLEnvClient\n",
722
+ "from sql_env.models import SQLAction\n",
723
+ "\n",
724
+ "# Live hosted Space. This is the URL anyone in the world can point a client\n",
725
+ "# at — no local setup required. The first request may take ~30s if the\n",
726
+ "# container is cold-starting.\n",
727
+ "SPACE_URL = \"https://hjerpe-sql-env.hf.space\"\n",
728
+ "\n",
729
+ "print(f\"Connecting to {SPACE_URL} ...\\n\")\n",
730
+ "\n",
731
+ "# openenv-core's SQLEnvClient is sync-by-default in older versions but\n",
732
+ "# async-by-default in newer ones (the newer API exposes .sync() as an\n",
733
+ "# explicit synchronous wrapper). Detect at runtime so the cell works on\n",
734
+ "# both local dev installs and Colab's pinned >=0.2.1 version.\n",
735
+ "_remote_client = SQLEnvClient(base_url=SPACE_URL)\n",
736
+ "_remote_ctx = _remote_client.sync() if hasattr(_remote_client, \"sync\") else _remote_client\n",
737
+ "\n",
738
+ "try:\n",
739
+ " with _remote_ctx as remote_env:\n",
740
+ " # --- reset ---\n",
741
+ " result = remote_env.reset()\n",
742
+ " remote_obs = result.observation\n",
743
+ " print(f\"Q: {remote_obs.question}\")\n",
744
+ " tables = [\n",
745
+ " line.lstrip(\"- \").strip()\n",
746
+ " for line in remote_obs.schema_info.splitlines()[1:]\n",
747
+ " if line.strip()\n",
748
+ " ]\n",
749
+ " print(f\"Tables: {tables}\\n\")\n",
750
+ "\n",
751
+ " first_table = tables[0]\n",
752
+ "\n",
753
+ " # --- describe ---\n",
754
+ " result = remote_env.step(\n",
755
+ " SQLAction(action_type=\"DESCRIBE\", argument=first_table)\n",
756
+ " )\n",
757
+ " print(f\"DESCRIBE {first_table}\")\n",
758
+ " print(f\" reward: {result.observation.reward:+.4f}\")\n",
759
+ " # Line-based preview so truncation never cuts mid-word\n",
760
+ " _lines = result.observation.result.splitlines()\n",
761
+ " _preview = \"\\n \".join(_lines[:6])\n",
762
+ " _more = (\n",
763
+ " f\"\\n ... ({len(_lines) - 6} more lines)\"\n",
764
+ " if len(_lines) > 6\n",
765
+ " else \"\"\n",
766
+ " )\n",
767
+ " print(f\" result: {_preview}{_more}\\n\")\n",
768
+ "\n",
769
+ " # --- query ---\n",
770
+ " query_sql = f\"SELECT COUNT(*) FROM {first_table}\"\n",
771
+ " result = remote_env.step(\n",
772
+ " SQLAction(action_type=\"QUERY\", argument=query_sql)\n",
773
+ " )\n",
774
+ " print(f\"QUERY {query_sql}\")\n",
775
+ " print(f\" reward: {result.observation.reward:+.4f}\")\n",
776
+ " print(f\" result: {result.observation.result}\\n\")\n",
777
+ "\n",
778
+ " # --- answer (intentionally wrong — we're demoing plumbing, not correctness) ---\n",
779
+ " result = remote_env.step(\n",
780
+ " SQLAction(action_type=\"ANSWER\", argument=\"demo\")\n",
781
+ " )\n",
782
+ " print(\"ANSWER demo\")\n",
783
+ " print(f\" done: {result.observation.done}\")\n",
784
+ " print(f\" reward: {result.observation.reward:+.4f}\")\n",
785
+ " print(\"\\nSame action space, same observation shape, same rewards — just running remotely.\")\n",
786
+ "except Exception as exc: # noqa: BLE001 — demo cell should not crash the notebook\n",
787
+ " print(f\"Remote call failed: {type(exc).__name__}: {exc}\")\n",
788
+ " print(\n",
789
+ " \"If the Space is sleeping, the first request usually wakes it. \"\n",
790
+ " \"Retry in ~30s, or skip this cell to run the notebook fully offline.\"\n",
791
+ " )"
792
  ]
793
  },
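The table-list parsing in the remote cell assumes `schema_info` is a header line followed by `- <table>` bullets; that format is inferred from the parsing code, not a documented contract. A standalone check of the parsing on a hypothetical `schema_info` string:

```python
# Hypothetical schema_info payload (an assumption: header line plus
# "- <table>" bullets, mirroring what the parsing code expects).
schema_info = "Database: student_assessment\n- Students\n- Student_Course_Registrations"

# Skip the header, strip the bullet prefix, drop empty lines.
tables = [
    line.lstrip("- ").strip()
    for line in schema_info.splitlines()[1:]
    if line.strip()
]
print(tables)  # ['Students', 'Student_Course_Registrations']
```

Note that `str.lstrip("- ")` strips any run of `-` and space characters from the left, which is safe here because table names don't begin with those characters.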
794
  {
scripts/download_spider_questions.py ADDED
@@ -0,0 +1,106 @@
1
+ """
2
+ Script to download Spider dataset questions for specific databases.
3
+
4
+ Usage:
5
+ python scripts/download_spider_questions.py --db-id student_assessment
6
+ python scripts/download_spider_questions.py --db-id student_assessment --split validation
7
+ python scripts/download_spider_questions.py --db-id all # downloads all db_ids
8
+ """
9
+
10
+ import json
11
+ import argparse
12
+ from pathlib import Path
13
+ from datasets import load_dataset
14
+
15
+
16
+ def download_spider_questions(
17
+ db_id: str = "student_assessment",
18
+ split: str = "train",
19
+ output_dir: str = "data/questions",
20
+ ) -> None:
21
+ """Download Spider dataset questions for specified database(s).
22
+
23
+ Args:
24
+ db_id: Database ID to filter by, or "all" to get all databases
25
+ split: Dataset split ("train" or "validation")
26
+ output_dir: Directory to save JSON files
27
+ """
28
+ output_path = Path(output_dir)
29
+ output_path.mkdir(parents=True, exist_ok=True)
30
+
31
+ print(f"Loading Spider dataset ({split} split)...")
32
+ dataset = load_dataset("xlangai/spider", split=split)
33
+
34
+ if db_id.lower() == "all":
35
+ # Group by db_id
36
+ grouped = {}
37
+ for item in dataset:
38
+ current_db_id = item.get("db_id")
39
+ if current_db_id not in grouped:
40
+ grouped[current_db_id] = []
41
+ grouped[current_db_id].append(item)
42
+
43
+ total_questions = 0
44
+ for current_db_id, questions in grouped.items():
45
+ filepath = output_path / f"{current_db_id}.json"
46
+ with open(filepath, "w") as f:
47
+ json.dump(questions, f, indent=2)
48
+ print(f" {current_db_id}: {len(questions)} questions → {filepath}")
49
+ total_questions += len(questions)
50
+
51
+ print(f"\nTotal: {total_questions} questions across {len(grouped)} databases")
52
+ else:
53
+ # Filter for specific db_id
54
+ filtered_data = [item for item in dataset if item.get("db_id") == db_id]
55
+
56
+ if not filtered_data:
57
+ print(f"No questions found for db_id='{db_id}'")
58
+ return
59
+
60
+ filepath = output_path / f"{db_id}.json"
61
+ with open(filepath, "w") as f:
62
+ json.dump(filtered_data, f, indent=2)
63
+
64
+ print(f"Found {len(filtered_data)} questions for db_id='{db_id}'")
65
+ print(f"Saved to {filepath}")
66
+
67
+ # Print sample
68
+ if filtered_data:
69
+ sample = filtered_data[0]
70
+ print("\nFirst question sample:")
71
+ print(
72
+ json.dumps(
73
+ {k: v for k, v in sample.items() if k != "evidence"}, indent=2
74
+ )
75
+ )
76
+
77
+
78
+ if __name__ == "__main__":
79
+ parser = argparse.ArgumentParser(
80
+ description="Download Spider dataset questions for specific databases",
81
+ formatter_class=argparse.RawDescriptionHelpFormatter,
82
+ )
83
+ parser.add_argument(
84
+ "--db-id",
85
+ type=str,
86
+ default="student_assessment",
87
+ help="Database ID to filter by (or 'all' for all databases)",
88
+ )
89
+ parser.add_argument(
90
+ "--split",
91
+ type=str,
92
+ default="train",
93
+ choices=["train", "validation"],
94
+ help="Dataset split to download",
95
+ )
96
+ parser.add_argument(
97
+ "--output-dir",
98
+ type=str,
99
+ default="data/questions",
100
+ help="Directory to save JSON files",
101
+ )
102
+
103
+ args = parser.parse_args()
104
+ download_spider_questions(
105
+ db_id=args.db_id, split=args.split, output_dir=args.output_dir
106
+ )
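The grouping loop in the script's `all` branch is correct as written; for reference, the same grouping reads a little more idiomatically with `collections.defaultdict`. A sketch over hypothetical records (the `db_id`/`question` keys mirror the record shape the script filters on):

```python
from collections import defaultdict

# Hypothetical records mirroring the db_id/question shape used above.
dataset = [
    {"db_id": "student_assessment", "question": "q1"},
    {"db_id": "flight_2", "question": "q2"},
    {"db_id": "student_assessment", "question": "q3"},
]

# defaultdict(list) removes the explicit "if key not in grouped" check.
grouped = defaultdict(list)
for item in dataset:
    grouped[item["db_id"]].append(item)

print({k: len(v) for k, v in grouped.items()})
# {'student_assessment': 2, 'flight_2': 1}
```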
server/app.py CHANGED
@@ -80,7 +80,16 @@ def create_sql_environment():
80
  )
81
 
82
 
83
- # Create the FastAPI app
84
  app = create_app(
85
  create_sql_environment,
86
  SQLAction,
 
80
  )
81
 
82
 
83
+ # Create the FastAPI app.
84
+ #
85
+ # Note: hosted Space is single-session. External users running TRL's
86
+ # GRPOTrainer against https://hjerpe-sql-env.hf.space with
87
+ # num_generations > 1 will hit openenv-core's default 1-session cap.
88
+ # Fix requires (a) auditing SQLEnvironment for shared mutable state
89
+ # across sessions, (b) declaring SUPPORTS_CONCURRENT_SESSIONS=True on
90
+ # the class, (c) passing max_concurrent_envs=64 here. Deferred as a
91
+ # post-launch follow-up. Our own training uses an in-process
92
+ # SQLEnvironment via SQLEnvTRL, so this does not affect internal runs.
93
  app = create_app(
94
  create_sql_environment,
95
  SQLAction,
sql_env.egg-info/PKG-INFO CHANGED
@@ -3,6 +3,7 @@ Name: sql-env
3
  Version: 0.1.0
4
  Summary: Interactive SQL exploration RL environment for the OpenEnv Challenge
5
  Requires-Python: <3.13,>=3.11
 
6
  Requires-Dist: openenv-core[core]>=0.2.1
7
  Requires-Dist: pydantic>=2.0.0
8
  Requires-Dist: fastapi>=0.104.0
@@ -24,3 +25,4 @@ Requires-Dist: trl>=0.29.0; extra == "training"
24
  Requires-Dist: accelerate>=0.34.0; extra == "training"
25
  Requires-Dist: notebook>=7.5.5; extra == "training"
26
  Requires-Dist: matplotlib>=3.7.0; extra == "training"
 
 
3
  Version: 0.1.0
4
  Summary: Interactive SQL exploration RL environment for the OpenEnv Challenge
5
  Requires-Python: <3.13,>=3.11
6
+ License-File: LICENSE
7
  Requires-Dist: openenv-core[core]>=0.2.1
8
  Requires-Dist: pydantic>=2.0.0
9
  Requires-Dist: fastapi>=0.104.0
 
25
  Requires-Dist: accelerate>=0.34.0; extra == "training"
26
  Requires-Dist: notebook>=7.5.5; extra == "training"
27
  Requires-Dist: matplotlib>=3.7.0; extra == "training"
28
+ Dynamic: license-file
sql_env.egg-info/SOURCES.txt CHANGED
@@ -1,8 +1,5 @@
 
1
  README.md
2
- __init__.py
3
- client.py
4
- conftest.py
5
- models.py
6
  pyproject.toml
7
  ./__init__.py
8
  ./client.py
@@ -16,9 +13,9 @@ evaluation/oracle_policy.py
16
  evaluation/policies.py
17
  server/__init__.py
18
  server/app.py
 
19
  server/reward.py
20
  server/sql_environment.py
21
- server/test_sql_env.py
22
  server/verifier.py
23
  sql_env.egg-info/PKG-INFO
24
  sql_env.egg-info/SOURCES.txt
@@ -38,6 +35,5 @@ training/data_loading.py
38
  training/few_shot_examples.py
39
  training/notebook_pipeline.py
40
  training/prompts.py
41
- training/rewards.py
42
  training/trl_adapter.py
43
  training/visualization.py
 
1
+ LICENSE
2
  README.md
 
 
 
 
3
  pyproject.toml
4
  ./__init__.py
5
  ./client.py
 
13
  evaluation/policies.py
14
  server/__init__.py
15
  server/app.py
16
+ server/mock_tokenizer.py
17
  server/reward.py
18
  server/sql_environment.py
 
19
  server/verifier.py
20
  sql_env.egg-info/PKG-INFO
21
  sql_env.egg-info/SOURCES.txt
 
35
  training/few_shot_examples.py
36
  training/notebook_pipeline.py
37
  training/prompts.py
 
38
  training/trl_adapter.py
39
  training/visualization.py