---
title: SQL Debug RL Environment
emoji: 🗄️
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
short_description: "SQL RLVE: 7 tasks with live DuckDB verification"
tags:
- reinforcement-learning
- sql
- duckdb
- data-engineering
- openenv
- rlve
- agent
license: apache-2.0
---
# 🗄️ SQL Debug Environment (OpenEnv)

*An execution-based Reinforcement Learning sandbox for data-engineering AI models*
## 🐛 The Problem
Traditional Large Language Models (LLMs) are primarily trained on static datasets to imitate code syntax. While they can often produce code that looks right, they frequently hallucinate logic or fail on semantic edge cases in rigorous data tasks like SQL generation and ETL pipelines.
When a model generates a bad SQL query during standard training, the pipeline only knows whether it is an exact string match to an answer key. This is a fundamentally flawed signal: many different SQL queries can yield exactly the same correct data, and conversely, a query that is nearly identical to the reference string can be semantically wrong. AI models need verifiable, execution-based feedback loops to improve their logic.
## 💡 The Solution
This project provides a state-of-the-art execution-based Reinforcement Learning (RL) environment built specifically for training AI agents on database operations and SQL debugging.
Instead of relying on static string matching, this environment wraps an ephemeral, in-memory DuckDB instance. When an AI agent submits a SQL script, the system:
- Dynamically generates mock tables, schemas, and live data in DuckDB.
- Sandboxes and executes the AI's generated SQL query natively.
- Performs structural AST validation and execution validation.
- Computes a continuous, dense fractional reward comparing the AI's output dataframe against the ground-truth dataframe down to the cell level.
This project adheres to the OpenEnv specification, making it directly compatible with agentic frameworks and standard RL algorithms (e.g., PPO or GRPO via Hugging Face's TRL).
## 🚀 QuickStart & Installation
### 1. Requirements
You will need Python 3.10+ installed on your system. It's recommended to use a virtual environment.
### 2. Set Up the Environment
You can install dependencies using either pip or modern tools like uv:
```bash
# Clone the repository
git clone https://github.com/Sairishwanth89/sql-debug-env.git
cd sql-debug-env

# Install dependencies (DuckDB, FastAPI, Pandas, etc.)
pip install -e .
```
### 3. Start the Server

Since this is an OpenEnv server, you simply run it with uvicorn. This boots the DuckDB evaluation engine and exposes the REST endpoints.
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
The server will be live at http://localhost:7860. You can test it by visiting the Swagger UI documentation at http://localhost:7860/docs.
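With the server up, an episode is driven through two REST calls. The sketch below builds those requests with the standard library; the payload field names (`task_id`, `action`) are assumptions based on this README, not a dump of `models.py`:

```python
# Hypothetical client sketch for the two OpenEnv endpoints described here:
# POST /reset (with a task_id) and POST /step (with the agent's SQL).
import json
import urllib.request

BASE = "http://localhost:7860"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the environment server."""
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call(path: str, payload: dict) -> dict:
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# With the uvicorn server from step 3 running, an episode looks like:
#   state  = call("/reset", {"task_id": "task_2_medium"})
#   result = call("/step",  {"action": "SELECT dept, COUNT(*) FROM emp GROUP BY dept"})
#   print(result["reward"])
req = build_request("/reset", {"task_id": "task_2_medium"})
print(req.full_url)  # http://localhost:7860/reset
```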
## 🏗️ Project Architecture
```
sql_env/
├── openenv.yaml              # 🔧 Manifest: defines environment capabilities, tasks, and reward structure
├── app.py                    # 🧠 Server: core OpenEnv FastAPI application & DuckDB execution logic
├── models.py                 # 📦 Schemas: Pydantic models for API interfaces (State, Reset, Step)
├── client.py                 # 🤖 Client: Python wrapper to cleanly interact with the local environment
├── inference.py              # 🤖 Agent loop: example of an AI agent "playing" the environment
├── train_grpo.py             # 📈 Training: example of hooking the env into RL algorithms (TRL/GRPO)
├── pyproject.toml / uv.lock  # ⚙️ Config: modern Python packaging and strict dependency locking
├── Dockerfile                # 🐳 Deployment: container configuration for production
├── deploy_hf_space.md        # ☁️ Hugging Face Spaces deployment instructions
└── README.md                 # 📘 Documentation
```
## 🎯 Supported Tasks

The environment supports 7 tasks: 4 foundational and 3 advanced RLVE challenges. Initialize any task via `POST /reset` with its `task_id`.
### Foundational Tasks
| Task ID | Difficulty | Objective |
|---|---|---|
| `task_1_easy` | Easy | Fix a SQL query with a missing comma between column names. |
| `task_2_medium` | Medium | Add a missing `GROUP BY` clause to an aggregation query. |
| `task_3_hard` | Hard | Add `PARTITION BY` to a window function that ranks globally instead of per department. |
| `task_4_expert` | Expert | Fix an invalid date literal (month 13) inside a CTE. |
### Advanced RLVE Tasks (Live DuckDB Verifier)
| Task ID | Difficulty | Verifier Logic |
|---|---|---|
| `task_5_optimization` | Advanced | Rewrite a `CROSS JOIN` query to use `INNER JOIN`. Reward only if the output matches the baseline and `EXPLAIN` shows no `CROSS_PRODUCT`. |
| `task_6_migration` | Advanced | Normalize a denormalized `messy_dump` table into 3NF (`users` + `orders`). A destructive early `DROP` triggers a -0.3 penalty and ends the episode. |
| `task_7_chaos` | Advanced | A live ETL process corrupts data every step (duplicate IDs + NULL emails). Apply a patch and a `UNIQUE` index before the pipeline contaminates the DB again. |
## 📊 Dense Reward System and Anti-Cheating
To prevent the "sparse gradient" problem, where RL agents receive flat zero rewards until they randomly achieve perfection, we implement a dense, multi-step reward function.
The maximum score is 1.0. Here is how an agent is graded (tasks 1, 2, and 4):

- **+0.10 Parser Validation:** did the SQL successfully parse via AST (no syntax errors)?
- **+0.20 Execution Validation:** did DuckDB successfully run the query against the schema?
- **+0.10 Column Accuracy:** do the returned columns match the expected datatypes and shape?
- **+0.30 Data Similarity (Jaccard):** fractional reward based on how closely the dataframe matches the ground-truth data.
- **+0.30 Exact Match Bonus:** strict cell-for-cell match.
### 🛡️ Penalties
The environment also automatically deducts points via server-side execution analysis to enforce best practices:
- **-0.10:** submitting a duplicate query already attempted in the episode.
- **-0.20:** efficiency penalties (excessive joins or full table scans).
- **-0.30:** destructive actions (`DROP` / `DELETE` statements).
- **-0.50:** hardcoding values to bypass logic.
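As a deliberately naive sketch of the destructive-action check: the real environment performs server-side execution analysis, whereas a keyword scan like the one below would miss, say, `DROP` hidden inside a string literal. It is shown only to make the penalty mechanics concrete:

```python
# Illustrative-only penalty scan; patterns and weights follow the list above.
import re

PENALTIES = {
    r"\bDROP\b": -0.30,
    r"\bDELETE\b": -0.30,
}

def destructive_penalty(sql: str) -> float:
    """Sum the penalties triggered by destructive keywords in the query."""
    upper = sql.upper()
    return sum(p for pat, p in PENALTIES.items() if re.search(pat, upper))

print(destructive_penalty("DROP TABLE users"))     # -0.3
print(destructive_penalty("SELECT * FROM users"))  # 0
```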