Spaces:

sai1912
/

SQL_debug_env_v1

Sleeping

App Files Files Community

SQL_debug_env_v1 / README.md

sai1912

Upload folder using huggingface_hub

8ea9b64 verified 14 days ago

preview code

raw

history blame contribute delete

6.78 kB

metadata

title: SQL Debug RL Environment
emoji: 🗄️
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
short_description: SQL RLVE — 7 tasks with live DuckDB verification
tags:
  - reinforcement-learning
  - sql
  - duckdb
  - data-engineering
  - openenv
  - rlve
  - agent
license: apache-2.0

🗄️ SQL Debug Environment (OpenEnv)

An execution-based Reinforcement Learning Sandbox for Data Engineering AI Models

📌 The Problem

Traditional Large Language Models (LLMs) are primarily trained on static datasets to imitate code syntax. While they can often produce code that looks right, they frequently hallucinate logic or fail on semantic edge cases in rigorous data tasks like SQL generation and ETL pipelines.

When a model generates a bad SQL query during standard training, the pipeline only knows if it's an exact string match to an answer key. This is a fundamentally flawed signal: many different SQL queries can yield the exact same correct data, and conversely, a completely wrong string could be functionally correct. AI models need verifiable, execution-based feedback loops to improve their logic.

💡 The Solution

This project provides a state-of-the-art execution-based Reinforcement Learning (RL) environment built specifically for training AI agents on database operations and SQL debugging.

Instead of relying on static string matching, this environment wraps an ephemeral, in-memory DuckDB instance. When an AI agent submits a SQL script, the system:

Dynamically generates mock tables, schemas, and live data in DuckDB.
Sandboxes and executes the AI's generated SQL query natively.
Performs structural AST validation and execution validation.
Computes a continuous, dense fractional reward comparing the AI's output dataframe against the ground-truth dataframe down to the cell level.

This project strictly adheres to the OpenEnv Specifications, making it instantly compatible with agentic frameworks and standard RL algorithms (e.g., PPO or GRPO via HuggingFace's TRL).

🚀 QuickStart & Installation

1. Requirements

You will need Python 3.10+ installed on your system. It's recommended to use a virtual environment.

2. Setup the Environment

You can install dependencies using either pip or modern tools like uv:

# Clone the repository
git clone https://github.com/Sairishwanth89/sql-debug-env.git
cd sql-debug-env

# Install dependencies (DuckDB, FastAPI, Pandas, etc.)
pip install -e .

3. Initialize the Server

Since this is an OpenEnv server, you simply run it using uvicorn. This boots up the DuckDB evaluation engine and opens the REST endpoints.

uvicorn app:app --host 0.0.0.0 --port 7860

The server will be live at http://localhost:7860. You can test it by visiting the Swagger UI documentation at http://localhost:7860/docs.

🏗️ Project Architecture

sql_env/
├── openenv.yaml               # 🔧 Manifest: Defines environment capabilities, tasks, and reward structure
├── app.py                     # 🧠 Server: Core OpenEnv FastAPI application & DuckDB execution logic
├── models.py                  # 📦 Schemas: Pydantic models for API interfaces (State, Reset, Step)
├── client.py                  # 🤝 Client: Python wrapper to cleanly interact with the local environment
├── inference.py               # 🤖 Agent Loop: Example of an AI agent "playing" the environment
├── train_grpo.py              # 📈 Training: Example of hooking the env into RL algorithms (TRL/GRPO)
├── pyproject.toml / uv.lock   # ⚙️ Config: Modern Python packaging and strict dependency locking
├── Dockerfile                 # 🐳 Deployment: Container configuration for production
├── deploy_hf_space.md         # ☁️ Hugging Face Spaces deployment instructions
└── README.md                  # 📖 Documentation

🎯 Supported Tasks

The environment supports 7 tasks — 4 foundational and 3 advanced RLVE challenges. Initialize any task via POST /reset with the task_id.

Foundational Tasks

Task ID	Difficulty	Objective
`task_1_easy`	Easy	Fix a SQL query with a missing comma between column names.
`task_2_medium`	Medium	Add a missing `GROUP BY` clause to an aggregation query.
`task_3_hard`	Hard	Add `PARTITION BY` to a window function that ranks globally instead of per-department.
`task_4_expert`	Expert	Fix an invalid date literal (month 13) inside a CTE.

Advanced RLVE Tasks (Live DuckDB Verifier)

Task ID	Difficulty	Verifier Logic
`task_5_optimization`	Advanced	Rewrite a CROSS JOIN query to use `INNER JOIN`. Reward only if output matches baseline and `EXPLAIN` shows no `CROSS_PRODUCT`.
`task_6_migration`	Advanced	Normalize a denormalized `messy_dump` table into 3NF (`users` + `orders`). Destructive early `DROP` triggers -0.3 penalty and ends episode.
`task_7_chaos`	Advanced	Live ETL corrupts data every step (duplicate IDs + NULL emails). Apply a patch and `UNIQUE` index before the pipeline contaminates the DB again.

🏆 Dense Reward System and Anti-Cheating

To prevent the "sparse gradient" problem where RL agents receive flat zero-rewards until they randomly achieve perfection, we implement a dense multi-stepped reward function.

A maximum score is 1.0. Here is how an agent is graded (Tasks 1, 2, 4):

+0.10: Parser Validation - Did the SQL successfully parse via AST (no syntax errors)?
+0.20: Execution Validation - Did DuckDB successfully run the query against the schema?
+0.10: Column Accuracy - Do the returned columns match the expected datatypes and shape?
+0.30: Data Similarity (Jaccard) - Fractional reward given based on how closely the dataframe matches the ground-truth data.
+0.30: Exact Match Bonus - Strict cell-for-cell match.

🛡️ Penalties

The environment also automatically deducts points via server-side execution analysis to enforce best practices:

-0.10: Submitting a duplicate query already attempted in the episode.
-0.20: Efficiency penalties (excessive joins or full table scans).
-0.30: Destructive actions (DROP, DELETE clauses).
-0.50: Hardcoding values to bypass logic.