SQL_debug_env_v1 / README.md
sai1912's picture
Upload folder using huggingface_hub
8ea9b64 verified
metadata
title: SQL Debug RL Environment
emoji: πŸ—„οΈ
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
short_description: SQL RLVE β€” 7 tasks with live DuckDB verification
tags:
  - reinforcement-learning
  - sql
  - duckdb
  - data-engineering
  - openenv
  - rlve
  - agent
license: apache-2.0

πŸ—„οΈ SQL Debug Environment (OpenEnv)

An execution-based Reinforcement Learning Sandbox for Data Engineering AI Models

OpenEnv Standard DuckDB Built Python 3.10+


πŸ“Œ The Problem

Traditional Large Language Models (LLMs) are primarily trained on static datasets to imitate code syntax. While they can often produce code that looks right, they frequently hallucinate logic or fail on semantic edge cases in rigorous data tasks like SQL generation and ETL pipelines.

When a model generates a bad SQL query during standard training, the pipeline only knows if it's an exact string match to an answer key. This is a fundamentally flawed signal: many different SQL queries can yield the exact same correct data, and conversely, a completely wrong string could be functionally correct. AI models need verifiable, execution-based feedback loops to improve their logic.

πŸ’‘ The Solution

This project provides a state-of-the-art execution-based Reinforcement Learning (RL) environment built specifically for training AI agents on database operations and SQL debugging.

Instead of relying on static string matching, this environment wraps an ephemeral, in-memory DuckDB instance. When an AI agent submits a SQL script, the system:

  1. Dynamically generates mock tables, schemas, and live data in DuckDB.
  2. Sandboxes and executes the AI's generated SQL query natively.
  3. Performs structural AST validation and execution validation.
  4. Computes a continuous, dense fractional reward comparing the AI's output dataframe against the ground-truth dataframe down to the cell level.

This project strictly adheres to the OpenEnv Specifications, making it instantly compatible with agentic frameworks and standard RL algorithms (e.g., PPO or GRPO via HuggingFace's TRL).


πŸš€ QuickStart & Installation

1. Requirements

You will need Python 3.10+ installed on your system. It's recommended to use a virtual environment.

2. Setup the Environment

You can install dependencies using either pip or modern tools like uv:

# Clone the repository
git clone https://github.com/Sairishwanth89/sql-debug-env.git
cd sql-debug-env

# Install dependencies (DuckDB, FastAPI, Pandas, etc.)
pip install -e .

3. Initialize the Server

Since this is an OpenEnv server, you simply run it using uvicorn. This boots up the DuckDB evaluation engine and opens the REST endpoints.

uvicorn app:app --host 0.0.0.0 --port 7860

The server will be live at http://localhost:7860. You can test it by visiting the Swagger UI documentation at http://localhost:7860/docs.


πŸ—οΈ Project Architecture

sql_env/
β”œβ”€β”€ openenv.yaml               # πŸ”§ Manifest: Defines environment capabilities, tasks, and reward structure
β”œβ”€β”€ app.py                     # 🧠 Server: Core OpenEnv FastAPI application & DuckDB execution logic
β”œβ”€β”€ models.py                  # πŸ“¦ Schemas: Pydantic models for API interfaces (State, Reset, Step)
β”œβ”€β”€ client.py                  # 🀝 Client: Python wrapper to cleanly interact with the local environment
β”œβ”€β”€ inference.py               # πŸ€– Agent Loop: Example of an AI agent "playing" the environment
β”œβ”€β”€ train_grpo.py              # πŸ“ˆ Training: Example of hooking the env into RL algorithms (TRL/GRPO)
β”œβ”€β”€ pyproject.toml / uv.lock   # βš™οΈ Config: Modern Python packaging and strict dependency locking
β”œβ”€β”€ Dockerfile                 # 🐳 Deployment: Container configuration for production
β”œβ”€β”€ deploy_hf_space.md         # ☁️ Hugging Face Spaces deployment instructions
└── README.md                  # πŸ“– Documentation

🎯 Supported Tasks

The environment supports 7 tasks β€” 4 foundational and 3 advanced RLVE challenges. Initialize any task via POST /reset with the task_id.

Foundational Tasks

Task ID Difficulty Objective
task_1_easy Easy Fix a SQL query with a missing comma between column names.
task_2_medium Medium Add a missing GROUP BY clause to an aggregation query.
task_3_hard Hard Add PARTITION BY to a window function that ranks globally instead of per-department.
task_4_expert Expert Fix an invalid date literal (month 13) inside a CTE.

Advanced RLVE Tasks (Live DuckDB Verifier)

Task ID Difficulty Verifier Logic
task_5_optimization Advanced Rewrite a CROSS JOIN query to use INNER JOIN. Reward only if output matches baseline and EXPLAIN shows no CROSS_PRODUCT.
task_6_migration Advanced Normalize a denormalized messy_dump table into 3NF (users + orders). Destructive early DROP triggers -0.3 penalty and ends episode.
task_7_chaos Advanced Live ETL corrupts data every step (duplicate IDs + NULL emails). Apply a patch and UNIQUE index before the pipeline contaminates the DB again.

πŸ† Dense Reward System and Anti-Cheating

To prevent the "sparse gradient" problem where RL agents receive flat zero-rewards until they randomly achieve perfection, we implement a dense multi-stepped reward function.

A maximum score is 1.0. Here is how an agent is graded (Tasks 1, 2, 4):

  • +0.10: Parser Validation - Did the SQL successfully parse via AST (no syntax errors)?
  • +0.20: Execution Validation - Did DuckDB successfully run the query against the schema?
  • +0.10: Column Accuracy - Do the returned columns match the expected datatypes and shape?
  • +0.30: Data Similarity (Jaccard) - Fractional reward given based on how closely the dataframe matches the ground-truth data.
  • +0.30: Exact Match Bonus - Strict cell-for-cell match.

πŸ›‘οΈ Penalties

The environment also automatically deducts points via server-side execution analysis to enforce best practices:

  • -0.10: Submitting a duplicate query already attempted in the episode.
  • -0.20: Efficiency penalties (excessive joins or full table scans).
  • -0.30: Destructive actions (DROP, DELETE clauses).
  • -0.50: Hardcoding values to bypass logic.