Spaces:
Sleeping
Sleeping
SQL Data Analyst β OpenEnv Environment
An RL training environment where an AI agent learns to answer business intelligence questions by writing and executing SQL queries against a live database.
Motivation
Data analysts spend significant time translating business questions into SQL queries. This environment trains agents to do exactly that β iteratively exploring a database schema, writing queries, observing results, and submitting final answers.
Quick Start
# Install dependencies
pip install -r requirements.txt
# Run tests
pytest tests/ -v
Observation Space
| Field | Type | Description |
|---|---|---|
schema_summary |
string | Compact DB schema (one line per table) |
question |
string | Natural language business question |
last_query |
string | null | Most recent SQL query |
last_result |
object | null | Query result: columns, rows (max 50), error |
last_error |
string | null | SQL error if last query failed |
step |
int | Current step number |
max_steps |
int | Episode step limit |
hints |
string[] | Progressive hints (revealed after step 5, 10, 15) |
done |
bool | Whether episode is complete |
Action Space
Agent must submit exactly one of:
| Action | Type | Description |
|---|---|---|
sql_query |
string | A SELECT or WITH SQL query to execute |
submit_answer |
string | Final answer β ends the episode |
Tasks
| Task | Difficulty | Max Steps | Description |
|---|---|---|---|
monthly_signups |
Easy | 10 | Count signups in the last 30 days |
top_revenue_category |
Medium | 15 | Find highest revenue product category in Q3 |
churn_analysis |
Hard | 20 | Find emails of users who churned after 3 purchases |
Reward Function
Rewards are given at every step (not just episode end):
+0.15β Query executes without error+0.10β Query references a relevant table+0.05β Result has at least one row+0.05β Result is a sensible size-0.02per step beyond step 3 (efficiency penalty)-0.10if agent repeats the same query 3+ times+0.00β0.60on final submission (task grader Γ 0.60)
Usage
Python API
from env import SQLAnalystEnv, Action
env = SQLAnalystEnv(task_id="monthly_signups")
result = env.reset()
print(result.observation.question)
# Agent takes a step
result = env.step(Action(sql_query="SELECT COUNT(*) FROM users WHERE created_at >= DATE('now', '-30 days')"))
print(result.reward)
FastAPI Server
python -m uvicorn env.server:app --host 0.0.0.0 --port 7860
REST endpoints:
POST /resetβ Reset environmentPOST /stepβ Execute actionPOST /stateβ Get current stateWebSocket /wsβ WebSocket for low-latency training
Baseline Inference
export OPENAI_API_KEY=sk-...
python baseline/run_baseline.py
Docker
docker build -t sql-analyst-env .
docker run -p 7860:7860 sql-analyst-env
Tests
pytest tests/ -v
test_env.pyβ OpenEnv contract teststest_graders.pyβ Task grader unit teststest_reward.pyβ Reward calculator tests
All 46 tests pass.
Baseline Scores
| Task | Score | Model |
|---|---|---|
| monthly_signups | ~0.85 | gpt-4o-mini |
| top_revenue_category | ~0.65 | gpt-4o-mini |
| churn_analysis | ~0.40 | gpt-4o-mini |
| Average | ~0.63 | gpt-4o-mini |
File Structure
sql-data-analyst/
βββ env/
β βββ __init__.py
β βββ models.py # Pydantic models
β βββ database.py # SQLite + seeding
β βββ environment.py # Core environment
β βββ reward.py # Reward calculator
β βββ utils.py # Helpers
β βββ server.py # FastAPI server
β βββ tasks/
β βββ __init__.py
β βββ base.py
β βββ easy.py
β βββ medium.py
β βββ hard.py
βββ baseline/
β βββ __init__.py
β βββ run_baseline.py
β βββ prompts.py
βββ tests/
β βββ __init__.py
β βββ test_env.py
β βββ test_graders.py
β βββ test_reward.py
βββ openenv.yaml
βββ Dockerfile
βββ requirements.txt
βββ README.md
License
MIT