Spaces:
Sleeping
Sleeping
| title: SQL Data Analyst | |
| emoji: π | |
| colorFrom: gray | |
| colorTo: green | |
| sdk: docker | |
| sdk_version: "0.1" | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| An RL training environment where an AI agent learns to answer business intelligence questions by writing and executing SQL queries against a live database. | |
| An RL training environment where an AI agent learns to answer business intelligence questions by writing and executing SQL queries against a live database. | |
| ## Motivation | |
| Data analysts spend significant time translating business questions into SQL queries. This environment trains agents to do exactly that β iteratively exploring a database schema, writing queries, observing results, and submitting final answers. | |
| ## Quick Start | |
| ```bash | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Run tests | |
| pytest tests/ -v | |
| ``` | |
| ## Observation Space | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `schema_summary` | string | Compact DB schema (one line per table) | | |
| | `question` | string | Natural language business question | | |
| | `last_query` | string \| null | Most recent SQL query | | |
| | `last_result` | object \| null | Query result: columns, rows (max 50), error | | |
| | `last_error` | string \| null | SQL error if last query failed | | |
| | `step` | int | Current step number | | |
| | `max_steps` | int | Episode step limit | | |
| | `hints` | string[] | Progressive hints (revealed after step 5, 10, 15) | | |
| | `done` | bool | Whether episode is complete | | |
| ## Action Space | |
| Agent must submit exactly one of: | |
| | Action | Type | Description | | |
| |--------|------|-------------| | |
| | `sql_query` | string | A SELECT or WITH SQL query to execute | | |
| | `submit_answer` | string | Final answer β ends the episode | | |
| ## Tasks | |
| | Task | Difficulty | Max Steps | Description | | |
| |------|------------|-----------|--------------| | |
| | `monthly_signups` | Easy | 10 | Count signups in the last 30 days | | |
| | `top_revenue_category` | Medium | 15 | Find highest revenue product category in Q3 | | |
| | `churn_analysis` | Hard | 20 | Find emails of users who churned after 3 purchases | | |
| ## Reward Function | |
| Rewards are given at every step (not just episode end): | |
| - `+0.15` β Query executes without error | |
| - `+0.10` β Query references a relevant table | |
| - `+0.05` β Result has at least one row | |
| - `+0.05` β Result is a sensible size | |
| - `-0.02` per step beyond step 3 (efficiency penalty) | |
| - `-0.10` if agent repeats the same query 3+ times | |
| - `+0.00β0.60` on final submission (task grader Γ 0.60) | |
| ## Usage | |
| ### Python API | |
| ```python | |
| from env import SQLAnalystEnv, Action | |
| env = SQLAnalystEnv(task_id="monthly_signups") | |
| result = env.reset() | |
| print(result.observation.question) | |
| # Agent takes a step | |
| result = env.step(Action(sql_query="SELECT COUNT(*) FROM users WHERE created_at >= DATE('now', '-30 days')")) | |
| print(result.reward) | |
| ``` | |
| ### FastAPI Server | |
| ```bash | |
| python -m uvicorn env.server:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| REST endpoints: | |
| - `POST /reset` β Reset environment | |
| - `POST /step` β Execute action | |
| - `POST /state` β Get current state | |
| - `WebSocket /ws` β WebSocket for low-latency training | |
| ### Baseline Inference | |
| ```bash | |
| export OPENAI_API_KEY=sk-... | |
| python baseline/run_baseline.py | |
| ``` | |
| ### Docker | |
| ```bash | |
| docker build -t sql-analyst-env . | |
| docker run -p 7860:7860 sql-analyst-env | |
| ``` | |
| ## Tests | |
| ```bash | |
| pytest tests/ -v | |
| ``` | |
| - `test_env.py` β OpenEnv contract tests | |
| - `test_graders.py` β Task grader unit tests | |
| - `test_reward.py` β Reward calculator tests | |
| **All 46 tests pass.** | |
| ## Baseline Scores | |
| | Task | Score | Model | | |
| |------|-------|-------| | |
| | monthly_signups | ~0.85 | gpt-4o-mini | | |
| | top_revenue_category | ~0.65 | gpt-4o-mini | | |
| | churn_analysis | ~0.40 | gpt-4o-mini | | |
| | **Average** | **~0.63** | gpt-4o-mini | | |
| ## File Structure | |
| ``` | |
| sql-data-analyst/ | |
| βββ env/ | |
| β βββ __init__.py | |
| β βββ models.py # Pydantic models | |
| β βββ database.py # SQLite + seeding | |
| β βββ environment.py # Core environment | |
| β βββ reward.py # Reward calculator | |
| β βββ utils.py # Helpers | |
| β βββ server.py # FastAPI server | |
| β βββ tasks/ | |
| β βββ __init__.py | |
| β βββ base.py | |
| β βββ easy.py | |
| β βββ medium.py | |
| β βββ hard.py | |
| βββ baseline/ | |
| β βββ __init__.py | |
| β βββ run_baseline.py | |
| β βββ prompts.py | |
| βββ tests/ | |
| β βββ __init__.py | |
| β βββ test_env.py | |
| β βββ test_graders.py | |
| β βββ test_reward.py | |
| βββ openenv.yaml | |
| βββ Dockerfile | |
| βββ requirements.txt | |
| βββ README.md | |
| ``` | |
| ## License | |
| MIT |