---
title: SQL Data Analyst
emoji: 📊
colorFrom: gray
colorTo: green
sdk: docker
sdk_version: "0.1"
app_file: app.py
pinned: false
license: mit
---

An RL training environment where an AI agent learns to answer business intelligence questions by writing and executing SQL queries against a live database.

An RL training environment where an AI agent learns to answer business intelligence questions by writing and executing SQL queries against a live database.

## Motivation

Data analysts spend significant time translating business questions into SQL queries. This environment trains agents to do exactly that — iteratively exploring a database schema, writing queries, observing results, and submitting final answers.

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/ -v
```

## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `schema_summary` | string | Compact DB schema (one line per table) |
| `question` | string | Natural language business question |
| `last_query` | string \| null | Most recent SQL query |
| `last_result` | object \| null | Query result: columns, rows (max 50), error |
| `last_error` | string \| null | SQL error if last query failed |
| `step` | int | Current step number |
| `max_steps` | int | Episode step limit |
| `hints` | string[] | Progressive hints (revealed after step 5, 10, 15) |
| `done` | bool | Whether episode is complete |

## Action Space

Agent must submit exactly one of:

| Action | Type | Description |
|--------|------|-------------|
| `sql_query` | string | A SELECT or WITH SQL query to execute |
| `submit_answer` | string | Final answer — ends the episode |

## Tasks

| Task | Difficulty | Max Steps | Description |
|------|------------|-----------|--------------|
| `monthly_signups` | Easy | 10 | Count signups in the last 30 days |
| `top_revenue_category` | Medium | 15 | Find highest revenue product category in Q3 |
| `churn_analysis` | Hard | 20 | Find emails of users who churned after 3 purchases |

## Reward Function

Rewards are given at every step (not just episode end):

- `+0.15` — Query executes without error
- `+0.10` — Query references a relevant table
- `+0.05` — Result has at least one row
- `+0.05` — Result is a sensible size
- `-0.02` per step beyond step 3 (efficiency penalty)
- `-0.10` if agent repeats the same query 3+ times
- `+0.00–0.60` on final submission (task grader × 0.60)

## Usage

### Python API

```python
from env import SQLAnalystEnv, Action

env = SQLAnalystEnv(task_id="monthly_signups")
result = env.reset()
print(result.observation.question)

# Agent takes a step
result = env.step(Action(sql_query="SELECT COUNT(*) FROM users WHERE created_at >= DATE('now', '-30 days')"))
print(result.reward)
```

### FastAPI Server

```bash
python -m uvicorn env.server:app --host 0.0.0.0 --port 7860
```

REST endpoints:
- `POST /reset` — Reset environment
- `POST /step` — Execute action
- `POST /state` — Get current state
- `WebSocket /ws` — WebSocket for low-latency training

### Baseline Inference

```bash
export OPENAI_API_KEY=sk-...
python baseline/run_baseline.py
```

### Docker

```bash
docker build -t sql-analyst-env .
docker run -p 7860:7860 sql-analyst-env
```

## Tests

```bash
pytest tests/ -v
```

- `test_env.py` — OpenEnv contract tests
- `test_graders.py` — Task grader unit tests  
- `test_reward.py` — Reward calculator tests

**All 46 tests pass.**

## Baseline Scores

| Task | Score | Model |
|------|-------|-------|
| monthly_signups | ~0.85 | gpt-4o-mini |
| top_revenue_category | ~0.65 | gpt-4o-mini |
| churn_analysis | ~0.40 | gpt-4o-mini |
| **Average** | **~0.63** | gpt-4o-mini |

## File Structure

```
sql-data-analyst/
├── env/
│   ├── __init__.py
│   ├── models.py           # Pydantic models
│   ├── database.py         # SQLite + seeding
│   ├── environment.py      # Core environment
│   ├── reward.py           # Reward calculator
│   ├── utils.py            # Helpers
│   ├── server.py           # FastAPI server
│   └── tasks/
│       ├── __init__.py
│       ├── base.py
│       ├── easy.py
│       ├── medium.py
│       └── hard.py
├── baseline/
│   ├── __init__.py
│   ├── run_baseline.py
│   └── prompts.py
├── tests/
│   ├── __init__.py
│   ├── test_env.py
│   ├── test_graders.py
│   └── test_reward.py
├── openenv.yaml
├── Dockerfile
├── requirements.txt
└── README.md
```

## License

MIT