---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa
---

# Self-Improving SQL Agent

> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)

A SQL agent that gets better the more you use it. Ask questions in plain English — the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.

---

## What it does

1. **Natural language → SQL** — type a question, get a query
2. **Self-repair loop** — if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
3. **Reinforcement learning** — a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
4. **Prompt evolution (GEPA)** — every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
5. **Connect your own DB** — drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically

---

## Quickstart

### Run locally

```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token                 # required — no default
export API_BASE_URL=https://router.huggingface.co/v1   # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```

Open [http://localhost:8000](http://localhost:8000).

### Run with Docker

```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent
```

### Environment variables

| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | — | **Yes** |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |

---

## Using the UI

### Chat tab

Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.

- **Correct / Wrong buttons** — rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** — re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach

### ER Diagram tab

Visual schema explorer showing all tables, columns, and foreign key relationships.

### Benchmark tab *(built-in DB only)*

Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.

### Right sidebar — System Prompt & GEPA

See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 · optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
### Connect your own database

Click **Connect DB** in the top-right:

- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.

---

## OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start a new episode, returns `Observation` |
| `/step` | POST | Execute one repair action, returns `{observation, reward}` |
| `/state` | GET | Current episode state |
| `/env/tasks` | GET | List all tasks and questions |
| `/env/info` | GET | Environment metadata (action/observation space) |

**Stdout** emits structured logs for each episode:

```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}
```

**Action space** — the initial `generate` action plus 8 discrete repair strategies: `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`

---

## Architecture

```
frontend/                 React + Vite (served as static files by FastAPI)
backend/
  main.py                 FastAPI entry point
  api/
    demo.py               SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py            OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py            SQLAgentEnv — episode management, LLM calls
    database.py           SQLite + PostgreSQL abstraction
    tasks.py              Benchmark task definitions and grader
  rl/
    types.py              RepairAction enum, RLState, featurize()
    bandit.py             LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py             Shaped reward function
  gepa/
    optimizer.py          GEPA: reflect → mutate → score → pareto front
```

---

## Background

> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent), which was initially built in TypeScript before it became clear the submission had to follow a specific format.

The original explored three research ideas in a Next.js stack, started about a week before the submission deadline. When it emerged that the submission required a Python OpenEnv environment, the whole stack was migrated.

1. **Self-debug loop** — the agent critiques and fixes its own SQL errors without human intervention
2. **GEPA prompt evolution** — after user feedback, an LLM reflects on failures and evolves the system prompt
3. **Mini-RL environment** — a LinUCB contextual bandit learns which repair strategy works best for each error class

### Key differences from the original

| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn — full failure history passed to each retry |
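The LinUCB contextual bandit named in idea 3 can be sketched as one ridge-regression model per arm, where each arm is a repair strategy and the context is a featurized error. This is a minimal textbook sketch under assumed arm names and feature size, not the repo's actual `rl/bandit.py`:

```python
import numpy as np

class LinUCB:
    """Sketch of LinUCB: per-arm ridge regression with an exploration bonus.

    Arm names, feature dimension, and alpha are illustrative assumptions.
    """

    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.alpha = alpha
        # Per-arm statistics: A = I + sum(x x^T), b = sum(reward * x)
        self.A = {a: np.eye(dim) for a in self.arms}
        self.b = {a: np.zeros(dim) for a in self.arms}

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        x = np.asarray(x, dtype=float)
        best_arm, best_ucb = None, -np.inf
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]       # ridge estimate of the arm's payoff
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best_arm, best_ucb = a, ucb
        return best_arm

    def update(self, arm, x, reward):
        """Fold the observed (shaped) reward into the chosen arm's statistics."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical usage: context from featurize(error), reward from the grader
bandit = LinUCB(arms=["fix_column", "fix_syntax", "rewrite_full"], dim=4)
ctx = [1.0, 0.0, 0.0, 1.0]              # e.g. one-hot-ish error-class features
arm = bandit.select(ctx)
bandit.update(arm, ctx, reward=0.8)
```

The exploration bonus `alpha * sqrt(x @ A_inv @ x)` is what lets the agent keep trying rarely-used repair strategies for an error class until it has enough evidence to commit to the best one.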