---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa
---
# Self-Improving SQL Agent
**Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv) · **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)
A SQL agent that gets better the more you use it. Ask questions in plain English; the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt-evolution cycle (GEPA) that makes the next attempt smarter.
## What it does
- **Natural language → SQL** – type a question, get a query
- **Self-repair loop** – if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
- **Reinforcement learning** – a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
- **Prompt evolution (GEPA)** – every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
- **Connect your own DB** – drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically
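The self-repair loop above can be sketched in a few lines. This is an illustrative sketch, not the actual backend code; `generate_sql`, `run_sql`, and `pick_strategy` are hypothetical stand-ins for the real LLM call, database executor, and bandit policy:

```python
MAX_ATTEMPTS = 5

def answer(question, generate_sql, run_sql, pick_strategy):
    """Ask, execute, and retry with a new strategy on failure (sketch)."""
    history = []  # every failed (sql, error) pair is carried into the next attempt
    for attempt in range(1, MAX_ATTEMPTS + 1):
        strategy = "generate" if attempt == 1 else pick_strategy(history)
        sql = generate_sql(question, strategy, history)
        try:
            return run_sql(sql)           # success: return the result rows
        except Exception as exc:          # failure: remember it and retry differently
            history.append((sql, str(exc)))
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts: {history[-1][1]}")
```

Passing the full `history` into every retry is what keeps the agent from repeating the same broken approach.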
## Quickstart

### Run locally
```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token                  # required, no default
export API_BASE_URL=https://router.huggingface.co/v1    # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct             # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```
Open http://localhost:8000.
### Run with Docker
```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent
```
## Environment variables

| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | (none) | Yes |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |
## Using the UI

### Chat tab
Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.
- **Correct / Wrong buttons** – rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** – re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach
### ER Diagram tab
Visual schema explorer showing all tables, columns, and foreign key relationships.
### Benchmark tab (built-in DB only)
Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.
### Right sidebar: System Prompt & GEPA

See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. 2/4 · optimizes every 4 queries). After each cycle the prompt is rewritten and the generation badge updates.
## Connect your own database
Click Connect DB in the top-right:
- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`
The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.
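The dialect check can be as simple as inspecting the connection string. A sketch, assuming only the two supported inputs above (the helper name and exact rules are illustrative, not the backend's actual code):

```python
def detect_dialect(conn: str) -> str:
    """Guess the SQL dialect from a connection string (sketch)."""
    if conn.startswith(("postgresql://", "postgres://")):
        return "postgresql"
    # Anything else (a file path or ":memory:") is treated as SQLite.
    return "sqlite"
```

The detected dialect then drives both the prompt (e.g. SQLite vs PostgreSQL syntax) and the introspection queries.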
## OpenEnv API
The environment exposes a standard OpenEnv interface for agent training:
| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start a new episode, returns `Observation` |
| `/step` | POST | Execute one repair action, returns `{observation, reward}` |
| `/state` | GET | Current episode state |
| `/env/tasks` | GET | List all tasks and questions |
| `/env/info` | GET | Environment metadata (action/observation space) |
Stdout emits structured logs for each episode:
```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}
```
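Each line is a bracketed tag followed by a JSON payload, so the stream is easy to consume programmatically:

```python
import json

def parse_episode_line(line: str):
    """Split a '[TAG] {...}' log line into (tag, payload dict)."""
    tag, _, payload = line.partition(" ")
    return tag.strip("[]"), json.loads(payload)
```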
Action space: the initial `generate` action plus 8 discrete repair strategies:

`generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`
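The bandit that chooses among these strategies can be sketched as a minimal LinUCB: one ridge-regression model per action, with an upper-confidence bonus for exploration. This is an illustrative sketch, not the actual `backend/rl/bandit.py`; the feature dimension, `alpha`, and the `featurize` comment are assumptions:

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB contextual bandit (sketch)."""

    def __init__(self, actions, dim, alpha=1.0):
        self.actions = actions
        self.alpha = alpha                            # exploration strength
        self.A = {a: np.eye(dim) for a in actions}    # I + sum of x x^T per action
        self.b = {a: np.zeros(dim) for a in actions}  # sum of reward * x per action

    def select(self, x):
        """Pick the action with the highest upper confidence bound for context x."""
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                 # per-action ridge estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.actions, key=ucb)

    def update(self, a, x, reward):
        """Fold the observed reward for action a in context x into its model."""
        self.A[a] += np.outer(x, x)
        self.b[a] += reward * x

# Example: features derived from the error class pick a repair strategy
bandit = LinUCB(["fix_column", "fix_syntax", "rewrite_full"], dim=4)
x = np.array([1.0, 0.0, 1.0, 0.0])   # e.g. featurize(error_class, attempt)
a = bandit.select(x)
bandit.update(a, x, reward=0.8)
```

Because the confidence bonus shrinks as an action accumulates data, the bandit explores untried repairs early and exploits the best-performing one per error class later.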
## Architecture

```
frontend/                 React + Vite (served as static files by FastAPI)
backend/
  main.py                 FastAPI entry point
  api/
    demo.py               SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py            OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py            SQLAgentEnv: episode management, LLM calls
    database.py           SQLite + PostgreSQL abstraction
    tasks.py              Benchmark task definitions and grader
  rl/
    types.py              RepairAction enum, RLState, featurize()
    bandit.py             LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py             Shaped reward function
  gepa/
    optimizer.py          GEPA: reflect → mutate → score → Pareto front
```
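The GEPA cycle follows a reflect → mutate → score shape. A simplified sketch: the real `gepa/optimizer.py` keeps a Pareto front of candidate prompts rather than a single best one, and `reflect`/`mutate`/`score` here are stand-ins for LLM calls and the benchmark grader:

```python
def gepa_cycle(prompt, failures, reflect, mutate, score, n_candidates=4):
    """One optimization cycle: reflect on failures, propose prompt
    mutations, and keep the best-scoring candidate (simplified sketch)."""
    lessons = reflect(failures)  # LLM summarizes what went wrong
    candidates = [prompt] + [mutate(prompt, lessons) for _ in range(n_candidates)]
    scored = [(score(p), p) for p in candidates]
    # Highest score wins; the current prompt survives ties because it is listed first.
    return max(scored, key=lambda t: t[0])[1]
```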
## Background
**Origin:** this repo is a port of gepa-tuned-sql-agent, which was initially written in TypeScript. The original explored three research ideas in a Next.js stack, started about a week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.
- **Self-debug loop** – the agent critiques and fixes its own SQL errors without human intervention
- **GEPA prompt evolution** – after user feedback, an LLM reflects on failures and evolves the system prompt
- **Mini-RL environment** – a LinUCB contextual bandit learns which repair strategy works best for each error class
### Key differences from the original

| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn: full failure history passed to each retry |