sql-agent-openenv / README.md
ar9avg's picture
fix
c293dc3
metadata
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa

Self-Improving SQL Agent

Live demo: huggingface.co/spaces/ar9av/sql-agent-openenv GitHub: Ar9av/sql-agent-openenv

A SQL agent that gets better the more you use it. Ask questions in plain English β€” the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.


What it does

  1. Natural language β†’ SQL β€” type a question, get a query
  2. Self-repair loop β€” if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
  3. Reinforcement learning β€” a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
  4. Prompt evolution (GEPA) β€” every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
  5. Connect your own DB β€” drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically

Quickstart

Run locally

# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token        # required β€” no default
export API_BASE_URL=https://router.huggingface.co/v1   # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000.

Run with Docker

docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent

Environment variables

Variable Default Required
HF_TOKEN β€” Yes
API_BASE_URL https://router.huggingface.co/v1 No
MODEL_NAME Qwen/Qwen2.5-72B-Instruct No
GEPA_OPTIMIZE_EVERY 4 No
DATA_DIR ./data No

Using the UI

Chat tab

Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.

  • Correct / Wrong buttons β€” rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
  • Retry differently β€” re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach

ER Diagram tab

Visual schema explorer showing all tables, columns, and foreign key relationships.

Benchmark tab (built-in DB only)

Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.

Right sidebar β€” System Prompt & GEPA

See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. 2/4 Β· optimizes every 4 queries). After each cycle the prompt is rewritten and the generation badge updates.

Connect your own database

Click Connect DB in the top-right:

  • SQLite: /path/to/database.db or :memory:
  • PostgreSQL: postgresql://user:password@host:5432/dbname

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.


OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

Endpoint Method Description
POST /reset β€” Start a new episode, returns Observation
POST /step β€” Execute one repair action, returns {observation, reward}
GET /state β€” Current episode state
GET /env/tasks β€” List all tasks and questions
GET /env/info β€” Environment metadata (action/observation space)

Stdout emits structured logs for each episode:

[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}

Action space β€” 8 discrete repair strategies: generate, rewrite_full, fix_column, fix_table, add_groupby, rewrite_cte, fix_syntax, change_dialect, relax_filter


Architecture

frontend/          React + Vite (served as static files by FastAPI)
backend/
  main.py          FastAPI entry point
  api/
    demo.py        SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py     OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py     SQLAgentEnv β€” episode management, LLM calls
    database.py    SQLite + PostgreSQL abstraction
    tasks.py       Benchmark task definitions and grader
  rl/
    types.py       RepairAction enum, RLState, featurize()
    bandit.py      LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py      Shaped reward function
  gepa/
    optimizer.py   GEPA: reflect β†’ mutate β†’ score β†’ pareto front

Background

Origin: This is a port of gepa-tuned-sql-agent initally was trying to make it in typescript later to realise we need submission in specific format

The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.

  1. Self-debug loop β€” the agent critiques and fixes its own SQL errors without human intervention
  2. GEPA prompt evolution β€” after user feedback, an LLM reflects on failures and evolves the system prompt
  3. Mini-RL environment β€” a LinUCB contextual bandit learns which repair strategy works best for each error class

Key differences from the original

gepa-tuned-sql-agent Self-Improving SQL Agent (this repo)
Backend Next.js API routes (TypeScript) FastAPI (Python)
Frontend Next.js pages React + Vite (static, served by FastAPI)
LLM Azure OpenAI HF Router (Qwen 2.5-72B)
Deployment Vercel / local Hugging Face Spaces (Docker)
DB support SQLite, PostgreSQL, MySQL SQLite file + PostgreSQL DSN
Repair context Single-shot per attempt Multi-turn β€” full failure history passed to each retry