Spaces:

ar9av
/

sql-agent-openenv

Running

App Files Files Community

sql-agent-openenv / README.md

ar9avg

fix

c293dc3 6 days ago

preview code

raw

history blame contribute delete

7.14 kB

metadata

title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa

Self-Improving SQL Agent

Live demo: huggingface.co/spaces/ar9av/sql-agent-openenv GitHub: Ar9av/sql-agent-openenv

A SQL agent that gets better the more you use it. Ask questions in plain English — the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.

What it does

Natural language → SQL — type a question, get a query
Self-repair loop — if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
Reinforcement learning — a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
Prompt evolution (GEPA) — every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
Connect your own DB — drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically

Quickstart

Run locally

# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token        # required — no default
export API_BASE_URL=https://router.huggingface.co/v1   # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000.

Run with Docker

docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent

Environment variables

Variable	Default	Required
`HF_TOKEN`	—	Yes
`API_BASE_URL`	`https://router.huggingface.co/v1`	No
`MODEL_NAME`	`Qwen/Qwen2.5-72B-Instruct`	No
`GEPA_OPTIMIZE_EVERY`	`4`	No
`DATA_DIR`	`./data`	No

Using the UI

Chat tab

Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.

Correct / Wrong buttons — rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
Retry differently — re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach

ER Diagram tab

Visual schema explorer showing all tables, columns, and foreign key relationships.

Benchmark tab (built-in DB only)

Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.

Right sidebar — System Prompt & GEPA

See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. 2/4 · optimizes every 4 queries). After each cycle the prompt is rewritten and the generation badge updates.

Connect your own database

Click Connect DB in the top-right:

SQLite: /path/to/database.db or :memory:
PostgreSQL: postgresql://user:password@host:5432/dbname

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.

OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

Endpoint	Method	Description
`POST /reset`	—	Start a new episode, returns `Observation`
`POST /step`	—	Execute one repair action, returns `{observation, reward}`
`GET /state`	—	Current episode state
`GET /env/tasks`	—	List all tasks and questions
`GET /env/info`	—	Environment metadata (action/observation space)

Stdout emits structured logs for each episode:

[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}

Action space — 8 discrete repair strategies: generate, rewrite_full, fix_column, fix_table, add_groupby, rewrite_cte, fix_syntax, change_dialect, relax_filter

Architecture

frontend/          React + Vite (served as static files by FastAPI)
backend/
  main.py          FastAPI entry point
  api/
    demo.py        SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py     OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py     SQLAgentEnv — episode management, LLM calls
    database.py    SQLite + PostgreSQL abstraction
    tasks.py       Benchmark task definitions and grader
  rl/
    types.py       RepairAction enum, RLState, featurize()
    bandit.py      LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py      Shaped reward function
  gepa/
    optimizer.py   GEPA: reflect → mutate → score → pareto front

Background

Origin: This is a port of gepa-tuned-sql-agent initally was trying to make it in typescript later to realise we need submission in specific format

The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.

Self-debug loop — the agent critiques and fixes its own SQL errors without human intervention
GEPA prompt evolution — after user feedback, an LLM reflects on failures and evolves the system prompt
Mini-RL environment — a LinUCB contextual bandit learns which repair strategy works best for each error class

Key differences from the original

	gepa-tuned-sql-agent	Self-Improving SQL Agent (this repo)
Backend	Next.js API routes (TypeScript)	FastAPI (Python)
Frontend	Next.js pages	React + Vite (static, served by FastAPI)
LLM	Azure OpenAI	HF Router (Qwen 2.5-72B)
Deployment	Vercel / local	Hugging Face Spaces (Docker)
DB support	SQLite, PostgreSQL, MySQL	SQLite file + PostgreSQL DSN
Repair context	Single-shot per attempt	Multi-turn — full failure history passed to each retry