---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa
---

# Self-Improving SQL Agent

> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)

A SQL agent that gets better the more you use it. Ask questions in plain English — the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.

---

## What it does

1. **Natural language → SQL** — type a question, get a query
2. **Self-repair loop** — if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
3. **Reinforcement learning** — a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
4. **Prompt evolution (GEPA)** — every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
5. **Connect your own DB** — drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically

---

## Quickstart

### Run locally

```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token                 # required — no default
export API_BASE_URL=https://router.huggingface.co/v1   # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```

Open [http://localhost:8000](http://localhost:8000).

### Run with Docker

```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent
```

### Environment variables

| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | — | **Yes** |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |

---

## Using the UI

### Chat tab

Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.

- **Correct / Wrong buttons** — rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** — re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach

### ER Diagram tab

Visual schema explorer showing all tables, columns, and foreign key relationships.

### Benchmark tab *(built-in DB only)*

Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.

### Right sidebar — System Prompt & GEPA

See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 · optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
### Connect your own database

Click **Connect DB** in the top-right:

- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.

---

## OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start a new episode, returns `Observation` |
| `/step` | POST | Execute one repair action, returns `{observation, reward}` |
| `/state` | GET | Current episode state |
| `/env/tasks` | GET | List all tasks and questions |
| `/env/info` | GET | Environment metadata (action/observation space) |

**Stdout** emits structured logs for each episode:

```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}
```

**Action space** — the initial `generate` action plus 8 discrete repair strategies: `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`

---

## Architecture

```
frontend/                 React + Vite (served as static files by FastAPI)
backend/
  main.py                 FastAPI entry point
  api/
    demo.py               SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py            OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py            SQLAgentEnv — episode management, LLM calls
    database.py           SQLite + PostgreSQL abstraction
    tasks.py              Benchmark task definitions and grader
  rl/
    types.py              RepairAction enum, RLState, featurize()
    bandit.py             LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py             Shaped reward function
  gepa/
    optimizer.py          GEPA: reflect → mutate → score → pareto front
```

---

## Background

> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent), which was initially built in TypeScript before it became clear the submission had to follow a specific format.

The original explored three research ideas in a Next.js stack, started about a week before the submission deadline. When it emerged that the submission required a Python OpenEnv environment, the whole stack was migrated.

1. **Self-debug loop** — the agent critiques and fixes its own SQL errors without human intervention
2. **GEPA prompt evolution** — after user feedback, an LLM reflects on failures and evolves the system prompt
3. **Mini-RL environment** — a LinUCB contextual bandit learns which repair strategy works best for each error class

### Key differences from the original

| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn — full failure history passed to each retry |
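The LinUCB contextual bandit named in idea 3 can be sketched as one ridge-regression model per arm, where each arm is a repair strategy and the context is a featurized error. This is a minimal textbook sketch under assumed arm names and feature size, not the repo's actual `rl/bandit.py`:

```python
import numpy as np

class LinUCB:
    """Sketch of LinUCB: per-arm ridge regression with an exploration bonus.

    Arm names, feature dimension, and alpha are illustrative assumptions.
    """

    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.alpha = alpha
        # Per-arm statistics: A = I + sum(x x^T), b = sum(reward * x)
        self.A = {a: np.eye(dim) for a in self.arms}
        self.b = {a: np.zeros(dim) for a in self.arms}

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        x = np.asarray(x, dtype=float)
        best_arm, best_ucb = None, -np.inf
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]       # ridge estimate of the arm's payoff
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best_arm, best_ucb = a, ucb
        return best_arm

    def update(self, arm, x, reward):
        """Fold the observed (shaped) reward into the chosen arm's statistics."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical usage: context from featurize(error), reward from the grader
bandit = LinUCB(arms=["fix_column", "fix_syntax", "rewrite_full"], dim=4)
ctx = [1.0, 0.0, 0.0, 1.0]              # e.g. one-hot-ish error-class features
arm = bandit.select(ctx)
bandit.update(arm, ctx, reward=0.8)
```

The exploration bonus `alpha * sqrt(x @ A_inv @ x)` is what lets the agent keep trying rarely-used repair strategies for an error class until it has enough evidence to commit to the best one.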