Spaces:

ar9av
/

sql-agent-openenv

Sleeping

App Files Files Community

ar9avg commited on 9 days ago

Commit

9f7dd14

1 Parent(s): 965a112

fix

Browse files

Files changed (3) hide show

README.md +159 -31
backend/main.py +4 -3
frontend/src/components/Header.tsx +2 -2

README.md CHANGED Viewed

@@ -1,51 +1,179 @@
-# SQL Agent OpenEnv
-An OpenEnv-compliant RL environment for SQL generation, featuring:
-- **LinUCB contextual bandit** — selects repair strategies based on error context
-- **GEPA (Generative Evolutionary Prompt Adaptation)** — evolves the system prompt from failure patterns
-- **Multi-turn repair loop** — LLM receives full failure history so each retry learns from the previous error
-- **3 difficulty tiers** — easy / medium / hard benchmark tasks on a built-in e-commerce schema
-- **Shaped reward function** — success bonus, attempt penalty, error severity signal
-- **External database support** — connect any SQLite file or PostgreSQL database via connection string
-## Background
-The original [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) explored three research ideas in a Next.js stack which i started keeping this hackathon in mind 1 week back and relaising the submission criteria had to migrate to python:
-1. **Self-debug loop** — the agent critiques and fixes its own SQL errors without human intervention
-2. **GEPA prompt evolution** — after user feedback, an LLM reflects on failures and evolves the system prompt
-3. **Mini-RL environment** — a LinUCB contextual bandit learns which repair strategy works best for each error class
-This repo preserves all three ideas and adapts them to a Python backend so the environment can be packaged as a Docker container and hosted on Hugging Face Spaces.
-## Key differences from the original
-| | gepa-tuned-sql-agent | sql-agent-openenv (this repo) |
 |---|---|---|
-| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
-| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
-| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
-| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
-| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
 ## OpenEnv API
 | Endpoint | Method | Description |
 |---|---|---|
-| `/reset` | POST | Start a new episode |
-| `/step` | POST | Execute one repair action |
-| `/state` | GET | Current environment state |
-## Connect your own database
-In the UI click **Connect DB** (top-right) and enter:
-- **SQLite:** `/path/to/your/database.db` or `:memory:`
-- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`
-The agent auto-detects the dialect and adjusts its prompt accordingly.
-## Demo
-Click **Demo** in the top-right to watch the agent fail, self-repair via RL, then improve through two GEPA prompt-evolution cycles (42% → 91%).

+---
+title: Self-Improving SQL Agent
+emoji: 🧠
+colorFrom: blue
+colorTo: purple
+sdk: docker
+pinned: false
+tags:
+  - sql
+  - reinforcement-learning
+  - contextual-bandit
+  - linucb
+  - gepa
+---
+# Self-Improving SQL Agent
+> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
+> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)
+A SQL agent that gets better the more you use it. Ask questions in plain English — the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.
+---
+## What it does
+1. **Natural language → SQL** — type a question, get a query
+2. **Self-repair loop** — if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
+3. **Reinforcement learning** — a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
+4. **Prompt evolution (GEPA)** — every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
+5. **Connect your own DB** — drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically
+---
+## Quickstart
+### Run locally
+```bash
+# 1. Clone
+git clone https://github.com/Ar9av/sql-agent-openenv
+cd sql-agent-openenv
+# 2. Install backend dependencies
+cd backend
+pip install -r requirements.txt
+# 3. Set environment variables
+export HF_TOKEN=your_huggingface_token        # required — no default
+export API_BASE_URL=https://router.huggingface.co/v1   # optional
+export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional
+# 4. Build the frontend
+cd ../frontend
+npm install && npm run build
+# 5. Start the server
+cd ../backend
+uvicorn main:app --host 0.0.0.0 --port 8000
+```
+Open [http://localhost:8000](http://localhost:8000).
+### Run with Docker
+```bash
+docker build -t self-improving-sql-agent .
+docker run -p 7860:7860 \
+  -e HF_TOKEN=your_token \
+  self-improving-sql-agent
+```
+### Environment variables
+| Variable | Default | Required |
 |---|---|---|
+| `HF_TOKEN` | — | **Yes** |
+| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
+| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
+| `GEPA_OPTIMIZE_EVERY` | `4` | No |
+| `DATA_DIR` | `./data` | No |
+---
+## Using the UI
+### Chat tab
+Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.
+- **Correct / Wrong buttons** — rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
+- **Retry differently** — re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach
+### ER Diagram tab
+Visual schema explorer showing all tables, columns, and foreign key relationships.
+### Benchmark tab *(built-in DB only)*
+Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.
+### Right sidebar — System Prompt & GEPA
+See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 · optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
+### Connect your own database
+Click **Connect DB** in the top-right:
+- **SQLite:** `/path/to/database.db` or `:memory:`
+- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`
+The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.
+---
 ## OpenEnv API
+The environment exposes a standard OpenEnv interface for agent training:
 | Endpoint | Method | Description |
 |---|---|---|
+| `POST /reset` | — | Start a new episode, returns `Observation` |
+| `POST /step` | — | Execute one repair action, returns `{observation, reward}` |
+| `GET /state` | — | Current episode state |
+| `GET /env/tasks` | — | List all tasks and questions |
+| `GET /env/info` | — | Environment metadata (action/observation space) |
+**Stdout** emits structured logs for each episode:
+```
+[START] {"task_id": "...", "question": "...", "max_attempts": 5}
+[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
+[END]   {"success": true, "attempts": 1, "total_reward": 0.8}
+```
+**Action space** — 8 discrete repair strategies:
+`generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`
+---
+## Architecture
+```
+frontend/          React + Vite (served as static files by FastAPI)
+backend/
+  main.py          FastAPI entry point
+  api/
+    demo.py        SSE streaming endpoints (chat, benchmark, GEPA events)
+    openenv.py     OpenEnv spec routes (/reset, /step, /state)
+  env/
+    sql_env.py     SQLAgentEnv — episode management, LLM calls
+    database.py    SQLite + PostgreSQL abstraction
+    tasks.py       Benchmark task definitions and grader
+  rl/
+    types.py       RepairAction enum, RLState, featurize()
+    bandit.py      LinUCB contextual bandit
+    repair_strategies.py  8 repair prompt templates
+    grader.py      Shaped reward function
+  gepa/
+    optimizer.py   GEPA: reflect → mutate → score → pareto front
+```
+---
+## Background
+> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) — originally built as a TypeScript/Next.js application — rewritten in Python (FastAPI + React/Vite) to match the OpenEnv format and deploy on Hugging Face Spaces.
+The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.
+1. **Self-debug loop** — the agent critiques and fixes its own SQL errors without human intervention
+2. **GEPA prompt evolution** — after user feedback, an LLM reflects on failures and evolves the system prompt
+3. **Mini-RL environment** — a LinUCB contextual bandit learns which repair strategy works best for each error class
+### Key differences from the original
+| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
+|---|---|---|
+| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
+| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
+| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
+| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
+| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
+| Repair context | Single-shot per attempt | Multi-turn — full failure history passed to each retry |

backend/main.py CHANGED Viewed

@@ -28,10 +28,11 @@ from api.openenv import router as openenv_router, ResetRequest, StepRequest, env
 from env.database import ensure_seeded
 app = FastAPI(
-    title="SQL Agent OpenEnv",
     description=(
-        "A SQL generation environment powered by a LinUCB contextual bandit "
-        "and GEPA prompt evolution, built for the Meta + Hugging Face OpenEnv hackathon."
     ),
     version="1.0.0",
 )

 from env.database import ensure_seeded
 app = FastAPI(
+    title="Self-Improving SQL Agent",
     description=(
+        "A SQL generation environment that learns from its own mistakes. "
+        "Powered by a LinUCB contextual bandit for repair strategy selection "
+        "and GEPA prompt evolution for continuous self-improvement."
     ),
     version="1.0.0",
 )

frontend/src/components/Header.tsx CHANGED Viewed

@@ -37,10 +37,10 @@ export function Header({ onToggleLeft, onToggleRight, onDemo, onConnectDb }: Hea
         {/* Title */}
         <div>
           <h1 className="text-sm font-bold text-white tracking-tight leading-none">
-            SQL Agent OpenEnv
           </h1>
           <p className="text-[10px] text-gray-600 hidden sm:block mt-0.5">
-            Reinforcement Learning Environment
           </p>
         </div>
       </div>

         {/* Title */}
         <div>
           <h1 className="text-sm font-bold text-white tracking-tight leading-none">
+            Self-Improving SQL Agent
           </h1>
           <p className="text-[10px] text-gray-600 hidden sm:block mt-0.5">
+            LinUCB · GEPA · OpenEnv
           </p>
         </div>
       </div>