metadata
title: SQL Arena
emoji: 🏟️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

SQL Arena - OpenEnv Environment

An interactive SQL query challenge environment where AI agents learn to write SQL by iteratively querying databases and receiving execution feedback with partial credit scoring.

Real-World Utility

Text-to-SQL is one of the most valuable capabilities for AI agents, and this environment targets it directly:

  • SQL is used daily by data analysts, business users, and developers
  • Each task evaluates reasoning, schema understanding, and query composition
  • The skills transfer directly to production AI assistants and copilots

Tasks

| Task | Difficulty | Description | Max Steps |
| --- | --- | --- | --- |
| basic_select | Easy | SELECT, WHERE, ORDER BY | 5 |
| join_aggregate | Medium | JOINs, GROUP BY, HAVING | 7 |
| complex_analysis | Hard | CTEs, window functions | 10 |

Each difficulty has 3 unique problems with deterministic grading.
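
One way deterministic grading could work is to run the submitted query and a reference query against the same SQLite database and compare the results. The sketch below assumes this approach; the `employees` rows, the reference query, and the `run_query` helper are hypothetical and are not taken from `graders.py` or `tasks.py`.

```python
import sqlite3

def run_query(conn, sql):
    """Execute a query and return (column_names, rows)."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

# Hypothetical in-memory database standing in for a task's dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ada', 95000), ('Bo', 70000), ('Cy', 88000);
""")

# Hypothetical reference solution and agent submission.
reference_sql = "SELECT name, salary FROM employees WHERE salary > 80000 ORDER BY salary DESC"
submitted_sql = "SELECT name, salary FROM employees WHERE salary >= 80001 ORDER BY salary DESC"

expected = run_query(conn, reference_sql)
actual = run_query(conn, submitted_sql)

# The same inputs always produce the same verdict, so grading is repeatable.
is_correct = expected == actual
```

Because both queries run against a fixed dataset, two differently written queries that return the same columns and rows receive the same verdict.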

Action Space

The agent sends a SQL query each step:

{"sql_query": "SELECT name, salary FROM employees WHERE salary > 80000"}
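
A minimal sketch of building that action payload in Python follows. The `SqlAction` class and `encode_action` helper are illustrative names, not the models defined in `models.py`; only the `sql_query` key comes from the example above.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SqlAction:
    # The only field shown in the README's action example.
    sql_query: str

def encode_action(sql: str) -> str:
    """Serialize a SQL string into the JSON action body."""
    if not sql.strip():
        # Assumption: an empty query is never a useful action.
        raise ValueError("sql_query must not be empty")
    return json.dumps(asdict(SqlAction(sql_query=sql)))

payload = encode_action("SELECT name, salary FROM employees WHERE salary > 80000")
```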

Observation Space

The agent receives back:

  • schema_description: Database schema text
  • question: Natural language question to answer
  • query_result: Result table from last query
  • error_message: Error if query failed
  • feedback: Scoring feedback with hints
  • expected_columns: Expected column names
  • attempts_remaining: Steps left
  • difficulty: Task difficulty level
  • task_id: Problem identifier
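
The fields above can be mirrored in a small client-side container. This is a sketch only: the field names come from the list, while the types, defaults, sample values, and the `parse_observation` helper are assumptions rather than the project's actual Pydantic models.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class Observation:
    # Field names follow the README; types and defaults are assumed.
    schema_description: str = ""
    question: str = ""
    query_result: Optional[str] = None
    error_message: Optional[str] = None
    feedback: str = ""
    expected_columns: List[str] = field(default_factory=list)
    attempts_remaining: int = 0
    difficulty: str = ""
    task_id: str = ""

def parse_observation(raw: dict) -> Observation:
    """Build an Observation from a decoded JSON dict, ignoring unknown keys."""
    known = {f.name for f in fields(Observation)}
    return Observation(**{k: v for k, v in raw.items() if k in known})

# Hypothetical observation after a failed first attempt.
sample = {
    "question": "List employees earning more than 80000.",
    "error_message": "no such column: wage",
    "expected_columns": ["name", "salary"],
    "attempts_remaining": 4,
    "difficulty": "easy",
    "extra_key": "ignored",
}
obs = parse_observation(sample)
```

An agent loop would typically check `error_message` first, then use `feedback` and `expected_columns` to revise the next query while `attempts_remaining` is above zero.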

Reward Function (0.0 to 1.0)

| Component | Weight | Description |
| --- | --- | --- |
| Execution | 0.10 | Query runs without error |
| Columns | 0.20 | Correct column names |
| Row Count | 0.20 | Correct number of rows |
| Values | 0.50 | Correct data values |
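
The weights sum to 1.0, so the total reward can be read as a weighted sum of per-component scores. The sketch below assumes each component yields a score between 0.0 and 1.0; how `graders.py` actually computes the component scores (for example, whether Values allows fractional credit) is not stated here, and `total_reward` is an illustrative name.

```python
# Weights taken from the reward table.
WEIGHTS = {"execution": 0.10, "columns": 0.20, "row_count": 0.20, "values": 0.50}

def total_reward(scores: dict) -> float:
    """Combine per-component scores (assumed to lie in [0, 1]) into one reward."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing components count as 0; out-of-range scores are clamped.
        score = min(1.0, max(0.0, scores.get(name, 0.0)))
        total += weight * score
    return round(total, 4)

# A query that runs, has the right columns and row count, but only half the values right.
example = total_reward({"execution": 1.0, "columns": 1.0, "row_count": 1.0, "values": 0.5})
```

Under these assumptions the example evaluates to 0.75: full credit for the first three components (0.50 in total) plus half of the 0.50 Values weight.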

Setup

pip install -r requirements.txt

Run Server

uvicorn src.sql_arena.server:app --host 0.0.0.0 --port 7860

Run Inference

set HF_TOKEN=your_token
python inference.py

(The set syntax is for Windows cmd; on Linux or macOS use export HF_TOKEN=your_token.)

Docker

docker build -t sql-arena .
docker run -p 7860:7860 sql-arena

Run Tests

pytest tests/ -v

Project Structure

sql_arena/

  • openenv.yaml (Environment metadata)
  • Dockerfile (Container deployment)
  • inference.py (Baseline inference script)
  • src/sql_arena/
    • models.py (Typed Pydantic models)
    • environment.py (Core environment logic)
    • tasks.py (9 SQL challenges)
    • graders.py (Partial credit scoring)
    • database.py (SQLite management)
    • server.py (FastAPI server)
  • tests/
    • test_env.py (Test suite)

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /reset | Start new episode |
| POST | /step | Submit SQL query |
| GET | /state | Get current state |
| GET | /tasks | List available tasks |
| WS | /ws | WebSocket sessions |
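
A minimal standard-library client for the HTTP endpoints might look like the sketch below. It assumes the server is running locally on port 7860 as in the Run Server command, that `/reset` accepts an empty JSON body, and that `/step` takes the action JSON shown under Action Space. The request and response schemas are not documented here, so treat the payloads as placeholders; `build_request` and `call` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local server address

def build_request(method, path, payload=None):
    """Create a urllib Request; JSON-encode the payload when one is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)

def call(method, path, payload=None):
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Requests are only constructed here, not sent.
reset_req = build_request("POST", "/reset", {})  # assumed empty body
step_req = build_request("POST", "/step", {"sql_query": "SELECT 1"})
tasks_req = build_request("GET", "/tasks")
```

With the server running, `call("POST", "/reset", {})` followed by repeated `call("POST", "/step", {...})` would drive one episode over HTTP.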

License

MIT