--- title: SQL Arena emoji: 🏟️ colorFrom: blue colorTo: purple sdk: docker pinned: false --- # SQL Arena - OpenEnv Environment An interactive SQL query challenge environment where AI agents learn to write SQL by iteratively querying databases and receiving execution feedback with partial credit scoring. ## Real-World Utility Text-to-SQL is one of the most valuable capabilities for AI agents: - Used by data analysts, business users, and developers daily - Evaluates reasoning, schema understanding, and query composition - Directly applicable to production AI assistants and copilots ## Tasks | Task | Difficulty | Description | Max Steps | |------|-----------|-------------|-----------| | basic_select | Easy | SELECT, WHERE, ORDER BY | 5 | | join_aggregate | Medium | JOINs, GROUP BY, HAVING | 7 | | complex_analysis | Hard | CTEs, window functions | 10 | Each difficulty has 3 unique problems with deterministic grading. ## Action Space The agent sends a SQL query each step: {"sql_query": "SELECT name, salary FROM employees WHERE salary > 80000"} ## Observation Space The agent receives back: - schema_description: Database schema text - question: Natural language question to answer - query_result: Result table from last query - error_message: Error if query failed - feedback: Scoring feedback with hints - expected_columns: Expected column names - attempts_remaining: Steps left - difficulty: Task difficulty level - task_id: Problem identifier ## Reward Function (0.0 to 1.0) | Component | Weight | Description | |-----------|--------|-------------| | Execution | 0.10 | Query runs without error | | Columns | 0.20 | Correct column names | | Row Count | 0.20 | Correct number of rows | | Values | 0.50 | Correct data values | ## Setup pip install -r requirements.txt ## Run Server uvicorn src.sql_arena.server:app --host 0.0.0.0 --port 7860 ## Run Inference set HF_TOKEN=your_token python inference.py ## Docker docker build -t sql-arena . docker run -p 7860:7860 sql-arena ## Run Tests pytest tests/ -v ## Project Structure sql_arena/ - openenv.yaml (Environment metadata) - Dockerfile (Container deployment) - inference.py (Baseline inference script) - src/sql_arena/ - models.py (Typed Pydantic models) - environment.py (Core environment logic) - tasks.py (9 SQL challenges) - graders.py (Partial credit scoring) - database.py (SQLite management) - server.py (FastAPI server) - tests/ - test_env.py (Test suite) ## API Endpoints | Method | Endpoint | Description | |--------|----------|-------------| | POST | /reset | Start new episode | | POST | /step | Submit SQL query | | GET | /state | Get current state | | GET | /tasks | List available tasks | | WS | /ws | WebSocket sessions | ## License MIT