Spaces:
Running
Running
| name: sql-agent-openenv | |
| version: "1.0.0" | |
| description: > | |
| A SQL generation and repair environment where an AI agent learns to write | |
| correct SQL queries through a self-debugging loop powered by a LinUCB | |
| contextual bandit and GEPA prompt evolution. Models real-world data analyst | |
| workflows β querying databases with natural language, handling errors, and | |
| iteratively improving. | |
| author: sql-agent-openenv-team | |
| tags: | |
| - openenv | |
| - sql | |
| - rl | |
| - nlp | |
| - contextual-bandit | |
| # ββ Endpoints ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| api: | |
| reset: /reset | |
| step: /step | |
| state: /state | |
| # ββ Action Space βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| action_space: | |
| type: discrete | |
| n: 9 | |
| actions: | |
| - name: generate | |
| description: "Generate SQL from scratch (first attempt)" | |
| - name: rewrite_full | |
| description: "Completely rewrite the query from scratch" | |
| - name: fix_column | |
| description: "Fix wrong column name references using schema" | |
| - name: fix_table | |
| description: "Fix wrong table name references or JOIN structure" | |
| - name: add_groupby | |
| description: "Add or fix GROUP BY / aggregation clauses" | |
| - name: rewrite_cte | |
| description: "Restructure CTEs or subqueries" | |
| - name: fix_syntax | |
| description: "Fix syntax errors (brackets, commas, keywords)" | |
| - name: change_dialect | |
| description: "Convert to SQLite-compatible functions" | |
| - name: relax_filter | |
| description: "Broaden or remove overly strict WHERE conditions" | |
| # ββ Observation Space ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| observation_space: | |
| type: dict | |
| fields: | |
| - name: question | |
| type: string | |
| description: "Natural language question to answer with SQL" | |
| - name: schema_info | |
| type: string | |
| description: "Full database schema (tables, columns, types, FK relationships)" | |
| - name: current_sql | |
| type: string | |
| nullable: true | |
| description: "The SQL generated on the last attempt (null on first step)" | |
| - name: error_message | |
| type: string | |
| nullable: true | |
| description: "SQLite error message from the last attempt (null on success)" | |
| - name: error_class | |
| type: string | |
| nullable: true | |
| description: "Classified error type (e.g. no_such_column, syntax_error)" | |
| - name: attempt_number | |
| type: integer | |
| description: "Current attempt number (0 at reset, increments each step)" | |
| - name: max_attempts | |
| type: integer | |
| description: "Maximum allowed attempts per episode (5)" | |
| - name: task_id | |
| type: string | |
| description: "Active task identifier" | |
| - name: task_difficulty | |
| type: string | |
| description: "Task difficulty level: easy | medium | hard" | |
| # ββ Reward βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| reward: | |
| range: [0.05, 0.95] | |
| description: > | |
| Task score is the grader output clamped strictly inside (0, 1). Graders | |
| score partial progress (column correctness, row-count match) and apply | |
| attempt penalties for multi-step repair episodes. | |
| # ββ Tasks ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| tasks: | |
| - id: simple_queries | |
| name: Simple SQL Queries | |
| difficulty: easy | |
| description: > | |
| Single-table SELECT queries. Agent must retrieve correct rows by applying | |
| basic filters and projections on the marketplace database. | |
| question_count: 5 | |
| grader: > | |
| Checks that required output columns are present and row count falls | |
| within expected bounds. Attempt penalty not applied. | |
| - id: join_queries | |
| name: SQL Join Queries | |
| difficulty: medium | |
| description: > | |
| Multi-table JOIN queries with GROUP BY and aggregation. Agent must | |
| correctly join tables and compute aggregates over the marketplace data. | |
| question_count: 5 | |
| grader: > | |
| Correct columns + row count score multiplied by (1.0 - 0.1*(attempts-1)). | |
| Rewards efficient, first-try solutions. | |
| - id: complex_queries | |
| name: Complex SQL Queries | |
| difficulty: hard | |
| description: > | |
| Advanced queries using CTEs, window functions, nested aggregations, and | |
| multi-level joins. Requires precise SQLite syntax knowledge. | |
| question_count: 5 | |
| grader: > | |
| Strict correctness required. Score capped at 0.8 without first-attempt | |
| bonus. Attempt penalty of 0.1*(attempts-1) applied. Hard tasks genuinely | |
| challenge frontier models. | |
| # ββ Environment Metadata βββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| metadata: | |
| max_steps_per_episode: 5 | |
| database: SQLite (marketplace schema β users, products, orders, reviews, sellers) | |
| rl_algorithm: LinUCB contextual bandit (feature_dim=20, 8 repair actions) | |
| prompt_optimizer: GEPA (Generative Evolutionary Prompt Adaptation) | |
| runtime_estimate_minutes: 5 | |
| compute_requirements: | |
| vcpu: 2 | |
| memory_gb: 4 | |