YashashMathur's picture
SQL Data Analyst OpenEnv - Initial commit
d103a0f verified
|
raw
history blame
4.22 kB

PRD: SQL Data Analyst Agent Environment (OpenEnv)1. Executive SummaryThe SQL Data Analyst Agent environment is a production-grade reinforcement learning (RL) space designed to train agents in autonomous data retrieval and analysis. Unlike toy simulations, this environment subjects agents to "messy" real-world database schemas and natural language business queries, requiring them to perform multi-step reasoning, join operations, and query optimization. Success is measured by the agent's ability to return correct data subsets through valid, efficient SQL.2. Core Specification & ArchitectureThe environment follows the 3-component pattern (Models, Client, Server) and the 3-method interface (reset, step, state) mandated by the OpenEnv specification.2.1 Technical StackFramework: OpenEnv v0.2.1+.Server: FastAPI with Uvicorn (WebSocket-enabled via /ws for low-latency training).Database: SQLite or DuckDB (container-local for zero network overhead).Isolation: Docker-based containerization for secure execution of arbitrary SQL.2.2 OpenEnv Interfacereset(task_id: str): Initializes a fresh instance of the "messy" database and returns the schema and business question.step(action: SQLAction): Executes the agent's SQL, captures the output/errors, and returns the next observation and reward.state(): Provides internal episode metadata, including episode_id and step_count, for debugging.3. Data Models (Type-Safe Contracts)All interactions are governed by Pydantic models to ensure schema enforcement and tool reliability.ModelFieldTypeDescriptionSQLActionsql_querystrThe SQL command to execute against the database.is_doneboolFlag to signal the agent has completed the task.SQLObservationschemaDictJSON representation of tables, columns, and types.last_resultListThe first $5$ rows of the previous query result.error_messageOptional[str]Raw SQL error trace if the query failed.step_historyList[str]The last $4$ actions taken to prevent infinite loops [Image 1].4. Multi-Level Task CurriculumThe environment implements a 3-tier curriculum with deterministic graders scoring from $0.0$ to $1.0$.Task 1: Warmup (Easy) - Fix Broken JoinScenario: A query uses a comma-separated cross-join causing a Cartesian product.Goal: Rewrite using INNER JOIN... ON.Grader: Binary match of the resulting dataset count.Task 2: Intermediate (Medium) - Category RevenueScenario: Calculate highest revenue in a specific quarter across messy product/sales tables.Goal: Use JOIN, SUM(), GROUP BY, and ORDER BY.Grader: $0.5$ for correct join + $0.5$ for matching final revenue value.Task 3: Advanced (Hard) - Churn Analysis & OptimizationScenario: Find users who churned after their 3rd purchase using subqueries or window functions.Goal: Optimize a slow, redundant query by removing DISTINCT and replacing LIKE with sargable predicates.Grader: $0.6$ for data accuracy + $0.4$ for reducing query execution cost.5. Reward Design (Partial Progress)To avoid sparse reward pitfalls, the environment provides dense feedback via shaped signals.The total step reward $R_{step}$ is calculated as:$$R_{step} = \text{Delta_Reward} + \text{Invalid_Penalty} + \text{Efficiency_Penalty}$$Delta Reward: $+0.0–0.50 \times \Delta \text{grader_score}$. Positive signal when the agent's SQL results move closer to the ground truth.Completion Bonus: $+0.50$ when is_done=True and the grader score is $\geq 0.80$.Invalid Penalty: $-0.10$ for unparseable queries or SQL syntax errors to discourage brute-forcing.Efficiency Penalty: $-0.02$ per step after the episode midpoint to encourage concise solutions.6. Implementation & Compliance ChecklistTo be eligible for the Meta Hackathon, the following technical requirements must be met :Infrastructure: Must run on $2$ vCPU, $8$GB RAM.Deployment: One-command push to Hugging Face Spaces via openenv push.Validation: Must pass openenv validate for spec compliance.Baseline (inference.py):Must use the OpenAI Client for all LLM calls.Runtime must be $< 20$ minutes for all three tasks.Must emit structured logs to stdout following the , , and format exactly as specified.Log TagRequired Fieldstask_id, task_name, difficulty step_count, action, reward, done total_steps, final_reward, task_score