# SQLEnv Project Meeting: Key Discussion Points
## Meeting Tomorrow: Short List of Focus Areas
Based on our goal to develop a Minimum Viable Product (MVP) quickly, focusing on data ingestion/curation and a simple end-to-end workflow, here are some questions to guide our discussion and help in creating actionable tickets/user stories:
### 1. Data Ingestion & Curation
* **Initial Dataset Strategy:**
* **What:** Sourcing/organizing the Spider dataset and selecting MVP questions.
* **Why:** Dataset quality directly affects both agent learning and the validity of our evaluation, and a well-curated set streamlines Phases 2 and 3.
* **Challenge:** Balancing diversity, difficulty, and efficient curation within MVP timeline.
* What is the team's preferred approach for acquiring and organizing the initial 50-100 Spider development set databases and their corresponding curated questions? Are there specific tools or processes we should leverage for this?
* How should we prioritize the selection of "easy," "medium," and "hard" questions to ensure a balanced and progressive learning curriculum for the agent in the MVP? Should we manually select for diversity, or randomly sample to ensure unbiased evaluation?
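To make the manual-vs-random question concrete, here is a minimal sketch of stratified sampling by difficulty. It assumes each curated question carries a `difficulty` tag (a hypothetical metadata field, not a settled schema) and uses a fixed seed so the MVP set is reproducible:

```python
import random
from collections import defaultdict

def stratified_sample(questions, per_tier, seed=0):
    """Sample a fixed number of questions from each difficulty tier.

    `questions` is a list of dicts; the `difficulty` key is an assumed
    metadata field ("easy" / "medium" / "hard"), not a confirmed schema.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible curriculum
    by_tier = defaultdict(list)
    for q in questions:
        by_tier[q["difficulty"]].append(q)
    sample = []
    for tier, count in per_tier.items():
        pool = by_tier.get(tier, [])
        sample.extend(rng.sample(pool, min(count, len(pool))))
    return sample

# Example: a 50-question MVP split weighted toward easier questions.
questions = (
    [{"id": i, "difficulty": "easy"} for i in range(40)]
    + [{"id": i, "difficulty": "medium"} for i in range(40, 70)]
    + [{"id": i, "difficulty": "hard"} for i in range(70, 90)]
)
mvp_set = stratified_sample(questions, {"easy": 25, "medium": 15, "hard": 10})
print(len(mvp_set))  # 50
```

Stratifying keeps the tier proportions under our control (supporting a progressive curriculum) while the random draw within each tier avoids hand-picking bias.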
* **Answer Verification & Metadata:**
* **What:** Robustly verifying agent answers across data types and defining essential question metadata.
* **Why:** Accurate verification is crucial for correct reward computation, and rich metadata supports future analysis.
* **Challenge:** Making the verification logic comprehensive and scalable while keeping the question JSON format lean for the MVP.
* Given the multiple answer types (integer, float, string, list, table) and the need for robust verification, are there any specific concerns or complexities we should address early in the implementation of the `verify_answer` function?
* Are there any additional metadata fields we should consider including in our question JSON format to facilitate future curriculum learning or analysis beyond what is currently proposed?
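As a starting point for that discussion, here is a hedged sketch of what `verify_answer` might look like (the real signature and semantics are still to be decided), along with an illustrative question record. It assumes lists are order-insensitive, tables compare as multisets of rows, and floats use an absolute tolerance:

```python
import math

def verify_answer(predicted, gold, answer_type, float_tol=1e-6):
    """Compare an agent's answer against the gold answer.

    Sketch only: assumes order-insensitive lists, tables as lists of
    rows compared as multisets of tuples, and tolerant float matching.
    """
    if answer_type == "integer":
        return int(predicted) == int(gold)
    if answer_type == "float":
        return math.isclose(float(predicted), float(gold), abs_tol=float_tol)
    if answer_type == "string":
        return str(predicted).strip().lower() == str(gold).strip().lower()
    if answer_type == "list":
        return sorted(map(str, predicted)) == sorted(map(str, gold))
    if answer_type == "table":
        norm = lambda t: sorted(tuple(map(str, row)) for row in t)
        return norm(predicted) == norm(gold)
    raise ValueError(f"unknown answer_type: {answer_type}")

# An illustrative lean question record; field names are proposals only.
question = {
    "id": "spider_dev_0042",          # hypothetical id scheme
    "db_id": "concert_singer",
    "question": "How many singers are there?",
    "gold_answer": 6,
    "answer_type": "integer",
    "difficulty": "easy",             # candidate field for curriculum learning
}
```

The early complexities to flag are exactly the normalization choices baked into this sketch: row ordering, string casing/whitespace, and float tolerance all change what counts as "correct."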
### 2. Simple End-to-End Workflow (Phases 1 & 2)
* **Environment Scaffolding (Phase 1):**
* **What:** Assigning ownership for initial OpenEnv setup, ensuring fundamental connectivity/responsiveness.
* **Why:** A stable environment foundation is critical for all subsequent database integration and training.
* **Challenge:** Quickly establishing a working Dockerized environment with reliable WebSocket communication.
* Who will take the lead on the initial OpenEnv environment setup, including running `openenv init sql_env`, customizing the Pydantic models (`SQLAction`, `SQLObservation`), and implementing the stub `reset()` and `step()` functions to achieve a passing `openenv validate`?
* What are our immediate priorities for establishing reliable WebSocket communication between the client and server components during this initial phase?
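To ground the ownership discussion, a minimal sketch of what the customized Pydantic models could look like. Every field name here is provisional, intended only to seed the design conversation:

```python
from typing import Literal, Optional
from pydantic import BaseModel

class SQLAction(BaseModel):
    """One agent action per step; all fields are provisional."""
    kind: Literal["DESCRIBE", "SAMPLE", "QUERY", "ANSWER"]
    sql: Optional[str] = None      # required when kind == "QUERY"
    table: Optional[str] = None    # required for DESCRIBE / SAMPLE
    answer: Optional[str] = None   # required when kind == "ANSWER"

class SQLObservation(BaseModel):
    """What the environment returns after each step; provisional."""
    result: str = ""               # serialized query output or schema text
    error: Optional[str] = None
    steps_remaining: int = 0
    done: bool = False
```

Keeping the models this small should make the stub `reset()`/`step()` pair trivial to implement and validate before any real database work lands.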
* **Core Loop Implementation (Phase 2):**
* **What:** Efficiently integrating SQLite databases and implementing the core agent actions to support a complete episode with a simple terminal reward.
* **Why:** This delivers a manually playable environment, a crucial MVP milestone for reward shaping and training.
* **Challenge:** Robust database interactions (loading, sandboxing) and accurate action handling within budget.
* For wiring up real SQLite databases and implementing the core actions (`DESCRIBE`, `SAMPLE`, `QUERY`, `ANSWER`), what are the critical dependencies or potential roadblocks we anticipate, and how can we proactively address them?
* To achieve a manually playable episode with terminal-only reward, what are the key components of the `SQLEnvironment` (e.g., SQLite loader, action handlers, budget tracking, sandboxing) that we should focus on first to demonstrate functionality quickly?
* How will we define and implement the "hardcoded cheat policy" to ensure a 100% success rate during manual testing, confirming the environment's basic functionality? While the primary goal is correctness (achieving the terminal reward), how might we later demonstrate a policy that *maximizes total reward* by strategically exploring before providing the correct answer, given the capped exploration rewards?
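The pieces above (SQLite loader, action handlers, budget tracking, terminal-only reward, cheat policy) can be sketched together in a toy stand-in for `SQLEnvironment`. All names, rewards, and the string-based interface are illustrative assumptions, and the sandboxing we will need is deliberately omitted here:

```python
import sqlite3

class MiniSQLEnv:
    """Toy stand-in for SQLEnvironment: in-memory SQLite, a step
    budget, and a terminal-only reward. Names/rewards illustrative."""

    def __init__(self, setup_sql, gold_answer, budget=10):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(setup_sql)
        self.gold_answer = gold_answer
        self.budget = budget
        self.done = False

    def step(self, kind, arg=None):
        """Return (observation, reward, done). Every call costs budget."""
        if self.done or self.budget <= 0:
            return "", 0.0, True
        self.budget -= 1
        if kind == "DESCRIBE":
            rows = self.conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
            return str(rows), 0.0, False
        if kind == "SAMPLE":
            # NOTE: no sandboxing/identifier validation in this sketch.
            rows = self.conn.execute(f"SELECT * FROM {arg} LIMIT 3").fetchall()
            return str(rows), 0.0, False
        if kind == "QUERY":
            try:
                rows = self.conn.execute(arg).fetchall()
                return str(rows), 0.0, False
            except sqlite3.Error as e:
                return f"error: {e}", 0.0, False
        if kind == "ANSWER":
            self.done = True
            reward = 1.0 if str(arg) == str(self.gold_answer) else 0.0
            return "", reward, True
        return "error: unknown action", 0.0, False

# "Hardcoded cheat policy": issue the known-good query, then answer.
env = MiniSQLEnv("CREATE TABLE t(x); INSERT INTO t VALUES (1),(2),(3);", 3)
obs, r, done = env.step("QUERY", "SELECT COUNT(*) FROM t")
obs, r, done = env.step("ANSWER", "3")
print(r, done)  # 1.0 True
```

A cheat policy of this shape (gold query, then `ANSWER`) gives a fast 100%-success smoke test; the later reward-maximizing variant would simply insert extra capped-reward exploration steps before the same terminal answer.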