Clarification Questions: F001 - Core Environment Loop
Generated: 2026-03-24
Research Summary: specs/F001-RESEARCH_SUMMARY.md
Status: Answered
Questions
| # | Category | Question | Default Assumption | Impact if Wrong | Answer |
|---|---|---|---|---|---|
| 1 | Dependencies | Research found no .sqlite database files anywhere in the repo, and download_spider_data.py only downloads question JSON (not databases). The ORM models in data/databases/models.py define the schema, but no data exists. Should we generate the SQLite database from the ORM models and seed it with synthetic data, or download the actual Spider SQLite databases from HuggingFace? | Generate from the ORM models using Base.metadata.create_all() and seed with minimal synthetic data (enough for the 53 questions to produce results). This avoids a new download dependency and keeps the repo self-contained. | High | Download the actual Spider SQLite databases; synthetic data won't match the gold SQL answers. Synthetic data generation is deferred to a separate future feature for robustness/metamorphic testing. |
| 2 | Scope | Research found that SQLObservation currently carries only messages and tokens, while the v1 spec (Section 2.2) and the commented-out fields in models.py (lines 88-103) define rich fields: question, schema_info, result, error, step_count, budget_remaining, action_history. Should F001 uncomment and populate the rich observation fields, or continue with messages-only? | Uncomment and populate the rich observation fields. This is what the v1 spec defines and what an RL agent needs for a clean state representation. Keep messages and tokens as well for backward compatibility. | High | Yes, uncomment and populate the rich observation fields. This matches the v1 spec and is what the reward system needs. |
| 3 | Scope | Research found that SQLAction.action_description is currently used for NL text (e.g., "show students table"), but the v1 spec (Section 2.2) defines a separate argument field for structured input (a table name or SQL string). Should we add an argument field to SQLAction, or repurpose action_description as the structured argument? | Repurpose action_description as the structured argument (table name for DESCRIBE/SAMPLE, SQL for QUERY, answer value for ANSWER). This avoids breaking the Pydantic model schema and the client serialization. Rename to argument only if a clean break is acceptable. | Medium: using action_description for structured data is semantically confusing but functionally correct. Choosing wrong means either a confusing API (if we keep the name) or a breaking change to the client and tests (if we rename); contained rework either way. | |
| 4 | Scope | Research found that message_to_action() and _detect_action_type() implement NL keyword-based action detection (lines 455-545). With structured actions, the agent sends action_type directly. These methods also append messages to history and tokenize, tightly coupling NL parsing with state management. Should we remove/deprecate these methods, or keep them as an alternative input path? | Remove _detect_action_type() entirely. If OpenEnv requires message_to_action(), refactor it into a thin adapter that extracts structured fields from the message without NL keyword detection; if OpenEnv does not require it, remove it too. | Low: purely internal code hygiene. The structured action path works regardless of whether these methods exist; easily changed in a follow-up. | |
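The default assumption in Question 1 can be sketched as follows. This is a minimal illustration, not the repo's actual code: the Student model, column set, and seed row are hypothetical stand-ins for whatever data/databases/models.py defines, and an in-memory engine is used here where the real feature would target a .sqlite file.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

# Hypothetical stand-in for the declarative Base and models
# defined in data/databases/models.py.
Base = declarative_base()

class Student(Base):
    __tablename__ = "students"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

# In-memory engine for this sketch; the real feature would use a
# file URL such as sqlite:///<path>.sqlite under the databases dir.
engine = create_engine("sqlite://")

# Materialize the schema from ORM metadata, then seed minimal rows.
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Student(id=1, name="Ada"))
    session.commit()
```

Since the question was ultimately answered with "download the real Spider databases," this generate-and-seed path would only be relevant for the deferred synthetic-data feature.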
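For Question 2, the rich observation might look like the Pydantic sketch below. The field names come from the v1 spec and the commented-out block in models.py (lines 88-103); every type, default, and the exact shape of messages/action_history are assumptions for illustration.

```python
from typing import Any, Optional
from pydantic import BaseModel, Field

class SQLObservation(BaseModel):
    # Existing fields, kept for backward compatibility.
    messages: list[dict] = Field(default_factory=list)
    tokens: list[int] = Field(default_factory=list)
    # Rich fields named in the v1 spec / models.py; types are assumed.
    question: str = ""
    schema_info: str = ""
    result: Optional[Any] = None
    error: Optional[str] = None
    step_count: int = 0
    budget_remaining: int = 0
    action_history: list[str] = Field(default_factory=list)

obs = SQLObservation(question="How many students enrolled?", step_count=1)
```

Keeping the legacy fields alongside the rich ones means existing clients that read only messages and tokens continue to work unchanged.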
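For Question 3, repurposing action_description as the structured argument might look like this. The ActionType enum values and the SQLAction shape are assumptions for the sketch, not the repo's actual definitions.

```python
from enum import Enum
from pydantic import BaseModel

class ActionType(str, Enum):
    DESCRIBE = "describe"
    SAMPLE = "sample"
    QUERY = "query"
    ANSWER = "answer"

class SQLAction(BaseModel):
    action_type: ActionType
    # Repurposed: carries the structured argument instead of NL text.
    # DESCRIBE/SAMPLE -> table name; QUERY -> SQL; ANSWER -> answer value.
    action_description: str

action = SQLAction(
    action_type=ActionType.QUERY,
    action_description="SELECT COUNT(*) FROM students",
)
```

The field name stays the same, so the Pydantic schema and client serialization are untouched; only its documented meaning changes.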
Categories
- Scope: What's in/out of the feature boundary
- Constraints: Technical, performance, or compatibility limits
- Edge Cases: Unusual inputs or states that need handling
- Priorities: What to optimize for when trade-offs arise
- Dependencies: External systems, libraries, or features required
Instructions for Human
- Answer any questions where the default assumption does not match your intent
- Leave blank to accept the default assumption
- Type "skip" to skip all questions and proceed with all defaults