sql_env/specs/F001-CLARIFICATION_QUESTIONS.md

Clarification Questions: F001 - Core Environment Loop

Generated: 2026-03-24
Research Summary: specs/F001-RESEARCH_SUMMARY.md
Status: Answered


Questions

Question 1 (Dependencies)

  • Question: Research found no .sqlite database files anywhere in the repo; download_spider_data.py downloads only the question JSON, not the databases. The ORM models in data/databases/models.py define the schema, but no data exists. Should we generate the SQLite database from the ORM models and seed it with synthetic data, or download the actual Spider SQLite databases from HuggingFace?
  • Default Assumption: Generate the schema from the ORM models via Base.metadata.create_all() and seed minimal synthetic data (enough for the 53 questions to produce results). This avoids a new download dependency and keeps the repo self-contained.
  • Impact if Wrong: High
  • Answer: Download the actual Spider SQLite databases; synthetic data won't match the gold SQL answers. Synthetic data generation is deferred to a separate future feature for robustness/metamorphic testing.

Question 2 (Scope)

  • Question: Research found that SQLObservation currently carries only messages and tokens, while the v1 spec (Section 2.2) and the commented-out fields in models.py (lines 88-103) define rich fields: question, schema_info, result, error, step_count, budget_remaining, action_history. Should F001 uncomment and populate the rich observation fields, or continue with messages-only?
  • Default Assumption: Uncomment and populate the rich observation fields. This is what the v1 spec defines and what an RL agent needs for a clean state representation. Keep messages and tokens as well for backward compatibility.
  • Impact if Wrong: High
  • Answer: Yes, uncomment and populate the rich observation fields. This matches the v1 spec and is what the reward system needs.

Question 3 (Scope)

  • Question: Research found that SQLAction.action_description currently carries NL text (e.g., "show students table"), but the v1 spec (Section 2.2) defines a separate argument field for structured input (a table name or a SQL string). Should we add an argument field to SQLAction, or repurpose action_description as the structured argument?
  • Default Assumption: Repurpose action_description as the structured argument (table name for DESCRIBE/SAMPLE, SQL string for QUERY, answer value for ANSWER). This avoids breaking the Pydantic model schema and the client serialization. Rename it to argument only if a clean break is acceptable.
  • Impact if Wrong: Medium. Using action_description for structured data is semantically confusing but functionally correct. Choosing wrong means either a confusing API (if we keep the name) or a breaking change to the client and tests (if we rename). The rework is contained either way.

Question 4 (Scope)

  • Question: Research found that message_to_action() and _detect_action_type() implement NL keyword-based action detection (lines 455-545). With structured actions, the agent sends action_type directly. These methods also append messages to history and tokenize, tightly coupling NL parsing with state management. Should we remove/deprecate these methods, or keep them as an alternative input path?
  • Default Assumption: Remove _detect_action_type() entirely. If OpenEnv requires message_to_action(), refactor it into a thin adapter that extracts structured fields from the message without NL keyword detection; if OpenEnv does not require it, remove it too.
  • Impact if Wrong: Low. This is purely internal code hygiene; the structured action path works regardless of whether these methods exist, and it is easily changed in a follow-up.

Categories

  • Scope: What's in/out of the feature boundary
  • Constraints: Technical, performance, or compatibility limits
  • Edge Cases: Unusual inputs or states that need handling
  • Priorities: What to optimize for when trade-offs arise
  • Dependencies: External systems, libraries, or features required

Instructions for Human

  • Answer any questions where the default assumption does not match your intent
  • Leave blank to accept the default assumption
  • Type "skip" to skip all questions and proceed with all defaults