# Onboarding: `action-feature` Branch

> What the `action-feature` branch adds compared to `main`.
> Last updated: 2026-02-28
> Focus: Branch delta — new components, model changes, data flow, and gaps.

## What This Branch Does

The `action-feature` branch transforms SQLEnv from a **scaffold with well-designed Pydantic models** into a **partially working environment** with real action dispatch (describe/sample/query), Ollama-based SQL generation, a WebSocket client, SQLAlchemy ORM models for the `student_assessment` database, and Spider question data. It implements the core `message → action → step → observation` loop that the RL training pipeline will eventually drive.

---

## Branch Overview

```
action-feature: New/Changed Components

Training Code / Notebook
  ┌──────────────────────┐
  │ test_env.ipynb   NEW │  Interactive walkthrough (5 test cells)
  └──────────┬───────────┘
             │ imports
  ┌──────────▼───────────┐      ┌──────────────────────────┐
  │ client.py        NEW │─────▶│ models.py        CHANGED │
  │  SQLEnvClient        │      │  SQLAction (+ tokens,    │
  │  _step_payload()     │      │   action_desc)           │
  │  _parse_result()     │      │  SQLObservation          │
  │  _parse_state()      │      │   (messages + tokens)    │
  │  message_to_action() │      │  SQLState                │
  └──────────┬───────────┘      │   (history + tokens)     │
             │ WebSocket        └──────────────────────────┘
  ┌──────────▼───────────┐
  │ server/app.py    CHG │  FastAPI bootstrap + tokenizer factory
  │  create_sql_env()    │
  └──────────┬───────────┘
             │ creates
  ┌──────────▼──────────────────────────────────────────────┐
  │ server/sql_environment.py                           NEW │
  │  SQLEnvironment(Environment)                            │
  │   ├── reset() → clear state, return obs                 │
  │   ├── step(action) → dispatch on action_type            │
  │   │    ├── "describe" → Ollama selects table → ORM info │
  │   │    ├── "sample" → Ollama selects table → SQL gen    │
  │   │    └── "query" → Ollama generates SQL from NL       │
  │   ├── message_to_action() → detect type, tokenize       │
  │   └── _detect_action_type() → keyword classifier        │
  └──────────┬───────────────────────┬──────────────────────┘
             │ introspects           │ HTTP calls
  ┌──────────▼───────────┐     ┌────▼───────────────────┐
  │ data/databases/      │     │ Ollama (external)      │
  │  models.py       NEW │     │  /api/generate         │
  │  9 SQLAlchemy tables │     │  qwen2 (default)       │
  └──────────────────────┘     └────────────────────────┘

  ┌──────────────────────┐     ┌────────────────────────────────┐
  │ data/questions/      │     │ scripts/                   NEW │
  │  student_assessment  │     │  download_spider_data.py       │
  │  .json           NEW │     │  generate_models_from_schema.py│
  │  (30+ Q&A pairs)     │     └────────────────────────────────┘
  └──────────────────────┘

  ┌──────────────────────┐     ┌────────────────────────┐
  │ server/test_sql_env  │     │ server/install_deps.sh │
  │  .py  MockTokenizer  │     │  Docker setup      NEW │
  │                  NEW │     └────────────────────────┘
  └──────────────────────┘
```

---

## Files Changed/Added

| File | Status | Purpose |
|------|--------|---------|
| `envs/sql_env/models.py` | **Changed** | Rewired `SQLAction`, `SQLObservation`, `SQLState` for the message+token paradigm |
| `envs/sql_env/__init__.py` | **Changed** | Exports `SQLAction`, `SQLObservation`, `SQLState`; lazy client import |
| `envs/sql_env/client.py` | **New** | `SQLEnvClient(EnvClient)` — WebSocket client with tensor serialization |
| `envs/sql_env/server/sql_environment.py` | **New** | `SQLEnvironment(Environment)` — core environment logic (463 lines) |
| `envs/sql_env/server/app.py` | **Changed** | FastAPI bootstrap with tokenizer factory + MockTokenizer fallback |
| `envs/sql_env/server/__init__.py` | **Changed** | Exports `SQLEnvironment` |
| `envs/sql_env/server/test_sql_env.py` | **New** | `MockTokenizer` for testing without the `transformers` library |
| `envs/sql_env/server/install_deps.sh` | **New** | Docker setup script: pip install + pre-download GPT-2 tokenizer |
| `envs/sql_env/server/requirements.txt` | **New** | Server-side pip deps for Docker (fastapi, torch, transformers, etc.) |
| `envs/sql_env/data/databases/models.py` | **New** | SQLAlchemy ORM for the `student_assessment` DB (9 model classes) |
| `envs/sql_env/data/questions/student_assessment.json` | **New** | 30+ Spider questions with gold SQL, tokenized queries |
| `envs/sql_env/scripts/download_spider_data.py` | **New** | Downloads Spider questions from HuggingFace by `db_id` |
| `envs/sql_env/scripts/generate_models_from_schema.py` | **New** | Auto-generates SQLAlchemy models from the Spider schema dataset |
| `envs/sql_env/pyproject.toml` | **Changed** | Python constrained to `>=3.11,<3.13`; added `requests>=2.31.0` |
| `envs/sql_env/uv.lock` | **Changed** | Lock file updated for new dependencies |
| `README.md` | **Changed** | Added "Current Package State" section with pinned-dependency rationale |
| `envs/sql_env/server/environment.py` | **Emptied** | Replaced by `sql_environment.py` |
| `test_env.ipynb` | **New** | Jupyter notebook with 5 interactive test scenarios |

**Total:** 18 files changed, +5702 / -412 lines.

---

## Key Components Introduced

### 1. `SQLEnvironment` — `envs/sql_env/server/sql_environment.py`

The heart of the branch. Implements the OpenEnv `Environment` interface with three action types:

| Action Type | Dispatch Flow | Output |
|-------------|---------------|--------|
| `describe` | Ollama selects table → `_get_table_schema()` introspects the SQLAlchemy model | Column names + natural-language types |
| `sample` | Ollama selects table → `_generate_sample_query()` | `SELECT * FROM <table> LIMIT 10;` |
| `query` | `_call_ollama_for_sql()` sends NL + schema to Ollama | Generated SQL string |

Key methods:

- **`reset()`** — Clears conversation history, re-initializes the system prompt message + tokens. Returns the initial `SQLObservation`.
- **`step(action)`** — Dispatches on `action.action_type`. Appends the assistant response to `history_messages`, stores action tokens in `history_tokens`. Returns a flattened observation.
- **`message_to_action(message)`** — Server-side conversion of a `Message` dict → `SQLAction`. Detects the action type via keywords, appends the message to state history, tokenizes the full conversation.
- **`_detect_action_type(content)`** — Keyword classifier: checks for "describe"/"schema"/"columns" → `describe`, "sample"/"example"/"rows" → `sample`, default → `query`.
- **`_create_observation()`** — Builds a `SQLObservation` from the current state. Flattens all `history_tokens` into a single 1D tensor via `torch.cat`.
- **`_get_table_schema(table_name)`** — Introspects SQLAlchemy model columns, converts types to natural language.
- **`_call_ollama_for_sql(query)`** / **`_call_ollama_to_select_table(request)`** — HTTP POST to Ollama's `/api/generate`.

**Constructor params:** `tokenizer` (must have `apply_chat_template`), optional `system_prompt`, optional `transform`.

**Environment variables:** `OLLAMA_MODEL` (default: `qwen2`), `OLLAMA_BASE_URL` (default: `http://localhost:11434`).

### 2. `SQLEnvClient` — `envs/sql_env/client.py`

WebSocket client extending OpenEnv's `EnvClient[SQLAction, SQLObservation, SQLState]`. Handles tensor↔list serialization for JSON transport:

- **`_step_payload(action)`** — Converts `action.tokens` (a Tensor) to a Python list for JSON.
- **`_parse_result(payload)`** — Deserializes the response → `StepResult[SQLObservation]`, converting token lists back to tensors.
- **`_parse_state(payload)`** — Deserializes state → `SQLState` with tensor reconstruction.
- **`message_to_action(message, tokenizer, history_messages)`** — Client-side version of action creation (mirrors the server logic). Requires passing a tokenizer explicitly.

### 3. `server/app.py` — FastAPI Bootstrap

Changed from a stub to a working application:

- **`get_tokenizer()`** — Loads a HuggingFace tokenizer from the `TOKENIZER_NAME` env var (default: `mistralai/Mistral-7B-Instruct-v0.1`). Falls back to `MockTokenizer` from `test_sql_env.py` if `transformers` is not installed.
- **`create_sql_environment()`** — Factory function creating a `SQLEnvironment` per WebSocket session.
- **`app = create_app(create_sql_environment, SQLAction, SQLObservation, env_name="sql_env")`** — Wires up the WebSocket endpoints.

### 4. SQLAlchemy ORM — `envs/sql_env/data/databases/models.py`

9 model classes for the `student_assessment` database:

| Model | Table | Key Columns |
|-------|-------|-------------|
| `Address` | Addresses | address_id, line_1, city, country |
| `Person` | People | person_id, first_name, last_name, email_address |
| `Student` | Students | student_id, student_details |
| `Course` | Courses | course_id (String PK), course_name |
| `PersonAddress` | People_Addresses | person_id (FK), address_id (FK), date_from/to |
| `StudentCourseRegistration` | Student_Course_Registrations | student_id (FK), course_id (FK), registration_date |
| `StudentCourseAttendance` | Student_Course_Attendance | student_id (FK), course_id (FK), date_of_attendance |
| `Candidate` | Candidates | candidate_id, candidate_details |
| `CandidateAssessment` | Candidate_Assessments | candidate_id (FK), qualification, assessment_date |

All models include proper foreign-key relationships with `back_populates`.

### 5. Spider Question Data — `envs/sql_env/data/questions/student_assessment.json`

A 3,355-line JSON file containing 30+ question-answer pairs from the Spider dataset. Each entry includes:

- `db_id` — always `student_assessment`
- `question` — natural-language question (e.g., "which course has most number of registered students?")
- `query` — gold SQL (e.g., `SELECT T1.course_name FROM courses AS T1 JOIN student_course_registrations...`)
- `query_toks` / `query_toks_no_value` / `question_toks` — tokenized versions

### 6. Data Preparation Scripts — `envs/sql_env/scripts/`

- **`download_spider_data.py`** — CLI tool to download Spider questions from HuggingFace. Supports `--db-id` filtering and `--split` selection.
- **`generate_models_from_schema.py`** — Auto-generates SQLAlchemy ORM models from the `richardr1126/spider-schema` HuggingFace dataset. Maps Spider types to SQLAlchemy types, handles foreign keys.

### 7. `MockTokenizer` — `envs/sql_env/server/test_sql_env.py`

Deterministic tokenizer for testing without `transformers`:

- **`apply_chat_template()`** — Converts message text to token IDs via `ord(c) % 256`.
- **`decode()`** — Reverses the encoding back to characters.
- Imported by `app.py` as a fallback when `transformers` is not installed.
---

## Model Changes (Main → Action-Feature)

### `SQLAction`

| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `action_type` | `"DESCRIBE, SAMPLE, QUERY, ANSWER"` | `"describe, sample, query"` | Lowercase; ANSWER removed |
| `argument` | Table name / SQL / answer value | **Removed** | — |
| `action_description` | — | **Added**: description string | Replaces `argument` |
| `tokens` | — | **Added**: `torch.Tensor` | Tokenized conversation |

### `SQLObservation`

| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `question` | NL question string | **Commented out** | — |
| `schema_info` | DB schema description | **Commented out** | — |
| `result` | Last action result | **Commented out** | — |
| `error` | Error message | **Commented out** | — |
| `step_count` | Current step number | **Commented out** | — |
| `budget_remaining` | Steps left | **Commented out** | — |
| `action_history` | Summary of actions | **Commented out** | — |
| `messages` | — | **Added**: `list[Message]` | Full conversation history |
| `tokens` | — | **Added**: `torch.Tensor` | Flattened token tensor |

The original observation fields are **commented out, not deleted** — they're expected to return in a future phase.

### `SQLState`

| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `game_name` | `"sql_env"` | **Commented out** | — |
| `history_messages` | — | **Added**: `list[Message]` | Full conversation history |
| `history_tokens` | — | **Added**: `list[torch.Tensor]` | Per-message token tensors |
| `current_action_type` | — | **Added**: `str` (default `"query"`) | Tracks the current action |

**Design shift:** The branch moves from a **structured observation** (question + schema + result fields) to a **chat-based observation** (raw messages + tokens). This aligns with how LLM-based agents naturally consume conversational context.
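Put together, the new contracts can be sketched as plain dataclasses. Field names follow the tables above; the `tokens` fields hold `torch.Tensor` values on the branch (via Pydantic with `arbitrary_types_allowed`), shown here as plain lists so the sketch stays dependency-free.

```python
from dataclasses import dataclass, field

# Sketch of the action-feature data contracts. Field names come from the
# model-change tables; defaults and types are simplified for illustration.
Message = dict  # {"role": "user" | "assistant" | "system", "content": str}

@dataclass
class SQLAction:
    action_type: str = "query"          # "describe" | "sample" | "query"
    action_description: str = ""        # replaces the old `argument` field
    tokens: list = field(default_factory=list)   # tokenized conversation

@dataclass
class SQLObservation:
    messages: list = field(default_factory=list)  # full conversation history
    tokens: list = field(default_factory=list)    # flattened token tensor

@dataclass
class SQLState:
    history_messages: list = field(default_factory=list)
    history_tokens: list = field(default_factory=list)    # per-message tensors
    current_action_type: str = "query"
```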
---

## Data Flow

```
User Message (dict: {role: "user", content: "Show me the Student schema"})
        │
        ▼
message_to_action(message)              [SQLEnvironment or SQLEnvClient]
  ├── Detect action type via keywords
  │     "schema" found → action_type = "describe"
  ├── Append message to _state.history_messages   ← MUTATES STATE
  ├── Tokenize FULL conversation via tokenizer.apply_chat_template()
  └── Return SQLAction(action_type="describe",
                       action_description="Show me the Student schema",
                       tokens=<1D tensor>)
        │
        ▼
step(action)                            [SQLEnvironment]
  ├── Dispatch on action.action_type:
  │     "describe" → _call_ollama_to_select_table("Show me the Student schema")
  │                  → returns "Student"
  │                  → _get_table_schema("Student")
  │                  → introspects SQLAlchemy model columns
  │                  → "Table 'Student' has: student_id: integer, ..."
  ├── Create assistant Message with schema info
  ├── Append assistant message to _state.history_messages
  ├── Append action.tokens to _state.history_tokens
  └── _create_observation()
        ├── Flatten all history_tokens via torch.cat → single 1D tensor
        ├── Copy history_messages
        ├── Apply transform (if configured)
        └── Return SQLObservation(messages=[...], tokens=<1D tensor>)
```

---

## External Dependencies Added

| Dependency | Version | Purpose | Integration Point |
|------------|---------|---------|-------------------|
| Ollama (local service) | — | LLM inference for SQL generation + table selection | `sql_environment.py:_call_ollama_for_sql()`, `_call_ollama_to_select_table()` |
| `requests` | >=2.31.0 | HTTP client for the Ollama API | `sql_environment.py` |
| `torch` | ==2.2.2 | Tensor operations for tokenized representations | `models.py`, `client.py`, `sql_environment.py` |
| `transformers` | <5 | HuggingFace tokenizers (chat-template support) | `app.py:get_tokenizer()` |
| `numpy` | <2 | Torch dependency constraint | `pyproject.toml` |
| `sqlalchemy` | (transitive) | ORM for database schema introspection | `data/databases/models.py` |
| `datasets` | (scripts only) | HuggingFace `load_dataset` for Spider data download | `scripts/download_spider_data.py`, `scripts/generate_models_from_schema.py` |

**Environment variables:**

| Variable | Default | Purpose |
|----------|---------|---------|
| `TOKENIZER_NAME` | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer model |
| `SYSTEM_PROMPT` | Built-in schema description | Custom system-prompt override |
| `OLLAMA_MODEL` | `qwen2` | Ollama model for SQL generation |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |

---

## Known Gaps (Not Yet Implemented)

| Feature | Status | Notes |
|---------|--------|-------|
| `ANSWER` action type | Not implemented | Designed in the main-branch models but removed from action-feature |
| Real database execution | Not implemented | `step()` generates SQL text via Ollama but never executes it against SQLite |
| Reward computation | Not implemented | `reward.py` is empty; the 3-layer design exists in the README only |
| Answer verification | Not implemented | `verifier.py` is empty |
| Budget tracking | Not implemented | No step-limit enforcement |
| Episode question selection | Not implemented | The environment uses a hardcoded schema; `student_assessment.json` is present but not loaded by the environment |
| Dockerfile | Not implemented | File is empty; `install_deps.sh` is ready |
| `openenv.yaml` manifest | Not implemented | Empty file |
| Formal test suite | Not implemented | No `tests/` directory; only `MockTokenizer` and notebook tests |

---

## Gotchas

- **`message_to_action()` mutates state:** On the server side, `message_to_action()` appends the message to `_state.history_messages` *before* tokenizing. Calling it has a side effect — it is not a pure function. If you call it twice with the same message, you'll get duplicate entries in the history.
- **Client vs. server `message_to_action` diverge:** The server version (`sql_environment.py:message_to_action`) manages state internally and mutates `_state`.
The client version (`client.py:message_to_action`) requires passing `history_messages` explicitly and does not manage state. The two have different signatures.
- **Schema description is hardcoded in `sql_environment.py`:** The `_build_schema_description()` function returns a fixed string with table/column names that don't perfectly match the SQLAlchemy ORM models. For example, the schema description says `Students (student_id, person_id, student_acc_status)` but the ORM model has `Students (student_id, student_details)`.
- **Ollama failure mode is silent:** If Ollama is unreachable, `_call_ollama_to_select_table()` catches all exceptions and returns the *first table in the dict* (`Address`). No error is surfaced to the caller. `_call_ollama_for_sql()` returns an error string, but it is treated as a normal assistant message.
- **Original observation fields are commented out, not deleted:** `SQLObservation` still carries `question`, `schema_info`, `result`, `error`, `step_count`, `budget_remaining`, and `action_history` as comments. They're intended to return in a later phase.
- **`MockTokenizer` is imported by production code:** `app.py` imports `MockTokenizer` from `test_sql_env.py` at runtime when `transformers` is missing. This couples test utilities to the production bootstrap.
- **`test_env.ipynb` lives at the project root:** Not inside `tests/` or `envs/`. Easy to miss when exploring the codebase.
- **Pydantic + `torch.Tensor`:** `SQLAction`, `SQLObservation`, and `SQLState` use `torch.Tensor` fields with Pydantic. This requires `arbitrary_types_allowed = True` in the Pydantic model config (inherited from the OpenEnv base classes). Standard Pydantic serialization (`.model_dump()`) won't work out of the box with tensors.
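The serialization gotcha is why the client does a manual tensor↔list round-trip. A dependency-free sketch of that pattern follows; plain lists stand in for tensors here, and the function names are illustrative (on the branch, the outbound conversion would be `action.tokens.tolist()` and the inbound one `torch.tensor(payload["tokens"])`).

```python
import json

# Sketch of the client's tensor <-> list round-trip for JSON transport.
# Tensors are not JSON-serializable, so token IDs travel as plain lists.
def step_payload(action_tokens):
    # Outbound: convert the token sequence to a list and serialize.
    return json.dumps({"tokens": list(action_tokens)})

def parse_result(raw):
    # Inbound: decode JSON; the real client rebuilds a torch.Tensor here.
    return json.loads(raw)["tokens"]

wire = step_payload([101, 2023, 102])
restored = parse_result(wire)
# restored == [101, 2023, 102]
```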
---

## Entry Points for Reading

| What You Want to Understand | Start Here | Then Read |
|-----------------------------|------------|-----------|
| How actions are processed | `envs/sql_env/server/sql_environment.py:step()` | `_detect_action_type()`, `_call_ollama_for_sql()` |
| How messages become actions | `envs/sql_env/server/sql_environment.py:message_to_action()` | `envs/sql_env/client.py:message_to_action()` |
| Data contracts | `envs/sql_env/models.py` | Compare with `git show main:envs/sql_env/models.py` |
| Server bootstrap | `envs/sql_env/server/app.py` | `get_tokenizer()`, `create_sql_environment()` |
| Database schema | `envs/sql_env/data/databases/models.py` | `envs/sql_env/data/questions/student_assessment.json` |
| Client-side usage | `envs/sql_env/client.py` | `test_env.ipynb` |
| Data preparation | `envs/sql_env/scripts/download_spider_data.py` | `scripts/generate_models_from_schema.py` |

---

## Known Issues Discovered

1. `sqlalchemy` is missing from `pyproject.toml` on the branch.
2. Pydantic/TypedDict incompatibility on Python < 3.12 (the demo auto-patches around it).
3. The hardcoded schema description in `sql_environment.py` doesn't match the ORM models.
4. Silent Ollama fallback to the first table on connection failure.

These issues may have been fixed since this was written; verify against the latest remote `action-feature` branch before relying on this list.

---

*This document covers only the `action-feature` branch delta. For the overall project design (POMDP architecture, reward layers, episode lifecycle), see [README.md](README.md).*