# Onboarding: `action-feature` Branch
> What the `action-feature` branch adds compared to `main`.
> Last updated: 2026-02-28
> Focus: Branch delta — new components, model changes, data flow, and gaps.
## What This Branch Does
The `action-feature` branch transforms SQLEnv from a **scaffold with well-designed Pydantic models** into a **partially working environment** with real action dispatch (describe/sample/query), Ollama-based SQL generation, a WebSocket client, SQLAlchemy ORM models for the `student_assessment` database, and Spider question data. It implements the core `message → action → step → observation` loop that the RL training pipeline will eventually drive.
---
## Branch Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ action-feature: New/Changed Components │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Training Code / Notebook │
│ ┌──────────────────────┐ │
│ │ test_env.ipynb NEW │ Interactive walkthrough (5 test cells) │
│ └──────────┬───────────┘ │
│ │ imports │
│ ┌──────────▼───────────┐ ┌──────────────────────────┐ │
│ │ client.py NEW │────▶│ models.py CHANGED │ │
│ │ SQLEnvClient │ │ SQLAction (+ tokens, │ │
│ │ _step_payload() │ │ action_desc) │ │
│ │ _parse_result() │ │ SQLObservation │ │
│ │ _parse_state() │ │ (messages + tokens) │ │
│ │ message_to_action() │ │ SQLState │ │
│ └──────────────────────┘ │ (history + tokens) │ │
│ │ WebSocket └──────────────────────────┘ │
│ ┌──────────▼───────────┐ │
│ │ server/app.py CHG │ FastAPI bootstrap + tokenizer factory │
│ │ create_sql_env() │ │
│ └──────────┬───────────┘ │
│ │ creates │
│ ┌──────────▼───────────────────────────────────────────────┐ │
│ │ server/sql_environment.py NEW │ │
│ │ SQLEnvironment(Environment) │ │
│ │ ├── reset() → clear state, return obs │ │
│ │ ├── step(action) → dispatch on action_type │ │
│ │ │ ├── "describe" → Ollama selects table → ORM info │ │
│ │ │ ├── "sample" → Ollama selects table → SQL gen │ │
│ │ │ └── "query" → Ollama generates SQL from NL │ │
│ │ ├── message_to_action() → detect type, tokenize │ │
│ │ └── _detect_action_type() → keyword classifier │ │
│ └──────────┬───────────────────────┬───────────────────────┘ │
│ │ introspects │ HTTP calls │
│ ┌──────────▼───────────┐ ┌───────▼────────────────┐ │
│ │ data/databases/ │ │ Ollama (external) │ │
│ │ models.py NEW │ │ /api/generate │ │
│ │ 9 SQLAlchemy tables │ │ qwen2 (default) │ │
│ └──────────────────────┘ └────────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌────────────────────────────────┐ │
│ │ data/questions/ │ │ scripts/ NEW │ │
│ │ student_assessment │ │ download_spider_data.py │ │
│ │ .json NEW │ │ generate_models_from_schema.py│ │
│ │ (30+ Q&A pairs) │ └────────────────────────────────┘ │
│ └──────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌────────────────────────┐ │
│ │ server/test_sql_env │ │ server/install_deps.sh │ │
│ │ .py MockTokenizer │ │ Docker setup NEW │ │
│ │ NEW │ └────────────────────────┘ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## Files Changed/Added
| File | Status | Purpose |
|------|--------|---------|
| `envs/sql_env/models.py` | **Changed** | Rewired `SQLAction`, `SQLObservation`, `SQLState` for message+token paradigm |
| `envs/sql_env/__init__.py` | **Changed** | Exports `SQLAction`, `SQLObservation`, `SQLState`; lazy client import |
| `envs/sql_env/client.py` | **New** | `SQLEnvClient(EnvClient)` — WebSocket client with tensor serialization |
| `envs/sql_env/server/sql_environment.py` | **New** | `SQLEnvironment(Environment)` — core environment logic (463 lines) |
| `envs/sql_env/server/app.py` | **Changed** | FastAPI bootstrap with tokenizer factory + MockTokenizer fallback |
| `envs/sql_env/server/__init__.py` | **Changed** | Exports `SQLEnvironment` |
| `envs/sql_env/server/test_sql_env.py` | **New** | `MockTokenizer` for testing without `transformers` library |
| `envs/sql_env/server/install_deps.sh` | **New** | Docker setup script: pip install + pre-download GPT-2 tokenizer |
| `envs/sql_env/server/requirements.txt` | **New** | Server-side pip deps for Docker (fastapi, torch, transformers, etc.) |
| `envs/sql_env/data/databases/models.py` | **New** | SQLAlchemy ORM for `student_assessment` DB (9 model classes) |
| `envs/sql_env/data/questions/student_assessment.json` | **New** | 30+ Spider questions with gold SQL, tokenized queries |
| `envs/sql_env/scripts/download_spider_data.py` | **New** | Downloads Spider questions from HuggingFace by `db_id` |
| `envs/sql_env/scripts/generate_models_from_schema.py` | **New** | Auto-generates SQLAlchemy models from Spider schema dataset |
| `envs/sql_env/pyproject.toml` | **Changed** | Python constrained to `>=3.11,<3.13`; added `requests>=2.31.0` |
| `envs/sql_env/uv.lock` | **Changed** | Lock file updated for new dependencies |
| `README.md` | **Changed** | Added "Current Package State" section with pinned dependency rationale |
| `envs/sql_env/server/environment.py` | **Emptied** | Replaced by `sql_environment.py` |
| `test_env.ipynb` | **New** | Jupyter notebook with 5 interactive test scenarios |
**Total:** 18 files changed, +5702 / -412 lines.
---
## Key Components Introduced
### 1. `SQLEnvironment` — `envs/sql_env/server/sql_environment.py`
The heart of the branch. Implements the OpenEnv `Environment` interface with three action types:
| Action Type | Dispatch Flow | Output |
|-------------|--------------|--------|
| `describe` | Ollama selects table → `_get_table_schema()` introspects SQLAlchemy model | Column names + natural language types |
| `sample` | Ollama selects table → `_generate_sample_query()` | `SELECT * FROM <table> LIMIT 10;` |
| `query` | `_call_ollama_for_sql()` sends NL + schema to Ollama | Generated SQL string |
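All three paths ultimately reduce to an HTTP POST against Ollama's `/api/generate` endpoint. A minimal sketch of that call pattern (the branch's actual prompt construction and error handling differ; `build_generate_payload` and `call_ollama` are illustrative names):

```python
import requests

def build_generate_payload(prompt: str, model: str = "qwen2") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def call_ollama(prompt: str,
                model: str = "qwen2",
                base_url: str = "http://localhost:11434") -> str:
    """POST the prompt to Ollama and return the generated text."""
    resp = requests.post(f"{base_url}/api/generate",
                         json=build_generate_payload(prompt, model),
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["response"]
```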
Key methods:
- **`reset()`** — Clears conversation history, re-initializes system prompt message + tokens. Returns initial `SQLObservation`.
- **`step(action)`** — Dispatches on `action.action_type`. Appends assistant response to `history_messages`, stores action tokens in `history_tokens`. Returns flattened observation.
- **`message_to_action(message)`** — Server-side conversion of `Message` dict → `SQLAction`. Detects action type via keywords, appends message to state history, tokenizes full conversation.
- **`_detect_action_type(content)`** — Keyword classifier: checks for "describe"/"schema"/"columns" → `describe`, "sample"/"example"/"rows" → `sample`, default → `query`.
- **`_create_observation()`** — Builds `SQLObservation` from current state. Flattens all `history_tokens` into a single 1D tensor via `torch.cat`.
- **`_get_table_schema(table_name)`** — Introspects SQLAlchemy model columns, converts types to natural language.
- **`_call_ollama_for_sql(query)`** / **`_call_ollama_to_select_table(request)`** — HTTP POST to Ollama `/api/generate`.
**Constructor params:** `tokenizer` (must have `apply_chat_template`), optional `system_prompt`, optional `transform`.
**Environment variables:** `OLLAMA_MODEL` (default: `qwen2`), `OLLAMA_BASE_URL` (default: `http://localhost:11434`).
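The keyword classifier is small enough to sketch. A version consistent with the keyword sets documented above (the branch's exact matching rules and ordering may differ):

```python
def detect_action_type(content: str) -> str:
    """Classify a user message into describe / sample / query by keyword.

    Mirrors the keyword sets documented for _detect_action_type(); the
    branch's actual implementation may differ.
    """
    text = content.lower()
    if any(kw in text for kw in ("describe", "schema", "columns")):
        return "describe"
    if any(kw in text for kw in ("sample", "example", "rows")):
        return "sample"
    return "query"  # default: treat anything else as a natural-language query
```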
### 2. `SQLEnvClient` — `envs/sql_env/client.py`
WebSocket client extending OpenEnv's `EnvClient[SQLAction, SQLObservation, SQLState]`. Handles tensor↔list serialization for JSON transport:
- **`_step_payload(action)`** — Converts `action.tokens` (Tensor) to Python list for JSON.
- **`_parse_result(payload)`** — Deserializes response → `StepResult[SQLObservation]`, converting token lists back to tensors.
- **`_parse_state(payload)`** — Deserializes state → `SQLState` with tensor reconstruction.
- **`message_to_action(message, tokenizer, history_messages)`** — Client-side version of action creation (mirrors server logic). Requires passing a tokenizer explicitly.
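The serialization concern is tensor↔list round-tripping, since `torch.Tensor` is not JSON-serializable. A sketch of the `_step_payload()` idea (field names follow the documented `SQLAction` model; the real client's payload shape may differ):

```python
def step_payload(action) -> dict:
    """Build a JSON-safe step request, mirroring _step_payload()'s job:
    the token tensor becomes a plain Python list for transport.
    The reverse direction (_parse_result) rebuilds tensors via torch.tensor().
    """
    return {
        "action_type": action.action_type,
        "action_description": action.action_description,
        "tokens": action.tokens.tolist(),  # Tensor -> list[int] for JSON
    }
```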
### 3. `server/app.py` — FastAPI Bootstrap
Changed from a stub to a working application:
- **`get_tokenizer()`** — Loads HuggingFace tokenizer from `TOKENIZER_NAME` env var (default: `mistralai/Mistral-7B-Instruct-v0.1`). Falls back to `MockTokenizer` from `test_sql_env.py` if `transformers` is not installed.
- **`create_sql_environment()`** — Factory function creating `SQLEnvironment` per WebSocket session.
- **`app = create_app(create_sql_environment, SQLAction, SQLObservation, env_name="sql_env")`** — Wires up WebSocket endpoints.
### 4. SQLAlchemy ORM — `envs/sql_env/data/databases/models.py`
9 model classes for the `student_assessment` database:
| Model | Table | Key Columns |
|-------|-------|-------------|
| `Address` | Addresses | address_id, line_1, city, country |
| `Person` | People | person_id, first_name, last_name, email_address |
| `Student` | Students | student_id, student_details |
| `Course` | Courses | course_id (String PK), course_name |
| `PersonAddress` | People_Addresses | person_id (FK), address_id (FK), date_from/to |
| `StudentCourseRegistration` | Student_Course_Registrations | student_id (FK), course_id (FK), registration_date |
| `StudentCourseAttendance` | Student_Course_Attendance | student_id (FK), course_id (FK), date_of_attendance |
| `Candidate` | Candidates | candidate_id, candidate_details |
| `CandidateAssessment` | Candidate_Assessments | candidate_id (FK), qualification, assessment_date |
All models include proper foreign key relationships with `back_populates`.
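A sketch of the ORM pattern for two of the nine models, assuming classic declarative style (the branch's actual column types and relationship names may differ):

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Student(Base):
    __tablename__ = "Students"
    student_id = Column(Integer, primary_key=True)
    student_details = Column(String)
    registrations = relationship("StudentCourseRegistration",
                                 back_populates="student")

class StudentCourseRegistration(Base):
    __tablename__ = "Student_Course_Registrations"
    student_id = Column(Integer, ForeignKey("Students.student_id"),
                        primary_key=True)
    course_id = Column(String, primary_key=True)
    registration_date = Column(String)
    student = relationship("Student", back_populates="registrations")

# Column introspection of the kind _get_table_schema() performs:
columns = {c.name: str(c.type) for c in Student.__table__.columns}
```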
### 5. Spider Question Data — `envs/sql_env/data/questions/student_assessment.json`
3,355-line JSON file containing 30+ question-answer pairs from the Spider dataset. Each entry includes:
- `db_id` — always `student_assessment`
- `question` — natural language question (e.g., "which course has most number of registered students?")
- `query` — gold SQL (e.g., `SELECT T1.course_name FROM courses AS T1 JOIN student_course_registrations...`)
- `query_toks` / `query_toks_no_value` / `question_toks` — tokenized versions
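Loading and filtering this file is straightforward; a sketch (the `load_questions` helper is hypothetical, and note the environment does not yet load this file itself):

```python
import json

def load_questions(path: str, db_id: str = "student_assessment") -> list[dict]:
    """Read a Spider-format question file and keep entries for one database."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e for e in entries if e.get("db_id") == db_id]
```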
### 6. Data Preparation Scripts — `envs/sql_env/scripts/`
- **`download_spider_data.py`** — CLI tool to download Spider questions from HuggingFace. Supports `--db-id` filtering and `--split` selection.
- **`generate_models_from_schema.py`** — Auto-generates SQLAlchemy ORM models from the `richardr1126/spider-schema` HuggingFace dataset. Maps Spider types to SQLAlchemy types, handles foreign keys.
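The core of the model generator is a type-mapping step. A hypothetical sketch of that mapping (the script's actual table and fallback behavior may differ):

```python
# Spider column types -> SQLAlchemy type names. Hypothetical mapping in the
# spirit of generate_models_from_schema.py; the script's real table may differ.
SPIDER_TO_SQLALCHEMY = {
    "number": "Integer",
    "text": "String",
    "time": "DateTime",
    "boolean": "Boolean",
    "others": "String",
}

def map_column_type(spider_type: str) -> str:
    """Map a Spider type to a SQLAlchemy type name, defaulting to String."""
    return SPIDER_TO_SQLALCHEMY.get(spider_type.lower(), "String")
```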
### 7. `MockTokenizer` — `envs/sql_env/server/test_sql_env.py`
Deterministic tokenizer for testing without `transformers`:
- **`apply_chat_template()`** — Converts message text to token IDs via `ord(c) % 256`.
- **`decode()`** — Reverses the encoding back to characters.
- Imported by `app.py` as a fallback when `transformers` is not installed.
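The mock's scheme is small enough to reproduce. A sketch consistent with the `ord(c) % 256` description above (the actual class's method signatures may differ):

```python
class MockTokenizer:
    """Deterministic stand-in for a HF tokenizer, per the ord()-based scheme."""

    def apply_chat_template(self, messages, **kwargs):
        # Concatenate message contents and map each character to a byte-range id.
        text = "".join(m["content"] for m in messages)
        return [ord(c) % 256 for c in text]

    def decode(self, token_ids, **kwargs):
        # Inverse of the encoding above (exact only for code points < 256).
        return "".join(chr(t) for t in token_ids)
```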
---
## Model Changes (Main → Action-Feature)
### `SQLAction`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `action_type` | `"DESCRIBE, SAMPLE, QUERY, ANSWER"` | `"describe, sample, query"` | Lowercase, ANSWER removed |
| `argument` | Table name / SQL / answer value | **Removed** | — |
| `action_description` | — | **Added**: description string | Replaces `argument` |
| `tokens` | — | **Added**: `torch.Tensor` | Tokenized conversation |
### `SQLObservation`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `question` | NL question string | **Commented out** | — |
| `schema_info` | DB schema description | **Commented out** | — |
| `result` | Last action result | **Commented out** | — |
| `error` | Error message | **Commented out** | — |
| `step_count` | Current step number | **Commented out** | — |
| `budget_remaining` | Steps left | **Commented out** | — |
| `action_history` | Summary of actions | **Commented out** | — |
| `messages` | — | **Added**: `list[Message]` | Full conversation history |
| `tokens` | — | **Added**: `torch.Tensor` | Flattened token tensor |
The original observation fields are **commented out, not deleted** — they're expected to return in a future phase.
### `SQLState`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `game_name` | `"sql_env"` | **Commented out** | — |
| `history_messages` | — | **Added**: `list[Message]` | Full conversation history |
| `history_tokens` | — | **Added**: `list[torch.Tensor]` | Per-message token tensors |
| `current_action_type` | — | **Added**: `str` (default `"query"`) | Tracks current action |
**Design shift:** The branch moves from a **structured observation** (question + schema + result fields) to a **chat-based observation** (raw messages + tokens). This aligns with how LLM-based agents naturally consume conversational context.
---
## Data Flow
```
User Message (dict: {role: "user", content: "Show me the Student schema"})
      │
      ▼
message_to_action(message)                  [SQLEnvironment or SQLEnvClient]
  ├── Detect action type via keywords
  │     "schema" found → action_type = "describe"
  ├── Append message to _state.history_messages      ← MUTATES STATE
  ├── Tokenize FULL conversation via tokenizer.apply_chat_template()
  └── Return SQLAction(action_type="describe",
                       action_description="Show me the Student schema",
                       tokens=<tensor>)
      │
      ▼
step(action)                                [SQLEnvironment]
  ├── Dispatch on action.action_type:
  │     "describe" → _call_ollama_to_select_table("Show me the Student schema")
  │                  → returns "Student"
  │                  → _get_table_schema("Student")
  │                  → introspects SQLAlchemy model columns
  │                  → "Table 'Student' has: student_id: integer, ..."
  ├── Create assistant Message with schema info
  ├── Append assistant message to _state.history_messages
  ├── Append action.tokens to _state.history_tokens
  └── _create_observation()
        ├── Flatten all history_tokens via torch.cat → single 1D tensor
        ├── Copy history_messages
        ├── Apply transform (if configured)
        └── Return SQLObservation(messages=[...], tokens=<tensor>)
```
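The loop above can be exercised end to end with a self-contained stub (all names here are hypothetical simplifications; the real `SQLEnvironment` uses a real tokenizer, Ollama calls, and transforms):

```python
class MiniSQLEnv:
    """Toy version of the message -> action -> step -> observation loop."""

    def __init__(self):
        self.history_messages = []
        self.history_tokens = []  # one token list per step, flattened later

    def message_to_action(self, message):
        # Side effect, as on the branch: the user message is appended here.
        self.history_messages.append(message)
        text = message["content"].lower()
        if any(k in text for k in ("describe", "schema", "columns")):
            action_type = "describe"
        elif any(k in text for k in ("sample", "example", "rows")):
            action_type = "sample"
        else:
            action_type = "query"
        tokens = [ord(c) % 256 for c in message["content"]]  # mock tokenizer
        return {"action_type": action_type, "tokens": tokens}

    def step(self, action):
        reply = {"role": "assistant",
                 "content": f"handled {action['action_type']}"}
        self.history_messages.append(reply)
        self.history_tokens.append(action["tokens"])
        # Flatten per-step token lists, like torch.cat in _create_observation().
        flat = [t for chunk in self.history_tokens for t in chunk]
        return {"messages": list(self.history_messages), "tokens": flat}

env = MiniSQLEnv()
obs = env.step(env.message_to_action(
    {"role": "user", "content": "Show me the Student schema"}))
```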
---
## External Dependencies Added
| Dependency | Version | Purpose | Integration Point |
|------------|---------|---------|-------------------|
| Ollama (local service) | — | LLM inference for SQL generation + table selection | `sql_environment.py:_call_ollama_for_sql()`, `_call_ollama_to_select_table()` |
| `requests` | >=2.31.0 | HTTP client for Ollama API | `sql_environment.py` |
| `torch` | ==2.2.2 | Tensor operations for tokenized representations | `models.py`, `client.py`, `sql_environment.py` |
| `transformers` | <5 | HuggingFace tokenizers (chat template support) | `app.py:get_tokenizer()` |
| `numpy` | <2 | Torch dependency constraint | `pyproject.toml` |
| `sqlalchemy` | (transitive) | ORM for database schema introspection | `data/databases/models.py` |
| `datasets` | (scripts only) | HuggingFace `load_dataset` for Spider data download | `scripts/download_spider_data.py`, `scripts/generate_models_from_schema.py` |
**Environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `TOKENIZER_NAME` | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer model |
| `SYSTEM_PROMPT` | Built-in schema description | Custom system prompt override |
| `OLLAMA_MODEL` | `qwen2` | Ollama model for SQL generation |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
---
## Known Gaps (Not Yet Implemented)
| Feature | Status | Notes |
|---------|--------|-------|
| `ANSWER` action type | Not implemented | Designed in main-branch models but removed from action-feature |
| Real database execution | Not implemented | `step()` generates SQL text via Ollama but never executes it against SQLite |
| Reward computation | Not implemented | `reward.py` is empty; 3-layer design exists in README only |
| Answer verification | Not implemented | `verifier.py` is empty |
| Budget tracking | Not implemented | No step limit enforcement |
| Episode question selection | Not implemented | Environment uses hardcoded schema; `student_assessment.json` is present but not loaded by the environment |
| Dockerfile | Not implemented | File is empty; `install_deps.sh` is ready |
| `openenv.yaml` manifest | Not implemented | Empty file |
| Formal test suite | Not implemented | No `tests/` directory; only `MockTokenizer` and notebook tests |
---
## Gotchas
- **`message_to_action()` mutates state:** On the server side, `message_to_action()` appends the message to `_state.history_messages` *before* tokenizing. This means calling it has a side effect — it's not a pure function. If you call it twice with the same message, you'll get duplicate entries in history.
- **Client vs Server `message_to_action` diverge:** The server version (`sql_environment.py:message_to_action`) manages state internally and mutates `_state`. The client version (`client.py:message_to_action`) requires passing `history_messages` explicitly and does not manage state. They have different signatures.
- **Schema description is hardcoded in `sql_environment.py`:** The `_build_schema_description()` function returns a fixed string with table/column names that don't perfectly match the SQLAlchemy ORM models. For example, the schema description says `Students (student_id, person_id, student_acc_status)` but the ORM model has `Students (student_id, student_details)`.
- **Ollama failure mode is silent:** If Ollama is unreachable, `_call_ollama_to_select_table()` catches all exceptions and returns the *first table in the dict* (`Address`). No error is surfaced to the caller. `_call_ollama_for_sql()` returns an error string, but it's treated as a normal assistant message.
- **Original observation fields are commented out, not deleted:** `SQLObservation` still has `question`, `schema_info`, `result`, `error`, `step_count`, `budget_remaining`, and `action_history` as comments. They're intended to return in a later phase.
- **`MockTokenizer` is imported by production code:** `app.py` imports `MockTokenizer` from `test_sql_env.py` at runtime when `transformers` is missing. This couples test utilities to production bootstrap.
- **`test_env.ipynb` lives at project root:** Not inside `tests/` or `envs/`. Easy to miss when exploring the codebase.
- **Pydantic + torch.Tensor:** `SQLAction`, `SQLObservation`, and `SQLState` use `torch.Tensor` fields with Pydantic. This requires `arbitrary_types_allowed = True` in the Pydantic model config (inherited from OpenEnv base classes). Standard Pydantic serialization (`.model_dump()`) won't work out of the box with tensors.
---
## Entry Points for Reading
| What You Want to Understand | Start Here | Then Read |
|----------------------------|------------|-----------|
| How actions are processed | `envs/sql_env/server/sql_environment.py:step()` | `_detect_action_type()`, `_call_ollama_for_sql()` |
| How messages become actions | `envs/sql_env/server/sql_environment.py:message_to_action()` | `envs/sql_env/client.py:message_to_action()` |
| Data contracts | `envs/sql_env/models.py` | Compare with `git show main:envs/sql_env/models.py` |
| Server bootstrap | `envs/sql_env/server/app.py` | `get_tokenizer()`, `create_sql_environment()` |
| Database schema | `envs/sql_env/data/databases/models.py` | `envs/sql_env/data/questions/student_assessment.json` |
| Client-side usage | `envs/sql_env/client.py` | `test_env.ipynb` |
| Data preparation | `envs/sql_env/scripts/download_spider_data.py` | `scripts/generate_models_from_schema.py` |
---
---
## Known Issues Discovered
> These may already have been fixed; re-check against the latest remote `action-feature` branch before acting on them.

1. `sqlalchemy` is missing from the branch's `pyproject.toml`.
2. Pydantic/`TypedDict` incompatibility on Python < 3.12 (the demo auto-patches around it).
3. The hardcoded schema description in `sql_environment.py` doesn't match the ORM models.
4. Silent Ollama fallback to the first table when the connection fails.

---
*This document covers only the `action-feature` branch delta. For the overall project design (POMDP architecture, reward layers, episode lifecycle), see [README.md](README.md).*