# Onboarding: `action-feature` Branch
> What the `action-feature` branch adds compared to `main`.
> Last updated: 2026-02-28
> Focus: Branch delta — new components, model changes, data flow, and gaps.
## What This Branch Does
The `action-feature` branch transforms SQLEnv from a **scaffold with well-designed Pydantic models** into a **partially working environment** with real action dispatch (describe/sample/query), Ollama-based SQL generation, a WebSocket client, SQLAlchemy ORM models for the `student_assessment` database, and Spider question data. It implements the core `message → action → step → observation` loop that the RL training pipeline will eventually drive.
---
## Branch Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ action-feature: New/Changed Components │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Training Code / Notebook │
│ ┌──────────────────────┐ │
│ │ test_env.ipynb NEW │ Interactive walkthrough (5 test cells) │
│ └──────────┬───────────┘ │
│ │ imports │
│ ┌──────────▼───────────┐ ┌──────────────────────────┐ │
│ │ client.py NEW │────▶│ models.py CHANGED │ │
│ │ SQLEnvClient │ │ SQLAction (+ tokens, │ │
│ │ _step_payload() │ │ action_desc) │ │
│ │ _parse_result() │ │ SQLObservation │ │
│ │ _parse_state() │ │ (messages + tokens) │ │
│ │ message_to_action() │ │ SQLState │ │
│ └──────────────────────┘ │ (history + tokens) │ │
│ │ WebSocket └──────────────────────────┘ │
│ ┌──────────▼───────────┐ │
│ │ server/app.py CHG │ FastAPI bootstrap + tokenizer factory │
│ │ create_sql_env() │ │
│ └──────────┬───────────┘ │
│ │ creates │
│ ┌──────────▼───────────────────────────────────────────────┐ │
│ │ server/sql_environment.py NEW │ │
│ │ SQLEnvironment(Environment) │ │
│ │ ├── reset() → clear state, return obs │ │
│ │ ├── step(action) → dispatch on action_type │ │
│ │ │ ├── "describe" → Ollama selects table → ORM info │ │
│ │ │ ├── "sample" → Ollama selects table → SQL gen │ │
│ │ │ └── "query" → Ollama generates SQL from NL │ │
│ │ ├── message_to_action() → detect type, tokenize │ │
│ │ └── _detect_action_type() → keyword classifier │ │
│ └──────────┬───────────────────────┬───────────────────────┘ │
│ │ introspects │ HTTP calls │
│ ┌──────────▼───────────┐ ┌───────▼────────────────┐ │
│ │ data/databases/ │ │ Ollama (external) │ │
│ │ models.py NEW │ │ /api/generate │ │
│ │ 9 SQLAlchemy tables │ │ qwen2 (default) │ │
│ └──────────────────────┘ └────────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌────────────────────────────────┐ │
│ │ data/questions/ │ │ scripts/ NEW │ │
│ │ student_assessment │ │ download_spider_data.py │ │
│ │ .json NEW │ │ generate_models_from_schema.py│ │
│ │ (30+ Q&A pairs) │ └────────────────────────────────┘ │
│ └──────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌────────────────────────┐ │
│ │ server/test_sql_env │ │ server/install_deps.sh │ │
│ │ .py MockTokenizer │ │ Docker setup NEW │ │
│ │ NEW │ └────────────────────────┘ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## Files Changed/Added
| File | Status | Purpose |
|------|--------|---------|
| `envs/sql_env/models.py` | **Changed** | Rewired `SQLAction`, `SQLObservation`, `SQLState` for message+token paradigm |
| `envs/sql_env/__init__.py` | **Changed** | Exports `SQLAction`, `SQLObservation`, `SQLState`; lazy client import |
| `envs/sql_env/client.py` | **New** | `SQLEnvClient(EnvClient)` — WebSocket client with tensor serialization |
| `envs/sql_env/server/sql_environment.py` | **New** | `SQLEnvironment(Environment)` — core environment logic (463 lines) |
| `envs/sql_env/server/app.py` | **Changed** | FastAPI bootstrap with tokenizer factory + MockTokenizer fallback |
| `envs/sql_env/server/__init__.py` | **Changed** | Exports `SQLEnvironment` |
| `envs/sql_env/server/test_sql_env.py` | **New** | `MockTokenizer` for testing without `transformers` library |
| `envs/sql_env/server/install_deps.sh` | **New** | Docker setup script: pip install + pre-download GPT-2 tokenizer |
| `envs/sql_env/server/requirements.txt` | **New** | Server-side pip deps for Docker (fastapi, torch, transformers, etc.) |
| `envs/sql_env/data/databases/models.py` | **New** | SQLAlchemy ORM for `student_assessment` DB (9 model classes) |
| `envs/sql_env/data/questions/student_assessment.json` | **New** | 30+ Spider questions with gold SQL, tokenized queries |
| `envs/sql_env/scripts/download_spider_data.py` | **New** | Downloads Spider questions from HuggingFace by `db_id` |
| `envs/sql_env/scripts/generate_models_from_schema.py` | **New** | Auto-generates SQLAlchemy models from Spider schema dataset |
| `envs/sql_env/pyproject.toml` | **Changed** | Python constrained to `>=3.11,<3.13`; added `requests>=2.31.0` |
| `envs/sql_env/uv.lock` | **Changed** | Lock file updated for new dependencies |
| `README.md` | **Changed** | Added "Current Package State" section with pinned dependency rationale |
| `envs/sql_env/server/environment.py` | **Emptied** | Replaced by `sql_environment.py` |
| `test_env.ipynb` | **New** | Jupyter notebook with 5 interactive test scenarios |
**Total:** 18 files changed, +5702 / -412 lines.
---
## Key Components Introduced
### 1. `SQLEnvironment` — `envs/sql_env/server/sql_environment.py`
The heart of the branch. Implements the OpenEnv `Environment` interface with three action types:
| Action Type | Dispatch Flow | Output |
|-------------|--------------|--------|
| `describe` | Ollama selects table → `_get_table_schema()` introspects SQLAlchemy model | Column names + natural language types |
| `sample` | Ollama selects table → `_generate_sample_query()` | `SELECT * FROM <table> LIMIT 10;` |
| `query` | `_call_ollama_for_sql()` sends NL + schema to Ollama | Generated SQL string |
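All three paths ultimately reduce to an HTTP POST against Ollama's `/api/generate` endpoint. A minimal sketch of that call pattern (the branch's actual prompt construction and error handling differ; `build_generate_payload` and `call_ollama` are illustrative names):

```python
import requests

def build_generate_payload(prompt: str, model: str = "qwen2") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def call_ollama(prompt: str,
                model: str = "qwen2",
                base_url: str = "http://localhost:11434") -> str:
    """POST the prompt to Ollama and return the generated text."""
    resp = requests.post(f"{base_url}/api/generate",
                         json=build_generate_payload(prompt, model),
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["response"]
```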
Key methods:
- **`reset()`** — Clears conversation history, re-initializes system prompt message + tokens. Returns initial `SQLObservation`.
- **`step(action)`** — Dispatches on `action.action_type`. Appends assistant response to `history_messages`, stores action tokens in `history_tokens`. Returns flattened observation.
- **`message_to_action(message)`** — Server-side conversion of `Message` dict → `SQLAction`. Detects action type via keywords, appends message to state history, tokenizes full conversation.
- **`_detect_action_type(content)`** — Keyword classifier: checks for "describe"/"schema"/"columns" → `describe`, "sample"/"example"/"rows" → `sample`, default → `query`.
- **`_create_observation()`** — Builds `SQLObservation` from current state. Flattens all `history_tokens` into a single 1D tensor via `torch.cat`.
- **`_get_table_schema(table_name)`** — Introspects SQLAlchemy model columns, converts types to natural language.
- **`_call_ollama_for_sql(query)`** / **`_call_ollama_to_select_table(request)`** — HTTP POST to Ollama `/api/generate`.
**Constructor params:** `tokenizer` (must have `apply_chat_template`), optional `system_prompt`, optional `transform`.
**Environment variables:** `OLLAMA_MODEL` (default: `qwen2`), `OLLAMA_BASE_URL` (default: `http://localhost:11434`).
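The keyword classifier is small enough to sketch. A version consistent with the keyword sets documented above (the branch's exact matching rules and ordering may differ):

```python
def detect_action_type(content: str) -> str:
    """Classify a user message into describe / sample / query by keyword.

    Mirrors the keyword sets documented for _detect_action_type(); the
    branch's actual implementation may differ.
    """
    text = content.lower()
    if any(kw in text for kw in ("describe", "schema", "columns")):
        return "describe"
    if any(kw in text for kw in ("sample", "example", "rows")):
        return "sample"
    return "query"  # default: treat anything else as a natural-language query
```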
### 2. `SQLEnvClient` — `envs/sql_env/client.py`
WebSocket client extending OpenEnv's `EnvClient[SQLAction, SQLObservation, SQLState]`. Handles tensor↔list serialization for JSON transport:
- **`_step_payload(action)`** — Converts `action.tokens` (Tensor) to Python list for JSON.
- **`_parse_result(payload)`** — Deserializes response → `StepResult[SQLObservation]`, converting token lists back to tensors.
- **`_parse_state(payload)`** — Deserializes state → `SQLState` with tensor reconstruction.
- **`message_to_action(message, tokenizer, history_messages)`** — Client-side version of action creation (mirrors server logic). Requires passing a tokenizer explicitly.
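The serialization concern is tensor↔list round-tripping, since `torch.Tensor` is not JSON-serializable. A sketch of the `_step_payload()` idea (field names follow the documented `SQLAction` model; the real client's payload shape may differ):

```python
def step_payload(action) -> dict:
    """Build a JSON-safe step request, mirroring _step_payload()'s job:
    the token tensor becomes a plain Python list for transport.
    The reverse direction (_parse_result) rebuilds tensors via torch.tensor().
    """
    return {
        "action_type": action.action_type,
        "action_description": action.action_description,
        "tokens": action.tokens.tolist(),  # Tensor -> list[int] for JSON
    }
```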
### 3. `server/app.py` — FastAPI Bootstrap
Changed from a stub to a working application:
- **`get_tokenizer()`** — Loads HuggingFace tokenizer from `TOKENIZER_NAME` env var (default: `mistralai/Mistral-7B-Instruct-v0.1`). Falls back to `MockTokenizer` from `test_sql_env.py` if `transformers` is not installed.
- **`create_sql_environment()`** — Factory function creating `SQLEnvironment` per WebSocket session.
- **`app = create_app(create_sql_environment, SQLAction, SQLObservation, env_name="sql_env")`** — Wires up WebSocket endpoints.
### 4. SQLAlchemy ORM — `envs/sql_env/data/databases/models.py`
9 model classes for the `student_assessment` database:
| Model | Table | Key Columns |
|-------|-------|-------------|
| `Address` | Addresses | address_id, line_1, city, country |
| `Person` | People | person_id, first_name, last_name, email_address |
| `Student` | Students | student_id, student_details |
| `Course` | Courses | course_id (String PK), course_name |
| `PersonAddress` | People_Addresses | person_id (FK), address_id (FK), date_from/to |
| `StudentCourseRegistration` | Student_Course_Registrations | student_id (FK), course_id (FK), registration_date |
| `StudentCourseAttendance` | Student_Course_Attendance | student_id (FK), course_id (FK), date_of_attendance |
| `Candidate` | Candidates | candidate_id, candidate_details |
| `CandidateAssessment` | Candidate_Assessments | candidate_id (FK), qualification, assessment_date |
All models include proper foreign key relationships with `back_populates`.
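A sketch of the ORM pattern for two of the nine models, assuming classic declarative style (the branch's actual column types and relationship names may differ):

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Student(Base):
    __tablename__ = "Students"
    student_id = Column(Integer, primary_key=True)
    student_details = Column(String)
    registrations = relationship("StudentCourseRegistration",
                                 back_populates="student")

class StudentCourseRegistration(Base):
    __tablename__ = "Student_Course_Registrations"
    student_id = Column(Integer, ForeignKey("Students.student_id"),
                        primary_key=True)
    course_id = Column(String, primary_key=True)
    registration_date = Column(String)
    student = relationship("Student", back_populates="registrations")

# Column introspection of the kind _get_table_schema() performs:
columns = {c.name: str(c.type) for c in Student.__table__.columns}
```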
### 5. Spider Question Data — `envs/sql_env/data/questions/student_assessment.json`
3,355-line JSON file containing 30+ question-answer pairs from the Spider dataset. Each entry includes:
- `db_id` — always `student_assessment`
- `question` — natural language question (e.g., "which course has most number of registered students?")
- `query` — gold SQL (e.g., `SELECT T1.course_name FROM courses AS T1 JOIN student_course_registrations...`)
- `query_toks` / `query_toks_no_value` / `question_toks` — tokenized versions
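Loading and filtering this file is straightforward; a sketch (the `load_questions` helper is hypothetical, and note the environment does not yet load this file itself):

```python
import json

def load_questions(path: str, db_id: str = "student_assessment") -> list[dict]:
    """Read a Spider-format question file and keep entries for one database."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e for e in entries if e.get("db_id") == db_id]
```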
### 6. Data Preparation Scripts — `envs/sql_env/scripts/`
- **`download_spider_data.py`** — CLI tool to download Spider questions from HuggingFace. Supports `--db-id` filtering and `--split` selection.
- **`generate_models_from_schema.py`** — Auto-generates SQLAlchemy ORM models from the `richardr1126/spider-schema` HuggingFace dataset. Maps Spider types to SQLAlchemy types, handles foreign keys.
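The core of the model generator is a type-mapping step. A hypothetical sketch of that mapping (the script's actual table and fallback behavior may differ):

```python
# Spider column types -> SQLAlchemy type names. Hypothetical mapping in the
# spirit of generate_models_from_schema.py; the script's real table may differ.
SPIDER_TO_SQLALCHEMY = {
    "number": "Integer",
    "text": "String",
    "time": "DateTime",
    "boolean": "Boolean",
    "others": "String",
}

def map_column_type(spider_type: str) -> str:
    """Map a Spider type to a SQLAlchemy type name, defaulting to String."""
    return SPIDER_TO_SQLALCHEMY.get(spider_type.lower(), "String")
```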
### 7. `MockTokenizer` — `envs/sql_env/server/test_sql_env.py`
Deterministic tokenizer for testing without `transformers`:
- **`apply_chat_template()`** — Converts message text to token IDs via `ord(c) % 256`.
- **`decode()`** — Reverses the encoding back to characters.
- Imported by `app.py` as a fallback when `transformers` is not installed.
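The mock's scheme is small enough to reproduce. A sketch consistent with the `ord(c) % 256` description above (the actual class's method signatures may differ):

```python
class MockTokenizer:
    """Deterministic stand-in for a HF tokenizer, per the ord()-based scheme."""

    def apply_chat_template(self, messages, **kwargs):
        # Concatenate message contents and map each character to a byte-range id.
        text = "".join(m["content"] for m in messages)
        return [ord(c) % 256 for c in text]

    def decode(self, token_ids, **kwargs):
        # Inverse of the encoding above (exact only for code points < 256).
        return "".join(chr(t) for t in token_ids)
```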
---
## Model Changes (Main → Action-Feature)
### `SQLAction`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `action_type` | `"DESCRIBE, SAMPLE, QUERY, ANSWER"` | `"describe, sample, query"` | Lowercase, ANSWER removed |
| `argument` | Table name / SQL / answer value | **Removed** | — |
| `action_description` | — | **Added**: description string | Replaces `argument` |
| `tokens` | — | **Added**: `torch.Tensor` | Tokenized conversation |
### `SQLObservation`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `question` | NL question string | **Commented out** | — |
| `schema_info` | DB schema description | **Commented out** | — |
| `result` | Last action result | **Commented out** | — |
| `error` | Error message | **Commented out** | — |
| `step_count` | Current step number | **Commented out** | — |
| `budget_remaining` | Steps left | **Commented out** | — |
| `action_history` | Summary of actions | **Commented out** | — |
| `messages` | — | **Added**: `list[Message]` | Full conversation history |
| `tokens` | — | **Added**: `torch.Tensor` | Flattened token tensor |
The original observation fields are **commented out, not deleted** — they're expected to return in a future phase.
### `SQLState`
| Field | Main | Action-Feature | Notes |
|-------|------|----------------|-------|
| `game_name` | `"sql_env"` | **Commented out** | — |
| `history_messages` | — | **Added**: `list[Message]` | Full conversation history |
| `history_tokens` | — | **Added**: `list[torch.Tensor]` | Per-message token tensors |
| `current_action_type` | — | **Added**: `str` (default `"query"`) | Tracks current action |
**Design shift:** The branch moves from a **structured observation** (question + schema + result fields) to a **chat-based observation** (raw messages + tokens). This aligns with how LLM-based agents naturally consume conversational context.
---
## Data Flow
```
User Message (dict: {role: "user", content: "Show me the Student schema"})
      │
      ▼
message_to_action(message)                  [SQLEnvironment or SQLEnvClient]
  ├── Detect action type via keywords
  │     "schema" found → action_type = "describe"
  ├── Append message to _state.history_messages      ← MUTATES STATE
  ├── Tokenize FULL conversation via tokenizer.apply_chat_template()
  └── Return SQLAction(action_type="describe",
                       action_description="Show me the Student schema",
                       tokens=<tensor>)
      │
      ▼
step(action)                                [SQLEnvironment]
  ├── Dispatch on action.action_type:
  │     "describe" → _call_ollama_to_select_table("Show me the Student schema")
  │                  → returns "Student"
  │                  → _get_table_schema("Student")
  │                  → introspects SQLAlchemy model columns
  │                  → "Table 'Student' has: student_id: integer, ..."
  ├── Create assistant Message with schema info
  ├── Append assistant message to _state.history_messages
  ├── Append action.tokens to _state.history_tokens
  └── _create_observation()
        ├── Flatten all history_tokens via torch.cat → single 1D tensor
        ├── Copy history_messages
        ├── Apply transform (if configured)
        └── Return SQLObservation(messages=[...], tokens=<tensor>)
```
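The loop above can be exercised end to end with a self-contained stub (all names here are hypothetical simplifications; the real `SQLEnvironment` uses a real tokenizer, Ollama calls, and transforms):

```python
class MiniSQLEnv:
    """Toy version of the message -> action -> step -> observation loop."""

    def __init__(self):
        self.history_messages = []
        self.history_tokens = []  # one token list per step, flattened later

    def message_to_action(self, message):
        # Side effect, as on the branch: the user message is appended here.
        self.history_messages.append(message)
        text = message["content"].lower()
        if any(k in text for k in ("describe", "schema", "columns")):
            action_type = "describe"
        elif any(k in text for k in ("sample", "example", "rows")):
            action_type = "sample"
        else:
            action_type = "query"
        tokens = [ord(c) % 256 for c in message["content"]]  # mock tokenizer
        return {"action_type": action_type, "tokens": tokens}

    def step(self, action):
        reply = {"role": "assistant",
                 "content": f"handled {action['action_type']}"}
        self.history_messages.append(reply)
        self.history_tokens.append(action["tokens"])
        # Flatten per-step token lists, like torch.cat in _create_observation().
        flat = [t for chunk in self.history_tokens for t in chunk]
        return {"messages": list(self.history_messages), "tokens": flat}

env = MiniSQLEnv()
obs = env.step(env.message_to_action(
    {"role": "user", "content": "Show me the Student schema"}))
```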
---
## External Dependencies Added
| Dependency | Version | Purpose | Integration Point |
|------------|---------|---------|-------------------|
| Ollama (local service) | — | LLM inference for SQL generation + table selection | `sql_environment.py:_call_ollama_for_sql()`, `_call_ollama_to_select_table()` |
| `requests` | >=2.31.0 | HTTP client for Ollama API | `sql_environment.py` |
| `torch` | ==2.2.2 | Tensor operations for tokenized representations | `models.py`, `client.py`, `sql_environment.py` |
| `transformers` | <5 | HuggingFace tokenizers (chat template support) | `app.py:get_tokenizer()` |
| `numpy` | <2 | Torch dependency constraint | `pyproject.toml` |
| `sqlalchemy` | (transitive) | ORM for database schema introspection | `data/databases/models.py` |
| `datasets` | (scripts only) | HuggingFace `load_dataset` for Spider data download | `scripts/download_spider_data.py`, `scripts/generate_models_from_schema.py` |
**Environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `TOKENIZER_NAME` | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer model |
| `SYSTEM_PROMPT` | Built-in schema description | Custom system prompt override |
| `OLLAMA_MODEL` | `qwen2` | Ollama model for SQL generation |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
---
## Known Gaps (Not Yet Implemented)
| Feature | Status | Notes |
|---------|--------|-------|
| `ANSWER` action type | Not implemented | Designed in main-branch models but removed from action-feature |
| Real database execution | Not implemented | `step()` generates SQL text via Ollama but never executes it against SQLite |
| Reward computation | Not implemented | `reward.py` is empty; 3-layer design exists in README only |
| Answer verification | Not implemented | `verifier.py` is empty |
| Budget tracking | Not implemented | No step limit enforcement |
| Episode question selection | Not implemented | Environment uses hardcoded schema; `student_assessment.json` is present but not loaded by the environment |
| Dockerfile | Not implemented | File is empty; `install_deps.sh` is ready |
| `openenv.yaml` manifest | Not implemented | Empty file |
| Formal test suite | Not implemented | No `tests/` directory; only `MockTokenizer` and notebook tests |
---
## Gotchas
- **`message_to_action()` mutates state:** On the server side, `message_to_action()` appends the message to `_state.history_messages` *before* tokenizing. This means calling it has a side effect — it's not a pure function. If you call it twice with the same message, you'll get duplicate entries in history.
- **Client vs Server `message_to_action` diverge:** The server version (`sql_environment.py:message_to_action`) manages state internally and mutates `_state`. The client version (`client.py:message_to_action`) requires passing `history_messages` explicitly and does not manage state. They have different signatures.
- **Schema description is hardcoded in `sql_environment.py`:** The `_build_schema_description()` function returns a fixed string with table/column names that don't perfectly match the SQLAlchemy ORM models. For example, the schema description says `Students (student_id, person_id, student_acc_status)` but the ORM model has `Students (student_id, student_details)`.
- **Ollama failure mode is silent:** If Ollama is unreachable, `_call_ollama_to_select_table()` catches all exceptions and returns the *first table in the dict* (`Address`). No error is surfaced to the caller. `_call_ollama_for_sql()` returns an error string, but it's treated as a normal assistant message.
- **Original observation fields are commented out, not deleted:** `SQLObservation` still has `question`, `schema_info`, `result`, `error`, `step_count`, `budget_remaining`, and `action_history` as comments. They're intended to return in a later phase.
- **`MockTokenizer` is imported by production code:** `app.py` imports `MockTokenizer` from `test_sql_env.py` at runtime when `transformers` is missing. This couples test utilities to production bootstrap.
- **`test_env.ipynb` lives at project root:** Not inside `tests/` or `envs/`. Easy to miss when exploring the codebase.
- **Pydantic + torch.Tensor:** `SQLAction`, `SQLObservation`, and `SQLState` use `torch.Tensor` fields with Pydantic. This requires `arbitrary_types_allowed = True` in the Pydantic model config (inherited from OpenEnv base classes). Standard Pydantic serialization (`.model_dump()`) won't work out of the box with tensors.
---
## Entry Points for Reading
| What You Want to Understand | Start Here | Then Read |
|----------------------------|------------|-----------|
| How actions are processed | `envs/sql_env/server/sql_environment.py:step()` | `_detect_action_type()`, `_call_ollama_for_sql()` |
| How messages become actions | `envs/sql_env/server/sql_environment.py:message_to_action()` | `envs/sql_env/client.py:message_to_action()` |
| Data contracts | `envs/sql_env/models.py` | Compare with `git show main:envs/sql_env/models.py` |
| Server bootstrap | `envs/sql_env/server/app.py` | `get_tokenizer()`, `create_sql_environment()` |
| Database schema | `envs/sql_env/data/databases/models.py` | `envs/sql_env/data/questions/student_assessment.json` |
| Client-side usage | `envs/sql_env/client.py` | `test_env.ipynb` |
| Data preparation | `envs/sql_env/scripts/download_spider_data.py` | `scripts/generate_models_from_schema.py` |
---
---
## Known Issues Discovered
> These may already have been fixed; re-check against the latest remote `action-feature` branch before acting on them.

1. `sqlalchemy` is missing from the branch's `pyproject.toml`.
2. Pydantic/`TypedDict` incompatibility on Python < 3.12 (the demo auto-patches around it).
3. The hardcoded schema description in `sql_environment.py` doesn't match the ORM models.
4. Silent Ollama fallback to the first table when the connection fails.

---
*This document covers only the `action-feature` branch delta. For the overall project design (POMDP architecture, reward layers, episode lifecycle), see [README.md](README.md).*