# Onboarding: action-feature Branch

What the `action-feature` branch adds compared to `main`.
Last updated: 2026-02-28
Focus: branch delta – new components, model changes, data flow, and gaps.
## What This Branch Does
The action-feature branch transforms SQLEnv from a scaffold with well-designed Pydantic models into a partially working environment with real action dispatch (describe/sample/query), Ollama-based SQL generation, a WebSocket client, SQLAlchemy ORM models for the student_assessment database, and Spider question data. It implements the core message → action → step → observation loop that the RL training pipeline will eventually drive.
## Branch Overview
```
┌───────────────────────────────────────────────────────────────────────┐
│ action-feature: New/Changed Components                                │
└───────────────────────────────────────────────────────────────────────┘

 Training Code / Notebook
 ────────────────────────
 ┌───────────────────────┐
 │ test_env.ipynb    NEW │  Interactive walkthrough (5 test cells)
 └───────────┬───────────┘
             │ imports
 ┌───────────▼───────────┐      ┌──────────────────────────┐
 │ client.py         NEW │─────▶│ models.py        CHANGED │
 │  SQLEnvClient         │      │  SQLAction (+ tokens,    │
 │  _step_payload()      │      │    action_desc)          │
 │  _parse_result()      │      │  SQLObservation          │
 │  _parse_state()       │      │    (messages + tokens)   │
 │  message_to_action()  │      │  SQLState                │
 └───────────┬───────────┘      │    (history + tokens)    │
             │ WebSocket        └──────────────────────────┘
 ┌───────────▼───────────┐
 │ server/app.py     CHG │  FastAPI bootstrap + tokenizer factory
 │  create_sql_env()     │
 └───────────┬───────────┘
             │ creates
 ┌───────────▼───────────────────────────────────────────────┐
 │ server/sql_environment.py                             NEW │
 │  SQLEnvironment(Environment)                              │
 │   ├── reset() → clear state, return obs                   │
 │   ├── step(action) → dispatch on action_type              │
 │   │    ├── "describe" → Ollama selects table → ORM info   │
 │   │    ├── "sample"   → Ollama selects table → SQL gen    │
 │   │    └── "query"    → Ollama generates SQL from NL      │
 │   ├── message_to_action() → detect type, tokenize         │
 │   └── _detect_action_type() → keyword classifier          │
 └───────────┬───────────────────────┬───────────────────────┘
             │ introspects           │ HTTP calls
 ┌───────────▼───────────┐  ┌────────▼────────────────┐
 │ data/databases/       │  │ Ollama (external)       │
 │  models.py        NEW │  │  /api/generate          │
 │  9 SQLAlchemy tables  │  │  qwen2 (default)        │
 └───────────────────────┘  └─────────────────────────┘

 ┌───────────────────────┐  ┌─────────────────────────────────┐
 │ data/questions/       │  │ scripts/                    NEW │
 │  student_assessment   │  │  download_spider_data.py        │
 │  .json            NEW │  │  generate_models_from_schema.py │
 │  (30+ Q&A pairs)      │  └─────────────────────────────────┘
 └───────────────────────┘

 ┌───────────────────────┐  ┌────────────────────────┐
 │ server/test_sql_env   │  │ server/install_deps.sh │
 │  .py MockTokenizer    │  │  Docker setup     NEW  │
 │                   NEW │  └────────────────────────┘
 └───────────────────────┘
```
## Files Changed/Added
| File | Status | Purpose |
|---|---|---|
| `envs/sql_env/models.py` | Changed | Rewired `SQLAction`, `SQLObservation`, `SQLState` for the message+token paradigm |
| `envs/sql_env/__init__.py` | Changed | Exports `SQLAction`, `SQLObservation`, `SQLState`; lazy client import |
| `envs/sql_env/client.py` | New | `SQLEnvClient(EnvClient)` – WebSocket client with tensor serialization |
| `envs/sql_env/server/sql_environment.py` | New | `SQLEnvironment(Environment)` – core environment logic (463 lines) |
| `envs/sql_env/server/app.py` | Changed | FastAPI bootstrap with tokenizer factory + `MockTokenizer` fallback |
| `envs/sql_env/server/__init__.py` | Changed | Exports `SQLEnvironment` |
| `envs/sql_env/server/test_sql_env.py` | New | `MockTokenizer` for testing without the transformers library |
| `envs/sql_env/server/install_deps.sh` | New | Docker setup script: pip install + pre-download of the GPT-2 tokenizer |
| `envs/sql_env/server/requirements.txt` | New | Server-side pip deps for Docker (fastapi, torch, transformers, etc.) |
| `envs/sql_env/data/databases/models.py` | New | SQLAlchemy ORM for the student_assessment DB (9 model classes) |
| `envs/sql_env/data/questions/student_assessment.json` | New | 30+ Spider questions with gold SQL and tokenized queries |
| `envs/sql_env/scripts/download_spider_data.py` | New | Downloads Spider questions from HuggingFace by `db_id` |
| `envs/sql_env/scripts/generate_models_from_schema.py` | New | Auto-generates SQLAlchemy models from the Spider schema dataset |
| `envs/sql_env/pyproject.toml` | Changed | Python constrained to `>=3.11,<3.13`; added `requests>=2.31.0` |
| `envs/sql_env/uv.lock` | Changed | Lock file updated for the new dependencies |
| `README.md` | Changed | Added a "Current Package State" section with pinned-dependency rationale |
| `envs/sql_env/server/environment.py` | Emptied | Replaced by `sql_environment.py` |
| `test_env.ipynb` | New | Jupyter notebook with 5 interactive test scenarios |
Total: 18 files changed, +5702 / -412 lines.
## Key Components Introduced
### 1. SQLEnvironment – envs/sql_env/server/sql_environment.py
The heart of the branch. Implements the OpenEnv Environment interface with three action types:
| Action Type | Dispatch Flow | Output |
|---|---|---|
| `describe` | Ollama selects table → `_get_table_schema()` introspects the SQLAlchemy model | Column names + natural-language types |
| `sample` | Ollama selects table → `_generate_sample_query()` | `SELECT * FROM <table> LIMIT 10;` |
| `query` | `_call_ollama_for_sql()` sends NL + schema to Ollama | Generated SQL string |
Key methods:

- `reset()` – clears conversation history and re-initializes the system prompt message + tokens. Returns the initial `SQLObservation`.
- `step(action)` – dispatches on `action.action_type`. Appends the assistant response to `history_messages`, stores action tokens in `history_tokens`. Returns a flattened observation.
- `message_to_action(message)` – server-side conversion of a `Message` dict → `SQLAction`. Detects the action type via keywords, appends the message to state history, tokenizes the full conversation.
- `_detect_action_type(content)` – keyword classifier: "describe"/"schema"/"columns" → `describe`; "sample"/"example"/"rows" → `sample`; default → `query`.
- `_create_observation()` – builds a `SQLObservation` from current state. Flattens all `history_tokens` into a single 1D tensor via `torch.cat`.
- `_get_table_schema(table_name)` – introspects SQLAlchemy model columns and converts types to natural language.
- `_call_ollama_for_sql(query)` / `_call_ollama_to_select_table(request)` – HTTP POST to Ollama's `/api/generate`.
Constructor params: `tokenizer` (must have `apply_chat_template`), optional `system_prompt`, optional `transform`.
Environment variables: `OLLAMA_MODEL` (default: `qwen2`), `OLLAMA_BASE_URL` (default: `http://localhost:11434`).
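Both Ollama helpers reduce to a single non-streaming HTTP POST against Ollama's `/api/generate` endpoint. Here is a minimal sketch of that call under the defaults above; the function name `call_ollama` and the exact request shape are assumptions, not the branch's literal code:

```python
import os

import requests


def call_ollama(prompt: str, timeout: float = 60.0) -> str:
    """Send one non-streaming generation request to Ollama, return the text."""
    model = os.environ.get("OLLAMA_MODEL", "qwen2")
    base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=timeout,
    )
    resp.raise_for_status()
    # Ollama's non-streaming response carries the completion in "response".
    return resp.json()["response"]
```

On the branch, `_call_ollama_to_select_table()` wraps a call like this in a broad `try/except` that falls back to the first table when Ollama is unreachable (see Gotchas).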
### 2. SQLEnvClient – envs/sql_env/client.py
WebSocket client extending OpenEnv's `EnvClient[SQLAction, SQLObservation, SQLState]`. Handles tensor↔list serialization for JSON transport:

- `_step_payload(action)` – converts `action.tokens` (Tensor) to a Python list for JSON.
- `_parse_result(payload)` – deserializes the response → `StepResult[SQLObservation]`, converting token lists back to tensors.
- `_parse_state(payload)` – deserializes state → `SQLState`, reconstructing tensors.
- `message_to_action(message, tokenizer, history_messages)` – client-side version of action creation (mirrors the server logic); requires passing a tokenizer explicitly.
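The serialization boundary is simple to sketch. Below is a minimal round-trip under assumed payload field names; the real `_step_payload`/`_parse_result` also carry step metadata (reward, done, etc.):

```python
import torch


def step_payload(action_type: str, description: str, tokens: torch.Tensor) -> dict:
    """Tensor -> plain list, so the action survives JSON encoding."""
    return {
        "action_type": action_type,
        "action_description": description,
        "tokens": tokens.tolist(),
    }


def parse_tokens(payload: dict) -> torch.Tensor:
    """List -> tensor on the way back from the server."""
    return torch.tensor(payload["tokens"], dtype=torch.long)
```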
### 3. server/app.py – FastAPI Bootstrap
Changed from a stub to a working application:
- `get_tokenizer()` – loads a HuggingFace tokenizer named by the `TOKENIZER_NAME` env var (default: `mistralai/Mistral-7B-Instruct-v0.1`); falls back to `MockTokenizer` from `test_sql_env.py` if `transformers` is not installed.
- `create_sql_environment()` – factory that creates a `SQLEnvironment` per WebSocket session.
- `app = create_app(create_sql_environment, SQLAction, SQLObservation, env_name="sql_env")` – wires up the WebSocket endpoints.
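The fallback logic is worth seeing in one place. A sketch, with a `loader` parameter added purely so the selection logic can be exercised without downloading a model; the real `get_tokenizer()` calls `AutoTokenizer.from_pretrained` directly and imports the real `MockTokenizer` from `test_sql_env.py`:

```python
import os


class MockTokenizer:
    """Placeholder for server/test_sql_env.py's MockTokenizer (fallback path)."""


def get_tokenizer(loader=None):
    """Sketch of app.py's tokenizer factory: HuggingFace first, mock second."""
    name = os.environ.get("TOKENIZER_NAME", "mistralai/Mistral-7B-Instruct-v0.1")
    try:
        if loader is None:
            # Real code path: requires the transformers library.
            from transformers import AutoTokenizer
            loader = AutoTokenizer.from_pretrained
        return loader(name)
    except ImportError:
        # transformers not installed: fall back to the deterministic mock.
        return MockTokenizer()
```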
### 4. SQLAlchemy ORM – envs/sql_env/data/databases/models.py
9 model classes for the student_assessment database:
| Model | Table | Key Columns |
|---|---|---|
| `Address` | Addresses | address_id, line_1, city, country |
| `Person` | People | person_id, first_name, last_name, email_address |
| `Student` | Students | student_id, student_details |
| `Course` | Courses | course_id (String PK), course_name |
| `PersonAddress` | People_Addresses | person_id (FK), address_id (FK), date_from/to |
| `StudentCourseRegistration` | Student_Course_Registrations | student_id (FK), course_id (FK), registration_date |
| `StudentCourseAttendance` | Student_Course_Attendance | student_id (FK), course_id (FK), date_of_attendance |
| `Candidate` | Candidates | candidate_id, candidate_details |
| `CandidateAssessment` | Candidate_Assessments | candidate_id (FK), qualification, assessment_date |
All models include proper foreign key relationships with back_populates.
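As an illustration of the pattern, here is a cut-down sketch of three of the nine models. Column names follow the table above; relationship attribute names like `registrations` are assumptions, not necessarily what the branch uses:

```python
from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Student(Base):
    __tablename__ = "Students"
    student_id = Column(Integer, primary_key=True)
    student_details = Column(String)
    # back_populates wires both directions of the relationship together.
    registrations = relationship("StudentCourseRegistration", back_populates="student")


class Course(Base):
    __tablename__ = "Courses"
    course_id = Column(String, primary_key=True)  # string PK, per the Spider schema
    course_name = Column(String)


class StudentCourseRegistration(Base):
    __tablename__ = "Student_Course_Registrations"
    student_id = Column(Integer, ForeignKey("Students.student_id"), primary_key=True)
    course_id = Column(String, ForeignKey("Courses.course_id"), primary_key=True)
    registration_date = Column(Date)
    student = relationship("Student", back_populates="registrations")
```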
### 5. Spider Question Data – envs/sql_env/data/questions/student_assessment.json
3,355-line JSON file containing 30+ question-answer pairs from the Spider dataset. Each entry includes:
- `db_id` – always `student_assessment`
- `question` – natural-language question (e.g., "which course has most number of registered students?")
- `query` – gold SQL (e.g., `SELECT T1.course_name FROM courses AS T1 JOIN student_course_registrations...`)
- `query_toks` / `query_toks_no_value` / `question_toks` – tokenized versions
### 6. Data Preparation Scripts – envs/sql_env/scripts/
- `download_spider_data.py` – CLI tool that downloads Spider questions from HuggingFace; supports `--db-id` filtering and `--split` selection.
- `generate_models_from_schema.py` – auto-generates SQLAlchemy ORM models from the `richardr1126/spider-schema` HuggingFace dataset; maps Spider types to SQLAlchemy types and handles foreign keys.
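The heart of the download script is just a `db_id` filter over a HuggingFace split. A reconstruction of that core; the dataset id `"spider"` and the function name are assumptions, and the branch's script may differ:

```python
from typing import Iterable


def filter_by_db_id(rows: Iterable[dict], db_id: str) -> list[dict]:
    """Keep only the Spider entries belonging to one database."""
    return [row for row in rows if row.get("db_id") == db_id]


if __name__ == "__main__":
    # Network call; assumes the "spider" dataset id on HuggingFace.
    from datasets import load_dataset

    train = load_dataset("spider", split="train")
    questions = filter_by_db_id(train, "student_assessment")
    print(f"{len(questions)} questions for student_assessment")
```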
### 7. MockTokenizer – envs/sql_env/server/test_sql_env.py
Deterministic tokenizer for testing without transformers:
- `apply_chat_template()` – converts message text to token IDs via `ord(c) % 256`.
- `decode()` – reverses the encoding back to characters.
- Imported by `app.py` as a fallback when `transformers` is not installed.
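Reconstructed from the description above (the method signatures are assumptions based on how `app.py` and the environment use the tokenizer):

```python
class MockTokenizer:
    """Deterministic per-character tokenizer; no transformers dependency.

    Note: ord(c) % 256 is only lossless for characters below code point 256,
    which is fine for the ASCII-only test prompts this is used with.
    """

    def apply_chat_template(self, messages, **kwargs):
        # Join message contents and map each character to a pseudo token id.
        text = "\n".join(m["content"] for m in messages)
        return [ord(c) % 256 for c in text]

    def decode(self, token_ids, **kwargs):
        # Reverse the per-character encoding.
        return "".join(chr(t) for t in token_ids)
```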
## Model Changes (Main → Action-Feature)
### SQLAction
| Field | Main | Action-Feature | Notes |
|---|---|---|---|
| `action_type` | `"DESCRIBE"`, `"SAMPLE"`, `"QUERY"`, `"ANSWER"` | `"describe"`, `"sample"`, `"query"` | Lowercase; `ANSWER` removed |
| `argument` | Table name / SQL / answer value | Removed | – |
| `action_description` | – | Added: description string | Replaces `argument` |
| `tokens` | – | Added: `torch.Tensor` | Tokenized conversation |
### SQLObservation
| Field | Main | Action-Feature | Notes |
|---|---|---|---|
| `question` | NL question string | Commented out | – |
| `schema_info` | DB schema description | Commented out | – |
| `result` | Last action result | Commented out | – |
| `error` | Error message | Commented out | – |
| `step_count` | Current step number | Commented out | – |
| `budget_remaining` | Steps left | Commented out | – |
| `action_history` | Summary of actions | Commented out | – |
| `messages` | – | Added: `list[Message]` | Full conversation history |
| `tokens` | – | Added: `torch.Tensor` | Flattened token tensor |
The original observation fields are commented out, not deleted – they're expected to return in a future phase.
### SQLState
| Field | Main | Action-Feature | Notes |
|---|---|---|---|
| `game_name` | `"sql_env"` | Commented out | – |
| `history_messages` | – | Added: `list[Message]` | Full conversation history |
| `history_tokens` | – | Added: `list[torch.Tensor]` | Per-message token tensors |
| `current_action_type` | – | Added: `str` (default `"query"`) | Tracks current action |
Design shift: The branch moves from a structured observation (question + schema + result fields) to a chat-based observation (raw messages + tokens). This aligns with how LLM-based agents naturally consume conversational context.
## Data Flow
```
User message (dict: {role: "user", content: "Show me the Student schema"})
        │
        ▼
message_to_action(message)                [SQLEnvironment or SQLEnvClient]
  ├── detect action type via keywords
  │     "schema" found → action_type = "describe"
  ├── append message to _state.history_messages   ← MUTATES STATE
  ├── tokenize FULL conversation via tokenizer.apply_chat_template()
  └── return SQLAction(action_type="describe",
                       action_description="Show me the Student schema",
                       tokens=<tensor>)
        │
        ▼
step(action)                              [SQLEnvironment]
  ├── dispatch on action.action_type:
  │     "describe" → _call_ollama_to_select_table("Show me the Student schema")
  │                    → returns "Student"
  │                  → _get_table_schema("Student")
  │                    → introspects SQLAlchemy model columns
  │                    → "Table 'Student' has: student_id: integer, ..."
  ├── create assistant Message with schema info
  ├── append assistant message to _state.history_messages
  ├── append action.tokens to _state.history_tokens
  └── _create_observation()
        ├── flatten all history_tokens via torch.cat → single 1D tensor
        ├── copy history_messages
        ├── apply transform (if configured)
        └── return SQLObservation(messages=[...], tokens=<flat tensor>)
```
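To make the loop concrete, here is a toy, self-contained re-implementation of the same flow with Ollama and the real tokenizer stubbed out. Names mirror the branch but the behavior is heavily simplified; this is not the branch's code:

```python
import torch


class MiniSQLEnv:
    """Toy message -> action -> step -> observation loop."""

    def __init__(self):
        self.history_messages: list[dict] = []
        self.history_tokens: list[torch.Tensor] = []

    def _tokenize(self, text: str) -> torch.Tensor:
        # MockTokenizer-style per-character encoding.
        return torch.tensor([ord(c) % 256 for c in text], dtype=torch.long)

    def message_to_action(self, message: dict) -> dict:
        content = message["content"].lower()
        if any(k in content for k in ("describe", "schema", "columns")):
            action_type = "describe"
        elif any(k in content for k in ("sample", "example", "rows")):
            action_type = "sample"
        else:
            action_type = "query"
        self.history_messages.append(message)  # side effect, as on the branch
        return {
            "action_type": action_type,
            "action_description": message["content"],
            "tokens": self._tokenize(message["content"]),
        }

    def step(self, action: dict) -> dict:
        # Ollama stub: a canned assistant reply instead of a real LLM call.
        reply = {"role": "assistant", "content": f"[{action['action_type']} handled]"}
        self.history_messages.append(reply)
        self.history_tokens.append(action["tokens"])
        flat = torch.cat(self.history_tokens)  # flatten to one 1D tensor
        return {"messages": list(self.history_messages), "tokens": flat}
```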
## External Dependencies Added
| Dependency | Version | Purpose | Integration Point |
|---|---|---|---|
| Ollama (local service) | – | LLM inference for SQL generation + table selection | `sql_environment.py`: `_call_ollama_for_sql()`, `_call_ollama_to_select_table()` |
| `requests` | `>=2.31.0` | HTTP client for the Ollama API | `sql_environment.py` |
| `torch` | `==2.2.2` | Tensor operations for tokenized representations | `models.py`, `client.py`, `sql_environment.py` |
| `transformers` | `<5` | HuggingFace tokenizers (chat-template support) | `app.py:get_tokenizer()` |
| `numpy` | `<2` | Torch dependency constraint | `pyproject.toml` |
| `sqlalchemy` | (transitive) | ORM for database schema introspection | `data/databases/models.py` |
| `datasets` | (scripts only) | HuggingFace `load_dataset` for Spider data download | `scripts/download_spider_data.py`, `scripts/generate_models_from_schema.py` |
Environment variables:
| Variable | Default | Purpose |
|---|---|---|
| `TOKENIZER_NAME` | `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace tokenizer model |
| `SYSTEM_PROMPT` | Built-in schema description | Custom system prompt override |
| `OLLAMA_MODEL` | `qwen2` | Ollama model for SQL generation |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
## Known Gaps (Not Yet Implemented)
| Feature | Status | Notes |
|---|---|---|
| `ANSWER` action type | Not implemented | Designed in the main-branch models but removed from action-feature |
| Real database execution | Not implemented | `step()` generates SQL text via Ollama but never executes it against SQLite |
| Reward computation | Not implemented | `reward.py` is empty; the 3-layer design exists only in the README |
| Answer verification | Not implemented | `verifier.py` is empty |
| Budget tracking | Not implemented | No step-limit enforcement |
| Episode question selection | Not implemented | The environment uses a hardcoded schema; `student_assessment.json` is present but never loaded |
| Dockerfile | Not implemented | File is empty; `install_deps.sh` is ready |
| `openenv.yaml` manifest | Not implemented | Empty file |
| Formal test suite | Not implemented | No `tests/` directory; only `MockTokenizer` and notebook tests |
## Gotchas
- **`message_to_action()` mutates state.** On the server side, `message_to_action()` appends the message to `_state.history_messages` before tokenizing. Calling it has a side effect; it is not a pure function. Call it twice with the same message and you get duplicate entries in history.
- **Client and server `message_to_action` diverge.** The server version (`sql_environment.py:message_to_action`) manages state internally and mutates `_state`. The client version (`client.py:message_to_action`) requires passing `history_messages` explicitly and does not manage state. They have different signatures.
- **The schema description is hardcoded in `sql_environment.py`.** `_build_schema_description()` returns a fixed string whose table/column names don't perfectly match the SQLAlchemy ORM models. For example, the description says `Students (student_id, person_id, student_acc_status)` but the ORM model has `Students (student_id, student_details)`.
- **The Ollama failure mode is silent.** If Ollama is unreachable, `_call_ollama_to_select_table()` catches all exceptions and returns the first table in the dict (`Address`); no error is surfaced to the caller. `_call_ollama_for_sql()` returns an error string, but it is treated as a normal assistant message.
- **Original observation fields are commented out, not deleted.** `SQLObservation` still carries `question`, `schema_info`, `result`, `error`, `step_count`, `budget_remaining`, and `action_history` as comments. They are intended to return in a later phase.
- **`MockTokenizer` is imported by production code.** When `transformers` is missing, `app.py` imports `MockTokenizer` from `test_sql_env.py` at runtime, coupling a test utility to the production bootstrap.
- **`test_env.ipynb` lives at the project root.** It is not inside `tests/` or `envs/`, so it is easy to miss when exploring the codebase.
- **Pydantic + `torch.Tensor`.** `SQLAction`, `SQLObservation`, and `SQLState` use `torch.Tensor` fields with Pydantic. This requires `arbitrary_types_allowed = True` in the Pydantic model config (inherited from the OpenEnv base classes), and standard Pydantic serialization (`.model_dump()`) won't work out of the box with tensors.
## Entry Points for Reading
| What You Want to Understand | Start Here | Then Read |
|---|---|---|
| How actions are processed | `envs/sql_env/server/sql_environment.py:step()` | `_detect_action_type()`, `_call_ollama_for_sql()` |
| How messages become actions | `envs/sql_env/server/sql_environment.py:message_to_action()` | `envs/sql_env/client.py:message_to_action()` |
| Data contracts | `envs/sql_env/models.py` | Compare with `git show main:envs/sql_env/models.py` |
| Server bootstrap | `envs/sql_env/server/app.py` | `get_tokenizer()`, `create_sql_environment()` |
| Database schema | `envs/sql_env/data/databases/models.py` | `envs/sql_env/data/questions/student_assessment.json` |
| Client-side usage | `envs/sql_env/client.py` | `test_env.ipynb` |
| Data preparation | `envs/sql_env/scripts/download_spider_data.py` | `scripts/generate_models_from_schema.py` |
This document covers only the action-feature branch delta. For the overall project design (POMDP architecture, reward layers, episode lifecycle), see README.md.
## Known Issues Discovered

These may have since changed; verify against the latest remote `action-feature` branch before acting on them.

- `sqlalchemy` is missing from `pyproject.toml` on the branch
- Pydantic/TypedDict incompatibility on Python < 3.12 (the demo auto-patches it)
- The hardcoded schema description in `sql_environment.py` doesn't match the ORM models
- Silent Ollama fallback to the first table on connection failure