Feature Demo: action-feature — Core Action Dispatch System
Generated: 2026-02-28T14:46:19+01:00
Branch: origin/action-feature vs main
Project: sql-env-onboarding (SQLEnv RL Environment for OpenEnv Challenge)
Execution environment: Python 3.11, torch 2.2.2, MockTokenizer (no Ollama required)
What This Feature Does
The action-feature branch adds the core action dispatch system to SQLEnv — the RL environment where AI agents learn interactive SQL exploration. Before this branch, the environment had data models but no way to actually process agent actions.
Now an agent can send natural language messages like "describe the students table" or "find all students enrolled in CS101", and the environment automatically classifies them into one of three action types (describe, sample, query), dispatches them to the appropriate handler, and returns observations with tokenized conversation history. This is the fundamental interaction loop that makes the environment usable for reinforcement learning.
The system works in two modes:
- Mock path (no external services): Uses `MockTokenizer` for tokenization. Describe and sample actions work fully; query actions return a graceful error since Ollama is unavailable.
- Ollama path (full pipeline): Uses a local Ollama LLM to select relevant tables for describe/sample and to generate SQL for query actions.
Quickstart
# 1. Checkout the branch
git checkout origin/action-feature --detach
# 2. Install dependencies (from the sql_env package directory)
cd envs/sql_env/
uv sync
uv add sqlalchemy # missing from pyproject.toml, needed for ORM models
# 3. Run the full demo (71 checks, ~2 seconds, no external services needed)
uv run python demo_action_feature.py
# 4. (Optional) Return to main when done
git checkout main
Prerequisites: Python 3.11+, uv package manager, git.
Optional: Ollama running locally with llama3.2 model for full query generation (set OLLAMA_MODEL=llama3.2).
Note: `sqlalchemy` is required by the ORM models but was omitted from `pyproject.toml` on the branch. The `uv add sqlalchemy` step is necessary.
Note: On Python < 3.12, a Pydantic compatibility patch is needed because `openenv` defines `Message` with `typing.TypedDict` instead of `typing_extensions.TypedDict`. The demo script applies this patch automatically.
Live Demo — Mock Path (Primary)
All output below was captured by executing uv run python demo_action_feature.py on the action-feature branch with no Ollama model configured (default qwen2 not installed).
1. Environment Instantiation with MockTokenizer
The environment loads 9 SQLAlchemy ORM models (Address, Person, Student, Course, etc.) and initializes conversation state with a system prompt.
uv run python demo_action_feature.py
============================================================
1. Environment Instantiation with MockTokenizer
============================================================
[PASS] MockTokenizer created
[PASS] SQLEnvironment created
[PASS] System prompt stored
[PASS] Tokenizer stored
[PASS] 9 database models loaded
[PASS] Initial state has 1 message (system)
[PASS] Initial state has 1 token tensor
[PASS] System message role is correct
[PASS] System message content matches prompt
The environment correctly stores the custom system prompt, attaches the tokenizer, and loads all 9 database table models from SQLAlchemy.
2. reset() Returns Valid SQLObservation
============================================================
2. reset() Returns Valid SQLObservation
============================================================
[PASS] reset() returns SQLObservation
[PASS] Observation has messages list
[PASS] Messages contain system prompt
[PASS] Observation has tokens tensor
[PASS] Tokens tensor is 1D
[PASS] Tokens are non-empty
[PASS] done is False
[PASS] reward is None
Observation details:
messages count: 1
tokens shape: torch.Size([29])
tokens[:10]: [89, 111, 117, 32, 97, 114, 101, 32, 97, 32]
After reset(), the observation contains one message (the system prompt) tokenized into a 1D tensor via MockTokenizer (char codes mod 256). The episode is not done and has no reward — ready for the agent's first action.
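The char-code scheme can be sketched in a few lines — a minimal stand-in assuming only the `ord(c) % 256` rule described here, not the real `MockTokenizer` class from `test_sql_env.py`:

```python
# Minimal sketch of the char-code tokenizer described above.
# Assumption: tokens are simply ord(c) % 256 per character.
def mock_encode(text: str) -> list[int]:
    return [ord(c) % 256 for c in text]

def mock_decode(tokens: list[int]) -> str:
    return "".join(chr(t) for t in tokens)

# A prompt starting "You are a " reproduces the demo's first ten token values.
print(mock_encode("You are a "))  # [89, 111, 117, 32, 97, 114, 101, 32, 97, 32]
```

Note the roundtrip is only lossless for characters with code points below 256, which holds for the ASCII prompts used in the demo.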
3. Action Type Detection
The keyword classifier maps user messages to three action types: describe, sample, and query.
============================================================
3. Action Type Detection (_detect_action_type)
============================================================
[PASS] 'describe the students table...' -> describe
[PASS] 'what columns does Course have...' -> describe
[PASS] 'show me the schema...' -> describe
[PASS] 'show me sample rows from students...' -> sample
[PASS] 'give me example data...' -> sample
[PASS] 'how many rows are in Courses...' -> sample
[PASS] 'find all students enrolled in CS101...' -> query
[PASS] 'select count(*) from students...' -> query
[PASS] 'what is the average score...' -> query
All 9 test cases correctly classified. Keywords like "describe"/"schema"/"columns" trigger describe; "sample"/"example"/"rows" trigger sample; everything else defaults to query.
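The classification rule can be sketched as a simple keyword check — a hypothetical reimplementation based only on the keywords listed above, not the branch's actual `_detect_action_type` code:

```python
# Hypothetical sketch of the keyword classifier described above.
# Keyword sets are assumptions taken from the demo's summary sentence.
DESCRIBE_KEYWORDS = ("describe", "schema", "columns")
SAMPLE_KEYWORDS = ("sample", "example", "rows")

def detect_action_type(message: str) -> str:
    lower = message.lower()
    if any(k in lower for k in DESCRIBE_KEYWORDS):
        return "describe"
    if any(k in lower for k in SAMPLE_KEYWORDS):
        return "sample"
    return "query"  # everything else defaults to a full SQL query

print(detect_action_type("describe the students table"))   # describe
print(detect_action_type("how many rows are in Courses"))  # sample
print(detect_action_type("what is the average score"))     # query
```

Checking describe keywords first matters: "show me sample rows" contains no describe keyword and falls through to sample, while "show me the schema" stops at describe.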
4. message_to_action() Creates Properly Typed SQLAction
============================================================
4. message_to_action() Creates Properly Typed SQLAction
============================================================
[PASS] Returns SQLAction
[PASS] action_type is 'describe'
[PASS] action_description is message content
[PASS] tokens is a torch.Tensor
[PASS] tokens tensor is non-empty
Action details:
action_type: describe
action_description: describe the students table
tokens shape: torch.Size([1, 57])
[PASS] message_to_action adds message to history
[PASS] History[1] is the user message
[PASS] Sample message -> action_type 'sample'
[PASS] Query message -> action_type 'query'
message_to_action() converts a raw message dict into a SQLAction with the correct action_type, action_description, and tokenized tensor. Important side effect: it also appends the message to _state.history_messages before tokenizing, so the tokenizer sees the full conversation context.
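The validation and the append-before-tokenize side effect can be sketched with plain dicts — a hypothetical mirror of `message_to_action()`, with lists standing in for `SQLAction` and the tokenizer:

```python
# Hypothetical sketch of message_to_action's validation and history side effect.
# Plain dicts/lists stand in for SQLAction and the real tokenizer.
def message_to_action(history: list[dict], message: dict) -> dict:
    if "role" not in message:
        raise ValueError("message missing 'role'")
    if message.get("content") is None:
        raise ValueError("message missing 'content'")
    history.append(message)  # side effect: message joins history BEFORE tokenizing
    # Tokenize the full history so the tokenizer sees all conversation context.
    tokens = [ord(c) % 256 for m in history for c in m["content"]]
    return {"action_description": message["content"], "tokens": tokens}

history = [{"role": "system", "content": "You are a test SQL assistant."}]
action = message_to_action(history, {"role": "user", "content": "describe the students table"})
print(len(history))  # 2 — the user message was appended as a side effect
```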
5. step() with Describe Action
Without Ollama, the describe action falls back to the first table (Address) and returns its full SQLAlchemy-derived schema.
============================================================
5. step() with Describe Action (Schema from SQLAlchemy Models)
============================================================
[PASS] step() returns SQLObservation
[PASS] History now has 3 messages (system + user + assistant)
[PASS] Last message is from assistant
[PASS] Assistant message contains 'columns'
[PASS] Schema info contains column descriptions
Describe response (first 200 chars):
Table 'Address' has the following columns:
- address_id: integer number
- line_1: text (up to 255 characters)
- line_2: text (up to 255 characters)
- city: text (up to 255 characters)
- zip_postcode:
[PASS] Tokens tensor grew after step
The schema is extracted directly from SQLAlchemy model introspection (column names, types converted to natural language). The observation now has 3 messages (system → user → assistant) and the token tensor grew.
6. step() with Sample Action
The sample action generates a SELECT * ... LIMIT 10 query for the target table.
============================================================
6. step() with Sample Action (Generates SQL Query Text)
============================================================
[PASS] step(sample) returns assistant message
[PASS] Sample response contains SELECT
[PASS] Sample response contains LIMIT
Sample response:
Here's a query to sample data from Address:
SELECT * FROM Address LIMIT 10;
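The sample-query text follows the shape shown above; a one-line sketch (the table-existence check is an assumption modeled on the unknown-table edge case exercised later in the demo):

```python
# Sketch of the sample action's SQL text generation.
# Assumption: unknown tables yield an error string rather than raising.
def sample_query(table: str, known_tables: set[str]) -> str:
    if table not in known_tables:
        return f"Error: table '{table}' not found"
    return f"SELECT * FROM {table} LIMIT 10;"

print(sample_query("Address", {"Address", "Student"}))  # SELECT * FROM Address LIMIT 10;
```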
7. step() with Query Action (No Ollama)
Without Ollama, the query action returns a clear error message instead of crashing.
============================================================
7. step() with Query Action (Mock Path — No Ollama)
============================================================
[PASS] step(query) returns assistant message
[PASS] Query response is error string (no Ollama) or SQL
Query response (no Ollama):
Error: Ollama returned status 404
The error is a graceful 404 (Ollama server is running but the default qwen2 model isn't installed). The conversation continues normally — the error becomes part of the message history.
8. Multi-Turn Conversation State Management
Three turns of alternating user/assistant messages, verifying the conversation history grows correctly.
============================================================
8. Multi-Turn Conversation State Management
============================================================
[PASS] After turn 1: 3 messages (sys + user + assistant)
[PASS] After turn 2: 5 messages (sys + u1 + a1 + u2 + a2)
[PASS] Tokens grew between turns
[PASS] After turn 3: 7 messages
[PASS] Tokens grew again
[PASS] Message roles follow expected pattern
Conversation summary after 3 turns:
[0] system: You are a test SQL assistant....
[1] user: describe the Address table...
[2] assistant: Table 'Address' has the following columns: - address_id: in...
[3] user: show me sample rows...
[4] assistant: Here's a query to sample data from Address: SELECT * FROM A...
[5] user: find all addresses in New York...
[6] assistant: Error: Ollama returned status 404...
Total tokens: 987
Message roles follow the expected [system, user, assistant, user, assistant, user, assistant] pattern. Token count grows monotonically: the _create_observation() method flattens all history_tokens into a single 1D tensor via torch.cat.
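The flattening step can be sketched without torch — a pure-Python stand-in for what `_create_observation()` reportedly does with `torch.cat` over the per-message token tensors:

```python
# Pure-Python stand-in for torch.cat over history_tokens.
# Each turn appends its own token list; the observation's tokens are the
# flat concatenation, so the total length only ever grows turn over turn.
history_tokens: list[list[int]] = []
lengths = []
for text in ("system prompt", "user turn 1", "assistant turn 1"):
    history_tokens.append([ord(c) % 256 for c in text])
    flat = [t for turn in history_tokens for t in turn]
    lengths.append(len(flat))

print(lengths)  # strictly increasing, like the demo's token counts
```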
9. Client Serialization Roundtrip
The SQLEnvClient converts tensor → list for JSON transport and list → tensor on the return path.
============================================================
9. Client Serialization Roundtrip (_step_payload)
============================================================
[PASS] Payload has action_type
[PASS] Payload has action_description
[PASS] Tokens converted to list
[PASS] Token values preserved
[PASS] _parse_result returns StepResult
[PASS] Observation messages parsed
[PASS] Tokens converted back to tensor
[PASS] Token values correct
[PASS] Reward parsed
Payload serialization:
action_type: query
tokens (list): [[1, 2, 3, 4, 5]]
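The roundtrip can be sketched with plain lists — a hypothetical mirror of the client's payload handling, with nested lists standing in where the real code calls `tensor.tolist()` and rebuilds a tensor on return:

```python
import json

# Hypothetical sketch of the client's tensor<->list roundtrip.
# Nested lists stand in for torch tensors on both sides of the JSON boundary.
def step_payload(action_type: str, tokens: list[list[int]]) -> str:
    return json.dumps({"action_type": action_type, "tokens": tokens})

def parse_result(payload: str) -> dict:
    data = json.loads(payload)
    return {"action_type": data["action_type"], "tokens": data["tokens"]}

result = parse_result(step_payload("query", [[1, 2, 3, 4, 5]]))
print(result["tokens"])  # [[1, 2, 3, 4, 5]] — values preserved across transport
```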
Edge Cases Exercised
Empty System Prompt
When no system prompt is provided (empty string), the environment correctly starts with zero messages and an empty token tensor.
[PASS] Empty system prompt -> no messages in history
[PASS] Empty system prompt -> empty tokens
Invalid Message Inputs
message_to_action() validates its input and raises ValueError for malformed messages.
[PASS] Missing 'role' raises ValueError
[PASS] Missing 'content' raises ValueError
[PASS] None content raises ValueError
Unknown Table Handling
Schema lookup and sample query generation gracefully handle non-existent tables.
[PASS] Unknown table returns 'not found' message
[PASS] Unknown table sample returns error
MockTokenizer Encode/Decode Roundtrip
The mock tokenizer's ord(c) % 256 encoding correctly roundtrips through chr(t) decoding.
[PASS] MockTokenizer encode/decode roundtrip
Invalid Tokenizer Validation
The environment constructor rejects tokenizers missing apply_chat_template.
[PASS] Invalid tokenizer raises ValueError
Live Demo — Ollama Path (Optional)
When Ollama is running locally with a compatible model, the query action generates real SQL and the describe action selects the correct table.
Describe with Ollama
With OLLAMA_MODEL=llama3.2, the LLM correctly identifies "Student" as the most relevant table for "describe the students table":
DESCRIBE RESULT:
Table 'Student' has the following columns:
- student_id: integer number
- student_details: text (up to 255 characters)
Compare with the mock path, which fell back to "Address" (the first table in the dict). With Ollama, table selection is intelligent.
Query with Ollama
The LLM generates valid SQL for natural language questions:
QUERY RESULT:
SELECT * FROM Students WHERE CourseID IN (SELECT CourseID FROM Courses WHERE CourseName = 'CS101')
Note: The generated SQL references column names from the schema description prompt, not the actual SQLAlchemy model column names. This is expected — the LLM generates SQL based on the natural language schema it receives.
Full Result Summary
============================================================
SUMMARY
============================================================
Total checks: 71
Passed: 71
Failed: 0
ALL CHECKS PASSED
| Category | Checks | Passed | Failed |
|---|---|---|---|
| Imports | 1 | 1 | 0 |
| Instantiation | 8 | 8 | 0 |
| reset() | 8 | 8 | 0 |
| Action detection | 9 | 9 | 0 |
| message_to_action | 9 | 9 | 0 |
| step(describe) | 6 | 6 | 0 |
| step(sample) | 3 | 3 | 0 |
| step(query) | 2 | 2 | 0 |
| Multi-turn state | 6 | 6 | 0 |
| Client serialization | 9 | 9 | 0 |
| Edge cases | 9 | 9 | 0 |
| Total | 71 | 71 | 0 |
Verification Checklist
- Environment instantiation with MockTokenizer — 8 checks
- `reset()` returns valid SQLObservation with system prompt — 8 checks
- Action type detection for all 3 types (describe/sample/query) — 9 keywords tested
- `message_to_action()` creates SQLAction with correct type and tokens — 9 checks
- `step()` with describe returns schema from SQLAlchemy models — 6 checks
- `step()` with sample returns SQL query text — 3 checks
- `step()` with query returns Ollama error gracefully (mock path) — 2 checks
- Multi-turn conversation state grows correctly — 6 checks
- Client tensor→list serialization roundtrip — 9 checks
- Edge cases: empty prompt, invalid inputs, unknown tables, tokenizer validation — 9 checks
Known Issues Found
1. `sqlalchemy` missing from `pyproject.toml` — The ORM models import `sqlalchemy` but it's not listed as a dependency. Must `uv add sqlalchemy` manually.
2. Pydantic/TypedDict incompatibility on Python < 3.12 — The `openenv` library defines `Message` with `typing.TypedDict`, but Pydantic 2.x requires `typing_extensions.TypedDict`. The demo script monkey-patches this, but the issue would affect any direct usage.
3. Ollama default model (`qwen2`) unlikely to be installed — The default `OLLAMA_MODEL` is `qwen2`, which users probably don't have. The 404 error is graceful but confusing. Consider defaulting to `llama3.2` or documenting the required model.
4. describe/sample fallback to first table — When Ollama is unavailable, `_call_ollama_to_select_table()` silently falls back to the first table in the dict (Address). This is correct behavior but may confuse users expecting the table from their query.
File Reference
| File | What it does |
|---|---|
| `envs/sql_env/demo_action_feature.py` | Executable demo script (71 checks) |
| `envs/sql_env/server/sql_environment.py` | Core SQLEnvironment with reset/step/dispatch |
| `envs/sql_env/models.py` | SQLAction, SQLObservation, SQLState Pydantic models |
| `envs/sql_env/client.py` | SQLEnvClient with tensor→list serialization |
| `envs/sql_env/server/test_sql_env.py` | MockTokenizer (char ord encoding) |
| `envs/sql_env/data/databases/models.py` | 9 SQLAlchemy ORM models |
How to Reproduce
git clone <repo-url>
cd sql-env-onboarding
git checkout origin/action-feature --detach
cd envs/sql_env/
uv sync && uv add sqlalchemy
uv run python demo_action_feature.py # Mock path: 71/71 checks
# Optional: Ollama path
export OLLAMA_MODEL=llama3.2 # or any installed model
uv run python demo_action_feature.py # Query actions now return real SQL
Demo generated 2026-02-28. Re-run uv run python demo_action_feature.py on the action-feature branch to refresh.