
Feature Demo: action-feature - Core Action Dispatch System

Generated: 2026-02-28T14:46:19+01:00
Branch: origin/action-feature vs main
Project: sql-env-onboarding (SQLEnv RL Environment for OpenEnv Challenge)
Execution environment: Python 3.11, torch 2.2.2, MockTokenizer (no Ollama required)


What This Feature Does

The action-feature branch adds the core action dispatch system to SQLEnv, the RL environment where AI agents learn interactive SQL exploration. Before this branch, the environment had data models but no way to actually process agent actions.

Now an agent can send natural language messages like "describe the students table" or "find all students enrolled in CS101", and the environment automatically classifies them into one of three action types (describe, sample, query), dispatches them to the appropriate handler, and returns observations with tokenized conversation history. This is the fundamental interaction loop that makes the environment usable for reinforcement learning.

The system works in two modes:

  • Mock path (no external services): Uses MockTokenizer for tokenization. Describe and sample actions work fully; query actions return a graceful error since Ollama is unavailable.
  • Ollama path (full pipeline): Uses a local Ollama LLM to select relevant tables for describe/sample and to generate SQL for query actions.

Quickstart

# 1. Checkout the branch
git checkout origin/action-feature --detach

# 2. Install dependencies (from the sql_env package directory)
cd envs/sql_env/
uv sync
uv add sqlalchemy   # missing from pyproject.toml, needed for ORM models

# 3. Run the full demo (71 checks, ~2 seconds, no external services needed)
uv run python demo_action_feature.py

# 4. (Optional) Return to main when done
git checkout main

Prerequisites: Python 3.11+, uv package manager, git. Optional: Ollama running locally with llama3.2 model for full query generation (set OLLAMA_MODEL=llama3.2).

Note: sqlalchemy is required by the ORM models but was omitted from pyproject.toml on the branch. The uv add sqlalchemy step is necessary.

Note: On Python < 3.12, a Pydantic compatibility patch is needed because openenv defines Message with typing.TypedDict instead of typing_extensions.TypedDict. The demo script applies this patch automatically.
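A hedged sketch of what such a compatibility shim can look like (the openenv module path and patch mechanics are assumptions; the demo script's actual patch may differ):

```python
import sys

# typing_extensions.TypedDict is accepted as a Pydantic 2.x field on every
# Python version; typing.TypedDict only works on 3.12+.
from typing_extensions import TypedDict


class Message(TypedDict):
    """Assumed shape of openenv's Message: a role plus a content string."""
    role: str
    content: str


def apply_message_patch() -> None:
    # Hypothetical patch target: swap in the compatible class before any
    # Pydantic model referencing openenv's Message is built.
    if sys.version_info < (3, 12):
        # import openenv.types as openenv_types  # assumed module path
        # openenv_types.Message = Message
        pass
```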


Live Demo β€” Mock Path (Primary)

All output below was captured by executing uv run python demo_action_feature.py on the action-feature branch with no Ollama model configured (default qwen2 not installed).

1. Environment Instantiation with MockTokenizer

The environment loads 9 SQLAlchemy ORM models (Address, Person, Student, Course, etc.) and initializes conversation state with a system prompt.

uv run python demo_action_feature.py
============================================================
  1. Environment Instantiation with MockTokenizer
============================================================
  [PASS] MockTokenizer created
  [PASS] SQLEnvironment created
  [PASS] System prompt stored
  [PASS] Tokenizer stored
  [PASS] 9 database models loaded
  [PASS] Initial state has 1 message (system)
  [PASS] Initial state has 1 token tensor
  [PASS] System message role is correct
  [PASS] System message content matches prompt

The environment correctly stores the custom system prompt, attaches the tokenizer, and loads all 9 database table models from SQLAlchemy.

2. reset() Returns Valid SQLObservation

============================================================
  2. reset() Returns Valid SQLObservation
============================================================
  [PASS] reset() returns SQLObservation
  [PASS] Observation has messages list
  [PASS] Messages contain system prompt
  [PASS] Observation has tokens tensor
  [PASS] Tokens tensor is 1D
  [PASS] Tokens are non-empty
  [PASS] done is False
  [PASS] reward is None

  Observation details:
    messages count: 1
    tokens shape:   torch.Size([29])
    tokens[:10]:    [89, 111, 117, 32, 97, 114, 101, 32, 97, 32]

After reset(), the observation contains one message (the system prompt) tokenized into a 1D tensor via MockTokenizer (char codes mod 256). The episode is not done and has no reward, ready for the agent's first action.
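A minimal sketch of a tokenizer with this char-code behavior (not the branch's exact MockTokenizer, which also provides apply_chat_template for the environment's tokenizer check):

```python
class MockTokenizer:
    """Toy tokenizer: each character becomes ord(c) % 256 (sketch only)."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) % 256 for c in text]

    def decode(self, tokens: list[int]) -> str:
        return "".join(chr(t) for t in tokens)
```

Encoding "You are a " yields [89, 111, 117, 32, 97, 114, 101, 32, 97, 32], matching the tokens[:10] shown above.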

3. Action Type Detection

The keyword classifier maps user messages to three action types: describe, sample, and query.

============================================================
  3. Action Type Detection (_detect_action_type)
============================================================
  [PASS] 'describe the students table...' -> describe
  [PASS] 'what columns does Course have...' -> describe
  [PASS] 'show me the schema...' -> describe
  [PASS] 'show me sample rows from students...' -> sample
  [PASS] 'give me example data...' -> sample
  [PASS] 'how many rows are in Courses...' -> sample
  [PASS] 'find all students enrolled in CS101...' -> query
  [PASS] 'select count(*) from students...' -> query
  [PASS] 'what is the average score...' -> query

All 9 test cases correctly classified. Keywords like "describe"/"schema"/"columns" trigger describe; "sample"/"example"/"rows" trigger sample; everything else defaults to query.
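A minimal sketch of a keyword classifier with this behavior (the keyword lists are assumptions inferred from the test cases; _detect_action_type's real lists may differ):

```python
# Hypothetical keyword lists, ordered so describe keywords win over sample.
DESCRIBE_KEYWORDS = ("describe", "schema", "columns")
SAMPLE_KEYWORDS = ("sample", "example", "rows")


def detect_action_type(message: str) -> str:
    text = message.lower()
    if any(kw in text for kw in DESCRIBE_KEYWORDS):
        return "describe"
    if any(kw in text for kw in SAMPLE_KEYWORDS):
        return "sample"
    return "query"  # default: anything else becomes a SQL query request
```

Note the default-to-query design: misclassification costs little, since the query handler can run arbitrary SQL generation anyway.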

4. message_to_action() Creates Properly Typed SQLAction

============================================================
  4. message_to_action() Creates Properly Typed SQLAction
============================================================
  [PASS] Returns SQLAction
  [PASS] action_type is 'describe'
  [PASS] action_description is message content
  [PASS] tokens is a torch.Tensor
  [PASS] tokens tensor is non-empty

  Action details:
    action_type:        describe
    action_description: describe the students table
    tokens shape:       torch.Size([1, 57])
  [PASS] message_to_action adds message to history
  [PASS] History[1] is the user message
  [PASS] Sample message -> action_type 'sample'
  [PASS] Query message -> action_type 'query'

message_to_action() converts a raw message dict into a SQLAction with the correct action_type, action_description, and tokenized tensor. Important side effect: it also appends the message to _state.history_messages before tokenizing, so the tokenizer sees the full conversation context.

5. step() with Describe Action

Without Ollama, the describe action falls back to the first table (Address) and returns its full SQLAlchemy-derived schema.

============================================================
  5. step() with Describe Action (Schema from SQLAlchemy Models)
============================================================
  [PASS] step() returns SQLObservation
  [PASS] History now has 3 messages (system + user + assistant)
  [PASS] Last message is from assistant
  [PASS] Assistant message contains 'columns'
  [PASS] Schema info contains column descriptions

  Describe response (first 200 chars):
    Table 'Address' has the following columns:

- address_id: integer number
- line_1: text (up to 255 characters)
- line_2: text (up to 255 characters)
- city: text (up to 255 characters)
- zip_postcode:
  [PASS] Tokens tensor grew after step

The schema is extracted directly from SQLAlchemy model introspection (column names and types converted to natural language). The observation now has 3 messages (system → user → assistant) and the token tensor grew.
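A sketch of this kind of introspection, using a trimmed stand-in for the branch's Address model (the type-to-phrase mapping is an assumption matching the output above):

```python
from sqlalchemy import Column, Integer, String, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Address(Base):  # trimmed stand-in for the branch's Address model
    __tablename__ = "Address"
    address_id = Column(Integer, primary_key=True)
    line_1 = Column(String(255))


def describe_table(model) -> str:
    # Walk the mapped columns and render each type as natural language.
    lines = [f"Table '{model.__name__}' has the following columns:", ""]
    for col in inspect(model).columns:
        if isinstance(col.type, Integer):
            kind = "integer number"
        elif isinstance(col.type, String):
            kind = f"text (up to {col.type.length} characters)"
        else:
            kind = str(col.type).lower()
        lines.append(f"- {col.name}: {kind}")
    return "\n".join(lines)
```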

6. step() with Sample Action

The sample action generates a SELECT * ... LIMIT 10 query for the target table.

============================================================
  6. step() with Sample Action (Generates SQL Query Text)
============================================================
  [PASS] step(sample) returns assistant message
  [PASS] Sample response contains SELECT
  [PASS] Sample response contains LIMIT

  Sample response:
    Here's a query to sample data from Address:

SELECT * FROM Address LIMIT 10;
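The sample handler's core is simple enough to sketch in a few lines (function and parameter names are assumptions; the unknown-table error path mirrors the edge-case checks later in this demo):

```python
def sample_sql(table_name: str, known_tables: set[str], limit: int = 10) -> str:
    # Unknown tables get an error string rather than an exception, so the
    # conversation can continue.
    if table_name not in known_tables:
        return f"Error: table '{table_name}' not found"
    return f"SELECT * FROM {table_name} LIMIT {limit};"
```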

7. step() with Query Action (No Ollama)

Without a working Ollama model, the query action returns a clear error message instead of crashing.

============================================================
  7. step() with Query Action (Mock Path - No Ollama)
============================================================
  [PASS] step(query) returns assistant message
  [PASS] Query response is error string (no Ollama) or SQL

  Query response (no Ollama):
    Error: Ollama returned status 404

The error is a graceful 404 (the Ollama server is running but the default qwen2 model isn't installed). The conversation continues normally; the error becomes part of the message history.

8. Multi-Turn Conversation State Management

Three turns of alternating user/assistant messages, verifying the conversation history grows correctly.

============================================================
  8. Multi-Turn Conversation State Management
============================================================
  [PASS] After turn 1: 3 messages (sys + user + assistant)
  [PASS] After turn 2: 5 messages (sys + u1 + a1 + u2 + a2)
  [PASS] Tokens grew between turns
  [PASS] After turn 3: 7 messages
  [PASS] Tokens grew again
  [PASS] Message roles follow expected pattern

  Conversation summary after 3 turns:
    [0]     system: You are a test SQL assistant....
    [1]       user: describe the Address table...
    [2]  assistant: Table 'Address' has the following columns:  - address_id: in...
    [3]       user: show me sample rows...
    [4]  assistant: Here's a query to sample data from Address:  SELECT * FROM A...
    [5]       user: find all addresses in New York...
    [6]  assistant: Error: Ollama returned status 404...
  Total tokens: 987

Message roles follow the expected [system, user, assistant, user, assistant, user, assistant] pattern. Token count grows monotonically: the _create_observation() method flattens all history_tokens into a single 1D tensor via torch.cat.
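A sketch of the flattening step described above (attribute handling is an assumption; _create_observation's exact code may differ):

```python
import torch

def flatten_history(history_tokens: list[torch.Tensor]) -> torch.Tensor:
    # Per-message tensors may be 1D ([n]) or batched ([1, n]); flatten each,
    # then concatenate everything into one 1D tensor for the observation.
    return torch.cat([t.reshape(-1) for t in history_tokens])
```

Because each turn appends a new tensor to the history, the concatenated length can only grow, which is exactly the monotonic token-count behavior checked above.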

9. Client Serialization Roundtrip

The SQLEnvClient converts tensor → list for JSON transport and list → tensor on the return path.

============================================================
  9. Client Serialization Roundtrip (_step_payload)
============================================================
  [PASS] Payload has action_type
  [PASS] Payload has action_description
  [PASS] Tokens converted to list
  [PASS] Token values preserved
  [PASS] _parse_result returns StepResult
  [PASS] Observation messages parsed
  [PASS] Tokens converted back to tensor
  [PASS] Token values correct
  [PASS] Reward parsed

  Payload serialization:
    action_type: query
    tokens (list): [[1, 2, 3, 4, 5]]
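The roundtrip reduces to two small conversions, sketched here (payload keys follow the check names above; the real _step_payload/_parse_result may carry more fields):

```python
import torch

def step_payload(action_type: str, description: str, tokens: torch.Tensor) -> dict:
    return {
        "action_type": action_type,
        "action_description": description,
        "tokens": tokens.tolist(),  # nested Python lists are JSON-serializable
    }


def parse_tokens(token_lists: list) -> torch.Tensor:
    # Reverse direction: rebuild the tensor from the JSON payload.
    return torch.tensor(token_lists)
```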

Edge Cases Exercised

Empty System Prompt

When no system prompt is provided (empty string), the environment correctly starts with zero messages and an empty token tensor.

  [PASS] Empty system prompt -> no messages in history
  [PASS] Empty system prompt -> empty tokens

Invalid Message Inputs

message_to_action() validates its input and raises ValueError for malformed messages.

  [PASS] Missing 'role' raises ValueError
  [PASS] Missing 'content' raises ValueError
  [PASS] None content raises ValueError
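The validation being exercised can be sketched as follows (error messages are assumptions; only the raised exception type is verified by the checks above):

```python
def validate_message(message: dict) -> None:
    # Reject malformed messages before they reach history/tokenization.
    if "role" not in message:
        raise ValueError("message is missing 'role'")
    if "content" not in message or message["content"] is None:
        raise ValueError("message needs a non-None 'content'")
```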

Unknown Table Handling

Schema lookup and sample query generation gracefully handle non-existent tables.

  [PASS] Unknown table returns 'not found' message
  [PASS] Unknown table sample returns error

MockTokenizer Encode/Decode Roundtrip

The mock tokenizer's ord(c) % 256 encoding correctly roundtrips through chr(t) decoding.

  [PASS] MockTokenizer encode/decode roundtrip

Invalid Tokenizer Validation

The environment constructor rejects tokenizers missing apply_chat_template.

  [PASS] Invalid tokenizer raises ValueError

Live Demo β€” Ollama Path (Optional)

When Ollama is running locally with a compatible model, the query action generates real SQL and the describe action selects the correct table.

Describe with Ollama

With OLLAMA_MODEL=llama3.2, the LLM correctly identifies "Student" as the most relevant table for "describe the students table":

DESCRIBE RESULT:
Table 'Student' has the following columns:

- student_id: integer number
- student_details: text (up to 255 characters)

Compare with the mock path, which fell back to "Address" (the first table in the dict); with Ollama, table selection is driven by the LLM.

Query with Ollama

The LLM generates valid SQL for natural language questions:

QUERY RESULT:
SELECT * FROM Students WHERE CourseID IN (SELECT CourseID FROM Courses WHERE CourseName = 'CS101')

Note: The generated SQL references column names from the schema description prompt, not the actual SQLAlchemy model column names. This is expected: the LLM generates SQL based on the natural language schema it receives.
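For reference, a minimal sketch of calling Ollama's /api/generate endpoint the way the query path is described (prompt wording and error handling are assumptions; the endpoint, payload keys, and "response" field follow Ollama's public HTTP API):

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_prompt(question: str, schema_text: str, model: str = "llama3.2") -> dict:
    # Prompt wording is an assumption; the branch's real prompt may differ.
    return {
        "model": model,
        "prompt": f"Given this schema:\n{schema_text}\n\nWrite SQL for: {question}",
        "stream": False,  # one JSON object instead of a token stream
    }


def generate_sql(question: str, schema_text: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_prompt(question, schema_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]
    except urllib.error.HTTPError as e:
        # A missing model yields HTTP 404, mirroring the mock-path error string.
        return f"Error: Ollama returned status {e.code}"
```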


Full Result Summary

============================================================
  SUMMARY
============================================================
  Total checks: 71
  Passed:       71
  Failed:       0

  ALL CHECKS PASSED
Category              Checks  Passed  Failed
Imports                    1       1       0
Instantiation              9       9       0
reset()                    8       8       0
Action detection           9       9       0
message_to_action          9       9       0
step(describe)             6       6       0
step(sample)               3       3       0
step(query)                2       2       0
Multi-turn state           6       6       0
Client serialization       9       9       0
Edge cases                 9       9       0
Total                     71      71       0

Verification Checklist

  • Environment instantiation with MockTokenizer - 9 checks
  • reset() returns valid SQLObservation with system prompt - 8 checks
  • Action type detection for all 3 types (describe/sample/query) - 9 keywords tested
  • message_to_action() creates SQLAction with correct type and tokens - 9 checks
  • step() with describe returns schema from SQLAlchemy models - 6 checks
  • step() with sample returns SQL query text - 3 checks
  • step() with query returns Ollama error gracefully (mock path) - 2 checks
  • Multi-turn conversation state grows correctly - 6 checks
  • Client tensor↔list serialization roundtrip - 9 checks
  • Edge cases: empty prompt, invalid inputs, unknown tables, tokenizer validation - 9 checks

Known Issues Found

  1. sqlalchemy missing from pyproject.toml: The ORM models import sqlalchemy but it's not listed as a dependency. You must run uv add sqlalchemy manually.

  2. Pydantic/TypedDict incompatibility on Python < 3.12: The openenv library defines Message with typing.TypedDict, but Pydantic 2.x requires typing_extensions.TypedDict. The demo script monkey-patches this, but the issue would affect any direct usage.

  3. Ollama default model (qwen2) unlikely to be installed: The default OLLAMA_MODEL is qwen2, which users probably don't have. The 404 error is graceful but confusing. Consider defaulting to llama3.2 or documenting the required model.

  4. describe/sample fallback to first table: When Ollama is unavailable, _call_ollama_to_select_table() silently falls back to the first table in the dict (Address). This is correct behavior but may confuse users expecting the table from their query.


File Reference

File                                     What it does
envs/sql_env/demo_action_feature.py      Executable demo script (71 checks)
envs/sql_env/server/sql_environment.py   Core SQLEnvironment with reset/step/dispatch
envs/sql_env/models.py                   SQLAction, SQLObservation, SQLState Pydantic models
envs/sql_env/client.py                   SQLEnvClient with tensor↔list serialization
envs/sql_env/server/test_sql_env.py      MockTokenizer (char ord encoding)
envs/sql_env/data/databases/models.py    9 SQLAlchemy ORM models

How to Reproduce

git clone <repo-url>
cd sql-env-onboarding
git checkout origin/action-feature --detach
cd envs/sql_env/
uv sync && uv add sqlalchemy
uv run python demo_action_feature.py      # Mock path: 71/71 checks

# Optional: Ollama path
export OLLAMA_MODEL=llama3.2              # or any installed model
uv run python demo_action_feature.py      # Query actions now return real SQL

Demo generated 2026-02-28. Re-run uv run python demo_action_feature.py on the action-feature branch to refresh.