Flickinshots committed
Commit 38c9982 · verified · 1 Parent(s): b1ec107

Deploy Project Epsilon Space bundle

.env.app.example ADDED
@@ -0,0 +1,3 @@
+ OPENROUTER_API_KEY=
+ OPENROUTER_SITE_URL=http://localhost:7860
+ OPENROUTER_APP_NAME=Autonomous Executive Assistant Sandbox
.env.training.example ADDED
@@ -0,0 +1,6 @@
+ OPENROUTER_API_KEY=
+ OPENROUTER_MODEL=google/gemma-4-31b-it
+ OPENROUTER_SITE_URL=http://localhost:8888
+ OPENROUTER_APP_NAME=Autonomous Executive Assistant Sandbox Training
+ OPENROUTER_TEMPERATURE=0.1
+ OPENROUTER_MAX_TOKENS=600
.gitignore ADDED
@@ -0,0 +1,8 @@
+ .venv-app/
+ .venv-training/
+ artifacts/
+ .pytest_cache/
+ __pycache__/
+ .env
+ .env.app
+ .env.training
AGENTS.md ADDED
@@ -0,0 +1,43 @@
+ # Repository Guidelines
+ 
+ ## Project Structure & Module Organization
+ Core application code lives in `src/executive_assistant/`. Keep environment logic in `env.py`, SQLite workspace behavior in `workspace.py`, reward logic in `graders.py`, typed contracts in `models.py`, provider configuration in `config.py`, prompt construction in `prompts.py`, OpenRouter calls in `llm_service.py`, shared episode execution in `runner.py`, policies in `agent.py`, and RL logic in `training.py`. Tests live in `tests/` and should mirror the module they validate. Operational scripts live in `scripts/`. Use `training_env.ipynb` with the `scalerhack2-training` kernel for experiments and rollout export only; move stable logic back into `src/`. Top-level runtime files include `app.py`, `openenv.yaml`, `requirements*.txt`, and `PRD.md`.
+ 
+ ## Build, Test, and Development Commands
+ Set up the separate app and training environments with:
+ 
+ ```bash
+ bash scripts/setup_app_env.sh
+ bash scripts/setup_training_env.sh
+ ```
+ 
+ Run the test suite with `.venv-training/bin/pytest -q`. Start the local Gradio entrypoint with `.venv-app/bin/python app.py`. Evaluate the deterministic baseline across all seeded tasks with `.venv-training/bin/python scripts/evaluate_policies.py --provider baseline`. Run one full episode trace with `.venv-training/bin/python scripts/run_policy_episode.py --task hard_rag_reply --provider baseline`. Train the tabular RL policy with `.venv-training/bin/python scripts/train_rl_agent.py --episodes 300`. To exercise the Gemma model through OpenRouter, set `OPENROUTER_API_KEY` first, then switch to `--provider openrouter` or set `POLICY_PROVIDER = "openrouter"` in the notebook.
+ 
+ ```bash
+ .venv-training/bin/python scripts/evaluate_policies.py --provider baseline
+ ```
+ 
+ ## Coding Style & Naming Conventions
+ Target Python 3.11+ and use 4-space indentation. Prefer explicit types and small, single-purpose functions. Follow existing naming patterns: `snake_case` for functions, variables, and modules; `PascalCase` for Pydantic models and environment classes; uppercase for constants such as `TASK_SEEDS`. Keep comments brief and only where behavior is not obvious. There is no formatter configured yet, so match the existing style and keep imports tidy.
+ 
+ ## Testing Guidelines
+ Tests use `pytest`. Add or update tests with every behavioral change, especially for environment transitions, reward shaping, seeded task completion, runner traces, OpenRouter service behavior, and RL training smoke paths. Name test files `test_*.py` and test functions `test_*`. Prefer deterministic assertions against observations, snapshots, action logs, checkpoints, and scores over loose text checks. If you change notebook-driven workflows, validate the underlying module or script rather than testing notebook JSON behavior only.
+ 
+ ## Commit & Pull Request Guidelines
+ Current history uses short, imperative commit subjects such as `Initial RL agent sandbox scaffold` and `Add PRD progress checkpoint note`. Continue that style: concise subject line, capitalized first word, no trailing period. Pull requests should include a brief summary, note any changed scenarios or rewards, list validation steps run (`pytest -q`, smoke tests), and attach screenshots only when UI behavior in `app.py` changes.
+ 
+ ## Agent-Specific Notes
+ Preserve determinism in the environment, graders, and baseline policy. Live API access belongs in policy layers such as `OpenRouterPolicy`, not in the workspace or reward path. Keep `EpisodeRunner` as the shared execution path for scripts, tests, Gradio, and notebook workflows. Treat OpenRouter calls as optional runtime behavior: tests and RL smoke runs must stay runnable without network access. If notebook experiments uncover a useful change, codify it in `src/` and cover it with tests before treating it as part of the baseline.
+ 
+ ## Agent Workflow Loop
+ All execution surfaces in this repository should follow the same loop:
+ 
+ 1. Load environment state
+ 2. Generate observation
+ 3. Send to LLM or policy
+ 4. Receive structured action
+ 5. Execute action in workspace
+ 6. Update state
+ 7. Repeat until task complete
+ 
+ In code, keep this flow inside `EpisodeRunner`. Use `initialize()` for steps 1-2, `choose_action()` for steps 3-4, and `advance()` plus `env.step()` for steps 5-6. Do not duplicate bespoke episode loops in notebooks, scripts, or UI handlers.
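The loop above can be sketched as a minimal driver. This is a toy illustration, not the real `EpisodeRunner`: the `ToyEnv` and `ToyPolicy` classes below are hypothetical stand-ins, and only the loop shape mirrors the steps listed.

```python
# Toy sketch of the agent workflow loop that EpisodeRunner owns.
# ToyEnv/ToyPolicy are illustrative stand-ins, not repo modules.
from dataclasses import dataclass

@dataclass
class ToyEnv:
    remaining: int = 3  # unread emails left to process

    def reset(self):  # steps 1-2: load state, generate observation
        return {"unread": self.remaining}

    def step(self, action):  # steps 5-6: execute action, update state
        if action == "archive":
            self.remaining -= 1
        done = self.remaining == 0
        return {"unread": self.remaining}, (1.0 if done else 0.0), done

class ToyPolicy:
    def choose_action(self, obs):  # steps 3-4: decide a structured action
        return "archive" if obs["unread"] > 0 else "noop"

def run_episode(env, policy, max_steps=10):
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):  # step 7: repeat until task complete
        action = policy.choose_action(obs)
        obs, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total, obs

score, final_obs = run_episode(ToyEnv(), ToyPolicy())
```

Any new surface (script, test, UI handler) should call the shared runner rather than re-implementing this loop.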
Dockerfile ADDED
@@ -0,0 +1,14 @@
+ FROM python:3.11-slim
+ 
+ WORKDIR /app
+ 
+ COPY requirements.txt .
+ COPY requirements.app.txt .
+ RUN pip install --no-cache-dir -r requirements.app.txt
+ 
+ COPY . .
+ 
+ EXPOSE 7860
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ 
+ CMD ["python", "app.py"]
PRD.md ADDED
@@ -0,0 +1,154 @@
+ # Product Requirements Document (PRD): Autonomous Executive Assistant Sandbox
+ 
+ **Target Deployment:** Hugging Face Spaces (Gradio UI + OpenEnv Container)
+ **Primary Dev Environment:** Kaggle / Jupyter Notebooks (`training_env.ipynb`)
+ 
+ ---
+ 
+ ## Progress Note
+ Status as of 2026-04-08:
+ 
+ - The deterministic SQLite-backed workspace is implemented with action logging, seeded scenarios, snapshots, and richer step semantics.
+ - The OpenEnv contract is represented in typed Pydantic models for observations, actions, rewards, and policy decisions.
+ - Deterministic graders are implemented for all three seeded tasks with dense reward shaping and terminal success checks.
+ - A shared `EpisodeRunner` now owns the agent workflow loop across scripts, tests, the notebook, and Gradio.
+ - A deterministic baseline policy is implemented and solves all three seeded tasks end to end.
+ - An OpenRouter-backed `google/gemma-4-31b-it` policy path is integrated, prompt-hardened, and validated on the hard task.
+ - Separate app and training environments are in place, including a registered `scalerhack2-training` Jupyter kernel.
+ - The training notebook loads `.env.training`, exports traces, runs RL training, and saves checkpoints.
+ - A tabular Q-learning policy exists as a seeded-task RL prototype and can be trained, evaluated, and checkpointed.
+ - The current Gradio app can reset scenarios and run full episodes for baseline and OpenRouter policies.
+ 
+ Resume from here:
+ 
+ - Make the trained RL checkpoint a first-class runtime policy in the app and scripts.
+ - Refine the Gradio UI from one-shot episode execution into a stepwise or streaming judge-facing experience.
+ - Ensure the app, notebook, and scripts can all use the same trained RL artifact without drift.
+ - Expand notebook analysis cells and runtime metrics for stronger model-vs-baseline-vs-RL comparisons.
+ - Keep the current tabular RL policy as a prototype while leaving room for a richer learned policy after hackathon delivery.
+ 
+ ---
+ 
+ ## 1. Executive Summary
+ We are building a deterministic, isolated OpenEnv simulation of a corporate or academic workflow. Instead of wrapping a brittle, live API like Gmail (which causes rate limits and non-deterministic grading), we will engineer an **in-memory SQLite Mock Mail Server & Local File System**.
+ 
+ The AI agent will act as an Autonomous Executive Assistant. It must navigate a chaotic mock inbox, extract deadlines to a mock task manager, negotiate meeting times, and perform Retrieval-Augmented Generation (RAG) over a mock file system to draft intelligent replies.
+ 
+ This environment proves the agent's ability to act as a *router* and a *tool-user*, moving beyond text generation into full workflow automation.
+ 
+ ---
+ 
+ ## 2. Core Architecture & Stack
+ * **State Management:** In-memory SQLite (`sqlite3`) simulating a mail server, calendar, and file system.
+ * **Typing & Validation:** `pydantic` (strictly defining Observations, Actions, and Rewards per the OpenEnv spec).
+ * **Development & Debugging:** Jupyter Notebooks plus scriptable runners. The state machine, model prompts, rollout export, and RL smoke training are exercised from `training_env.ipynb` and mirrored by CLI scripts.
+ * **Model Runtime:** OpenRouter using `google/gemma-4-31b-it` for live policy inference, with prompt/schema hardening and response repair.
+ * **RL Prototype:** Tabular Q-learning over a finite action template catalog, with teacher warm-start from the deterministic baseline and JSON checkpoint persistence.
+ * **Deployment & Visualization:** Gradio (to visualize the inbox state for judges) packaged within a Docker container on Hugging Face Spaces.
+ 
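The tabular Q-learning prototype described above can be sketched with a single Bellman backup over a fixed action catalog. This is an illustrative sketch, not the code in `training.py`: the action names, state keys, and hyperparameters are assumptions.

```python
# Sketch of a tabular Q-learning update over a finite action catalog.
# The real prototype also warm-starts Q from the deterministic baseline
# (teacher actions seeded with positive values) and persists Q as JSON.
import random

ACTIONS = ["read_email", "reply", "forward", "add_todo", "archive", "search_files"]

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose(q, state, epsilon=0.1, rng=random.Random(0)):
    # Epsilon-greedy selection over the fixed catalog.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

q = {}
q_update(q, "inbox_full", "archive", 1.0, "inbox_empty")
```

Because the Q-table is a plain dict keyed by `(state, action)`, checkpointing reduces to serializing it to JSON and reloading it in the app, scripts, and notebook alike.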
+ ---
+ 
+ ## 3. Step-by-Step Implementation Plan
+ 
+ ### Phase 1: The Mock Server Setup (Notebook Environment)
+ **Goal:** Build the deterministic world the agent will live in. Do this entirely in the first few cells of your Kaggle notebook so you can instantly query and reset the state.
+ 
+ 1. **Database Initialization:** Create an in-memory SQLite database (`sqlite3.connect(':memory:')`).
+ 2. **Table Creation:**
+     * `Emails` (id, sender, recipient, subject, body, timestamp, is_read, is_archived)
+     * `Todos` (id, task_name, deadline_date, context)
+     * `Files` (id, filename, content_text) - *This acts as the local knowledge base.*
+ 3. **The Wrapper Class (`MockWorkspace`):** Write Python methods to interact with this DB safely.
+     * `get_unread_emails()`
+     * `send_reply(email_id, text)`
+     * `create_todo(task, date)`
+     * `search_documents(query)`
+ 
+ ### Phase 2: OpenEnv Specifications (Pydantic Models)
+ **Goal:** Define the strict APIs the agent must use. This is the core of the hackathon requirement.
+ 
+ **Observation Space:**
+ ```python
+ class WorkspaceObservation(BaseModel):
+     current_time: str
+     unread_emails: List[Dict[str, str]]  # ID, Sender, Subject snippet
+     active_todos: List[str]
+     last_action_status: str  # e.g., "Email successfully sent to Manager"
+ ```
+ 
+ **Action Space:**
+ ```python
+ class AssistantAction(BaseModel):
+     action_type: Literal["read_email", "reply", "forward", "add_todo", "archive", "search_files"]
+     target_id: Optional[str] = None  # email_id or file_id
+     payload: Optional[str] = None  # The body of the reply, or the search query
+     secondary_payload: Optional[str] = None  # Date for todos, or recipient for forwards
+ ```
+ 
+ **Reward Space:**
+ ```python
+ class TaskReward(BaseModel):
+     step_reward: float
+     total_score: float
+     is_done: bool
+     reasoning: str
+ ```
+ 
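Model output should be validated against the `AssistantAction` contract before it ever touches the workspace. The real path would use the Pydantic model above; this stdlib-only sketch shows the same check, with a hypothetical raw payload:

```python
# Stdlib-only sketch of validating a model-produced JSON action against
# the AssistantAction contract; production code would use the Pydantic
# model, which also enforces field types.
import json

ALLOWED = {"read_email", "reply", "forward", "add_todo", "archive", "search_files"}

def parse_action(raw: str) -> dict:
    data = json.loads(raw)  # raises on malformed JSON -> trigger response repair
    if data.get("action_type") not in ALLOWED:
        raise ValueError(f"unknown action_type: {data.get('action_type')!r}")
    # Optional fields default to None, mirroring the schema above.
    return {
        "action_type": data["action_type"],
        "target_id": data.get("target_id"),
        "payload": data.get("payload"),
        "secondary_payload": data.get("secondary_payload"),
    }

action = parse_action('{"action_type": "reply", "target_id": "email-3", "payload": "On it."}')
```

Rejecting the action before `step()` keeps invalid model output out of the deterministic state machine.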
+ ### Phase 3: Task Definitions & Deterministic Graders
+ Implement the three required difficulty tiers. The grader simply runs SQL queries against your mock database to verify the agent's actions.
+ 
+ #### Task 1: Easy (Syllabus & Deadline Extraction)
+ * **Initial State:** DB injected with an email from `prof.smith@university.edu` containing 3 specific project deadlines.
+ * **Agent Goal:** Read email, create 3 corresponding tasks in the `Todos` table, and archive the email.
+ * **Grader Logic:** `SELECT COUNT(*) FROM Todos WHERE deadline_date IS NOT NULL;` -> If 3, return `+1.0`.
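The Task 1 grader can be sketched directly against the schema above; the seeded rows here are illustrative, not the real scenario data:

```python
# Sketch of the Task 1 grader: count captured deadlines in SQLite.
# Table layout matches the Phase 1 schema; row data is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Todos (id INTEGER PRIMARY KEY, task_name TEXT, deadline_date TEXT, context TEXT)"
)
conn.executemany(
    "INSERT INTO Todos (task_name, deadline_date, context) VALUES (?, ?, ?)",
    [
        ("Milestone 1", "2026-04-10", "syllabus"),
        ("Milestone 2", "2026-04-17", "syllabus"),
        ("Milestone 3", "2026-04-24", "syllabus"),
    ],
)

def grade_easy(conn) -> float:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Todos WHERE deadline_date IS NOT NULL"
    ).fetchone()
    return 1.0 if count == 3 else 0.0

reward = grade_easy(conn)
```

Because the grader queries the database rather than parsing agent text, it stays deterministic regardless of how the agent phrased its actions.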
+ 
+ #### Task 2: Medium (Triage & Meeting Negotiation)
+ * **Initial State:** DB injected with 5 emails: 3 newsletters, 1 urgent client complaint, 1 team meeting reschedule request.
+ * **Agent Goal:** Archive newsletters, forward the client complaint to `manager@company.com`, and reply to the reschedule request proposing a time.
+ * **Grader Logic:** Check if newsletters are marked `is_archived=True` (+0.3). Check if complaint is in the DB as sent to manager (+0.4). Check if reply contains a valid time string (+0.3).
+ 
+ #### Task 3: Hard (Autonomous RAG & Drafting)
+ * **Initial State:** DB injected with an email from a VIP stakeholder asking for specific metrics from the "Q3 Architecture Report".
+ * **Agent Goal:** Use `action_type: "search_files"` with query "Q3 Architecture", read the file contents, and use `action_type: "reply"` synthesizing the exact metrics from the file into a professional response.
+ * **Grader Logic:** Check if `search_files` was called (+0.3). Use regex to verify the specific metric string from the mock file exists in the sent reply body (+0.7).
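The Task 3 reward split can be sketched as follows; the metric string, action-log shape, and reply text are all hypothetical examples, not the real scenario data:

```python
# Sketch of the Task 3 grader: partial credit for calling search_files,
# plus a regex check that the exact metric from the mock file appears in
# the sent reply. Metric string and log format here are illustrative.
import re

METRIC_PATTERN = re.compile(r"99\.95% uptime")  # hypothetical metric string

def grade_hard(action_log, reply_body) -> float:
    score = 0.0
    if any(entry["action_type"] == "search_files" for entry in action_log):
        score += 0.3  # agent actually retrieved the document
    if METRIC_PATTERN.search(reply_body):
        score += 0.7  # exact metric made it into the reply
    return score

score = grade_hard(
    [{"action_type": "search_files"}, {"action_type": "reply"}],
    "Per the Q3 Architecture Report, we sustained 99.95% uptime.",
)
```

Anchoring the check on the exact metric string keeps the grade deterministic while still requiring genuine retrieval rather than a plausible-sounding guess.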
+ 
+ ### Phase 4: Baseline Agent Testing (Notebook Environment)
+ **Goal:** Prove the environment works using both a deterministic policy and a live model-backed policy.
+ 1. Use the deterministic `BaselineAgent` to verify seeded tasks and grader behavior.
+ 2. Use a standard `while not done:` loop, now centralized in `EpisodeRunner`.
+ 3. Pass the `WorkspaceObservation` to the live model policy through OpenRouter using strict JSON outputs.
+ 4. Pass the model action into the environment's `step()` function.
+ 5. Print and export the interaction loop directly in the notebook to debug prompt formatting, policy behavior, and reward shaping.
+ 
+ #### Agent Workflow Loop
+ 1. Load environment state
+ 2. Generate observation
+ 3. Send to LLM
+ 4. Receive structured action
+ 5. Execute action in workspace
+ 6. Update state
+ 7. Repeat until task complete
+ 
+ Implementation note: this loop is now represented directly in the shared `EpisodeRunner` so the notebook, scripts, tests, and Gradio app all execute the same control flow.
+ 
+ ### Phase 5: Hugging Face Spaces & Gradio Deployment
+ **Goal:** Package the OpenEnv logic and build a visual interface so judges can physically see the agent working, including deterministic, model-backed, and learned-policy runs.
+ 
+ 1. **The Gradio Wrapper (`app.py`):**
+     * Build a Gradio UI that exposes selectable policies (`baseline`, `openrouter`, and trained `rl`) and visually represents the `Emails`, `Todos`, `Files`, and action history tables.
+     * As the OpenEnv `step()` function runs, update the Gradio state step by step so judges can watch the inbox drain, the to-do list populate, and the replies send in real time.
+     * Ensure the app can load the same trained RL checkpoint artifact produced by the notebook and CLI training scripts.
+ 2. **Containerization (`Dockerfile`):**
+     ```dockerfile
+     FROM python:3.11-slim
+     WORKDIR /app
+     COPY requirements.app.txt .
+     RUN pip install --no-cache-dir -r requirements.app.txt
+     COPY . .
+     # OpenEnv requires specific metadata handling; Gradio runs on 7860
+     EXPOSE 7860
+     ENV GRADIO_SERVER_NAME="0.0.0.0"
+     CMD ["python", "app.py"]
+     ```
+ 3. **OpenEnv Spec Compliance:** Ensure your `openenv.yaml` is correctly mapped to your Pydantic classes at the root of the repository.
+ 4. **Push to HF:** Commit the repo to a Hugging Face Space, tag it with `openenv`, and ensure the policy runners and training instructions are easily executable via the README instructions.
README.md CHANGED
@@ -1,14 +1,54 @@
  ---
- title: EmailMaestro
- emoji: 🔥
- colorFrom: red
- colorTo: purple
- sdk: gradio
- sdk_version: 6.11.0
- app_file: app.py
  pinned: false
- license: mit
- short_description: ' Deterministic RL-style workspace for an exec assist agent'
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: EmailMaestro | Executive Assistant Sandbox
+ emoji: "🧭"
+ colorFrom: yellow
+ colorTo: gray
+ sdk: docker
+ app_port: 7860
  pinned: false
+ short_description: OpenEnv executive assistant sandbox demo for judges.
  ---
 
+ # Project Epsilon
+ 
+ Discrete Hugging Face Space for the **Autonomous Executive Assistant Sandbox**, built for the **OpenEnv Scaler x Meta x PyTorch Hack**.
+ 
+ ## Team
+ 
+ - Team name: `Project Epsilon`
+ - Hugging Face usernames: `@Flickinshots`, `@HF_USERNAME_2`, `@HF_USERNAME_3`
+ - Space repo: `Flickinshots/EmailMaestro`
+ 
+ Replace the placeholder usernames above once the final team accounts are ready.
+ 
+ ## What This Space Shows
+ 
+ - A deterministic OpenEnv-style executive assistant environment backed by an isolated SQLite workspace
+ - A judge-friendly Gradio interface that replays the shared `EpisodeRunner` loop step by step
+ - Side-by-side policy execution for `baseline`, `rl`, and optional `openrouter`
+ - Visible inbox, todo, file-search, and action-log state so evaluators can inspect each mutation
+ 
+ ## Hack Context
+ 
+ OpenEnv was announced by Hugging Face and Meta as an open source framework for building agent environments with typed observations, actions, and rewards. The Scaler dashboard for this hack lists the submission round as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space packages our environment to match that workflow: deterministic tasks, structured actions, visible state transitions, and reproducible judge demos.
+ 
+ ## Runtime Notes
+ 
+ - SDK: `docker`
+ - App port: `7860`
+ - Entry point: `python app.py`
+ - Optional secret: `OPENROUTER_API_KEY`
+ - A trained RL checkpoint is bundled in `artifacts/checkpoints/` so the `rl` policy is available immediately in the demo.
+ 
+ ## Judge Flow
+ 
+ 1. Open the Space and choose one of the seeded scenarios.
+ 2. Run the deterministic `baseline` policy for a guaranteed reference trace.
+ 3. Switch to `rl` to replay the bundled learned checkpoint.
+ 4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed path.
+ 
+ ## References
+ 
+ - Hack dashboard: https://www.scaler.com/openenv-hackathon
+ - OpenEnv launch: https://huggingface.co/blog/openenv
+ - Space URL: https://huggingface.co/spaces/Flickinshots/EmailMaestro
app.py ADDED
@@ -0,0 +1,915 @@
+ from __future__ import annotations
+ 
+ import json
+ import os
+ import time
+ import uuid
+ from html import escape
+ 
+ import gradio as gr
+ 
+ from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
+ from src.executive_assistant.config import AppRuntimeConfig, OpenRouterConfig, load_env_file
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.runner import EpisodeRunner
+ from src.executive_assistant.training import QLearningPolicy, default_checkpoint_path
+ 
+ load_env_file(AppRuntimeConfig().env_file)
+ APP_RUNTIME = AppRuntimeConfig()
+ EMAIL_COLUMNS = ["id", "sender", "recipient", "subject", "body", "timestamp", "is_read", "is_archived"]
+ TODO_COLUMNS = ["id", "task_name", "deadline_date", "context"]
+ FILE_COLUMNS = ["id", "filename", "content_text"]
+ ACTION_LOG_COLUMNS = ["id", "action_type", "target_id", "payload", "secondary_payload", "status"]
+ TRACE_COLUMNS = ["step", "reasoning", "action_type", "status", "score", "done"]
+ APP_CSS = """
+ :root {
+     color-scheme: dark;
+     --ea-bg: #120f0c;
+     --ea-bg-soft: #1a1511;
+     --ea-panel: rgba(28, 22, 18, 0.88);
+     --ea-panel-strong: #241c17;
+     --ea-ink: #f5ede2;
+     --ea-muted: #b7a796;
+     --ea-border: rgba(236, 214, 188, 0.12);
+     --ea-border-strong: rgba(236, 214, 188, 0.24);
+     --ea-accent: #c97943;
+     --ea-accent-deep: #e1a16f;
+     --ea-highlight: #3a2a1f;
+     --ea-success: #72c79a;
+     --ea-danger: #ef8d76;
+     --ea-shadow: 0 24px 70px rgba(0, 0, 0, 0.34);
+ }
+ 
+ .gradio-container {
+     min-height: 100vh;
+     background:
+         radial-gradient(circle at top left, rgba(124, 73, 39, 0.22), transparent 24%),
+         radial-gradient(circle at 85% 10%, rgba(201, 121, 67, 0.16), transparent 22%),
+         linear-gradient(180deg, #17120f 0%, #0f0c0a 100%);
+     color: var(--ea-ink);
+     font-family: "Avenir Next", "Segoe UI", sans-serif;
+ }
+ 
+ .gradio-container .prose,
+ .gradio-container .gr-markdown,
+ .gradio-container .gr-button,
+ .gradio-container .gr-input,
+ .gradio-container .gr-box,
+ .gradio-container .gr-form,
+ .gradio-container .gr-panel {
+     color: var(--ea-ink);
+ }
+ 
+ .app-shell {
+     max-width: 1480px;
+     margin: 0 auto;
+     padding: 18px 18px 28px;
+ }
+ 
+ .hero {
+     background:
+         linear-gradient(140deg, rgba(33, 25, 20, 0.96), rgba(21, 17, 14, 0.96)),
+         linear-gradient(90deg, rgba(201, 121, 67, 0.12), transparent);
+     border: 1px solid var(--ea-border);
+     border-radius: 32px;
+     padding: 34px;
+     box-shadow: var(--ea-shadow);
+     margin-bottom: 20px;
+     position: relative;
+     overflow: hidden;
+ }
+ 
+ .hero::after {
+     content: "";
+     position: absolute;
+     inset: auto -10% -44% 34%;
+     height: 220px;
+     background: radial-gradient(circle, rgba(201, 121, 67, 0.18), transparent 62%);
+     pointer-events: none;
+ }
+ 
+ .hero-grid {
+     display: grid;
+     grid-template-columns: minmax(0, 1.7fr) minmax(280px, 0.95fr);
+     gap: 22px;
+     align-items: end;
+ }
+ 
+ .hero-kicker {
+     display: inline-flex;
+     align-items: center;
+     gap: 10px;
+     padding: 7px 12px;
+     border-radius: 999px;
+     background: rgba(201, 121, 67, 0.10);
+     border: 1px solid rgba(201, 121, 67, 0.18);
+     color: var(--ea-accent-deep);
+     font-size: 0.76rem;
+     letter-spacing: 0.14em;
+     text-transform: uppercase;
+     margin-bottom: 16px;
+ }
+ 
+ .hero-copy {
+     position: relative;
+     z-index: 1;
+ }
+ 
+ .hero h1 {
+     margin: 0 0 12px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: clamp(2.6rem, 5vw, 4.5rem);
+     line-height: 1.05;
+     letter-spacing: -0.05em;
+     max-width: 10ch;
+ }
+ 
+ .hero p {
+     margin: 0;
+     max-width: 760px;
+     color: var(--ea-muted);
+     font-size: 1.02rem;
+     line-height: 1.65;
+ }
+ 
+ .hero-strip {
+     display: flex;
+     gap: 12px;
+     flex-wrap: wrap;
+     margin-top: 22px;
+ }
+ 
+ .hero-pill {
+     background: rgba(255, 255, 255, 0.05);
+     color: var(--ea-ink);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 999px;
+     padding: 10px 14px;
+     font-size: 0.84rem;
+     backdrop-filter: blur(12px);
+ }
+ 
+ .hero-aside {
+     position: relative;
+     z-index: 1;
+     background: rgba(255, 255, 255, 0.04);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 24px;
+     padding: 20px;
+     backdrop-filter: blur(12px);
+ }
+ 
+ .hero-aside-label {
+     margin: 0 0 10px;
+     color: var(--ea-accent-deep);
+     font-size: 0.8rem;
+     letter-spacing: 0.14em;
+     text-transform: uppercase;
+ }
+ 
+ .hero-aside-value {
+     margin: 0 0 14px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.6rem;
+     line-height: 1.05;
+ }
+ 
+ .hero-aside-copy {
+     margin: 0;
+     color: var(--ea-muted);
+     line-height: 1.6;
+ }
+ 
+ .panel-card,
+ .status-card {
+     background: var(--ea-panel);
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     box-shadow: var(--ea-shadow);
+     backdrop-filter: blur(10px);
+ }
+ 
+ .panel-card {
+     padding: 18px;
+ }
+ 
+ .status-card {
+     padding: 22px 22px 18px;
+ }
+ 
+ .panel-title {
+     margin: 0 0 6px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.5rem;
+     letter-spacing: -0.03em;
+ }
+ 
+ .panel-copy {
+     margin: 0 0 16px;
+     color: var(--ea-muted);
+     line-height: 1.55;
+ }
+ 
+ .surface-card {
+     background: rgba(23, 18, 14, 0.84);
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     box-shadow: var(--ea-shadow);
+     overflow: hidden;
+ }
+ 
+ .surface-card .gr-tab-nav {
+     background: rgba(255, 255, 255, 0.03);
+     padding: 10px 10px 0;
+     border-bottom: 1px solid var(--ea-border);
+ }
+ 
+ .surface-card .gr-tab-nav button {
+     border-radius: 16px 16px 0 0;
+     border: 1px solid transparent;
+     color: var(--ea-muted);
+     font-weight: 600;
+ }
+ 
+ .surface-card .gr-tab-nav button.selected {
+     background: var(--ea-panel-strong);
+     color: var(--ea-ink);
+     border-color: var(--ea-border);
+ }
+ 
+ .surface-card .gr-tabitem {
+     padding: 18px;
+ }
+ 
+ .status-topline {
+     display: flex;
+     align-items: center;
+     justify-content: space-between;
+     gap: 14px;
+     margin-bottom: 12px;
+ }
+ 
+ .status-title {
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.7rem;
+     letter-spacing: -0.04em;
+ }
+ 
+ .status-badge {
+     display: inline-flex;
+     align-items: center;
+     border-radius: 999px;
+     padding: 8px 13px;
+     font-size: 0.78rem;
+     text-transform: uppercase;
+     letter-spacing: 0.12em;
+     border: 1px solid transparent;
+     background: rgba(201, 121, 67, 0.10);
+ }
+ 
+ .status-badge.running,
+ .status-badge.initialized {
+     border-color: rgba(180, 95, 45, 0.18);
+     color: var(--ea-accent-deep);
+ }
+ 
+ .status-badge.completed.success {
+     background: rgba(45, 122, 88, 0.10);
+     border-color: rgba(45, 122, 88, 0.18);
+     color: var(--ea-success);
+ }
+ 
+ .status-badge.completed.failure {
+     background: rgba(178, 76, 56, 0.10);
+     border-color: rgba(178, 76, 56, 0.16);
+     color: var(--ea-danger);
+ }
+ 
+ .metric-grid {
+     display: grid;
+     grid-template-columns: repeat(4, minmax(0, 1fr));
+     gap: 12px;
+     margin-bottom: 12px;
+ }
+ 
+ .metric {
+     background: rgba(255, 255, 255, 0.04);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 18px;
+     padding: 14px;
+ }
+ 
+ .metric-label {
+     color: var(--ea-muted);
+     font-size: 0.72rem;
+     text-transform: uppercase;
+     letter-spacing: 0.11em;
+     margin-bottom: 7px;
+ }
+ 
+ .metric-value {
+     font-size: 1rem;
+     line-height: 1.25;
+ }
+ 
+ .status-reason {
+     background: rgba(201, 121, 67, 0.08);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 18px;
+     padding: 14px 15px;
+     color: var(--ea-muted);
+     line-height: 1.55;
+ }
+ 
+ .scenario-brief {
+     background: linear-gradient(180deg, rgba(32, 25, 20, 0.92), rgba(22, 18, 14, 0.94));
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     padding: 22px;
+     color: var(--ea-ink);
+     box-shadow: var(--ea-shadow);
+ }
+ 
+ .scenario-brief h3 {
+     margin: 0 0 10px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.5rem;
+     letter-spacing: -0.03em;
+ }
+ 
+ .scenario-brief p {
+     margin: 0 0 14px;
+     color: var(--ea-muted);
+     line-height: 1.6;
+ }
+ 
+ .scenario-brief ul {
+     margin: 0;
+     padding-left: 18px;
+     color: var(--ea-ink);
+ }
+ 
+ .scenario-brief li {
+     margin-bottom: 8px;
+     line-height: 1.5;
+ }
+ 
+ .panel-card .gr-form,
+ .panel-card .gr-box,
+ .panel-card .gr-group {
+     border: 0;
+     background: transparent;
+     box-shadow: none;
+ }
+ 
+ .panel-card .gr-button,
+ .gradio-container .gr-button {
+     min-height: 48px;
+     border-radius: 999px;
+     font-weight: 700;
+     letter-spacing: 0.02em;
+ }
+ 
+ .gradio-container button.primary {
+     background: linear-gradient(135deg, var(--ea-accent) 0%, var(--ea-accent-deep) 100%);
+     border: 0;
+     box-shadow: 0 14px 30px rgba(138, 62, 23, 0.18);
+ }
+ 
+ .gradio-container button.secondary {
+     background: rgba(255, 255, 255, 0.05);
+     border: 1px solid var(--ea-border-strong);
+     color: var(--ea-ink);
+ }
+ 
+ .gradio-container label,
+ .gradio-container .gr-block-label,
+ .gradio-container .gr-form > label {
+     color: var(--ea-muted);
+     font-size: 0.76rem;
+     text-transform: uppercase;
+     letter-spacing: 0.12em;
+ }
+ 
+ .gradio-container input,
+ .gradio-container textarea,
+ .gradio-container select {
+     background: rgba(255, 255, 255, 0.05) !important;
+     border: 1px solid rgba(236, 214, 188, 0.12) !important;
+     border-radius: 16px !important;
+     color: var(--ea-ink) !important;
+ }
+ 
+ .gradio-container .gr-accordion,
+ .gradio-container .gr-panel,
+ .gradio-container .gr-box,
+ .gradio-container .block {
+     border-color: var(--ea-border) !important;
+ }
+ 
+ .workspace-grid .gr-dataframe,
+ .workspace-grid .gr-code,
+ .workspace-grid .gr-box,
+ .workspace-grid .gr-panel {
+     border-radius: 20px !important;
+     overflow: hidden;
+ }
+ 
+ .workspace-grid .gr-code,
+ .workspace-grid .gr-dataframe {
+     box-shadow: inset 0 0 0 1px rgba(58, 43, 28, 0.06);
+ }
+ 
+ .workspace-grid table {
+     font-size: 0.92rem;
+ }
+ 
+ .footnote {
+     margin-top: 14px;
+     color: var(--ea-muted);
+     font-size: 0.85rem;
+     line-height: 1.6;
+ }
+ 
+ @media (max-width: 1120px) {
+     .hero-grid {
+         grid-template-columns: 1fr;
+     }
+ }
+ 
+ @media (max-width: 980px) {
+     .metric-grid {
+         grid-template-columns: repeat(2, minmax(0, 1fr));
+     }
+ }
+ 
+ @media (max-width: 640px) {
+     .hero {
+         padding: 24px 18px;
+     }
+ 
+     .metric-grid {
+         grid-template-columns: 1fr;
+     }
+ 
+     .app-shell {
+         padding: 12px 12px 20px;
+     }
+ }
+ """
+ SCENARIO_GUIDANCE = {
+     "easy_deadline_extraction": {
+         "title": "Deadline Extraction",
+         "description": "Read the professor email, capture the three exact milestones as todos, then archive the source email once the list is complete.",
+         "checks": [
+             "Read the source email before creating todos.",
+             "Create exactly three canonical todos with ISO dates.",
+             "Archive the email only after all deadlines are captured.",
+         ],
+     },
+     "medium_triage_and_negotiation": {
+         "title": "Inbox Triage And Negotiation",
+         "description": "Clear low-value newsletters, escalate the client complaint to the manager, and send a concrete meeting time to the teammate without archiving unresolved important mail too early.",
+         "checks": [
+             "Archive all three newsletters.",
+             "Forward the client complaint to manager@company.com.",
+             "Reply to the teammate with a specific meeting time.",
+         ],
478
+ },
479
+ "hard_rag_reply": {
480
+ "title": "RAG Reply",
481
+ "description": "Read the stakeholder request, search the local report store, and reply with the exact Q3 metrics from the matching file.",
482
+ "checks": [
483
+ "Read the VIP email first.",
484
+ "Search for the Q3 architecture report before replying.",
485
+ "Reply with 99.95%, 182ms, and 14% plus a greeting and signoff.",
486
+ ],
487
+ },
488
+ }
489
+
490
+
491
+ def _records_to_rows(records: list[dict], columns: list[str]) -> list[list[object]]:
492
+ return [[record.get(column) for column in columns] for record in records]
493
+
494
+
495
+ def render_scenario_brief(task_name: str) -> str:
496
+ guidance = SCENARIO_GUIDANCE[task_name]
497
+ checks = "".join(f"<li>{escape(item)}</li>" for item in guidance["checks"])
498
+ return (
499
+ '<div class="scenario-brief">'
500
+ f"<h3>{escape(guidance['title'])}</h3>"
501
+ f"<p>{escape(guidance['description'])}</p>"
502
+ f"<ul>{checks}</ul>"
503
+ "</div>"
504
+ )
505
+
506
+
507
+ def render_status_card(summary_payload: dict) -> str:
508
+ status = str(summary_payload["status"])
509
+ completed = bool(summary_payload["completed"])
510
+ badge_class = f"status-badge {status} {'success' if completed else 'failure'}".strip()
511
+ return (
512
+ '<div class="status-card">'
513
+ '<div class="status-topline">'
514
+ f'<div class="status-title">Run {escape(str(summary_payload["run_id"]))}</div>'
515
+ f'<div class="{badge_class}">{escape(status)}</div>'
516
+ "</div>"
517
+ '<div class="metric-grid">'
518
+ f'<div class="metric"><div class="metric-label">Requested Provider</div><div class="metric-value">{escape(str(summary_payload["requested_provider"]))}</div></div>'
519
+ f'<div class="metric"><div class="metric-label">Effective Policy</div><div class="metric-value">{escape(str(summary_payload["policy_name"]))}</div></div>'
520
+ f'<div class="metric"><div class="metric-label">Scenario</div><div class="metric-value">{escape(str(summary_payload["task_name"]))}</div></div>'
521
+ f'<div class="metric"><div class="metric-label">Final Score</div><div class="metric-value">{summary_payload["final_score"]:.2f}</div></div>'
522
+ "</div>"
523
+ '<div class="metric-grid">'
524
+ f'<div class="metric"><div class="metric-label">Model</div><div class="metric-value">{escape(str(summary_payload["model_name"] or "n/a"))}</div></div>'
525
+ f'<div class="metric"><div class="metric-label">Checkpoint</div><div class="metric-value">{escape(str(summary_payload["checkpoint_path"] or "n/a"))}</div></div>'
526
+ f'<div class="metric"><div class="metric-label">Completed</div><div class="metric-value">{escape(str(completed))}</div></div>'
527
+ f'<div class="metric"><div class="metric-label">Status</div><div class="metric-value">{escape(status)}</div></div>'
528
+ "</div>"
529
+ f'<div class="status-reason">{escape(str(summary_payload["termination_reason"]))}</div>'
530
+ "</div>"
531
+ )
532
+
533
+
534
+ def build_snapshot(task_name: str) -> tuple[str, list[list[object]], list[list[object]], list[list[object]], list[list[object]]]:
535
+ env = ExecutiveAssistantEnv(task_name=task_name)
536
+ observation = env.reset()
537
+ snapshot = env.workspace.snapshot()
538
+ return (
539
+ json.dumps(observation.model_dump(), indent=2),
540
+ _records_to_rows(snapshot["emails"], EMAIL_COLUMNS),
541
+ _records_to_rows(snapshot["todos"], TODO_COLUMNS),
542
+ _records_to_rows(snapshot["files"], FILE_COLUMNS),
543
+ _records_to_rows(snapshot["action_log"], ACTION_LOG_COLUMNS),
544
+ )
545
+
546
+
547
+ def _default_rl_checkpoint() -> str:
548
+ return str(
549
+ default_checkpoint_path(
550
+ APP_RUNTIME.checkpoint_dir,
551
+ APP_RUNTIME.default_checkpoint_name,
552
+ )
553
+ )
554
+
555
+
556
+ def _build_policy(
557
+ provider: str,
558
+ model_name: str,
559
+ api_key: str,
560
+ checkpoint_path: str,
561
+ ) -> object:
562
+ if provider == "baseline":
563
+ return BaselineAgent()
564
+ if provider == "rl":
565
+ return QLearningPolicy.load(checkpoint_path or _default_rl_checkpoint())
566
+ env_api_key = api_key or os.environ.get("OPENROUTER_API_KEY", "")
567
+ config = OpenRouterConfig(
568
+ api_key=env_api_key,
569
+ model_name=model_name,
570
+ site_url=os.environ.get("OPENROUTER_SITE_URL", "http://localhost:7860"),
571
+ app_name=os.environ.get(
572
+ "OPENROUTER_APP_NAME",
573
+ "Autonomous Executive Assistant Sandbox",
574
+ ),
575
+ )
576
+ return OpenRouterPolicy(config=config)
577
+
578
+
579
+ def _trace_to_rows(trace: object) -> list[dict]:
580
+ return [
581
+ {
582
+ "step": step.step_index,
583
+ "reasoning": step.reasoning,
584
+ "action_type": step.action["action_type"],
585
+ "status": step.status,
586
+ "score": step.reward["total_score"],
587
+ "done": step.reward["is_done"],
588
+ }
589
+ for step in trace.steps
590
+ ]
591
+
592
+
593
+ def _summary_payload(
594
+ *,
595
+ run_id: str,
596
+ task_name: str,
597
+ provider: str,
598
+ policy_name: str,
599
+ model_name: str,
600
+ checkpoint_path: str,
601
+ status: str,
602
+ final_score: float,
603
+ completed: bool,
604
+ termination_reason: str,
605
+ ) -> dict[str, object]:
606
+ return {
607
+ "run_id": run_id,
608
+ "task_name": task_name,
609
+ "requested_provider": provider,
610
+ "policy_name": policy_name,
611
+ "model_name": model_name if provider == "openrouter" else None,
612
+ "checkpoint_path": checkpoint_path if provider == "rl" else None,
613
+ "status": status,
614
+ "final_score": final_score,
615
+ "completed": completed,
616
+ "termination_reason": termination_reason,
617
+ }
618
+
619
+
620
+ def _step_payload(
621
+ observation_payload: dict,
622
+ snapshot_payload: dict,
623
+ trace_rows: list[dict],
624
+ summary_payload: dict,
625
+ ) -> tuple[str, str, list[list[object]], list[list[object]], list[list[object]], list[list[object]], list[list[object]], str]:
626
+ return (
627
+ json.dumps(observation_payload, indent=2),
628
+ render_status_card(summary_payload),
629
+ _records_to_rows(snapshot_payload["emails"], EMAIL_COLUMNS),
630
+ _records_to_rows(snapshot_payload["todos"], TODO_COLUMNS),
631
+ _records_to_rows(snapshot_payload["files"], FILE_COLUMNS),
632
+ _records_to_rows(snapshot_payload["action_log"], ACTION_LOG_COLUMNS),
633
+ _records_to_rows(trace_rows, TRACE_COLUMNS),
634
+ json.dumps(summary_payload, indent=2),
635
+ )
636
+
637
+
638
+ def configure_provider_inputs(provider: str) -> tuple[dict, dict, dict]:
639
+ is_openrouter = provider == "openrouter"
640
+ is_rl = provider == "rl"
641
+ return (
642
+ gr.update(visible=is_openrouter, interactive=is_openrouter),
643
+ gr.update(visible=is_openrouter, interactive=is_openrouter),
644
+ gr.update(visible=is_rl, interactive=is_rl),
645
+ )
646
+
647
+
648
+ def build_initial_status(task_name: str, provider: str, model_name: str, checkpoint_path: str) -> str:
649
+ return render_status_card(
650
+ _summary_payload(
651
+ run_id="pending",
652
+ task_name=task_name,
653
+ provider=provider,
654
+ policy_name="not started",
655
+ model_name=model_name,
656
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
657
+ status="initialized",
658
+ final_score=0.0,
659
+ completed=False,
660
+ termination_reason="Choose a policy and start an episode.",
661
+ )
662
+ )
663
+
664
+
665
+ def run_live_episode(
666
+ task_name: str,
667
+ provider: str,
668
+ model_name: str,
669
+ api_key: str,
670
+ max_steps: int,
671
+ checkpoint_path: str,
672
+ ):
673
+ run_id = uuid.uuid4().hex[:8]
674
+ runner = EpisodeRunner(
675
+ policy=_build_policy(
676
+ provider=provider,
677
+ model_name=model_name,
678
+ api_key=api_key,
679
+ checkpoint_path=checkpoint_path,
680
+ ),
681
+ max_steps=max_steps,
682
+ )
683
+ env, observation = runner.initialize(task_name)
684
+ trace_rows: list[dict] = []
685
+
686
+ initial_snapshot = env.workspace.snapshot()
687
+ yield _step_payload(
688
+ observation_payload=observation.model_dump(),
689
+ snapshot_payload=initial_snapshot,
690
+ trace_rows=trace_rows,
691
+ summary_payload=_summary_payload(
692
+ run_id=run_id,
693
+ task_name=task_name,
694
+ provider=provider,
695
+ policy_name=type(runner.policy).__name__,
696
+ model_name=model_name,
697
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
698
+ status="initialized",
699
+ final_score=0.0,
700
+ completed=False,
701
+ termination_reason="episode not started",
702
+ ),
703
+ )
704
+
705
+ while True:
706
+ _, observation, reward, record = runner.advance(task_name, env, observation)
707
+ trace_rows.append(
708
+ {
709
+ "step": record.step_index,
710
+ "reasoning": record.reasoning,
711
+ "action_type": record.action["action_type"],
712
+ "status": record.status,
713
+ "score": record.reward["total_score"],
714
+ "done": record.reward["is_done"],
715
+ }
716
+ )
717
+ yield _step_payload(
718
+ observation_payload=record.observation,
719
+ snapshot_payload=record.snapshot,
720
+ trace_rows=trace_rows,
721
+ summary_payload=_summary_payload(
722
+ run_id=run_id,
723
+ task_name=task_name,
724
+ provider=provider,
725
+ policy_name=type(runner.policy).__name__,
726
+ model_name=model_name,
727
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
728
+ status="running" if not reward.is_done else "completed",
729
+ final_score=reward.total_score,
730
+ completed=reward.total_score >= 1.0,
731
+ termination_reason=reward.reasoning,
732
+ ),
733
+ )
734
+ if reward.is_done:
735
+ return
736
+ time.sleep(0.15)
737
+
738
+
739
+ with gr.Blocks(title="Autonomous Executive Assistant Sandbox", css=APP_CSS) as demo:
740
+ with gr.Column(elem_classes=["app-shell"]):
741
+ gr.HTML(
742
+ """
743
+ <section class="hero">
744
+ <div class="hero-grid">
745
+ <div class="hero-copy">
746
+ <div class="hero-kicker">Deterministic Eval Console</div>
747
+ <h1>Executive Assistant Sandbox</h1>
748
+ <p>
749
+ Run the exact same episode loop used in training, inspect each workspace mutation in real time,
750
+ and compare baseline, RL, and OpenRouter-backed policies without losing the structure of the task.
751
+ </p>
752
+ <div class="hero-strip">
753
+ <div class="hero-pill">Shared EpisodeRunner path</div>
754
+ <div class="hero-pill">Seeded scenarios with visible state</div>
755
+ <div class="hero-pill">Policy debugging without notebook sprawl</div>
756
+ </div>
757
+ </div>
758
+ <aside class="hero-aside">
759
+ <p class="hero-aside-label">What This UI Optimizes For</p>
760
+ <p class="hero-aside-value">Fast policy comparison with readable state.</p>
761
+ <p class="hero-aside-copy">
762
+ The interface is intentionally light, structured, and editorial rather than “chat app” themed.
763
+ Controls stay compact while the workspace and trace remain the visual priority.
764
+ </p>
765
+ </aside>
766
+ </div>
767
+ </section>
768
+ """
769
+ )
770
+
771
+ with gr.Row(equal_height=True):
772
+ with gr.Column(scale=4):
773
+ with gr.Group(elem_classes=["panel-card"]):
774
+ gr.HTML(
775
+ """
776
+ <h2 class="panel-title">Control Room</h2>
777
+ <p class="panel-copy">
778
+ Pick a scenario, choose a policy provider, and run a stepwise episode against the same environment used by training and evaluation.
779
+ </p>
780
+ """
781
+ )
782
+ task = gr.Dropdown(
783
+ choices=[
784
+ "easy_deadline_extraction",
785
+ "medium_triage_and_negotiation",
786
+ "hard_rag_reply",
787
+ ],
788
+ value="easy_deadline_extraction",
789
+ label="Scenario",
790
+ )
791
+ provider = gr.Dropdown(
792
+ choices=["baseline", "openrouter", "rl"],
793
+ value="baseline",
794
+ label="Policy",
795
+ )
796
+ max_steps = gr.Number(value=12, precision=0, label="Max Steps")
797
+ with gr.Accordion("Provider Settings", open=False):
798
+ model_name = gr.Textbox(
799
+ value="google/gemma-4-31b-it",
800
+ label="OpenRouter Model",
801
+ )
802
+ checkpoint_path = gr.Textbox(
803
+ value=_default_rl_checkpoint(),
804
+ label="RL Checkpoint Path",
805
+ )
806
+ api_key = gr.Textbox(
807
+ type="password",
808
+ label="OPENROUTER_API_KEY",
809
+ )
810
+ with gr.Row():
811
+ reset = gr.Button("Reset Scenario", variant="secondary")
812
+ run_episode_btn = gr.Button("Run Episode", variant="primary")
813
+ gr.HTML(
814
+ """
815
+ <p class="footnote">
816
+ OpenRouter inputs appear only when needed. RL checkpoint selection stays available for policy replay without changing the execution path.
817
+ </p>
818
+ """
819
+ )
820
+ with gr.Column(scale=5):
821
+ scenario_brief = gr.HTML(render_scenario_brief("easy_deadline_extraction"))
822
+ status_card = gr.HTML(
823
+ build_initial_status(
824
+ "easy_deadline_extraction",
825
+ "baseline",
826
+ "google/gemma-4-31b-it",
827
+ _default_rl_checkpoint(),
828
+ )
829
+ )
830
+
831
+ with gr.Group(elem_classes=["surface-card", "workspace-grid"]):
832
+ with gr.Tabs():
833
+ with gr.Tab("Live Workspace"):
834
+ with gr.Row():
835
+ observation = gr.Code(label="Observation", language="json")
836
+ summary = gr.Code(label="Run Summary", language="json")
837
+ with gr.Row():
838
+ emails = gr.Dataframe(headers=EMAIL_COLUMNS, label="Unread Emails")
839
+ todos = gr.Dataframe(headers=TODO_COLUMNS, label="Todos")
840
+ with gr.Row():
841
+ files = gr.Dataframe(headers=FILE_COLUMNS, label="Search Results")
842
+ action_log = gr.Dataframe(headers=ACTION_LOG_COLUMNS, label="Action Log")
843
+ with gr.Tab("Episode Trace"):
844
+ trace_table = gr.Dataframe(headers=TRACE_COLUMNS, label="Episode Trace")
845
+
846
+ reset.click(
847
+ fn=build_snapshot,
848
+ inputs=[task],
849
+ outputs=[observation, emails, todos, files, action_log],
850
+ )
851
+ reset.click(
852
+ fn=render_scenario_brief,
853
+ inputs=[task],
854
+ outputs=[scenario_brief],
855
+ )
856
+ reset.click(
857
+ fn=build_initial_status,
858
+ inputs=[task, provider, model_name, checkpoint_path],
859
+ outputs=[status_card],
860
+ )
861
+ provider.change(
862
+ fn=configure_provider_inputs,
863
+ inputs=[provider],
864
+ outputs=[model_name, api_key, checkpoint_path],
865
+ )
866
+ provider.change(
867
+ fn=build_initial_status,
868
+ inputs=[task, provider, model_name, checkpoint_path],
869
+ outputs=[status_card],
870
+ )
871
+ task.change(
872
+ fn=render_scenario_brief,
873
+ inputs=[task],
874
+ outputs=[scenario_brief],
875
+ )
876
+ task.change(
877
+ fn=build_initial_status,
878
+ inputs=[task, provider, model_name, checkpoint_path],
879
+ outputs=[status_card],
880
+ )
881
+ run_episode_btn.click(
882
+ fn=run_live_episode,
883
+ inputs=[task, provider, model_name, api_key, max_steps, checkpoint_path],
884
+ outputs=[observation, status_card, emails, todos, files, action_log, trace_table, summary],
885
+ )
886
+
887
+ demo.load(
888
+ fn=build_snapshot,
889
+ inputs=[task],
890
+ outputs=[observation, emails, todos, files, action_log],
891
+ )
892
+ demo.load(
893
+ fn=configure_provider_inputs,
894
+ inputs=[provider],
895
+ outputs=[model_name, api_key, checkpoint_path],
896
+ )
897
+ demo.load(
898
+ fn=render_scenario_brief,
899
+ inputs=[task],
900
+ outputs=[scenario_brief],
901
+ )
902
+ demo.load(
903
+ fn=build_initial_status,
904
+ inputs=[task, provider, model_name, checkpoint_path],
905
+ outputs=[status_card],
906
+ )
907
+
908
+
909
+ if __name__ == "__main__":
910
+ demo.launch(
911
+ server_name=APP_RUNTIME.host,
912
+ server_port=APP_RUNTIME.port,
913
+ show_error=True,
915
+ )
docs/HF_SPACE_README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ title: Project Epsilon | Executive Assistant Sandbox
3
+ emoji: "🧭"
4
+ colorFrom: yellow
5
+ colorTo: gray
6
+ sdk: docker
7
+ app_port: 7860
8
+ pinned: false
9
+ short_description: OpenEnv executive assistant sandbox demo for judges.
10
+ ---
11
+
12
+ # Project Epsilon
13
+
14
+ Standalone Hugging Face Space README for the **Autonomous Executive Assistant Sandbox**, prepared for the **OpenEnv Scaler x Meta x PyTorch Hack**.
15
+
16
+ ## Team
17
+
18
+ - Team name: `Project Epsilon`
19
+ - Hugging Face usernames: `@HF_USERNAME_1`, `@HF_USERNAME_2`, `@HF_USERNAME_3`
20
+ - Space repo: `HF_USERNAME_PLACEHOLDER/project-epsilon-executive-assistant`
21
+
22
+ Replace the placeholder usernames and repo owner when the final team accounts are ready.
23
+
24
+ ## What This Space Shows
25
+
26
+ - Deterministic OpenEnv-style tasks over a SQLite-backed executive assistant workspace
27
+ - A Gradio judge console that replays the shared `EpisodeRunner` loop step by step
28
+ - Policy switching across `baseline`, bundled `rl`, and optional `openrouter`
29
+ - Visible inbox, todo, file-search, and action-log state transitions
30
+
31
+ ## Hack Context
32
+
33
+ OpenEnv was introduced by Hugging Face and Meta as an open source framework for typed agent environments. The Scaler hack dashboard lists the build window as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space is tuned for that style of evaluation: deterministic tasks, structured actions, reproducible runs, and a judge-friendly visual trace.
34
+
35
+ ## Runtime Notes
36
+
37
+ - SDK: `docker`
38
+ - App port: `7860`
39
+ - Entry point: `python app.py`
40
+ - Optional secret: `OPENROUTER_API_KEY`
41
+ - Bundled RL checkpoint path: `artifacts/checkpoints/q_policy_notebook.json`
42
+
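Only the secret in the list above is optional; a minimal sketch of the gating the app applies, assuming the same variable names (the helper `resolve_openrouter` and its return shape are illustrative, not part of the codebase):

```python
def resolve_openrouter(env: dict) -> tuple[bool, str]:
    """Decide whether the live OpenRouter policy can be enabled.

    Mirrors the fallback logic in `_build_policy` in app.py: the API key
    is optional, and the site URL defaults to the local app port.
    """
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    site_url = env.get("OPENROUTER_SITE_URL", "http://localhost:7860")
    return bool(api_key), site_url
```

With no secret set, the Space simply falls back to the `baseline` and `rl` policies.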
43
+ ## Judge Flow
44
+
45
+ 1. Open the Space and choose one of the seeded scenarios.
46
+ 2. Run `baseline` first for the reference trace.
47
+ 3. Switch to `rl` to replay the trained checkpoint bundled with the Space.
48
+ 4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed policy.
49
+
50
+ ## References
51
+
52
+ - Hack dashboard: https://www.scaler.com/openenv-hackathon
53
+ - OpenEnv launch: https://huggingface.co/blog/openenv
openenv.yaml ADDED
@@ -0,0 +1,10 @@
1
+ name: autonomous-executive-assistant-sandbox
2
+ description: Deterministic executive assistant environment backed by an in-memory SQLite workspace.
3
+ entrypoint: src.executive_assistant.env:ExecutiveAssistantEnv
4
+ observation_model: src.executive_assistant.models:WorkspaceObservation
5
+ action_model: src.executive_assistant.models:AssistantAction
6
+ reward_model: src.executive_assistant.models:TaskReward
7
+ tasks:
8
+ - easy_deadline_extraction
9
+ - medium_triage_and_negotiation
10
+ - hard_rag_reply
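The `entrypoint`, `observation_model`, `action_model`, and `reward_model` fields all use `module:attribute` strings; a stdlib-only sketch of splitting one such spec (the helper name is illustrative, and how the framework itself imports the result is assumed, not shown here):

```python
def split_entrypoint(spec: str) -> tuple[str, str]:
    """Split a 'pkg.module:Attr' spec into (module path, attribute name)."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"malformed entrypoint spec: {spec!r}")
    return module_path, attr
```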
pytest.ini ADDED
@@ -0,0 +1,2 @@
1
+ [pytest]
2
+ pythonpath = .
requirements.app.txt ADDED
@@ -0,0 +1 @@
1
+ -r requirements.txt
requirements.training.txt ADDED
@@ -0,0 +1,6 @@
1
+ -r requirements.txt
2
+ huggingface_hub>=0.31.0
3
+ jupyterlab>=4.2.0
4
+ ipykernel>=6.29.0
5
+ pandas>=2.2.0
6
+ matplotlib>=3.9.0
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ gradio>=5.0.0
2
+ openai>=1.76.0
3
+ pydantic>=2.8.0
4
+ pytest>=8.0.0
5
+ PyYAML>=6.0.0
run.py ADDED
@@ -0,0 +1,29 @@
1
+ from src.executive_assistant.env import ExecutiveAssistantEnv
2
+ from src.executive_assistant.agent import BaselineAgent
3
+
4
+ # Create env
5
+ env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
6
+
7
+ # Create agent
8
+ agent = BaselineAgent()
9
+
10
+ # Reset env
11
+ obs = env.reset()
12
+
13
+ print("STARTING...\n")
14
+
15
+ # Run loop
16
+ for step in range(10):
17
+ decision = agent.choose_action(env.task_name, obs)
18
+
19
+ print(f"\nSTEP {step+1}")
20
+ print("Reasoning:", decision.reasoning)
21
+ print("Action:", decision.action)
22
+
23
+ obs, reward = env.step(decision.action)
24
+
25
+ print("Reward:", reward)
26
+
27
+ if reward.is_done:
28
+ print("\nTASK COMPLETE ✅")
29
+ break
scripts/deploy_hf_space.py ADDED
@@ -0,0 +1,178 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import os
5
+ import shutil
6
+ import sys
7
+ import tempfile
8
+ from pathlib import Path
9
+
10
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
11
+ if str(PROJECT_ROOT) not in sys.path:
12
+ sys.path.insert(0, str(PROJECT_ROOT))
13
+
14
+ from src.executive_assistant.deployment import (
15
+ DEFAULT_SPACE_TITLE,
16
+ HFSpaceDeployConfig,
17
+ parse_hf_usernames,
18
+ stage_space_bundle,
19
+ )
20
+
21
+
22
+ def build_parser() -> argparse.ArgumentParser:
23
+ parser = argparse.ArgumentParser(
24
+ description="Create or update a Hugging Face Space from this repository in one command."
25
+ )
26
+ parser.add_argument(
27
+ "--repo-id",
28
+ default=os.environ.get("HF_SPACE_REPO", "").strip(),
29
+ help="Target Space repo in owner/name form. Defaults to HF_SPACE_REPO.",
30
+ )
31
+ parser.add_argument(
32
+ "--token",
33
+ default=os.environ.get("HF_TOKEN", "").strip(),
34
+ help="Hugging Face token. Defaults to HF_TOKEN.",
35
+ )
36
+ parser.add_argument(
37
+ "--title",
38
+ default=os.environ.get("HF_SPACE_TITLE", DEFAULT_SPACE_TITLE),
39
+ help="Space title used in the generated HF README.",
40
+ )
41
+ parser.add_argument(
42
+ "--team-name",
43
+ default=os.environ.get("HF_SPACE_TEAM_NAME", "Project Epsilon"),
44
+ help="Team name shown in the generated HF README.",
45
+ )
46
+ parser.add_argument(
47
+ "--hf-usernames",
48
+ default=os.environ.get(
49
+ "HF_SPACE_TEAM_USERNAMES",
50
+ "HF_USERNAME_1,HF_USERNAME_2,HF_USERNAME_3",
51
+ ),
52
+ help="Comma-separated HF usernames for the HF README placeholders.",
53
+ )
54
+ parser.add_argument(
55
+ "--checkpoint-name",
56
+ default=os.environ.get("HF_SPACE_CHECKPOINT_NAME", "q_policy_notebook.json"),
57
+ help="Checkpoint filename staged into artifacts/checkpoints/ for RL replay.",
58
+ )
59
+ parser.add_argument(
60
+ "--openrouter-api-key",
61
+ default=os.environ.get("OPENROUTER_API_KEY", "").strip(),
62
+ help="Optional secret to set on the Space during deployment.",
63
+ )
64
+ parser.add_argument(
65
+ "--private",
66
+ action="store_true",
67
+ default=os.environ.get("HF_SPACE_PRIVATE", "").strip().lower() == "true",
68
+ help="Create or keep the Space private.",
69
+ )
70
+ parser.add_argument(
71
+ "--skip-checkpoint",
72
+ action="store_true",
73
+ help="Skip bundling the RL checkpoint.",
74
+ )
75
+ parser.add_argument(
76
+ "--keep-stage-dir",
77
+ default="",
78
+ help="Optional local folder where the prepared Space bundle should be copied after upload.",
79
+ )
80
+ return parser
81
+
82
+
83
+ def require_huggingface_hub():
84
+ try:
85
+ from huggingface_hub import HfApi # type: ignore
86
+ except ImportError as exc:
87
+ raise SystemExit(
88
+ "huggingface_hub is required for deployment. Install the training environment "
89
+ "or run `python -m pip install huggingface_hub` first."
90
+ ) from exc
91
+ return HfApi
92
+
93
+
94
+ def maybe_set_space_secret(api, repo_id: str, key: str, value: str) -> str:
95
+ if not value.strip():
96
+ return f"Skipped secret {key} because no value was provided."
97
+ add_secret = getattr(api, "add_space_secret", None)
98
+ if add_secret is None:
99
+ return f"Upload succeeded, but this huggingface_hub version cannot set {key} automatically."
100
+ add_secret(repo_id=repo_id, key=key, value=value)
101
+ return f"Set Space secret {key}."
102
+
103
+
104
+ def maybe_set_space_variable(api, repo_id: str, key: str, value: str) -> str:
105
+ add_variable = getattr(api, "add_space_variable", None)
106
+ if add_variable is None:
107
+ return f"Upload succeeded, but this huggingface_hub version cannot set variable {key} automatically."
108
+ add_variable(repo_id=repo_id, key=key, value=value)
109
+ return f"Set Space variable {key}={value}."
110
+
111
+
112
+ def main() -> int:
113
+ parser = build_parser()
114
+ args = parser.parse_args()
115
+
116
+ if not args.repo_id:
117
+ parser.error("A Space repo id is required. Pass --repo-id or set HF_SPACE_REPO.")
118
+ if "/" not in args.repo_id:
119
+ parser.error("Space repo id must be in owner/name form.")
120
+ if not args.token:
121
+ parser.error("A Hugging Face token is required. Pass --token or set HF_TOKEN.")
122
+
123
+ config = HFSpaceDeployConfig(
124
+ repo_id=args.repo_id,
125
+ title=args.title,
126
+ team_name=args.team_name,
127
+ hf_usernames=parse_hf_usernames(args.hf_usernames),
128
+ checkpoint_name=args.checkpoint_name,
129
+ private=args.private,
130
+ include_checkpoint=not args.skip_checkpoint,
131
+ )
132
+
133
+ HfApi = require_huggingface_hub()
134
+ api = HfApi(token=args.token)
135
+
136
+ with tempfile.TemporaryDirectory(prefix="hf-space-stage-") as tmp_dir:
137
+ stage_dir = Path(tmp_dir)
138
+ checkpoint_path = stage_space_bundle(config, stage_dir)
139
+
140
+ api.create_repo(
141
+ repo_id=config.repo_id,
142
+ repo_type="space",
143
+ space_sdk="docker",
144
+ private=config.private,
145
+ exist_ok=True,
146
+ )
147
+ api.upload_folder(
148
+ folder_path=str(stage_dir),
149
+ repo_id=config.repo_id,
150
+ repo_type="space",
151
+ commit_message="Deploy Project Epsilon Space bundle",
152
+ delete_patterns=["*", "**/*"],
153
+ )
154
+
155
+ messages = [
156
+ f"Uploaded Space bundle to {config.space_url}",
157
+ f"App URL: {config.app_url}",
158
+ ]
159
+ if checkpoint_path is not None:
160
+ messages.append(f"Bundled RL checkpoint: {checkpoint_path.relative_to(stage_dir)}")
161
+ messages.append(maybe_set_space_secret(api, config.repo_id, "OPENROUTER_API_KEY", args.openrouter_api_key))
162
+ messages.append(maybe_set_space_variable(api, config.repo_id, "OPENROUTER_APP_NAME", config.title))
163
+ messages.append(maybe_set_space_variable(api, config.repo_id, "OPENROUTER_SITE_URL", config.app_url))
164
+
165
+ if args.keep_stage_dir:
166
+ target_dir = Path(args.keep_stage_dir).resolve()
167
+ if target_dir.exists():
168
+ shutil.rmtree(target_dir)
169
+ shutil.copytree(stage_dir, target_dir)
170
+ messages.append(f"Saved staged bundle to {target_dir}")
171
+
172
+ for message in messages:
173
+ print(message)
174
+ return 0
175
+
176
+
177
+ if __name__ == "__main__":
178
+ sys.exit(main())
scripts/deploy_hf_space.sh ADDED
@@ -0,0 +1,25 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ if [[ -f ".env.hf.space" ]]; then
5
+ while IFS= read -r raw_line || [[ -n "${raw_line}" ]]; do
6
+ line="${raw_line#"${raw_line%%[![:space:]]*}"}"
7
+ line="${line%"${line##*[![:space:]]}"}"
8
+ if [[ -z "${line}" || "${line}" == \#* || "${line}" != *=* ]]; then
9
+ continue
10
+ fi
11
+ key="${line%%=*}"
12
+ value="${line#*=}"
13
+ key="${key%"${key##*[![:space:]]}"}"
14
+ value="${value#"${value%%[![:space:]]*}"}"
15
+ value="${value%"${value##*[![:space:]]}"}"
16
+ export "${key}=${value}"
17
+ done < .env.hf.space
18
+ fi
19
+
20
+ PYTHON_BIN="${PYTHON_BIN:-.venv-training/bin/python}"
21
+ if [[ ! -x "${PYTHON_BIN}" ]]; then
22
+ PYTHON_BIN="python"
23
+ fi
24
+
25
+ exec "${PYTHON_BIN}" scripts/deploy_hf_space.py "$@"
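The whitespace-trimming loop above is easy to get subtly wrong in shell; a Python mirror of the same rules (skip blanks, `#` comments, and lines without `=`, then trim whitespace around key and value) can serve as a reference. The function is illustrative and not part of the repo:

```python
def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines the way the bash loop above does:
    skip blank lines, comments, and lines without '=', and trim
    surrounding whitespace from both the key and the value."""
    env: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```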
scripts/evaluate_policies.py ADDED
@@ -0,0 +1,74 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import json
5
+ import sys
6
+ from pathlib import Path
7
+
8
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
9
+ if str(PROJECT_ROOT) not in sys.path:
10
+ sys.path.insert(0, str(PROJECT_ROOT))
11
+
12
+ from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
13
+ from src.executive_assistant.config import OpenRouterConfig, TrainingRuntimeConfig, load_env_file
14
+ from src.executive_assistant.runner import export_traces_jsonl, run_policy_suite
15
+
16
+
17
+ TASKS = [
18
+ "easy_deadline_extraction",
19
+ "medium_triage_and_negotiation",
20
+ "hard_rag_reply",
21
+ ]
22
+
23
+
24
+ def build_policy(provider: str, model_name: str) -> object:
25
+ if provider == "baseline":
26
+ return BaselineAgent()
27
+ if provider == "openrouter":
28
+ load_env_file(TrainingRuntimeConfig().env_file)
29
+ config = OpenRouterConfig.from_env()
30
+ config = OpenRouterConfig(
31
+ api_key=config.api_key,
32
+ model_name=model_name,
33
+ base_url=config.base_url,
34
+ site_url=config.site_url,
35
+ app_name=config.app_name,
36
+ temperature=config.temperature,
37
+ max_tokens=config.max_tokens,
38
+ )
39
+ return OpenRouterPolicy(config=config)
40
+ raise ValueError(f"Unsupported provider: {provider}")
41
+
42
+
43
+ def main() -> None:
44
+ load_env_file(TrainingRuntimeConfig().env_file)
45
+ parser = argparse.ArgumentParser(description="Evaluate a policy over all seeded tasks.")
46
+ parser.add_argument("--provider", choices=["baseline", "openrouter"], default="baseline")
47
+ parser.add_argument("--model", default="google/gemma-4-31b-it")
48
+ parser.add_argument("--max-steps", type=int, default=12)
49
+ parser.add_argument("--output", default="")
50
+ args = parser.parse_args()
51
+
52
+ traces = run_policy_suite(
53
+ policy=build_policy(args.provider, args.model),
54
+ task_names=TASKS,
55
+ max_steps=args.max_steps,
56
+ )
57
+ summary = {
58
+ task_name: {
59
+ "completed": trace.completed,
60
+ "final_score": trace.final_score,
61
+ "steps": len(trace.steps),
62
+ "termination_reason": trace.termination_reason,
63
+ }
64
+ for task_name, trace in traces.items()
65
+ }
66
+ print(json.dumps(summary, indent=2))
67
+
68
+ if args.output:
69
+ export_traces_jsonl(list(traces.values()), args.output)
70
+ print(f"Saved traces to {args.output}")
71
+
72
+
73
+ if __name__ == "__main__":
74
+ main()
scripts/run_policy_episode.py ADDED
@@ -0,0 +1,51 @@
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
+from src.executive_assistant.config import OpenRouterConfig, TrainingRuntimeConfig, load_env_file
+from src.executive_assistant.runner import EpisodeRunner
+
+
+def build_policy(provider: str, model_name: str) -> object:
+    if provider == "baseline":
+        return BaselineAgent()
+    if provider == "openrouter":
+        load_env_file(TrainingRuntimeConfig().env_file)
+        config = OpenRouterConfig.from_env()
+        config = OpenRouterConfig(
+            api_key=config.api_key,
+            model_name=model_name,
+            base_url=config.base_url,
+            site_url=config.site_url,
+            app_name=config.app_name,
+            temperature=config.temperature,
+            max_tokens=config.max_tokens,
+        )
+        return OpenRouterPolicy(config=config)
+    raise ValueError(f"Unsupported provider: {provider}")
+
+
+def main() -> None:
+    load_env_file(TrainingRuntimeConfig().env_file)
+    parser = argparse.ArgumentParser(description="Run a single policy episode.")
+    parser.add_argument("--task", required=True)
+    parser.add_argument("--provider", choices=["baseline", "openrouter"], default="baseline")
+    parser.add_argument("--model", default="google/gemma-4-31b-it")
+    parser.add_argument("--max-steps", type=int, default=12)
+    args = parser.parse_args()
+
+    runner = EpisodeRunner(policy=build_policy(args.provider, args.model), max_steps=args.max_steps)
+    trace = runner.run(args.task)
+    print(json.dumps(trace.to_dict(), indent=2))
+
+
+if __name__ == "__main__":
+    main()
scripts/setup_app_env.sh ADDED
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+python -m venv .venv-app
+source .venv-app/bin/activate
+python -m pip install --upgrade pip
+python -m pip install -r requirements.app.txt
+echo "App environment ready at .venv-app"
scripts/setup_training_env.sh ADDED
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+python -m venv .venv-training
+source .venv-training/bin/activate
+python -m pip install --upgrade pip
+python -m pip install -r requirements.training.txt
+python -m ipykernel install --user --name scalerhack2-training --display-name "Python (scalerhack2-training)"
+echo "Training environment ready at .venv-training with Jupyter kernel scalerhack2-training"
scripts/train_rl_agent.py ADDED
@@ -0,0 +1,47 @@
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from src.executive_assistant.agent import BaselineAgent
+from src.executive_assistant.config import TrainingRuntimeConfig, load_env_file
+from src.executive_assistant.training import evaluate_q_policy, train_q_learning
+
+
+def main() -> None:
+    load_env_file(TrainingRuntimeConfig().env_file)
+    parser = argparse.ArgumentParser(description="Train a tabular RL policy for seeded tasks.")
+    parser.add_argument("--episodes", type=int, default=300)
+    parser.add_argument("--epsilon", type=float, default=0.15)
+    parser.add_argument("--checkpoint", default="artifacts/checkpoints/q_policy.json")
+    parser.add_argument("--no-teacher", action="store_true")
+    args = parser.parse_args()
+
+    teacher = None if args.no_teacher else BaselineAgent()
+    policy, training_scores = train_q_learning(
+        episodes=args.episodes,
+        epsilon=args.epsilon,
+        teacher=teacher,
+    )
+    checkpoint_path = policy.save(args.checkpoint)
+    evaluation = evaluate_q_policy(policy)
+    print(
+        json.dumps(
+            {
+                "checkpoint": str(checkpoint_path),
+                "training_scores": training_scores,
+                "evaluation": evaluation,
+            },
+            indent=2,
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
src/__init__.py ADDED
@@ -0,0 +1,2 @@
+"""Top-level package namespace for local src-based imports."""
+
src/executive_assistant/__init__.py ADDED
@@ -0,0 +1,2 @@
+"""Core package for the autonomous executive assistant sandbox."""
+
src/executive_assistant/agent.py ADDED
@@ -0,0 +1,356 @@
+from __future__ import annotations
+
+import re
+
+from src.executive_assistant.config import OpenRouterConfig
+from src.executive_assistant.llm_service import OpenRouterLLMService
+from src.executive_assistant.models import AssistantAction, PolicyDecision, WorkspaceObservation
+from src.executive_assistant.runner import EpisodeRunner, EpisodeTrace, run_policy_suite
+
+
+class ActionCatalog:
+    """Finite action templates for smoke-testing and future policy indexing."""
+
+    @staticmethod
+    def enumerate_actions(observation: WorkspaceObservation) -> list[AssistantAction]:
+        actions: list[AssistantAction] = []
+        for email in observation.unread_emails:
+            actions.append(AssistantAction(action_type="read_email", target_id=email.id))
+            actions.append(AssistantAction(action_type="archive", target_id=email.id))
+            actions.append(
+                AssistantAction(
+                    action_type="forward",
+                    target_id=email.id,
+                    secondary_payload="manager@company.com",
+                    payload="Escalating this for review.",
+                )
+            )
+        if observation.current_email is not None:
+            actions.append(
+                AssistantAction(
+                    action_type="reply",
+                    target_id=observation.current_email.id,
+                    payload="Hello, I will follow up shortly.\nRegards, Executive Assistant",
+                )
+            )
+        actions.extend(
+            [
+                AssistantAction(action_type="search_files", payload="Q3 Architecture"),
+                AssistantAction(action_type="search_files", payload="architecture metrics"),
+            ]
+        )
+        return actions
+
+
+class BaselineAgent:
+    """Deterministic baseline policy for seeded scenarios and training-pipeline smoke tests."""
+
+    def __init__(self, model_name: str = "deterministic-baseline-v1") -> None:
+        self.model_name = model_name
+
+    def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+        if task_name == "easy_deadline_extraction":
+            return self._choose_easy_action(observation)
+        if task_name == "medium_triage_and_negotiation":
+            return self._choose_medium_action(observation)
+        if task_name == "hard_rag_reply":
+            return self._choose_hard_action(observation)
+        raise ValueError(f"Unsupported task: {task_name}")
+
+    def _choose_easy_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        if observation.current_email is None:
+            email = observation.unread_emails[0]
+            return PolicyDecision(
+                reasoning="Read the seeded deadline email before extracting any tasks.",
+                action=AssistantAction(action_type="read_email", target_id=email.id),
+            )
+
+        deadlines = self._extract_deadlines(observation.current_email.body)
+        existing = {todo.strip().lower() for todo in observation.active_todos}
+        for task_name, deadline_date in deadlines:
+            if task_name.lower() not in existing:
+                return PolicyDecision(
+                    reasoning=f"Add the missing todo '{task_name}' with deadline {deadline_date}.",
+                    action=AssistantAction(
+                        action_type="add_todo",
+                        payload=task_name,
+                        secondary_payload=deadline_date,
+                    ),
+                )
+        return PolicyDecision(
+            reasoning="All deadlines are captured, so archive the source email.",
+            action=AssistantAction(action_type="archive", target_id=observation.current_email.id),
+        )
+
+    def _choose_medium_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        newsletters = {
+            "news@updates.example",
+            "promotions@vendor.example",
+            "events@community.example",
+        }
+        action_history = " ".join(observation.action_history).lower()
+        for email in observation.unread_emails:
+            if email.sender in newsletters:
+                return PolicyDecision(
+                    reasoning=f"Archive non-actionable newsletter from {email.sender}.",
+                    action=AssistantAction(action_type="archive", target_id=email.id),
+                )
+
+        client_email = next(
+            (email for email in observation.unread_emails if email.sender == "client@company.com"),
+            None,
+        )
+        if client_email is not None and "forward: forwarded to manager@company.com" not in action_history:
+            return PolicyDecision(
+                reasoning="Escalate the urgent client complaint to the manager.",
+                action=AssistantAction(
+                    action_type="forward",
+                    target_id=client_email.id,
+                    secondary_payload="manager@company.com",
+                    payload="Urgent client complaint. Please take over immediately.",
+                ),
+            )
+
+        teammate_email = next(
+            (email for email in observation.unread_emails if email.sender == "teammate@company.com"),
+            None,
+        )
+        if teammate_email is not None and "reply: reply drafted" not in action_history:
+            return PolicyDecision(
+                reasoning="Reply to the reschedule request with a concrete proposed time.",
+                action=AssistantAction(
+                    action_type="reply",
+                    target_id=teammate_email.id,
+                    payload="Hello, 3:30 PM IST works for me. Regards, Executive Assistant",
+                ),
+            )
+
+        if observation.current_email is not None:
+            return PolicyDecision(
+                reasoning="Archive the currently open message to reduce inbox clutter.",
+                action=AssistantAction(action_type="archive", target_id=observation.current_email.id),
+            )
+        raise RuntimeError("No valid medium-task action available")
+
+    def _choose_hard_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        if observation.current_email is None:
+            email = observation.unread_emails[0]
+            return PolicyDecision(
+                reasoning="Read the stakeholder email to ground the response request.",
+                action=AssistantAction(action_type="read_email", target_id=email.id),
+            )
+
+        if not observation.search_results:
+            return PolicyDecision(
+                reasoning="Search the local report store for the Q3 architecture document.",
+                action=AssistantAction(action_type="search_files", payload="Q3 Architecture"),
+            )
+
+        metrics = self._extract_report_metrics(observation.search_results[0].snippet)
+        payload = (
+            "Hello,\n"
+            f"Here are the requested Q3 architecture metrics: availability {metrics['availability']}, "
+            f"mean API latency {metrics['latency']}, and infrastructure cost reduction {metrics['cost_reduction']}.\n"
+            "Regards,\nExecutive Assistant"
+        )
+        return PolicyDecision(
+            reasoning="Reply with the three requested metrics pulled from the report search results.",
+            action=AssistantAction(
+                action_type="reply",
+                target_id=observation.current_email.id,
+                payload=payload,
+            ),
+        )
+
+    @staticmethod
+    def _extract_deadlines(email_body: str) -> list[tuple[str, str]]:
+        pattern = re.compile(r"([a-z ]+ due)\s+(\d{4}-\d{2}-\d{2})", re.IGNORECASE)
+        cleaned: list[tuple[str, str]] = []
+        for task, date in pattern.findall(email_body):
+            normalized_task = re.sub(r"^(and\s+)", "", task.strip(), flags=re.IGNORECASE)
+            cleaned.append((normalized_task.title(), date))
+        return cleaned
+
+    @staticmethod
+    def _extract_report_metrics(snippet: str) -> dict[str, str]:
+        metrics = {
+            "availability": re.search(r"(\d+\.\d+%)", snippet),
+            "latency": re.search(r"(\d+ms)", snippet),
+            "cost_reduction": re.search(r"(\d+%)", snippet.split("Infrastructure cost reduction:")[-1]),
+        }
+        return {
+            "availability": metrics["availability"].group(1) if metrics["availability"] else "unknown",
+            "latency": metrics["latency"].group(1) if metrics["latency"] else "unknown",
+            "cost_reduction": (
+                metrics["cost_reduction"].group(1) if metrics["cost_reduction"] else "unknown"
+            ),
+        }
+
+
+class OpenRouterPolicy:
+    def __init__(
+        self,
+        config: OpenRouterConfig | None = None,
+        service: OpenRouterLLMService | None = None,
+    ) -> None:
+        self.config = config or OpenRouterConfig.from_env()
+        self.service = service or OpenRouterLLMService(self.config)
+
+    def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+        decision = self.service.generate_policy_decision(task_name, observation)
+        return self._sanitize_decision(task_name, observation, decision)
+
+    def _sanitize_decision(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        decision: PolicyDecision,
+    ) -> PolicyDecision:
+        action = decision.action
+        if action.action_type == "add_todo":
+            action = self._normalize_easy_todo_action(task_name, observation, action)
+        elif action.action_type == "search_files":
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=None,
+                payload=action.payload,
+                secondary_payload=None,
+            )
+        elif action.action_type in {"read_email", "archive"}:
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=action.target_id,
+                payload=None,
+                secondary_payload=None,
+            )
+        elif action.action_type == "forward":
+            action = self._normalize_forward_action(task_name, observation, action)
+        if action.action_type == "reply" and action.payload:
+            payload = action.payload.strip()
+            target_id = action.target_id
+            if task_name == "hard_rag_reply":
+                if not payload.lower().startswith("hello"):
+                    payload = f"Hello,\n{payload}"
+                if "regards" not in payload.lower():
+                    payload = f"{payload}\nRegards,\nExecutive Assistant"
+            elif task_name == "medium_triage_and_negotiation":
+                if not re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", payload):
+                    payload = "Hello, 3:30 PM IST works for me."
+                if "regards" not in payload.lower():
+                    payload = f"{payload}\nRegards,\nExecutive Assistant"
+                target_id = self._resolve_teammate_email_id(observation, action.target_id)
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=target_id,
+                payload=payload,
+                secondary_payload=action.secondary_payload,
+            )
+
+        return PolicyDecision(reasoning=decision.reasoning, action=action)
+
+    def _normalize_easy_todo_action(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        action: AssistantAction,
+    ) -> AssistantAction:
+        if task_name != "easy_deadline_extraction":
+            return AssistantAction(
+                action_type=action.action_type,
+                target_id=None,
+                payload=action.payload,
+                secondary_payload=action.secondary_payload,
+            )
+
+        canonical_todos = [
+            ("proposal", "Proposal Due", "2026-04-10"),
+            ("prototype", "Prototype Due", "2026-04-20"),
+            ("final report", "Final Report Due", "2026-04-30"),
+        ]
+        payload = (action.payload or "").strip()
+        payload_lower = payload.lower()
+
+        for marker, canonical_name, canonical_deadline in canonical_todos:
+            if marker in payload_lower:
+                return AssistantAction(
+                    action_type="add_todo",
+                    target_id=None,
+                    payload=canonical_name,
+                    secondary_payload=canonical_deadline,
+                )
+
+        existing = {todo.strip().lower() for todo in observation.active_todos}
+        for _, canonical_name, canonical_deadline in canonical_todos:
+            if canonical_name.lower() not in existing:
+                return AssistantAction(
+                    action_type="add_todo",
+                    target_id=None,
+                    payload=canonical_name,
+                    secondary_payload=canonical_deadline,
+                )
+
+        return AssistantAction(
+            action_type="add_todo",
+            target_id=None,
+            payload=payload,
+            secondary_payload=action.secondary_payload,
+        )
+
+    def _normalize_forward_action(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        action: AssistantAction,
+    ) -> AssistantAction:
+        target_id = action.target_id
+        recipient = action.secondary_payload
+        note = action.payload
+
+        if task_name == "medium_triage_and_negotiation":
+            if target_id is None and observation.current_email is not None:
+                target_id = observation.current_email.id
+            if recipient is None:
+                recipient = "manager@company.com"
+            if note is None or not note.strip():
+                note = "Urgent client complaint. Please take over immediately."
+
+        return AssistantAction(
+            action_type="forward",
+            target_id=target_id,
+            payload=note,
+            secondary_payload=recipient,
+        )
+
+    @staticmethod
+    def _resolve_teammate_email_id(
+        observation: WorkspaceObservation,
+        target_id: int | None,
+    ) -> int | None:
+        if target_id is not None:
+            return target_id
+        if observation.current_email and observation.current_email.sender == "teammate@company.com":
+            return observation.current_email.id
+        teammate_email = next(
+            (email for email in observation.unread_emails if email.sender == "teammate@company.com"),
+            None,
+        )
+        return teammate_email.id if teammate_email is not None else None
+
+
+OpenAIResponsesPolicy = OpenRouterPolicy
+
+
+def run_episode(task_name: str, max_steps: int = 12) -> EpisodeTrace:
+    runner = EpisodeRunner(policy=BaselineAgent(), max_steps=max_steps)
+    return runner.run(task_name)
+
+
+def smoke_test_training_pipeline() -> dict[str, EpisodeTrace]:
+    return run_policy_suite(
+        policy=BaselineAgent(),
+        task_names=[
+            "easy_deadline_extraction",
+            "medium_triage_and_negotiation",
+            "hard_rag_reply",
+        ],
+    )
src/executive_assistant/config.py ADDED
@@ -0,0 +1,78 @@
+from __future__ import annotations
+
+import os
+from dataclasses import dataclass
+from pathlib import Path
+
+
+def load_env_file(env_path: str | Path, override: bool = False) -> bool:
+    path = Path(env_path)
+    if not path.exists():
+        return False
+
+    for raw_line in path.read_text().splitlines():
+        line = raw_line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        key, value = line.split("=", 1)
+        key = key.strip()
+        value = value.strip().strip('"').strip("'")
+        if override or key not in os.environ:
+            os.environ[key] = value
+    return True
+
+
+@dataclass(frozen=True)
+class OpenRouterConfig:
+    api_key: str
+    model_name: str = "google/gemma-4-31b-it"
+    base_url: str = "https://openrouter.ai/api/v1"
+    site_url: str = "http://localhost:7860"
+    app_name: str = "Autonomous Executive Assistant Sandbox"
+    temperature: float = 0.1
+    max_tokens: int = 600
+
+    @classmethod
+    def from_env(cls, env_file: str | Path | None = None) -> "OpenRouterConfig":
+        if env_file is not None:
+            load_env_file(env_file)
+        api_key = os.environ.get("OPENROUTER_API_KEY", "").strip()
+        if not api_key:
+            raise RuntimeError("OPENROUTER_API_KEY is required for OpenRouter model access.")
+        return cls(
+            api_key=api_key,
+            model_name=os.environ.get("OPENROUTER_MODEL", "google/gemma-4-31b-it"),
+            base_url=os.environ.get("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
+            site_url=os.environ.get("OPENROUTER_SITE_URL", "http://localhost:7860"),
+            app_name=os.environ.get(
+                "OPENROUTER_APP_NAME",
+                "Autonomous Executive Assistant Sandbox",
+            ),
+            temperature=float(os.environ.get("OPENROUTER_TEMPERATURE", "0.1")),
+            max_tokens=int(os.environ.get("OPENROUTER_MAX_TOKENS", "600")),
+        )
+
+    def extra_headers(self) -> dict[str, str]:
+        return {
+            "HTTP-Referer": self.site_url,
+            "X-Title": self.app_name,
+        }
+
+
+@dataclass(frozen=True)
+class TrainingRuntimeConfig:
+    kernel_name: str = "scalerhack2-training"
+    kernel_display_name: str = "Python (scalerhack2-training)"
+    checkpoint_dir: str = "artifacts/checkpoints"
+    trace_dir: str = "artifacts/traces"
+    env_file: str = ".env.training"
+    default_checkpoint_name: str = "q_policy_notebook.json"
+
+
+@dataclass(frozen=True)
+class AppRuntimeConfig:
+    host: str = "0.0.0.0"
+    port: int = 7860
+    env_file: str = ".env.app"
+    checkpoint_dir: str = "artifacts/checkpoints"
+    default_checkpoint_name: str = "q_policy_notebook.json"
src/executive_assistant/deployment.py ADDED
@@ -0,0 +1,201 @@
+from __future__ import annotations
+
+import shutil
+from dataclasses import dataclass
+from pathlib import Path
+
+from src.executive_assistant.agent import BaselineAgent
+from src.executive_assistant.training import default_checkpoint_path, train_q_learning
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+DEFAULT_SPACE_TITLE = "Project Epsilon | Executive Assistant Sandbox"
+DEFAULT_HF_USERNAMES = [
+    "HF_USERNAME_1",
+    "HF_USERNAME_2",
+    "HF_USERNAME_3",
+]
+DEFAULT_CHECKPOINT_NAME = "q_policy_notebook.json"
+DEFAULT_STAGE_IGNORE_NAMES = {
+    ".git",
+    ".codex",
+    ".pytest_cache",
+    ".venv-app",
+    ".venv-training",
+    ".vscode",
+    "__pycache__",
+}
+DEFAULT_STAGE_IGNORE_SUFFIXES = {
+    ".pyc",
+}
+DEFAULT_STAGE_IGNORE_FILES = {
+    ".env",
+    ".env.app",
+    ".env.hf.space",
+    ".env.training",
+    "training_env.executed.ipynb",
+}
+
+
+@dataclass(frozen=True)
+class HFSpaceDeployConfig:
+    repo_id: str
+    title: str = DEFAULT_SPACE_TITLE
+    team_name: str = "Project Epsilon"
+    hf_usernames: tuple[str, ...] = tuple(DEFAULT_HF_USERNAMES)
+    checkpoint_name: str = DEFAULT_CHECKPOINT_NAME
+    app_port: int = 7860
+    private: bool = False
+    include_checkpoint: bool = True
+
+    @property
+    def repo_slug(self) -> str:
+        return self.repo_id.split("/", 1)[1]
+
+    @property
+    def owner(self) -> str:
+        return self.repo_id.split("/", 1)[0]
+
+    @property
+    def space_url(self) -> str:
+        return f"https://huggingface.co/spaces/{self.repo_id}"
+
+    @property
+    def app_url(self) -> str:
+        return f"https://{self.owner}-{self.repo_slug}.hf.space"
+
+    @property
+    def checkpoint_source_path(self) -> Path:
+        return REPO_ROOT / "artifacts" / "checkpoints" / self.checkpoint_name
+
+
+def parse_hf_usernames(raw_value: str | None) -> tuple[str, ...]:
+    if raw_value is None or not raw_value.strip():
+        return tuple(DEFAULT_HF_USERNAMES)
+    usernames = [item.strip().lstrip("@") for item in raw_value.split(",") if item.strip()]
+    return tuple(usernames) or tuple(DEFAULT_HF_USERNAMES)
+
+
+def render_space_readme(config: HFSpaceDeployConfig) -> str:
+    usernames = ", ".join(f"`@{username}`" for username in config.hf_usernames)
+    checkpoint_note = (
+        "A trained RL checkpoint is bundled in `artifacts/checkpoints/` so the `rl` policy "
+        "is available immediately in the demo."
+        if config.include_checkpoint
+        else "The Space can still run the deterministic baseline immediately; add an RL checkpoint "
+        "later if you want the `rl` option available in the UI."
+    )
+    return f"""---
+title: {config.title}
+emoji: "🧭"
+colorFrom: yellow
+colorTo: gray
+sdk: docker
+app_port: {config.app_port}
+pinned: false
+short_description: OpenEnv executive assistant sandbox demo for judges.
+---
+
+# {config.team_name}
+
+Discrete Hugging Face Space for the **Autonomous Executive Assistant Sandbox**, built for the **OpenEnv Scaler x Meta x PyTorch Hack**.
+
+## Team
+
+- Team name: `{config.team_name}`
+- Hugging Face usernames: {usernames}
+- Space repo: `{config.repo_id}`
+
+Replace the placeholder usernames above once the final team accounts are ready.
+
+## What This Space Shows
+
+- A deterministic OpenEnv-style executive assistant environment backed by an isolated SQLite workspace
+- A judge-friendly Gradio interface that replays the shared `EpisodeRunner` loop step by step
+- Side-by-side policy execution for `baseline`, `rl`, and optional `openrouter`
+- Visible inbox, todo, file-search, and action-log state so evaluators can inspect each mutation
+
+## Hack Context
+
+OpenEnv was announced by Hugging Face and Meta as an open source framework for building agent environments with typed observations, actions, and rewards. The Scaler dashboard for this hack lists the submission round as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space packages our environment to match that workflow: deterministic tasks, structured actions, visible state transitions, and reproducible judge demos.
+
+## Runtime Notes
+
+- SDK: `docker`
+- App port: `{config.app_port}`
+- Entry point: `python app.py`
+- Optional secret: `OPENROUTER_API_KEY`
+- {checkpoint_note}
+
+## Judge Flow
+
+1. Open the Space and choose one of the seeded scenarios.
+2. Run the deterministic `baseline` policy for a guaranteed reference trace.
+3. Switch to `rl` to replay the bundled learned checkpoint.
+4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed path.
+
+## References
+
+- Hack dashboard: https://www.scaler.com/openenv-hackathon
+- OpenEnv launch: https://huggingface.co/blog/openenv
+- Space URL: {config.space_url}
+"""
+
+
+def copy_repo_for_space(stage_dir: Path) -> None:
+    stage_dir.mkdir(parents=True, exist_ok=True)
+    for source in REPO_ROOT.iterdir():
+        if source.name in DEFAULT_STAGE_IGNORE_NAMES:
+            continue
+        if source.name in DEFAULT_STAGE_IGNORE_FILES:
+            continue
+        if source.suffix in DEFAULT_STAGE_IGNORE_SUFFIXES:
+            continue
+        destination = stage_dir / source.name
+        if source.is_dir():
+            shutil.copytree(
+                source,
+                destination,
+                ignore=shutil.ignore_patterns(
+                    "__pycache__",
+                    "*.pyc",
+                    ".env",
+                    ".env.app",
+                    ".env.hf.space",
+                    ".env.training",
+                    "training_env.executed.ipynb",
+                ),
+            )
+        else:
+            shutil.copy2(source, destination)
+
+
+def ensure_checkpoint(config: HFSpaceDeployConfig, stage_dir: Path) -> Path | None:
+    if not config.include_checkpoint:
+        return None
+
+    destination = stage_dir / "artifacts" / "checkpoints" / config.checkpoint_name
+    destination.parent.mkdir(parents=True, exist_ok=True)
+
+    source = config.checkpoint_source_path
+    if source.exists():
+        shutil.copy2(source, destination)
+        return destination
+
+    policy, _ = train_q_learning(episodes=120, epsilon=0.12, teacher=BaselineAgent())
+    return policy.save(destination)
+
+
+def stage_space_bundle(config: HFSpaceDeployConfig, stage_dir: Path) -> Path | None:
+    copy_repo_for_space(stage_dir)
+    checkpoint_path = ensure_checkpoint(config, stage_dir)
+    readme_path = stage_dir / "README.md"
+    readme_path.write_text(render_space_readme(config))
+    example_env_path = stage_dir / ".env.hf.space.example"
+    if example_env_path.exists():
+        example_env_path.unlink()
+    return checkpoint_path
+
+
+def default_checkpoint_runtime_path(checkpoint_name: str = DEFAULT_CHECKPOINT_NAME) -> Path:
+    return default_checkpoint_path("artifacts/checkpoints", checkpoint_name)
src/executive_assistant/env.py ADDED
@@ -0,0 +1,123 @@
+from __future__ import annotations
+
+from src.executive_assistant.graders import grade_easy, grade_hard, grade_medium
+from src.executive_assistant.models import (
+    AssistantAction,
+    EmailDetail,
+    EmailSummary,
+    FileSearchResult,
+    TaskReward,
+    WorkspaceObservation,
+)
+from src.executive_assistant.seeds import TASK_SEEDS
+from src.executive_assistant.workspace import MockWorkspace
+
+
+class ExecutiveAssistantEnv:
+    def __init__(self, task_name: str = "easy_deadline_extraction") -> None:
+        self.task_name = task_name
+        self.workspace = MockWorkspace()
+        self.last_action_status = "environment initialized"
+        self.current_email: EmailDetail | None = None
+        self.search_results: list[FileSearchResult] = []
+        self.step_count = 0
+        self.max_steps = 12
+
+    def reset(self) -> WorkspaceObservation:
+        self.workspace = MockWorkspace()
+        seed = TASK_SEEDS[self.task_name]
+        self.workspace.seed(seed.get("emails", []), seed.get("files", []))
+        self.last_action_status = f"scenario reset: {self.task_name}"
+        self.current_email = None
+        self.search_results = []
+        self.step_count = 0
+        return self.observe()
+
+    def observe(self) -> WorkspaceObservation:
+        unread = [
+            EmailSummary(
+                id=row["id"],
+                sender=row["sender"],
+                subject=row["subject"],
+                snippet=row["snippet"],
+            )
+            for row in self.workspace.get_unread_emails()
+        ]
+        todos = [row["task_name"] for row in self.workspace.list_todos()]
+        recent_actions = [
+            f"{row['action_type']}: {row['status']}"
+            for row in reversed(self.workspace.list_recent_actions(limit=6))
+        ]
+        return WorkspaceObservation(
+            current_time="2026-04-04T10:00:00Z",
+            unread_emails=unread,
+            active_todos=todos,
+            last_action_status=self.last_action_status,
+            current_email=self.current_email,
+            search_results=self.search_results,
+            action_history=recent_actions,
+        )
+
+    def step(self, action: AssistantAction) -> tuple[WorkspaceObservation, TaskReward]:
+        self.step_count += 1
+        if action.action_type == "read_email" and action.target_id is not None:
+            row = self.workspace.read_email(action.target_id)
+            self.current_email = EmailDetail(**dict(row)) if row else None
+            self.last_action_status = "email read" if row else "email not found"
+        elif action.action_type == "reply" and action.target_id is not None and action.payload:
+            self.last_action_status = self.workspace.send_reply(action.target_id, action.payload)
+        elif (
+            action.action_type == "forward"
+            and action.target_id is not None
+            and action.secondary_payload
+        ):
+            self.last_action_status = self.workspace.forward_email(
+                action.target_id,
+                action.secondary_payload,
+                action.payload,
+            )
+        elif action.action_type == "add_todo" and action.payload:
+            self.last_action_status = self.workspace.create_todo(
+                task_name=action.payload,
+                deadline_date=action.secondary_payload,
+                context=(
+                    f"Created from email {self.current_email.id}: {self.current_email.subject}"
+                    if self.current_email
+                    else f"Created from task {self.task_name}"
+                ),
+            )
+        elif action.action_type == "archive" and action.target_id is not None:
+            self.last_action_status = self.workspace.archive_email(action.target_id)
+        elif action.action_type == "search_files" and action.payload:
+            results = self.workspace.search_documents(action.payload)
+            self.search_results = [
+                FileSearchResult(
+                    id=row["id"],
+                    filename=row["filename"],
+                    snippet=row["content_text"][:160],
+                )
+                for row in results
+            ]
+            self.last_action_status = f"search returned {len(results)} file(s)"
+        else:
+            self.last_action_status = "invalid action payload"
+
+        observation = self.observe()
+        reward = self.grade()
+        if self.step_count >= self.max_steps and not reward.is_done:
+            reward = TaskReward(
+                step_reward=reward.step_reward,
+                total_score=reward.total_score,
+                is_done=True,
+                reasoning=f"{reward.reasoning}; terminated at step budget",
+            )
+        return observation, reward
+
+    def grade(self) -> TaskReward:
+        if self.task_name == "easy_deadline_extraction":
+            return grade_easy(self.workspace)
+        if self.task_name == "medium_triage_and_negotiation":
120
+ return grade_medium(self.workspace)
121
+ if self.task_name == "hard_rag_reply":
122
+ return grade_hard(self.workspace)
123
+ return TaskReward(reasoning="No grader configured")
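The forced-termination branch at the end of `step()` is the only thing that ends an episode when a grader never reports completion. A stdlib-only sketch of that contract (the `StubEnv` and `Reward` names here are illustrative stand-ins, not repo code):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reward:
    total_score: float = 0.0
    is_done: bool = False
    reasoning: str = ""


class StubEnv:
    """Stand-in env whose grader never finishes the task, so the
    step budget is the only possible terminator."""

    def __init__(self, max_steps: int = 12) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def grade(self) -> Reward:
        return Reward(reasoning="task incomplete")

    def step(self) -> Reward:
        self.step_count += 1
        reward = self.grade()
        if self.step_count >= self.max_steps and not reward.is_done:
            # Same forced-termination shape as ExecutiveAssistantEnv.step
            reward = replace(
                reward,
                is_done=True,
                reasoning=f"{reward.reasoning}; terminated at step budget",
            )
        return reward


env = StubEnv()
reward = env.step()
while not reward.is_done:
    reward = env.step()
```

The episode here always ends after exactly `max_steps` calls, with the budget note appended to the grader's last reasoning string.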
src/executive_assistant/graders.py ADDED
@@ -0,0 +1,172 @@
+ from __future__ import annotations
+
+ import re
+
+ from src.executive_assistant.models import TaskReward
+ from src.executive_assistant.workspace import MockWorkspace
+
+
+ def _clamp_score(value: float) -> float:
+     return max(0.0, min(1.0, round(value, 4)))
+
+
+ def grade_easy(workspace: MockWorkspace) -> TaskReward:
+     expected = {
+         ("proposal due", "2026-04-10"),
+         ("prototype due", "2026-04-20"),
+         ("final report due", "2026-04-30"),
+     }
+     todos = workspace.connection.execute(
+         "SELECT task_name, deadline_date FROM Todos"
+     ).fetchall()
+     normalized = {
+         (row["task_name"].strip().lower(), (row["deadline_date"] or "").strip()) for row in todos
+     }
+     matched = len(expected & normalized)
+     incorrect = len(normalized - expected)
+     read_source = workspace.connection.execute(
+         "SELECT COUNT(*) FROM ActionLog WHERE action_type = 'read_email' AND target_id = 1"
+     ).fetchone()[0]
+     archived = workspace.connection.execute(
+         "SELECT COUNT(*) FROM Emails WHERE id = 1 AND is_archived = 1"
+     ).fetchone()[0]
+
+     score = 0.15 if read_source else 0.0
+     score += matched * 0.25
+     score += 0.10 if archived else 0.0
+     score -= incorrect * 0.10
+     total_score = _clamp_score(score)
+     done = matched == 3 and archived == 1 and incorrect == 0
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=done,
+         reasoning=(
+             "Extracted all three deadlines and archived the source email"
+             if done
+             else f"Matched {matched}/3 deadlines, archived={bool(archived)}, incorrect_todos={incorrect}"
+         ),
+     )
+
+
+ def grade_medium(workspace: MockWorkspace) -> TaskReward:
+     newsletters_archived = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM Emails
+         WHERE sender IN ('news@updates.example', 'promotions@vendor.example', 'events@community.example')
+         AND is_archived = 1
+         """
+     ).fetchone()[0]
+     forwarded = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'forward' AND secondary_payload = 'manager@company.com'
+         """
+     ).fetchone()[0]
+     correct_forward = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'forward'
+         AND secondary_payload = 'manager@company.com'
+         AND target_id = (
+             SELECT id FROM Emails WHERE sender = 'client@company.com' LIMIT 1
+         )
+         """
+     ).fetchone()[0]
+     reply = workspace.connection.execute(
+         """
+         SELECT payload, target_id FROM ActionLog
+         WHERE action_type = 'reply'
+         ORDER BY id DESC LIMIT 1
+         """
+     ).fetchone()
+     important_archived = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM Emails
+         WHERE sender IN ('client@company.com', 'teammate@company.com')
+         AND is_archived = 1
+         """
+     ).fetchone()[0]
+
+     score = 0.0
+     score += min(newsletters_archived, 3) * 0.1
+     if correct_forward >= 1:
+         score += 0.4
+     elif forwarded >= 1:
+         score += 0.1
+
+     teammate_id = workspace.connection.execute(
+         "SELECT id FROM Emails WHERE sender = 'teammate@company.com' LIMIT 1"
+     ).fetchone()[0]
+     if (
+         reply
+         and reply["target_id"] == teammate_id
+         and re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", reply["payload"] or "")
+     ):
+         score += 0.3
+     elif reply and re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", reply["payload"] or ""):
+         score += 0.1
+
+     score -= important_archived * 0.15
+     total_score = _clamp_score(score)
+
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=newsletters_archived == 3 and correct_forward >= 1 and total_score >= 1.0,
+         reasoning=(
+             "Archived newsletters, escalated client complaint, and proposed a meeting time"
+             if newsletters_archived == 3 and correct_forward >= 1 and total_score >= 1.0
+             else (
+                 f"newsletters_archived={newsletters_archived}/3, "
+                 f"correct_forward={correct_forward}, important_archived={important_archived}"
+             )
+         ),
+     )
+
+
+ def grade_hard(workspace: MockWorkspace) -> TaskReward:
+     search_called = workspace.connection.execute(
+         "SELECT COUNT(*) FROM ActionLog WHERE action_type = 'search_files'"
+     ).fetchone()[0]
+     targeted_search = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'search_files'
+         AND LOWER(COALESCE(payload, '')) LIKE '%q3%'
+         AND LOWER(COALESCE(payload, '')) LIKE '%architecture%'
+         """
+     ).fetchone()[0]
+     reply = workspace.connection.execute(
+         """
+         SELECT payload, target_id FROM ActionLog
+         WHERE action_type = 'reply'
+         ORDER BY id DESC LIMIT 1
+         """
+     ).fetchone()
+     vip_id = workspace.connection.execute(
+         "SELECT id FROM Emails WHERE sender = 'vip.stakeholder@company.com' LIMIT 1"
+     ).fetchone()[0]
+
+     score = 0.1 if search_called >= 1 else 0.0
+     score += 0.2 if targeted_search >= 1 else 0.0
+     if reply and reply["target_id"] == vip_id:
+         payload = reply["payload"] or ""
+         metrics_found = sum(
+             metric in payload for metric in ("99.95%", "182ms", "14%")
+         )
+         score += metrics_found * 0.2
+         if payload.lower().startswith("hello") or "regards" in payload.lower():
+             score += 0.1
+     total_score = _clamp_score(score)
+
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=total_score >= 1.0,
+         reasoning=(
+             "Searched the report and replied with the required metrics"
+             if total_score >= 1.0
+             else f"search_called={search_called}, targeted_search={targeted_search}, score={total_score}"
+         ),
+     )
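The set arithmetic at the heart of `grade_easy` can be checked in isolation. A sketch with an invented workspace state (the sample `todos` list below is illustrative, not from the seeds): names are stripped and lowercased before matching, each correct todo earns 0.25, and each stray todo costs 0.10.

```python
def clamp_score(value: float) -> float:
    # Same clamp as graders._clamp_score
    return max(0.0, min(1.0, round(value, 4)))


EXPECTED = {
    ("proposal due", "2026-04-10"),
    ("prototype due", "2026-04-20"),
    ("final report due", "2026-04-30"),
}

# Hypothetical workspace state: two correct todos (one with messy casing
# and whitespace) plus one todo the grader should penalize.
todos = [
    ("Proposal Due ", "2026-04-10"),
    ("prototype due", "2026-04-20"),
    ("standup notes", "2026-04-05"),
]
normalized = {(name.strip().lower(), date.strip()) for name, date in todos}

matched = len(EXPECTED & normalized)    # correct todos
incorrect = len(normalized - EXPECTED)  # stray todos
score = clamp_score(0.15 + matched * 0.25 - incorrect * 0.10)  # read bonus applied
```

With two matches and one stray todo this lands at 0.15 + 0.50 − 0.10 = 0.55, well short of the done condition, which also demands the archive step.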
src/executive_assistant/llm_service.py ADDED
@@ -0,0 +1,76 @@
+ from __future__ import annotations
+
+ import json
+ from typing import Any
+
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.models import PolicyDecision, WorkspaceObservation
+ from src.executive_assistant.prompts import build_repair_prompt, build_system_prompt, build_user_prompt
+
+
+ class LLMServiceError(RuntimeError):
+     """Raised when the configured LLM provider cannot produce a valid policy decision."""
+
+
+ class OpenRouterLLMService:
+     def __init__(self, config: OpenRouterConfig, client: Any | None = None) -> None:
+         self.config = config
+         if client is not None:
+             self.client = client
+             return
+         try:
+             from openai import OpenAI
+         except ImportError as exc:
+             raise LLMServiceError(
+                 "openai package is required for OpenRouter access. Install requirements first."
+             ) from exc
+         self.client = OpenAI(
+             api_key=config.api_key,
+             base_url=config.base_url,
+         )
+
+     def generate_policy_decision(
+         self,
+         task_name: str,
+         observation: WorkspaceObservation,
+     ) -> PolicyDecision:
+         raw_message = self._request_json(
+             system_prompt=build_system_prompt(task_name),
+             user_prompt=build_user_prompt(task_name, observation),
+         )
+         try:
+             payload = json.loads(raw_message)
+             return PolicyDecision.model_validate(payload)
+         except Exception:
+             repaired_message = self._request_json(
+                 system_prompt="You are a strict JSON repair assistant.",
+                 user_prompt=build_repair_prompt(raw_message),
+             )
+             try:
+                 repaired_payload = json.loads(repaired_message)
+                 return PolicyDecision.model_validate(repaired_payload)
+             except Exception as exc:
+                 raise LLMServiceError(
+                     f"Provider response did not match policy schema after repair: {repaired_message}"
+                 ) from exc
+
+     def _request_json(self, system_prompt: str, user_prompt: str) -> str:
+         try:
+             completion = self.client.chat.completions.create(
+                 model=self.config.model_name,
+                 messages=[
+                     {"role": "system", "content": system_prompt},
+                     {"role": "user", "content": user_prompt},
+                 ],
+                 response_format={"type": "json_object"},
+                 temperature=self.config.temperature,
+                 max_tokens=self.config.max_tokens,
+                 extra_headers=self.config.extra_headers(),
+             )
+         except Exception as exc:  # pragma: no cover - network/provider dependent
+             raise LLMServiceError(f"OpenRouter request failed: {exc}") from exc
+
+         message = completion.choices[0].message.content or ""
+         if not message.strip():
+             raise LLMServiceError("OpenRouter returned an empty response.")
+         return message
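The parse → validate → repair flow in `generate_policy_decision` can be sketched with the stdlib alone. In this sketch the repair step is a local fence-stripper rather than the service's second model call; `strip_fences` and the sample payload are illustrative inventions:

```python
import json
from typing import Any, Callable


def parse_or_repair(raw: str, repair: Callable[[str], str]) -> dict[str, Any]:
    """Attempt strict parsing first; fall back to exactly one repair pass."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(repair(raw))


def strip_fences(raw: str) -> str:
    # Stand-in repair: remove the markdown fences a model sometimes adds
    # despite being told not to.
    return raw.strip().removeprefix("```json").removesuffix("```").strip()


fenced = '```json\n{"reasoning": "read first", "action": {"action_type": "read_email"}}\n```'
decision = parse_or_repair(fenced, strip_fences)
```

The real service keeps the same shape but sends the invalid output back through `build_repair_prompt`, and only raises `LLMServiceError` if the second attempt also fails validation.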
src/executive_assistant/models.py ADDED
@@ -0,0 +1,63 @@
+ from __future__ import annotations
+
+ from typing import Literal
+
+ from pydantic import BaseModel, Field
+
+
+ class EmailSummary(BaseModel):
+     id: int
+     sender: str
+     subject: str
+     snippet: str
+
+
+ class EmailDetail(BaseModel):
+     id: int
+     sender: str
+     recipient: str
+     subject: str
+     body: str
+     timestamp: str
+
+
+ class FileSearchResult(BaseModel):
+     id: int
+     filename: str
+     snippet: str
+
+
+ class WorkspaceObservation(BaseModel):
+     current_time: str
+     unread_emails: list[EmailSummary]
+     active_todos: list[str]
+     last_action_status: str
+     current_email: EmailDetail | None = None
+     search_results: list[FileSearchResult] = Field(default_factory=list)
+     action_history: list[str] = Field(default_factory=list)
+
+
+ class AssistantAction(BaseModel):
+     action_type: Literal[
+         "read_email",
+         "reply",
+         "forward",
+         "add_todo",
+         "archive",
+         "search_files",
+     ]
+     target_id: int | None = None
+     payload: str | None = None
+     secondary_payload: str | None = None
+
+
+ class TaskReward(BaseModel):
+     step_reward: float = Field(default=0.0)
+     total_score: float = Field(default=0.0)
+     is_done: bool = Field(default=False)
+     reasoning: str = Field(default="")
+
+
+ class PolicyDecision(BaseModel):
+     reasoning: str = Field(default="")
+     action: AssistantAction
src/executive_assistant/prompts.py ADDED
@@ -0,0 +1,83 @@
+ from __future__ import annotations
+
+ import json
+
+ from src.executive_assistant.models import WorkspaceObservation
+
+ def build_system_prompt(task_name: str) -> str:
+     return f"""
+ You are the policy layer for a deterministic executive-assistant environment.
+
+ Mission:
+ - Choose exactly one valid structured action at a time.
+ - Move the environment toward completion as quickly and safely as possible.
+ - Never invent state that is not present in the observation.
+
+ Response contract:
+ - Return strict JSON only with keys: reasoning, action.
+ - The action object must contain exactly: action_type, target_id, payload, secondary_payload.
+ - Keep reasoning short, concrete, and operational.
+ - Do not wrap JSON in markdown fences.
+
+ Core rules:
+ - Use only IDs visible in the observation.
+ - Prefer reading before extracting, searching before drafting, and concrete actions over passive behavior.
+ - Never hallucinate files, metrics, recipients, dates, or email contents.
+ - If information is missing, choose the next action that will reveal it.
+ - When replying, write professional but concise email text.
+ - Do not repeat already-completed work when the action history shows it succeeded.
+
+ Task guidance:
+ - easy_deadline_extraction:
+   - Read the professor email first.
+   - Create exactly three todos with the exact task names and exact ISO dates from the email.
+   - Archive the source email only after all three todos exist.
+ - medium_triage_and_negotiation:
+   - Archive newsletters.
+   - Forward the urgent client complaint to manager@company.com.
+   - Reply to the reschedule request with a concrete time string.
+   - Do not archive important unresolved emails before acting on them.
+ - hard_rag_reply:
+   - Read the stakeholder email first.
+   - Search files for the Q3 architecture report before replying.
+   - Reply with the exact metrics found in the file search results.
+   - The reply should start with a short greeting such as "Hello," and end with a signoff such as "Regards,".
+
+ Allowed action types:
+ - read_email
+ - reply
+ - forward
+ - add_todo
+ - archive
+ - search_files
+
+ Current scenario: {task_name}
+ """.strip()
+
+
+ def build_user_prompt(task_name: str, observation: WorkspaceObservation) -> str:
+     return (
+         "Observation JSON follows. Choose the single best next action for the active scenario.\n\n"
+         f"SCENARIO: {task_name}\n"
+         "OBSERVATION:\n"
+         f"{json.dumps(observation.model_dump(), indent=2)}\n\n"
+         "Return only one JSON object matching:\n"
+         "{\n"
+         '  "reasoning": "short operational justification",\n'
+         '  "action": {\n'
+         '    "action_type": "read_email|reply|forward|add_todo|archive|search_files",\n'
+         '    "target_id": 1,\n'
+         '    "payload": null,\n'
+         '    "secondary_payload": null\n'
+         "  }\n"
+         "}\n"
+     )
+
+
+ def build_repair_prompt(raw_response: str) -> str:
+     return (
+         "The previous model output did not match the required JSON schema.\n"
+         "Repair it into one valid JSON object with keys reasoning and action only.\n"
+         "Do not add markdown fences or commentary.\n\n"
+         f"INVALID OUTPUT:\n{raw_response}"
+     )
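The response contract that `build_user_prompt` spells out can be sanity-checked with plain `json` (the sample decision below is an illustrative payload, not model output):

```python
import json

sample = """
{
  "reasoning": "Read the source email before extracting deadlines.",
  "action": {
    "action_type": "read_email",
    "target_id": 1,
    "payload": null,
    "secondary_payload": null
  }
}
"""

decision = json.loads(sample)
# Exactly the keys the system prompt demands, nothing more.
assert set(decision) == {"reasoning", "action"}
assert set(decision["action"]) == {"action_type", "target_id", "payload", "secondary_payload"}
```

JSON `null` maps to Python `None`, matching the optional `target_id`/`payload`/`secondary_payload` fields on `AssistantAction`.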
src/executive_assistant/runner.py ADDED
@@ -0,0 +1,128 @@
+ from __future__ import annotations
+
+ import json
+ from dataclasses import asdict, dataclass
+ from pathlib import Path
+ from typing import Protocol
+
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision, TaskReward, WorkspaceObservation
+
+
+ class AssistantPolicy(Protocol):
+     def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+         ...
+
+
+ @dataclass(frozen=True)
+ class EpisodeStepRecord:
+     step_index: int
+     reasoning: str
+     action: dict[str, object]
+     observation: dict[str, object]
+     snapshot: dict[str, object]
+     reward: dict[str, object]
+     status: str
+
+
+ @dataclass(frozen=True)
+ class EpisodeTrace:
+     task_name: str
+     policy_name: str
+     steps: list[EpisodeStepRecord]
+     final_score: float
+     completed: bool
+     termination_reason: str
+
+     def to_dict(self) -> dict[str, object]:
+         return {
+             "task_name": self.task_name,
+             "policy_name": self.policy_name,
+             "steps": [asdict(step) for step in self.steps],
+             "final_score": self.final_score,
+             "completed": self.completed,
+             "termination_reason": self.termination_reason,
+         }
+
+
+ class EpisodeRunner:
+     def __init__(self, policy: AssistantPolicy, max_steps: int = 12) -> None:
+         self.policy = policy
+         self.max_steps = max_steps
+
+     def initialize(self, task_name: str) -> tuple[ExecutiveAssistantEnv, WorkspaceObservation]:
+         """Load environment state and generate the initial observation."""
+         env = ExecutiveAssistantEnv(task_name=task_name)
+         env.max_steps = self.max_steps
+         observation = env.reset()
+         return env, observation
+
+     def advance(
+         self,
+         task_name: str,
+         env: ExecutiveAssistantEnv,
+         observation: WorkspaceObservation,
+     ) -> tuple[PolicyDecision, WorkspaceObservation, TaskReward, EpisodeStepRecord]:
+         """
+         Execute one full agent workflow step:
+         1. Send observation to policy
+         2. Receive structured action
+         3. Execute action in workspace
+         4. Update state and capture the resulting trace record
+         """
+         decision = self.policy.choose_action(task_name, observation)
+         next_observation, reward = env.step(decision.action)
+         record = EpisodeStepRecord(
+             step_index=env.step_count,
+             reasoning=decision.reasoning,
+             action=decision.action.model_dump(),
+             observation=next_observation.model_dump(),
+             snapshot=env.workspace.snapshot(),
+             reward=reward.model_dump(),
+             status=next_observation.last_action_status,
+         )
+         return decision, next_observation, reward, record
+
+     def run(self, task_name: str) -> EpisodeTrace:
+         """
+         Agent workflow loop:
+         1. Load environment state
+         2. Generate observation
+         3. Send to policy/LLM
+         4. Receive structured action
+         5. Execute action in workspace
+         6. Update state
+         7. Repeat until task complete
+         """
+         env, observation = self.initialize(task_name)
+         steps: list[EpisodeStepRecord] = []
+
+         while True:
+             _, observation, reward, record = self.advance(task_name, env, observation)
+             steps.append(record)
+             if reward.is_done:
+                 return EpisodeTrace(
+                     task_name=task_name,
+                     policy_name=type(self.policy).__name__,
+                     steps=steps,
+                     final_score=reward.total_score,
+                     completed=reward.total_score >= 1.0,
+                     termination_reason=reward.reasoning,
+                 )
+
+
+ def run_policy_suite(
+     policy: AssistantPolicy,
+     task_names: list[str],
+     max_steps: int = 12,
+ ) -> dict[str, EpisodeTrace]:
+     runner = EpisodeRunner(policy=policy, max_steps=max_steps)
+     return {task_name: runner.run(task_name) for task_name in task_names}
+
+
+ def export_traces_jsonl(traces: list[EpisodeTrace], output_path: str | Path) -> Path:
+     path = Path(output_path)
+     path.parent.mkdir(parents=True, exist_ok=True)
+     lines = [json.dumps(trace.to_dict()) for trace in traces]
+     path.write_text("\n".join(lines) + ("\n" if lines else ""))
+     return path
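The observe → decide → execute loop in `EpisodeRunner.run` reduces to a few lines once the policy and environment are stubbed out. `StubPolicy` and `StubEnv` below are invented for illustration; any objects with the same two methods fit the loop:

```python
class StubPolicy:
    """Deterministic stand-in policy: derives the action from the step index."""

    def choose_action(self, step_index: int) -> str:
        return f"action-{step_index}"


class StubEnv:
    """Stand-in environment that reports the episode done after three actions."""

    def __init__(self) -> None:
        self.step_count = 0

    def step(self, action: str) -> bool:
        self.step_count += 1
        return self.step_count >= 3


policy, env = StubPolicy(), StubEnv()
trace: list[str] = []
done = False
while not done:
    action = policy.choose_action(env.step_count)  # observe state, ask policy
    done = env.step(action)                        # execute, advance state
    trace.append(action)                           # record the step
```

The real runner records far more per step (reasoning, observation, workspace snapshot, reward), but the control flow is exactly this loop with the env's `is_done` flag as the exit condition.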
src/executive_assistant/seeds.py ADDED
@@ -0,0 +1,82 @@
+ from __future__ import annotations
+
+
+ TASK_SEEDS = {
+     "easy_deadline_extraction": {
+         "emails": [
+             {
+                 "sender": "prof.smith@university.edu",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Course project milestones",
+                 "body": (
+                     "Please track these deadlines: proposal due 2026-04-10, "
+                     "prototype due 2026-04-20, and final report due 2026-04-30."
+                 ),
+                 "timestamp": "2026-04-04T09:00:00Z",
+             }
+         ],
+         "files": [],
+     },
+     "medium_triage_and_negotiation": {
+         "emails": [
+             {
+                 "sender": "news@updates.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Weekly industry digest",
+                 "body": "Newsletter content 1",
+                 "timestamp": "2026-04-04T08:00:00Z",
+             },
+             {
+                 "sender": "promotions@vendor.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Exclusive offer",
+                 "body": "Newsletter content 2",
+                 "timestamp": "2026-04-04T08:05:00Z",
+             },
+             {
+                 "sender": "events@community.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Upcoming events",
+                 "body": "Newsletter content 3",
+                 "timestamp": "2026-04-04T08:10:00Z",
+             },
+             {
+                 "sender": "client@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Urgent: delivery issue",
+                 "body": "A critical complaint needs escalation.",
+                 "timestamp": "2026-04-04T08:20:00Z",
+             },
+             {
+                 "sender": "teammate@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Need to reschedule",
+                 "body": "Can we move our sync? Please propose a new time.",
+                 "timestamp": "2026-04-04T08:30:00Z",
+             },
+         ],
+         "files": [],
+     },
+     "hard_rag_reply": {
+         "emails": [
+             {
+                 "sender": "vip.stakeholder@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Need Q3 architecture metrics",
+                 "body": "Please share the key Q3 architecture metrics from the report.",
+                 "timestamp": "2026-04-04T07:30:00Z",
+             }
+         ],
+         "files": [
+             {
+                 "filename": "Q3_Architecture_Report.txt",
+                 "content_text": (
+                     "Q3 Architecture Report\n"
+                     "System availability: 99.95%\n"
+                     "Mean API latency: 182ms\n"
+                     "Infrastructure cost reduction: 14%\n"
+                 ),
+             }
+         ],
+     },
+ }
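The easy-task email body above is deliberately regular, so every deadline can be pulled with a single regex. A sketch over the seed text (the pattern is illustrative; it is not what the repo's agents actually use):

```python
import re

# Body text from the easy_deadline_extraction seed above.
body = (
    "Please track these deadlines: proposal due 2026-04-10, "
    "prototype due 2026-04-20, and final report due 2026-04-30."
)

# Capture each milestone name alongside its ISO date.
deadlines = re.findall(r"(proposal|prototype|final report) due (\d{4}-\d{2}-\d{2})", body)
```

Note that `grade_easy` expects the todo *task names* to include the word "due" (e.g. "proposal due"), so a policy using this extraction would still need to append it before calling `add_todo`.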
src/executive_assistant/training.py ADDED
@@ -0,0 +1,341 @@
+ from __future__ import annotations
+
+ import json
+ import random
+ from collections import defaultdict
+ from dataclasses import dataclass
+ from pathlib import Path
+
+ from src.executive_assistant.agent import ActionCatalog, BaselineAgent
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision, WorkspaceObservation
+ from src.executive_assistant.runner import EpisodeRunner, EpisodeTrace
+
+
+ ACTION_NAMES = [
+     "read_first_unread",
+     "archive_first_unread",
+     "forward_client_to_manager",
+     "reply_meeting_time",
+     "add_deadline_todo",
+     "archive_current_email",
+     "search_q3_architecture",
+     "reply_with_metrics",
+ ]
+
+
+ def _current_email_sender(observation: WorkspaceObservation) -> str:
+     return observation.current_email.sender if observation.current_email else "none"
+
+
+ def encode_observation(task_name: str, observation: WorkspaceObservation) -> str:
+     unread_senders = ",".join(sorted(email.sender for email in observation.unread_emails)) or "none"
+     return "|".join(
+         [
+             task_name,
+             f"unread={len(observation.unread_emails)}",
+             f"senders={unread_senders}",
+             f"todos={len(observation.active_todos)}",
+             f"current={_current_email_sender(observation)}",
+             f"search={int(bool(observation.search_results))}",
+             f"history={'/'.join(observation.action_history[-3:]) or 'none'}",
+         ]
+     )
+
+
+ def valid_action_names(task_name: str, observation: WorkspaceObservation) -> list[str]:
+     valid: list[str] = []
+
+     if task_name == "easy_deadline_extraction":
+         if observation.current_email is None and observation.unread_emails:
+             valid.append("read_first_unread")
+         if observation.current_email is not None:
+             body = observation.current_email.body.lower()
+             existing = {todo.lower() for todo in observation.active_todos}
+             missing_todo = False
+             if "proposal due" in body and "proposal due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             elif "prototype due" in body and "prototype due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             elif "final report due" in body and "final report due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             if not missing_todo:
+                 valid.append("archive_current_email")
+     elif task_name == "medium_triage_and_negotiation":
+         newsletter_senders = {
+             "news@updates.example",
+             "promotions@vendor.example",
+             "events@community.example",
+         }
+         if any(email.sender in newsletter_senders for email in observation.unread_emails):
+             valid.append("archive_first_unread")
+         if any(email.sender == "client@company.com" for email in observation.unread_emails):
+             valid.append("forward_client_to_manager")
+         if any(email.sender == "teammate@company.com" for email in observation.unread_emails):
+             valid.append("reply_meeting_time")
+     elif task_name == "hard_rag_reply":
+         if observation.current_email is None and observation.unread_emails:
+             valid.append("read_first_unread")
+         if observation.current_email is not None and not observation.search_results:
+             valid.append("search_q3_architecture")
+         if observation.current_email is not None and observation.search_results:
+             valid.append("reply_with_metrics")
+
+     return valid or ACTION_NAMES.copy()
+
+
+ def make_action(action_name: str, observation: WorkspaceObservation) -> AssistantAction:
+     if action_name == "read_first_unread":
+         if observation.unread_emails:
+             return AssistantAction(action_type="read_email", target_id=observation.unread_emails[0].id)
+     elif action_name == "archive_first_unread":
+         if observation.unread_emails:
+             return AssistantAction(action_type="archive", target_id=observation.unread_emails[0].id)
+     elif action_name == "forward_client_to_manager":
+         for email in observation.unread_emails:
+             if email.sender == "client@company.com":
+                 return AssistantAction(
+                     action_type="forward",
+                     target_id=email.id,
+                     secondary_payload="manager@company.com",
+                     payload="Urgent client complaint. Please take over immediately.",
+                 )
+     elif action_name == "reply_meeting_time":
+         target_id = observation.current_email.id if observation.current_email else None
+         if target_id is None:
+             for email in observation.unread_emails:
+                 if email.sender == "teammate@company.com":
+                     target_id = email.id
+                     break
+         if target_id is not None:
+             return AssistantAction(
+                 action_type="reply",
+                 target_id=target_id,
+                 payload="Hello, 3:30 PM IST works for me. Regards, Executive Assistant",
+             )
+     elif action_name == "add_deadline_todo":
+         if observation.current_email:
+             body = observation.current_email.body.lower()
+             candidates = [
+                 ("Proposal Due", "2026-04-10", "proposal due"),
+                 ("Prototype Due", "2026-04-20", "prototype due"),
+                 ("Final Report Due", "2026-04-30", "final report due"),
+             ]
+             existing = {todo.lower() for todo in observation.active_todos}
+             for task_name, deadline, marker in candidates:
+                 if marker in body and task_name.lower() not in existing:
+                     return AssistantAction(
+                         action_type="add_todo",
+                         payload=task_name,
+                         secondary_payload=deadline,
+                     )
+     elif action_name == "archive_current_email":
+         if observation.current_email:
+             return AssistantAction(action_type="archive", target_id=observation.current_email.id)
+     elif action_name == "search_q3_architecture":
+         return AssistantAction(action_type="search_files", payload="Q3 Architecture")
+     elif action_name == "reply_with_metrics":
+         if observation.current_email and observation.search_results:
+             snippet = observation.search_results[0].snippet
+             availability = "99.95%" if "99.95%" in snippet else "unknown"
+             latency = "182ms" if "182ms" in snippet else "unknown"
+             cost = "14%" if "14%" in snippet else "unknown"
+             return AssistantAction(
+                 action_type="reply",
+                 target_id=observation.current_email.id,
+                 payload=(
+                     "Hello,\n"
+                     f"Here are the requested Q3 architecture metrics: availability {availability}, "
+                     f"mean API latency {latency}, and infrastructure cost reduction {cost}.\n"
+                     "Regards,\nExecutive Assistant"
+                 ),
+             )
+     return AssistantAction(action_type="search_files")
+
+
+ @dataclass
+ class QLearningPolicy:
+     epsilon: float = 0.2
+     alpha: float = 0.3
+     gamma: float = 0.95
+     seed: int = 7
+
+     def __post_init__(self) -> None:
+         self.q_values: dict[str, dict[str, float]] = defaultdict(
+             lambda: {action_name: 0.0 for action_name in ACTION_NAMES}
+         )
+         self.random = random.Random(self.seed)
+
+     def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+         state = encode_observation(task_name, observation)
+         candidates = valid_action_names(task_name, observation)
+         if self.random.random() < self.epsilon:
+             action_name = self.random.choice(candidates)
+             return PolicyDecision(
+                 reasoning=f"Exploring action template {action_name}.",
+                 action=make_action(action_name, observation),
+             )
+
+         action_name = max(candidates, key=lambda name: self.q_values[state][name])
+         return PolicyDecision(
+             reasoning=f"Selecting greedy action template {action_name}.",
+             action=make_action(action_name, observation),
+         )
+
+     def update(
+         self,
+         state: str,
+         action_name: str,
+         reward: float,
+         next_state: str,
+         done: bool,
+     ) -> None:
+         next_best = 0.0 if done else max(self.q_values[next_state].values())
+         current = self.q_values[state][action_name]
+         target = reward + self.gamma * next_best
+         self.q_values[state][action_name] = current + self.alpha * (target - current)
+
+     def save(self, path: str | Path) -> Path:
+         output = Path(path)
+         output.parent.mkdir(parents=True, exist_ok=True)
+         payload = {
+             "metadata": {
+                 "action_names": ACTION_NAMES,
+                 "seed": self.seed,
+                 "alpha": self.alpha,
+                 "gamma": self.gamma,
+                 "epsilon": 0.0,
+             },
+             "q_values": self.q_values,
+         }
+         output.write_text(json.dumps(payload, indent=2))
+         return output
+
+     @classmethod
+     def load(cls, path: str | Path) -> "QLearningPolicy":
+         checkpoint_path = Path(path)
+         policy = cls(epsilon=0.0)
+         raw_payload = json.loads(checkpoint_path.read_text())
+         raw_values = raw_payload["q_values"] if "q_values" in raw_payload else raw_payload
+         policy.q_values = defaultdict(
+             lambda: {action_name: 0.0 for action_name in ACTION_NAMES}
+         )
+         for state, action_map in raw_values.items():
+             policy.q_values[state] = {
+                 action_name: float(action_map.get(action_name, 0.0))
+                 for action_name in ACTION_NAMES
+             }
+         policy.epsilon = 0.0
+         return policy
+
+
+ def action_name_from_decision(decision: PolicyDecision, observation: WorkspaceObservation) -> str:
+     for action_name in ACTION_NAMES:
+         candidate = make_action(action_name, observation)
+         if candidate == decision.action:
+             return action_name
+     return "search_q3_architecture"
+
+
+ def warm_start_from_teacher(
+     learner: QLearningPolicy,
+     teacher: BaselineAgent,
+     task_names: list[str],
+     episodes_per_task: int = 4,
+ ) -> None:
+ runner = EpisodeRunner(policy=teacher)
250
+ for _ in range(episodes_per_task):
251
+ for task_name in task_names:
252
+ trace = runner.run(task_name)
253
+ for index, step in enumerate(trace.steps):
254
+ current_observation = WorkspaceObservation.model_validate(step.observation)
255
+ previous_observation = (
256
+ WorkspaceObservation.model_validate(trace.steps[index - 1].observation)
257
+ if index > 0
258
+ else None
259
+ )
260
+ observation = previous_observation or current_observation
261
+ state = encode_observation(task_name, observation)
262
+ next_state = encode_observation(task_name, current_observation)
263
+ reward_delta = step.reward["total_score"]
264
+ action_name = action_name_from_decision(
265
+ PolicyDecision(
266
+ reasoning=step.reasoning,
267
+ action=AssistantAction.model_validate(step.action),
268
+ ),
269
+ observation,
270
+ )
271
+ learner.update(
272
+ state=state,
273
+ action_name=action_name,
274
+ reward=reward_delta,
275
+ next_state=next_state,
276
+ done=bool(step.reward["is_done"]),
277
+ )
278
+
279
+
280
+ def train_q_learning(
281
+ episodes: int = 200,
282
+ epsilon: float = 0.15,
283
+ teacher: BaselineAgent | None = None,
284
+ ) -> tuple[QLearningPolicy, dict[str, float]]:
285
+ learner = QLearningPolicy(epsilon=epsilon)
286
+ task_names = [
287
+ "easy_deadline_extraction",
288
+ "medium_triage_and_negotiation",
289
+ "hard_rag_reply",
290
+ ]
291
+ if teacher is not None:
292
+ warm_start_from_teacher(learner, teacher, task_names)
293
+
294
+ scores: dict[str, float] = {}
295
+ for episode in range(episodes):
296
+ task_name = task_names[episode % len(task_names)]
297
+ env = ExecutiveAssistantEnv(task_name=task_name)
298
+ observation = env.reset()
299
+ previous_total_score = 0.0
300
+
301
+ while True:
302
+ state = encode_observation(task_name, observation)
303
+ decision = learner.choose_action(task_name, observation)
304
+ action_name = action_name_from_decision(decision, observation)
305
+ next_observation, reward = env.step(decision.action)
306
+ next_state = encode_observation(task_name, next_observation)
307
+ reward_delta = reward.total_score - previous_total_score - 0.01
308
+ previous_total_score = reward.total_score
309
+ learner.update(
310
+ state=state,
311
+ action_name=action_name,
312
+ reward=reward_delta,
313
+ next_state=next_state,
314
+ done=reward.is_done,
315
+ )
316
+ observation = next_observation
317
+ if reward.is_done:
318
+ scores[task_name] = reward.total_score
319
+ break
320
+ return learner, scores
321
+
322
+
323
+ def evaluate_q_policy(policy: QLearningPolicy) -> dict[str, float]:
324
+ original_epsilon = policy.epsilon
325
+ policy.epsilon = 0.0
326
+ try:
327
+ traces = {
328
+ task_name: EpisodeRunner(policy=policy).run(task_name)
329
+ for task_name in [
330
+ "easy_deadline_extraction",
331
+ "medium_triage_and_negotiation",
332
+ "hard_rag_reply",
333
+ ]
334
+ }
335
+ finally:
336
+ policy.epsilon = original_epsilon
337
+ return {task_name: trace.final_score for task_name, trace in traces.items()}
338
+
339
+
340
+ def default_checkpoint_path(checkpoint_dir: str | Path, checkpoint_name: str) -> Path:
341
+ return Path(checkpoint_dir) / checkpoint_name
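Reviewer note: `QLearningPolicy.update` is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α·(r + γ·max_a′ Q(s′,a′) − Q(s,a)). A minimal standalone sketch of that rule (the `ACTIONS` list and state strings here are illustrative placeholders, not names from this repository):

```python
from collections import defaultdict

# Placeholder action set; the repo uses ACTION_NAMES from training.py.
ACTIONS = ["read_email", "search_files", "reply"]

# Unknown states default to all-zero action values, as in QLearningPolicy.
q_values = defaultdict(lambda: {name: 0.0 for name in ACTIONS})

def update(state, action, reward, next_state, done, alpha=0.3, gamma=0.95):
    # Bootstrap from the best next-state value, except at terminal steps.
    next_best = 0.0 if done else max(q_values[next_state].values())
    target = reward + gamma * next_best
    q_values[state][action] += alpha * (target - q_values[state][action])

# One terminal update from Q=0 with reward 1.0 moves the value by alpha.
update("s0", "read_email", 1.0, "s1", done=True)
print(q_values["s0"]["read_email"])  # 0.3
```

With `done=True` the bootstrap term vanishes, which is why the checkpointed values converge toward raw episode returns on terminal transitions.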
src/executive_assistant/workspace.py ADDED
@@ -0,0 +1,193 @@
+ from __future__ import annotations
+
+ import sqlite3
+ from typing import Any
+
+
+ class MockWorkspace:
+     def __init__(self) -> None:
+         # Gradio executes callbacks in worker threads, so the in-memory
+         # workspace connection needs to remain usable across that boundary.
+         self.connection = sqlite3.connect(":memory:", check_same_thread=False)
+         self.connection.row_factory = sqlite3.Row
+         self._create_tables()
+
+     def _create_tables(self) -> None:
+         self.connection.executescript(
+             """
+             CREATE TABLE Emails (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 sender TEXT NOT NULL,
+                 recipient TEXT NOT NULL,
+                 subject TEXT NOT NULL,
+                 body TEXT NOT NULL,
+                 timestamp TEXT NOT NULL,
+                 is_read INTEGER NOT NULL DEFAULT 0,
+                 is_archived INTEGER NOT NULL DEFAULT 0
+             );
+
+             CREATE TABLE Todos (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 task_name TEXT NOT NULL,
+                 deadline_date TEXT,
+                 context TEXT NOT NULL
+             );
+
+             CREATE TABLE Files (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 filename TEXT NOT NULL,
+                 content_text TEXT NOT NULL
+             );
+
+             CREATE TABLE ActionLog (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 action_type TEXT NOT NULL,
+                 target_id INTEGER,
+                 payload TEXT,
+                 secondary_payload TEXT,
+                 status TEXT NOT NULL
+             );
+             """
+         )
+         self.connection.commit()
+
+     def seed(self, emails: list[dict[str, Any]], files: list[dict[str, Any]]) -> None:
+         self.connection.executemany(
+             """
+             INSERT INTO Emails (sender, recipient, subject, body, timestamp)
+             VALUES (:sender, :recipient, :subject, :body, :timestamp)
+             """,
+             emails,
+         )
+         self.connection.executemany(
+             """
+             INSERT INTO Files (filename, content_text)
+             VALUES (:filename, :content_text)
+             """,
+             files,
+         )
+         self.connection.commit()
+
+     def get_unread_emails(self) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             """
+             SELECT id, sender, subject, substr(body, 1, 80) AS snippet
+             FROM Emails
+             WHERE is_read = 0 AND is_archived = 0
+             ORDER BY timestamp ASC
+             """
+         ).fetchall()
+
+     def read_email(self, email_id: int) -> sqlite3.Row | None:
+         self.connection.execute("UPDATE Emails SET is_read = 1 WHERE id = ?", (email_id,))
+         self.connection.commit()
+         row = self.connection.execute("SELECT * FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         status = "email read" if row else "email not found"
+         self.log_action("read_email", email_id, None, None, status)
+         return row
+
+     def send_reply(self, email_id: int, text: str) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action("reply", email_id, text, None, "reply failed: email not found")
+             return "reply failed: email not found"
+         self.log_action("reply", email_id, text, None, "reply drafted")
+         return "reply drafted"
+
+     def forward_email(self, email_id: int, recipient: str, note: str | None = None) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action(
+                 "forward",
+                 email_id,
+                 note,
+                 recipient,
+                 "forward failed: email not found",
+             )
+             return "forward failed: email not found"
+         self.log_action("forward", email_id, note, recipient, f"forwarded to {recipient}")
+         return f"forwarded to {recipient}"
+
+     def create_todo(self, task_name: str, deadline_date: str | None, context: str) -> str:
+         self.connection.execute(
+             "INSERT INTO Todos (task_name, deadline_date, context) VALUES (?, ?, ?)",
+             (task_name, deadline_date, context),
+         )
+         self.connection.commit()
+         self.log_action("add_todo", None, task_name, deadline_date, "todo created")
+         return "todo created"
+
+     def archive_email(self, email_id: int) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action("archive", email_id, None, None, "archive failed: email not found")
+             return "archive failed: email not found"
+         self.connection.execute("UPDATE Emails SET is_archived = 1 WHERE id = ?", (email_id,))
+         self.connection.commit()
+         self.log_action("archive", email_id, None, None, "email archived")
+         return "email archived"
+
+     def search_documents(self, query: str) -> list[sqlite3.Row]:
+         results = self.connection.execute(
+             """
+             SELECT * FROM Files
+             WHERE filename LIKE ? OR content_text LIKE ?
+             ORDER BY id ASC
+             """,
+             (f"%{query}%", f"%{query}%"),
+         ).fetchall()
+         self.log_action("search_files", None, query, None, f"{len(results)} file(s) matched")
+         return results
+
+     def list_todos(self) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             "SELECT id, task_name, deadline_date, context FROM Todos ORDER BY id ASC"
+         ).fetchall()
+
+     def list_recent_actions(self, limit: int = 6) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             """
+             SELECT id, action_type, target_id, payload, secondary_payload, status
+             FROM ActionLog
+             ORDER BY id DESC
+             LIMIT ?
+             """,
+             (limit,),
+         ).fetchall()
+
+     def log_action(
+         self,
+         action_type: str,
+         target_id: int | None,
+         payload: str | None,
+         secondary_payload: str | None,
+         status: str,
+     ) -> None:
+         self.connection.execute(
+             """
+             INSERT INTO ActionLog (action_type, target_id, payload, secondary_payload, status)
+             VALUES (?, ?, ?, ?, ?)
+             """,
+             (action_type, target_id, payload, secondary_payload, status),
+         )
+         self.connection.commit()
+
+     def snapshot(self) -> dict[str, list[dict[str, Any]]]:
+         return {
+             "emails": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Emails ORDER BY id ASC")
+             ],
+             "todos": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Todos ORDER BY id ASC")
+             ],
+             "files": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Files ORDER BY id ASC")
+             ],
+             "action_log": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM ActionLog ORDER BY id ASC")
+             ],
+         }
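Reviewer note: the `check_same_thread=False` comment in `MockWorkspace.__init__` is load-bearing. A minimal sketch of that connection pattern in isolation (table and column names here are simplified stand-ins, not the repo's schema):

```python
import sqlite3
import threading

# One in-memory database shared across threads; without
# check_same_thread=False, using the connection from a worker thread
# raises sqlite3.ProgrammingError.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.row_factory = sqlite3.Row  # rows support name-based access
connection.execute("CREATE TABLE Emails (id INTEGER PRIMARY KEY, subject TEXT)")
connection.execute("INSERT INTO Emails (subject) VALUES (?)", ("Status update",))
connection.commit()

results = []

def read_from_worker() -> None:
    # Simulates a Gradio callback reading the workspace off the main thread.
    row = connection.execute("SELECT subject FROM Emails WHERE id = 1").fetchone()
    results.append(row["subject"])

worker = threading.Thread(target=read_from_worker)
worker.start()
worker.join()
print(results)  # ['Status update']
```

Note that `check_same_thread=False` only disables the thread-ownership check; it does not add locking, so this is safe here only because each workspace call is a short, self-contained statement.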
tests/test_agent.py ADDED
@@ -0,0 +1,158 @@
+ import pytest
+
+ from src.executive_assistant.agent import (
+     ActionCatalog,
+     BaselineAgent,
+     OpenRouterPolicy,
+     smoke_test_training_pipeline,
+ )
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision
+ from src.executive_assistant.runner import EpisodeRunner, export_traces_jsonl
+
+
+ def test_action_catalog_exposes_candidate_actions() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     actions = ActionCatalog.enumerate_actions(observation)
+     assert any(action.action_type == "read_email" for action in actions)
+
+
+ def test_baseline_pipeline_solves_seeded_tasks() -> None:
+     traces = smoke_test_training_pipeline()
+     assert traces["easy_deadline_extraction"].completed is True
+     assert traces["medium_triage_and_negotiation"].completed is True
+     assert traces["hard_rag_reply"].completed is True
+
+
+ def test_episode_runner_produces_trace_records() -> None:
+     trace = EpisodeRunner(policy=BaselineAgent()).run("easy_deadline_extraction")
+     assert trace.steps
+     assert trace.steps[-1].reward["is_done"] is True
+
+
+ def test_export_traces_jsonl_writes_output(tmp_path) -> None:
+     trace = EpisodeRunner(policy=BaselineAgent()).run("hard_rag_reply")
+     output_path = export_traces_jsonl([trace], tmp_path / "traces.jsonl")
+     assert output_path.exists()
+     assert output_path.read_text().strip()
+
+
+ def test_openrouter_policy_uses_service() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return BaselineAgent().choose_action(task_name, observation)
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = policy.choose_action("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
+
+
+ def test_openrouter_policy_sanitizes_hard_reply_payload() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Reply with metrics.",
+                 action=AssistantAction(
+                     action_type="reply",
+                     target_id=1,
+                     payload="System availability: 99.95%, Mean API latency: 182ms, Infrastructure cost reduction: 14%.",
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=1))
+     observation, _ = env.step(AssistantAction(action_type="search_files", payload="Q3 Architecture"))
+     decision = policy.choose_action("hard_rag_reply", observation)
+     assert decision.action.payload is not None
+     assert decision.action.payload.lower().startswith("hello")
+     assert "regards" in decision.action.payload.lower()
+
+
+ def test_openrouter_policy_clears_unused_search_fields() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Search for the report.",
+                 action=AssistantAction(
+                     action_type="search_files",
+                     target_id=99,
+                     payload="Q3 architecture report",
+                     secondary_payload="unused",
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     observation = env.reset()
+     decision = policy.choose_action("hard_rag_reply", observation)
+     assert decision.action.target_id is None
+     assert decision.action.secondary_payload is None
+
+
+ def test_openrouter_policy_normalizes_easy_todo_payload() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Track the proposal deadline.",
+                 action=AssistantAction(
+                     action_type="add_todo",
+                     payload="proposal",
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=1))
+     decision = policy.choose_action("easy_deadline_extraction", observation)
+     assert decision.action.payload == "Proposal Due"
+     assert decision.action.secondary_payload == "2026-04-10"
+
+
+ def test_openrouter_policy_repairs_medium_forward_fields() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Forward the complaint.",
+                 action=AssistantAction(
+                     action_type="forward",
+                     target_id=None,
+                     payload=None,
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="medium_triage_and_negotiation")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=1))
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=2))
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=3))
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=4))
+     decision = policy.choose_action("medium_triage_and_negotiation", observation)
+     assert decision.action.target_id == 4
+     assert decision.action.secondary_payload == "manager@company.com"
+     assert "Urgent client complaint" in (decision.action.payload or "")
tests/test_app.py ADDED
@@ -0,0 +1,42 @@
+ from pathlib import Path
+
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.training import train_q_learning
+
+
+ def test_app_builds_rl_policy_from_checkpoint(tmp_path) -> None:
+     from app import _build_policy
+
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     loaded_policy = _build_policy(
+         provider="rl",
+         model_name="google/gemma-4-31b-it",
+         api_key="",
+         checkpoint_path=str(checkpoint),
+     )
+     assert loaded_policy.epsilon == 0.0
+
+
+ def test_app_stepwise_episode_generator_yields_updates(tmp_path) -> None:
+     from app import run_live_episode
+
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     generator = run_live_episode(
+         task_name="hard_rag_reply",
+         provider="rl",
+         model_name="google/gemma-4-31b-it",
+         api_key="",
+         max_steps=12,
+         checkpoint_path=str(checkpoint),
+     )
+     first_frame = next(generator)
+     assert "scenario reset" in first_frame[0]
+     assert "requested_provider" in first_frame[-1]
+     assert "Run pending" in first_frame[1] or "Run " in first_frame[1]
+     later_frame = None
+     for later_frame in generator:
+         pass
+     assert later_frame is not None
+     assert "reply drafted" in later_frame[0] or "search returned" in later_frame[0]
tests/test_config.py ADDED
@@ -0,0 +1,26 @@
+ import os
+
+ from src.executive_assistant.config import OpenRouterConfig, load_env_file
+
+
+ def test_load_env_file_sets_openrouter_values(tmp_path, monkeypatch) -> None:
+     env_file = tmp_path / ".env.training"
+     env_file.write_text(
+         "\n".join(
+             [
+                 "OPENROUTER_API_KEY=test-key",
+                 "OPENROUTER_MODEL=google/gemma-4-31b-it",
+                 "OPENROUTER_SITE_URL=http://localhost:8888",
+             ]
+         )
+     )
+     monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+     monkeypatch.delenv("OPENROUTER_MODEL", raising=False)
+     monkeypatch.delenv("OPENROUTER_SITE_URL", raising=False)
+
+     loaded = load_env_file(env_file)
+     config = OpenRouterConfig.from_env()
+
+     assert loaded is True
+     assert os.environ["OPENROUTER_API_KEY"] == "test-key"
+     assert config.model_name == "google/gemma-4-31b-it"
tests/test_deployment.py ADDED
@@ -0,0 +1,41 @@
+ from pathlib import Path
+
+ from src.executive_assistant.deployment import (
+     HFSpaceDeployConfig,
+     parse_hf_usernames,
+     render_space_readme,
+     stage_space_bundle,
+ )
+
+
+ def test_parse_hf_usernames_strips_at_signs() -> None:
+     usernames = parse_hf_usernames("@alice, bob , ,@carol")
+     assert usernames == ("alice", "bob", "carol")
+
+
+ def test_render_space_readme_includes_project_epsilon_placeholders() -> None:
+     config = HFSpaceDeployConfig(
+         repo_id="placeholder/project-epsilon-executive-assistant",
+         hf_usernames=("HF_USERNAME_1", "HF_USERNAME_2"),
+     )
+     rendered = render_space_readme(config)
+     assert "Project Epsilon" in rendered
+     assert "@HF_USERNAME_1" in rendered
+     assert "sdk: docker" in rendered
+     assert "OpenEnv Scaler x Meta x PyTorch Hack" in rendered
+
+
+ def test_stage_space_bundle_writes_hf_readme_and_checkpoint(tmp_path: Path) -> None:
+     config = HFSpaceDeployConfig(
+         repo_id="placeholder/project-epsilon-executive-assistant",
+         hf_usernames=("HF_USERNAME_1",),
+     )
+     checkpoint_path = stage_space_bundle(config, tmp_path)
+     assert checkpoint_path is not None
+     assert (tmp_path / "README.md").exists()
+     assert (tmp_path / "app.py").exists()
+     assert (tmp_path / "src" / "executive_assistant" / "env.py").exists()
+     assert (tmp_path / "artifacts" / "checkpoints" / config.checkpoint_name).exists()
+     assert not (tmp_path / ".env.app").exists()
+     assert not (tmp_path / ".env.training").exists()
+     assert not (tmp_path / ".env.hf.space.example").exists()
tests/test_env.py ADDED
@@ -0,0 +1,40 @@
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction
+
+
+ def test_easy_env_reset_exposes_seeded_email() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     assert len(observation.unread_emails) == 1
+
+
+ def test_easy_env_can_add_todo() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     env.reset()
+     observation, reward = env.step(
+         AssistantAction(
+             action_type="add_todo",
+             payload="Proposal due",
+             secondary_payload="2026-04-10",
+         )
+     )
+     assert "Proposal due" in observation.active_todos
+     assert reward.total_score >= 0.0
+
+
+ def test_read_email_populates_current_email() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     observation, _ = env.step(
+         AssistantAction(action_type="read_email", target_id=observation.unread_emails[0].id)
+     )
+     assert observation.current_email is not None
+     assert "proposal due" in observation.current_email.body.lower()
+
+
+ def test_search_files_populates_results() -> None:
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     env.reset()
+     observation, _ = env.step(AssistantAction(action_type="search_files", payload="Q3 Architecture"))
+     assert observation.search_results
+     assert observation.search_results[0].filename == "Q3_Architecture_Report.txt"
tests/test_llm_service.py ADDED
@@ -0,0 +1,72 @@
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.llm_service import OpenRouterLLMService
+
+
+ def test_openrouter_service_parses_policy_decision() -> None:
+     class FakeCompletions:
+         def create(self, **kwargs):
+             class Message:
+                 content = (
+                     '{"reasoning":"Read first","action":{"action_type":"read_email","target_id":1,'
+                     '"payload":null,"secondary_payload":null}}'
+                 )
+
+             class Choice:
+                 message = Message()
+
+             class Response:
+                 choices = [Choice()]
+
+             return Response()
+
+     class FakeClient:
+         class chat:
+             completions = FakeCompletions()
+
+     service = OpenRouterLLMService(
+         config=OpenRouterConfig(api_key="test-key"),
+         client=FakeClient(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = service.generate_policy_decision("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
+
+
+ def test_openrouter_service_repairs_invalid_json() -> None:
+     class FakeCompletions:
+         def __init__(self):
+             self.calls = 0
+
+         def create(self, **kwargs):
+             self.calls += 1
+
+             class Message:
+                 content = "not valid json" if self.calls == 1 else (
+                     '{"reasoning":"Recovered","action":{"action_type":"read_email","target_id":1,'
+                     '"payload":null,"secondary_payload":null}}'
+                 )
+
+             class Choice:
+                 message = Message()
+
+             class Response:
+                 choices = [Choice()]
+
+             return Response()
+
+     fake_completions = FakeCompletions()
+
+     class FakeClient:
+         class chat:
+             completions = fake_completions
+
+     service = OpenRouterLLMService(
+         config=OpenRouterConfig(api_key="test-key"),
+         client=FakeClient(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = service.generate_policy_decision("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
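Reviewer note: the second test above exercises a parse-then-retry loop inside `OpenRouterLLMService`. A minimal sketch of that repair pattern in isolation (`fetch` is a hypothetical stand-in for the chat-completion call; the JSON shape mirrors the fakes above):

```python
import json

# Scripted completions: the first is malformed, the second parses cleanly,
# matching the FakeCompletions behaviour in test_llm_service.py.
responses = iter([
    "not valid json",
    '{"reasoning": "Recovered", "action": {"action_type": "read_email", "target_id": 1}}',
])

def fetch() -> str:
    # Hypothetical placeholder for one model call.
    return next(responses)

def decision_with_repair(max_attempts: int = 2) -> dict:
    last_error = None
    for _ in range(max_attempts):
        raw = fetch()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output: ask the model again
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error

decision = decision_with_repair()
print(decision["action"]["action_type"])  # read_email
```

Bounding the retries and surfacing the last decode error keeps a persistently malformed model from looping forever while still recovering from one-off bad completions.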
tests/test_models.py ADDED
@@ -0,0 +1,6 @@
+ from src.executive_assistant.models import AssistantAction
+
+
+ def test_action_model_accepts_known_action_type() -> None:
+     action = AssistantAction(action_type="archive", target_id=1)
+     assert action.action_type == "archive"
tests/test_runner.py ADDED
@@ -0,0 +1,25 @@
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.runner import run_policy_suite
+
+
+ def test_run_policy_suite_returns_all_requested_tasks() -> None:
+     traces = run_policy_suite(
+         policy=BaselineAgent(),
+         task_names=["easy_deadline_extraction", "hard_rag_reply"],
+     )
+     assert set(traces) == {"easy_deadline_extraction", "hard_rag_reply"}
+
+
+ def test_episode_runner_exposes_explicit_workflow_steps() -> None:
+     from src.executive_assistant.runner import EpisodeRunner
+
+     runner = EpisodeRunner(policy=BaselineAgent(), max_steps=12)
+     env, observation = runner.initialize("easy_deadline_extraction")
+     _, next_observation, reward, record = runner.advance(
+         "easy_deadline_extraction",
+         env,
+         observation,
+     )
+     assert record.step_index == 1
+     assert next_observation.last_action_status == "email read"
+     assert reward.is_done is False
tests/test_training.py ADDED
@@ -0,0 +1,35 @@
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.training import QLearningPolicy, evaluate_q_policy, train_q_learning
+
+
+ def test_train_q_learning_returns_scores() -> None:
+     policy, scores = train_q_learning(episodes=24, epsilon=0.1, teacher=BaselineAgent())
+     evaluation = evaluate_q_policy(policy)
+     assert scores
+     assert set(evaluation) == {
+         "easy_deadline_extraction",
+         "medium_triage_and_negotiation",
+         "hard_rag_reply",
+     }
+     assert evaluation == {
+         "easy_deadline_extraction": 1.0,
+         "medium_triage_and_negotiation": 1.0,
+         "hard_rag_reply": 1.0,
+     }
+
+
+ def test_q_learning_policy_checkpoint_roundtrip(tmp_path) -> None:
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     loaded = QLearningPolicy.load(checkpoint)
+     evaluation = evaluate_q_policy(loaded)
+     assert set(evaluation) == {
+         "easy_deadline_extraction",
+         "medium_triage_and_negotiation",
+         "hard_rag_reply",
+     }
+     assert evaluation == {
+         "easy_deadline_extraction": 1.0,
+         "medium_triage_and_negotiation": 1.0,
+         "hard_rag_reply": 1.0,
+     }
tests/test_workspace.py ADDED
@@ -0,0 +1,74 @@
+ import threading
+
+ from src.executive_assistant.workspace import MockWorkspace
+
+
+ def test_workspace_seed_and_snapshot() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[{"filename": "doc.txt", "content_text": "hello world"}],
+     )
+
+     snapshot = workspace.snapshot()
+     assert len(snapshot["emails"]) == 1
+     assert len(snapshot["files"]) == 1
+
+
+ def test_read_email_is_logged() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[],
+     )
+
+     row = workspace.read_email(1)
+     assert row is not None
+     snapshot = workspace.snapshot()
+     assert snapshot["action_log"][0]["action_type"] == "read_email"
+
+
+ def test_workspace_can_be_used_from_worker_thread() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Thread Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[],
+     )
+     errors: list[Exception] = []
+
+     def _read_email() -> None:
+         try:
+             row = workspace.read_email(1)
+             assert row is not None
+         except Exception as exc:  # pragma: no cover - assertion path is the test failure
+             errors.append(exc)
+
+     worker = threading.Thread(target=_read_email)
+     worker.start()
+     worker.join()
+
+     assert errors == []
training_env.ipynb ADDED
@@ -0,0 +1,257 @@
+ {
+   "cells": [
+     {
+       "id": "intro",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "# Autonomous Executive Assistant Sandbox\n",
+         "\n",
+         "Notebook for OpenRouter Gemma rollouts, checkpoint export, and RL training. Use the `scalerhack2-training` kernel so the environment matches the validated training pipeline."
+       ]
+     },
+     {
+       "id": "workflow",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Workflow\n",
+         "\n",
+         "1. Load `.env.training` directly from the repository root.\n",
+         "2. Run the baseline suite to confirm the environment is stable.\n",
+         "3. Run an OpenRouter Gemma rollout if the API key is available.\n",
+         "4. Export traces for analysis or imitation-style warm starts.\n",
+         "5. Train the tabular RL agent and save a checkpoint.\n",
+         "6. Promote stable changes back into `src/` and keep tests green."
+       ]
+     },
+     {
+       "id": "imports",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "import json\n",
+         "import os\n",
+         "from pathlib import Path\n",
+         "\n",
+         "from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy\n",
+         "from src.executive_assistant.config import OpenRouterConfig, load_env_file\n",
+         "from src.executive_assistant.runner import EpisodeRunner, export_traces_jsonl, run_policy_suite\n",
+         "from src.executive_assistant.training import evaluate_q_policy, train_q_learning\n"
+       ]
+     },
+     {
+       "id": "config",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "ENV_FILE = Path('.env.training')\n",
+         "ENV_LOADED = load_env_file(ENV_FILE)\n",
+         "HAS_OPENROUTER_KEY = bool(os.environ.get('OPENROUTER_API_KEY'))\n",
+         "\n",
+         "TASK_NAME = 'hard_rag_reply'\n",
+         "POLICY_PROVIDER = 'openrouter' if HAS_OPENROUTER_KEY else 'baseline'\n",
+         "MODEL_NAME = os.environ.get('OPENROUTER_MODEL', 'google/gemma-4-31b-it')\n",
+         "MAX_STEPS = 12\n",
+         "TRACE_DIR = Path('artifacts/traces')\n",
+         "CHECKPOINT_DIR = Path('artifacts/checkpoints')\n",
+         "TRACE_DIR.mkdir(parents=True, exist_ok=True)\n",
+         "CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)\n",
+         "\n",
+         "{\n",
+         "    'env_file_found': ENV_LOADED,\n",
+         "    'has_openrouter_key': HAS_OPENROUTER_KEY,\n",
+         "    'policy_provider': POLICY_PROVIDER,\n",
+         "    'model_name': MODEL_NAME,\n",
+         "}\n"
+       ]
+     },
+     {
+       "id": "policy-builder",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "def build_policy(provider: str, model_name: str):\n",
+         "    if provider == 'baseline':\n",
+         "        return BaselineAgent()\n",
+         "    if provider == 'openrouter':\n",
+         "        config = OpenRouterConfig.from_env(ENV_FILE)\n",
+         "        config = OpenRouterConfig(\n",
+         "            api_key=config.api_key,\n",
+         "            model_name=model_name,\n",
+         "            base_url=config.base_url,\n",
+         "            site_url=config.site_url,\n",
+         "            app_name=config.app_name,\n",
+         "            temperature=config.temperature,\n",
+         "            max_tokens=config.max_tokens,\n",
+         "        )\n",
+         "        return OpenRouterPolicy(config=config)\n",
+         "    raise ValueError(f'Unsupported provider: {provider}')\n"
+       ]
+     },
+     {
+       "id": "baseline-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Baseline validation\n",
+         "\n",
+         "Run this first. If the baseline no longer solves the seeded tasks, stop and fix the environment before trusting any LLM or RL results."
+       ]
+     },
+     {
+       "id": "baseline-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "baseline_traces = run_policy_suite(\n",
+         "    policy=BaselineAgent(),\n",
+         "    task_names=[\n",
+         "        'easy_deadline_extraction',\n",
+         "        'medium_triage_and_negotiation',\n",
+         "        'hard_rag_reply',\n",
+         "    ],\n",
+         "    max_steps=MAX_STEPS,\n",
+         ")\n",
+         "\n",
+         "{name: {'completed': trace.completed, 'score': trace.final_score, 'steps': len(trace.steps)} for name, trace in baseline_traces.items()}\n"
+       ]
+     },
+     {
+       "id": "rollout-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Policy rollout\n",
+         "\n",
+         "This uses OpenRouter Gemma automatically when `.env.training` provides the key. Otherwise it falls back to the baseline policy."
+       ]
+     },
+     {
+       "id": "rollout-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "policy = build_policy(POLICY_PROVIDER, MODEL_NAME)\n",
+         "runner = EpisodeRunner(policy=policy, max_steps=MAX_STEPS)\n",
+         "trace = runner.run(TASK_NAME)\n",
+         "\n",
+         "print(json.dumps(trace.to_dict(), indent=2))\n"
+       ]
+     },
+     {
+       "id": "rollout-snapshot",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "trace.steps[-1].snapshot\n"
+       ]
+     },
+     {
+       "id": "export-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Export traces\n",
+         "\n",
+         "These JSONL traces are the main interface between rollout collection and downstream training or regression analysis."
+       ]
+     },
+     {
+       "id": "export-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "suite_traces = run_policy_suite(\n",
+         "    policy=build_policy(POLICY_PROVIDER, MODEL_NAME),\n",
+         "    task_names=[TASK_NAME],\n",
+         "    max_steps=MAX_STEPS,\n",
+         ")\n",
+         "\n",
+         "output_path = export_traces_jsonl(\n",
+         "    list(suite_traces.values()),\n",
+         "    TRACE_DIR / f'{POLICY_PROVIDER}_{TASK_NAME}_traces.jsonl',\n",
+         ")\n",
+         "\n",
+         "print(output_path)\n"
+       ]
+     },
+     {
+       "id": "train-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## RL training\n",
+         "\n",
+         "This trains the tabular Q-learning policy with a baseline-teacher warm start, saves a checkpoint, and evaluates the trained policy on all seeded tasks."
+       ]
+     },
+     {
+       "id": "train-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "q_policy, training_scores = train_q_learning(\n",
+         "    episodes=300,\n",
+         "    epsilon=0.15,\n",
+         "    teacher=BaselineAgent(),\n",
+         ")\n",
+         "checkpoint_path = q_policy.save(CHECKPOINT_DIR / 'q_policy_notebook.json')\n",
+         "evaluation = evaluate_q_policy(q_policy)\n",
+         "\n",
+         "{\n",
+         "    'checkpoint': str(checkpoint_path),\n",
+         "    'training_scores': training_scores,\n",
+         "    'evaluation': evaluation,\n",
+         "}\n"
+       ]
+     },
+     {
+       "id": "env-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Environment note\n",
+         "\n",
+         "The notebook loads `.env.training` directly from the repo root. That keeps CLI runs, notebook runs, and Jupyter-launched kernels aligned without requiring manual exports in the shell."
+       ]
+     }
+   ],
+   "metadata": {
+     "kernelspec": {
+       "display_name": "Python (scalerhack2-training)",
+       "language": "python",
+       "name": "scalerhack2-training"
+     },
+     "language_info": {
+       "codemirror_mode": {
+         "name": "ipython",
+         "version": 3
+       },
+       "file_extension": ".py",
+       "mimetype": "text/x-python",
+       "name": "python",
+       "nbconvert_exporter": "python",
+       "pygments_lexer": "ipython3",
+       "version": "3.14"
+     }
+   },
+   "nbformat": 4,
+   "nbformat_minor": 5
+ }
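The notebook's RL cell only exposes `train_q_learning(episodes, epsilon, teacher)` by signature; the tabular update it names is the standard Q-learning rule. A hedged sketch of that rule and the ε-greedy action choice follows — the helper names `q_update` and `epsilon_greedy` are illustrative, not the project's `training.py` API, and how the baseline teacher warms up early episodes is an implementation detail not shown here.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def epsilon_greedy(q, state, actions, epsilon: float, rng=random):
    """Explore with probability epsilon, otherwise act greedily on q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# defaultdict(float) gives unseen (state, action) pairs a value of 0.0,
# so the table grows lazily as the agent visits new states.
q = defaultdict(float)
```

A teacher warm start in this shape would typically mean choosing the teacher's action instead of the ε-greedy one for some initial fraction of episodes while still applying `q_update`, so the table is seeded along trajectories the baseline already solves.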