Flickinshots committed
Commit 38c9982 · verified · 1 Parent(s): b1ec107

Deploy Project Epsilon Space bundle

.env.app.example ADDED
@@ -0,0 +1,3 @@
+ OPENROUTER_API_KEY=
+ OPENROUTER_SITE_URL=http://localhost:7860
+ OPENROUTER_APP_NAME=Autonomous Executive Assistant Sandbox
.env.training.example ADDED
@@ -0,0 +1,6 @@
+ OPENROUTER_API_KEY=
+ OPENROUTER_MODEL=google/gemma-4-31b-it
+ OPENROUTER_SITE_URL=http://localhost:8888
+ OPENROUTER_APP_NAME=Autonomous Executive Assistant Sandbox Training
+ OPENROUTER_TEMPERATURE=0.1
+ OPENROUTER_MAX_TOKENS=600
.gitignore ADDED
@@ -0,0 +1,8 @@
+ .venv-app/
+ .venv-training/
+ artifacts/
+ .pytest_cache/
+ __pycache__/
+ .env
+ .env.app
+ .env.training
AGENTS.md ADDED
@@ -0,0 +1,43 @@
+ # Repository Guidelines
+ 
+ ## Project Structure & Module Organization
+ Core application code lives in `src/executive_assistant/`. Keep environment logic in `env.py`, SQLite workspace behavior in `workspace.py`, reward logic in `graders.py`, typed contracts in `models.py`, provider configuration in `config.py`, prompt construction in `prompts.py`, OpenRouter calls in `llm_service.py`, shared episode execution in `runner.py`, policies in `agent.py`, and RL logic in `training.py`. Tests live in `tests/` and should mirror the module they validate. Operational scripts live in `scripts/`. Use `training_env.ipynb` with the `scalerhack2-training` kernel for experiments and rollout export only; move stable logic back into `src/`. Top-level runtime files include `app.py`, `openenv.yaml`, `requirements*.txt`, and `PRD.md`.
+ 
+ ## Build, Test, and Development Commands
+ Set up the separate app and training environments with:
+ 
+ ```bash
+ bash scripts/setup_app_env.sh
+ bash scripts/setup_training_env.sh
+ ```
+ 
+ Run the test suite with `.venv-training/bin/pytest -q`. Start the local Gradio entrypoint with `.venv-app/bin/python app.py`. Evaluate the deterministic baseline across all seeded tasks with `.venv-training/bin/python scripts/evaluate_policies.py --provider baseline`. Run one full episode trace with `.venv-training/bin/python scripts/run_policy_episode.py --task hard_rag_reply --provider baseline`. Train the tabular RL policy with `.venv-training/bin/python scripts/train_rl_agent.py --episodes 300`. To exercise the Gemma model through OpenRouter, set `OPENROUTER_API_KEY` first, then switch to `--provider openrouter` or set `POLICY_PROVIDER = "openrouter"` in the notebook.
+ 
+ ```bash
+ .venv-training/bin/python scripts/evaluate_policies.py --provider baseline
+ ```
+ 
+ ## Coding Style & Naming Conventions
+ Target Python 3.11+ and use 4-space indentation. Prefer explicit types and small, single-purpose functions. Follow existing naming patterns: `snake_case` for functions, variables, and modules; `PascalCase` for Pydantic models and environment classes; uppercase for constants such as `TASK_SEEDS`. Keep comments brief and only where behavior is not obvious. There is no formatter configured yet, so match the existing style and keep imports tidy.
+ 
+ ## Testing Guidelines
+ Tests use `pytest`. Add or update tests with every behavioral change, especially for environment transitions, reward shaping, seeded task completion, runner traces, OpenRouter service behavior, and RL training smoke paths. Name test files `test_*.py` and test functions `test_*`. Prefer deterministic assertions against observations, snapshots, action logs, checkpoints, and scores over loose text checks. If you change notebook-driven workflows, validate the underlying module or script rather than testing notebook JSON behavior only.
+ 
+ ## Commit & Pull Request Guidelines
+ Current history uses short, imperative commit subjects such as `Initial RL agent sandbox scaffold` and `Add PRD progress checkpoint note`. Continue that style: concise subject line, capitalized first word, no trailing period. Pull requests should include a brief summary, note any changed scenarios or rewards, list validation steps run (`pytest -q`, smoke tests), and attach screenshots only when UI behavior in `app.py` changes.
+ 
+ ## Agent-Specific Notes
+ Preserve determinism in the environment, graders, and baseline policy. Live API access belongs in policy layers such as `OpenRouterPolicy`, not in the workspace or reward path. Keep `EpisodeRunner` as the shared execution path for scripts, tests, Gradio, and notebook workflows. Treat OpenRouter calls as optional runtime behavior: tests and RL smoke runs must stay runnable without network access. If notebook experiments uncover a useful change, codify it in `src/` and cover it with tests before treating it as part of the baseline.
+ 
+ ## Agent Workflow Loop
+ All execution surfaces in this repository should follow the same loop:
+ 
+ 1. Load environment state
+ 2. Generate observation
+ 3. Send to LLM or policy
+ 4. Receive structured action
+ 5. Execute action in workspace
+ 6. Update state
+ 7. Repeat until task complete
+ 
+ In code, keep this flow inside `EpisodeRunner`. Use `initialize()` for steps 1-2, `choose_action()` for steps 3-4, and `advance()` plus `env.step()` for steps 5-6. Do not duplicate bespoke episode loops in notebooks, scripts, or UI handlers.
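The loop above can be sketched as a minimal driver. This is a toy illustration, not the real `EpisodeRunner`: the `ToyEnv` and `ToyPolicy` classes below are hypothetical stand-ins, and only the loop shape mirrors the steps listed.

```python
# Toy sketch of the agent workflow loop that EpisodeRunner owns.
# ToyEnv/ToyPolicy are illustrative stand-ins, not repo modules.
from dataclasses import dataclass

@dataclass
class ToyEnv:
    remaining: int = 3  # unread emails left to process

    def reset(self):  # steps 1-2: load state, generate observation
        return {"unread": self.remaining}

    def step(self, action):  # steps 5-6: execute action, update state
        if action == "archive":
            self.remaining -= 1
        done = self.remaining == 0
        return {"unread": self.remaining}, (1.0 if done else 0.0), done

class ToyPolicy:
    def choose_action(self, obs):  # steps 3-4: decide a structured action
        return "archive" if obs["unread"] > 0 else "noop"

def run_episode(env, policy, max_steps=10):
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):  # step 7: repeat until task complete
        action = policy.choose_action(obs)
        obs, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total, obs

score, final_obs = run_episode(ToyEnv(), ToyPolicy())
```

Any new surface (script, test, UI handler) should call the shared runner rather than re-implementing this loop.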
Dockerfile ADDED
@@ -0,0 +1,14 @@
+ FROM python:3.11-slim
+ 
+ WORKDIR /app
+ 
+ COPY requirements.txt .
+ COPY requirements.app.txt .
+ RUN pip install --no-cache-dir -r requirements.app.txt
+ 
+ COPY . .
+ 
+ EXPOSE 7860
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ 
+ CMD ["python", "app.py"]
PRD.md ADDED
@@ -0,0 +1,154 @@
+ # Product Requirements Document (PRD): Autonomous Executive Assistant Sandbox
+ 
+ **Target Deployment:** Hugging Face Spaces (Gradio UI + OpenEnv Container)
+ **Primary Dev Environment:** Kaggle / Jupyter Notebooks (`training_env.ipynb`)
+ 
+ ---
+ 
+ ## Progress Note
+ Status as of 2026-04-08:
+ 
+ - The deterministic SQLite-backed workspace is implemented with action logging, seeded scenarios, snapshots, and richer step semantics.
+ - The OpenEnv contract is represented in typed Pydantic models for observations, actions, rewards, and policy decisions.
+ - Deterministic graders are implemented for all three seeded tasks with dense reward shaping and terminal success checks.
+ - A shared `EpisodeRunner` now owns the agent workflow loop across scripts, tests, the notebook, and Gradio.
+ - A deterministic baseline policy is implemented and solves all three seeded tasks end to end.
+ - An OpenRouter-backed `google/gemma-4-31b-it` policy path is integrated, prompt-hardened, and validated on the hard task.
+ - Separate app and training environments are in place, including a registered `scalerhack2-training` Jupyter kernel.
+ - The training notebook loads `.env.training`, exports traces, runs RL training, and saves checkpoints.
+ - A tabular Q-learning policy exists as a seeded-task RL prototype and can be trained, evaluated, and checkpointed.
+ - The current Gradio app can reset scenarios and run full episodes for baseline and OpenRouter policies.
+ 
+ Resume from here:
+ 
+ - Make the trained RL checkpoint a first-class runtime policy in the app and scripts.
+ - Refine the Gradio UI from one-shot episode execution into a stepwise or streaming judge-facing experience.
+ - Ensure the app, notebook, and scripts can all use the same trained RL artifact without drift.
+ - Expand notebook analysis cells and runtime metrics for stronger model-vs-baseline-vs-RL comparisons.
+ - Keep the current tabular RL policy as a prototype while leaving room for a richer learned policy after hackathon delivery.
+ 
+ ---
+ 
+ ## 1. Executive Summary
+ We are building a deterministic, isolated OpenEnv simulation of a corporate or academic workflow. Instead of wrapping a brittle, live API like Gmail (which causes rate limits and non-deterministic grading), we will engineer an **in-memory SQLite Mock Mail Server & Local File System**.
+ 
+ The AI agent will act as an Autonomous Executive Assistant. It must navigate a chaotic mock inbox, extract deadlines to a mock task manager, negotiate meeting times, and perform Retrieval-Augmented Generation (RAG) over a mock file system to draft intelligent replies.
+ 
+ This environment proves the agent's ability to act as a *router* and a *tool-user*, moving beyond text generation into full workflow automation.
+ 
+ ---
+ 
+ ## 2. Core Architecture & Stack
+ * **State Management:** In-memory SQLite (`sqlite3`) simulating a mail server, calendar, and file system.
+ * **Typing & Validation:** `pydantic` (strictly defining Observations, Actions, and Rewards per the OpenEnv spec).
+ * **Development & Debugging:** Jupyter Notebooks plus scriptable runners. The state machine, model prompts, rollout export, and RL smoke training are exercised from `training_env.ipynb` and mirrored by CLI scripts.
+ * **Model Runtime:** OpenRouter using `google/gemma-4-31b-it` for live policy inference, with prompt/schema hardening and response repair.
+ * **RL Prototype:** Tabular Q-learning over a finite action template catalog, with teacher warm-start from the deterministic baseline and JSON checkpoint persistence.
+ * **Deployment & Visualization:** Gradio (to visualize the inbox state for judges) packaged within a Docker container on Hugging Face Spaces.
+ 
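The tabular Q-learning prototype described above can be sketched with a single Bellman backup over a fixed action catalog. This is an illustrative sketch, not the code in `training.py`: the action names, state keys, and hyperparameters are assumptions.

```python
# Sketch of a tabular Q-learning update over a finite action catalog.
# The real prototype also warm-starts Q from the deterministic baseline
# (teacher actions seeded with positive values) and persists Q as JSON.
import random

ACTIONS = ["read_email", "reply", "forward", "add_todo", "archive", "search_files"]

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose(q, state, epsilon=0.1, rng=random.Random(0)):
    # Epsilon-greedy selection over the fixed catalog.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

q = {}
q_update(q, "inbox_full", "archive", 1.0, "inbox_empty")
```

Because the Q-table is a plain dict keyed by `(state, action)`, checkpointing reduces to serializing it to JSON and reloading it in the app, scripts, and notebook alike.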
+ ---
+ 
+ ## 3. Step-by-Step Implementation Plan
+ 
+ ### Phase 1: The Mock Server Setup (Notebook Environment)
+ **Goal:** Build the deterministic world the agent will live in. Do this entirely in the first few cells of your Kaggle notebook so you can instantly query and reset the state.
+ 
+ 1. **Database Initialization:** Create an in-memory SQLite database (`sqlite3.connect(':memory:')`).
+ 2. **Table Creation:**
+     * `Emails` (id, sender, recipient, subject, body, timestamp, is_read, is_archived)
+     * `Todos` (id, task_name, deadline_date, context)
+     * `Files` (id, filename, content_text) - *This acts as the local knowledge base.*
+ 3. **The Wrapper Class (`MockWorkspace`):** Write Python methods to interact with this DB safely.
+     * `get_unread_emails()`
+     * `send_reply(email_id, text)`
+     * `create_todo(task, date)`
+     * `search_documents(query)`
+ 
+ ### Phase 2: OpenEnv Specifications (Pydantic Models)
+ **Goal:** Define the strict APIs the agent must use. This is the core of the hackathon requirement.
+ 
+ **Observation Space:**
+ ```python
+ class WorkspaceObservation(BaseModel):
+     current_time: str
+     unread_emails: List[Dict[str, str]]  # ID, Sender, Subject snippet
+     active_todos: List[str]
+     last_action_status: str  # e.g., "Email successfully sent to Manager"
+ ```
+ 
+ **Action Space:**
+ ```python
+ class AssistantAction(BaseModel):
+     action_type: Literal["read_email", "reply", "forward", "add_todo", "archive", "search_files"]
+     target_id: Optional[str] = None  # email_id or file_id
+     payload: Optional[str] = None  # The body of the reply, or the search query
+     secondary_payload: Optional[str] = None  # Date for todos, or recipient for forwards
+ ```
+ 
+ **Reward Space:**
+ ```python
+ class TaskReward(BaseModel):
+     step_reward: float
+     total_score: float
+     is_done: bool
+     reasoning: str
+ ```
+ 
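Model output should be validated against the `AssistantAction` contract before it ever touches the workspace. The real path would use the Pydantic model above; this stdlib-only sketch shows the same check, with a hypothetical raw payload:

```python
# Stdlib-only sketch of validating a model-produced JSON action against
# the AssistantAction contract; production code would use the Pydantic
# model, which also enforces field types.
import json

ALLOWED = {"read_email", "reply", "forward", "add_todo", "archive", "search_files"}

def parse_action(raw: str) -> dict:
    data = json.loads(raw)  # raises on malformed JSON -> trigger response repair
    if data.get("action_type") not in ALLOWED:
        raise ValueError(f"unknown action_type: {data.get('action_type')!r}")
    # Optional fields default to None, mirroring the schema above.
    return {
        "action_type": data["action_type"],
        "target_id": data.get("target_id"),
        "payload": data.get("payload"),
        "secondary_payload": data.get("secondary_payload"),
    }

action = parse_action('{"action_type": "reply", "target_id": "email-3", "payload": "On it."}')
```

Rejecting the action before `step()` keeps invalid model output out of the deterministic state machine.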
+ ### Phase 3: Task Definitions & Deterministic Graders
+ Implement the three required difficulty tiers. The grader simply runs SQL queries against your mock database to verify the agent's actions.
+ 
+ #### Task 1: Easy (Syllabus & Deadline Extraction)
+ * **Initial State:** DB injected with an email from `prof.smith@university.edu` containing 3 specific project deadlines.
+ * **Agent Goal:** Read email, create 3 corresponding tasks in the `Todos` table, and archive the email.
+ * **Grader Logic:** `SELECT COUNT(*) FROM Todos WHERE deadline_date IS NOT NULL;` -> If 3, return `+1.0`.
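The Task 1 grader can be sketched directly against the schema above; the seeded rows here are illustrative, not the real scenario data:

```python
# Sketch of the Task 1 grader: count captured deadlines in SQLite.
# Table layout matches the Phase 1 schema; row data is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Todos (id INTEGER PRIMARY KEY, task_name TEXT, deadline_date TEXT, context TEXT)"
)
conn.executemany(
    "INSERT INTO Todos (task_name, deadline_date, context) VALUES (?, ?, ?)",
    [
        ("Milestone 1", "2026-04-10", "syllabus"),
        ("Milestone 2", "2026-04-17", "syllabus"),
        ("Milestone 3", "2026-04-24", "syllabus"),
    ],
)

def grade_easy(conn) -> float:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Todos WHERE deadline_date IS NOT NULL"
    ).fetchone()
    return 1.0 if count == 3 else 0.0

reward = grade_easy(conn)
```

Because the grader queries the database rather than parsing agent text, it stays deterministic regardless of how the agent phrased its actions.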
+ 
+ #### Task 2: Medium (Triage & Meeting Negotiation)
+ * **Initial State:** DB injected with 5 emails: 3 newsletters, 1 urgent client complaint, 1 team meeting reschedule request.
+ * **Agent Goal:** Archive newsletters, forward the client complaint to `manager@company.com`, and reply to the reschedule request proposing a time.
+ * **Grader Logic:** Check if newsletters are marked `is_archived=True` (+0.3). Check if complaint is in the DB as sent to manager (+0.4). Check if reply contains a valid time string (+0.3).
+ 
+ #### Task 3: Hard (Autonomous RAG & Drafting)
+ * **Initial State:** DB injected with an email from a VIP stakeholder asking for specific metrics from the "Q3 Architecture Report".
+ * **Agent Goal:** Use `action_type: "search_files"` with query "Q3 Architecture", read the file contents, and use `action_type: "reply"` synthesizing the exact metrics from the file into a professional response.
+ * **Grader Logic:** Check if `search_files` was called (+0.3). Use regex to verify the specific metric string from the mock file exists in the sent reply body (+0.7).
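The Task 3 reward split can be sketched as follows; the metric string, action-log shape, and reply text are all hypothetical examples, not the real scenario data:

```python
# Sketch of the Task 3 grader: partial credit for calling search_files,
# plus a regex check that the exact metric from the mock file appears in
# the sent reply. Metric string and log format here are illustrative.
import re

METRIC_PATTERN = re.compile(r"99\.95% uptime")  # hypothetical metric string

def grade_hard(action_log, reply_body) -> float:
    score = 0.0
    if any(entry["action_type"] == "search_files" for entry in action_log):
        score += 0.3  # agent actually retrieved the document
    if METRIC_PATTERN.search(reply_body):
        score += 0.7  # exact metric made it into the reply
    return score

score = grade_hard(
    [{"action_type": "search_files"}, {"action_type": "reply"}],
    "Per the Q3 Architecture Report, we sustained 99.95% uptime.",
)
```

Anchoring the check on the exact metric string keeps the grade deterministic while still requiring genuine retrieval rather than a plausible-sounding guess.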
+ 
+ ### Phase 4: Baseline Agent Testing (Notebook Environment)
+ **Goal:** Prove the environment works using both a deterministic policy and a live model-backed policy.
+ 1. Use the deterministic `BaselineAgent` to verify seeded tasks and grader behavior.
+ 2. Use a standard `while not done:` loop, now centralized in `EpisodeRunner`.
+ 3. Pass the `WorkspaceObservation` to the live model policy through OpenRouter using strict JSON outputs.
+ 4. Pass the model action into the environment's `step()` function.
+ 5. Print and export the interaction loop directly in the notebook to debug prompt formatting, policy behavior, and reward shaping.
+ 
+ #### Agent Workflow Loop
+ 1. Load environment state
+ 2. Generate observation
+ 3. Send to LLM
+ 4. Receive structured action
+ 5. Execute action in workspace
+ 6. Update state
+ 7. Repeat until task complete
+ 
+ Implementation note: this loop is now represented directly in the shared `EpisodeRunner` so the notebook, scripts, tests, and Gradio app all execute the same control flow.
+ 
+ ### Phase 5: Hugging Face Spaces & Gradio Deployment
+ **Goal:** Package the OpenEnv logic and build a visual interface so judges can physically see the agent working, including deterministic, model-backed, and learned-policy runs.
+ 
+ 1. **The Gradio Wrapper (`app.py`):**
+     * Build a Gradio UI that exposes selectable policies (`baseline`, `openrouter`, and trained `rl`) and visually represents the `Emails`, `Todos`, `Files`, and action history tables.
+     * As the OpenEnv `step()` function runs, update the Gradio state step by step so judges can watch the inbox drain, the to-do list populate, and the replies send in real time.
+     * Ensure the app can load the same trained RL checkpoint artifact produced by the notebook and CLI training scripts.
+ 2. **Containerization (`Dockerfile`):**
+     ```dockerfile
+     FROM python:3.11-slim
+     WORKDIR /app
+     COPY requirements.app.txt .
+     RUN pip install --no-cache-dir -r requirements.app.txt
+     COPY . .
+     # OpenEnv requires specific metadata handling; Gradio runs on 7860
+     EXPOSE 7860
+     ENV GRADIO_SERVER_NAME="0.0.0.0"
+     CMD ["python", "app.py"]
+     ```
+ 3. **OpenEnv Spec Compliance:** Ensure your `openenv.yaml` is correctly mapped to your Pydantic classes at the root of the repository.
+ 4. **Push to HF:** Commit the repo to a Hugging Face Space, tag it with `openenv`, and ensure the policy runners and training instructions are easily executable via the README instructions.
README.md CHANGED
@@ -1,14 +1,54 @@
  ---
- title: EmailMaestro
- emoji: 🔥
- colorFrom: red
- colorTo: purple
- sdk: gradio
- sdk_version: 6.11.0
- app_file: app.py
  pinned: false
- license: mit
- short_description: ' Deterministic RL-style workspace for an exec assist agent'
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: EmailMaestro | Executive Assistant Sandbox
+ emoji: "🧭"
+ colorFrom: yellow
+ colorTo: gray
+ sdk: docker
+ app_port: 7860
  pinned: false
+ short_description: OpenEnv executive assistant sandbox demo for judges.
  ---
 
+ # Project Epsilon
+ 
+ Discrete Hugging Face Space for the **Autonomous Executive Assistant Sandbox**, built for the **OpenEnv Scaler x Meta x PyTorch Hack**.
+ 
+ ## Team
+ 
+ - Team name: `Project Epsilon`
+ - Hugging Face usernames: `@Flickinshots`, `@HF_USERNAME_2`, `@HF_USERNAME_3`
+ - Space repo: `Flickinshots/EmailMaestro`
+ 
+ Replace the placeholder usernames above once the final team accounts are ready.
+ 
+ ## What This Space Shows
+ 
+ - A deterministic OpenEnv-style executive assistant environment backed by an isolated SQLite workspace
+ - A judge-friendly Gradio interface that replays the shared `EpisodeRunner` loop step by step
+ - Side-by-side policy execution for `baseline`, `rl`, and optional `openrouter`
+ - Visible inbox, todo, file-search, and action-log state so evaluators can inspect each mutation
+ 
+ ## Hack Context
+ 
+ OpenEnv was announced by Hugging Face and Meta as an open source framework for building agent environments with typed observations, actions, and rewards. The Scaler dashboard for this hack lists the submission round as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space packages our environment to match that workflow: deterministic tasks, structured actions, visible state transitions, and reproducible judge demos.
+ 
+ ## Runtime Notes
+ 
+ - SDK: `docker`
+ - App port: `7860`
+ - Entry point: `python app.py`
+ - Optional secret: `OPENROUTER_API_KEY`
+ - A trained RL checkpoint is bundled in `artifacts/checkpoints/` so the `rl` policy is available immediately in the demo.
+ 
+ ## Judge Flow
+ 
+ 1. Open the Space and choose one of the seeded scenarios.
+ 2. Run the deterministic `baseline` policy for a guaranteed reference trace.
+ 3. Switch to `rl` to replay the bundled learned checkpoint.
+ 4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed path.
+ 
+ ## References
+ 
+ - Hack dashboard: https://www.scaler.com/openenv-hackathon
+ - OpenEnv launch: https://huggingface.co/blog/openenv
+ - Space URL: https://huggingface.co/spaces/Flickinshots/EmailMaestro
app.py ADDED
@@ -0,0 +1,915 @@
+ from __future__ import annotations
+ 
+ import json
+ import os
+ import time
+ import uuid
+ from html import escape
+ 
+ import gradio as gr
+ 
+ from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
+ from src.executive_assistant.config import AppRuntimeConfig, OpenRouterConfig, load_env_file
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.runner import EpisodeRunner
+ from src.executive_assistant.training import QLearningPolicy, default_checkpoint_path
+ 
+ load_env_file(AppRuntimeConfig().env_file)
+ APP_RUNTIME = AppRuntimeConfig()
+ EMAIL_COLUMNS = ["id", "sender", "recipient", "subject", "body", "timestamp", "is_read", "is_archived"]
+ TODO_COLUMNS = ["id", "task_name", "deadline_date", "context"]
+ FILE_COLUMNS = ["id", "filename", "content_text"]
+ ACTION_LOG_COLUMNS = ["id", "action_type", "target_id", "payload", "secondary_payload", "status"]
+ TRACE_COLUMNS = ["step", "reasoning", "action_type", "status", "score", "done"]
+ APP_CSS = """
+ :root {
+     color-scheme: dark;
+     --ea-bg: #120f0c;
+     --ea-bg-soft: #1a1511;
+     --ea-panel: rgba(28, 22, 18, 0.88);
+     --ea-panel-strong: #241c17;
+     --ea-ink: #f5ede2;
+     --ea-muted: #b7a796;
+     --ea-border: rgba(236, 214, 188, 0.12);
+     --ea-border-strong: rgba(236, 214, 188, 0.24);
+     --ea-accent: #c97943;
+     --ea-accent-deep: #e1a16f;
+     --ea-highlight: #3a2a1f;
+     --ea-success: #72c79a;
+     --ea-danger: #ef8d76;
+     --ea-shadow: 0 24px 70px rgba(0, 0, 0, 0.34);
+ }
+ 
+ .gradio-container {
+     min-height: 100vh;
+     background:
+         radial-gradient(circle at top left, rgba(124, 73, 39, 0.22), transparent 24%),
+         radial-gradient(circle at 85% 10%, rgba(201, 121, 67, 0.16), transparent 22%),
+         linear-gradient(180deg, #17120f 0%, #0f0c0a 100%);
+     color: var(--ea-ink);
+     font-family: "Avenir Next", "Segoe UI", sans-serif;
+ }
+ 
+ .gradio-container .prose,
+ .gradio-container .gr-markdown,
+ .gradio-container .gr-button,
+ .gradio-container .gr-input,
+ .gradio-container .gr-box,
+ .gradio-container .gr-form,
+ .gradio-container .gr-panel {
+     color: var(--ea-ink);
+ }
+ 
+ .app-shell {
+     max-width: 1480px;
+     margin: 0 auto;
+     padding: 18px 18px 28px;
+ }
+ 
+ .hero {
+     background:
+         linear-gradient(140deg, rgba(33, 25, 20, 0.96), rgba(21, 17, 14, 0.96)),
+         linear-gradient(90deg, rgba(201, 121, 67, 0.12), transparent);
+     border: 1px solid var(--ea-border);
+     border-radius: 32px;
+     padding: 34px;
+     box-shadow: var(--ea-shadow);
+     margin-bottom: 20px;
+     position: relative;
+     overflow: hidden;
+ }
+ 
+ .hero::after {
+     content: "";
+     position: absolute;
+     inset: auto -10% -44% 34%;
+     height: 220px;
+     background: radial-gradient(circle, rgba(201, 121, 67, 0.18), transparent 62%);
+     pointer-events: none;
+ }
+ 
+ .hero-grid {
+     display: grid;
+     grid-template-columns: minmax(0, 1.7fr) minmax(280px, 0.95fr);
+     gap: 22px;
+     align-items: end;
+ }
+ 
+ .hero-kicker {
+     display: inline-flex;
+     align-items: center;
+     gap: 10px;
+     padding: 7px 12px;
+     border-radius: 999px;
+     background: rgba(201, 121, 67, 0.10);
+     border: 1px solid rgba(201, 121, 67, 0.18);
+     color: var(--ea-accent-deep);
+     font-size: 0.76rem;
+     letter-spacing: 0.14em;
+     text-transform: uppercase;
+     margin-bottom: 16px;
+ }
+ 
+ .hero-copy {
+     position: relative;
+     z-index: 1;
+ }
+ 
+ .hero h1 {
+     margin: 0 0 12px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: clamp(2.6rem, 5vw, 4.5rem);
+     line-height: 1.05;
+     letter-spacing: -0.05em;
+     max-width: 10ch;
+ }
+ 
+ .hero p {
+     margin: 0;
+     max-width: 760px;
+     color: var(--ea-muted);
+     font-size: 1.02rem;
+     line-height: 1.65;
+ }
+ 
+ .hero-strip {
+     display: flex;
+     gap: 12px;
+     flex-wrap: wrap;
+     margin-top: 22px;
+ }
+ 
+ .hero-pill {
+     background: rgba(255, 255, 255, 0.05);
+     color: var(--ea-ink);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 999px;
+     padding: 10px 14px;
+     font-size: 0.84rem;
+     backdrop-filter: blur(12px);
+ }
+ 
+ .hero-aside {
+     position: relative;
+     z-index: 1;
+     background: rgba(255, 255, 255, 0.04);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 24px;
+     padding: 20px;
+     backdrop-filter: blur(12px);
+ }
+ 
+ .hero-aside-label {
+     margin: 0 0 10px;
+     color: var(--ea-accent-deep);
+     font-size: 0.8rem;
+     letter-spacing: 0.14em;
+     text-transform: uppercase;
+ }
+ 
+ .hero-aside-value {
+     margin: 0 0 14px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.6rem;
+     line-height: 1.05;
+ }
+ 
+ .hero-aside-copy {
+     margin: 0;
+     color: var(--ea-muted);
+     line-height: 1.6;
+ }
+ 
+ .panel-card,
+ .status-card {
+     background: var(--ea-panel);
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     box-shadow: var(--ea-shadow);
+     backdrop-filter: blur(10px);
+ }
+ 
+ .panel-card {
+     padding: 18px;
+ }
+ 
+ .status-card {
+     padding: 22px 22px 18px;
+ }
+ 
+ .panel-title {
+     margin: 0 0 6px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.5rem;
+     letter-spacing: -0.03em;
+ }
+ 
+ .panel-copy {
+     margin: 0 0 16px;
+     color: var(--ea-muted);
+     line-height: 1.55;
+ }
+ 
+ .surface-card {
+     background: rgba(23, 18, 14, 0.84);
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     box-shadow: var(--ea-shadow);
+     overflow: hidden;
+ }
+ 
+ .surface-card .gr-tab-nav {
+     background: rgba(255, 255, 255, 0.03);
+     padding: 10px 10px 0;
+     border-bottom: 1px solid var(--ea-border);
+ }
+ 
+ .surface-card .gr-tab-nav button {
+     border-radius: 16px 16px 0 0;
+     border: 1px solid transparent;
+     color: var(--ea-muted);
+     font-weight: 600;
+ }
+ 
+ .surface-card .gr-tab-nav button.selected {
+     background: var(--ea-panel-strong);
+     color: var(--ea-ink);
+     border-color: var(--ea-border);
+ }
+ 
+ .surface-card .gr-tabitem {
+     padding: 18px;
+ }
+ 
+ .status-topline {
+     display: flex;
+     align-items: center;
+     justify-content: space-between;
+     gap: 14px;
+     margin-bottom: 12px;
+ }
+ 
+ .status-title {
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.7rem;
+     letter-spacing: -0.04em;
+ }
+ 
+ .status-badge {
+     display: inline-flex;
+     align-items: center;
+     border-radius: 999px;
+     padding: 8px 13px;
+     font-size: 0.78rem;
+     text-transform: uppercase;
+     letter-spacing: 0.12em;
+     border: 1px solid transparent;
+     background: rgba(201, 121, 67, 0.10);
+ }
+ 
+ .status-badge.running,
+ .status-badge.initialized {
+     border-color: rgba(180, 95, 45, 0.18);
+     color: var(--ea-accent-deep);
+ }
+ 
+ .status-badge.completed.success {
+     background: rgba(45, 122, 88, 0.10);
+     border-color: rgba(45, 122, 88, 0.18);
+     color: var(--ea-success);
+ }
+ 
+ .status-badge.completed.failure {
+     background: rgba(178, 76, 56, 0.10);
+     border-color: rgba(178, 76, 56, 0.16);
+     color: var(--ea-danger);
+ }
+ 
+ .metric-grid {
+     display: grid;
+     grid-template-columns: repeat(4, minmax(0, 1fr));
+     gap: 12px;
+     margin-bottom: 12px;
+ }
+ 
+ .metric {
+     background: rgba(255, 255, 255, 0.04);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 18px;
+     padding: 14px;
+ }
+ 
+ .metric-label {
+     color: var(--ea-muted);
+     font-size: 0.72rem;
+     text-transform: uppercase;
+     letter-spacing: 0.11em;
+     margin-bottom: 7px;
+ }
+ 
+ .metric-value {
+     font-size: 1rem;
+     line-height: 1.25;
+ }
+ 
+ .status-reason {
+     background: rgba(201, 121, 67, 0.08);
+     border: 1px solid rgba(236, 214, 188, 0.08);
+     border-radius: 18px;
+     padding: 14px 15px;
+     color: var(--ea-muted);
+     line-height: 1.55;
+ }
+ 
+ .scenario-brief {
+     background: linear-gradient(180deg, rgba(32, 25, 20, 0.92), rgba(22, 18, 14, 0.94));
+     border: 1px solid var(--ea-border);
+     border-radius: 24px;
+     padding: 22px;
+     color: var(--ea-ink);
+     box-shadow: var(--ea-shadow);
+ }
+ 
+ .scenario-brief h3 {
+     margin: 0 0 10px;
+     font-family: "Baskerville", "Times New Roman", serif;
+     font-size: 1.5rem;
+     letter-spacing: -0.03em;
+ }
+ 
+ .scenario-brief p {
+     margin: 0 0 14px;
+     color: var(--ea-muted);
+     line-height: 1.6;
+ }
+ 
+ .scenario-brief ul {
+     margin: 0;
+     padding-left: 18px;
+     color: var(--ea-ink);
+ }
+ 
+ .scenario-brief li {
+     margin-bottom: 8px;
+     line-height: 1.5;
+ }
+ 
+ .panel-card .gr-form,
+ .panel-card .gr-box,
+ .panel-card .gr-group {
+     border: 0;
+     background: transparent;
+     box-shadow: none;
+ }
+ 
+ .panel-card .gr-button,
+ .gradio-container .gr-button {
+     min-height: 48px;
+     border-radius: 999px;
+     font-weight: 700;
+     letter-spacing: 0.02em;
+ }
+ 
+ .gradio-container button.primary {
+     background: linear-gradient(135deg, var(--ea-accent) 0%, var(--ea-accent-deep) 100%);
+     border: 0;
+     box-shadow: 0 14px 30px rgba(138, 62, 23, 0.18);
+ }
+ 
+ .gradio-container button.secondary {
+     background: rgba(255, 255, 255, 0.05);
+     border: 1px solid var(--ea-border-strong);
+     color: var(--ea-ink);
+ }
+ 
+ .gradio-container label,
+ .gradio-container .gr-block-label,
+ .gradio-container .gr-form > label {
+     color: var(--ea-muted);
+     font-size: 0.76rem;
+     text-transform: uppercase;
+     letter-spacing: 0.12em;
+ }
+ 
+ .gradio-container input,
+ .gradio-container textarea,
+ .gradio-container select {
+     background: rgba(255, 255, 255, 0.05) !important;
+     border: 1px solid rgba(236, 214, 188, 0.12) !important;
+     border-radius: 16px !important;
+     color: var(--ea-ink) !important;
+ }
+ 
+ .gradio-container .gr-accordion,
+ .gradio-container .gr-panel,
+ .gradio-container .gr-box,
+ .gradio-container .block {
+     border-color: var(--ea-border) !important;
+ }
+ 
+ .workspace-grid .gr-dataframe,
+ .workspace-grid .gr-code,
+ .workspace-grid .gr-box,
+ .workspace-grid .gr-panel {
+     border-radius: 20px !important;
+     overflow: hidden;
+ }
+ 
+ .workspace-grid .gr-code,
+ .workspace-grid .gr-dataframe {
+     box-shadow: inset 0 0 0 1px rgba(58, 43, 28, 0.06);
+ }
+ 
+ .workspace-grid table {
+     font-size: 0.92rem;
+ }
+ 
+ .footnote {
+     margin-top: 14px;
+     color: var(--ea-muted);
+     font-size: 0.85rem;
+     line-height: 1.6;
+ }
+ 
+ @media (max-width: 1120px) {
+     .hero-grid {
+         grid-template-columns: 1fr;
+     }
+ }
+ 
+ @media (max-width: 980px) {
+     .metric-grid {
+         grid-template-columns: repeat(2, minmax(0, 1fr));
+     }
+ }
+ 
+ @media (max-width: 640px) {
+     .hero {
+         padding: 24px 18px;
+     }
+ 
+     .metric-grid {
+         grid-template-columns: 1fr;
+     }
+ 
+     .app-shell {
+         padding: 12px 12px 20px;
+     }
+ }
+ """
+ SCENARIO_GUIDANCE = {
+     "easy_deadline_extraction": {
+         "title": "Deadline Extraction",
+         "description": "Read the professor email, capture the three exact milestones as todos, then archive the source email once the list is complete.",
+         "checks": [
+             "Read the source email before creating todos.",
+             "Create exactly three canonical todos with ISO dates.",
+             "Archive the email only after all deadlines are captured.",
+         ],
+     },
+     "medium_triage_and_negotiation": {
+         "title": "Inbox Triage And Negotiation",
+         "description": "Clear low-value newsletters, escalate the client complaint to the manager, and send a concrete meeting time to the teammate without archiving unresolved important mail too early.",
+         "checks": [
+             "Archive all three newsletters.",
+             "Forward the client complaint to manager@company.com.",
+             "Reply to the teammate with a specific meeting time.",
+         ],
478
+ },
479
+ "hard_rag_reply": {
480
+ "title": "RAG Reply",
481
+ "description": "Read the stakeholder request, search the local report store, and reply with the exact Q3 metrics from the matching file.",
482
+ "checks": [
483
+ "Read the VIP email first.",
484
+ "Search for the Q3 architecture report before replying.",
485
+ "Reply with 99.95%, 182ms, and 14% plus a greeting and signoff.",
486
+ ],
487
+ },
488
+ }
489
+
490
+
491
+ def _records_to_rows(records: list[dict], columns: list[str]) -> list[list[object]]:
492
+ return [[record.get(column) for column in columns] for record in records]
493
+
494
+
495
+ def render_scenario_brief(task_name: str) -> str:
496
+ guidance = SCENARIO_GUIDANCE[task_name]
497
+ checks = "".join(f"<li>{escape(item)}</li>" for item in guidance["checks"])
498
+ return (
499
+ '<div class="scenario-brief">'
500
+ f"<h3>{escape(guidance['title'])}</h3>"
501
+ f"<p>{escape(guidance['description'])}</p>"
502
+ f"<ul>{checks}</ul>"
503
+ "</div>"
504
+ )
505
+
506
+
507
+ def render_status_card(summary_payload: dict) -> str:
508
+ status = str(summary_payload["status"])
509
+ completed = bool(summary_payload["completed"])
510
+ badge_class = f"status-badge {status} {'success' if completed else 'failure'}".strip()
511
+ return (
512
+ '<div class="status-card">'
513
+ '<div class="status-topline">'
514
+ f'<div class="status-title">Run {escape(str(summary_payload["run_id"]))}</div>'
515
+ f'<div class="{badge_class}">{escape(status)}</div>'
516
+ "</div>"
517
+ '<div class="metric-grid">'
518
+ f'<div class="metric"><div class="metric-label">Requested Provider</div><div class="metric-value">{escape(str(summary_payload["requested_provider"]))}</div></div>'
519
+ f'<div class="metric"><div class="metric-label">Effective Policy</div><div class="metric-value">{escape(str(summary_payload["policy_name"]))}</div></div>'
520
+ f'<div class="metric"><div class="metric-label">Scenario</div><div class="metric-value">{escape(str(summary_payload["task_name"]))}</div></div>'
521
+ f'<div class="metric"><div class="metric-label">Final Score</div><div class="metric-value">{summary_payload["final_score"]:.2f}</div></div>'
522
+ "</div>"
523
+ '<div class="metric-grid">'
524
+ f'<div class="metric"><div class="metric-label">Model</div><div class="metric-value">{escape(str(summary_payload["model_name"] or "n/a"))}</div></div>'
525
+ f'<div class="metric"><div class="metric-label">Checkpoint</div><div class="metric-value">{escape(str(summary_payload["checkpoint_path"] or "n/a"))}</div></div>'
526
+ f'<div class="metric"><div class="metric-label">Completed</div><div class="metric-value">{escape(str(completed))}</div></div>'
527
+ f'<div class="metric"><div class="metric-label">Status</div><div class="metric-value">{escape(status)}</div></div>'
528
+ "</div>"
529
+ f'<div class="status-reason">{escape(str(summary_payload["termination_reason"]))}</div>'
530
+ "</div>"
531
+ )
532
+
533
+
534
+ def build_snapshot(task_name: str) -> tuple[str, list[list[object]], list[list[object]], list[list[object]], list[list[object]]]:
535
+ env = ExecutiveAssistantEnv(task_name=task_name)
536
+ observation = env.reset()
537
+ snapshot = env.workspace.snapshot()
538
+ return (
539
+ json.dumps(observation.model_dump(), indent=2),
540
+ _records_to_rows(snapshot["emails"], EMAIL_COLUMNS),
541
+ _records_to_rows(snapshot["todos"], TODO_COLUMNS),
542
+ _records_to_rows(snapshot["files"], FILE_COLUMNS),
543
+ _records_to_rows(snapshot["action_log"], ACTION_LOG_COLUMNS),
544
+ )
545
+
546
+
547
+ def _default_rl_checkpoint() -> str:
548
+ return str(
549
+ default_checkpoint_path(
550
+ APP_RUNTIME.checkpoint_dir,
551
+ APP_RUNTIME.default_checkpoint_name,
552
+ )
553
+ )
554
+
555
+
556
+ def _build_policy(
557
+ provider: str,
558
+ model_name: str,
559
+ api_key: str,
560
+ checkpoint_path: str,
561
+ ) -> object:
562
+ if provider == "baseline":
563
+ return BaselineAgent()
564
+ if provider == "rl":
565
+ return QLearningPolicy.load(checkpoint_path or _default_rl_checkpoint())
566
+ env_api_key = api_key or os.environ.get("OPENROUTER_API_KEY", "")
567
+ config = OpenRouterConfig(
568
+ api_key=env_api_key,
569
+ model_name=model_name,
570
+ site_url=os.environ.get("OPENROUTER_SITE_URL", "http://localhost:7860"),
571
+ app_name=os.environ.get(
572
+ "OPENROUTER_APP_NAME",
573
+ "Autonomous Executive Assistant Sandbox",
574
+ ),
575
+ )
576
+ return OpenRouterPolicy(config=config)
577
+
578
+
579
+ def _trace_to_rows(trace: object) -> list[dict]:
580
+ return [
581
+ {
582
+ "step": step.step_index,
583
+ "reasoning": step.reasoning,
584
+ "action_type": step.action["action_type"],
585
+ "status": step.status,
586
+ "score": step.reward["total_score"],
587
+ "done": step.reward["is_done"],
588
+ }
589
+ for step in trace.steps
590
+ ]
591
+
592
+
593
+ def _summary_payload(
594
+ *,
595
+ run_id: str,
596
+ task_name: str,
597
+ provider: str,
598
+ policy_name: str,
599
+ model_name: str,
600
+ checkpoint_path: str,
601
+ status: str,
602
+ final_score: float,
603
+ completed: bool,
604
+ termination_reason: str,
605
+ ) -> dict[str, object]:
606
+ return {
607
+ "run_id": run_id,
608
+ "task_name": task_name,
609
+ "requested_provider": provider,
610
+ "policy_name": policy_name,
611
+ "model_name": model_name if provider == "openrouter" else None,
612
+ "checkpoint_path": checkpoint_path if provider == "rl" else None,
613
+ "status": status,
614
+ "final_score": final_score,
615
+ "completed": completed,
616
+ "termination_reason": termination_reason,
617
+ }
618
+
619
+
620
+ def _step_payload(
621
+ observation_payload: dict,
622
+ snapshot_payload: dict,
623
+ trace_rows: list[dict],
624
+ summary_payload: dict,
625
+ ) -> tuple[str, str, list[list[object]], list[list[object]], list[list[object]], list[list[object]], list[list[object]], str]:
626
+ return (
627
+ json.dumps(observation_payload, indent=2),
628
+ render_status_card(summary_payload),
629
+ _records_to_rows(snapshot_payload["emails"], EMAIL_COLUMNS),
630
+ _records_to_rows(snapshot_payload["todos"], TODO_COLUMNS),
631
+ _records_to_rows(snapshot_payload["files"], FILE_COLUMNS),
632
+ _records_to_rows(snapshot_payload["action_log"], ACTION_LOG_COLUMNS),
633
+ _records_to_rows(trace_rows, TRACE_COLUMNS),
634
+ json.dumps(summary_payload, indent=2),
635
+ )
636
+
637
+
638
+ def configure_provider_inputs(provider: str) -> tuple[dict, dict, dict]:
639
+ is_openrouter = provider == "openrouter"
640
+ is_rl = provider == "rl"
641
+ return (
642
+ gr.update(visible=is_openrouter, interactive=is_openrouter),
643
+ gr.update(visible=is_openrouter, interactive=is_openrouter),
644
+ gr.update(visible=is_rl, interactive=is_rl),
645
+ )
646
+
647
+
648
+ def build_initial_status(task_name: str, provider: str, model_name: str, checkpoint_path: str) -> str:
649
+ return render_status_card(
650
+ _summary_payload(
651
+ run_id="pending",
652
+ task_name=task_name,
653
+ provider=provider,
654
+ policy_name="not started",
655
+ model_name=model_name,
656
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
657
+ status="initialized",
658
+ final_score=0.0,
659
+ completed=False,
660
+ termination_reason="Choose a policy and start an episode.",
661
+ )
662
+ )
663
+
664
+
665
+ def run_live_episode(
666
+ task_name: str,
667
+ provider: str,
668
+ model_name: str,
669
+ api_key: str,
670
+ max_steps: int,
671
+ checkpoint_path: str,
672
+ ):
673
+ run_id = uuid.uuid4().hex[:8]
674
+ runner = EpisodeRunner(
675
+ policy=_build_policy(
676
+ provider=provider,
677
+ model_name=model_name,
678
+ api_key=api_key,
679
+ checkpoint_path=checkpoint_path,
680
+ ),
681
+ max_steps=max_steps,
682
+ )
683
+ env, observation = runner.initialize(task_name)
684
+ trace_rows: list[dict] = []
685
+
686
+ initial_snapshot = env.workspace.snapshot()
687
+ yield _step_payload(
688
+ observation_payload=observation.model_dump(),
689
+ snapshot_payload=initial_snapshot,
690
+ trace_rows=trace_rows,
691
+ summary_payload=_summary_payload(
692
+ run_id=run_id,
693
+ task_name=task_name,
694
+ provider=provider,
695
+ policy_name=type(runner.policy).__name__,
696
+ model_name=model_name,
697
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
698
+ status="initialized",
699
+ final_score=0.0,
700
+ completed=False,
701
+ termination_reason="episode not started",
702
+ ),
703
+ )
704
+
705
+ while True:
706
+ _, observation, reward, record = runner.advance(task_name, env, observation)
707
+ trace_rows.append(
708
+ {
709
+ "step": record.step_index,
710
+ "reasoning": record.reasoning,
711
+ "action_type": record.action["action_type"],
712
+ "status": record.status,
713
+ "score": record.reward["total_score"],
714
+ "done": record.reward["is_done"],
715
+ }
716
+ )
717
+ yield _step_payload(
718
+ observation_payload=record.observation,
719
+ snapshot_payload=record.snapshot,
720
+ trace_rows=trace_rows,
721
+ summary_payload=_summary_payload(
722
+ run_id=run_id,
723
+ task_name=task_name,
724
+ provider=provider,
725
+ policy_name=type(runner.policy).__name__,
726
+ model_name=model_name,
727
+ checkpoint_path=checkpoint_path or _default_rl_checkpoint(),
728
+ status="running" if not reward.is_done else "completed",
729
+ final_score=reward.total_score,
730
+ completed=reward.total_score >= 1.0,
731
+ termination_reason=reward.reasoning,
732
+ ),
733
+ )
734
+ if reward.is_done:
735
+ return
736
+ time.sleep(0.15)
737
+
738
+
739
+ with gr.Blocks(title="Autonomous Executive Assistant Sandbox", css=APP_CSS) as demo:
740
+ with gr.Column(elem_classes=["app-shell"]):
741
+ gr.HTML(
742
+ """
743
+ <section class="hero">
744
+ <div class="hero-grid">
745
+ <div class="hero-copy">
746
+ <div class="hero-kicker">Deterministic Eval Console</div>
747
+ <h1>Executive Assistant Sandbox</h1>
748
+ <p>
749
+ Run the exact same episode loop used in training, inspect each workspace mutation in real time,
750
+ and compare baseline, RL, and OpenRouter-backed policies without losing the structure of the task.
751
+ </p>
752
+ <div class="hero-strip">
753
+ <div class="hero-pill">Shared EpisodeRunner path</div>
754
+ <div class="hero-pill">Seeded scenarios with visible state</div>
755
+ <div class="hero-pill">Policy debugging without notebook sprawl</div>
756
+ </div>
757
+ </div>
758
+ <aside class="hero-aside">
759
+ <p class="hero-aside-label">What This UI Optimizes For</p>
760
+ <p class="hero-aside-value">Fast policy comparison with readable state.</p>
761
+ <p class="hero-aside-copy">
762
+ The interface is intentionally light, structured, and editorial rather than “chat app” themed.
763
+ Controls stay compact while the workspace and trace remain the visual priority.
764
+ </p>
765
+ </aside>
766
+ </div>
767
+ </section>
768
+ """
769
+ )
770
+
771
+ with gr.Row(equal_height=True):
772
+ with gr.Column(scale=4):
773
+ with gr.Group(elem_classes=["panel-card"]):
774
+ gr.HTML(
775
+ """
776
+ <h2 class="panel-title">Control Room</h2>
777
+ <p class="panel-copy">
778
+ Pick a scenario, choose a policy provider, and run a stepwise episode against the same environment used by training and evaluation.
779
+ </p>
780
+ """
781
+ )
782
+ task = gr.Dropdown(
783
+ choices=[
784
+ "easy_deadline_extraction",
785
+ "medium_triage_and_negotiation",
786
+ "hard_rag_reply",
787
+ ],
788
+ value="easy_deadline_extraction",
789
+ label="Scenario",
790
+ )
791
+ provider = gr.Dropdown(
792
+ choices=["baseline", "openrouter", "rl"],
793
+ value="baseline",
794
+ label="Policy",
795
+ )
796
+ max_steps = gr.Number(value=12, precision=0, label="Max Steps")
797
+ with gr.Accordion("Provider Settings", open=False):
798
+ model_name = gr.Textbox(
799
+ value="google/gemma-4-31b-it",
800
+ label="OpenRouter Model",
801
+ )
802
+ checkpoint_path = gr.Textbox(
803
+ value=_default_rl_checkpoint(),
804
+ label="RL Checkpoint Path",
805
+ )
806
+ api_key = gr.Textbox(
807
+ type="password",
808
+ label="OPENROUTER_API_KEY",
809
+ )
810
+ with gr.Row():
811
+ reset = gr.Button("Reset Scenario", variant="secondary")
812
+ run_episode_btn = gr.Button("Run Episode", variant="primary")
813
+ gr.HTML(
814
+ """
815
+ <p class="footnote">
816
+ OpenRouter inputs appear only when needed. RL checkpoint selection stays available for policy replay without changing the execution path.
817
+ </p>
818
+ """
819
+ )
820
+ with gr.Column(scale=5):
821
+ scenario_brief = gr.HTML(render_scenario_brief("easy_deadline_extraction"))
822
+ status_card = gr.HTML(
823
+ build_initial_status(
824
+ "easy_deadline_extraction",
825
+ "baseline",
826
+ "google/gemma-4-31b-it",
827
+ _default_rl_checkpoint(),
828
+ )
829
+ )
830
+
831
+ with gr.Group(elem_classes=["surface-card", "workspace-grid"]):
832
+ with gr.Tabs():
833
+ with gr.Tab("Live Workspace"):
834
+ with gr.Row():
835
+ observation = gr.Code(label="Observation", language="json")
836
+ summary = gr.Code(label="Run Summary", language="json")
837
+ with gr.Row():
838
+ emails = gr.Dataframe(headers=EMAIL_COLUMNS, label="Unread Emails")
839
+ todos = gr.Dataframe(headers=TODO_COLUMNS, label="Todos")
840
+ with gr.Row():
841
+ files = gr.Dataframe(headers=FILE_COLUMNS, label="Search Results")
842
+ action_log = gr.Dataframe(headers=ACTION_LOG_COLUMNS, label="Action Log")
843
+ with gr.Tab("Episode Trace"):
844
+ trace_table = gr.Dataframe(headers=TRACE_COLUMNS, label="Episode Trace")
845
+
846
+ reset.click(
847
+ fn=build_snapshot,
848
+ inputs=[task],
849
+ outputs=[observation, emails, todos, files, action_log],
850
+ )
851
+ reset.click(
852
+ fn=render_scenario_brief,
853
+ inputs=[task],
854
+ outputs=[scenario_brief],
855
+ )
856
+ reset.click(
857
+ fn=build_initial_status,
858
+ inputs=[task, provider, model_name, checkpoint_path],
859
+ outputs=[status_card],
860
+ )
861
+ provider.change(
862
+ fn=configure_provider_inputs,
863
+ inputs=[provider],
864
+ outputs=[model_name, api_key, checkpoint_path],
865
+ )
866
+ provider.change(
867
+ fn=build_initial_status,
868
+ inputs=[task, provider, model_name, checkpoint_path],
869
+ outputs=[status_card],
870
+ )
871
+ task.change(
872
+ fn=render_scenario_brief,
873
+ inputs=[task],
874
+ outputs=[scenario_brief],
875
+ )
876
+ task.change(
877
+ fn=build_initial_status,
878
+ inputs=[task, provider, model_name, checkpoint_path],
879
+ outputs=[status_card],
880
+ )
881
+ run_episode_btn.click(
882
+ fn=run_live_episode,
883
+ inputs=[task, provider, model_name, api_key, max_steps, checkpoint_path],
884
+ outputs=[observation, status_card, emails, todos, files, action_log, trace_table, summary],
885
+ )
886
+
887
+ demo.load(
888
+ fn=build_snapshot,
889
+ inputs=[task],
890
+ outputs=[observation, emails, todos, files, action_log],
891
+ )
892
+ demo.load(
893
+ fn=configure_provider_inputs,
894
+ inputs=[provider],
895
+ outputs=[model_name, api_key, checkpoint_path],
896
+ )
897
+ demo.load(
898
+ fn=render_scenario_brief,
899
+ inputs=[task],
900
+ outputs=[scenario_brief],
901
+ )
902
+ demo.load(
903
+ fn=build_initial_status,
904
+ inputs=[task, provider, model_name, checkpoint_path],
905
+ outputs=[status_card],
906
+ )
907
+
908
+
909
+ if __name__ == "__main__":
910
+ demo.launch(
911
+ server_name=APP_RUNTIME.host,
912
+ server_port=APP_RUNTIME.port,
913
+ show_error=True,
915
+ )
docs/HF_SPACE_README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ title: Project Epsilon | Executive Assistant Sandbox
3
+ emoji: "🧭"
4
+ colorFrom: yellow
5
+ colorTo: gray
6
+ sdk: docker
7
+ app_port: 7860
8
+ pinned: false
9
+ short_description: OpenEnv executive assistant sandbox demo for judges.
10
+ ---
11
+
12
+ # Project Epsilon
13
+
14
+ Standalone Hugging Face Space README for the **Autonomous Executive Assistant Sandbox**, prepared for the **OpenEnv Scaler x Meta x PyTorch Hack**.
15
+
16
+ ## Team
17
+
18
+ - Team name: `Project Epsilon`
19
+ - Hugging Face usernames: `@HF_USERNAME_1`, `@HF_USERNAME_2`, `@HF_USERNAME_3`
20
+ - Space repo: `HF_USERNAME_PLACEHOLDER/project-epsilon-executive-assistant`
21
+
22
+ Replace the placeholder usernames and repo owner when the final team accounts are ready.
23
+
24
+ ## What This Space Shows
25
+
26
+ - Deterministic OpenEnv-style tasks over a SQLite-backed executive assistant workspace
27
+ - A Gradio judge console that replays the shared `EpisodeRunner` loop step by step
28
+ - Policy switching across `baseline`, bundled `rl`, and optional `openrouter`
29
+ - Visible inbox, todo, file-search, and action-log state transitions
30
+
31
+ ## Hack Context
32
+
33
+ OpenEnv was introduced by Hugging Face and Meta as an open source framework for typed agent environments. The Scaler hack dashboard lists the build window as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space is tuned for that style of evaluation: deterministic tasks, structured actions, reproducible runs, and a judge-friendly visual trace.
34
+
35
+ ## Runtime Notes
36
+
37
+ - SDK: `docker`
38
+ - App port: `7860`
39
+ - Entry point: `python app.py`
40
+ - Optional secret: `OPENROUTER_API_KEY`
41
+ - Bundled RL checkpoint path: `artifacts/checkpoints/q_policy_notebook.json`
42
+
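Only the secret in the list above is optional; a minimal sketch of the gating the app applies, assuming the same variable names (the helper `resolve_openrouter` and its return shape are illustrative, not part of the codebase):

```python
def resolve_openrouter(env: dict) -> tuple[bool, str]:
    """Decide whether the live OpenRouter policy can be enabled.

    Mirrors the fallback logic in `_build_policy` in app.py: the API key
    is optional, and the site URL defaults to the local app port.
    """
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    site_url = env.get("OPENROUTER_SITE_URL", "http://localhost:7860")
    return bool(api_key), site_url
```

With no secret set, the Space simply falls back to the `baseline` and `rl` policies.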
43
+ ## Judge Flow
44
+
45
+ 1. Open the Space and choose one of the seeded scenarios.
46
+ 2. Run `baseline` first for the reference trace.
47
+ 3. Switch to `rl` to replay the trained checkpoint bundled with the Space.
48
+ 4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed policy.
49
+
50
+ ## References
51
+
52
+ - Hack dashboard: https://www.scaler.com/openenv-hackathon
53
+ - OpenEnv launch: https://huggingface.co/blog/openenv
openenv.yaml ADDED
@@ -0,0 +1,10 @@
1
+ name: autonomous-executive-assistant-sandbox
2
+ description: Deterministic executive assistant environment backed by an in-memory SQLite workspace.
3
+ entrypoint: src.executive_assistant.env:ExecutiveAssistantEnv
4
+ observation_model: src.executive_assistant.models:WorkspaceObservation
5
+ action_model: src.executive_assistant.models:AssistantAction
6
+ reward_model: src.executive_assistant.models:TaskReward
7
+ tasks:
8
+ - easy_deadline_extraction
9
+ - medium_triage_and_negotiation
10
+ - hard_rag_reply
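The `entrypoint`, `observation_model`, `action_model`, and `reward_model` fields all use `module:attribute` strings; a stdlib-only sketch of splitting one such spec (the helper name is illustrative, and how the framework itself imports the result is assumed, not shown here):

```python
def split_entrypoint(spec: str) -> tuple[str, str]:
    """Split a 'pkg.module:Attr' spec into (module path, attribute name)."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"malformed entrypoint spec: {spec!r}")
    return module_path, attr
```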
pytest.ini ADDED
@@ -0,0 +1,2 @@
1
+ [pytest]
2
+ pythonpath = .
requirements.app.txt ADDED
@@ -0,0 +1 @@
1
+ -r requirements.txt
requirements.training.txt ADDED
@@ -0,0 +1,6 @@
1
+ -r requirements.txt
2
+ huggingface_hub>=0.31.0
3
+ jupyterlab>=4.2.0
4
+ ipykernel>=6.29.0
5
+ pandas>=2.2.0
6
+ matplotlib>=3.9.0
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ gradio>=5.0.0
2
+ openai>=1.76.0
3
+ pydantic>=2.8.0
4
+ pytest>=8.0.0
5
+ PyYAML>=6.0.0
run.py ADDED
@@ -0,0 +1,29 @@
1
+ from src.executive_assistant.env import ExecutiveAssistantEnv
2
+ from src.executive_assistant.agent import BaselineAgent
3
+
4
+ # Create env
5
+ env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
6
+
7
+ # Create agent
8
+ agent = BaselineAgent()
9
+
10
+ # Reset env
11
+ obs = env.reset()
12
+
13
+ print("STARTING...\n")
14
+
15
+ # Run loop
16
+ for step in range(10):
17
+ decision = agent.choose_action(env.task_name, obs)
18
+
19
+ print(f"\nSTEP {step+1}")
20
+ print("Reasoning:", decision.reasoning)
21
+ print("Action:", decision.action)
22
+
23
+ obs, reward = env.step(decision.action)
24
+
25
+ print("Reward:", reward)
26
+
27
+ if reward.is_done:
28
+ print("\nTASK COMPLETE ✅")
29
+ break
scripts/deploy_hf_space.py ADDED
@@ -0,0 +1,178 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import os
5
+ import shutil
6
+ import sys
7
+ import tempfile
8
+ from pathlib import Path
9
+
10
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
11
+ if str(PROJECT_ROOT) not in sys.path:
12
+ sys.path.insert(0, str(PROJECT_ROOT))
13
+
14
+ from src.executive_assistant.deployment import (
15
+ DEFAULT_SPACE_TITLE,
16
+ HFSpaceDeployConfig,
17
+ parse_hf_usernames,
18
+ stage_space_bundle,
19
+ )
20
+
21
+
22
+ def build_parser() -> argparse.ArgumentParser:
23
+ parser = argparse.ArgumentParser(
24
+ description="Create or update a Hugging Face Space from this repository in one command."
25
+ )
26
+ parser.add_argument(
27
+ "--repo-id",
28
+ default=os.environ.get("HF_SPACE_REPO", "").strip(),
29
+ help="Target Space repo in owner/name form. Defaults to HF_SPACE_REPO.",
30
+ )
31
+ parser.add_argument(
32
+ "--token",
33
+ default=os.environ.get("HF_TOKEN", "").strip(),
34
+ help="Hugging Face token. Defaults to HF_TOKEN.",
35
+ )
36
+ parser.add_argument(
37
+ "--title",
38
+ default=os.environ.get("HF_SPACE_TITLE", DEFAULT_SPACE_TITLE),
39
+ help="Space title used in the generated HF README.",
40
+ )
41
+ parser.add_argument(
42
+ "--team-name",
43
+ default=os.environ.get("HF_SPACE_TEAM_NAME", "Project Epsilon"),
44
+ help="Team name shown in the generated HF README.",
45
+ )
46
+ parser.add_argument(
47
+ "--hf-usernames",
48
+ default=os.environ.get(
49
+ "HF_SPACE_TEAM_USERNAMES",
50
+ "HF_USERNAME_1,HF_USERNAME_2,HF_USERNAME_3",
51
+ ),
52
+ help="Comma-separated HF usernames for the HF README placeholders.",
53
+ )
54
+ parser.add_argument(
55
+ "--checkpoint-name",
56
+ default=os.environ.get("HF_SPACE_CHECKPOINT_NAME", "q_policy_notebook.json"),
57
+ help="Checkpoint filename staged into artifacts/checkpoints/ for RL replay.",
58
+ )
59
+ parser.add_argument(
60
+ "--openrouter-api-key",
61
+ default=os.environ.get("OPENROUTER_API_KEY", "").strip(),
62
+ help="Optional secret to set on the Space during deployment.",
63
+ )
64
+ parser.add_argument(
65
+ "--private",
66
+ action="store_true",
67
+ default=os.environ.get("HF_SPACE_PRIVATE", "").strip().lower() == "true",
68
+ help="Create or keep the Space private.",
69
+ )
70
+ parser.add_argument(
71
+ "--skip-checkpoint",
72
+ action="store_true",
73
+ help="Skip bundling the RL checkpoint.",
74
+ )
75
+ parser.add_argument(
76
+ "--keep-stage-dir",
77
+ default="",
78
+ help="Optional local folder where the prepared Space bundle should be copied after upload.",
79
+ )
80
+ return parser
81
+
82
+
83
+ def require_huggingface_hub():
84
+ try:
85
+ from huggingface_hub import HfApi # type: ignore
86
+ except ImportError as exc:
87
+ raise SystemExit(
88
+ "huggingface_hub is required for deployment. Install the training environment "
89
+ "or run `python -m pip install huggingface_hub` first."
90
+ ) from exc
91
+ return HfApi
92
+
93
+
94
+ def maybe_set_space_secret(api, repo_id: str, key: str, value: str) -> str:
95
+ if not value.strip():
96
+ return f"Skipped secret {key} because no value was provided."
97
+ add_secret = getattr(api, "add_space_secret", None)
98
+ if add_secret is None:
99
+ return f"Upload succeeded, but this huggingface_hub version cannot set {key} automatically."
100
+ add_secret(repo_id=repo_id, key=key, value=value)
101
+ return f"Set Space secret {key}."
102
+
103
+
104
+ def maybe_set_space_variable(api, repo_id: str, key: str, value: str) -> str:
105
+ add_variable = getattr(api, "add_space_variable", None)
106
+ if add_variable is None:
107
+ return f"Upload succeeded, but this huggingface_hub version cannot set variable {key} automatically."
108
+ add_variable(repo_id=repo_id, key=key, value=value)
109
+ return f"Set Space variable {key}={value}."
110
+
111
+
112
+ def main() -> int:
113
+ parser = build_parser()
114
+ args = parser.parse_args()
115
+
116
+ if not args.repo_id:
117
+ parser.error("A Space repo id is required. Pass --repo-id or set HF_SPACE_REPO.")
118
+ if "/" not in args.repo_id:
119
+ parser.error("Space repo id must be in owner/name form.")
120
+ if not args.token:
121
+ parser.error("A Hugging Face token is required. Pass --token or set HF_TOKEN.")
122
+
123
+ config = HFSpaceDeployConfig(
124
+ repo_id=args.repo_id,
125
+ title=args.title,
126
+ team_name=args.team_name,
127
+ hf_usernames=parse_hf_usernames(args.hf_usernames),
128
+ checkpoint_name=args.checkpoint_name,
129
+ private=args.private,
130
+ include_checkpoint=not args.skip_checkpoint,
131
+ )
132
+
133
+ HfApi = require_huggingface_hub()
134
+ api = HfApi(token=args.token)
135
+
136
+ with tempfile.TemporaryDirectory(prefix="hf-space-stage-") as tmp_dir:
137
+ stage_dir = Path(tmp_dir)
138
+ checkpoint_path = stage_space_bundle(config, stage_dir)
139
+
140
+ api.create_repo(
141
+ repo_id=config.repo_id,
142
+ repo_type="space",
143
+ space_sdk="docker",
144
+ private=config.private,
145
+ exist_ok=True,
146
+ )
147
+ api.upload_folder(
148
+ folder_path=str(stage_dir),
149
+ repo_id=config.repo_id,
150
+ repo_type="space",
151
+ commit_message="Deploy Project Epsilon Space bundle",
152
+ delete_patterns=["*", "**/*"],
153
+ )
154
+
155
+ messages = [
156
+ f"Uploaded Space bundle to {config.space_url}",
157
+ f"App URL: {config.app_url}",
158
+ ]
159
+ if checkpoint_path is not None:
160
+ messages.append(f"Bundled RL checkpoint: {checkpoint_path.relative_to(stage_dir)}")
161
+ messages.append(maybe_set_space_secret(api, config.repo_id, "OPENROUTER_API_KEY", args.openrouter_api_key))
162
+ messages.append(maybe_set_space_variable(api, config.repo_id, "OPENROUTER_APP_NAME", config.title))
163
+ messages.append(maybe_set_space_variable(api, config.repo_id, "OPENROUTER_SITE_URL", config.app_url))
164
+
165
+ if args.keep_stage_dir:
166
+ target_dir = Path(args.keep_stage_dir).resolve()
167
+ if target_dir.exists():
168
+ shutil.rmtree(target_dir)
169
+ shutil.copytree(stage_dir, target_dir)
170
+ messages.append(f"Saved staged bundle to {target_dir}")
171
+
172
+ for message in messages:
173
+ print(message)
174
+ return 0
175
+
176
+
177
+ if __name__ == "__main__":
178
+ sys.exit(main())
scripts/deploy_hf_space.sh ADDED
@@ -0,0 +1,25 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ if [[ -f ".env.hf.space" ]]; then
5
+ while IFS= read -r raw_line || [[ -n "${raw_line}" ]]; do
6
+ line="${raw_line#"${raw_line%%[![:space:]]*}"}"
7
+ line="${line%"${line##*[![:space:]]}"}"
8
+ if [[ -z "${line}" || "${line}" == \#* || "${line}" != *=* ]]; then
9
+ continue
10
+ fi
11
+ key="${line%%=*}"
12
+ value="${line#*=}"
13
+ key="${key%"${key##*[![:space:]]}"}"
14
+ value="${value#"${value%%[![:space:]]*}"}"
15
+ value="${value%"${value##*[![:space:]]}"}"
16
+ export "${key}=${value}"
17
+ done < .env.hf.space
18
+ fi
19
+
20
+ PYTHON_BIN="${PYTHON_BIN:-.venv-training/bin/python}"
21
+ if [[ ! -x "${PYTHON_BIN}" ]]; then
22
+ PYTHON_BIN="python"
23
+ fi
24
+
25
+ exec "${PYTHON_BIN}" scripts/deploy_hf_space.py "$@"
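The whitespace-trimming loop above is easy to get subtly wrong in shell; a Python mirror of the same rules (skip blanks, `#` comments, and lines without `=`, then trim whitespace around key and value) can serve as a reference. The function is illustrative and not part of the repo:

```python
def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines the way the bash loop above does:
    skip blank lines, comments, and lines without '=', and trim
    surrounding whitespace from both the key and the value."""
    env: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```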
scripts/evaluate_policies.py ADDED
@@ -0,0 +1,74 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import json
5
+ import sys
6
+ from pathlib import Path
7
+
8
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
9
+ if str(PROJECT_ROOT) not in sys.path:
10
+ sys.path.insert(0, str(PROJECT_ROOT))
11
+
12
+ from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
13
+ from src.executive_assistant.config import OpenRouterConfig, TrainingRuntimeConfig, load_env_file
14
+ from src.executive_assistant.runner import export_traces_jsonl, run_policy_suite
15
+
16
+
17
+ TASKS = [
18
+ "easy_deadline_extraction",
19
+ "medium_triage_and_negotiation",
20
+ "hard_rag_reply",
21
+ ]
22
+
23
+
24
+ def build_policy(provider: str, model_name: str) -> object:
25
+ if provider == "baseline":
26
+ return BaselineAgent()
27
+ if provider == "openrouter":
28
+ load_env_file(TrainingRuntimeConfig().env_file)
29
+ config = OpenRouterConfig.from_env()
30
+ config = OpenRouterConfig(
31
+ api_key=config.api_key,
32
+ model_name=model_name,
33
+ base_url=config.base_url,
34
+ site_url=config.site_url,
35
+ app_name=config.app_name,
36
+ temperature=config.temperature,
37
+ max_tokens=config.max_tokens,
38
+ )
39
+ return OpenRouterPolicy(config=config)
40
+ raise ValueError(f"Unsupported provider: {provider}")
41
+
42
+
43
+ def main() -> None:
44
+ load_env_file(TrainingRuntimeConfig().env_file)
45
+ parser = argparse.ArgumentParser(description="Evaluate a policy over all seeded tasks.")
46
+ parser.add_argument("--provider", choices=["baseline", "openrouter"], default="baseline")
47
+ parser.add_argument("--model", default="google/gemma-4-31b-it")
48
+ parser.add_argument("--max-steps", type=int, default=12)
49
+ parser.add_argument("--output", default="")
50
+ args = parser.parse_args()
51
+
52
+ traces = run_policy_suite(
53
+ policy=build_policy(args.provider, args.model),
54
+ task_names=TASKS,
55
+ max_steps=args.max_steps,
56
+ )
57
+ summary = {
58
+ task_name: {
59
+ "completed": trace.completed,
60
+ "final_score": trace.final_score,
61
+ "steps": len(trace.steps),
62
+ "termination_reason": trace.termination_reason,
63
+ }
64
+ for task_name, trace in traces.items()
65
+ }
66
+ print(json.dumps(summary, indent=2))
67
+
68
+ if args.output:
69
+ export_traces_jsonl(list(traces.values()), args.output)
70
+ print(f"Saved traces to {args.output}")
71
+
72
+
73
+ if __name__ == "__main__":
74
+ main()
scripts/run_policy_episode.py ADDED
@@ -0,0 +1,51 @@
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy
+from src.executive_assistant.config import OpenRouterConfig, TrainingRuntimeConfig, load_env_file
+from src.executive_assistant.runner import EpisodeRunner
+
+
+def build_policy(provider: str, model_name: str) -> object:
+    if provider == "baseline":
+        return BaselineAgent()
+    if provider == "openrouter":
+        load_env_file(TrainingRuntimeConfig().env_file)
+        config = OpenRouterConfig.from_env()
+        config = OpenRouterConfig(
+            api_key=config.api_key,
+            model_name=model_name,
+            base_url=config.base_url,
+            site_url=config.site_url,
+            app_name=config.app_name,
+            temperature=config.temperature,
+            max_tokens=config.max_tokens,
+        )
+        return OpenRouterPolicy(config=config)
+    raise ValueError(f"Unsupported provider: {provider}")
+
+
+def main() -> None:
+    load_env_file(TrainingRuntimeConfig().env_file)
+    parser = argparse.ArgumentParser(description="Run a single policy episode.")
+    parser.add_argument("--task", required=True)
+    parser.add_argument("--provider", choices=["baseline", "openrouter"], default="baseline")
+    parser.add_argument("--model", default="google/gemma-4-31b-it")
+    parser.add_argument("--max-steps", type=int, default=12)
+    args = parser.parse_args()
+
+    runner = EpisodeRunner(policy=build_policy(args.provider, args.model), max_steps=args.max_steps)
+    trace = runner.run(args.task)
+    print(json.dumps(trace.to_dict(), indent=2))
+
+
+if __name__ == "__main__":
+    main()
scripts/setup_app_env.sh ADDED
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+python -m venv .venv-app
+source .venv-app/bin/activate
+python -m pip install --upgrade pip
+python -m pip install -r requirements.app.txt
+echo "App environment ready at .venv-app"
scripts/setup_training_env.sh ADDED
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+python -m venv .venv-training
+source .venv-training/bin/activate
+python -m pip install --upgrade pip
+python -m pip install -r requirements.training.txt
+python -m ipykernel install --user --name scalerhack2-training --display-name "Python (scalerhack2-training)"
+echo "Training environment ready at .venv-training with Jupyter kernel scalerhack2-training"
scripts/train_rl_agent.py ADDED
@@ -0,0 +1,47 @@
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from src.executive_assistant.agent import BaselineAgent
+from src.executive_assistant.config import TrainingRuntimeConfig, load_env_file
+from src.executive_assistant.training import evaluate_q_policy, train_q_learning
+
+
+def main() -> None:
+    load_env_file(TrainingRuntimeConfig().env_file)
+    parser = argparse.ArgumentParser(description="Train a tabular RL policy for seeded tasks.")
+    parser.add_argument("--episodes", type=int, default=300)
+    parser.add_argument("--epsilon", type=float, default=0.15)
+    parser.add_argument("--checkpoint", default="artifacts/checkpoints/q_policy.json")
+    parser.add_argument("--no-teacher", action="store_true")
+    args = parser.parse_args()
+
+    teacher = None if args.no_teacher else BaselineAgent()
+    policy, training_scores = train_q_learning(
+        episodes=args.episodes,
+        epsilon=args.epsilon,
+        teacher=teacher,
+    )
+    checkpoint_path = policy.save(args.checkpoint)
+    evaluation = evaluate_q_policy(policy)
+    print(
+        json.dumps(
+            {
+                "checkpoint": str(checkpoint_path),
+                "training_scores": training_scores,
+                "evaluation": evaluation,
+            },
+            indent=2,
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
src/__init__.py ADDED
@@ -0,0 +1,2 @@
+"""Top-level package namespace for local src-based imports."""
+
src/executive_assistant/__init__.py ADDED
@@ -0,0 +1,2 @@
+"""Core package for the autonomous executive assistant sandbox."""
+
src/executive_assistant/agent.py ADDED
@@ -0,0 +1,356 @@
+from __future__ import annotations
+
+import re
+
+from src.executive_assistant.config import OpenRouterConfig
+from src.executive_assistant.llm_service import OpenRouterLLMService
+from src.executive_assistant.models import AssistantAction, PolicyDecision, WorkspaceObservation
+from src.executive_assistant.runner import EpisodeRunner, EpisodeTrace, run_policy_suite
+
+
+class ActionCatalog:
+    """Finite action templates for smoke-testing and future policy indexing."""
+
+    @staticmethod
+    def enumerate_actions(observation: WorkspaceObservation) -> list[AssistantAction]:
+        actions: list[AssistantAction] = []
+        for email in observation.unread_emails:
+            actions.append(AssistantAction(action_type="read_email", target_id=email.id))
+            actions.append(AssistantAction(action_type="archive", target_id=email.id))
+            actions.append(
+                AssistantAction(
+                    action_type="forward",
+                    target_id=email.id,
+                    secondary_payload="manager@company.com",
+                    payload="Escalating this for review.",
+                )
+            )
+        if observation.current_email is not None:
+            actions.append(
+                AssistantAction(
+                    action_type="reply",
+                    target_id=observation.current_email.id,
+                    payload="Hello, I will follow up shortly.\nRegards, Executive Assistant",
+                )
+            )
+        actions.extend(
+            [
+                AssistantAction(action_type="search_files", payload="Q3 Architecture"),
+                AssistantAction(action_type="search_files", payload="architecture metrics"),
+            ]
+        )
+        return actions
+
+
+class BaselineAgent:
+    """Deterministic baseline policy for seeded scenarios and training-pipeline smoke tests."""
+
+    def __init__(self, model_name: str = "deterministic-baseline-v1") -> None:
+        self.model_name = model_name
+
+    def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+        if task_name == "easy_deadline_extraction":
+            return self._choose_easy_action(observation)
+        if task_name == "medium_triage_and_negotiation":
+            return self._choose_medium_action(observation)
+        if task_name == "hard_rag_reply":
+            return self._choose_hard_action(observation)
+        raise ValueError(f"Unsupported task: {task_name}")
+
+    def _choose_easy_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        if observation.current_email is None:
+            email = observation.unread_emails[0]
+            return PolicyDecision(
+                reasoning="Read the seeded deadline email before extracting any tasks.",
+                action=AssistantAction(action_type="read_email", target_id=email.id),
+            )
+
+        deadlines = self._extract_deadlines(observation.current_email.body)
+        existing = {todo.strip().lower() for todo in observation.active_todos}
+        for task_name, deadline_date in deadlines:
+            if task_name.lower() not in existing:
+                return PolicyDecision(
+                    reasoning=f"Add the missing todo '{task_name}' with deadline {deadline_date}.",
+                    action=AssistantAction(
+                        action_type="add_todo",
+                        payload=task_name,
+                        secondary_payload=deadline_date,
+                    ),
+                )
+        return PolicyDecision(
+            reasoning="All deadlines are captured, so archive the source email.",
+            action=AssistantAction(action_type="archive", target_id=observation.current_email.id),
+        )
+
+    def _choose_medium_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        newsletters = {
+            "news@updates.example",
+            "promotions@vendor.example",
+            "events@community.example",
+        }
+        action_history = " ".join(observation.action_history).lower()
+        for email in observation.unread_emails:
+            if email.sender in newsletters:
+                return PolicyDecision(
+                    reasoning=f"Archive non-actionable newsletter from {email.sender}.",
+                    action=AssistantAction(action_type="archive", target_id=email.id),
+                )
+
+        client_email = next(
+            (email for email in observation.unread_emails if email.sender == "client@company.com"),
+            None,
+        )
+        if client_email is not None and "forward: forwarded to manager@company.com" not in action_history:
+            return PolicyDecision(
+                reasoning="Escalate the urgent client complaint to the manager.",
+                action=AssistantAction(
+                    action_type="forward",
+                    target_id=client_email.id,
+                    secondary_payload="manager@company.com",
+                    payload="Urgent client complaint. Please take over immediately.",
+                ),
+            )
+
+        teammate_email = next(
+            (email for email in observation.unread_emails if email.sender == "teammate@company.com"),
+            None,
+        )
+        if teammate_email is not None and "reply: reply drafted" not in action_history:
+            return PolicyDecision(
+                reasoning="Reply to the reschedule request with a concrete proposed time.",
+                action=AssistantAction(
+                    action_type="reply",
+                    target_id=teammate_email.id,
+                    payload="Hello, 3:30 PM IST works for me. Regards, Executive Assistant",
+                ),
+            )
+
+        if observation.current_email is not None:
+            return PolicyDecision(
+                reasoning="Archive the currently open message to reduce inbox clutter.",
+                action=AssistantAction(action_type="archive", target_id=observation.current_email.id),
+            )
+        raise RuntimeError("No valid medium-task action available")
+
+    def _choose_hard_action(self, observation: WorkspaceObservation) -> PolicyDecision:
+        if observation.current_email is None:
+            email = observation.unread_emails[0]
+            return PolicyDecision(
+                reasoning="Read the stakeholder email to ground the response request.",
+                action=AssistantAction(action_type="read_email", target_id=email.id),
+            )
+
+        if not observation.search_results:
+            return PolicyDecision(
+                reasoning="Search the local report store for the Q3 architecture document.",
+                action=AssistantAction(action_type="search_files", payload="Q3 Architecture"),
+            )
+
+        metrics = self._extract_report_metrics(observation.search_results[0].snippet)
+        payload = (
+            "Hello,\n"
+            f"Here are the requested Q3 architecture metrics: availability {metrics['availability']}, "
+            f"mean API latency {metrics['latency']}, and infrastructure cost reduction {metrics['cost_reduction']}.\n"
+            "Regards,\nExecutive Assistant"
+        )
+        return PolicyDecision(
+            reasoning="Reply with the three requested metrics pulled from the report search results.",
+            action=AssistantAction(
+                action_type="reply",
+                target_id=observation.current_email.id,
+                payload=payload,
+            ),
+        )
+
+    @staticmethod
+    def _extract_deadlines(email_body: str) -> list[tuple[str, str]]:
+        pattern = re.compile(r"([a-z ]+ due)\s+(\d{4}-\d{2}-\d{2})", re.IGNORECASE)
+        cleaned: list[tuple[str, str]] = []
+        for task, date in pattern.findall(email_body):
+            normalized_task = re.sub(r"^(and\s+)", "", task.strip(), flags=re.IGNORECASE)
+            cleaned.append((normalized_task.title(), date))
+        return cleaned
+
+    @staticmethod
+    def _extract_report_metrics(snippet: str) -> dict[str, str]:
+        metrics = {
+            "availability": re.search(r"(\d+\.\d+%)", snippet),
+            "latency": re.search(r"(\d+ms)", snippet),
+            "cost_reduction": re.search(r"(\d+%)", snippet.split("Infrastructure cost reduction:")[-1]),
+        }
+        return {
+            "availability": metrics["availability"].group(1) if metrics["availability"] else "unknown",
+            "latency": metrics["latency"].group(1) if metrics["latency"] else "unknown",
+            "cost_reduction": (
+                metrics["cost_reduction"].group(1) if metrics["cost_reduction"] else "unknown"
+            ),
+        }
+
+
+class OpenRouterPolicy:
+    def __init__(
+        self,
+        config: OpenRouterConfig | None = None,
+        service: OpenRouterLLMService | None = None,
+    ) -> None:
+        self.config = config or OpenRouterConfig.from_env()
+        self.service = service or OpenRouterLLMService(self.config)
+
+    def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+        decision = self.service.generate_policy_decision(task_name, observation)
+        return self._sanitize_decision(task_name, observation, decision)
+
+    def _sanitize_decision(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        decision: PolicyDecision,
+    ) -> PolicyDecision:
+        action = decision.action
+        if action.action_type == "add_todo":
+            action = self._normalize_easy_todo_action(task_name, observation, action)
+        elif action.action_type == "search_files":
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=None,
+                payload=action.payload,
+                secondary_payload=None,
+            )
+        elif action.action_type in {"read_email", "archive"}:
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=action.target_id,
+                payload=None,
+                secondary_payload=None,
+            )
+        elif action.action_type == "forward":
+            action = self._normalize_forward_action(task_name, observation, action)
+        if action.action_type == "reply" and action.payload:
+            payload = action.payload.strip()
+            target_id = action.target_id
+            if task_name == "hard_rag_reply":
+                if not payload.lower().startswith("hello"):
+                    payload = f"Hello,\n{payload}"
+                if "regards" not in payload.lower():
+                    payload = f"{payload}\nRegards,\nExecutive Assistant"
+            elif task_name == "medium_triage_and_negotiation":
+                if not re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", payload):
+                    payload = "Hello, 3:30 PM IST works for me."
+                if "regards" not in payload.lower():
+                    payload = f"{payload}\nRegards,\nExecutive Assistant"
+                target_id = self._resolve_teammate_email_id(observation, action.target_id)
+            action = AssistantAction(
+                action_type=action.action_type,
+                target_id=target_id,
+                payload=payload,
+                secondary_payload=action.secondary_payload,
+            )
+
+        return PolicyDecision(reasoning=decision.reasoning, action=action)
+
+    def _normalize_easy_todo_action(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        action: AssistantAction,
+    ) -> AssistantAction:
+        if task_name != "easy_deadline_extraction":
+            return AssistantAction(
+                action_type=action.action_type,
+                target_id=None,
+                payload=action.payload,
+                secondary_payload=action.secondary_payload,
+            )
+
+        canonical_todos = [
+            ("proposal", "Proposal Due", "2026-04-10"),
+            ("prototype", "Prototype Due", "2026-04-20"),
+            ("final report", "Final Report Due", "2026-04-30"),
+        ]
+        payload = (action.payload or "").strip()
+        payload_lower = payload.lower()
+
+        for marker, canonical_name, canonical_deadline in canonical_todos:
+            if marker in payload_lower:
+                return AssistantAction(
+                    action_type="add_todo",
+                    target_id=None,
+                    payload=canonical_name,
+                    secondary_payload=canonical_deadline,
+                )
+
+        existing = {todo.strip().lower() for todo in observation.active_todos}
+        for _, canonical_name, canonical_deadline in canonical_todos:
+            if canonical_name.lower() not in existing:
+                return AssistantAction(
+                    action_type="add_todo",
+                    target_id=None,
+                    payload=canonical_name,
+                    secondary_payload=canonical_deadline,
+                )
+
+        return AssistantAction(
+            action_type="add_todo",
+            target_id=None,
+            payload=payload,
+            secondary_payload=action.secondary_payload,
+        )
+
+    def _normalize_forward_action(
+        self,
+        task_name: str,
+        observation: WorkspaceObservation,
+        action: AssistantAction,
+    ) -> AssistantAction:
+        target_id = action.target_id
+        recipient = action.secondary_payload
+        note = action.payload
+
+        if task_name == "medium_triage_and_negotiation":
+            if target_id is None and observation.current_email is not None:
+                target_id = observation.current_email.id
+            if recipient is None:
+                recipient = "manager@company.com"
+            if note is None or not note.strip():
+                note = "Urgent client complaint. Please take over immediately."
+
+        return AssistantAction(
+            action_type="forward",
+            target_id=target_id,
+            payload=note,
+            secondary_payload=recipient,
+        )
+
+    @staticmethod
+    def _resolve_teammate_email_id(
+        observation: WorkspaceObservation,
+        target_id: int | None,
+    ) -> int | None:
+        if target_id is not None:
+            return target_id
+        if observation.current_email and observation.current_email.sender == "teammate@company.com":
+            return observation.current_email.id
+        teammate_email = next(
+            (email for email in observation.unread_emails if email.sender == "teammate@company.com"),
+            None,
+        )
+        return teammate_email.id if teammate_email is not None else None
+
+
+OpenAIResponsesPolicy = OpenRouterPolicy
+
+
+def run_episode(task_name: str, max_steps: int = 12) -> EpisodeTrace:
+    runner = EpisodeRunner(policy=BaselineAgent(), max_steps=max_steps)
+    return runner.run(task_name)
+
+
+def smoke_test_training_pipeline() -> dict[str, EpisodeTrace]:
+    return run_policy_suite(
+        policy=BaselineAgent(),
+        task_names=[
+            "easy_deadline_extraction",
+            "medium_triage_and_negotiation",
+            "hard_rag_reply",
+        ],
+    )
src/executive_assistant/config.py ADDED
@@ -0,0 +1,78 @@
+from __future__ import annotations
+
+import os
+from dataclasses import dataclass
+from pathlib import Path
+
+
+def load_env_file(env_path: str | Path, override: bool = False) -> bool:
+    path = Path(env_path)
+    if not path.exists():
+        return False
+
+    for raw_line in path.read_text().splitlines():
+        line = raw_line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        key, value = line.split("=", 1)
+        key = key.strip()
+        value = value.strip().strip('"').strip("'")
+        if override or key not in os.environ:
+            os.environ[key] = value
+    return True
+
+
+@dataclass(frozen=True)
+class OpenRouterConfig:
+    api_key: str
+    model_name: str = "google/gemma-4-31b-it"
+    base_url: str = "https://openrouter.ai/api/v1"
+    site_url: str = "http://localhost:7860"
+    app_name: str = "Autonomous Executive Assistant Sandbox"
+    temperature: float = 0.1
+    max_tokens: int = 600
+
+    @classmethod
+    def from_env(cls, env_file: str | Path | None = None) -> "OpenRouterConfig":
+        if env_file is not None:
+            load_env_file(env_file)
+        api_key = os.environ.get("OPENROUTER_API_KEY", "").strip()
+        if not api_key:
+            raise RuntimeError("OPENROUTER_API_KEY is required for OpenRouter model access.")
+        return cls(
+            api_key=api_key,
+            model_name=os.environ.get("OPENROUTER_MODEL", "google/gemma-4-31b-it"),
+            base_url=os.environ.get("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
+            site_url=os.environ.get("OPENROUTER_SITE_URL", "http://localhost:7860"),
+            app_name=os.environ.get(
+                "OPENROUTER_APP_NAME",
+                "Autonomous Executive Assistant Sandbox",
+            ),
+            temperature=float(os.environ.get("OPENROUTER_TEMPERATURE", "0.1")),
+            max_tokens=int(os.environ.get("OPENROUTER_MAX_TOKENS", "600")),
+        )
+
+    def extra_headers(self) -> dict[str, str]:
+        return {
+            "HTTP-Referer": self.site_url,
+            "X-Title": self.app_name,
+        }
+
+
+@dataclass(frozen=True)
+class TrainingRuntimeConfig:
+    kernel_name: str = "scalerhack2-training"
+    kernel_display_name: str = "Python (scalerhack2-training)"
+    checkpoint_dir: str = "artifacts/checkpoints"
+    trace_dir: str = "artifacts/traces"
+    env_file: str = ".env.training"
+    default_checkpoint_name: str = "q_policy_notebook.json"
+
+
+@dataclass(frozen=True)
+class AppRuntimeConfig:
+    host: str = "0.0.0.0"
+    port: int = 7860
+    env_file: str = ".env.app"
+    checkpoint_dir: str = "artifacts/checkpoints"
+    default_checkpoint_name: str = "q_policy_notebook.json"
src/executive_assistant/deployment.py ADDED
@@ -0,0 +1,201 @@
+from __future__ import annotations
+
+import shutil
+from dataclasses import dataclass
+from pathlib import Path
+
+from src.executive_assistant.agent import BaselineAgent
+from src.executive_assistant.training import default_checkpoint_path, train_q_learning
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+DEFAULT_SPACE_TITLE = "Project Epsilon | Executive Assistant Sandbox"
+DEFAULT_HF_USERNAMES = [
+    "HF_USERNAME_1",
+    "HF_USERNAME_2",
+    "HF_USERNAME_3",
+]
+DEFAULT_CHECKPOINT_NAME = "q_policy_notebook.json"
+DEFAULT_STAGE_IGNORE_NAMES = {
+    ".git",
+    ".codex",
+    ".pytest_cache",
+    ".venv-app",
+    ".venv-training",
+    ".vscode",
+    "__pycache__",
+}
+DEFAULT_STAGE_IGNORE_SUFFIXES = {
+    ".pyc",
+}
+DEFAULT_STAGE_IGNORE_FILES = {
+    ".env",
+    ".env.app",
+    ".env.hf.space",
+    ".env.training",
+    "training_env.executed.ipynb",
+}
+
+
+@dataclass(frozen=True)
+class HFSpaceDeployConfig:
+    repo_id: str
+    title: str = DEFAULT_SPACE_TITLE
+    team_name: str = "Project Epsilon"
+    hf_usernames: tuple[str, ...] = tuple(DEFAULT_HF_USERNAMES)
+    checkpoint_name: str = DEFAULT_CHECKPOINT_NAME
+    app_port: int = 7860
+    private: bool = False
+    include_checkpoint: bool = True
+
+    @property
+    def repo_slug(self) -> str:
+        return self.repo_id.split("/", 1)[1]
+
+    @property
+    def owner(self) -> str:
+        return self.repo_id.split("/", 1)[0]
+
+    @property
+    def space_url(self) -> str:
+        return f"https://huggingface.co/spaces/{self.repo_id}"
+
+    @property
+    def app_url(self) -> str:
+        return f"https://{self.owner}-{self.repo_slug}.hf.space"
+
+    @property
+    def checkpoint_source_path(self) -> Path:
+        return REPO_ROOT / "artifacts" / "checkpoints" / self.checkpoint_name
+
+
+def parse_hf_usernames(raw_value: str | None) -> tuple[str, ...]:
+    if raw_value is None or not raw_value.strip():
+        return tuple(DEFAULT_HF_USERNAMES)
+    usernames = [item.strip().lstrip("@") for item in raw_value.split(",") if item.strip()]
+    return tuple(usernames) or tuple(DEFAULT_HF_USERNAMES)
+
+
+def render_space_readme(config: HFSpaceDeployConfig) -> str:
+    usernames = ", ".join(f"`@{username}`" for username in config.hf_usernames)
+    checkpoint_note = (
+        "A trained RL checkpoint is bundled in `artifacts/checkpoints/` so the `rl` policy "
+        "is available immediately in the demo."
+        if config.include_checkpoint
+        else "The Space can still run the deterministic baseline immediately; add an RL checkpoint "
+        "later if you want the `rl` option available in the UI."
+    )
+    return f"""---
+title: {config.title}
+emoji: "🧭"
+colorFrom: yellow
+colorTo: gray
+sdk: docker
+app_port: {config.app_port}
+pinned: false
+short_description: OpenEnv executive assistant sandbox demo for judges.
+---
+
+# {config.team_name}
+
+Discrete Hugging Face Space for the **Autonomous Executive Assistant Sandbox**, built for the **OpenEnv Scaler x Meta x PyTorch Hack**.
+
+## Team
+
+- Team name: `{config.team_name}`
+- Hugging Face usernames: {usernames}
+- Space repo: `{config.repo_id}`
+
+Replace the placeholder usernames above once the final team accounts are ready.
+
+## What This Space Shows
+
+- A deterministic OpenEnv-style executive assistant environment backed by an isolated SQLite workspace
+- A judge-friendly Gradio interface that replays the shared `EpisodeRunner` loop step by step
+- Side-by-side policy execution for `baseline`, `rl`, and optional `openrouter`
+- Visible inbox, todo, file-search, and action-log state so evaluators can inspect each mutation
+
+## Hack Context
+
+OpenEnv was announced by Hugging Face and Meta as an open source framework for building agent environments with typed observations, actions, and rewards. The Scaler dashboard for this hack lists the submission round as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru. This Space packages our environment to match that workflow: deterministic tasks, structured actions, visible state transitions, and reproducible judge demos.
+
+## Runtime Notes
+
+- SDK: `docker`
+- App port: `{config.app_port}`
+- Entry point: `python app.py`
+- Optional secret: `OPENROUTER_API_KEY`
+- {checkpoint_note}
+
+## Judge Flow
+
+1. Open the Space and choose one of the seeded scenarios.
+2. Run the deterministic `baseline` policy for a guaranteed reference trace.
+3. Switch to `rl` to replay the bundled learned checkpoint.
+4. Add `OPENROUTER_API_KEY` in Space secrets to enable the live model-backed path.
+
+## References
+
+- Hack dashboard: https://www.scaler.com/openenv-hackathon
+- OpenEnv launch: https://huggingface.co/blog/openenv
+- Space URL: {config.space_url}
+"""
+
+
+def copy_repo_for_space(stage_dir: Path) -> None:
+    stage_dir.mkdir(parents=True, exist_ok=True)
+    for source in REPO_ROOT.iterdir():
+        if source.name in DEFAULT_STAGE_IGNORE_NAMES:
+            continue
+        if source.name in DEFAULT_STAGE_IGNORE_FILES:
+            continue
+        if source.suffix in DEFAULT_STAGE_IGNORE_SUFFIXES:
+            continue
+        destination = stage_dir / source.name
+        if source.is_dir():
+            shutil.copytree(
+                source,
+                destination,
+                ignore=shutil.ignore_patterns(
+                    "__pycache__",
+                    "*.pyc",
+                    ".env",
+                    ".env.app",
+                    ".env.hf.space",
+                    ".env.training",
+                    "training_env.executed.ipynb",
+                ),
+            )
+        else:
+            shutil.copy2(source, destination)
+
+
+def ensure_checkpoint(config: HFSpaceDeployConfig, stage_dir: Path) -> Path | None:
+    if not config.include_checkpoint:
+        return None
+
+    destination = stage_dir / "artifacts" / "checkpoints" / config.checkpoint_name
+    destination.parent.mkdir(parents=True, exist_ok=True)
+
+    source = config.checkpoint_source_path
+    if source.exists():
+        shutil.copy2(source, destination)
+        return destination
+
+    policy, _ = train_q_learning(episodes=120, epsilon=0.12, teacher=BaselineAgent())
+    return policy.save(destination)
+
+
+def stage_space_bundle(config: HFSpaceDeployConfig, stage_dir: Path) -> Path | None:
+    copy_repo_for_space(stage_dir)
+    checkpoint_path = ensure_checkpoint(config, stage_dir)
+    readme_path = stage_dir / "README.md"
+    readme_path.write_text(render_space_readme(config))
+    example_env_path = stage_dir / ".env.hf.space.example"
+    if example_env_path.exists():
+        example_env_path.unlink()
+    return checkpoint_path
+
+
+def default_checkpoint_runtime_path(checkpoint_name: str = DEFAULT_CHECKPOINT_NAME) -> Path:
+    return default_checkpoint_path("artifacts/checkpoints", checkpoint_name)
src/executive_assistant/env.py ADDED
@@ -0,0 +1,123 @@
+from __future__ import annotations
+
+from src.executive_assistant.graders import grade_easy, grade_hard, grade_medium
+from src.executive_assistant.models import (
+    AssistantAction,
+    EmailDetail,
+    EmailSummary,
+    FileSearchResult,
+    TaskReward,
+    WorkspaceObservation,
+)
+from src.executive_assistant.seeds import TASK_SEEDS
+from src.executive_assistant.workspace import MockWorkspace
+
+
+class ExecutiveAssistantEnv:
+    def __init__(self, task_name: str = "easy_deadline_extraction") -> None:
+        self.task_name = task_name
+        self.workspace = MockWorkspace()
+        self.last_action_status = "environment initialized"
+        self.current_email: EmailDetail | None = None
+        self.search_results: list[FileSearchResult] = []
+        self.step_count = 0
+        self.max_steps = 12
+
+    def reset(self) -> WorkspaceObservation:
+        self.workspace = MockWorkspace()
+        seed = TASK_SEEDS[self.task_name]
+        self.workspace.seed(seed.get("emails", []), seed.get("files", []))
+        self.last_action_status = f"scenario reset: {self.task_name}"
+        self.current_email = None
+        self.search_results = []
+        self.step_count = 0
+        return self.observe()
+
+    def observe(self) -> WorkspaceObservation:
+        unread = [
+            EmailSummary(
+                id=row["id"],
+                sender=row["sender"],
+                subject=row["subject"],
+                snippet=row["snippet"],
+            )
+            for row in self.workspace.get_unread_emails()
+        ]
+        todos = [row["task_name"] for row in self.workspace.list_todos()]
+        recent_actions = [
+            f"{row['action_type']}: {row['status']}"
+            for row in reversed(self.workspace.list_recent_actions(limit=6))
+        ]
+        return WorkspaceObservation(
+            current_time="2026-04-04T10:00:00Z",
+            unread_emails=unread,
+            active_todos=todos,
+            last_action_status=self.last_action_status,
+            current_email=self.current_email,
+            search_results=self.search_results,
+            action_history=recent_actions,
+        )
+
+    def step(self, action: AssistantAction) -> tuple[WorkspaceObservation, TaskReward]:
+        self.step_count += 1
+        if action.action_type == "read_email" and action.target_id is not None:
+            row = self.workspace.read_email(action.target_id)
+            self.current_email = EmailDetail(**dict(row)) if row else None
+            self.last_action_status = "email read" if row else "email not found"
+        elif action.action_type == "reply" and action.target_id is not None and action.payload:
+            self.last_action_status = self.workspace.send_reply(action.target_id, action.payload)
+        elif (
+            action.action_type == "forward"
+            and action.target_id is not None
+            and action.secondary_payload
+        ):
+            self.last_action_status = self.workspace.forward_email(
+                action.target_id,
+                action.secondary_payload,
+                action.payload,
+            )
+        elif action.action_type == "add_todo" and action.payload:
+            self.last_action_status = self.workspace.create_todo(
+                task_name=action.payload,
+                deadline_date=action.secondary_payload,
+                context=(
+                    f"Created from email {self.current_email.id}: {self.current_email.subject}"
+                    if self.current_email
+                    else f"Created from task {self.task_name}"
+                ),
+            )
+        elif action.action_type == "archive" and action.target_id is not None:
+            self.last_action_status = self.workspace.archive_email(action.target_id)
+        elif action.action_type == "search_files" and action.payload:
+            results = self.workspace.search_documents(action.payload)
+            self.search_results = [
+                FileSearchResult(
+                    id=row["id"],
+                    filename=row["filename"],
+                    snippet=row["content_text"][:160],
+                )
+                for row in results
+            ]
+            self.last_action_status = f"search returned {len(results)} file(s)"
+        else:
+            self.last_action_status = "invalid action payload"
+
+        observation = self.observe()
+        reward = self.grade()
+        if self.step_count >= self.max_steps and not reward.is_done:
+            reward = TaskReward(
+                step_reward=reward.step_reward,
+                total_score=reward.total_score,
+                is_done=True,
+                reasoning=f"{reward.reasoning}; terminated at step budget",
+            )
+        return observation, reward
+
+    def grade(self) -> TaskReward:
+        if self.task_name == "easy_deadline_extraction":
+            return grade_easy(self.workspace)
+        if self.task_name == "medium_triage_and_negotiation":
120
+ return grade_medium(self.workspace)
121
+ if self.task_name == "hard_rag_reply":
122
+ return grade_hard(self.workspace)
123
+ return TaskReward(reasoning="No grader configured")
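The forced-termination branch at the end of `step()` is the only thing that ends an episode when a grader never reports completion. A stdlib-only sketch of that contract (the `StubEnv` and `Reward` names here are illustrative stand-ins, not repo code):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reward:
    total_score: float = 0.0
    is_done: bool = False
    reasoning: str = ""


class StubEnv:
    """Stand-in env whose grader never finishes the task, so the
    step budget is the only possible terminator."""

    def __init__(self, max_steps: int = 12) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def grade(self) -> Reward:
        return Reward(reasoning="task incomplete")

    def step(self) -> Reward:
        self.step_count += 1
        reward = self.grade()
        if self.step_count >= self.max_steps and not reward.is_done:
            # Same forced-termination shape as ExecutiveAssistantEnv.step
            reward = replace(
                reward,
                is_done=True,
                reasoning=f"{reward.reasoning}; terminated at step budget",
            )
        return reward


env = StubEnv()
reward = env.step()
while not reward.is_done:
    reward = env.step()
```

The episode here always ends after exactly `max_steps` calls, with the budget note appended to the grader's last reasoning string.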
src/executive_assistant/graders.py ADDED
@@ -0,0 +1,172 @@
+ from __future__ import annotations
+
+ import re
+
+ from src.executive_assistant.models import TaskReward
+ from src.executive_assistant.workspace import MockWorkspace
+
+
+ def _clamp_score(value: float) -> float:
+     return max(0.0, min(1.0, round(value, 4)))
+
+
+ def grade_easy(workspace: MockWorkspace) -> TaskReward:
+     expected = {
+         ("proposal due", "2026-04-10"),
+         ("prototype due", "2026-04-20"),
+         ("final report due", "2026-04-30"),
+     }
+     todos = workspace.connection.execute(
+         "SELECT task_name, deadline_date FROM Todos"
+     ).fetchall()
+     normalized = {
+         (row["task_name"].strip().lower(), (row["deadline_date"] or "").strip()) for row in todos
+     }
+     matched = len(expected & normalized)
+     incorrect = len(normalized - expected)
+     read_source = workspace.connection.execute(
+         "SELECT COUNT(*) FROM ActionLog WHERE action_type = 'read_email' AND target_id = 1"
+     ).fetchone()[0]
+     archived = workspace.connection.execute(
+         "SELECT COUNT(*) FROM Emails WHERE id = 1 AND is_archived = 1"
+     ).fetchone()[0]
+
+     score = 0.15 if read_source else 0.0
+     score += matched * 0.25
+     score += 0.10 if archived else 0.0
+     score -= incorrect * 0.10
+     total_score = _clamp_score(score)
+     done = matched == 3 and archived == 1 and incorrect == 0
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=done,
+         reasoning=(
+             "Extracted all three deadlines and archived the source email"
+             if done
+             else f"Matched {matched}/3 deadlines, archived={bool(archived)}, incorrect_todos={incorrect}"
+         ),
+     )
+
+
+ def grade_medium(workspace: MockWorkspace) -> TaskReward:
+     newsletters_archived = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM Emails
+         WHERE sender IN ('news@updates.example', 'promotions@vendor.example', 'events@community.example')
+         AND is_archived = 1
+         """
+     ).fetchone()[0]
+     forwarded = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'forward' AND secondary_payload = 'manager@company.com'
+         """
+     ).fetchone()[0]
+     correct_forward = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'forward'
+         AND secondary_payload = 'manager@company.com'
+         AND target_id = (
+             SELECT id FROM Emails WHERE sender = 'client@company.com' LIMIT 1
+         )
+         """
+     ).fetchone()[0]
+     reply = workspace.connection.execute(
+         """
+         SELECT payload, target_id FROM ActionLog
+         WHERE action_type = 'reply'
+         ORDER BY id DESC LIMIT 1
+         """
+     ).fetchone()
+     important_archived = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM Emails
+         WHERE sender IN ('client@company.com', 'teammate@company.com')
+         AND is_archived = 1
+         """
+     ).fetchone()[0]
+
+     score = 0.0
+     score += min(newsletters_archived, 3) * 0.1
+     if correct_forward >= 1:
+         score += 0.4
+     elif forwarded >= 1:
+         score += 0.1
+
+     teammate_id = workspace.connection.execute(
+         "SELECT id FROM Emails WHERE sender = 'teammate@company.com' LIMIT 1"
+     ).fetchone()[0]
+     if (
+         reply
+         and reply["target_id"] == teammate_id
+         and re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", reply["payload"] or "")
+     ):
+         score += 0.3
+     elif reply and re.search(r"\b\d{1,2}(:\d{2})?\s?(AM|PM|am|pm)\b", reply["payload"] or ""):
+         score += 0.1
+
+     score -= important_archived * 0.15
+     total_score = _clamp_score(score)
+
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=newsletters_archived == 3 and correct_forward >= 1 and total_score >= 1.0,
+         reasoning=(
+             "Archived newsletters, escalated client complaint, and proposed a meeting time"
+             if newsletters_archived == 3 and correct_forward >= 1 and total_score >= 1.0
+             else (
+                 f"newsletters_archived={newsletters_archived}/3, "
+                 f"correct_forward={correct_forward}, important_archived={important_archived}"
+             )
+         ),
+     )
+
+
+ def grade_hard(workspace: MockWorkspace) -> TaskReward:
+     search_called = workspace.connection.execute(
+         "SELECT COUNT(*) FROM ActionLog WHERE action_type = 'search_files'"
+     ).fetchone()[0]
+     targeted_search = workspace.connection.execute(
+         """
+         SELECT COUNT(*) FROM ActionLog
+         WHERE action_type = 'search_files'
+         AND LOWER(COALESCE(payload, '')) LIKE '%q3%'
+         AND LOWER(COALESCE(payload, '')) LIKE '%architecture%'
+         """
+     ).fetchone()[0]
+     reply = workspace.connection.execute(
+         """
+         SELECT payload, target_id FROM ActionLog
+         WHERE action_type = 'reply'
+         ORDER BY id DESC LIMIT 1
+         """
+     ).fetchone()
+     vip_id = workspace.connection.execute(
+         "SELECT id FROM Emails WHERE sender = 'vip.stakeholder@company.com' LIMIT 1"
+     ).fetchone()[0]
+
+     score = 0.1 if search_called >= 1 else 0.0
+     score += 0.2 if targeted_search >= 1 else 0.0
+     if reply and reply["target_id"] == vip_id:
+         payload = reply["payload"] or ""
+         metrics_found = sum(
+             metric in payload for metric in ("99.95%", "182ms", "14%")
+         )
+         score += metrics_found * 0.2
+         if payload.lower().startswith("hello") or "regards" in payload.lower():
+             score += 0.1
+     total_score = _clamp_score(score)
+
+     return TaskReward(
+         step_reward=total_score,
+         total_score=total_score,
+         is_done=total_score >= 1.0,
+         reasoning=(
+             "Searched the report and replied with the required metrics"
+             if total_score >= 1.0
+             else f"search_called={search_called}, targeted_search={targeted_search}, score={total_score}"
+         ),
+     )
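The set arithmetic at the heart of `grade_easy` can be checked in isolation. A sketch with an invented workspace state (the sample `todos` list below is illustrative, not from the seeds): names are stripped and lowercased before matching, each correct todo earns 0.25, and each stray todo costs 0.10.

```python
def clamp_score(value: float) -> float:
    # Same clamp as graders._clamp_score
    return max(0.0, min(1.0, round(value, 4)))


EXPECTED = {
    ("proposal due", "2026-04-10"),
    ("prototype due", "2026-04-20"),
    ("final report due", "2026-04-30"),
}

# Hypothetical workspace state: two correct todos (one with messy casing
# and whitespace) plus one todo the grader should penalize.
todos = [
    ("Proposal Due ", "2026-04-10"),
    ("prototype due", "2026-04-20"),
    ("standup notes", "2026-04-05"),
]
normalized = {(name.strip().lower(), date.strip()) for name, date in todos}

matched = len(EXPECTED & normalized)    # correct todos
incorrect = len(normalized - EXPECTED)  # stray todos
score = clamp_score(0.15 + matched * 0.25 - incorrect * 0.10)  # read bonus applied
```

With two matches and one stray todo this lands at 0.15 + 0.50 − 0.10 = 0.55, well short of the done condition, which also demands the archive step.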
src/executive_assistant/llm_service.py ADDED
@@ -0,0 +1,76 @@
+ from __future__ import annotations
+
+ import json
+ from typing import Any
+
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.models import PolicyDecision, WorkspaceObservation
+ from src.executive_assistant.prompts import build_repair_prompt, build_system_prompt, build_user_prompt
+
+
+ class LLMServiceError(RuntimeError):
+     """Raised when the configured LLM provider cannot produce a valid policy decision."""
+
+
+ class OpenRouterLLMService:
+     def __init__(self, config: OpenRouterConfig, client: Any | None = None) -> None:
+         self.config = config
+         if client is not None:
+             self.client = client
+             return
+         try:
+             from openai import OpenAI
+         except ImportError as exc:
+             raise LLMServiceError(
+                 "openai package is required for OpenRouter access. Install requirements first."
+             ) from exc
+         self.client = OpenAI(
+             api_key=config.api_key,
+             base_url=config.base_url,
+         )
+
+     def generate_policy_decision(
+         self,
+         task_name: str,
+         observation: WorkspaceObservation,
+     ) -> PolicyDecision:
+         raw_message = self._request_json(
+             system_prompt=build_system_prompt(task_name),
+             user_prompt=build_user_prompt(task_name, observation),
+         )
+         try:
+             payload = json.loads(raw_message)
+             return PolicyDecision.model_validate(payload)
+         except Exception:
+             repaired_message = self._request_json(
+                 system_prompt="You are a strict JSON repair assistant.",
+                 user_prompt=build_repair_prompt(raw_message),
+             )
+             try:
+                 repaired_payload = json.loads(repaired_message)
+                 return PolicyDecision.model_validate(repaired_payload)
+             except Exception as exc:
+                 raise LLMServiceError(
+                     f"Provider response did not match policy schema after repair: {repaired_message}"
+                 ) from exc
+
+     def _request_json(self, system_prompt: str, user_prompt: str) -> str:
+         try:
+             completion = self.client.chat.completions.create(
+                 model=self.config.model_name,
+                 messages=[
+                     {"role": "system", "content": system_prompt},
+                     {"role": "user", "content": user_prompt},
+                 ],
+                 response_format={"type": "json_object"},
+                 temperature=self.config.temperature,
+                 max_tokens=self.config.max_tokens,
+                 extra_headers=self.config.extra_headers(),
+             )
+         except Exception as exc:  # pragma: no cover - network/provider dependent
+             raise LLMServiceError(f"OpenRouter request failed: {exc}") from exc
+
+         message = completion.choices[0].message.content or ""
+         if not message.strip():
+             raise LLMServiceError("OpenRouter returned an empty response.")
+         return message
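The parse → validate → repair flow in `generate_policy_decision` can be sketched with the stdlib alone. In this sketch the repair step is a local fence-stripper rather than the service's second model call; `strip_fences` and the sample payload are illustrative inventions:

```python
import json
from typing import Any, Callable


def parse_or_repair(raw: str, repair: Callable[[str], str]) -> dict[str, Any]:
    """Attempt strict parsing first; fall back to exactly one repair pass."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(repair(raw))


def strip_fences(raw: str) -> str:
    # Stand-in repair: remove the markdown fences a model sometimes adds
    # despite being told not to.
    return raw.strip().removeprefix("```json").removesuffix("```").strip()


fenced = '```json\n{"reasoning": "read first", "action": {"action_type": "read_email"}}\n```'
decision = parse_or_repair(fenced, strip_fences)
```

The real service keeps the same shape but sends the invalid output back through `build_repair_prompt`, and only raises `LLMServiceError` if the second attempt also fails validation.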
src/executive_assistant/models.py ADDED
@@ -0,0 +1,63 @@
+ from __future__ import annotations
+
+ from typing import Literal
+
+ from pydantic import BaseModel, Field
+
+
+ class EmailSummary(BaseModel):
+     id: int
+     sender: str
+     subject: str
+     snippet: str
+
+
+ class EmailDetail(BaseModel):
+     id: int
+     sender: str
+     recipient: str
+     subject: str
+     body: str
+     timestamp: str
+
+
+ class FileSearchResult(BaseModel):
+     id: int
+     filename: str
+     snippet: str
+
+
+ class WorkspaceObservation(BaseModel):
+     current_time: str
+     unread_emails: list[EmailSummary]
+     active_todos: list[str]
+     last_action_status: str
+     current_email: EmailDetail | None = None
+     search_results: list[FileSearchResult] = Field(default_factory=list)
+     action_history: list[str] = Field(default_factory=list)
+
+
+ class AssistantAction(BaseModel):
+     action_type: Literal[
+         "read_email",
+         "reply",
+         "forward",
+         "add_todo",
+         "archive",
+         "search_files",
+     ]
+     target_id: int | None = None
+     payload: str | None = None
+     secondary_payload: str | None = None
+
+
+ class TaskReward(BaseModel):
+     step_reward: float = Field(default=0.0)
+     total_score: float = Field(default=0.0)
+     is_done: bool = Field(default=False)
+     reasoning: str = Field(default="")
+
+
+ class PolicyDecision(BaseModel):
+     reasoning: str = Field(default="")
+     action: AssistantAction
src/executive_assistant/prompts.py ADDED
@@ -0,0 +1,83 @@
+ from __future__ import annotations
+
+ import json
+
+ from src.executive_assistant.models import WorkspaceObservation
+
+ def build_system_prompt(task_name: str) -> str:
+     return f"""
+ You are the policy layer for a deterministic executive-assistant environment.
+
+ Mission:
+ - Choose exactly one valid structured action at a time.
+ - Move the environment toward completion as quickly and safely as possible.
+ - Never invent state that is not present in the observation.
+
+ Response contract:
+ - Return strict JSON only with keys: reasoning, action.
+ - The action object must contain exactly: action_type, target_id, payload, secondary_payload.
+ - Keep reasoning short, concrete, and operational.
+ - Do not wrap JSON in markdown fences.
+
+ Core rules:
+ - Use only IDs visible in the observation.
+ - Prefer reading before extracting, searching before drafting, and concrete actions over passive behavior.
+ - Never hallucinate files, metrics, recipients, dates, or email contents.
+ - If information is missing, choose the next action that will reveal it.
+ - When replying, write professional but concise email text.
+ - Do not repeat already-completed work when the action history shows it succeeded.
+
+ Task guidance:
+ - easy_deadline_extraction:
+   - Read the professor email first.
+   - Create exactly three todos with the exact task names and exact ISO dates from the email.
+   - Archive the source email only after all three todos exist.
+ - medium_triage_and_negotiation:
+   - Archive newsletters.
+   - Forward the urgent client complaint to manager@company.com.
+   - Reply to the reschedule request with a concrete time string.
+   - Do not archive important unresolved emails before acting on them.
+ - hard_rag_reply:
+   - Read the stakeholder email first.
+   - Search files for the Q3 architecture report before replying.
+   - Reply with the exact metrics found in the file search results.
+   - The reply should start with a short greeting such as "Hello," and end with a signoff such as "Regards,".
+
+ Allowed action types:
+ - read_email
+ - reply
+ - forward
+ - add_todo
+ - archive
+ - search_files
+
+ Current scenario: {task_name}
+ """.strip()
+
+
+ def build_user_prompt(task_name: str, observation: WorkspaceObservation) -> str:
+     return (
+         "Observation JSON follows. Choose the single best next action for the active scenario.\n\n"
+         f"SCENARIO: {task_name}\n"
+         "OBSERVATION:\n"
+         f"{json.dumps(observation.model_dump(), indent=2)}\n\n"
+         "Return only one JSON object matching:\n"
+         "{\n"
+         '  "reasoning": "short operational justification",\n'
+         '  "action": {\n'
+         '    "action_type": "read_email|reply|forward|add_todo|archive|search_files",\n'
+         '    "target_id": 1,\n'
+         '    "payload": null,\n'
+         '    "secondary_payload": null\n'
+         "  }\n"
+         "}\n"
+     )
+
+
+ def build_repair_prompt(raw_response: str) -> str:
+     return (
+         "The previous model output did not match the required JSON schema.\n"
+         "Repair it into one valid JSON object with keys reasoning and action only.\n"
+         "Do not add markdown fences or commentary.\n\n"
+         f"INVALID OUTPUT:\n{raw_response}"
+     )
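The response contract that `build_user_prompt` spells out can be sanity-checked with plain `json` (the sample decision below is an illustrative payload, not model output):

```python
import json

sample = """
{
  "reasoning": "Read the source email before extracting deadlines.",
  "action": {
    "action_type": "read_email",
    "target_id": 1,
    "payload": null,
    "secondary_payload": null
  }
}
"""

decision = json.loads(sample)
# Exactly the keys the system prompt demands, nothing more.
assert set(decision) == {"reasoning", "action"}
assert set(decision["action"]) == {"action_type", "target_id", "payload", "secondary_payload"}
```

JSON `null` maps to Python `None`, matching the optional `target_id`/`payload`/`secondary_payload` fields on `AssistantAction`.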
src/executive_assistant/runner.py ADDED
@@ -0,0 +1,128 @@
+ from __future__ import annotations
+
+ import json
+ from dataclasses import asdict, dataclass
+ from pathlib import Path
+ from typing import Protocol
+
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision, TaskReward, WorkspaceObservation
+
+
+ class AssistantPolicy(Protocol):
+     def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+         ...
+
+
+ @dataclass(frozen=True)
+ class EpisodeStepRecord:
+     step_index: int
+     reasoning: str
+     action: dict[str, object]
+     observation: dict[str, object]
+     snapshot: dict[str, object]
+     reward: dict[str, object]
+     status: str
+
+
+ @dataclass(frozen=True)
+ class EpisodeTrace:
+     task_name: str
+     policy_name: str
+     steps: list[EpisodeStepRecord]
+     final_score: float
+     completed: bool
+     termination_reason: str
+
+     def to_dict(self) -> dict[str, object]:
+         return {
+             "task_name": self.task_name,
+             "policy_name": self.policy_name,
+             "steps": [asdict(step) for step in self.steps],
+             "final_score": self.final_score,
+             "completed": self.completed,
+             "termination_reason": self.termination_reason,
+         }
+
+
+ class EpisodeRunner:
+     def __init__(self, policy: AssistantPolicy, max_steps: int = 12) -> None:
+         self.policy = policy
+         self.max_steps = max_steps
+
+     def initialize(self, task_name: str) -> tuple[ExecutiveAssistantEnv, WorkspaceObservation]:
+         """Load environment state and generate the initial observation."""
+         env = ExecutiveAssistantEnv(task_name=task_name)
+         env.max_steps = self.max_steps
+         observation = env.reset()
+         return env, observation
+
+     def advance(
+         self,
+         task_name: str,
+         env: ExecutiveAssistantEnv,
+         observation: WorkspaceObservation,
+     ) -> tuple[PolicyDecision, WorkspaceObservation, TaskReward, EpisodeStepRecord]:
+         """
+         Execute one full agent workflow step:
+         1. Send observation to policy
+         2. Receive structured action
+         3. Execute action in workspace
+         4. Update state and capture the resulting trace record
+         """
+         decision = self.policy.choose_action(task_name, observation)
+         next_observation, reward = env.step(decision.action)
+         record = EpisodeStepRecord(
+             step_index=env.step_count,
+             reasoning=decision.reasoning,
+             action=decision.action.model_dump(),
+             observation=next_observation.model_dump(),
+             snapshot=env.workspace.snapshot(),
+             reward=reward.model_dump(),
+             status=next_observation.last_action_status,
+         )
+         return decision, next_observation, reward, record
+
+     def run(self, task_name: str) -> EpisodeTrace:
+         """
+         Agent workflow loop:
+         1. Load environment state
+         2. Generate observation
+         3. Send to policy/LLM
+         4. Receive structured action
+         5. Execute action in workspace
+         6. Update state
+         7. Repeat until task complete
+         """
+         env, observation = self.initialize(task_name)
+         steps: list[EpisodeStepRecord] = []
+
+         while True:
+             _, observation, reward, record = self.advance(task_name, env, observation)
+             steps.append(record)
+             if reward.is_done:
+                 return EpisodeTrace(
+                     task_name=task_name,
+                     policy_name=type(self.policy).__name__,
+                     steps=steps,
+                     final_score=reward.total_score,
+                     completed=reward.total_score >= 1.0,
+                     termination_reason=reward.reasoning,
+                 )
+
+
+ def run_policy_suite(
+     policy: AssistantPolicy,
+     task_names: list[str],
+     max_steps: int = 12,
+ ) -> dict[str, EpisodeTrace]:
+     runner = EpisodeRunner(policy=policy, max_steps=max_steps)
+     return {task_name: runner.run(task_name) for task_name in task_names}
+
+
+ def export_traces_jsonl(traces: list[EpisodeTrace], output_path: str | Path) -> Path:
+     path = Path(output_path)
+     path.parent.mkdir(parents=True, exist_ok=True)
+     lines = [json.dumps(trace.to_dict()) for trace in traces]
+     path.write_text("\n".join(lines) + ("\n" if lines else ""))
+     return path
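The observe → decide → execute loop in `EpisodeRunner.run` reduces to a few lines once the policy and environment are stubbed out. `StubPolicy` and `StubEnv` below are invented for illustration; any objects with the same two methods fit the loop:

```python
class StubPolicy:
    """Deterministic stand-in policy: derives the action from the step index."""

    def choose_action(self, step_index: int) -> str:
        return f"action-{step_index}"


class StubEnv:
    """Stand-in environment that reports the episode done after three actions."""

    def __init__(self) -> None:
        self.step_count = 0

    def step(self, action: str) -> bool:
        self.step_count += 1
        return self.step_count >= 3


policy, env = StubPolicy(), StubEnv()
trace: list[str] = []
done = False
while not done:
    action = policy.choose_action(env.step_count)  # observe state, ask policy
    done = env.step(action)                        # execute, advance state
    trace.append(action)                           # record the step
```

The real runner records far more per step (reasoning, observation, workspace snapshot, reward), but the control flow is exactly this loop with the env's `is_done` flag as the exit condition.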
src/executive_assistant/seeds.py ADDED
@@ -0,0 +1,82 @@
+ from __future__ import annotations
+
+
+ TASK_SEEDS = {
+     "easy_deadline_extraction": {
+         "emails": [
+             {
+                 "sender": "prof.smith@university.edu",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Course project milestones",
+                 "body": (
+                     "Please track these deadlines: proposal due 2026-04-10, "
+                     "prototype due 2026-04-20, and final report due 2026-04-30."
+                 ),
+                 "timestamp": "2026-04-04T09:00:00Z",
+             }
+         ],
+         "files": [],
+     },
+     "medium_triage_and_negotiation": {
+         "emails": [
+             {
+                 "sender": "news@updates.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Weekly industry digest",
+                 "body": "Newsletter content 1",
+                 "timestamp": "2026-04-04T08:00:00Z",
+             },
+             {
+                 "sender": "promotions@vendor.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Exclusive offer",
+                 "body": "Newsletter content 2",
+                 "timestamp": "2026-04-04T08:05:00Z",
+             },
+             {
+                 "sender": "events@community.example",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Upcoming events",
+                 "body": "Newsletter content 3",
+                 "timestamp": "2026-04-04T08:10:00Z",
+             },
+             {
+                 "sender": "client@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Urgent: delivery issue",
+                 "body": "A critical complaint needs escalation.",
+                 "timestamp": "2026-04-04T08:20:00Z",
+             },
+             {
+                 "sender": "teammate@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Need to reschedule",
+                 "body": "Can we move our sync? Please propose a new time.",
+                 "timestamp": "2026-04-04T08:30:00Z",
+             },
+         ],
+         "files": [],
+     },
+     "hard_rag_reply": {
+         "emails": [
+             {
+                 "sender": "vip.stakeholder@company.com",
+                 "recipient": "assistant@workspace.local",
+                 "subject": "Need Q3 architecture metrics",
+                 "body": "Please share the key Q3 architecture metrics from the report.",
+                 "timestamp": "2026-04-04T07:30:00Z",
+             }
+         ],
+         "files": [
+             {
+                 "filename": "Q3_Architecture_Report.txt",
+                 "content_text": (
+                     "Q3 Architecture Report\n"
+                     "System availability: 99.95%\n"
+                     "Mean API latency: 182ms\n"
+                     "Infrastructure cost reduction: 14%\n"
+                 ),
+             }
+         ],
+     },
+ }
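The easy-task email body above is deliberately regular, so every deadline can be pulled with a single regex. A sketch over the seed text (the pattern is illustrative; it is not what the repo's agents actually use):

```python
import re

# Body text from the easy_deadline_extraction seed above.
body = (
    "Please track these deadlines: proposal due 2026-04-10, "
    "prototype due 2026-04-20, and final report due 2026-04-30."
)

# Capture each milestone name alongside its ISO date.
deadlines = re.findall(r"(proposal|prototype|final report) due (\d{4}-\d{2}-\d{2})", body)
```

Note that `grade_easy` expects the todo *task names* to include the word "due" (e.g. "proposal due"), so a policy using this extraction would still need to append it before calling `add_todo`.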
src/executive_assistant/training.py ADDED
@@ -0,0 +1,341 @@
+ from __future__ import annotations
+
+ import json
+ import random
+ from collections import defaultdict
+ from dataclasses import dataclass
+ from pathlib import Path
+
+ from src.executive_assistant.agent import ActionCatalog, BaselineAgent
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision, WorkspaceObservation
+ from src.executive_assistant.runner import EpisodeRunner, EpisodeTrace
+
+
+ ACTION_NAMES = [
+     "read_first_unread",
+     "archive_first_unread",
+     "forward_client_to_manager",
+     "reply_meeting_time",
+     "add_deadline_todo",
+     "archive_current_email",
+     "search_q3_architecture",
+     "reply_with_metrics",
+ ]
+
+
+ def _current_email_sender(observation: WorkspaceObservation) -> str:
+     return observation.current_email.sender if observation.current_email else "none"
+
+
+ def encode_observation(task_name: str, observation: WorkspaceObservation) -> str:
+     unread_senders = ",".join(sorted(email.sender for email in observation.unread_emails)) or "none"
+     return "|".join(
+         [
+             task_name,
+             f"unread={len(observation.unread_emails)}",
+             f"senders={unread_senders}",
+             f"todos={len(observation.active_todos)}",
+             f"current={_current_email_sender(observation)}",
+             f"search={int(bool(observation.search_results))}",
+             f"history={'/'.join(observation.action_history[-3:]) or 'none'}",
+         ]
+     )
+
+
+ def valid_action_names(task_name: str, observation: WorkspaceObservation) -> list[str]:
+     valid: list[str] = []
+
+     if task_name == "easy_deadline_extraction":
+         if observation.current_email is None and observation.unread_emails:
+             valid.append("read_first_unread")
+         if observation.current_email is not None:
+             body = observation.current_email.body.lower()
+             existing = {todo.lower() for todo in observation.active_todos}
+             missing_todo = False
+             if "proposal due" in body and "proposal due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             elif "prototype due" in body and "prototype due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             elif "final report due" in body and "final report due" not in existing:
+                 valid.append("add_deadline_todo")
+                 missing_todo = True
+             if not missing_todo:
+                 valid.append("archive_current_email")
+     elif task_name == "medium_triage_and_negotiation":
+         newsletter_senders = {
+             "news@updates.example",
+             "promotions@vendor.example",
+             "events@community.example",
+         }
+         if any(email.sender in newsletter_senders for email in observation.unread_emails):
+             valid.append("archive_first_unread")
+         if any(email.sender == "client@company.com" for email in observation.unread_emails):
+             valid.append("forward_client_to_manager")
+         if any(email.sender == "teammate@company.com" for email in observation.unread_emails):
+             valid.append("reply_meeting_time")
+     elif task_name == "hard_rag_reply":
+         if observation.current_email is None and observation.unread_emails:
+             valid.append("read_first_unread")
+         if observation.current_email is not None and not observation.search_results:
+             valid.append("search_q3_architecture")
+         if observation.current_email is not None and observation.search_results:
+             valid.append("reply_with_metrics")
+
+     return valid or ACTION_NAMES.copy()
+
+
+ def make_action(action_name: str, observation: WorkspaceObservation) -> AssistantAction:
+     if action_name == "read_first_unread":
+         if observation.unread_emails:
+             return AssistantAction(action_type="read_email", target_id=observation.unread_emails[0].id)
+     elif action_name == "archive_first_unread":
+         if observation.unread_emails:
+             return AssistantAction(action_type="archive", target_id=observation.unread_emails[0].id)
+     elif action_name == "forward_client_to_manager":
+         for email in observation.unread_emails:
+             if email.sender == "client@company.com":
+                 return AssistantAction(
+                     action_type="forward",
+                     target_id=email.id,
+                     secondary_payload="manager@company.com",
+                     payload="Urgent client complaint. Please take over immediately.",
+                 )
+     elif action_name == "reply_meeting_time":
+         target_id = observation.current_email.id if observation.current_email else None
+         if target_id is None:
+             for email in observation.unread_emails:
+                 if email.sender == "teammate@company.com":
+                     target_id = email.id
+                     break
+         if target_id is not None:
+             return AssistantAction(
+                 action_type="reply",
+                 target_id=target_id,
+                 payload="Hello, 3:30 PM IST works for me. Regards, Executive Assistant",
+             )
+     elif action_name == "add_deadline_todo":
+         if observation.current_email:
+             body = observation.current_email.body.lower()
+             candidates = [
+                 ("Proposal Due", "2026-04-10", "proposal due"),
+                 ("Prototype Due", "2026-04-20", "prototype due"),
+                 ("Final Report Due", "2026-04-30", "final report due"),
+             ]
+             existing = {todo.lower() for todo in observation.active_todos}
+             for task_name, deadline, marker in candidates:
+                 if marker in body and task_name.lower() not in existing:
+                     return AssistantAction(
+                         action_type="add_todo",
+                         payload=task_name,
+                         secondary_payload=deadline,
+                     )
+     elif action_name == "archive_current_email":
+         if observation.current_email:
+             return AssistantAction(action_type="archive", target_id=observation.current_email.id)
+     elif action_name == "search_q3_architecture":
+         return AssistantAction(action_type="search_files", payload="Q3 Architecture")
+     elif action_name == "reply_with_metrics":
+         if observation.current_email and observation.search_results:
+             snippet = observation.search_results[0].snippet
+             availability = "99.95%" if "99.95%" in snippet else "unknown"
+             latency = "182ms" if "182ms" in snippet else "unknown"
+             cost = "14%" if "14%" in snippet else "unknown"
+             return AssistantAction(
+                 action_type="reply",
+                 target_id=observation.current_email.id,
+                 payload=(
+                     "Hello,\n"
+                     f"Here are the requested Q3 architecture metrics: availability {availability}, "
+                     f"mean API latency {latency}, and infrastructure cost reduction {cost}.\n"
+                     "Regards,\nExecutive Assistant"
+                 ),
+             )
+     return AssistantAction(action_type="search_files")
+
+
+ @dataclass
+ class QLearningPolicy:
+     epsilon: float = 0.2
+     alpha: float = 0.3
+     gamma: float = 0.95
+     seed: int = 7
+
+     def __post_init__(self) -> None:
+         self.q_values: dict[str, dict[str, float]] = defaultdict(
+             lambda: {action_name: 0.0 for action_name in ACTION_NAMES}
+         )
+         self.random = random.Random(self.seed)
+
+     def choose_action(self, task_name: str, observation: WorkspaceObservation) -> PolicyDecision:
+         state = encode_observation(task_name, observation)
+         candidates = valid_action_names(task_name, observation)
+         if self.random.random() < self.epsilon:
+             action_name = self.random.choice(candidates)
+             return PolicyDecision(
+                 reasoning=f"Exploring action template {action_name}.",
+                 action=make_action(action_name, observation),
+             )
+
+         action_name = max(candidates, key=lambda name: self.q_values[state][name])
+         return PolicyDecision(
+             reasoning=f"Selecting greedy action template {action_name}.",
+             action=make_action(action_name, observation),
+         )
+
+     def update(
+         self,
+         state: str,
+         action_name: str,
+         reward: float,
+         next_state: str,
+         done: bool,
+     ) -> None:
+         next_best = 0.0 if done else max(self.q_values[next_state].values())
+         current = self.q_values[state][action_name]
+         target = reward + self.gamma * next_best
+         self.q_values[state][action_name] = current + self.alpha * (target - current)
+
+     def save(self, path: str | Path) -> Path:
+         output = Path(path)
+         output.parent.mkdir(parents=True, exist_ok=True)
+         payload = {
+             "metadata": {
+                 "action_names": ACTION_NAMES,
+                 "seed": self.seed,
+                 "alpha": self.alpha,
+                 "gamma": self.gamma,
+                 "epsilon": 0.0,
+             },
+             "q_values": self.q_values,
+         }
+         output.write_text(json.dumps(payload, indent=2))
+         return output
+
+     @classmethod
+     def load(cls, path: str | Path) -> "QLearningPolicy":
+         checkpoint_path = Path(path)
+         policy = cls(epsilon=0.0)
+         raw_payload = json.loads(checkpoint_path.read_text())
+         raw_values = raw_payload["q_values"] if "q_values" in raw_payload else raw_payload
+         policy.q_values = defaultdict(
+             lambda: {action_name: 0.0 for action_name in ACTION_NAMES}
+         )
+         for state, action_map in raw_values.items():
+             policy.q_values[state] = {
+                 action_name: float(action_map.get(action_name, 0.0))
+                 for action_name in ACTION_NAMES
+             }
+         policy.epsilon = 0.0
+         return policy
+
+
+ def action_name_from_decision(decision: PolicyDecision, observation: WorkspaceObservation) -> str:
+     for action_name in ACTION_NAMES:
+         candidate = make_action(action_name, observation)
+         if candidate == decision.action:
+             return action_name
+     return "search_q3_architecture"
+
+
+ def warm_start_from_teacher(
+     learner: QLearningPolicy,
+     teacher: BaselineAgent,
+     task_names: list[str],
+     episodes_per_task: int = 4,
+ ) -> None:
+ runner = EpisodeRunner(policy=teacher)
250
+ for _ in range(episodes_per_task):
251
+ for task_name in task_names:
252
+ trace = runner.run(task_name)
253
+ for index, step in enumerate(trace.steps):
254
+ current_observation = WorkspaceObservation.model_validate(step.observation)
255
+ previous_observation = (
256
+ WorkspaceObservation.model_validate(trace.steps[index - 1].observation)
257
+ if index > 0
258
+ else None
259
+ )
260
+ observation = previous_observation or current_observation
261
+ state = encode_observation(task_name, observation)
262
+ next_state = encode_observation(task_name, current_observation)
263
+ reward_delta = step.reward["total_score"]
264
+ action_name = action_name_from_decision(
265
+ PolicyDecision(
266
+ reasoning=step.reasoning,
267
+ action=AssistantAction.model_validate(step.action),
268
+ ),
269
+ observation,
270
+ )
271
+ learner.update(
272
+ state=state,
273
+ action_name=action_name,
274
+ reward=reward_delta,
275
+ next_state=next_state,
276
+ done=bool(step.reward["is_done"]),
277
+ )
278
+
279
+
280
+ def train_q_learning(
281
+ episodes: int = 200,
282
+ epsilon: float = 0.15,
283
+ teacher: BaselineAgent | None = None,
284
+ ) -> tuple[QLearningPolicy, dict[str, float]]:
285
+ learner = QLearningPolicy(epsilon=epsilon)
286
+ task_names = [
287
+ "easy_deadline_extraction",
288
+ "medium_triage_and_negotiation",
289
+ "hard_rag_reply",
290
+ ]
291
+ if teacher is not None:
292
+ warm_start_from_teacher(learner, teacher, task_names)
293
+
294
+ scores: dict[str, float] = {}
295
+ for episode in range(episodes):
296
+ task_name = task_names[episode % len(task_names)]
297
+ env = ExecutiveAssistantEnv(task_name=task_name)
298
+ observation = env.reset()
299
+ previous_total_score = 0.0
300
+
301
+ while True:
302
+ state = encode_observation(task_name, observation)
303
+ decision = learner.choose_action(task_name, observation)
304
+ action_name = action_name_from_decision(decision, observation)
305
+ next_observation, reward = env.step(decision.action)
306
+ next_state = encode_observation(task_name, next_observation)
307
+ reward_delta = reward.total_score - previous_total_score - 0.01
308
+ previous_total_score = reward.total_score
309
+ learner.update(
310
+ state=state,
311
+ action_name=action_name,
312
+ reward=reward_delta,
313
+ next_state=next_state,
314
+ done=reward.is_done,
315
+ )
316
+ observation = next_observation
317
+ if reward.is_done:
318
+ scores[task_name] = reward.total_score
319
+ break
320
+ return learner, scores
321
+
322
+
323
+ def evaluate_q_policy(policy: QLearningPolicy) -> dict[str, float]:
324
+ original_epsilon = policy.epsilon
325
+ policy.epsilon = 0.0
326
+ try:
327
+ traces = {
328
+ task_name: EpisodeRunner(policy=policy).run(task_name)
329
+ for task_name in [
330
+ "easy_deadline_extraction",
331
+ "medium_triage_and_negotiation",
332
+ "hard_rag_reply",
333
+ ]
334
+ }
335
+ finally:
336
+ policy.epsilon = original_epsilon
337
+ return {task_name: trace.final_score for task_name, trace in traces.items()}
338
+
339
+
340
+ def default_checkpoint_path(checkpoint_dir: str | Path, checkpoint_name: str) -> Path:
341
+ return Path(checkpoint_dir) / checkpoint_name
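Reviewer note: `QLearningPolicy.update` is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α·(r + γ·max_a′ Q(s′,a′) − Q(s,a)). A minimal standalone sketch of that rule (the `ACTIONS` list and state strings here are illustrative placeholders, not names from this repository):

```python
from collections import defaultdict

# Placeholder action set; the repo uses ACTION_NAMES from training.py.
ACTIONS = ["read_email", "search_files", "reply"]

# Unknown states default to all-zero action values, as in QLearningPolicy.
q_values = defaultdict(lambda: {name: 0.0 for name in ACTIONS})

def update(state, action, reward, next_state, done, alpha=0.3, gamma=0.95):
    # Bootstrap from the best next-state value, except at terminal steps.
    next_best = 0.0 if done else max(q_values[next_state].values())
    target = reward + gamma * next_best
    q_values[state][action] += alpha * (target - q_values[state][action])

# One terminal update from Q=0 with reward 1.0 moves the value by alpha.
update("s0", "read_email", 1.0, "s1", done=True)
print(q_values["s0"]["read_email"])  # 0.3
```

With `done=True` the bootstrap term vanishes, which is why the checkpointed values converge toward raw episode returns on terminal transitions.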
src/executive_assistant/workspace.py ADDED
@@ -0,0 +1,193 @@
+ from __future__ import annotations
+
+ import sqlite3
+ from typing import Any
+
+
+ class MockWorkspace:
+     def __init__(self) -> None:
+         # Gradio executes callbacks in worker threads, so the in-memory
+         # workspace connection needs to remain usable across that boundary.
+         self.connection = sqlite3.connect(":memory:", check_same_thread=False)
+         self.connection.row_factory = sqlite3.Row
+         self._create_tables()
+
+     def _create_tables(self) -> None:
+         self.connection.executescript(
+             """
+             CREATE TABLE Emails (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 sender TEXT NOT NULL,
+                 recipient TEXT NOT NULL,
+                 subject TEXT NOT NULL,
+                 body TEXT NOT NULL,
+                 timestamp TEXT NOT NULL,
+                 is_read INTEGER NOT NULL DEFAULT 0,
+                 is_archived INTEGER NOT NULL DEFAULT 0
+             );
+
+             CREATE TABLE Todos (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 task_name TEXT NOT NULL,
+                 deadline_date TEXT,
+                 context TEXT NOT NULL
+             );
+
+             CREATE TABLE Files (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 filename TEXT NOT NULL,
+                 content_text TEXT NOT NULL
+             );
+
+             CREATE TABLE ActionLog (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 action_type TEXT NOT NULL,
+                 target_id INTEGER,
+                 payload TEXT,
+                 secondary_payload TEXT,
+                 status TEXT NOT NULL
+             );
+             """
+         )
+         self.connection.commit()
+
+     def seed(self, emails: list[dict[str, Any]], files: list[dict[str, Any]]) -> None:
+         self.connection.executemany(
+             """
+             INSERT INTO Emails (sender, recipient, subject, body, timestamp)
+             VALUES (:sender, :recipient, :subject, :body, :timestamp)
+             """,
+             emails,
+         )
+         self.connection.executemany(
+             """
+             INSERT INTO Files (filename, content_text)
+             VALUES (:filename, :content_text)
+             """,
+             files,
+         )
+         self.connection.commit()
+
+     def get_unread_emails(self) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             """
+             SELECT id, sender, subject, substr(body, 1, 80) AS snippet
+             FROM Emails
+             WHERE is_read = 0 AND is_archived = 0
+             ORDER BY timestamp ASC
+             """
+         ).fetchall()
+
+     def read_email(self, email_id: int) -> sqlite3.Row | None:
+         self.connection.execute("UPDATE Emails SET is_read = 1 WHERE id = ?", (email_id,))
+         self.connection.commit()
+         row = self.connection.execute("SELECT * FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         status = "email read" if row else "email not found"
+         self.log_action("read_email", email_id, None, None, status)
+         return row
+
+     def send_reply(self, email_id: int, text: str) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action("reply", email_id, text, None, "reply failed: email not found")
+             return "reply failed: email not found"
+         self.log_action("reply", email_id, text, None, "reply drafted")
+         return "reply drafted"
+
+     def forward_email(self, email_id: int, recipient: str, note: str | None = None) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action(
+                 "forward",
+                 email_id,
+                 note,
+                 recipient,
+                 "forward failed: email not found",
+             )
+             return "forward failed: email not found"
+         self.log_action("forward", email_id, note, recipient, f"forwarded to {recipient}")
+         return f"forwarded to {recipient}"
+
+     def create_todo(self, task_name: str, deadline_date: str | None, context: str) -> str:
+         self.connection.execute(
+             "INSERT INTO Todos (task_name, deadline_date, context) VALUES (?, ?, ?)",
+             (task_name, deadline_date, context),
+         )
+         self.connection.commit()
+         self.log_action("add_todo", None, task_name, deadline_date, "todo created")
+         return "todo created"
+
+     def archive_email(self, email_id: int) -> str:
+         row = self.connection.execute("SELECT id FROM Emails WHERE id = ?", (email_id,)).fetchone()
+         if row is None:
+             self.log_action("archive", email_id, None, None, "archive failed: email not found")
+             return "archive failed: email not found"
+         self.connection.execute("UPDATE Emails SET is_archived = 1 WHERE id = ?", (email_id,))
+         self.connection.commit()
+         self.log_action("archive", email_id, None, None, "email archived")
+         return "email archived"
+
+     def search_documents(self, query: str) -> list[sqlite3.Row]:
+         results = self.connection.execute(
+             """
+             SELECT * FROM Files
+             WHERE filename LIKE ? OR content_text LIKE ?
+             ORDER BY id ASC
+             """,
+             (f"%{query}%", f"%{query}%"),
+         ).fetchall()
+         self.log_action("search_files", None, query, None, f"{len(results)} file(s) matched")
+         return results
+
+     def list_todos(self) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             "SELECT id, task_name, deadline_date, context FROM Todos ORDER BY id ASC"
+         ).fetchall()
+
+     def list_recent_actions(self, limit: int = 6) -> list[sqlite3.Row]:
+         return self.connection.execute(
+             """
+             SELECT id, action_type, target_id, payload, secondary_payload, status
+             FROM ActionLog
+             ORDER BY id DESC
+             LIMIT ?
+             """,
+             (limit,),
+         ).fetchall()
+
+     def log_action(
+         self,
+         action_type: str,
+         target_id: int | None,
+         payload: str | None,
+         secondary_payload: str | None,
+         status: str,
+     ) -> None:
+         self.connection.execute(
+             """
+             INSERT INTO ActionLog (action_type, target_id, payload, secondary_payload, status)
+             VALUES (?, ?, ?, ?, ?)
+             """,
+             (action_type, target_id, payload, secondary_payload, status),
+         )
+         self.connection.commit()
+
+     def snapshot(self) -> dict[str, list[dict[str, Any]]]:
+         return {
+             "emails": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Emails ORDER BY id ASC")
+             ],
+             "todos": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Todos ORDER BY id ASC")
+             ],
+             "files": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM Files ORDER BY id ASC")
+             ],
+             "action_log": [
+                 dict(row)
+                 for row in self.connection.execute("SELECT * FROM ActionLog ORDER BY id ASC")
+             ],
+         }
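Reviewer note: the `check_same_thread=False` comment in `MockWorkspace.__init__` is load-bearing. A minimal sketch of that connection pattern in isolation (table and column names here are simplified stand-ins, not the repo's schema):

```python
import sqlite3
import threading

# One in-memory database shared across threads; without
# check_same_thread=False, using the connection from a worker thread
# raises sqlite3.ProgrammingError.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.row_factory = sqlite3.Row  # rows support name-based access
connection.execute("CREATE TABLE Emails (id INTEGER PRIMARY KEY, subject TEXT)")
connection.execute("INSERT INTO Emails (subject) VALUES (?)", ("Status update",))
connection.commit()

results = []

def read_from_worker() -> None:
    # Simulates a Gradio callback reading the workspace off the main thread.
    row = connection.execute("SELECT subject FROM Emails WHERE id = 1").fetchone()
    results.append(row["subject"])

worker = threading.Thread(target=read_from_worker)
worker.start()
worker.join()
print(results)  # ['Status update']
```

Note that `check_same_thread=False` only disables the thread-ownership check; it does not add locking, so this is safe here only because each workspace call is a short, self-contained statement.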
tests/test_agent.py ADDED
@@ -0,0 +1,158 @@
+ import pytest
+
+ from src.executive_assistant.agent import (
+     ActionCatalog,
+     BaselineAgent,
+     OpenRouterPolicy,
+     smoke_test_training_pipeline,
+ )
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction, PolicyDecision
+ from src.executive_assistant.runner import EpisodeRunner, export_traces_jsonl
+
+
+ def test_action_catalog_exposes_candidate_actions() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     actions = ActionCatalog.enumerate_actions(observation)
+     assert any(action.action_type == "read_email" for action in actions)
+
+
+ def test_baseline_pipeline_solves_seeded_tasks() -> None:
+     traces = smoke_test_training_pipeline()
+     assert traces["easy_deadline_extraction"].completed is True
+     assert traces["medium_triage_and_negotiation"].completed is True
+     assert traces["hard_rag_reply"].completed is True
+
+
+ def test_episode_runner_produces_trace_records() -> None:
+     trace = EpisodeRunner(policy=BaselineAgent()).run("easy_deadline_extraction")
+     assert trace.steps
+     assert trace.steps[-1].reward["is_done"] is True
+
+
+ def test_export_traces_jsonl_writes_output(tmp_path) -> None:
+     trace = EpisodeRunner(policy=BaselineAgent()).run("hard_rag_reply")
+     output_path = export_traces_jsonl([trace], tmp_path / "traces.jsonl")
+     assert output_path.exists()
+     assert output_path.read_text().strip()
+
+
+ def test_openrouter_policy_uses_service() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return BaselineAgent().choose_action(task_name, observation)
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = policy.choose_action("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
+
+
+ def test_openrouter_policy_sanitizes_hard_reply_payload() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Reply with metrics.",
+                 action=AssistantAction(
+                     action_type="reply",
+                     target_id=1,
+                     payload="System availability: 99.95%, Mean API latency: 182ms, Infrastructure cost reduction: 14%.",
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=1))
+     observation, _ = env.step(AssistantAction(action_type="search_files", payload="Q3 Architecture"))
+     decision = policy.choose_action("hard_rag_reply", observation)
+     assert decision.action.payload is not None
+     assert decision.action.payload.lower().startswith("hello")
+     assert "regards" in decision.action.payload.lower()
+
+
+ def test_openrouter_policy_clears_unused_search_fields() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Search for the report.",
+                 action=AssistantAction(
+                     action_type="search_files",
+                     target_id=99,
+                     payload="Q3 architecture report",
+                     secondary_payload="unused",
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     observation = env.reset()
+     decision = policy.choose_action("hard_rag_reply", observation)
+     assert decision.action.target_id is None
+     assert decision.action.secondary_payload is None
+
+
+ def test_openrouter_policy_normalizes_easy_todo_payload() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Track the proposal deadline.",
+                 action=AssistantAction(
+                     action_type="add_todo",
+                     payload="proposal",
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=1))
+     decision = policy.choose_action("easy_deadline_extraction", observation)
+     assert decision.action.payload == "Proposal Due"
+     assert decision.action.secondary_payload == "2026-04-10"
+
+
+ def test_openrouter_policy_repairs_medium_forward_fields() -> None:
+     class StubService:
+         def generate_policy_decision(self, task_name, observation):
+             return PolicyDecision(
+                 reasoning="Forward the complaint.",
+                 action=AssistantAction(
+                     action_type="forward",
+                     target_id=None,
+                     payload=None,
+                     secondary_payload=None,
+                 ),
+             )
+
+     policy = OpenRouterPolicy(
+         config=OpenRouterConfig(api_key="test-key"),
+         service=StubService(),
+     )
+     env = ExecutiveAssistantEnv(task_name="medium_triage_and_negotiation")
+     observation = env.reset()
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=1))
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=2))
+     observation, _ = env.step(AssistantAction(action_type="archive", target_id=3))
+     observation, _ = env.step(AssistantAction(action_type="read_email", target_id=4))
+     decision = policy.choose_action("medium_triage_and_negotiation", observation)
+     assert decision.action.target_id == 4
+     assert decision.action.secondary_payload == "manager@company.com"
+     assert "Urgent client complaint" in (decision.action.payload or "")
tests/test_app.py ADDED
@@ -0,0 +1,42 @@
+ from pathlib import Path
+
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.training import train_q_learning
+
+
+ def test_app_builds_rl_policy_from_checkpoint(tmp_path) -> None:
+     from app import _build_policy
+
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     loaded_policy = _build_policy(
+         provider="rl",
+         model_name="google/gemma-4-31b-it",
+         api_key="",
+         checkpoint_path=str(checkpoint),
+     )
+     assert loaded_policy.epsilon == 0.0
+
+
+ def test_app_stepwise_episode_generator_yields_updates(tmp_path) -> None:
+     from app import run_live_episode
+
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     generator = run_live_episode(
+         task_name="hard_rag_reply",
+         provider="rl",
+         model_name="google/gemma-4-31b-it",
+         api_key="",
+         max_steps=12,
+         checkpoint_path=str(checkpoint),
+     )
+     first_frame = next(generator)
+     assert "scenario reset" in first_frame[0]
+     assert "requested_provider" in first_frame[-1]
+     assert "Run pending" in first_frame[1] or "Run " in first_frame[1]
+     later_frame = None
+     for later_frame in generator:
+         pass
+     assert later_frame is not None
+     assert "reply drafted" in later_frame[0] or "search returned" in later_frame[0]
tests/test_config.py ADDED
@@ -0,0 +1,26 @@
+ import os
+
+ from src.executive_assistant.config import OpenRouterConfig, load_env_file
+
+
+ def test_load_env_file_sets_openrouter_values(tmp_path, monkeypatch) -> None:
+     env_file = tmp_path / ".env.training"
+     env_file.write_text(
+         "\n".join(
+             [
+                 "OPENROUTER_API_KEY=test-key",
+                 "OPENROUTER_MODEL=google/gemma-4-31b-it",
+                 "OPENROUTER_SITE_URL=http://localhost:8888",
+             ]
+         )
+     )
+     monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+     monkeypatch.delenv("OPENROUTER_MODEL", raising=False)
+     monkeypatch.delenv("OPENROUTER_SITE_URL", raising=False)
+
+     loaded = load_env_file(env_file)
+     config = OpenRouterConfig.from_env()
+
+     assert loaded is True
+     assert os.environ["OPENROUTER_API_KEY"] == "test-key"
+     assert config.model_name == "google/gemma-4-31b-it"
tests/test_deployment.py ADDED
@@ -0,0 +1,41 @@
+ from pathlib import Path
+
+ from src.executive_assistant.deployment import (
+     HFSpaceDeployConfig,
+     parse_hf_usernames,
+     render_space_readme,
+     stage_space_bundle,
+ )
+
+
+ def test_parse_hf_usernames_strips_at_signs() -> None:
+     usernames = parse_hf_usernames("@alice, bob , ,@carol")
+     assert usernames == ("alice", "bob", "carol")
+
+
+ def test_render_space_readme_includes_project_epsilon_placeholders() -> None:
+     config = HFSpaceDeployConfig(
+         repo_id="placeholder/project-epsilon-executive-assistant",
+         hf_usernames=("HF_USERNAME_1", "HF_USERNAME_2"),
+     )
+     rendered = render_space_readme(config)
+     assert "Project Epsilon" in rendered
+     assert "@HF_USERNAME_1" in rendered
+     assert "sdk: docker" in rendered
+     assert "OpenEnv Scaler x Meta x PyTorch Hack" in rendered
+
+
+ def test_stage_space_bundle_writes_hf_readme_and_checkpoint(tmp_path: Path) -> None:
+     config = HFSpaceDeployConfig(
+         repo_id="placeholder/project-epsilon-executive-assistant",
+         hf_usernames=("HF_USERNAME_1",),
+     )
+     checkpoint_path = stage_space_bundle(config, tmp_path)
+     assert checkpoint_path is not None
+     assert (tmp_path / "README.md").exists()
+     assert (tmp_path / "app.py").exists()
+     assert (tmp_path / "src" / "executive_assistant" / "env.py").exists()
+     assert (tmp_path / "artifacts" / "checkpoints" / config.checkpoint_name).exists()
+     assert not (tmp_path / ".env.app").exists()
+     assert not (tmp_path / ".env.training").exists()
+     assert not (tmp_path / ".env.hf.space.example").exists()
tests/test_env.py ADDED
@@ -0,0 +1,40 @@
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.models import AssistantAction
+
+
+ def test_easy_env_reset_exposes_seeded_email() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     assert len(observation.unread_emails) == 1
+
+
+ def test_easy_env_can_add_todo() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     env.reset()
+     observation, reward = env.step(
+         AssistantAction(
+             action_type="add_todo",
+             payload="Proposal due",
+             secondary_payload="2026-04-10",
+         )
+     )
+     assert "Proposal due" in observation.active_todos
+     assert reward.total_score >= 0.0
+
+
+ def test_read_email_populates_current_email() -> None:
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     observation, _ = env.step(
+         AssistantAction(action_type="read_email", target_id=observation.unread_emails[0].id)
+     )
+     assert observation.current_email is not None
+     assert "proposal due" in observation.current_email.body.lower()
+
+
+ def test_search_files_populates_results() -> None:
+     env = ExecutiveAssistantEnv(task_name="hard_rag_reply")
+     env.reset()
+     observation, _ = env.step(AssistantAction(action_type="search_files", payload="Q3 Architecture"))
+     assert observation.search_results
+     assert observation.search_results[0].filename == "Q3_Architecture_Report.txt"
tests/test_llm_service.py ADDED
@@ -0,0 +1,72 @@
+ from src.executive_assistant.config import OpenRouterConfig
+ from src.executive_assistant.env import ExecutiveAssistantEnv
+ from src.executive_assistant.llm_service import OpenRouterLLMService
+
+
+ def test_openrouter_service_parses_policy_decision() -> None:
+     class FakeCompletions:
+         def create(self, **kwargs):
+             class Message:
+                 content = (
+                     '{"reasoning":"Read first","action":{"action_type":"read_email","target_id":1,'
+                     '"payload":null,"secondary_payload":null}}'
+                 )
+
+             class Choice:
+                 message = Message()
+
+             class Response:
+                 choices = [Choice()]
+
+             return Response()
+
+     class FakeClient:
+         class chat:
+             completions = FakeCompletions()
+
+     service = OpenRouterLLMService(
+         config=OpenRouterConfig(api_key="test-key"),
+         client=FakeClient(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = service.generate_policy_decision("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
+
+
+ def test_openrouter_service_repairs_invalid_json() -> None:
+     class FakeCompletions:
+         def __init__(self):
+             self.calls = 0
+
+         def create(self, **kwargs):
+             self.calls += 1
+
+             class Message:
+                 content = "not valid json" if self.calls == 1 else (
+                     '{"reasoning":"Recovered","action":{"action_type":"read_email","target_id":1,'
+                     '"payload":null,"secondary_payload":null}}'
+                 )
+
+             class Choice:
+                 message = Message()
+
+             class Response:
+                 choices = [Choice()]
+
+             return Response()
+
+     fake_completions = FakeCompletions()
+
+     class FakeClient:
+         class chat:
+             completions = fake_completions
+
+     service = OpenRouterLLMService(
+         config=OpenRouterConfig(api_key="test-key"),
+         client=FakeClient(),
+     )
+     env = ExecutiveAssistantEnv(task_name="easy_deadline_extraction")
+     observation = env.reset()
+     decision = service.generate_policy_decision("easy_deadline_extraction", observation)
+     assert decision.action.action_type == "read_email"
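Reviewer note: the second test above exercises a parse-then-retry loop inside `OpenRouterLLMService`. A minimal sketch of that repair pattern in isolation (`fetch` is a hypothetical stand-in for the chat-completion call; the JSON shape mirrors the fakes above):

```python
import json

# Scripted completions: the first is malformed, the second parses cleanly,
# matching the FakeCompletions behaviour in test_llm_service.py.
responses = iter([
    "not valid json",
    '{"reasoning": "Recovered", "action": {"action_type": "read_email", "target_id": 1}}',
])

def fetch() -> str:
    # Hypothetical placeholder for one model call.
    return next(responses)

def decision_with_repair(max_attempts: int = 2) -> dict:
    last_error = None
    for _ in range(max_attempts):
        raw = fetch()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output: ask the model again
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error

decision = decision_with_repair()
print(decision["action"]["action_type"])  # read_email
```

Bounding the retries and surfacing the last decode error keeps a persistently malformed model from looping forever while still recovering from one-off bad completions.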
tests/test_models.py ADDED
@@ -0,0 +1,6 @@
+ from src.executive_assistant.models import AssistantAction
+
+
+ def test_action_model_accepts_known_action_type() -> None:
+     action = AssistantAction(action_type="archive", target_id=1)
+     assert action.action_type == "archive"
tests/test_runner.py ADDED
@@ -0,0 +1,25 @@
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.runner import run_policy_suite
+
+
+ def test_run_policy_suite_returns_all_requested_tasks() -> None:
+     traces = run_policy_suite(
+         policy=BaselineAgent(),
+         task_names=["easy_deadline_extraction", "hard_rag_reply"],
+     )
+     assert set(traces) == {"easy_deadline_extraction", "hard_rag_reply"}
+
+
+ def test_episode_runner_exposes_explicit_workflow_steps() -> None:
+     from src.executive_assistant.runner import EpisodeRunner
+
+     runner = EpisodeRunner(policy=BaselineAgent(), max_steps=12)
+     env, observation = runner.initialize("easy_deadline_extraction")
+     _, next_observation, reward, record = runner.advance(
+         "easy_deadline_extraction",
+         env,
+         observation,
+     )
+     assert record.step_index == 1
+     assert next_observation.last_action_status == "email read"
+     assert reward.is_done is False
tests/test_training.py ADDED
@@ -0,0 +1,35 @@
+ from src.executive_assistant.agent import BaselineAgent
+ from src.executive_assistant.training import QLearningPolicy, evaluate_q_policy, train_q_learning
+
+
+ def test_train_q_learning_returns_scores() -> None:
+     policy, scores = train_q_learning(episodes=24, epsilon=0.1, teacher=BaselineAgent())
+     evaluation = evaluate_q_policy(policy)
+     assert scores
+     assert set(evaluation) == {
+         "easy_deadline_extraction",
+         "medium_triage_and_negotiation",
+         "hard_rag_reply",
+     }
+     assert evaluation == {
+         "easy_deadline_extraction": 1.0,
+         "medium_triage_and_negotiation": 1.0,
+         "hard_rag_reply": 1.0,
+     }
+
+
+ def test_q_learning_policy_checkpoint_roundtrip(tmp_path) -> None:
+     policy, _ = train_q_learning(episodes=12, epsilon=0.1, teacher=BaselineAgent())
+     checkpoint = policy.save(tmp_path / "q_policy.json")
+     loaded = QLearningPolicy.load(checkpoint)
+     evaluation = evaluate_q_policy(loaded)
+     assert set(evaluation) == {
+         "easy_deadline_extraction",
+         "medium_triage_and_negotiation",
+         "hard_rag_reply",
+     }
+     assert evaluation == {
+         "easy_deadline_extraction": 1.0,
+         "medium_triage_and_negotiation": 1.0,
+         "hard_rag_reply": 1.0,
+     }
tests/test_workspace.py ADDED
@@ -0,0 +1,74 @@
+ import threading
+
+ from src.executive_assistant.workspace import MockWorkspace
+
+
+ def test_workspace_seed_and_snapshot() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[{"filename": "doc.txt", "content_text": "hello world"}],
+     )
+
+     snapshot = workspace.snapshot()
+     assert len(snapshot["emails"]) == 1
+     assert len(snapshot["files"]) == 1
+
+
+ def test_read_email_is_logged() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[],
+     )
+
+     row = workspace.read_email(1)
+     assert row is not None
+     snapshot = workspace.snapshot()
+     assert snapshot["action_log"][0]["action_type"] == "read_email"
+
+
+ def test_workspace_can_be_used_from_worker_thread() -> None:
+     workspace = MockWorkspace()
+     workspace.seed(
+         emails=[
+             {
+                 "sender": "a@example.com",
+                 "recipient": "b@example.com",
+                 "subject": "Thread Test",
+                 "body": "Hello",
+                 "timestamp": "2026-04-04T00:00:00Z",
+             }
+         ],
+         files=[],
+     )
+     errors: list[Exception] = []
+
+     def _read_email() -> None:
+         try:
+             row = workspace.read_email(1)
+             assert row is not None
+         except Exception as exc:  # pragma: no cover - assertion path is the test failure
+             errors.append(exc)
+
+     worker = threading.Thread(target=_read_email)
+     worker.start()
+     worker.join()
+
+     assert errors == []
training_env.ipynb ADDED
@@ -0,0 +1,257 @@
+ {
+   "cells": [
+     {
+       "id": "intro",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "# Autonomous Executive Assistant Sandbox\n",
+         "\n",
+         "Notebook for OpenRouter Gemma rollouts, checkpoint export, and RL training. Use the `scalerhack2-training` kernel so the environment matches the validated training pipeline."
+       ]
+     },
+     {
+       "id": "workflow",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Workflow\n",
+         "\n",
+         "1. Load `.env.training` directly from the repository root.\n",
+         "2. Run the baseline suite to confirm the environment is stable.\n",
+         "3. Run an OpenRouter Gemma rollout if the API key is available.\n",
+         "4. Export traces for analysis or imitation-style warm starts.\n",
+         "5. Train the tabular RL agent and save a checkpoint.\n",
+         "6. Promote stable changes back into `src/` and keep tests green."
+       ]
+     },
+     {
+       "id": "imports",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "import json\n",
+         "import os\n",
+         "from pathlib import Path\n",
+         "\n",
+         "from src.executive_assistant.agent import BaselineAgent, OpenRouterPolicy\n",
+         "from src.executive_assistant.config import OpenRouterConfig, load_env_file\n",
+         "from src.executive_assistant.runner import EpisodeRunner, export_traces_jsonl, run_policy_suite\n",
+         "from src.executive_assistant.training import evaluate_q_policy, train_q_learning\n"
+       ]
+     },
+     {
+       "id": "config",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "ENV_FILE = Path('.env.training')\n",
+         "ENV_LOADED = load_env_file(ENV_FILE)\n",
+         "HAS_OPENROUTER_KEY = bool(os.environ.get('OPENROUTER_API_KEY'))\n",
+         "\n",
+         "TASK_NAME = 'hard_rag_reply'\n",
+         "POLICY_PROVIDER = 'openrouter' if HAS_OPENROUTER_KEY else 'baseline'\n",
+         "MODEL_NAME = os.environ.get('OPENROUTER_MODEL', 'google/gemma-4-31b-it')\n",
+         "MAX_STEPS = 12\n",
+         "TRACE_DIR = Path('artifacts/traces')\n",
+         "CHECKPOINT_DIR = Path('artifacts/checkpoints')\n",
+         "TRACE_DIR.mkdir(parents=True, exist_ok=True)\n",
+         "CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)\n",
+         "\n",
+         "{\n",
+         "    'env_file_found': ENV_LOADED,\n",
+         "    'has_openrouter_key': HAS_OPENROUTER_KEY,\n",
+         "    'policy_provider': POLICY_PROVIDER,\n",
+         "    'model_name': MODEL_NAME,\n",
+         "}\n"
+       ]
+     },
+     {
+       "id": "policy-builder",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "def build_policy(provider: str, model_name: str):\n",
+         "    if provider == 'baseline':\n",
+         "        return BaselineAgent()\n",
+         "    if provider == 'openrouter':\n",
+         "        config = OpenRouterConfig.from_env(ENV_FILE)\n",
+         "        config = OpenRouterConfig(\n",
+         "            api_key=config.api_key,\n",
+         "            model_name=model_name,\n",
+         "            base_url=config.base_url,\n",
+         "            site_url=config.site_url,\n",
+         "            app_name=config.app_name,\n",
+         "            temperature=config.temperature,\n",
+         "            max_tokens=config.max_tokens,\n",
+         "        )\n",
+         "        return OpenRouterPolicy(config=config)\n",
+         "    raise ValueError(f'Unsupported provider: {provider}')\n"
+       ]
+     },
+     {
+       "id": "baseline-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Baseline validation\n",
+         "\n",
+         "Run this first. If the baseline no longer solves the seeded tasks, stop and fix the environment before trusting any LLM or RL results."
+       ]
+     },
+     {
+       "id": "baseline-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "baseline_traces = run_policy_suite(\n",
+         "    policy=BaselineAgent(),\n",
+         "    task_names=[\n",
+         "        'easy_deadline_extraction',\n",
+         "        'medium_triage_and_negotiation',\n",
+         "        'hard_rag_reply',\n",
+         "    ],\n",
+         "    max_steps=MAX_STEPS,\n",
+         ")\n",
+         "\n",
+         "{name: {'completed': trace.completed, 'score': trace.final_score, 'steps': len(trace.steps)} for name, trace in baseline_traces.items()}\n"
+       ]
+     },
+     {
+       "id": "rollout-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Policy rollout\n",
+         "\n",
+         "This uses OpenRouter Gemma automatically when `.env.training` provides the key. Otherwise it falls back to the baseline policy."
+       ]
+     },
+     {
+       "id": "rollout-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "policy = build_policy(POLICY_PROVIDER, MODEL_NAME)\n",
+         "runner = EpisodeRunner(policy=policy, max_steps=MAX_STEPS)\n",
+         "trace = runner.run(TASK_NAME)\n",
+         "\n",
+         "print(json.dumps(trace.to_dict(), indent=2))\n"
+       ]
+     },
+     {
+       "id": "rollout-snapshot",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "trace.steps[-1].snapshot\n"
+       ]
+     },
+     {
+       "id": "export-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Export traces\n",
+         "\n",
+         "These JSONL traces are the main interface between rollout collection and downstream training or regression analysis."
+       ]
+     },
+     {
+       "id": "export-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "suite_traces = run_policy_suite(\n",
+         "    policy=build_policy(POLICY_PROVIDER, MODEL_NAME),\n",
+         "    task_names=[TASK_NAME],\n",
+         "    max_steps=MAX_STEPS,\n",
+         ")\n",
+         "\n",
+         "output_path = export_traces_jsonl(\n",
+         "    list(suite_traces.values()),\n",
+         "    TRACE_DIR / f'{POLICY_PROVIDER}_{TASK_NAME}_traces.jsonl',\n",
+         ")\n",
+         "\n",
+         "print(output_path)\n"
+       ]
+     },
+     {
+       "id": "train-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## RL training\n",
+         "\n",
+         "This trains the tabular Q-learning policy with a baseline-teacher warm start, saves a checkpoint, and evaluates the trained policy on all seeded tasks."
+       ]
+     },
+     {
+       "id": "train-run",
+       "cell_type": "code",
+       "execution_count": null,
+       "metadata": {},
+       "outputs": [],
+       "source": [
+         "q_policy, training_scores = train_q_learning(\n",
+         "    episodes=300,\n",
+         "    epsilon=0.15,\n",
+         "    teacher=BaselineAgent(),\n",
+         ")\n",
+         "checkpoint_path = q_policy.save(CHECKPOINT_DIR / 'q_policy_notebook.json')\n",
+         "evaluation = evaluate_q_policy(q_policy)\n",
+         "\n",
+         "{\n",
+         "    'checkpoint': str(checkpoint_path),\n",
+         "    'training_scores': training_scores,\n",
+         "    'evaluation': evaluation,\n",
+         "}\n"
+       ]
+     },
+     {
+       "id": "env-note",
+       "cell_type": "markdown",
+       "metadata": {},
+       "source": [
+         "## Environment note\n",
+         "\n",
+         "The notebook loads `.env.training` directly from the repo root. That keeps CLI runs, notebook runs, and Jupyter-launched kernels aligned without requiring manual exports in the shell."
+       ]
+     }
+   ],
+   "metadata": {
+     "kernelspec": {
+       "display_name": "Python (scalerhack2-training)",
+       "language": "python",
+       "name": "scalerhack2-training"
+     },
+     "language_info": {
+       "codemirror_mode": {
+         "name": "ipython",
+         "version": 3
+       },
+       "file_extension": ".py",
+       "mimetype": "text/x-python",
+       "name": "python",
+       "nbconvert_exporter": "python",
+       "pygments_lexer": "ipython3",
+       "version": "3.14"
+     }
+   },
+   "nbformat": 4,
+   "nbformat_minor": 5
+ }
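The notebook's RL cell only exposes `train_q_learning(episodes, epsilon, teacher)` by signature; the tabular update it names is the standard Q-learning rule. A hedged sketch of that rule and the ε-greedy action choice follows — the helper names `q_update` and `epsilon_greedy` are illustrative, not the project's `training.py` API, and how the baseline teacher warms up early episodes is an implementation detail not shown here.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def epsilon_greedy(q, state, actions, epsilon: float, rng=random):
    """Explore with probability epsilon, otherwise act greedily on q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# defaultdict(float) gives unseen (state, action) pairs a value of 0.0,
# so the table grows lazily as the agent visits new states.
q = defaultdict(float)
```

A teacher warm start in this shape would typically mean choosing the teacher's action instead of the ε-greedy one for some initial fraction of episodes while still applying `q_update`, so the table is seeded along trajectories the baseline already solves.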