Flickinshots committed on
Commit bf6550d · verified · 1 Parent(s): fa6dcee

Deploy Project Epsilon Space bundle
pyproject.toml CHANGED

```diff
@@ -8,9 +8,23 @@ version = "0.1.0"
 description = "Deterministic executive assistant environment backed by an in-memory SQLite workspace."
 readme = "README.md"
 requires-python = ">=3.11"
+dependencies = [
+    "fastapi>=0.115.0",
+    "gradio>=5.0.0",
+    "openai>=1.76.0",
+    "openenv-core>=0.2.0",
+    "pydantic>=2.8.0",
+    "pytest>=8.0.0",
+    "PyYAML>=6.0.0",
+    "uvicorn>=0.30.0",
+]
+
+[project.scripts]
+server = "src.server.app:main"
 
 [tool.setuptools]
 package-dir = {"" = "src"}
+py-modules = ["app"]
 
 [tool.setuptools.packages.find]
 where = ["src"]
```
requirements.txt CHANGED

```diff
@@ -1,6 +1,7 @@
 fastapi>=0.115.0
 gradio>=5.0.0
 openai>=1.76.0
+openenv-core>=0.2.0
 pydantic>=2.8.0
 pytest>=8.0.0
 PyYAML>=6.0.0
```
server/__init__.py ADDED

```diff
@@ -0,0 +1 @@
+from .app import app, main
```
server/app.py ADDED

```diff
@@ -0,0 +1,3 @@
+from __future__ import annotations
+
+from src.server.app import app, main
```
src/autonomous_executive_assistant_sandbox.egg-info/PKG-INFO ADDED

@@ -0,0 +1,293 @@

```text
Metadata-Version: 2.4
Name: autonomous-executive-assistant-sandbox
Version: 0.1.0
Summary: Deterministic executive assistant environment backed by an in-memory SQLite workspace.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: fastapi>=0.115.0
Requires-Dist: gradio>=5.0.0
Requires-Dist: openai>=1.76.0
Requires-Dist: openenv-core>=0.2.0
Requires-Dist: pydantic>=2.8.0
Requires-Dist: pytest>=8.0.0
Requires-Dist: PyYAML>=6.0.0
Requires-Dist: uvicorn>=0.30.0
```
# Autonomous Executive Assistant Sandbox

Deterministic RL-style workspace for an executive-assistant agent operating over a mock inbox, todo list, and local document store.

This project is being packaged for deployment to Hugging Face Spaces as a judge-facing demo for the **OpenEnv Scaler x Meta x PyTorch Hack**. The hack dashboard currently lists the main build round as **March 25, 2026 through April 8, 2026**, with finals on **April 25-26, 2026** in Bengaluru.

## Project status

This repository is scaffolded from the product requirements in [PRD.md](./PRD.md). The current setup establishes:

- a Python package layout for the environment, agent, graders, and seed data
- an OpenEnv-oriented contract using Pydantic models
- an OpenRouter-backed Gemma inference path through the OpenAI client for validator-facing model execution
- a tabular RL training pipeline with rollout export and checkpointing
- separate app and training environments, including a registered Jupyter kernel

## Environment Description

This environment models a real knowledge-work loop that humans perform every day:

- triaging email
- extracting structured tasks from unstructured communication
- escalating urgent requests
- negotiating meeting times
- searching local documents before replying

The environment is intentionally deterministic so an agent can be graded on workflow quality rather than on lucky wording. Instead of relying on Gmail, a live calendar, or a live file server, the system uses an isolated SQLite-backed workspace that simulates:

- an inbox with seeded emails
- a todo list with deadlines and context
- a file store for retrieval-style tasks
- an action log for deterministic grading and reward shaping
## OpenEnv Interface

The environment entrypoint is [src/executive_assistant/env.py](./src/executive_assistant/env.py).

- `reset()` seeds a task-specific workspace and returns the initial typed observation.
- `step(action)` executes a typed action and returns `(observation, reward, done, info)`.
- `state()` returns the current environment state, including the observation snapshot and full workspace tables.
- [openenv.yaml](./openenv.yaml) binds the environment class and typed models together.

The Hugging Face Space also exposes validator-friendly HTTP endpoints alongside the Gradio UI:

- `POST /openenv/reset`
- `POST /openenv/step`
- `GET /openenv/state`
- `GET /health`

The shorter `/reset`, `/step`, and `/state` aliases are also available for validators that probe non-prefixed routes.
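The prefixed/alias route layout can be sketched as a tiny URL helper; the base URL and the payload field names below are illustrative assumptions, not the project's actual schemas:

```python
# Hypothetical helper for composing the validator-facing routes listed above.
# BASE and the payload shapes are assumptions for illustration only.
BASE = "http://localhost:7860"

def endpoint(route: str, prefixed: bool = True) -> str:
    """Build a full URL, with or without the /openenv prefix."""
    prefix = "/openenv" if prefixed else ""
    return f"{BASE}{prefix}/{route.lstrip('/')}"

# A reset call would carry a task id; a step call would carry a typed action.
reset_payload = {"task_id": "easy_deadline_extraction"}
step_payload = {"action": {"action_type": "read_email"}}

print(endpoint("reset"))                  # http://localhost:7860/openenv/reset
print(endpoint("state", prefixed=False))  # http://localhost:7860/state
```

Both spellings of each route should resolve to the same handler on the Space.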
### Observation Space

`WorkspaceObservation` includes:

- `current_time`
- `unread_emails`
- `active_todos`
- `last_action_status`
- `current_email`
- `search_results`
- `action_history`

### Action Space

`AssistantAction.action_type` supports:

- `read_email`
- `reply`
- `forward`
- `add_todo`
- `archive`
- `search_files`

### Reward Space

`TaskReward` includes:

- `step_reward`
- `total_score`
- `is_done`
- `reasoning`

Rewards are dense and shaped over the full trajectory. Partial progress is rewarded, invalid or undesirable behavior lowers the score indirectly through missed milestones and penalties, and episodes terminate at a fixed step budget.
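A minimal sketch of this reward contract, using the field names from the list above; the shaping numbers, the clamping range, and the step budget constant are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class TaskReward:
    step_reward: float
    total_score: float
    is_done: bool
    reasoning: str

MAX_STEPS = 10  # assumed fixed step budget

def accumulate(step_rewards: list[float]) -> TaskReward:
    # Dense shaping: every step contributes; the running score is clamped
    # to the 0.0-1.0 range the graders use.
    total = max(0.0, min(1.0, sum(step_rewards)))
    return TaskReward(
        step_reward=step_rewards[-1],
        total_score=total,
        is_done=len(step_rewards) >= MAX_STEPS,
        reasoning="milestone-based shaping (illustrative)",
    )

print(accumulate([0.2, 0.3, 0.6]).total_score)  # 1.0 after clamping
```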
## Tasks And Difficulty

### Easy: `easy_deadline_extraction`

Read a seeded academic email, extract three exact deadlines into todos, and archive the source message.

### Medium: `medium_triage_and_negotiation`

Archive newsletters, escalate a client complaint to the manager, and reply with a concrete time to a meeting reschedule request.

### Hard: `hard_rag_reply`

Read a stakeholder request, search the local report store, retrieve the relevant metrics, and reply with the correct grounded values.

All three tasks are deterministically graded in [src/executive_assistant/graders.py](./src/executive_assistant/graders.py) with scores clamped to `0.0–1.0`.
## Repository layout

```text
.
├── app.py
├── openenv.yaml
├── requirements.txt
├── requirements.app.txt
├── requirements.training.txt
├── training_env.ipynb
├── src/
│   └── executive_assistant/
│       ├── agent.py
│       ├── config.py
│       ├── env.py
│       ├── graders.py
│       ├── llm_service.py
│       ├── models.py
│       ├── prompts.py
│       ├── runner.py
│       ├── seeds.py
│       ├── training.py
│       └── workspace.py
├── scripts/
│   ├── evaluate_policies.py
│   ├── run_policy_episode.py
│   ├── setup_app_env.sh
│   ├── setup_training_env.sh
│   └── train_rl_agent.py
└── tests/
    ├── test_agent.py
    ├── test_env.py
    ├── test_llm_service.py
    ├── test_models.py
    ├── test_runner.py
    ├── test_training.py
    └── test_workspace.py
```
## Environment setup

```bash
bash scripts/setup_app_env.sh
bash scripts/setup_training_env.sh
```

The training setup registers the Jupyter kernel `scalerhack2-training`.
## Validation and runners

Run the deterministic baseline across all seeded tasks:

```bash
.venv-training/bin/python scripts/evaluate_policies.py --provider baseline
```

Current deterministic baseline scores:

- `easy_deadline_extraction`: `1.0`
- `medium_triage_and_negotiation`: `1.0`
- `hard_rag_reply`: `1.0`

Run the required root-level inference script through the OpenRouter API using the OpenAI client compatibility layer. The canonical setup is:

```bash
OPENROUTER_API_KEY=... \
API_BASE_URL=https://openrouter.ai/api/v1 \
MODEL_NAME=google/gemma-4-31b-it \
.venv-training/bin/python inference.py
```

If a validator requires the `OPENAI_API_KEY` variable name, set it to the same OpenRouter key:

```bash
OPENAI_API_KEY=... \
API_BASE_URL=https://openrouter.ai/api/v1 \
MODEL_NAME=google/gemma-4-31b-it \
.venv-training/bin/python inference.py
```

Run a single episode and print the full trace:

```bash
.venv-training/bin/python scripts/run_policy_episode.py --task hard_rag_reply --provider baseline
```

Run the RL training smoke pipeline and save a checkpoint:

```bash
.venv-training/bin/python scripts/train_rl_agent.py --episodes 300
```
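The training script itself is not shown here, but a tabular pipeline of this kind typically centers on a Q-learning update. A hedged sketch, with invented state/action names and hyperparameters:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

# Q-table keyed by (state, action); missing entries default to 0.0.
q_table = defaultdict(float)

def q_update(state: str, action: str, reward: float,
             next_state: str, next_actions: list[str]) -> None:
    """One tabular Q-learning step: Q <- Q + a * (r + g * max Q' - Q)."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    key = (state, action)
    q_table[key] += ALPHA * (reward + GAMMA * best_next - q_table[key])

q_update("inbox_unread", "read_email", 0.2, "email_open", ["reply", "archive"])
print(round(q_table[("inbox_unread", "read_email")], 3))  # 0.02
```

A checkpoint such as `q_policy_notebook.json` would then essentially be this table serialized to JSON.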
Start the deployed app runtime:

```bash
.venv-app/bin/python app.py
```

The live Gradio app intentionally exposes only `baseline` and `rl`. In `rl` mode, the app loads the trained JSON checkpoint, injects that checkpoint's recommendation into the observation context, and asks OpenRouter Gemma through the OpenAI client to generate the runtime action. If the model call fails, the app falls back to the checkpoint action so the demo remains runnable.
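The fallback path described above reduces to a small guard; this sketch uses invented function names and stands in for the real checkpoint/model wiring:

```python
def choose_action(checkpoint_action: str, call_model) -> str:
    """Prefer the model's action; fall back to the checkpoint action when the
    model call fails or returns nothing, so the demo stays runnable."""
    try:
        model_action = call_model(checkpoint_action)
        return model_action or checkpoint_action
    except Exception:
        # Any model/API failure degrades gracefully to the checkpoint.
        return checkpoint_action

def unavailable_model(hint: str) -> str:
    raise RuntimeError("OpenRouter unavailable")

print(choose_action("archive", unavailable_model))     # archive
print(choose_action("archive", lambda hint: "reply"))  # reply
```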
The repository intentionally uses the `openai` Python client with `base_url=https://openrouter.ai/api/v1` and `MODEL_NAME=google/gemma-4-31b-it`. It accepts the hackathon-compatible aliases `OPENAI_API_KEY`, `API_BASE_URL`, and `MODEL_NAME`, but the provider remains OpenRouter.
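One way the alias handling might look — a sketch under assumptions, since the real `inference.py` is not shown and the helper name is invented:

```python
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
DEFAULT_MODEL = "google/gemma-4-31b-it"

def resolve_client_config(env: dict[str, str]) -> dict[str, str]:
    """Accept OPENROUTER_API_KEY or the hackathon alias OPENAI_API_KEY,
    falling back to the documented OpenRouter defaults."""
    return {
        "api_key": env.get("OPENROUTER_API_KEY") or env.get("OPENAI_API_KEY", ""),
        "base_url": env.get("API_BASE_URL", DEFAULT_BASE_URL),
        "model": env.get("MODEL_NAME", DEFAULT_MODEL),
    }

cfg = resolve_client_config({"OPENAI_API_KEY": "sk-or-placeholder"})
print(cfg["base_url"])  # https://openrouter.ai/api/v1
# The resolved values would feed OpenAI(base_url=..., api_key=...) and the
# model name passed to client.chat.completions.create(...).
```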
## OpenEnv Validation And Submission Checklist

Submission-sensitive files:

- environment metadata: [openenv.yaml](./openenv.yaml)
- environment runtime: [src/executive_assistant/env.py](./src/executive_assistant/env.py)
- typed models: [src/executive_assistant/models.py](./src/executive_assistant/models.py)
- root inference script: [inference.py](./inference.py)
- Docker build target: [Dockerfile](./Dockerfile)

Recommended pre-submission checks:

```bash
.venv-training/bin/pytest -q
.venv-training/bin/python scripts/evaluate_policies.py --provider baseline
.venv-training/bin/python inference.py
docker build -t email-maestro .
docker run -p 7860:7860 email-maestro
```

If you have the OpenEnv validator installed locally, also run:

```bash
openenv validate
```
## Hugging Face Spaces deployment

The repository now includes a one-command Hugging Face Spaces deployment path that stages a Space-friendly bundle, injects a dedicated HF `README.md`, carries over the RL checkpoint, creates or updates the Space, uploads the app, and sets runtime metadata variables.

1. Create the training environment if you have not already:

   ```bash
   bash scripts/setup_training_env.sh
   ```

2. Prepare deployment variables:

   ```bash
   cp .env.hf.space.example .env.hf.space
   ```

3. Fill in at least:

   - `HF_TOKEN`
   - `HF_SPACE_REPO`
   - `HF_SPACE_TEAM_USERNAMES`

4. Deploy in one command:

   ```bash
   bash scripts/deploy_hf_space.sh
   ```

What the deploy pipeline does:

- creates the target Space with `sdk=docker`
- stages a clean bundle without local `.env` files, virtualenvs, caches, or git metadata
- writes a dedicated HF Space `README.md` addressed to **Team Epsilon**
- bundles `artifacts/checkpoints/q_policy_notebook.json` for the `rl` policy, or trains a fresh checkpoint if one is missing
- uploads the Space contents and sets `OPENROUTER_APP_NAME` and `OPENROUTER_SITE_URL`
- optionally sets `OPENROUTER_API_KEY` on the Space when it is present locally

Supporting deployment docs:

- HF README example: [docs/HF_SPACE_README.md](./docs/HF_SPACE_README.md)
- Deployment env template: [.env.hf.space.example](./.env.hf.space.example)
- Deployment script: [scripts/deploy_hf_space.py](./scripts/deploy_hf_space.py)
## Development workflow

1. Keep reusable logic in `src/executive_assistant/`.
2. Use `training_env.ipynb` with the `scalerhack2-training` kernel for rollouts, prompt iteration, and RL experiments.
3. Promote notebook code into modules once it stabilizes.
4. Validate behavior through unit tests, deterministic scenario checks, RL checkpoint smoke runs, and exported episode traces.
src/autonomous_executive_assistant_sandbox.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,34 @@

```text
README.md
pyproject.toml
src/autonomous_executive_assistant_sandbox.egg-info/PKG-INFO
src/autonomous_executive_assistant_sandbox.egg-info/SOURCES.txt
src/autonomous_executive_assistant_sandbox.egg-info/dependency_links.txt
src/autonomous_executive_assistant_sandbox.egg-info/entry_points.txt
src/autonomous_executive_assistant_sandbox.egg-info/requires.txt
src/autonomous_executive_assistant_sandbox.egg-info/top_level.txt
src/executive_assistant/__init__.py
src/executive_assistant/agent.py
src/executive_assistant/config.py
src/executive_assistant/deployment.py
src/executive_assistant/env.py
src/executive_assistant/graders.py
src/executive_assistant/llm_service.py
src/executive_assistant/models.py
src/executive_assistant/prompts.py
src/executive_assistant/runner.py
src/executive_assistant/seeds.py
src/executive_assistant/training.py
src/executive_assistant/workspace.py
src/server/__init__.py
src/server/app.py
tests/test_agent.py
tests/test_app.py
tests/test_config.py
tests/test_deployment.py
tests/test_env.py
tests/test_inference.py
tests/test_llm_service.py
tests/test_models.py
tests/test_runner.py
tests/test_training.py
tests/test_workspace.py
```
src/autonomous_executive_assistant_sandbox.egg-info/dependency_links.txt ADDED

```diff
@@ -0,0 +1 @@
+
```
src/autonomous_executive_assistant_sandbox.egg-info/entry_points.txt ADDED

```diff
@@ -0,0 +1,2 @@
+[console_scripts]
+server = src.server.app:main
```
src/autonomous_executive_assistant_sandbox.egg-info/requires.txt ADDED

```diff
@@ -0,0 +1,8 @@
+fastapi>=0.115.0
+gradio>=5.0.0
+openai>=1.76.0
+openenv-core>=0.2.0
+pydantic>=2.8.0
+pytest>=8.0.0
+PyYAML>=6.0.0
+uvicorn>=0.30.0
```
src/autonomous_executive_assistant_sandbox.egg-info/top_level.txt ADDED

```diff
@@ -0,0 +1,3 @@
+app
+executive_assistant
+server
```
src/executive_assistant/deployment.py CHANGED

```diff
@@ -20,10 +20,12 @@ DEFAULT_STAGE_IGNORE_NAMES = {
     ".git",
     ".codex",
     ".pytest_cache",
+    ".venv",
     ".venv-app",
     ".venv-training",
     ".vscode",
     "__pycache__",
+    "artifacts",
 }
 DEFAULT_STAGE_IGNORE_SUFFIXES = {
     ".pyc",
```
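The two new entries widen the staging filter. A sketch of how such ignore sets might be applied while building the Space bundle — the `should_stage` helper is hypothetical, not the module's actual API:

```python
# Ignore sets mirroring the diff above (names matched per path component,
# suffixes matched on the file name).
DEFAULT_STAGE_IGNORE_NAMES = {
    ".git", ".codex", ".pytest_cache", ".venv", ".venv-app",
    ".venv-training", ".vscode", "__pycache__", "artifacts",
}
DEFAULT_STAGE_IGNORE_SUFFIXES = {".pyc"}

def should_stage(relative_path: str) -> bool:
    """True if the path survives the ignore filters and belongs in the bundle."""
    parts = relative_path.split("/")
    if any(part in DEFAULT_STAGE_IGNORE_NAMES for part in parts):
        return False
    return not any(relative_path.endswith(sfx)
                   for sfx in DEFAULT_STAGE_IGNORE_SUFFIXES)

print(should_stage("src/executive_assistant/env.py"))          # True
print(should_stage("artifacts/checkpoints/q_policy.json"))     # False
print(should_stage("src/__pycache__/env.cpython-311.pyc"))     # False
```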
src/server/__init__.py ADDED

```diff
@@ -0,0 +1 @@
+from .app import app, main
```
src/server/app.py ADDED

```diff
@@ -0,0 +1,11 @@
+from __future__ import annotations
+
+import uvicorn
+
+from app import app
+from src.executive_assistant.config import AppRuntimeConfig
+
+
+def main() -> None:
+    runtime = AppRuntimeConfig()
+    uvicorn.run(app, host=runtime.host, port=runtime.port)
```
uv.lock ADDED
The diff for this file is too large to render.