---
title: Ghostexec Environment Server
emoji: π’
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
---
# Ghostexec
**Ghostexec** is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment that simulates a busy executive's world: inbox, calendar, contacts, tasks, and stakeholder moods. The agent chooses **structured actions** (reply, reschedule, delegate, …); the server returns a **plain-text briefing** as the main observation and a **scalar reward** shaped around conflict, relationships, and task progress. Scenario data lives in `scenarios/*.json`; nothing is hardcoded in Python for world content.
**Manifest:** `openenv.yaml` (name **`ghostexec`**, HF Space identifier).
**Package:** `openenv-ghostexec` in `pyproject.toml` (import as `ghostexec`).
---
## Deliverables
| Deliverable | URL |
|-------------|-----|
| Public HF Space (required) | `TODO: https://huggingface.co/spaces/<org>/ghostexec` |
| Write-up / blog (HF post preferred) | `TODO: https://huggingface.co/blog/...` |
| Short demo video (<2 min) | `TODO: https://youtube.com/...` |
Fill these URLs before submission freeze so reviewers can verify everything from one place.
---
## OpenEnv Hackathon alignment (themes + submission checklist)
**Theme fit (examples, not exhaustive):** Ghostexec targets **Theme 3.2: Personalized tasks** (executive-style inbox, calendar, conflicts, delegation via structured actions). **Theme 4** is partially supported via curriculum and perturbation settings (`GHOSTEXEC_CURRICULUM`, `GHOSTEXEC_PERTURB`) and diverse scenarios under `scenarios/`.
**Minimum submission checklist (fill before freeze):**
| Item | Status |
|------|--------|
| OpenEnv-based env + `openenv.yaml` | Done in-repo (`openenv-core[core]>=0.2.3` in `pyproject.toml`; aligns with current PyPI release line). |
| Short write-up or <2 min video | **You:** publish and paste links in [Deliverables](#deliverables). |
| Public HF Space URL | **You:** `openenv push` and paste the URL in [Deliverables](#deliverables). |
---
## Design narrative
Ghostexec is intentionally built as an **AI Chief of Staff** environment, not a grid-world clone: the model must triage inbox, calendar, stakeholder mood, and task deadlines under conflict pressure while taking only legal structured actions.
- **Environment Innovation (40%)**: scenario-driven executive operations with competing priorities, conflict queues, and relationship-sensitive outcomes in `scenarios/*.json` + `server/ghostexec_environment.py`.
- **Storytelling & Presentation (30%)**: each scenario encodes a narrative arc (VIP escalations, family/professional collisions, deadline cascades) so policy behavior reads like realistic assistant decisions rather than abstract moves.
- **Showing Improvement in Rewards (20%)**: environment reward remains deterministic, inspectable, and traceable through metadata + episode logs under `outputs/logs/`.
- **Reward Quality (10%)**: fixed weighted core signal (0.35 conflict / 0.35 relationship / 0.30 task), bounded shaping terms, explicit invalid-action handling, and `do_nothing` penalties.
This framing gives judges a clear throughline: **realistic executive chaos -> constrained legal actions -> measurable policy improvement on held-out scenarios**.
---
## Features
- **Legal action set**: `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing` (see `models.py`).
- **Human-readable observations**: `GhostexecObservation.echoed_message` is the full briefing text for the model (not raw JSON).
- **Invalid actions**: handled in-process with structured metadata (e.g. `step_ok`); the server does not crash.
- **Reward**: weighted blend of conflict, relationship, and task signals (see [Reward](#reward)); per-step logging under `outputs/logs/` (gitignored).
- **HTTP + WebSocket**: FastAPI app in `server/app.py`; `GhostexecEnv` uses WebSockets for persistent episodes.
---
## Quick start (Python client)
From the repo root (`ghostexec/`, where `pyproject.toml` lives):
```bash
uv sync
uv run server --port 8000
```
In another terminal or notebook:
```python
from ghostexec import GhostexecAction, GhostexecEnv

# The client speaks WebSocket, so reset() and step() share one episode.
with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:500], "…")  # plain-text briefing

    # Reply to one email from the briefing using a structured action.
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body=(
                "Marcus, acknowledged. Revised figures and short rationale "
                "before noon. - Exec"
            ),
        )
    )
    print("reward:", step.reward)
    print("metadata keys:", sorted((step.observation.metadata or {}).keys()))
```
**Docker image** (optional): if your OpenEnv client supports it, you can point `GhostexecEnv` at a container built from the root `Dockerfile`. Build from repo root:
```bash
docker build -t ghostexec-env:latest .
```
---
## Actions and fields
`GhostexecAction` (`models.py`) includes:
| `action_type` | Typical fields used |
|------------------------|----------------------|
| `reply_email` | `email_id`, `message_body` |
| `archive_email` | `email_id` |
| `reschedule_meeting` | `meeting_id`, `new_time`, `reason` |
| `cancel_meeting` | `meeting_id`, `reason` |
| `complete_task` | `task_id` |
| `delegate_task` | `task_id`, `contact_name` |
| `send_message` | `contact_name`, `message` (channel text) |
| `do_nothing` | none (intentionally weak / penalised path) |
Unknown or malformed HTTP payloads deserialize safely to `do_nothing`-style defaults where applicable so older clients do not crash.
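The table above can be read as a per-action field map. The following is a minimal sketch of a client-side pre-check, assuming every listed field is required for its action type (the real `GhostexecAction` in `models.py` may treat some as optional). `REQUIRED_FIELDS`, `missing_fields`, and the id `m01` are hypothetical names used only for illustration.

```python
# Hypothetical helper: mirrors the action table, not the real models.py.
REQUIRED_FIELDS = {
    "reply_email": ("email_id", "message_body"),
    "archive_email": ("email_id",),
    "reschedule_meeting": ("meeting_id", "new_time", "reason"),
    "cancel_meeting": ("meeting_id", "reason"),
    "complete_task": ("task_id",),
    "delegate_task": ("task_id", "contact_name"),
    "send_message": ("contact_name", "message"),
    "do_nothing": (),
}


def missing_fields(payload: dict) -> list:
    """Return the table fields that are absent or empty in an action payload."""
    action_type = payload.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return [name for name in REQUIRED_FIELDS[action_type] if not payload.get(name)]


# Example: a reschedule draft that still lacks a time and a reason.
draft = {"action_type": "reschedule_meeting", "meeting_id": "m01"}
print(missing_fields(draft))
```

Running a check like this before `env.step(...)` can catch incomplete actions locally instead of spending an environment step on an invalid one.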
---
## Observation
`GhostexecObservation`:
- **`echoed_message`**: full briefing (emails, conflicts, contacts, tasks, stress, steps remaining).
- **`message_length`**: length of `echoed_message` for quick checks.
- **`reward`**, **`done`**, **`metadata`**: step outcome; metadata carries flags such as `step_ok`, reward breakdown fields, and ids for debugging.
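As a small illustration of how these fields might be consumed in a training or evaluation loop, the sketch below formats one step outcome. It only relies on the `step_ok` flag named above; treating a missing flag as "unknown" is an assumption, and `summarize_step` is a hypothetical helper, not part of the package.

```python
def summarize_step(reward, done, metadata):
    """Format one step outcome; `metadata` may be None, as in the quick start."""
    meta = metadata or {}
    step_ok = meta.get("step_ok")
    # Assumption: a missing flag is reported as "unknown" rather than as success.
    if step_ok is None:
        status = "unknown"
    else:
        status = "ok" if step_ok else "invalid"
    reward_text = "n/a" if reward is None else f"{reward:+.3f}"
    return f"status={status} reward={reward_text} done={done}"


print(summarize_step(0.25, False, {"step_ok": True}))
print(summarize_step(None, True, None))
```

With the Python client, the arguments would typically come from the step result, for example `step.reward` and `step.observation.metadata`.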
---
## Reward
Phase-4 scoring (`server/reward.py`) combines three channels with **fixed weights**:
\[
\text{weighted base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]
The scorer then applies output scaling, invalid-step adjustments, bonuses/penalties, and a floor for `do_nothing`. Full component values are available on `RewardBreakdown` and are mirrored into observation metadata where configured. **Episode reward traces** append to `outputs/logs/episode_rewards.jsonl` (directory gitignored).
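The trace file can be inspected with a few lines of standard-library Python. This is a sketch under one loud assumption: each JSONL record is an object with a numeric `reward` key. The actual record schema is not documented here, so adjust `key` to whatever the log contains. `load_rewards` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_rewards(path, key="reward"):
    """Collect numeric `key` values from a JSONL file, skipping blank or malformed lines."""
    rewards = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written final line
        value = record.get(key) if isinstance(record, dict) else None
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rewards.append(float(value))
    return rewards


log_path = Path("outputs/logs/episode_rewards.jsonl")
if log_path.exists():
    rewards = load_rewards(log_path)
    if rewards:
        print(f"{len(rewards)} records, mean reward {sum(rewards) / len(rewards):.3f}")
```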
**Reward-engineering provenance.** The design follows the reward-shaping playbook surveyed in *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications* ([arXiv:2408.10215](https://arxiv.org/abs/2408.10215)): dense per-step shaping around proxy signals (conflict / relationship / task) instead of a single sparse end-of-episode reward, fixed weights to keep channel trade-offs inspectable, and bounded per-step magnitudes to resist hacking.
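The fixed-weight blend and the idea of bounded shaping terms can be sketched as follows. The weights come from the formula above; the clamp limit and the function names `weighted_base` and `bounded` are assumptions for illustration, and the actual scaling, bonuses, and floors in `server/reward.py` are not reproduced here.

```python
# Weights from the formula above; everything else is illustrative.
WEIGHTS = {"conflict": 0.35, "relationship": 0.35, "task": 0.30}


def weighted_base(conflict, relationship, task):
    """Fixed-weight blend of the three reward channels."""
    return (
        WEIGHTS["conflict"] * conflict
        + WEIGHTS["relationship"] * relationship
        + WEIGHTS["task"] * task
    )


def bounded(value, limit):
    """Clamp a shaping term to [-limit, +limit] (the limit is a hypothetical knob)."""
    return max(-limit, min(limit, value))


# Worked example: 0.35 * 0.5 + 0.35 * (-0.2) + 0.30 * 1.0
base = weighted_base(conflict=0.5, relationship=-0.2, task=1.0)
print(round(base, 3))

# A large raw bonus is capped before it is added to the base.
print(round(base + bounded(0.9, limit=0.2), 3))
```

Keeping the weights in one place makes the channel trade-off easy to inspect, and clamping each shaping term keeps any single bonus from dominating the per-step reward.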
---
## HTTP vs WebSocket (episode state)
- **HTTP** `POST /reset` and `POST /step` often bind to **short-lived** environment instances depending on deployment; consecutive HTTP calls may not share one in-memory episode.
- **Ghostexec** still applies your action against a scenario-primed instance so a lone `POST /step` can return a meaningful reward and metadata.
- **WebSocket `/ws`**: use this (or `GhostexecEnv(base_url=...)`, which speaks WebSocket) for **multi-step episodes** on the same session.
Endpoints (typical OpenEnv layout): **`/web`**, **`/docs`**, **`/health`**, **`/ws`**.
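For a one-off HTTP call, a standard-library sketch might look like the following. The request body shape `{"action": {...}}` is an assumption about the server's `/step` schema, so check `/docs` on a running server for the authoritative version. `build_step_request` and `post_step` are hypothetical helpers.

```python
import json
import urllib.request


def build_step_request(base_url, action):
    """Build a POST /step request; the {"action": ...} envelope is an assumption."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_step(base_url, action, timeout=10.0):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_step_request(base_url, action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_step_request("http://127.0.0.1:8000", {"action_type": "do_nothing"})
print(request.get_method(), request.full_url)
```

Because consecutive HTTP calls may not share an episode, this pattern suits smoke checks; multi-step rollouts belong on the WebSocket client.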
---
## Running and testing locally
```bash
# Dev server (package layout)
uv run uvicorn ghostexec.server.app:app --reload --host 0.0.0.0 --port 8000
# Or console entrypoint (matches Dockerfile)
uv run server --port 8000
```
**Smoke script** (HTTP):
```bash
uv run python scripts/http_endpoint_smoke.py --local
uv run python scripts/http_endpoint_smoke.py --url http://127.0.0.1:8000
uv run python scripts/http_endpoint_smoke.py --print-curl
```
**Tests:**
```bash
uv run pytest tests/ -q
```
Opt-in Docker build smoke (Phase 1 gate):
```bash
GHOSTEXEC_RUN_DOCKER_BUILD=1 uv run pytest tests/test_docker_build.py -q
```
With the server already on port 8000:
```bash
uv run pytest tests/test_live_server_exhaustive.py -v --tb=short
```
Override live URL (Windows PowerShell example):
```powershell
$env:GHOSTEXEC_LIVE_BASE_URL = "http://127.0.0.1:9000"
uv run pytest tests/test_live_server_exhaustive.py -q
```
Optional real WebSocket client check:
```bash
# Terminal 1
uv run server --port 8000
# Terminal 2
export GHOSTEXEC_WS_BASE_URL=http://127.0.0.1:8000
uv run pytest tests/test_complete_integration.py::test_ghostexec_env_client_against_live_url_if_set -q
```
Post-training plot pack (loss + reward + components + baseline bar):
```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir outputs/plots
```
The script writes:
- `outputs/plots/loss_curve.png`
- `outputs/plots/reward_curve.png`
- `outputs/plots/components_curve.png`
- `outputs/plots/baseline_comparison.png`
SFT before GRPO (with partial live-env usage during SFT data generation and GRPO rewards):
```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```
This performs:
- SFT warm-start on JSONL (`prompt` + `completion`) generated from live `/reset` briefings.
- GRPO continuation from the SFT adapter.
- Mixed reward shaping where env-derived reward remains active and local shaping can be down-weighted/up-weighted via scales.
- Optional complexity curriculum (`easy_to_full`) that starts with stronger scaffold/local signals and anneals to env-dominant reward later.
- Stability-first optimization defaults (cosine schedule + warmup + grad clipping + higher GRPO KL beta). Optional `--reward-ema-decay 0..1` smooths the *env* reward channel (defaults come from `--training-preset`). Training always runs the full `max_*_steps` (no early-stop callbacks).
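To make the mixed-reward and curriculum ideas concrete, here is one possible reading in plain Python. It is an illustrative sketch only: the real logic lives in `scripts/train_sft_then_grpo.py` and may differ. The linear ramp, the way local shaping fades toward `local_scale`, and the names `ramp_progress`, `mixed_reward`, and `ema` are all assumptions.

```python
def ramp_progress(step, total_steps, ramp_ratio):
    """Linear progress in [0, 1] that reaches 1 after ramp_ratio * total_steps steps."""
    ramp_steps = max(1, round(total_steps * ramp_ratio))
    return min(1.0, step / ramp_steps)


def mixed_reward(env_reward, local_reward, progress, env_scale=1.0, local_scale=0.35):
    """Blend env and local rewards; local weight fades from 1.0 to local_scale."""
    local_weight = 1.0 + (local_scale - 1.0) * progress
    return env_scale * env_reward + local_weight * local_reward


def ema(values, decay):
    """Exponential moving average; decay=0 returns the raw values unchanged."""
    smoothed, current = [], None
    for value in values:
        current = value if current is None else decay * current + (1.0 - decay) * value
        smoothed.append(current)
    return smoothed


# With 120 GRPO steps and a 0.60 ramp ratio, the ramp ends at step 72.
for step in (0, 36, 72, 120):
    progress = ramp_progress(step, total_steps=120, ramp_ratio=0.60)
    print(step, progress, round(mixed_reward(1.0, 1.0, progress), 3))

print(ema([1.0, 0.0, 0.0], decay=0.5))
```

Under this reading, early steps lean on local scaffold signals and later steps are dominated by the environment reward, while the moving average damps step-to-step noise in the env channel.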
Recommended model strategy for hackathon iteration speed:
- Start with `--model-preset small_iter_fast` (`unsloth/Qwen2.5-3B-Instruct`) + QLoRA.
- Run many short SFT->GRPO loops, improve reward signals, then scale model size only after curves stabilize.
- Use larger presets only when memory + runtime are consistently stable.
- Use `--training-preset hackathon_turbo` to apply stable aggressive defaults for iterative win-rate.
- The script prints SFT/GRPO LoRA delta checks and stops if the deltas are near zero, so a no-op run is never mistaken for real finetuning.
---
## Hugging Face Spaces
Full OpenEnv CLI flow from this directory (matches steps 5-8 of the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)):
```bash
openenv serve # local dev server on :8000
openenv build # build the Docker image
openenv validate --verbose # structure + Dockerfile + entrypoint checks
openenv push # deploy to HF Spaces
# openenv push --repo-id your-username/ghostexec
```
Use a **public** Space for the default hackathon flow unless you intentionally need a private Space. Authenticate with Hugging Face first (`huggingface-cli login` or equivalent).
---
## Scenarios
| File | Role |
|------|------|
| `scenarios/phase2_core.json` | Default dense inbox/calendar/tasks fixture |
| `scenarios/monday_morning.json`, `dinner_disaster.json`, `vip_meltdown.json` | Narrative demos |
| `scenarios/vip_meltdown_drift.json` | Mood / escalation drift |
| `scenarios/schema_drift_test.json` | Drift-event harness |
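Since the scenario files are plain JSON, they can be surveyed without starting the server. The sketch below makes no assumption about the scenario schema: it only lists the top-level keys of each file. `describe_scenarios` is a hypothetical helper.

```python
import json
from pathlib import Path


def describe_scenarios(directory="scenarios"):
    """Map each scenario file name to its sorted top-level JSON keys."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Non-object documents (e.g. a bare list) are reported with no keys.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


for name, keys in describe_scenarios().items():
    print(f"{name}: {', '.join(keys)}")
```

Run from the repo root, this gives a quick view of which world sections each fixture defines.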
---
## Concurrent WebSocket sessions
`server/app.py` passes **`GhostexecEnvironment`** (the class) into `create_app` with `max_concurrent_envs=1` by default. Increase `max_concurrent_envs` if you need multiple simultaneous WebSocket clients.
---
## Project layout
```
ghostexec/
├── openenv.yaml          # OpenEnv name, version, description
├── pyproject.toml        # Package metadata + optional extras
├── uv.lock
├── models.py             # World + GhostexecAction / GhostexecObservation
├── client.py             # GhostexecEnv (WebSocket client)
├── scenarios/            # World JSON (source of truth for episodes)
├── scripts/              # http_endpoint_smoke.py
├── tests/
└── server/
    ├── app.py            # FastAPI + create_app
    ├── ghostexec_environment.py
    ├── reward.py
    └── Dockerfile
```
---
## Resources & references
Ghostexec is built against the official Meta PyTorch OpenEnv stack. Every design choice below is traceable to one of these sources.
**OpenEnv core.** The Gymnasium-style `reset()` / `step()` / `state` interface in `server/ghostexec_environment.py`, the `EnvClient` subclass in `client.py`, and the `create_app(...)` wiring in `server/app.py` follow the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html) exactly.
- Core repo: [meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
- Docs: [meta-pytorch.org/OpenEnv](https://meta-pytorch.org/OpenEnv/)
**OpenEnv Hub (Hugging Face).** Target deployment for `openenv push`. The Space metadata at the top of this README + `openenv.yaml` are the knobs HF Spaces reads.
- Environments: [huggingface.co/openenv](https://huggingface.co/openenv)
- Spaces: [huggingface.co/openenv/spaces](https://huggingface.co/openenv/spaces)
**Tutorials.** General OpenEnv environment patterns are documented in the official tutorial pages and examples.
- All tutorials: [OpenEnv/tutorial](https://github.com/meta-pytorch/OpenEnv/tree/main/tutorial)
- Environment examples: [OpenEnv/envs](https://github.com/meta-pytorch/OpenEnv/tree/main/envs)
**YouTube: Building RL environments.** Talks from Meta / OpenEnv contributors that informed the scenario-driven reset, WebSocket session model, and reward breakdown used here:
- [Building RL Environments with OpenEnv](https://www.youtube.com/watch?v=0airz7BhBiA)
- [OpenEnv Deep Dive](https://www.youtube.com/watch?v=ap4q4sAK4OY)
- [Agentic RL Environments](https://www.youtube.com/watch?v=Jew4lhAiqnw)
- [OpenEnv Livestream (4-hour walkthrough)](https://www.youtube.com/live/kkCNMz0Ptd8)
**Reward-engineering papers.** See [Reward](#reward) for how the paper below maps to specific components of `server/reward.py`.
- Jnadi, A. (2024). *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications*. [arXiv:2408.10215](https://arxiv.org/abs/2408.10215). Informs the dense per-step conflict / relationship / task shaping and the bounded-magnitude design.
---
## License
BSD-style; see the license notice at the top of each source file (Meta / OpenEnv lineage).