--- title: Ghostexec Environment Server emoji: πŸ“’ colorFrom: pink colorTo: yellow sdk: docker pinned: false app_port: 7860 base_path: /web tags: - openenv --- # Ghostexec **Ghostexec** is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment that simulates a busy executive’s world: inbox, calendar, contacts, tasks, and stakeholder moods. The agent chooses **structured actions** (reply, reschedule, delegate, …); the server returns a **plain-text briefing** as the main observation and a **scalar reward** shaped around conflict, relationships, and task progress. Scenario data lives in `scenarios/*.json` β€” nothing is hardcoded in Python for world content. **Manifest:** `openenv.yaml` (name **`ghostexec`**, HF Space identifier). **Package:** `openenv-ghostexec` in `pyproject.toml` (import as `ghostexec`). --- ## Deliverables | Deliverable | URL | |-------------|-----| | Public HF Space (required) | `TODO: https://huggingface.co/spaces//ghostexec` | | Write-up / blog (HF post preferred) | `TODO: https://huggingface.co/blog/...` | | Short demo video (<2 min) | `TODO: https://youtube.com/...` | Fill these URLs before submission freeze so reviewers can verify everything from one place. --- ## OpenEnv Hackathon alignment (themes + submission checklist) **Theme fit (examples, not exhaustive):** Ghostexec targets **Theme 3.2 β€” Personalized tasks** (executive-style inbox, calendar, conflicts, delegation via structured actions). **Theme 4** is partially supported via curriculum + perturb (`GHOSTEXEC_CURRICULUM`, `GHOSTEXEC_PERTURB`) and diverse scenarios under `scenarios/`. **Minimum submission checklist (fill before freeze):** | Item | Status | |------|--------| | OpenEnv-based env + `openenv.yaml` | Done in-repo (`openenv-core[core]>=0.2.3` in `pyproject.toml`; aligns with current PyPI release line). | | Short write-up or <2 min video | **You:** publish and paste links in [Deliverables](#deliverables). | | Public HF Space URL | **You:** `openenv push` and paste the URL in [Deliverables](#deliverables). | --- ## Design narrative Ghostexec is intentionally built as an **AI Chief of Staff** environment, not a grid-world clone: the model must triage inbox, calendar, stakeholder mood, and task deadlines under conflict pressure while taking only legal structured actions. - **Environment Innovation (40%)** β€” scenario-driven executive operations with competing priorities, conflict queues, and relationship-sensitive outcomes in `scenarios/*.json` + `server/ghostexec_environment.py`. - **Storytelling & Presentation (30%)** β€” each scenario encodes a narrative arc (VIP escalations, family/professional collisions, deadline cascades) so policy behavior reads like realistic assistant decisions rather than abstract moves. - **Showing Improvement in Rewards (20%)** β€” environment reward remains deterministic, inspectable, and traceable through metadata + episode logs under `outputs/logs/`. - **Reward Quality (10%)** β€” fixed weighted core signal (0.35 conflict / 0.35 relationship / 0.30 task), bounded shaping terms, explicit invalid-action handling, and do_nothing penalties. This framing gives judges a clear throughline: **realistic executive chaos -> constrained legal actions -> measurable policy improvement on held-out scenarios**. --- ## Features - **Legal action set** β€” `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing` (see `models.py`). - **Human-readable observations** β€” `GhostexecObservation.echoed_message` is the full briefing text for the model (not raw JSON). - **Invalid actions** β€” Handled in-process: structured metadata (e.g. `step_ok`), no server crash. - **Reward** β€” Weighted blend of conflict, relationship, and task signals (see [Reward](#reward)); per-step logging under `outputs/logs/` (gitignored). - **HTTP + WebSocket** β€” FastAPI app in `server/app.py`; `GhostexecEnv` uses WebSockets for persistent episodes. --- ## Quick start (Python client) From the repo root (`ghostexec/` β€” where `pyproject.toml` lives): ```bash uv sync uv run server --port 8000 ``` In another terminal or notebook: ```python from ghostexec import GhostexecAction, GhostexecEnv with GhostexecEnv(base_url="http://127.0.0.1:8000") as env: out = env.reset() print(out.observation.echoed_message[:500], "…") # plain-text briefing step = env.step( GhostexecAction( action_type="reply_email", email_id="e01", message_body=( "Marcus β€” acknowledged. Revised figures and short rationale " "before noon. β€” Exec" ), ) ) print("reward:", step.reward) print("metadata keys:", sorted((step.observation.metadata or {}).keys())) ``` **Docker image** (optional): if your OpenEnv client supports it, you can point `GhostexecEnv` at a container built from the root `Dockerfile`. Build from repo root: ```bash docker build -t ghostexec-env:latest . ``` --- ## Actions and fields `GhostexecAction` (`models.py`) includes: | `action_type` | Typical fields used | |------------------------|----------------------| | `reply_email` | `email_id`, `message_body` | | `archive_email` | `email_id` | | `reschedule_meeting` | `meeting_id`, `new_time`, `reason` | | `cancel_meeting` | `meeting_id`, `reason` | | `complete_task` | `task_id` | | `delegate_task` | `task_id`, `contact_name` | | `send_message` | `contact_name`, `message` (channel text) | | `do_nothing` | β€” (intentionally weak / penalised path) | Unknown or malformed HTTP payloads deserialize safely to `do_nothing`-style defaults where applicable so older clients do not crash. --- ## Observation `GhostexecObservation`: - **`echoed_message`** β€” Full briefing (emails, conflicts, contacts, tasks, stress, steps remaining). - **`message_length`** β€” Length of `echoed_message` for quick checks. - **`reward`**, **`done`**, **`metadata`** β€” Step outcome; metadata carries flags such as `step_ok`, reward breakdown fields, and ids for debugging. --- ## Reward Phase-4 scoring (`server/reward.py`) combines three channels with **fixed weights**: \[ \text{weighted base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task} \] Then applies output scaling, invalid-step adjustments, bonuses/penalties, and a floor for `do_nothing`. Full component values are available on `RewardBreakdown` and are mirrored into observation metadata where configured. **Episode reward traces** append to `outputs/logs/episode_rewards.jsonl` (directory gitignored). **Reward-engineering provenance.** The design follows the reward-shaping playbook surveyed in *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications* ([arXiv:2408.10215](https://arxiv.org/abs/2408.10215)): dense per-step shaping around proxy signals (conflict / relationship / task) instead of a single sparse end-of-episode reward, fixed weights to keep channel trade-offs inspectable, and bounded per-step magnitudes to resist hacking. --- ## HTTP vs WebSocket (episode state) - **HTTP** `POST /reset` and `POST /step` often bind to **short-lived** environment instances depending on deployment; consecutive HTTP calls may not share one in-memory episode. - **Ghostexec** still applies your action against a scenario-primed instance so a lone `POST /step` can return a meaningful reward and metadata. - **WebSocket `/ws`** β€” Use this (or `GhostexecEnv(base_url=...)`, which speaks WebSocket) for **multi-step episodes** on the same session. Endpoints (typical OpenEnv layout): **`/web`**, **`/docs`**, **`/health`**, **`/ws`**. --- ## Running and testing locally ```bash # Dev server (package layout) uv run uvicorn ghostexec.server.app:app --reload --host 0.0.0.0 --port 8000 # Or console entrypoint (matches Dockerfile) uv run server --port 8000 ``` **Smoke script** (HTTP): ```bash uv run python scripts/http_endpoint_smoke.py --local uv run python scripts/http_endpoint_smoke.py --url http://127.0.0.1:8000 uv run python scripts/http_endpoint_smoke.py --print-curl ``` **Tests:** ```bash uv run pytest tests/ -q ``` Opt-in Docker build smoke (Phase 1 gate): ```bash GHOSTEXEC_RUN_DOCKER_BUILD=1 uv run pytest tests/test_docker_build.py -q ``` With the server already on port 8000: ```bash uv run pytest tests/test_live_server_exhaustive.py -v --tb=short ``` Override live URL (Windows PowerShell example): ```powershell $env:GHOSTEXEC_LIVE_BASE_URL = "http://127.0.0.1:9000" uv run pytest tests/test_live_server_exhaustive.py -q ``` Optional real WebSocket client check: ```bash # Terminal 1 uv run server --port 8000 # Terminal 2 set GHOSTEXEC_WS_BASE_URL=http://127.0.0.1:8000 uv run pytest tests/test_complete_integration.py::test_ghostexec_env_client_against_live_url_if_set -q ``` Post-training plot pack (loss + reward + components + baseline bar): ```bash uv run python scripts/plot_training_report.py \ --trainer-history outputs/trainer_state.json \ --reward-csv outputs/reward_log.csv \ --baselines-json outputs/compliance_manifest.json \ --out-dir outputs/plots ``` The script writes: - `outputs/plots/loss_curve.png` - `outputs/plots/reward_curve.png` - `outputs/plots/components_curve.png` - `outputs/plots/baseline_comparison.png` SFT before GRPO (with partial live-env usage during SFT data generation and GRPO rewards): ```bash uv run python scripts/train_sft_then_grpo.py \ --model-preset small_iter_fast \ --training-preset hackathon_turbo \ --env-url http://127.0.0.1:8000 \ --generate-sft-from-env \ --sft-samples 120 \ --max-sft-steps 60 \ --max-grpo-steps 120 \ --env-reward-scale 1.0 \ --local-reward-scale 0.35 \ --complexity-curriculum easy_to_full \ --curriculum-ramp-ratio 0.60 ``` This performs: - SFT warm-start on JSONL (`prompt` + `completion`) generated from live `/reset` briefings. - GRPO continuation from the SFT adapter. - Mixed reward shaping where env-derived reward remains active and local shaping can be down-weighted/up-weighted via scales. - Optional complexity curriculum (`easy_to_full`) that starts with stronger scaffold/local signals and anneals to env-dominant reward later. - Stability-first optimization defaults (cosine schedule + warmup + grad clipping + higher GRPO KL beta). Optional `--reward-ema-decay 0..1` smooths the *env* reward channel (defaults come from `--training-preset`). Training always runs the full `max_*_steps` (no early-stop callbacks). Recommended model strategy for hackathon iteration speed: - Start with `--model-preset small_iter_fast` (`unsloth/Qwen2.5-3B-Instruct`) + QLoRA. - Run many short SFT->GRPO loops, improve reward signals, then scale model size only after curves stabilize. - Use larger presets only when memory + runtime are consistently stable. - Use `--training-preset hackathon_turbo` to apply stable aggressive defaults for iterative win-rate. - Script prints SFT/GRPO LoRA delta checks; if deltas are near zero it stops, so you never mistake a no-op run for real finetuning. --- ## Hugging Face Spaces Full OpenEnv CLI flow from this directory (matches steps 5–8 of the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)): ```bash openenv serve # local dev server on :8000 openenv build # build the Docker image openenv validate --verbose # structure + Dockerfile + entrypoint checks openenv push # deploy to HF Spaces # openenv push --repo-id your-username/ghostexec ``` Use a **public** Space for the default hackathon flow unless you intentionally need a private Space. Authenticate with Hugging Face first (`huggingface-cli login` or equivalent). --- ## Scenarios | File | Role | |------|------| | `scenarios/phase2_core.json` | Default dense inbox/calendar/tasks fixture | | `scenarios/monday_morning.json`, `dinner_disaster.json`, `vip_meltdown.json` | Narrative demos | | `scenarios/vip_meltdown_drift.json` | Mood / escalation drift | | `scenarios/schema_drift_test.json` | Drift-event harness | --- ## Concurrent WebSocket sessions `server/app.py` passes **`GhostexecEnvironment`** (the class) into `create_app` with `max_concurrent_envs=1` by default. Increase `max_concurrent_envs` if you need multiple simultaneous WebSocket clients. --- ## Project layout ``` ghostexec/ β”œβ”€β”€ openenv.yaml # OpenEnv name, version, description β”œβ”€β”€ pyproject.toml # Package metadata + optional extras β”œβ”€β”€ uv.lock β”œβ”€β”€ models.py # World + GhostexecAction / GhostexecObservation β”œβ”€β”€ client.py # GhostexecEnv (WebSocket client) β”œβ”€β”€ scenarios/ # World JSON (source of truth for episodes) β”œβ”€β”€ scripts/ # http_endpoint_smoke.py β”œβ”€β”€ tests/ └── server/ β”œβ”€β”€ app.py # FastAPI + create_app β”œβ”€β”€ ghostexec_environment.py β”œβ”€β”€ reward.py └── Dockerfile ``` --- ## Resources & references Ghostexec is built against the official Meta PyTorch OpenEnv stack. Every design choice below is traceable to one of these sources. **OpenEnv core.** The Gymnasium-style `reset()` / `step()` / `state` interface in `server/ghostexec_environment.py`, the `EnvClient` subclass in `client.py`, and the `create_app(...)` wiring in `server/app.py` follow the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html) exactly. - Core repo: [meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv) - Docs: [meta-pytorch.org/OpenEnv](https://meta-pytorch.org/OpenEnv/) **OpenEnv Hub (Hugging Face).** Target deployment for `openenv push`. The Space metadata at the top of this README + `openenv.yaml` are the knobs HF Spaces reads. - Environments: [huggingface.co/openenv](https://huggingface.co/openenv) - Spaces: [huggingface.co/openenv/spaces](https://huggingface.co/openenv/spaces) **Tutorials.** General OpenEnv environment patterns are documented in the official tutorial pages and examples. - All tutorials: [OpenEnv/tutorial](https://github.com/meta-pytorch/OpenEnv/tree/main/tutorial) - Environment examples: [OpenEnv/envs](https://github.com/meta-pytorch/OpenEnv/tree/main/envs) **YouTube β€” Building RL environments.** Talks from Meta / OpenEnv contributors that informed the scenario-driven reset, WebSocket session model, and reward breakdown used here: - [Building RL Environments with OpenEnv](https://www.youtube.com/watch?v=0airz7BhBiA) - [OpenEnv Deep Dive](https://www.youtube.com/watch?v=ap4q4sAK4OY) - [Agentic RL Environments](https://www.youtube.com/watch?v=Jew4lhAiqnw) - [OpenEnv Livestream (4-hour walkthrough)](https://www.youtube.com/live/kkCNMz0Ptd8) **Reward-engineering papers.** See [Reward](#reward) for how each paper maps to specific components of `server/reward.py`. - Jnadi, A. (2024). *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications*. [arXiv:2408.10215](https://arxiv.org/abs/2408.10215). Informs the dense per-step conflict / relationship / task shaping and the bounded-magnitude design. --- ## License BSD-style β€” see the license notice at the top of each source file (Meta / OpenEnv lineage).