---
title: Ghostexec Environment Server
emoji: 📒
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
---
# Ghostexec
**Ghostexec** is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment that simulates a busy executive’s world: inbox, calendar, contacts, tasks, and stakeholder moods. The agent chooses **structured actions** (reply, reschedule, delegate, …); the server returns a **plain-text briefing** as the main observation and a **scalar reward** shaped around conflict, relationships, and task progress. Scenario data lives in `scenarios/*.json`; nothing is hardcoded in Python for world content.
**Manifest:** `openenv.yaml` (name **`ghostexec`**, HF Space identifier).
**Package:** `openenv-ghostexec` in `pyproject.toml` (import as `ghostexec`).
---
## Deliverables
| Deliverable | URL |
|-------------|-----|
| Public HF Space (required) | `TODO: https://huggingface.co/spaces/<org>/ghostexec` |
| Write-up / blog (HF post preferred) | `TODO: https://huggingface.co/blog/...` |
| Short demo video (<2 min) | `TODO: https://youtube.com/...` |
Fill these URLs before submission freeze so reviewers can verify everything from one place.
---
## OpenEnv Hackathon alignment (themes + submission checklist)
**Theme fit (examples, not exhaustive):** Ghostexec targets **Theme 3.2: Personalized tasks** (executive-style inbox, calendar, conflicts, delegation via structured actions). **Theme 4** is partially supported via curriculum and perturbation (`GHOSTEXEC_CURRICULUM`, `GHOSTEXEC_PERTURB`) and diverse scenarios under `scenarios/`.
**Minimum submission checklist (fill before freeze):**
| Item | Status |
|------|--------|
| OpenEnv-based env + `openenv.yaml` | Done in-repo (`openenv-core[core]>=0.2.3` in `pyproject.toml`; aligns with current PyPI release line). |
| Short write-up or <2 min video | **You:** publish and paste links in [Deliverables](#deliverables). |
| Public HF Space URL | **You:** `openenv push` and paste the URL in [Deliverables](#deliverables). |
---
## Design narrative
Ghostexec is intentionally built as an **AI Chief of Staff** environment, not a grid-world clone: the model must triage inbox, calendar, stakeholder mood, and task deadlines under conflict pressure while taking only legal structured actions.
- **Environment Innovation (40%)**: scenario-driven executive operations with competing priorities, conflict queues, and relationship-sensitive outcomes in `scenarios/*.json` + `server/ghostexec_environment.py`.
- **Storytelling & Presentation (30%)**: each scenario encodes a narrative arc (VIP escalations, family/professional collisions, deadline cascades) so policy behavior reads like realistic assistant decisions rather than abstract moves.
- **Showing Improvement in Rewards (20%)**: environment reward remains deterministic, inspectable, and traceable through metadata + episode logs under `outputs/logs/`.
- **Reward Quality (10%)**: fixed weighted core signal (0.35 conflict / 0.35 relationship / 0.30 task), bounded shaping terms, explicit invalid-action handling, and `do_nothing` penalties.
This framing gives judges a clear throughline: **realistic executive chaos -> constrained legal actions -> measurable policy improvement on held-out scenarios**.
---
## Features
- **Legal action set**: `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing` (see `models.py`).
- **Human-readable observations**: `GhostexecObservation.echoed_message` is the full briefing text for the model (not raw JSON).
- **Invalid actions**: handled in-process, with structured metadata (e.g. `step_ok`) and no server crash.
- **Reward**: weighted blend of conflict, relationship, and task signals (see [Reward](#reward)); per-step logging under `outputs/logs/` (gitignored).
- **HTTP + WebSocket**: FastAPI app in `server/app.py`; `GhostexecEnv` uses WebSockets for persistent episodes.
---
## Quick start (Python client)
From the repo root (`ghostexec/`, where `pyproject.toml` lives):
```bash
uv sync
uv run server --port 8000
```
In another terminal or notebook:
```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:500], "…")  # plain-text briefing
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body=(
                "Marcus - acknowledged. Revised figures and short rationale "
                "before noon. - Exec"
            ),
        )
    )
    print("reward:", step.reward)
    print("metadata keys:", sorted((step.observation.metadata or {}).keys()))
```
**Docker image** (optional): if your OpenEnv client supports it, you can point `GhostexecEnv` at a container built from the root `Dockerfile`. Build from repo root:
```bash
docker build -t ghostexec-env:latest .
```
---
## Actions and fields
`GhostexecAction` (`models.py`) includes:
| `action_type` | Typical fields used |
|------------------------|----------------------|
| `reply_email` | `email_id`, `message_body` |
| `archive_email` | `email_id` |
| `reschedule_meeting` | `meeting_id`, `new_time`, `reason` |
| `cancel_meeting` | `meeting_id`, `reason` |
| `complete_task` | `task_id` |
| `delegate_task` | `task_id`, `contact_name` |
| `send_message` | `contact_name`, `message` (channel text) |
| `do_nothing` | none (intentionally weak / penalised path) |
Unknown or malformed HTTP payloads deserialize safely to `do_nothing`-style defaults where applicable so older clients do not crash.
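The field table above can also be written as a small lookup for client-side checks before an action is sent. This is a minimal sketch, not the validation in `models.py`: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and treating every listed field as required is an assumption (the table only calls them "typical").

```python
# Hypothetical client-side helper; it mirrors the table above, not models.py itself.
REQUIRED_FIELDS = {
    "reply_email": ("email_id", "message_body"),
    "archive_email": ("email_id",),
    "reschedule_meeting": ("meeting_id", "new_time", "reason"),
    "cancel_meeting": ("meeting_id", "reason"),
    "complete_task": ("task_id",),
    "delegate_task": ("task_id", "contact_name"),
    "send_message": ("contact_name", "message"),
    "do_nothing": (),
}


def missing_fields(action: dict) -> list[str]:
    """Return fields the table lists for this action type that are absent or empty."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return [name for name in REQUIRED_FIELDS[action_type] if not action.get(name)]
```

Calling `missing_fields({"action_type": "reply_email", "email_id": "e01"})` reports that `message_body` is still needed, which is cheaper than discovering it through an invalid-step penalty.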
---
## Observation
`GhostexecObservation`:
- **`echoed_message`**: full briefing (emails, conflicts, contacts, tasks, stress, steps remaining).
- **`message_length`**: length of `echoed_message` for quick checks.
- **`reward`**, **`done`**, **`metadata`**: step outcome; metadata carries flags such as `step_ok`, reward breakdown fields, and ids for debugging.
---
## Reward
Phase-4 scoring (`server/reward.py`) combines three channels with **fixed weights**:
\[
\text{weighted base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]
The scorer then applies output scaling, invalid-step adjustments, bonuses/penalties, and a floor for `do_nothing`. Full component values are available on `RewardBreakdown` and are mirrored into observation metadata where configured. **Episode reward traces** are appended to `outputs/logs/episode_rewards.jsonl` (directory gitignored).
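As a worked instance of the weighted base, the sketch below blends three channel scores with the fixed weights. Only the weights come from this README; the function name `weighted_base` and the example channel values are made up for illustration, and the later scaling, bonus/penalty, and floor stages in `server/reward.py` are not reproduced.

```python
# Weights are the fixed values quoted above; everything else is illustrative.
WEIGHTS = {"conflict": 0.35, "relationship": 0.35, "task": 0.30}


def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Blend the three channel scores with the fixed weights."""
    return (
        WEIGHTS["conflict"] * conflict
        + WEIGHTS["relationship"] * relationship
        + WEIGHTS["task"] * task
    )


# Hypothetical channel scores for one step.
base = weighted_base(conflict=0.5, relationship=-0.2, task=1.0)
print(round(base, 3))  # 0.405
```

Here 0.35 · 0.5 + 0.35 · (−0.2) + 0.30 · 1.0 = 0.175 − 0.07 + 0.30 = 0.405, so a strong task result outweighs a mild relationship cost.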
**Reward-engineering provenance.** The design follows the reward-shaping playbook surveyed in *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications* ([arXiv:2408.10215](https://arxiv.org/abs/2408.10215)): dense per-step shaping around proxy signals (conflict / relationship / task) instead of a single sparse end-of-episode reward, fixed weights to keep channel trade-offs inspectable, and bounded per-step magnitudes to resist hacking.
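The episode reward trace mentioned earlier in this section can be summarised offline with a few lines of standard-library Python. This is a sketch under one loud assumption: each line of the JSONL file is an object with a numeric `reward` key. The real record schema is not documented here, and `summarize_rewards` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_rewards(path: str) -> dict:
    """Count records and total the assumed `reward` field of a JSONL trace."""
    count, total = 0, 0.0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # ASSUMPTION: each record is an object with a numeric "reward" key.
        total += float(record.get("reward", 0.0))
        count += 1
    mean = total / count if count else 0.0
    return {"steps": count, "total_reward": total, "mean_reward": mean}
```

A typical call would be `summarize_rewards("outputs/logs/episode_rewards.jsonl")` after at least one episode has been logged; adjust the key name if the records use a different field.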
---
## HTTP vs WebSocket (episode state)
- **HTTP** `POST /reset` and `POST /step` often bind to **short-lived** environment instances depending on deployment; consecutive HTTP calls may not share one in-memory episode.
- **Ghostexec** still applies your action against a scenario-primed instance so a lone `POST /step` can return a meaningful reward and metadata.
- **WebSocket `/ws`**: use this (or `GhostexecEnv(base_url=...)`, which speaks WebSocket) for **multi-step episodes** on the same session.
Endpoints (typical OpenEnv layout): **`/web`**, **`/docs`**, **`/health`**, **`/ws`**.
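For a one-off HTTP call, a lone `POST /step` can be built with the standard library alone. The sketch below only constructs the request and does not send it. The `{"action": {...}}` wrapper is an assumption about the payload shape and `build_step_request` is a hypothetical helper; `scripts/http_endpoint_smoke.py --print-curl` is the place to confirm the exact body the server expects.

```python
import json
import urllib.request


def build_step_request(base_url: str, action: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /step request."""
    # ASSUMPTION: the server accepts the action wrapped as {"action": {...}}.
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_step_request(
    "http://127.0.0.1:8000",
    {"action_type": "archive_email", "email_id": "e01"},
)
# To send it against a running server: urllib.request.urlopen(req, timeout=10)
```

Because consecutive HTTP calls may not share an episode, this pattern suits smoke checks; multi-step rollouts should still go through `GhostexecEnv`.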
---
## Running and testing locally
```bash
# Dev server (package layout)
uv run uvicorn ghostexec.server.app:app --reload --host 0.0.0.0 --port 8000
# Or console entrypoint (matches Dockerfile)
uv run server --port 8000
```
**Smoke script** (HTTP):
```bash
uv run python scripts/http_endpoint_smoke.py --local
uv run python scripts/http_endpoint_smoke.py --url http://127.0.0.1:8000
uv run python scripts/http_endpoint_smoke.py --print-curl
```
**Tests:**
```bash
uv run pytest tests/ -q
```
Opt-in Docker build smoke (Phase 1 gate):
```bash
GHOSTEXEC_RUN_DOCKER_BUILD=1 uv run pytest tests/test_docker_build.py -q
```
With the server already on port 8000:
```bash
uv run pytest tests/test_live_server_exhaustive.py -v --tb=short
```
Override live URL (Windows PowerShell example):
```powershell
$env:GHOSTEXEC_LIVE_BASE_URL = "http://127.0.0.1:9000"
uv run pytest tests/test_live_server_exhaustive.py -q
```
Optional real WebSocket client check:
```bash
# Terminal 1
uv run server --port 8000
# Terminal 2
export GHOSTEXEC_WS_BASE_URL=http://127.0.0.1:8000  # Windows cmd: use `set` instead of `export`
uv run pytest tests/test_complete_integration.py::test_ghostexec_env_client_against_live_url_if_set -q
```
Post-training plot pack (loss + reward + components + baseline bar):
```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir outputs/plots
```
The script writes:
- `outputs/plots/loss_curve.png`
- `outputs/plots/reward_curve.png`
- `outputs/plots/components_curve.png`
- `outputs/plots/baseline_comparison.png`
SFT before GRPO (with partial live-env usage during SFT data generation and GRPO rewards):
```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```
This performs:
- SFT warm-start on JSONL (`prompt` + `completion`) generated from live `/reset` briefings.
- GRPO continuation from the SFT adapter.
- Mixed reward shaping where env-derived reward remains active and local shaping can be down-weighted/up-weighted via scales.
- Optional complexity curriculum (`easy_to_full`) that starts with stronger scaffold/local signals and anneals to env-dominant reward later.
- Stability-first optimization defaults (cosine schedule + warmup + grad clipping + higher GRPO KL beta). Optional `--reward-ema-decay 0..1` smooths the *env* reward channel (defaults come from `--training-preset`). Training always runs the full `max_*_steps` (no early-stop callbacks).
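The reward smoothing controlled by `--reward-ema-decay` is, in general terms, an exponential moving average. The sketch below shows the textbook form of that technique; whether the training script seeds or bias-corrects the average the same way is not stated in this README, so `ema_smooth` and its first-value seeding are assumptions rather than the script's code.

```python
def ema_smooth(rewards, decay):
    """Exponential moving average: s_t = decay * s_{t-1} + (1 - decay) * r_t."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    smoothed, state = [], None
    for r in rewards:
        # Seed with the first reward so the curve does not start biased toward 0.
        state = r if state is None else decay * state + (1.0 - decay) * r
        smoothed.append(state)
    return smoothed


print(ema_smooth([1.0, 0.0, 1.0], decay=0.5))  # [1.0, 0.5, 0.75]
```

A decay near 1 gives a slow, stable signal; a decay of 0 returns the raw rewards unchanged.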
Recommended model strategy for hackathon iteration speed:
- Start with `--model-preset small_iter_fast` (`unsloth/Qwen2.5-3B-Instruct`) + QLoRA.
- Run many short SFT->GRPO loops, improve reward signals, then scale model size only after curves stabilize.
- Use larger presets only when memory + runtime are consistently stable.
- Use `--training-preset hackathon_turbo` to apply aggressive but stable defaults suited to fast iteration.
- The script prints SFT/GRPO LoRA delta checks; if the deltas are near zero it stops, so a no-op run is never mistaken for real fine-tuning.
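The no-op guard in the last bullet amounts to comparing adapter weights before and after training. The sketch below illustrates that idea in plain Python. `max_abs_delta`, `assert_trained`, the `1e-8` threshold, and the dict-of-lists weight format are all hypothetical stand-ins, not the script's implementation.

```python
def max_abs_delta(before: dict, after: dict) -> float:
    """Largest absolute element-wise change over the parameters named in `before`."""
    largest = 0.0
    for name, old_values in before.items():
        new_values = after[name]
        for old, new in zip(old_values, new_values):
            largest = max(largest, abs(new - old))
    return largest


def assert_trained(before: dict, after: dict, tol: float = 1e-8) -> float:
    """Raise if no weight moved by more than `tol` (hypothetical threshold)."""
    delta = max_abs_delta(before, after)
    if delta <= tol:
        raise RuntimeError(f"adapter weights unchanged (max delta {delta:.2e})")
    return delta
```

With real adapters the same comparison would run over tensors rather than lists, but the decision rule (stop when the largest change is effectively zero) stays the same.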
---
## Hugging Face Spaces
Full OpenEnv CLI flow from this directory (matches steps 5–8 of the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)):
```bash
openenv serve # local dev server on :8000
openenv build # build the Docker image
openenv validate --verbose # structure + Dockerfile + entrypoint checks
openenv push # deploy to HF Spaces
# openenv push --repo-id your-username/ghostexec
```
Use a **public** Space for the default hackathon flow unless you intentionally need a private Space. Authenticate with Hugging Face first (`huggingface-cli login` or equivalent).
---
## Scenarios
| File | Role |
|------|------|
| `scenarios/phase2_core.json` | Default dense inbox/calendar/tasks fixture |
| `scenarios/monday_morning.json`, `dinner_disaster.json`, `vip_meltdown.json` | Narrative demos |
| `scenarios/vip_meltdown_drift.json` | Mood / escalation drift |
| `scenarios/schema_drift_test.json` | Drift-event harness |
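Since the scenario files are the source of truth for world content, a quick way to get oriented is to list them with their top-level keys. The sketch below assumes only that each file is valid JSON; the scenario schema itself is not documented in this README, and `list_scenarios` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_scenarios(scenario_dir: str = "scenarios") -> dict:
    """Map each scenario file name to the sorted top-level keys of its JSON."""
    summary = {}
    for path in sorted(Path(scenario_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # ASSUMPTION: the top level is a JSON object; other shapes get an empty list.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary
```

Run from the repo root, `list_scenarios()` gives a one-screen overview that is handy when comparing a drift scenario against its base file.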
---
## Concurrent WebSocket sessions
`server/app.py` passes **`GhostexecEnvironment`** (the class) into `create_app` with `max_concurrent_envs=1` by default. Increase `max_concurrent_envs` if you need multiple simultaneous WebSocket clients.
---
## Project layout
```
ghostexec/
├── openenv.yaml          # OpenEnv name, version, description
├── pyproject.toml        # Package metadata + optional extras
├── uv.lock
├── models.py             # World + GhostexecAction / GhostexecObservation
├── client.py             # GhostexecEnv (WebSocket client)
├── scenarios/            # World JSON (source of truth for episodes)
├── scripts/              # http_endpoint_smoke.py
├── tests/
└── server/
    ├── app.py            # FastAPI + create_app
    ├── ghostexec_environment.py
    ├── reward.py
    └── Dockerfile
```
---
## Resources & references
Ghostexec is built against the official Meta PyTorch OpenEnv stack. Every design choice below is traceable to one of these sources.
**OpenEnv core.** The Gymnasium-style `reset()` / `step()` / `state` interface in `server/ghostexec_environment.py`, the `EnvClient` subclass in `client.py`, and the `create_app(...)` wiring in `server/app.py` follow the [Packaging & Deploying guide](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html) exactly.
- Core repo: [meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
- Docs: [meta-pytorch.org/OpenEnv](https://meta-pytorch.org/OpenEnv/)
**OpenEnv Hub (Hugging Face).** Target deployment for `openenv push`. The Space metadata at the top of this README + `openenv.yaml` are the knobs HF Spaces reads.
- Environments: [huggingface.co/openenv](https://huggingface.co/openenv)
- Spaces: [huggingface.co/openenv/spaces](https://huggingface.co/openenv/spaces)
**Tutorials.** General OpenEnv environment patterns are documented in the official tutorial pages and examples.
- All tutorials: [OpenEnv/tutorial](https://github.com/meta-pytorch/OpenEnv/tree/main/tutorial)
- Environment examples: [OpenEnv/envs](https://github.com/meta-pytorch/OpenEnv/tree/main/envs)
**YouTube: Building RL environments.** Talks from Meta / OpenEnv contributors that informed the scenario-driven reset, WebSocket session model, and reward breakdown used here:
- [Building RL Environments with OpenEnv](https://www.youtube.com/watch?v=0airz7BhBiA)
- [OpenEnv Deep Dive](https://www.youtube.com/watch?v=ap4q4sAK4OY)
- [Agentic RL Environments](https://www.youtube.com/watch?v=Jew4lhAiqnw)
- [OpenEnv Livestream (4-hour walkthrough)](https://www.youtube.com/live/kkCNMz0Ptd8)
**Reward-engineering papers.** See [Reward](#reward) for how the paper below informs the components of `server/reward.py`.
- Jnadi, A. (2024). *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications*. [arXiv:2408.10215](https://arxiv.org/abs/2408.10215). Informs the dense per-step conflict / relationship / task shaping and the bounded-magnitude design.
---
## License
BSD-style; see the license notice at the top of each source file (Meta / OpenEnv lineage).