---
title: Ghostexec Environment Server
emoji: 👻
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
---
# Ghostexec
Ghostexec is an OpenEnv-compatible environment that simulates a busy executive's world: inbox, calendar, contacts, tasks, and stakeholder moods. The agent chooses structured actions (reply, reschedule, delegate, …); the server returns a plain-text briefing as the main observation and a scalar reward shaped around conflict, relationships, and task progress. Scenario data lives in `scenarios/*.json` — nothing is hardcoded in Python for world content.
- Manifest: `openenv.yaml` (name `ghostexec`, HF Space identifier).
- Package: `openenv-ghostexec` in `pyproject.toml` (import as `ghostexec`).
## Deliverables
| Deliverable | URL |
|---|---|
| Public HF Space (required) | TODO: https://huggingface.co/spaces/<org>/ghostexec |
| Write-up / blog (HF post preferred) | TODO: https://huggingface.co/blog/... |
| Short demo video (<2 min) | TODO: https://youtube.com/... |
Fill these URLs before submission freeze so reviewers can verify everything from one place.
## OpenEnv Hackathon alignment (themes + submission checklist)
Theme fit (examples, not exhaustive): Ghostexec targets Theme 3.2 — Personalized tasks (executive-style inbox, calendar, conflicts, delegation via structured actions). Theme 4 is partially supported via curriculum + perturb (`GHOSTEXEC_CURRICULUM`, `GHOSTEXEC_PERTURB`) and diverse scenarios under `scenarios/`.
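For a quick local check of those knobs, something like the following should work; the `=1` values are assumptions, so consult the environment implementation for the accepted settings:

```bash
# Assumed flag values; the server command itself matches the Quick start below.
GHOSTEXEC_CURRICULUM=1 GHOSTEXEC_PERTURB=1 uv run server --port 8000
```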
Minimum submission checklist (fill before freeze):
| Item | Status |
|---|---|
| OpenEnv-based env + `openenv.yaml` | Done in-repo (`openenv-core[core]>=0.2.3` in `pyproject.toml`; aligns with the current PyPI release line). |
| Short write-up or <2 min video | You: publish and paste links in Deliverables. |
| Public HF Space URL | You: openenv push and paste the URL in Deliverables. |
## Design narrative
Ghostexec is intentionally built as an AI Chief of Staff environment, not a grid-world clone: the model must triage the inbox, calendar, stakeholder moods, and task deadlines under conflict pressure while taking only legal structured actions.
- Environment Innovation (40%) — scenario-driven executive operations with competing priorities, conflict queues, and relationship-sensitive outcomes in `scenarios/*.json` + `server/ghostexec_environment.py`.
- Storytelling & Presentation (30%) — each scenario encodes a narrative arc (VIP escalations, family/professional collisions, deadline cascades) so policy behavior reads like realistic assistant decisions rather than abstract moves.
- Showing Improvement in Rewards (20%) — environment reward remains deterministic, inspectable, and traceable through metadata + episode logs under `outputs/logs/`.
- Reward Quality (10%) — fixed weighted core signal (0.35 conflict / 0.35 relationship / 0.30 task), bounded shaping terms, explicit invalid-action handling, and `do_nothing` penalties.
This framing gives judges a clear throughline: realistic executive chaos -> constrained legal actions -> measurable policy improvement on held-out scenarios.
## Features
- Legal action set — `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing` (see `models.py`).
- Human-readable observations — `GhostexecObservation.echoed_message` is the full briefing text for the model (not raw JSON).
- Invalid actions — handled in-process: structured metadata (e.g. `step_ok`), no server crash.
- Reward — weighted blend of conflict, relationship, and task signals (see Reward); per-step logging under `outputs/logs/` (gitignored).
- HTTP + WebSocket — FastAPI app in `server/app.py`; `GhostexecEnv` uses WebSockets for persistent episodes.
## Quick start (Python client)
From the repo root (`ghostexec/` — where `pyproject.toml` lives):
```bash
uv sync
uv run server --port 8000
```
In another terminal or notebook:
```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:500], "…")  # plain-text briefing
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body=(
                "Marcus — acknowledged. Revised figures and short rationale "
                "before noon. — Exec"
            ),
        )
    )
    print("reward:", step.reward)
    print("metadata keys:", sorted((step.observation.metadata or {}).keys()))
```
Docker image (optional): if your OpenEnv client supports it, you can point `GhostexecEnv` at a container built from the root `Dockerfile`. Build from the repo root:

```bash
docker build -t ghostexec-env:latest .
```
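To smoke-test the image locally, a run command along these lines should suffice; the container port is an assumption based on the `app_port: 7860` Space metadata above:

```bash
# Port mapping assumes the container listens on 7860 (per app_port above).
docker run --rm -p 7860:7860 ghostexec-env:latest
```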
## Actions and fields
`GhostexecAction` (`models.py`) includes:

| `action_type` | Typical fields used |
|---|---|
| `reply_email` | `email_id`, `message_body` |
| `archive_email` | `email_id` |
| `reschedule_meeting` | `meeting_id`, `new_time`, `reason` |
| `cancel_meeting` | `meeting_id`, `reason` |
| `complete_task` | `task_id` |
| `delegate_task` | `task_id`, `contact_name` |
| `send_message` | `contact_name`, `message` (channel text) |
| `do_nothing` | — (intentionally weak / penalised path) |
Unknown or malformed HTTP payloads deserialize safely to `do_nothing`-style defaults where applicable, so older clients do not crash.
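For illustration, two more action constructions using the field names from the table above; the ids, contact name, and `new_time` format are hypothetical placeholders for whatever the briefing surfaces:

```python
from ghostexec import GhostexecAction

# "m07", "t03", "Priya", and the new_time format are hypothetical; read the
# real ids out of the briefing text / metadata returned by reset() or step().
reschedule = GhostexecAction(
    action_type="reschedule_meeting",
    meeting_id="m07",
    new_time="15:00",
    reason="Board prep collides with the investor call.",
)
delegate = GhostexecAction(
    action_type="delegate_task",
    task_id="t03",
    contact_name="Priya",
)
```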
## Observation
`GhostexecObservation`:

- `echoed_message` — Full briefing (emails, conflicts, contacts, tasks, stress, steps remaining).
- `message_length` — Length of `echoed_message` for quick checks.
- `reward`, `done`, `metadata` — Step outcome; metadata carries flags such as `step_ok`, reward breakdown fields, and ids for debugging.
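Continuing the Quick start session, a minimal field check might look like this (names per the list above; `step` is the result of `env.step(...)`):

```python
obs = step.observation
print(obs.message_length, obs.reward, obs.done)
print((obs.metadata or {}).get("step_ok"))  # step validity flag from metadata
```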
## Reward
Phase-4 scoring (`server/reward.py`) combines three channels with fixed weights:
$$
\text{weighted base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
$$
On top of that base, the scorer applies output scaling, invalid-step adjustments, bonuses/penalties, and a floor for `do_nothing`. Full component values are available on `RewardBreakdown` and are mirrored into observation metadata where configured. Episode reward traces append to `outputs/logs/episode_rewards.jsonl` (directory gitignored).
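As a minimal sketch of that blend (illustrative only; the real `server/reward.py` layers the scaling, bonuses/penalties, and `do_nothing` floor on top):

```python
CONFLICT_W, RELATIONSHIP_W, TASK_W = 0.35, 0.35, 0.30

def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Fixed-weight blend of the three reward channels."""
    return CONFLICT_W * conflict + RELATIONSHIP_W * relationship + TASK_W * task

# Hypothetical channel values: strong conflict resolution, neutral
# relationship, partial task progress.
print(weighted_base(conflict=0.8, relationship=0.0, task=0.5))  # ≈ 0.43
```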
**Reward-engineering provenance.** The design follows the reward-shaping playbook surveyed in *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications* (arXiv:2408.10215): dense per-step shaping around proxy signals (conflict / relationship / task) instead of a single sparse end-of-episode reward, fixed weights to keep channel trade-offs inspectable, and bounded per-step magnitudes to resist reward hacking.
## HTTP vs WebSocket (episode state)
- HTTP `POST /reset` and `POST /step` often bind to short-lived environment instances depending on deployment; consecutive HTTP calls may not share one in-memory episode.
- Ghostexec still applies your action against a scenario-primed instance, so a lone `POST /step` can return a meaningful reward and metadata.
- WebSocket `/ws` — use this (or `GhostexecEnv(base_url=...)`, which speaks WebSocket) for multi-step episodes on the same session.
Endpoints (typical OpenEnv layout): `/web`, `/docs`, `/health`, `/ws`.
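A single-shot HTTP example along these lines should illustrate the point; the request body shape is an assumption, so confirm the schema against `/docs` on a running server:

```bash
# Assumed payload shape; check /docs for the authoritative request schema.
curl -s -X POST http://127.0.0.1:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "do_nothing"}}'
```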
## Running and testing locally
```bash
# Dev server (package layout)
uv run uvicorn ghostexec.server.app:app --reload --host 0.0.0.0 --port 8000

# Or console entrypoint (matches Dockerfile)
uv run server --port 8000
```
Smoke script (HTTP):
```bash
uv run python scripts/http_endpoint_smoke.py --local
uv run python scripts/http_endpoint_smoke.py --url http://127.0.0.1:8000
uv run python scripts/http_endpoint_smoke.py --print-curl
```
Tests:
```bash
uv run pytest tests/ -q
```
Opt-in Docker build smoke (Phase 1 gate):
```bash
GHOSTEXEC_RUN_DOCKER_BUILD=1 uv run pytest tests/test_docker_build.py -q
```
With the server already on port 8000:
```bash
uv run pytest tests/test_live_server_exhaustive.py -v --tb=short
```
Override live URL (Windows PowerShell example):
```powershell
$env:GHOSTEXEC_LIVE_BASE_URL = "http://127.0.0.1:9000"
uv run pytest tests/test_live_server_exhaustive.py -q
```
Optional real WebSocket client check:
```
# Terminal 1
uv run server --port 8000

# Terminal 2
set GHOSTEXEC_WS_BASE_URL=http://127.0.0.1:8000
uv run pytest tests/test_complete_integration.py::test_ghostexec_env_client_against_live_url_if_set -q
```
Post-training plot pack (loss + reward + components + baseline bar):
```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir outputs/plots
```
The script writes:
- `outputs/plots/loss_curve.png`
- `outputs/plots/reward_curve.png`
- `outputs/plots/components_curve.png`
- `outputs/plots/baseline_comparison.png`
SFT before GRPO (with partial live-env usage during SFT data generation and GRPO rewards):
```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```
This performs:
- SFT warm-start on JSONL (`prompt` + `completion`) generated from live `/reset` briefings (shape sketched after this list).
- GRPO continuation from the SFT adapter.
- Mixed reward shaping where env-derived reward remains active and local shaping can be down-weighted or up-weighted via the scale flags.
- Optional complexity curriculum (`easy_to_full`) that starts with stronger scaffold/local signals and anneals to env-dominant reward later.
- Stability-first optimization defaults (cosine schedule + warmup + grad clipping + higher GRPO KL beta). Optional `--reward-ema-decay 0..1` smooths the env reward channel (defaults come from `--training-preset`). Training always runs the full `max_*_steps` (no early-stop callbacks).
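For concreteness, one record of that JSONL might look like the sketch below; the exact serialization used by `scripts/train_sft_then_grpo.py` is an assumption here, with only the `prompt` + `completion` keys taken from the list above:

```python
import json

# Hypothetical SFT record: a /reset briefing as the prompt, a structured
# action as the completion. Field contents are illustrative only.
record = {
    "prompt": "MONDAY 08:05 BRIEFING: 3 unread emails, 1 calendar conflict, ...",
    "completion": '{"action_type": "reply_email", "email_id": "e01"}',
}
print(json.dumps(record))  # one line of the JSONL dataset
```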
Recommended model strategy for hackathon iteration speed:
- Start with `--model-preset small_iter_fast` (unsloth/Qwen2.5-3B-Instruct) + QLoRA.
- Run many short SFT -> GRPO loops, improve reward signals, then scale model size only after curves stabilize.
- Use larger presets only when memory + runtime are consistently stable.
- Use `--training-preset hackathon_turbo` to apply stable aggressive defaults for iterative win-rate.
- The script prints SFT/GRPO LoRA delta checks; if deltas are near zero it stops, so you never mistake a no-op run for real finetuning.
## Hugging Face Spaces
Full OpenEnv CLI flow from this directory (matches steps 5–8 of the Packaging & Deploying guide):
```bash
openenv serve                # local dev server on :8000
openenv build                # build the Docker image
openenv validate --verbose   # structure + Dockerfile + entrypoint checks
openenv push                 # deploy to HF Spaces
# openenv push --repo-id your-username/ghostexec
```
Use a public Space for the default hackathon flow unless you intentionally need a private Space. Authenticate with Hugging Face first (`huggingface-cli login` or equivalent).
## Scenarios
| File | Role |
|---|---|
| `scenarios/phase2_core.json` | Default dense inbox/calendar/tasks fixture |
| `scenarios/monday_morning.json`, `dinner_disaster.json`, `vip_meltdown.json` | Narrative demos |
| `scenarios/vip_meltdown_drift.json` | Mood / escalation drift |
| `scenarios/schema_drift_test.json` | Drift-event harness |
## Concurrent WebSocket sessions
`server/app.py` passes `GhostexecEnvironment` (the class) into `create_app` with `max_concurrent_envs=1` by default. Increase `max_concurrent_envs` if you need multiple simultaneous WebSocket clients, as sketched below.
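A minimal sketch of that change in `server/app.py`, assuming the existing `create_app` import stays as-is (its exact signature comes from `openenv-core`):

```python
# server/app.py (sketch): only the session cap changes; GhostexecEnvironment
# and create_app are already imported in the real file.
app = create_app(
    GhostexecEnvironment,   # environment class, passed as described above
    max_concurrent_envs=4,  # default is 1; raise for parallel WS clients
)
```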
## Project layout
```
ghostexec/
├── openenv.yaml           # OpenEnv name, version, description
├── pyproject.toml         # Package metadata + optional extras
├── uv.lock
├── models.py              # World + GhostexecAction / GhostexecObservation
├── client.py              # GhostexecEnv (WebSocket client)
├── scenarios/             # World JSON (source of truth for episodes)
├── scripts/               # http_endpoint_smoke.py
├── tests/
├── server/
│   ├── app.py             # FastAPI + create_app
│   ├── ghostexec_environment.py
│   └── reward.py
└── Dockerfile
```
## Resources & references
Ghostexec is built against the official Meta PyTorch OpenEnv stack. Every design choice below is traceable to one of these sources.
**OpenEnv core.** The Gymnasium-style `reset()` / `step()` / state interface in `server/ghostexec_environment.py`, the `EnvClient` subclass in `client.py`, and the `create_app(...)` wiring in `server/app.py` follow the Packaging & Deploying guide exactly.
- Core repo: meta-pytorch/OpenEnv
- Docs: meta-pytorch.org/OpenEnv
**OpenEnv Hub (Hugging Face).** Target deployment for `openenv push`. The Space metadata at the top of this README + `openenv.yaml` are the knobs HF Spaces reads.
- Environments: huggingface.co/openenv
- Spaces: huggingface.co/openenv/spaces
**Tutorials.** General OpenEnv environment patterns are documented in the official tutorial pages and examples.
- All tutorials: OpenEnv/tutorial
- Environment examples: OpenEnv/envs
**YouTube — Building RL environments.** Talks from Meta / OpenEnv contributors that informed the scenario-driven reset, WebSocket session model, and reward breakdown used here:
- Building RL Environments with OpenEnv
- OpenEnv Deep Dive
- Agentic RL Environments
- OpenEnv Livestream (4-hour walkthrough)
**Reward-engineering papers.** See Reward for how each paper maps to specific components of `server/reward.py`.
- Jnadi, A. (2024). Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications. arXiv:2408.10215. Informs the dense per-step conflict / relationship / task shaping and the bounded-magnitude design.
## License
BSD-style — see the license notice at the top of each source file (Meta / OpenEnv lineage).