---
title: Ghostexec Environment Server
emoji: 📒
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
---

# Ghostexec

Ghostexec is an OpenEnv-compatible environment that simulates a busy executive’s world: inbox, calendar, contacts, tasks, and stakeholder moods. The agent chooses structured actions (reply, reschedule, delegate, …); the server returns a plain-text briefing as the main observation and a scalar reward shaped around conflict, relationships, and task progress. Scenario data lives in `scenarios/*.json` — nothing is hardcoded in Python for world content.

- Manifest: `openenv.yaml` (name `ghostexec`, HF Space identifier).
- Package: `openenv-ghostexec` in `pyproject.toml` (import as `ghostexec`).


## Deliverables

| Deliverable | URL |
| --- | --- |
| Public HF Space (required) | TODO: https://huggingface.co/spaces/<org>/ghostexec |
| Write-up / blog (HF post preferred) | TODO: https://huggingface.co/blog/... |
| Short demo video (<2 min) | TODO: https://youtube.com/... |

Fill these URLs before submission freeze so reviewers can verify everything from one place.


## OpenEnv Hackathon alignment (themes + submission checklist)

Theme fit (examples, not exhaustive): Ghostexec targets Theme 3.2 — Personalized tasks (executive-style inbox, calendar, conflicts, delegation via structured actions). Theme 4 is partially supported via curriculum + perturb (`GHOSTEXEC_CURRICULUM`, `GHOSTEXEC_PERTURB`) and diverse scenarios under `scenarios/`.

Minimum submission checklist (fill before freeze):

| Item | Status |
| --- | --- |
| OpenEnv-based env + `openenv.yaml` | Done in-repo (`openenv-core[core]>=0.2.3` in `pyproject.toml`; aligns with the current PyPI release line). |
| Short write-up or <2 min video | You: publish and paste links in Deliverables. |
| Public HF Space URL | You: `openenv push` and paste the URL in Deliverables. |

## Design narrative

Ghostexec is intentionally built as an AI Chief of Staff environment, not a grid-world clone: the model must triage inbox, calendar, stakeholder mood, and task deadlines under conflict pressure while taking only legal structured actions.

- Environment Innovation (40%) — scenario-driven executive operations with competing priorities, conflict queues, and relationship-sensitive outcomes in `scenarios/*.json` + `server/ghostexec_environment.py`.
- Storytelling & Presentation (30%) — each scenario encodes a narrative arc (VIP escalations, family/professional collisions, deadline cascades) so policy behavior reads like realistic assistant decisions rather than abstract moves.
- Showing Improvement in Rewards (20%) — environment reward remains deterministic, inspectable, and traceable through metadata + episode logs under `outputs/logs/`.
- Reward Quality (10%) — fixed weighted core signal (0.35 conflict / 0.35 relationship / 0.30 task), bounded shaping terms, explicit invalid-action handling, and `do_nothing` penalties.

This framing gives judges a clear throughline: realistic executive chaos -> constrained legal actions -> measurable policy improvement on held-out scenarios.


## Features

- Legal action set — `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing` (see `models.py`).
- Human-readable observations — `GhostexecObservation.echoed_message` is the full briefing text for the model (not raw JSON).
- Invalid actions — handled in-process with structured metadata (e.g. `step_ok`); the server never crashes.
- Reward — weighted blend of conflict, relationship, and task signals (see Reward); per-step logging under `outputs/logs/` (gitignored).
- HTTP + WebSocket — FastAPI app in `server/app.py`; `GhostexecEnv` uses WebSockets for persistent episodes.

## Quick start (Python client)

From the repo root (`ghostexec/`, where `pyproject.toml` lives):

```bash
uv sync
uv run server --port 8000
```

In another terminal or notebook:

```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:500], "…")  # plain-text briefing

    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body=(
                "Marcus — acknowledged. Revised figures and short rationale "
                "before noon. — Exec"
            ),
        )
    )
    print("reward:", step.reward)
    print("metadata keys:", sorted((step.observation.metadata or {}).keys()))
```

Docker image (optional): if your OpenEnv client supports it, you can point `GhostexecEnv` at a container built from the root `Dockerfile`. Build from the repo root:

```bash
docker build -t ghostexec-env:latest .
```
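To run the image locally (a sketch; the container port depends on the Dockerfile, so remap if it differs from the dev server's 8000 or the Space's 7860):

```bash
# Assumes the image serves on 8000 like the dev server; adjust -p as needed.
docker run --rm -p 8000:8000 ghostexec-env:latest
```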

## Actions and fields

`GhostexecAction` (`models.py`) includes:

| action_type | Typical fields used |
| --- | --- |
| `reply_email` | `email_id`, `message_body` |
| `archive_email` | `email_id` |
| `reschedule_meeting` | `meeting_id`, `new_time`, `reason` |
| `cancel_meeting` | `meeting_id`, `reason` |
| `complete_task` | `task_id` |
| `delegate_task` | `task_id`, `contact_name` |
| `send_message` | `contact_name`, `message` (channel text) |
| `do_nothing` | — (intentionally weak / penalised path) |

Unknown or malformed HTTP payloads deserialize safely to `do_nothing`-style defaults where applicable, so older clients do not crash; a sketch of that behavior follows.
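A minimal sketch of that tolerant parsing (illustrative only: `parse_action_payload` is a hypothetical helper, not the actual server code, though the action names match the table above):

```python
from typing import Any

VALID_ACTION_TYPES = {
    "reply_email", "archive_email", "reschedule_meeting", "cancel_meeting",
    "complete_task", "delegate_task", "send_message", "do_nothing",
}

def parse_action_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Coerce an arbitrary JSON payload into a safe action dict.

    Unknown or malformed action types fall back to do_nothing instead of
    raising, so an outdated client cannot crash the server.
    """
    action_type = payload.get("action_type")
    if action_type not in VALID_ACTION_TYPES:
        return {"action_type": "do_nothing"}
    # Keep only string-valued optional fields; silently drop anything else.
    extras = {
        key: value
        for key, value in payload.items()
        if key != "action_type" and isinstance(value, str)
    }
    return {"action_type": action_type, **extras}
```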


## Observation

`GhostexecObservation`:

- `echoed_message` — full briefing (emails, conflicts, contacts, tasks, stress, steps remaining).
- `message_length` — length of `echoed_message` for quick checks.
- `reward`, `done`, `metadata` — step outcome; `metadata` carries flags such as `step_ok`, reward-breakdown fields, and ids for debugging (inspected in the snippet below).
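For example, to inspect those fields after a single step (reuses the quick-start client; assumes a server on port 8000):

```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    env.reset()
    step = env.step(GhostexecAction(action_type="do_nothing"))
    meta = step.observation.metadata or {}
    print("step_ok:", meta.get("step_ok"))  # was the action legal?
    print("message_length:", step.observation.message_length)
    print("reward:", step.reward)
```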

## Reward

Phase-4 scoring (`server/reward.py`) combines three channels with fixed weights:

$$\text{weighted base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}$$

It then applies output scaling, invalid-step adjustments, bonuses/penalties, and a floor for `do_nothing`. Full component values are available on `RewardBreakdown` and are mirrored into observation metadata where configured. Episode reward traces append to `outputs/logs/episode_rewards.jsonl` (directory gitignored).
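A minimal sketch of how those pieces compose (illustrative; the real logic lives in `server/reward.py`, and every constant below other than the three weights is an assumption):

```python
DO_NOTHING_FLOOR = -0.5  # assumed value


def combine_reward(conflict: float, relationship: float, task: float,
                   *, step_ok: bool, is_do_nothing: bool) -> float:
    """Blend the three channels with the fixed weights documented above."""
    base = 0.35 * conflict + 0.35 * relationship + 0.30 * task
    if not step_ok:
        base -= 0.5  # invalid-step adjustment (magnitude assumed)
    if is_do_nothing:
        base = max(base - 0.1, DO_NOTHING_FLOOR)  # penalty + floor (assumed)
    return max(-1.0, min(1.0, base))  # bounded per-step magnitude (assumed)
```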

Reward-engineering provenance. The design follows the reward-shaping playbook surveyed in *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications* (arXiv:2408.10215): dense per-step shaping around proxy signals (conflict / relationship / task) instead of a single sparse end-of-episode reward, fixed weights to keep channel trade-offs inspectable, and bounded per-step magnitudes to resist hacking.


## HTTP vs WebSocket (episode state)

- HTTP `POST /reset` and `POST /step` often bind to short-lived environment instances depending on deployment; consecutive HTTP calls may not share one in-memory episode.
- Ghostexec still applies your action against a scenario-primed instance, so a lone `POST /step` can return a meaningful reward and metadata.
- WebSocket `/ws` — use this (or `GhostexecEnv(base_url=...)`, which speaks WebSocket) for multi-step episodes on the same session (example below).

Endpoints (typical OpenEnv layout): `/web`, `/docs`, `/health`, `/ws`.
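For example, a two-step episode over one persistent session (a sketch; the `e01`/`t01` ids are illustrative, real ids come from the briefing text):

```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    env.reset()
    total = 0.0
    # Both steps ride the same WebSocket session, so the second step sees
    # the world state left behind by the first.
    for action in (
        GhostexecAction(action_type="archive_email", email_id="e01"),
        GhostexecAction(action_type="complete_task", task_id="t01"),
    ):
        step = env.step(action)
        total += step.reward or 0.0
        if step.done:
            break
    print("episode reward:", total)
```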


## Running and testing locally

```bash
# Dev server (package layout)
uv run uvicorn ghostexec.server.app:app --reload --host 0.0.0.0 --port 8000

# Or console entrypoint (matches Dockerfile)
uv run server --port 8000
```

Smoke script (HTTP):

```bash
uv run python scripts/http_endpoint_smoke.py --local
uv run python scripts/http_endpoint_smoke.py --url http://127.0.0.1:8000
uv run python scripts/http_endpoint_smoke.py --print-curl
```

Tests:

```bash
uv run pytest tests/ -q
```

Opt-in Docker build smoke (Phase 1 gate):

```bash
GHOSTEXEC_RUN_DOCKER_BUILD=1 uv run pytest tests/test_docker_build.py -q
```

With the server already on port 8000:

```bash
uv run pytest tests/test_live_server_exhaustive.py -v --tb=short
```

Override live URL (Windows PowerShell example):

```powershell
$env:GHOSTEXEC_LIVE_BASE_URL = "http://127.0.0.1:9000"
uv run pytest tests/test_live_server_exhaustive.py -q
```

Optional real WebSocket client check:

```
# Terminal 1
uv run server --port 8000
# Terminal 2
set GHOSTEXEC_WS_BASE_URL=http://127.0.0.1:8000
uv run pytest tests/test_complete_integration.py::test_ghostexec_env_client_against_live_url_if_set -q
```

Post-training plot pack (loss + reward + components + baseline bar):

```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir outputs/plots
```

The script writes:

- `outputs/plots/loss_curve.png`
- `outputs/plots/reward_curve.png`
- `outputs/plots/components_curve.png`
- `outputs/plots/baseline_comparison.png`

SFT before GRPO (with partial live-env usage during SFT data generation and GRPO rewards):

```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```

This performs:

- SFT warm-start on JSONL (prompt + completion) generated from live `/reset` briefings.
- GRPO continuation from the SFT adapter.
- Mixed reward shaping where the env-derived reward remains active and local shaping can be down-weighted or up-weighted via the two scales (see the sketch after this list).
- Optional complexity curriculum (`easy_to_full`) that starts with stronger scaffold/local signals and anneals to an env-dominant reward later.
- Stability-first optimization defaults (cosine schedule + warmup + grad clipping + higher GRPO KL beta). Optional `--reward-ema-decay 0..1` smooths the env reward channel (defaults come from `--training-preset`). Training always runs the full `max_*_steps` (no early-stop callbacks).
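A plausible reading of the reward mixing (an assumption about how the two scale flags combine; check `scripts/train_sft_then_grpo.py` for the exact formula):

$$r_{\text{train}} = s_{\text{env}} \cdot r_{\text{env}} + s_{\text{local}} \cdot r_{\text{local}}$$

where $s_{\text{env}}$ is `--env-reward-scale` (1.0 above) and $s_{\text{local}}$ is `--local-reward-scale` (0.35 above).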

Recommended model strategy for hackathon iteration speed:

- Start with `--model-preset small_iter_fast` (`unsloth/Qwen2.5-3B-Instruct`) + QLoRA.
- Run many short SFT -> GRPO loops, improve reward signals, then scale model size only after curves stabilize.
- Use larger presets only when memory + runtime are consistently stable.
- Use `--training-preset hackathon_turbo` to apply stable aggressive defaults for iterative win-rate.
- The script prints SFT/GRPO LoRA delta checks; if deltas are near zero it stops, so you never mistake a no-op run for real finetuning.

## Hugging Face Spaces

Full OpenEnv CLI flow from this directory (matches steps 5–8 of the Packaging & Deploying guide):

```bash
openenv serve                       # local dev server on :8000
openenv build                       # build the Docker image
openenv validate --verbose          # structure + Dockerfile + entrypoint checks
openenv push                        # deploy to HF Spaces
# openenv push --repo-id your-username/ghostexec
```

Use a public Space for the default hackathon flow unless you intentionally need a private Space. Authenticate with Hugging Face first (huggingface-cli login or equivalent).


## Scenarios

| File | Role |
| --- | --- |
| `scenarios/phase2_core.json` | Default dense inbox/calendar/tasks fixture |
| `scenarios/monday_morning.json`, `dinner_disaster.json`, `vip_meltdown.json` | Narrative demos |
| `scenarios/vip_meltdown_drift.json` | Mood / escalation drift |
| `scenarios/schema_drift_test.json` | Drift-event harness |

## Concurrent WebSocket sessions

`server/app.py` passes `GhostexecEnvironment` (the class) into `create_app` with `max_concurrent_envs=1` by default. Increase `max_concurrent_envs` if you need multiple simultaneous WebSocket clients.
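A sketch of what raising the limit might look like (illustrative; the `create_app` import path depends on your `openenv-core` version, so it is left abstract here):

```python
# server/app.py (sketch)
from ghostexec.server.ghostexec_environment import GhostexecEnvironment
# create_app comes from the OpenEnv server helpers; import path varies by version.

app = create_app(
    GhostexecEnvironment,   # pass the environment class, not an instance
    max_concurrent_envs=4,  # raised from the default of 1 for parallel WS clients
)
```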


## Project layout

```text
ghostexec/
├── openenv.yaml           # OpenEnv name, version, description
├── pyproject.toml         # Package metadata + optional extras
├── uv.lock
├── models.py              # World + GhostexecAction / GhostexecObservation
├── client.py              # GhostexecEnv (WebSocket client)
├── scenarios/             # World JSON (source of truth for episodes)
├── scripts/               # http_endpoint_smoke.py
├── tests/
└── server/
    ├── app.py             # FastAPI + create_app
    ├── ghostexec_environment.py
    ├── reward.py
    └── Dockerfile
```

## Resources & references

Ghostexec is built against the official Meta PyTorch OpenEnv stack. Every design choice below is traceable to one of these sources.

OpenEnv core. The Gymnasium-style reset() / step() / state interface in server/ghostexec_environment.py, the EnvClient subclass in client.py, and the create_app(...) wiring in server/app.py follow the Packaging & Deploying guide exactly.

OpenEnv Hub (Hugging Face). Target deployment for openenv push. The Space metadata at the top of this README + openenv.yaml are the knobs HF Spaces reads.

Tutorials. General OpenEnv environment patterns are documented in the official tutorial pages and examples.

YouTube — Building RL environments. Talks from Meta / OpenEnv contributors informed the scenario-driven reset, WebSocket session model, and reward breakdown used here.

Reward-engineering papers. See Reward for how each paper maps to specific components of server/reward.py.

- Jnadi, A. (2024). *Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications.* arXiv:2408.10215. Informs the dense per-step conflict / relationship / task shaping and the bounded-magnitude design.

## License

BSD-style — see the license notice at the top of each source file (Meta / OpenEnv lineage).