---
title: Ghostexec Environment Server
emoji: π’
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
---
# Ghostexec: The AI Chief-of-Staff Environment
Ghostexec is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.
The agent gets a dense plain-text briefing, takes one structured action, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.
## Submission Package
| Item | Link |
|------|------|
| Public HF Space (required) | [modelbuilderhq/ghostexec](https://huggingface.co/spaces/modelbuilderhq/ghostexec) |
| OpenEnv manifest | [`openenv.yaml`](openenv.yaml) |
| Training notebook (Colab-ready) | [`notebooks/ghostexec_unsloth_grpo_hf_api.ipynb`](notebooks/ghostexec_unsloth_grpo_hf_api.ipynb) |
| Minimal training script (Unsloth + TRL) | [`scripts/train_sft_then_grpo.py`](scripts/train_sft_then_grpo.py) |
| Mini-blog (required) | [**BLOG.md on Hugging Face**](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md) |
| Demo video <2 minutes (required) | [**Ghostexec demo on YouTube**](https://youtu.be/g4IFZMEzfO8) |
## Why This Environment Is Competitive
- **Novel task composition**: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
- **Non-trivial behavior**: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
- **Dynamic world model**: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
- **Trainable reward signal**: dense step reward for learning plus bounded graders for evaluation.
- **Hackathon fit**: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.
## Judging-Criteria Mapping
### 1) Environment Innovation (40%)
- The observation is a realistic text briefing, not a toy tabular state dump.
- Actions are schema-bound (`GhostexecAction`) and validated against live world ids.
- The world evolves after each step (conflict graph, stress, mood, time shifts).
- Drift events in scenario data test robustness to changing conditions.
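The schema-bound-but-id-validated distinction above can be illustrated with a small local sketch. This is hypothetical client-side code, not the server's validator: the `Action` dataclass and `LIVE_EMAIL_IDS` set are stand-ins for `GhostexecAction` in `models.py` and the live world state.

```python
# Hypothetical sketch: schema validity vs. semantic validity.
# The real environment validates ids server-side against live world state;
# this local check only illustrates the distinction.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    action_type: str
    email_id: Optional[str] = None


LIVE_EMAIL_IDS = {"e01", "e02"}  # ids present in the current world state


def is_useful(action: Action) -> bool:
    """Schema-valid JSON can still target a stale or nonexistent id."""
    if action.action_type == "reply_email":
        return action.email_id in LIVE_EMAIL_IDS
    return True


print(is_useful(Action("reply_email", "e01")))  # True: live id
print(is_useful(Action("reply_email", "e99")))  # False: valid JSON, wrong target
```

A policy that only learns to emit well-formed JSON scores poorly; it must also track which ids remain live as the world evolves.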
**Task ladder**
| Task ID | Difficulty | Scenario |
|---------|------------|----------|
| `phase2_core` | easy | `scenarios/phase2_core.json` |
| `monday_morning` | medium | `scenarios/monday_morning.json` |
| `dinner_disaster` | hard | `scenarios/dinner_disaster.json` |
### 2) Storytelling and Presentation (30%)
Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has social + operational consequences.
The demo is easy to follow:
1. show the same briefing the model sees,
2. compare weak vs better action choice,
3. show reward movement and policy behavior improvements.
### 3) Showing Improvement in Rewards (20%)
The repo includes persisted training artifacts and plot outputs:
- `output/reward_curve.png`
- `output/loss_curve.png`
- `output/baseline_comparison.png`
**Training evidence plots**

![Reward curve](output/reward_curve.png)
*Reward trend across training progression.*

![Loss curve](output/loss_curve.png)
*SFT/GRPO training loss over optimization steps.*

![Baseline comparison](output/baseline_comparison.png)
*Random vs frozen vs trained policy mean episode reward.*
**Current before/after metrics (from saved artifacts)**
| Metric | Baseline | Trained |
|--------|----------|---------|
| Mean step reward | `0.145` | `0.257` |
| Invalid action rate | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
| Grader score | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
### 4) Reward and Training Pipeline (10%)
Ghostexec uses a coherent weighted reward core plus bounded shaping:
\[
\text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]
The environment then applies structured adjustments (invalid-action penalties, do-nothing pressure, and completion/catastrophic terms), each reported in transparent breakdown fields.
Training is end-to-end and environment-connected (not static-only): SFT warm start, then GRPO with environment reward plus local shaping functions.
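The weighted core above can be sketched in a few lines. The component weights come from the formula in this section; the specific penalty values (`-0.5` for invalid actions, `-0.1` for idling) are illustrative assumptions, not the exact constants in `server/reward.py`.

```python
# Sketch of the weighted reward core plus bounded shaping.
# Weights (0.35 / 0.35 / 0.30) match the formula above; the penalty
# magnitudes below are assumed values for illustration only.
def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Weighted combination of the three coupled scoring dimensions."""
    return 0.35 * conflict + 0.35 * relationship + 0.30 * task


def shaped_reward(conflict: float, relationship: float, task: float,
                  *, invalid: bool = False, idle: bool = False) -> float:
    """Apply structured adjustments on top of the weighted base."""
    r = weighted_base(conflict, relationship, task)
    if invalid:
        r -= 0.5   # assumed invalid-action penalty
    if idle:
        r -= 0.1   # assumed do-nothing pressure
    return r


print(shaped_reward(0.6, 0.4, 0.5))  # 0.35*0.6 + 0.35*0.4 + 0.30*0.5 = 0.50
```

Because every adjustment is a separate additive term, the server can expose each one as a breakdown field, which is what makes the reward auditable during training.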
## Quick Start
```bash
uv sync
uv run server --port 8000
```
Python client example:
```python
from ghostexec import GhostexecAction, GhostexecEnv
with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)
```
## Reproducible Training Commands
```bash
uv run python scripts/train_sft_then_grpo.py \
--model-preset small_iter_fast \
--training-preset hackathon_turbo \
--env-url http://127.0.0.1:8000 \
--generate-sft-from-env \
--sft-samples 120 \
--max-sft-steps 60 \
--max-grpo-steps 120 \
--env-reward-scale 1.0 \
--local-reward-scale 0.35 \
--complexity-curriculum easy_to_full \
--curriculum-ramp-ratio 0.60
```
Generate post-train plots:
```bash
uv run python scripts/plot_training_report.py \
--trainer-history outputs/trainer_state.json \
--reward-csv outputs/reward_log.csv \
--baselines-json outputs/compliance_manifest.json \
--out-dir output
```
## OpenEnv and Space Deployment
```bash
openenv serve
openenv build
openenv validate --verbose
openenv push
```
If needed:
```bash
openenv push --repo-id your-username/ghostexec
```
## Environment API and Contract
- Core endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/docs`, `/ws`
- Observation contains:
- `echoed_message` (plain-text briefing),
- optional metadata (step validity, reward breakdown, ids).
- Action schema: see `GhostexecAction` in [`models.py`](models.py).
Supported `action_type` values:
- `reply_email`
- `archive_email`
- `reschedule_meeting`
- `cancel_meeting`
- `complete_task`
- `delegate_task`
- `send_message`
- `do_nothing`
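A client can cheaply pre-check payloads against the supported `action_type` list above before calling `/step`. This is a hedged sketch: the extra field in the example payload (`task_id`) follows the Quick Start pattern, but `models.py` and the `/schema` endpoint remain the authoritative contract.

```python
# Client-side pre-check that an action payload names a supported
# action_type before it is sent to /step. The action_type list mirrors
# the contract section above; field pairings are illustrative.
SUPPORTED_ACTIONS = {
    "reply_email", "archive_email", "reschedule_meeting", "cancel_meeting",
    "complete_task", "delegate_task", "send_message", "do_nothing",
}


def precheck(payload: dict) -> bool:
    """Reject payloads whose action_type the server would never accept."""
    return payload.get("action_type") in SUPPORTED_ACTIONS


print(precheck({"action_type": "delegate_task", "task_id": "t03"}))   # True
print(precheck({"action_type": "forward_email", "email_id": "e01"}))  # False
```

Catching unsupported action types locally avoids burning environment steps on requests the server will reject outright.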
## Submission Readiness Checklist
- [x] OpenEnv latest-compatible environment with valid `openenv.yaml`
- [x] Public HF Space deployed and reachable
- [x] Minimal trainable script using Unsloth + TRL
- [x] Colab-ready notebook for reruns
- [x] Training evidence plots embedded in README
- [x] Add HF blog link → [spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md)
- [x] Add <2 minute YouTube demo link → [youtu.be/g4IFZMEzfO8](https://youtu.be/g4IFZMEzfO8)
## Repository Structure
```text
ghostexec/
├── openenv.yaml
├── pyproject.toml
├── models.py
├── client.py
├── graders.py
├── scenarios/
├── scripts/
├── notebooks/
├── tests/
├── output/
└── server/
    ├── app.py
    ├── ghostexec_environment.py
    └── reward.py
```
## Additional References
- [OpenEnv (Meta PyTorch)](https://github.com/meta-pytorch/OpenEnv)
- [OpenEnv Packaging and Deploying Docs](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)
- [OpenEnv Hub](https://huggingface.co/openenv)
- [Environment Innovation Deep-Dive](environment-innovation/README.md)
## License
BSD-style license, as included in this repository, with upstream OpenEnv lineage notices preserved.