---

title: Ghostexec Environment Server
emoji: 📒
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
---


# Ghostexec: The AI Chief-of-Staff Environment

Ghostexec is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.

The agent gets a dense plain-text briefing, takes one structured action, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.

## Submission Package

| Item | Link |
|------|------|
| Public HF Space (required) | [modelbuilderhq/ghostexec](https://huggingface.co/spaces/modelbuilderhq/ghostexec) |
| OpenEnv manifest | [`openenv.yaml`](openenv.yaml) |
| Training notebook (Colab-ready) | [`notebooks/ghostexec_unsloth_grpo_hf_api.ipynb`](notebooks/ghostexec_unsloth_grpo_hf_api.ipynb) |
| Minimal training script (Unsloth + TRL) | [`scripts/train_sft_then_grpo.py`](scripts/train_sft_then_grpo.py) |
| Mini-blog (required) | [**BLOG.md on Hugging Face**](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md) |
| Demo video, under 2 minutes (required) | [**YouTube: Ghostexec demo**](https://youtu.be/g4IFZMEzfO8) |

## Why This Environment Is Competitive

- **Novel task composition**: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
- **Non-trivial behavior**: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
- **Dynamic world model**: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
- **Trainable reward signal**: dense step reward for learning plus bounded graders for evaluation.
- **Hackathon fit**: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.

### 1) Our Innovation

- The observation is a realistic text briefing, not a toy tabular state dump.
- Actions are schema-bound (`GhostexecAction`) and validated against live world ids.
- The world evolves after each step (conflict graph, stress, mood, time shifts).
- Drift events in scenario data test robustness to changing conditions.

**Task ladder**

| Task ID | Difficulty | Scenario |
|---------|------------|----------|
| `phase2_core` | easy | `scenarios/phase2_core.json` |
| `monday_morning` | medium | `scenarios/monday_morning.json` |
| `dinner_disaster` | hard | `scenarios/dinner_disaster.json` |

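As a small illustration, the task ladder above can be held in a lookup table. The helper names `scenario_path` and `tasks_by_difficulty` are hypothetical and are not part of the Ghostexec package; only the task ids, difficulties, and scenario paths come from the table.

```python
from pathlib import Path

# Task ladder copied from the table: task id -> (difficulty, scenario file).
TASK_LADDER = {
    "phase2_core": ("easy", "scenarios/phase2_core.json"),
    "monday_morning": ("medium", "scenarios/monday_morning.json"),
    "dinner_disaster": ("hard", "scenarios/dinner_disaster.json"),
}


def scenario_path(task_id: str, repo_root: str = ".") -> Path:
    """Return the scenario file for a task id (hypothetical helper)."""
    if task_id not in TASK_LADDER:
        raise KeyError(f"unknown task id: {task_id!r}")
    _difficulty, relative = TASK_LADDER[task_id]
    return Path(repo_root) / relative


def tasks_by_difficulty() -> list[str]:
    """List task ids from easiest to hardest."""
    order = {"easy": 0, "medium": 1, "hard": 2}
    return sorted(TASK_LADDER, key=lambda task: order[TASK_LADDER[task][0]])


if __name__ == "__main__":
    for task in tasks_by_difficulty():
        print(task, "->", scenario_path(task))
```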
### 2) Overview

Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has both social and operational consequences.

The demo is easy to follow:
1. show the same briefing the model sees,
2. compare a weak action choice with a better one,
3. show reward movement and policy behavior improvements.

### 3) Improvement in Rewards

The repo includes persisted training artifacts and plot outputs:

- `output/reward_curve.png`
- `output/loss_curve.png`
- `output/baseline_comparison.png`

**Training evidence plots**

![Reward curve](output/reward_curve.png)
*Reward trend across training progression.*

![Loss curve](output/loss_curve.png)
*SFT/GRPO training loss over optimization steps.*

![Baseline comparison](output/baseline_comparison.png)
*Random vs frozen vs trained policy mean episode reward.*

**Current before/after metrics (from saved artifacts)**

| Metric | Baseline | Trained |
|--------|----------|---------|
| Mean step reward | `0.145` | `0.257` |
| Invalid action rate | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
| Grader score | `Not logged in saved artifacts` | `Not logged in saved artifacts` |

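For reference, the gain implied by the mean step reward row can be worked out directly. This short snippet uses only the two values reported in the table; it says nothing about the metrics that were not logged.

```python
# Mean step reward values taken from the table above.
baseline = 0.145
trained = 0.257

absolute_gain = trained - baseline
relative_gain = absolute_gain / baseline

print(f"absolute gain: {absolute_gain:.3f}")   # absolute gain: 0.112
print(f"relative gain: {relative_gain:.1%}")   # relative gain: 77.2%
```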
### 4) Reward and Training Pipeline 

Ghostexec uses a coherent weighted reward core plus bounded shaping:

\[
\text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]



It then applies structured adjustments (invalid-action penalties, do-nothing pressure, completion/catastrophic terms), with transparent breakdown fields.

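A minimal sketch of the weighted core, assuming each component is already a float score. The function name `weighted_base` mirrors the formula, but the code is illustrative only: the component ranges are not specified here, the structured adjustments are not modeled, and this is not the repository's actual reward implementation.

```python
# Weights taken from the formula above; they sum to 1.0.
WEIGHTS = {"conflict": 0.35, "relationship": 0.35, "task": 0.30}


def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Combine the three component scores with the documented weights.

    Assumption: the inputs are per-step component scores; penalties and
    bonus terms applied afterwards by the environment are left out.
    """
    return (
        WEIGHTS["conflict"] * conflict
        + WEIGHTS["relationship"] * relationship
        + WEIGHTS["task"] * task
    )


if __name__ == "__main__":
    # 0.35 * 0.5 + 0.35 * 0.2 + 0.30 * 1.0 = 0.175 + 0.07 + 0.30
    print(round(weighted_base(0.5, 0.2, 1.0), 3))  # 0.545
```

Because the weights sum to 1.0, three equal component scores produce a base equal to that common score.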


Training is end-to-end and environment-connected (not static-only): SFT warm start, then GRPO with environment reward plus local shaping functions.

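One way to picture "environment reward plus local shaping" is a scaled sum. This is an assumption, not the training script's confirmed behavior: the linear form and the function name `combined_reward` are hypothetical, and the default scales simply echo the `--env-reward-scale 1.0` and `--local-reward-scale 0.35` values used in the training command in this README.

```python
def combined_reward(
    env_reward: float,
    local_reward: float,
    env_scale: float = 1.0,     # mirrors --env-reward-scale (assumed usage)
    local_scale: float = 0.35,  # mirrors --local-reward-scale (assumed usage)
) -> float:
    """Hypothetical linear mix of environment reward and local shaping."""
    return env_scale * env_reward + local_scale * local_reward


if __name__ == "__main__":
    # 1.0 * 0.2 + 0.35 * 1.0 = 0.55
    print(round(combined_reward(0.2, 1.0), 2))  # 0.55
```

Under this assumed form, a smaller local scale keeps the environment signal dominant while still giving the policy a shaping signal early in training.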


## Quick Start



```bash
uv sync
uv run server --port 8000
```



Python client example:



```python
from ghostexec import GhostexecAction, GhostexecEnv

# Connect to the locally running server started above.
with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    # Reset the episode and preview the plain-text briefing.
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")

    # Take one structured action and inspect the step reward.
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)
```


## Reproducible Training Commands

```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```

Generate post-train plots:

```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir output
```

## OpenEnv and Space Deployment

```bash
openenv serve
openenv build
openenv validate --verbose
openenv push
```

To push to a specific repository, pass `--repo-id`:

```bash
openenv push --repo-id your-username/ghostexec
```

## Environment API and Contract

- Core endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/docs`, `/ws`
- Observation contains:
  - `echoed_message` (plain-text briefing),
  - optional metadata (step validity, reward breakdown, ids).
- Action schema: see `GhostexecAction` in [`models.py`](models.py).

Supported `action_type` values:

- `reply_email`
- `archive_email`
- `reschedule_meeting`
- `cancel_meeting`
- `complete_task`
- `delegate_task`
- `send_message`
- `do_nothing`

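Before sending an action, a client could run a cheap pre-flight check on the `action_type` field. The sketch below is hypothetical and works on a plain dict rather than `GhostexecAction`; it only checks membership in the list above and does not replicate the server's validation against live world ids.

```python
# Action types copied from the list above.
SUPPORTED_ACTION_TYPES = frozenset(
    {
        "reply_email",
        "archive_email",
        "reschedule_meeting",
        "cancel_meeting",
        "complete_task",
        "delegate_task",
        "send_message",
        "do_nothing",
    }
)


def check_action_type(payload: dict) -> list[str]:
    """Return a list of problems with the payload's action_type (empty if none)."""
    action_type = payload.get("action_type")
    if action_type is None:
        return ["missing 'action_type'"]
    if action_type not in SUPPORTED_ACTION_TYPES:
        return [f"unsupported action_type: {action_type!r}"]
    return []


if __name__ == "__main__":
    print(check_action_type({"action_type": "reply_email", "email_id": "e01"}))  # []
    print(check_action_type({"action_type": "snooze"}))
```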
## Submission Readiness Checklist

- [x] Environment compatible with the latest OpenEnv release, with a valid `openenv.yaml`
- [x] Public HF Space deployed and reachable
- [x] Minimal trainable script using Unsloth + TRL
- [x] Colab-ready notebook for reruns
- [x] Training evidence plots embedded in README
- [x] Add HF blog link: [spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md)
- [x] Add YouTube demo link (under 2 minutes): [youtu.be/g4IFZMEzfO8](https://youtu.be/g4IFZMEzfO8)

## Repository Structure

```text
ghostexec/
├── openenv.yaml
├── pyproject.toml
├── models.py
├── client.py
├── graders.py
├── scenarios/
├── scripts/
├── notebooks/
├── tests/
├── output/
└── server/
    ├── app.py
    ├── ghostexec_environment.py
    └── reward.py
```

## Additional References

- [OpenEnv (Meta PyTorch)](https://github.com/meta-pytorch/OpenEnv)
- [OpenEnv Packaging and Deploying Docs](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)
- [OpenEnv Hub](https://huggingface.co/openenv)
- [Environment Innovation Deep-Dive](environment-innovation/README.md)

## License

BSD-style license, as included in this repository, together with the upstream OpenEnv lineage notices.