---

title: Ghostexec Environment Server
emoji: 📢
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
---


# Ghostexec: The AI Chief-of-Staff Environment

Ghostexec is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.

The agent receives a dense plain-text briefing, takes one structured action per step, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.

## Submission Package

| Item | Link |
|------|------|
| Public HF Space (required) | [modelbuilderhq/ghostexec](https://huggingface.co/spaces/modelbuilderhq/ghostexec) |
| OpenEnv manifest | [`openenv.yaml`](openenv.yaml) |
| Training notebook (Colab-ready) | [`notebooks/ghostexec_unsloth_grpo_hf_api.ipynb`](notebooks/ghostexec_unsloth_grpo_hf_api.ipynb) |
| Minimal training script (Unsloth + TRL) | [`scripts/train_sft_then_grpo.py`](scripts/train_sft_then_grpo.py) |
| Mini-blog (required) | [**BLOG.md on Hugging Face**](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md) |
| Demo video, under 2 minutes (required) | [**YouTube: Ghostexec demo**](https://youtu.be/g4IFZMEzfO8) |

## Why This Environment Is Competitive

- **Novel task composition**: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
- **Non-trivial behavior**: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
- **Dynamic world model**: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
- **Trainable reward signal**: dense step reward for learning plus bounded graders for evaluation.
- **Hackathon fit**: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.

### 1) Our Innovation

- The observation is a realistic text briefing, not a toy tabular state dump.
- Actions are schema-bound (`GhostexecAction`) and validated against live world ids.
- The world evolves after each step (conflict graph, stress, mood, time shifts).
- Drift events in scenario data test robustness to changing conditions.

**Task ladder**

| Task ID | Difficulty | Scenario |
|---------|------------|----------|
| `phase2_core` | easy | `scenarios/phase2_core.json` |
| `monday_morning` | medium | `scenarios/monday_morning.json` |
| `dinner_disaster` | hard | `scenarios/dinner_disaster.json` |
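
The ladder above can be mirrored as a small lookup, shown here as a minimal sketch. `TASK_SCENARIOS` and `scenario_path` are hypothetical names used only for illustration; how the server actually selects a scenario is not described in this README.

```python
from pathlib import Path

# Task id -> (difficulty, scenario file), copied from the task ladder table.
TASK_SCENARIOS = {
    "phase2_core": ("easy", "scenarios/phase2_core.json"),
    "monday_morning": ("medium", "scenarios/monday_morning.json"),
    "dinner_disaster": ("hard", "scenarios/dinner_disaster.json"),
}


def scenario_path(task_id: str, repo_root: str = ".") -> Path:
    """Return the scenario file for a task id (hypothetical helper)."""
    if task_id not in TASK_SCENARIOS:
        raise ValueError(f"unknown task id: {task_id!r}")
    _difficulty, relative_path = TASK_SCENARIOS[task_id]
    return Path(repo_root) / relative_path


print(scenario_path("dinner_disaster").as_posix())
```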

### 2) Overview

Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has both social and operational consequences.

The demo is easy to follow:
1. show the same briefing the model sees,
2. compare weak vs better action choice,
3. show reward movement and policy behavior improvements.

### 3) Improvement in Rewards

The repo includes persisted training artifacts and plot outputs:

- `output/reward_curve.png`
- `output/loss_curve.png`
- `output/baseline_comparison.png`

**Training evidence plots**

![Reward curve](output/reward_curve.png)
*Reward trend across training progression.*

![Loss curve](output/loss_curve.png)
*SFT/GRPO training loss over optimization steps.*

![Baseline comparison](output/baseline_comparison.png)
*Random vs frozen vs trained policy mean episode reward.*

**Current before/after metrics (from saved artifacts)**

| Metric | Baseline | Trained |
|--------|----------|---------|
| Mean step reward | `0.145` | `0.257` |
| Invalid action rate | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
| Grader score | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
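
To put the mean step reward row in perspective, the short calculation below derives the absolute and relative gain from the two values in the table. It uses only those two numbers and makes no claim about the unlogged metrics.

```python
# Mean step reward values copied from the table above.
baseline = 0.145
trained = 0.257

absolute_gain = trained - baseline
relative_gain = absolute_gain / baseline

print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain * 100:.1f}%")
```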

### 4) Reward and Training Pipeline 

Ghostexec uses a coherent weighted reward core plus bounded shaping:

\[
\text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]
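
As a worked instance of the weighted core, the sketch below evaluates the formula for one made-up set of component scores. The weights come from the formula; the function name, the example inputs, and the assumption that each component is a plain float are illustrative only and are not taken from `server/reward.py`.

```python
# Weights from the weighted_base formula in this README.
WEIGHTS = {"conflict": 0.35, "relationship": 0.35, "task": 0.30}


def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Combine the three component scores with the documented weights."""
    return (
        WEIGHTS["conflict"] * conflict
        + WEIGHTS["relationship"] * relationship
        + WEIGHTS["task"] * task
    )


# Hypothetical component scores, chosen only to illustrate the arithmetic:
# 0.35 * 0.5 + 0.35 * 0.2 + 0.30 * 1.0
example = weighted_base(conflict=0.5, relationship=0.2, task=1.0)
print(round(example, 3))
```

Because the three weights sum to 1.0, the weighted base stays within the range of its inputs before any adjustment terms are added.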



The environment then applies structured adjustments (invalid-action penalties, do-nothing pressure, and completion/catastrophic terms) and reports them in transparent breakdown fields.

Training is end-to-end and connected to the live environment rather than a static dataset: an SFT warm start, followed by GRPO driven by the environment reward plus local shaping functions.



## Quick Start



```bash
uv sync
uv run server --port 8000
```



Python client example:



```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    # Start a new episode; the observation carries the plain-text briefing.
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")

    # Take one structured action: reply to email e01.
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)
```


## Reproducible Training Commands

```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```
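
One plausible reading of `--env-reward-scale` and `--local-reward-scale` is a linear blend of the environment reward and the local shaping reward. The sketch below illustrates that reading only; it is an assumption, not the logic of `scripts/train_sft_then_grpo.py`, and `combined_reward` is a hypothetical name.

```python
def combined_reward(
    env_reward: float,
    local_reward: float,
    env_scale: float = 1.0,     # value passed as --env-reward-scale above
    local_scale: float = 0.35,  # value passed as --local-reward-scale above
) -> float:
    """Linear blend of environment and local shaping rewards (assumed form)."""
    return env_scale * env_reward + local_scale * local_reward


# Hypothetical per-sample values, chosen only to show the arithmetic:
# 1.0 * 0.257 + 0.35 * 0.5
print(round(combined_reward(env_reward=0.257, local_reward=0.5), 3))
```

Under this reading, the local shaping term contributes at roughly a third of the weight of the environment reward, so the environment signal dominates the GRPO objective.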

Generate post-train plots:

```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir output
```

## OpenEnv and Space Deployment

```bash
openenv serve
openenv build
openenv validate --verbose
openenv push
```

If needed:

```bash
openenv push --repo-id your-username/ghostexec
```

## Environment API and Contract

- Core endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/docs`, `/ws`
- Observation contains:
  - `echoed_message` (plain-text briefing),
  - optional metadata (step validity, reward breakdown, ids).
- Action schema: see `GhostexecAction` in [`models.py`](models.py).
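
A quick way to confirm the server is reachable before training is to probe `/health`. The sketch below uses only the standard library; `endpoint_url` and `is_healthy` are hypothetical helpers, and it assumes that a running server answers `/health` with HTTP 200, which this README does not state.

```python
import urllib.request

# Endpoint paths listed above.
ENDPOINTS = ("/reset", "/step", "/state", "/schema", "/health", "/docs", "/ws")


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed contract)."""
    try:
        with urllib.request.urlopen(
            endpoint_url(base_url, "/health"), timeout=timeout
        ) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print(is_healthy("http://127.0.0.1:8000"))
```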

Supported `action_type` values:

- `reply_email`
- `archive_email`
- `reschedule_meeting`
- `cancel_meeting`
- `complete_task`
- `delegate_task`
- `send_message`
- `do_nothing`
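
Since valid JSON alone is not enough, it can help to reject unsupported action types before they reach the server. The following is a minimal client-side sketch: `build_action` is a hypothetical helper, not part of the `ghostexec` package, and the authoritative schema remains `GhostexecAction` in `models.py`. The `email_id` and `message_body` fields are the ones used in the Quick Start example.

```python
import json

# Action types listed above.
SUPPORTED_ACTION_TYPES = frozenset({
    "reply_email",
    "archive_email",
    "reschedule_meeting",
    "cancel_meeting",
    "complete_task",
    "delegate_task",
    "send_message",
    "do_nothing",
})


def build_action(action_type: str, **fields) -> dict:
    """Build a JSON-serializable action dict, rejecting unknown action types."""
    if action_type not in SUPPORTED_ACTION_TYPES:
        raise ValueError(f"unsupported action_type: {action_type!r}")
    return {"action_type": action_type, **fields}


payload = build_action(
    "reply_email",
    email_id="e01",
    message_body="Acknowledged. Sending concise revised update before noon.",
)
print(json.dumps(payload, indent=2))
```

This check only covers the action type; whether the referenced ids exist in the live world is still validated by the environment.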

## Submission Readiness Checklist

- [x] Environment compatible with the latest OpenEnv release, with a valid `openenv.yaml`
- [x] Public HF Space deployed and reachable
- [x] Minimal trainable script using Unsloth + TRL
- [x] Colab-ready notebook for reruns
- [x] Training evidence plots embedded in README
- [x] HF blog link: [spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md)
- [x] YouTube demo link (under 2 minutes): [youtu.be/g4IFZMEzfO8](https://youtu.be/g4IFZMEzfO8)

## Repository Structure

```text
ghostexec/
├── openenv.yaml
├── pyproject.toml
├── models.py
├── client.py
├── graders.py
├── scenarios/
├── scripts/
├── notebooks/
├── tests/
├── output/
└── server/
    ├── app.py
    ├── ghostexec_environment.py
    └── reward.py
```

## Additional References

- [OpenEnv (Meta PyTorch)](https://github.com/meta-pytorch/OpenEnv)
- [OpenEnv Packaging and Deploying Docs](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)
- [OpenEnv Hub](https://huggingface.co/openenv)
- [Environment Innovation Deep-Dive](environment-innovation/README.md)

## License

BSD-style license, as included in this repository, together with the upstream OpenEnv lineage notices.