# Deployment

## Local OpenEnv Validation

```bash
bash scripts/bootstrap_openenv.sh
bash scripts/bootstrap_openenv.sh --runtime-check
```

The first command validates local OpenEnv packaging. The runtime check starts the FastAPI environment service and validates `GET /openapi.json`, `GET /health`, `GET /metadata`, `GET /schema`, `POST /mcp`, and the `/reset`/`/step`/`/state` HTTP contract.
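Once the service is up, the same contract can be spot-checked by hand. A minimal sketch of validating a `/health` response body, assuming it returns JSON with a `status` field (the field name is an assumption, not confirmed from the repo):

```shell
# Hypothetical /health payload; replace with `curl -fsS "$BASE/health"` output
# from the running service.
health='{"status":"ok"}'
echo "$health" | python3 -c 'import json,sys; body=json.load(sys.stdin); print("pass" if body.get("status")=="ok" else "fail")'
```

The bootstrap script performs the full endpoint sweep; this is only a quick sanity check on one response.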

## Hugging Face CLI

Use the repository virtual environment CLI:

```bash
./.venv/bin/hf version
./.venv/bin/hf auth login
./.venv/bin/hf auth whoami
```

The global `hf` command on this workstation currently fails because its installed `huggingface_hub` and Typer versions are incompatible. Do not use it for final deployment.

## Hugging Face Space Deployment

```bash
export HF_SPACE_REPO_ID="TheJackBright/polyguard-openenv"
uv run python scripts/deploy_space_api.py --repo-id "$HF_SPACE_REPO_ID"
uv run python -c "from huggingface_hub import HfApi; print(HfApi().space_info('$HF_SPACE_REPO_ID').id)"
openenv validate --url "https://thejackbright-polyguard-openenv.hf.space"
```

`scripts/deploy_space_api.py` is the preferred deployment path for this repo because it uploads the Space bundle through `huggingface_hub.HfApi` with valid Docker Space README frontmatter. `scripts/deploy_space.sh` remains available, but the OpenEnv CLI path it wraps can currently fail because the CLI generates invalid `colorFrom`/`colorTo` metadata.
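For reference, a Docker Space README frontmatter block looks like the following sketch; the title, emoji, port, and colors here are illustrative, not taken from this repo, and `colorFrom`/`colorTo` must come from Hugging Face's allowed palette (e.g. `blue`, `green`):

```yaml
---
title: polyguard-openenv
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
---
```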

Useful `scripts/deploy_space.sh` flags:

- `--dry-run`: print commands only.
- `--skip-build`: skip `openenv build`.
- `--skip-validate`: skip local validation.
- `--private`: deploy as a private Space.
- `--create-pr`: push deployment changes as a pull request when supported by the OpenEnv CLI.

Default deploy configuration is in [`configs/deployment.yaml`](configs/deployment.yaml).

## Required Submission Evidence

After deployment, replace `docs/results/hf_space_verification.json` with a successful payload that includes:

- `passed: true`
- HF Space repo id
- HF Space URL
- `huggingface_hub.HfApi().space_info(...)` output or summary
- `openenv validate --url ...` result

Current tracked evidence reports `passed: true`, and the public runtime returned healthy metadata during the April 26, 2026 audit. Strict acceptance mode will fail again if this evidence is removed or replaced with a non-passing payload.
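A passing payload might be shaped like the sketch below; the key names are illustrative (only the bullet list above is authoritative), and the values mirror the Space URLs used elsewhere in this document:

```json
{
  "passed": true,
  "space_repo_id": "TheJackBright/polyguard-openenv",
  "space_url": "https://thejackbright-polyguard-openenv.hf.space",
  "space_info_summary": {"id": "TheJackBright/polyguard-openenv"},
  "openenv_validate": "passed"
}
```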

## Hugging Face Training Space

Use this path when local Ollama/GPU training is unavailable. It creates a private Docker Space under the authenticated account, starts the Gradio training runner, and uploads outputs/checkpoints to a private artifact repo.

```bash
export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_training_space.py \
  --repo-id TheJackBright/polyguard-openenv-training-full \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --hardware a10g-large \
  --model-sweep Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct \
  --sft-epochs 2 \
  --grpo-epochs 1 \
  --sft-max-steps 0 \
  --grpo-max-steps 0 \
  --grpo-max-prompts 0
```

Keep `HF_TOKEN` as a shell environment variable or Hugging Face Space secret only. Do not commit it to source files, notebooks, logs, README text, or report JSON.

The Space executes the notebook-equivalent training loop from `notebooks/09_training_loop.ipynb`, including massive-profile dataset build, SFT baseline training, GRPO environment-reward training, adapter merge, post-save inference, ablations, benchmark comparisons, Qwen model sweep charts, and anti-hacking/overfit checks. `--max-steps 0` means full-epoch training, not a zero-step run.
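The `--max-steps 0` convention above can be sketched as a small rule: zero lifts the cap rather than running zero steps. `resolve_max_steps` is an illustrative helper, not a repo function:

```shell
resolve_max_steps() {
  if [ "$1" -eq 0 ]; then
    echo "unlimited"   # 0 means full-epoch training, no step cap
  else
    echo "$1"          # any positive value is an explicit step cap
  fi
}
resolve_max_steps 0
resolve_max_steps 500
```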

After the Space uploads artifacts, pull them locally and stop paid GPU usage:

```bash
.venv/bin/python scripts/pull_training_artifacts.py \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
.venv/bin/python scripts/pause_training_space.py \
  --repo-id TheJackBright/polyguard-openenv-training-full \
  --mode cpu-basic
```

If only the 0.5B Qwen run is needed first, use the run-specific puller after the artifact repo has uploaded files:

```bash
.venv/bin/python scripts/pull_sweep_artifacts.py \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --run-id qwen-qwen2-5-0-5b-instruct
.venv/bin/python scripts/activate_sweep_model.py \
  --source sweep \
  --run-id qwen-qwen2-5-0-5b-instruct \
  --preferred-artifact grpo_adapter
```

For Qwen 1.5B, use the same path with the 1.5B run id:

```bash
.venv/bin/python scripts/pull_sweep_artifacts.py \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --run-id qwen-qwen2-5-1-5b-instruct
.venv/bin/python scripts/activate_sweep_model.py \
  --source sweep \
  --run-id qwen-qwen2-5-1-5b-instruct \
  --preferred-artifact grpo_adapter
```
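The 0.5B and 1.5B flows above differ only in the run id, so a loop can drive both. This sketch only echoes the pull commands as a dry run; drop the `echo` to execute them:

```shell
# Preview the per-run pull commands (echo = dry run).
for run_id in qwen-qwen2-5-0-5b-instruct qwen-qwen2-5-1-5b-instruct; do
  echo .venv/bin/python scripts/pull_sweep_artifacts.py \
    --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
    --run-id "$run_id"
done
```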

## Hugging Face Evidence Space

The evidence Space is separate from the training Space and does not retrain. It pulls completed status/artifact metadata, runs verifier-only rollouts, writes charts/JSON/Markdown, and uploads the evidence bundle back under `submission_evidence/qwen_0_5b_1_5b/` when the artifact repo is writable.

```bash
export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_evidence_space.py \
  --repo-id TheJackBright/polyguard-openenv-evidence \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
  --models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
  --hardware cpu-basic
```

Evidence URLs and folders:

- Evidence Space: `https://huggingface.co/spaces/TheJackBright/polyguard-openenv-evidence`
- Training Space status source: `https://thejackbright-polyguard-openenv-training-full.hf.space`
- Active implementation bundle: `https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke`
- Local tracked bundle: `docs/results/submission_evidence_qwen_0_5b_1_5b/`
- Local zip: `submission_bundle/qwen_0_5b_1_5b_evidence.zip`

Pull the evidence bundle after the evidence Space uploads it:

```bash
.venv/bin/python scripts/pull_submission_evidence.py \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
```

As of the April 26, 2026 live check, the training Space status confirms Qwen 0.5B and 1.5B SFT, GRPO, GRPO post-save inference, and policy ablations completed. The artifact repo still lists only `.gitattributes`, so per-run GRPO histories/checkpoints remain `remote_completed_pending_artifact_upload` in the evidence report until upload completes.
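The `remote_completed_pending_artifact_upload` state can be restated as a simple rule: training reports done, but the artifact repo listing has no per-run files yet. The logic below is inferred from this description, not read from the evidence scripts:

```shell
# $1 = training status, $2 = artifact repo file listing (illustrative helper).
evidence_status() {
  if [ "$1" = "completed" ] && ! echo "$2" | grep -q "sweeps/"; then
    echo "remote_completed_pending_artifact_upload"
  else
    echo "artifacts_available"
  fi
}
evidence_status completed ".gitattributes"
```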

## Active Model Artifact Bundle

The current implementation-ready active model bundle is separate from the full remote sweep artifacts. It contains the local active Qwen 0.5B trained/smoke artifacts that the app can use now:

- `checkpoints/grpo_adapter/`
- `checkpoints/sft_adapter/`
- `checkpoints/merged/`
- `manifests/active_model_manifest.json`
- `reports/`

Local bundle:

```text
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke/
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke.zip
```

HF bundle:

```text
https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke
```

Download and restore:

```bash
export HF_TOKEN="$(cat ~/.cache/huggingface/token)"
./.venv/bin/hf download TheJackBright/polyguard-openenv-training-full-artifacts \
  --repo-type model \
  --include 'usable_model_bundles/local-qwen-0-5b-active-smoke/**' \
  --local-dir ./hf_artifacts

cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/grpo_adapter checkpoints/grpo_adapter
cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/sft_adapter checkpoints/sft_adapter
cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/merged checkpoints/merged
mkdir -p checkpoints/active
cp hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/manifests/active_model_manifest.json checkpoints/active/active_model_manifest.json
curl http://127.0.0.1:8200/policy/model_status
```

Current public/tracked evidence should be described as a 3-model SFT-baseline sweep plus a top-level environment-backed GRPO run. Do not claim a full public per-model GRPO sweep unless the private artifacts have been pulled, mirrored into public evidence, and documented. Unauthenticated API checks against the private training artifact repos return an auth error by design.

Expected pulled artifacts include:

- `outputs/reports/hf_sweep_summary.json`
- `outputs/reports/anti_hacking_overfit_report.json`
- `outputs/reports/sweeps/<model>/sft_trl_run.json`
- `outputs/reports/sweeps/<model>/grpo_trl_run.json`
- `outputs/reports/sweeps/<model>/postsave_inference_sft.json`
- `outputs/reports/sweeps/<model>/postsave_inference_grpo.json`
- `outputs/plots/sft_vs_grpo_reward.png`
- `outputs/plots/sft_loss_curves.png`
- `outputs/plots/grpo_reward_curves.png`
- `outputs/plots/qwen_model_grpo_reward.png`
- `outputs/plots/reward_component_bars.png`
- `outputs/plots/anti_cheat_failure_rates.png`
- `outputs/plots/train_holdout_gap.png`
- `outputs/plots/inference_validity_reward.png`
- `outputs/plots/inference_latency_validity.png`

## Local Services

```bash
bash scripts/run_all_local.sh --quick --skip-train
```

This builds local data/model assets, skips TRL training, starts the environment/API/UI services, and runs smoke checks. Local inference defaults to the HF Transformers path; set `POLYGUARD_ENABLE_OLLAMA=true` only when a local Ollama runtime is intentionally available.
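The backend gate described above reduces to an environment-variable check. `POLYGUARD_ENABLE_OLLAMA` is the variable named in this document; the echoed labels are illustrative:

```shell
# Default to the HF Transformers path unless Ollama is explicitly enabled.
if [ "${POLYGUARD_ENABLE_OLLAMA:-false}" = "true" ]; then
  echo "backend: ollama"
else
  echo "backend: transformers"
fi
```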

For the active-model product path, start the API after activation and verify:

```bash
curl http://127.0.0.1:8200/policy/model_status
curl -X POST http://127.0.0.1:8200/policy/infer
```

`/policy/model_status` reports the active run id, preferred artifact, local artifact availability, loaded source, and any model-load error. The Patient Workbench displays the same active/fallback state in the header.
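A `/policy/model_status` response might look like the sketch below; the key names are illustrative, chosen to mirror the fields listed above rather than copied from the API:

```json
{
  "active_run_id": "qwen-qwen2-5-0-5b-instruct",
  "preferred_artifact": "grpo_adapter",
  "local_artifacts_available": true,
  "loaded_source": "active",
  "model_load_error": null
}
```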

## Live Submission Link Validation

The normal acceptance gate stays offline-friendly and checks link presence/shape. After publishing the final story URL, run:

```bash
uv run python scripts/validate_submission_links.py
```

This command performs live HTTP checks for public README URLs, skips localhost/dev URLs, and fails if the selected Hugging Face blog or YouTube story artifact is still unavailable.
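The localhost-skip rule can be sketched as a pattern match; the pattern list here is an assumption about what the script treats as dev URLs:

```shell
# Classify a URL as skippable (local/dev) or live-checkable.
should_check() {
  case "$1" in
    *localhost*|*127.0.0.1*) echo "skip" ;;
    *) echo "check" ;;
  esac
}
should_check "http://127.0.0.1:8200/policy/model_status"
should_check "https://huggingface.co/blog/example-story"
```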