# Deployment
## Local OpenEnv Validation
```bash
bash scripts/bootstrap_openenv.sh
bash scripts/bootstrap_openenv.sh --runtime-check
```
The first command validates local OpenEnv packaging. The runtime check starts the FastAPI environment service and validates `GET /openapi.json`, `GET /health`, `GET /metadata`, `GET /schema`, `POST /mcp`, and the `/reset`/`/step`/`/state` HTTP contract.
## Hugging Face CLI
Use the repository virtual environment CLI:
```bash
./.venv/bin/hf version
./.venv/bin/hf auth login
./.venv/bin/hf auth whoami
```
The global `hf` command on this workstation currently fails because its installed `huggingface_hub` and Typer versions are incompatible. Do not use it for final deployment.
## Hugging Face Space Deployment
```bash
export HF_SPACE_REPO_ID="TheJackBright/polyguard-openenv-workbench"
uv run python scripts/deploy_space_api.py --repo-id "$HF_SPACE_REPO_ID"
uv run python -c "from huggingface_hub import HfApi; print(HfApi().space_info('$HF_SPACE_REPO_ID').id)"
openenv validate --url "https://thejackbright-polyguard-openenv-workbench.hf.space"
```
`scripts/deploy_space_api.py` is the preferred deployment path for this repo because it uploads a valid Docker Space README frontmatter bundle through `huggingface_hub.HfApi`. `scripts/deploy_space.sh` remains available, but the current OpenEnv CLI path may fail because it generates invalid `colorFrom`/`colorTo` metadata.
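The frontmatter constraint above can be sketched as a small generator. The field names follow the Hugging Face Spaces config format; the title and color values here are illustrative, and the guard against invalid colors mirrors the failure mode described above rather than the script's actual code.

```python
# Allowed Space color values per the Hugging Face Spaces config reference.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}

def docker_space_frontmatter(title: str, color_from: str, color_to: str) -> str:
    # Reject the kind of invalid colorFrom/colorTo metadata the OpenEnv
    # CLI path is reported to generate.
    for color in (color_from, color_to):
        if color not in ALLOWED_COLORS:
            raise ValueError(f"invalid Space color: {color!r}")
    lines = [
        "---",
        f"title: {title}",
        "sdk: docker",
        f"colorFrom: {color_from}",
        f"colorTo: {color_to}",
        "---",
    ]
    return "\n".join(lines)

print(docker_space_frontmatter("polyguard-openenv-workbench", "blue", "indigo"))
```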
Useful `scripts/deploy_space.sh` flags:
- `--dry-run`: print commands only.
- `--skip-build`: skip `openenv build`.
- `--skip-validate`: skip local validation.
- `--private`: deploy as a private Space.
- `--create-pr`: push deployment changes as a pull request when supported by the OpenEnv CLI.
Default deploy configuration is in [`configs/deployment.yaml`](configs/deployment.yaml).
## Required Submission Evidence
After deployment, replace `docs/results/hf_space_verification.json` with a successful payload that includes:
- `passed: true`
- HF Space repo id
- HF Space URL
- `huggingface_hub.HfApi().space_info(...)` output or summary
- `openenv validate --url ...` result
Current tracked evidence reports `passed: true`, and the public runtime returned healthy metadata during the April 26, 2026 audit. Strict acceptance mode will fail again if this evidence is removed or replaced with a non-passing payload.
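A minimal shape check for the evidence payload can be sketched as below. Only `passed: true` is confirmed by the acceptance gate described above; the other key names (`space_repo_id`, `space_url`, `space_info`, `openenv_validate`) are assumptions standing in for the bullets in the list.

```python
import json

# Hypothetical key names for the non-`passed` fields; only `passed` itself
# is confirmed by the acceptance criteria above.
REQUIRED_KEYS = {"passed", "space_repo_id", "space_url", "space_info", "openenv_validate"}

def evidence_is_passing(raw: str) -> bool:
    # A payload passes only if `passed` is literally true and every
    # required key is present.
    payload = json.loads(raw)
    return payload.get("passed") is True and REQUIRED_KEYS <= payload.keys()

sample = json.dumps({
    "passed": True,
    "space_repo_id": "TheJackBright/polyguard-openenv-workbench",
    "space_url": "https://thejackbright-polyguard-openenv-workbench.hf.space",
    "space_info": {"id": "TheJackBright/polyguard-openenv-workbench"},
    "openenv_validate": {"ok": True},
})
print(evidence_is_passing(sample))  # → True
```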
## Hugging Face Training Space
Use this path when local Ollama/GPU training is unavailable. It creates a private Docker Space under the authenticated account, starts the Gradio training runner, and uploads outputs/checkpoints to a private artifact repo.
```bash
export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_training_space.py \
--repo-id TheJackBright/polyguard-openenv-training-full \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--hardware a10g-large \
--model-sweep Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct \
--sft-epochs 2 \
--grpo-epochs 1 \
--sft-max-steps 0 \
--grpo-max-steps 0 \
--grpo-max-prompts 0
```
Keep `HF_TOKEN` as a shell environment variable or Hugging Face Space secret only. Do not commit it to source files, notebooks, logs, README text, or report JSON.
The Space executes the notebook-equivalent training loop from `notebooks/09_training_loop.ipynb`, including massive-profile dataset build, SFT baseline training, GRPO environment-reward training, adapter merge, post-save inference, ablations, benchmark comparisons, Qwen model sweep charts, and anti-hacking/overfit checks. `--max-steps 0` means full-epoch training, not a zero-step run.
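The `--max-steps 0` sentinel can be sketched as a cap resolver. This is an assumed reading of the training runner's convention, not its actual code.

```python
from typing import Optional

def resolve_step_cap(max_steps: int) -> Optional[int]:
    # 0 is the sentinel for "no cap": train full epochs rather than
    # running zero steps (assumed runner convention).
    if max_steps < 0:
        raise ValueError("max_steps must be >= 0")
    return None if max_steps == 0 else max_steps

print(resolve_step_cap(0))    # → None (full-epoch training)
print(resolve_step_cap(500))  # → 500
```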
After the Space uploads artifacts, pull them locally and stop paid GPU usage:
```bash
.venv/bin/python scripts/pull_training_artifacts.py \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
.venv/bin/python scripts/pause_training_space.py \
--repo-id TheJackBright/polyguard-openenv-training-full \
--mode cpu-basic
```
If only the 0.5B Qwen run is needed first, use the run-specific puller after the artifact repo has uploaded files:
```bash
.venv/bin/python scripts/pull_sweep_artifacts.py \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--run-id qwen-qwen2-5-0-5b-instruct
.venv/bin/python scripts/activate_sweep_model.py \
--source sweep \
--run-id qwen-qwen2-5-0-5b-instruct \
--preferred-artifact grpo_adapter
```
For Qwen 1.5B, use the same path with the 1.5B run id:
```bash
.venv/bin/python scripts/pull_sweep_artifacts.py \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--run-id qwen-qwen2-5-1-5b-instruct
.venv/bin/python scripts/activate_sweep_model.py \
--source sweep \
--run-id qwen-qwen2-5-1-5b-instruct \
--preferred-artifact grpo_adapter
```
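The `--run-id` values above appear to be slugified model ids (lowercased, with runs of non-alphanumeric characters collapsed to hyphens). A small helper illustrating that assumed convention; the repo's actual slug logic may differ.

```python
import re

def run_id_for(model_id: str) -> str:
    # Assumed slug rule inferred from the examples above: lowercase, then
    # collapse any non-alphanumeric run into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")

print(run_id_for("Qwen/Qwen2.5-0.5B-Instruct"))  # → qwen-qwen2-5-0-5b-instruct
print(run_id_for("Qwen/Qwen2.5-1.5B-Instruct"))  # → qwen-qwen2-5-1-5b-instruct
```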
## Hugging Face Evidence Space
The evidence Space is separate from the training Space and does not retrain. It pulls completed status/artifact metadata, runs verifier-only rollouts, writes charts/JSON/Markdown, and uploads the evidence bundle back under `submission_evidence/qwen_0_5b_1_5b/` when the artifact repo is writable.
```bash
export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_evidence_space.py \
--repo-id TheJackBright/polyguard-openenv-evidence \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
--models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
--hardware cpu-basic
```
Evidence URLs and folders:
- Evidence Space: `https://huggingface.co/spaces/TheJackBright/polyguard-openenv-evidence`
- Training Space status source: `https://thejackbright-polyguard-openenv-training-full.hf.space`
- Active implementation bundle: `https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke`
- Local tracked bundle: `docs/results/submission_evidence_qwen_0_5b_1_5b/`
- Local zip: `submission_bundle/qwen_0_5b_1_5b_evidence.zip`
Pull the evidence bundle after the evidence Space uploads it:
```bash
.venv/bin/python scripts/pull_submission_evidence.py \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
```
As of the April 26, 2026 live check, the training Space status confirms Qwen 0.5B and 1.5B SFT, GRPO, GRPO post-save inference, and policy ablations completed. The artifact repo still lists only `.gitattributes`, so per-run GRPO histories/checkpoints remain `remote_completed_pending_artifact_upload` in the evidence report until upload completes.
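The gating described above can be sketched as a pure status check. The function name and the statuses other than `remote_completed_pending_artifact_upload` are assumptions for illustration.

```python
def artifact_status(training_completed: bool, repo_files: list[str]) -> str:
    # An artifact repo holding only .gitattributes means nothing real has
    # been uploaded yet, even if the training Space reports completion.
    has_artifacts = any(f != ".gitattributes" for f in repo_files)
    if training_completed and not has_artifacts:
        return "remote_completed_pending_artifact_upload"
    if training_completed:
        return "remote_completed"
    return "in_progress"

print(artifact_status(True, [".gitattributes"]))
# → remote_completed_pending_artifact_upload
```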
## Active Model Artifact Bundle
The current implementation-ready active model bundle is separate from the full remote sweep artifacts. It contains the local active Qwen 0.5B trained/smoke artifacts that the app can use now:
- `checkpoints/grpo_adapter/`
- `checkpoints/sft_adapter/`
- `checkpoints/merged/`
- `manifests/active_model_manifest.json`
- `reports/`
Local bundle:
```text
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke/
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke.zip
```
HF bundle:
```text
https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke
```
Download and restore:
```bash
export HF_TOKEN="$(cat ~/.cache/huggingface/token)"
./.venv/bin/hf download TheJackBright/polyguard-openenv-training-full-artifacts \
--repo-type model \
--include 'usable_model_bundles/local-qwen-0-5b-active-smoke/**' \
--local-dir ./hf_artifacts
cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/grpo_adapter checkpoints/grpo_adapter
cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/sft_adapter checkpoints/sft_adapter
cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/merged checkpoints/merged
mkdir -p checkpoints/active
cp hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/manifests/active_model_manifest.json checkpoints/active/active_model_manifest.json
curl http://127.0.0.1:8200/policy/model_status
```
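After the restore, a quick presence check for the copied artifacts can be sketched as below. The expected paths mirror the `cp` targets above; the helper itself is illustrative, not part of the repo.

```python
from pathlib import Path

# Paths the restore commands above are expected to produce.
EXPECTED = [
    "checkpoints/grpo_adapter",
    "checkpoints/sft_adapter",
    "checkpoints/merged",
    "checkpoints/active/active_model_manifest.json",
]

def missing_artifacts(root: Path) -> list[str]:
    # Return the expected paths that are absent under `root`.
    return [p for p in EXPECTED if not (root / p).exists()]
```

Run `missing_artifacts(Path("."))` from the repo root; an empty list means the restore completed.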
Current public/tracked evidence should be described as a 3-model SFT-baseline sweep plus a top-level environment-backed GRPO run. Do not claim a full public per-model GRPO sweep unless the private artifacts have been pulled, mirrored into public evidence, and documented. Unauthenticated API checks against the private training artifact repos return an auth error by design.
Expected pulled artifacts include:
- `outputs/reports/hf_sweep_summary.json`
- `outputs/reports/anti_hacking_overfit_report.json`
- `outputs/reports/sweeps/<model>/sft_trl_run.json`
- `outputs/reports/sweeps/<model>/grpo_trl_run.json`
- `outputs/reports/sweeps/<model>/postsave_inference_sft.json`
- `outputs/reports/sweeps/<model>/postsave_inference_grpo.json`
- `outputs/plots/sft_vs_grpo_reward.png`
- `outputs/plots/sft_loss_curves.png`
- `outputs/plots/grpo_reward_curves.png`
- `outputs/plots/qwen_model_grpo_reward.png`
- `outputs/plots/reward_component_bars.png`
- `outputs/plots/anti_cheat_failure_rates.png`
- `outputs/plots/train_holdout_gap.png`
- `outputs/plots/inference_validity_reward.png`
- `outputs/plots/inference_latency_validity.png`
## Local Services
```bash
bash scripts/run_all_local.sh --quick --skip-train
```
This builds local data/model assets, skips TRL training, starts the environment/API/UI services, and runs smoke checks. Local inference defaults to the HF Transformers path; set `POLYGUARD_ENABLE_OLLAMA=true` only when a local Ollama runtime is intentionally available.
For the active-model product path, start the API after activation and verify:
```bash
curl http://127.0.0.1:8200/policy/model_status
curl -X POST http://127.0.0.1:8200/policy/infer
```
`/policy/model_status` reports the active run id, preferred artifact, local artifact availability, loaded source, and any model-load error. The Patient Workbench displays the same active/fallback state in the header.
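A hedged sketch of consuming the `/policy/model_status` payload follows. The key names are assumptions matching the fields described above; the real endpoint may use different keys.

```python
def summarize_model_status(status: dict) -> str:
    # Key names are assumed from the documented report contents
    # (active run id, preferred artifact, loaded source, load error).
    if status.get("model_load_error"):
        return f"fallback: {status['model_load_error']}"
    run_id = status.get("active_run_id", "unknown")
    artifact = status.get("preferred_artifact", "unknown")
    source = status.get("loaded_source", "unknown")
    return f"active run {run_id} ({artifact}) loaded from {source}"

print(summarize_model_status({
    "active_run_id": "qwen-qwen2-5-0-5b-instruct",
    "preferred_artifact": "grpo_adapter",
    "local_artifacts_available": True,
    "loaded_source": "local",
    "model_load_error": None,
}))
```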
## Live Submission Link Validation
The normal acceptance gate stays offline-friendly and checks link presence/shape. After publishing the final story URL, run:
```bash
uv run python scripts/validate_submission_links.py
```
This command performs live HTTP checks for public README URLs, skips localhost/dev URLs, and fails if the selected Hugging Face blog or YouTube story artifact is still unavailable.
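The localhost/dev skip rule can be sketched as a small URL filter. The host list here is an assumption; the actual script may recognize more dev patterns.

```python
from urllib.parse import urlparse

# Hosts treated as local/dev and therefore skipped by live checks
# (assumed list for illustration).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}

def needs_live_check(url: str) -> bool:
    # Only non-local hosts get a live HTTP check.
    host = urlparse(url).hostname or ""
    return host not in LOCAL_HOSTS

urls = [
    "https://huggingface.co/spaces/TheJackBright/polyguard-openenv-evidence",
    "http://127.0.0.1:8200/policy/model_status",
]
print([u for u in urls if needs_live_check(u)])
```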