| # Deployment |
|
|
| ## Local OpenEnv Validation |
|
|
| ```bash |
| bash scripts/bootstrap_openenv.sh |
| bash scripts/bootstrap_openenv.sh --runtime-check |
| ``` |
|
|
| The first command validates local OpenEnv packaging. The runtime check starts the FastAPI environment service and validates `GET /openapi.json`, `GET /health`, `GET /metadata`, `GET /schema`, `POST /mcp`, and the `/reset`/`/step`/`/state` HTTP contract. |
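The `/step` portion of that contract check can be sketched as a shape test. The payload and field names below are hypothetical placeholders, not the actual OpenEnv service output:

```python
# Hypothetical /step response used only to illustrate the contract check;
# the real payload shape comes from the OpenEnv environment service.
step_payload = {
    "observation": {"text": "..."},
    "reward": 0.0,
    "done": False,
}

def check_step_contract(payload: dict) -> bool:
    """Return True when the payload carries the expected step fields."""
    required = {"observation", "reward", "done"}
    return required.issubset(payload) and isinstance(payload["done"], bool)

print(check_step_contract(step_payload))  # → True
```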
|
|
| ## Hugging Face CLI |
|
|
| Use the repository virtual environment CLI: |
|
|
| ```bash |
| ./.venv/bin/hf version |
| ./.venv/bin/hf auth login |
| ./.venv/bin/hf auth whoami |
| ``` |
|
|
| The global `hf` command on this workstation currently fails because its installed `huggingface_hub` and Typer versions are incompatible. Do not use it for final deployment. |
|
|
| ## Hugging Face Space Deployment |
|
|
| ```bash |
| export HF_SPACE_REPO_ID="TheJackBright/polyguard-openenv" |
| uv run python scripts/deploy_space_api.py --repo-id "$HF_SPACE_REPO_ID" |
| uv run python -c "from huggingface_hub import HfApi; print(HfApi().space_info('$HF_SPACE_REPO_ID').id)" |
| openenv validate --url "https://thejackbright-polyguard-openenv.hf.space" |
| ``` |
|
|
`scripts/deploy_space_api.py` is the preferred deployment path for this repo because it uploads the Space bundle, with valid Docker Space README frontmatter, through `huggingface_hub.HfApi`. `scripts/deploy_space.sh` remains available, but the current OpenEnv CLI path may fail because it generates invalid `colorFrom`/`colorTo` metadata.
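For reference, Docker Space README frontmatter of roughly this shape is what the Hub accepts; the title, emoji, and port below are placeholders, and `colorFrom`/`colorTo` must be color names from the palette Hugging Face allows (e.g. `blue`, `green`):

```yaml
---
title: Polyguard OpenEnv   # placeholder title
emoji: "🛡️"                # placeholder emoji
colorFrom: blue            # must be a valid HF palette color name
colorTo: green
sdk: docker
app_port: 8000             # placeholder; match the container's exposed port
---
```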
|
|
| Useful `scripts/deploy_space.sh` flags: |
|
|
| - `--dry-run`: print commands only. |
| - `--skip-build`: skip `openenv build`. |
| - `--skip-validate`: skip local validation. |
| - `--private`: deploy as a private Space. |
| - `--create-pr`: push deployment changes as a pull request when supported by the OpenEnv CLI. |
|
|
Default deploy configuration is in [`configs/deployment.yaml`](configs/deployment.yaml).
|
|
| ## Required Submission Evidence |
|
|
| After deployment, replace `docs/results/hf_space_verification.json` with a successful payload that includes: |
|
|
| - `passed: true` |
| - HF Space repo id |
| - HF Space URL |
| - `huggingface_hub.HfApi().space_info(...)` output or summary |
| - `openenv validate --url ...` result |
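A hypothetical example of a passing payload with those elements (field names here are illustrative, not the exact schema):

```json
{
  "passed": true,
  "space_repo_id": "TheJackBright/polyguard-openenv",
  "space_url": "https://thejackbright-polyguard-openenv.hf.space",
  "space_info_summary": {"id": "TheJackBright/polyguard-openenv", "sdk": "docker"},
  "openenv_validate": {"url": "https://thejackbright-polyguard-openenv.hf.space", "ok": true}
}
```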
|
|
| Current tracked evidence reports `passed: true`, and the public runtime returned healthy metadata during the April 26, 2026 audit. Strict acceptance mode will fail again if this evidence is removed or replaced with a non-passing payload. |
|
|
| ## Hugging Face Training Space |
|
|
| Use this path when local Ollama/GPU training is unavailable. It creates a private Docker Space under the authenticated account, starts the Gradio training runner, and uploads outputs/checkpoints to a private artifact repo. |
|
|
| ```bash |
| export HF_TOKEN="<write-token>" |
| .venv/bin/python scripts/deploy_training_space.py \ |
| --repo-id TheJackBright/polyguard-openenv-training-full \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \ |
| --hardware a10g-large \ |
| --model-sweep Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct \ |
| --sft-epochs 2 \ |
| --grpo-epochs 1 \ |
| --sft-max-steps 0 \ |
| --grpo-max-steps 0 \ |
| --grpo-max-prompts 0 |
| ``` |
|
|
| Keep `HF_TOKEN` as a shell environment variable or Hugging Face Space secret only. Do not commit it to source files, notebooks, logs, README text, or report JSON. |
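A sketch of the intended pattern: read the token from the environment at runtime and fail loudly when it is missing, instead of embedding it in any file. The helper name is hypothetical:

```python
import os

def require_hf_token() -> str:
    """Fetch HF_TOKEN from the environment; never hardcode or log it."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it in the shell or a Space secret")
    return token
```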
|
|
| The Space executes the notebook-equivalent training loop from `notebooks/09_training_loop.ipynb`, including massive-profile dataset build, SFT baseline training, GRPO environment-reward training, adapter merge, post-save inference, ablations, benchmark comparisons, Qwen model sweep charts, and anti-hacking/overfit checks. `--max-steps 0` means full-epoch training, not a zero-step run. |
|
|
| After the Space uploads artifacts, pull them locally and stop paid GPU usage: |
|
|
| ```bash |
| .venv/bin/python scripts/pull_training_artifacts.py \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts |
| .venv/bin/python scripts/pause_training_space.py \ |
| --repo-id TheJackBright/polyguard-openenv-training-full \ |
| --mode cpu-basic |
| ``` |
|
|
| If only the 0.5B Qwen run is needed first, use the run-specific puller after the artifact repo has uploaded files: |
|
|
| ```bash |
| .venv/bin/python scripts/pull_sweep_artifacts.py \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \ |
| --run-id qwen-qwen2-5-0-5b-instruct |
| .venv/bin/python scripts/activate_sweep_model.py \ |
| --source sweep \ |
| --run-id qwen-qwen2-5-0-5b-instruct \ |
| --preferred-artifact grpo_adapter |
| ``` |
|
|
| For Qwen 1.5B, use the same path with the 1.5B run id: |
|
|
| ```bash |
| .venv/bin/python scripts/pull_sweep_artifacts.py \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \ |
| --run-id qwen-qwen2-5-1-5b-instruct |
| .venv/bin/python scripts/activate_sweep_model.py \ |
| --source sweep \ |
| --run-id qwen-qwen2-5-1-5b-instruct \ |
| --preferred-artifact grpo_adapter |
| ``` |
|
|
| ## Hugging Face Evidence Space |
|
|
| The evidence Space is separate from the training Space and does not retrain. It pulls completed status/artifact metadata, runs verifier-only rollouts, writes charts/JSON/Markdown, and uploads the evidence bundle back under `submission_evidence/qwen_0_5b_1_5b/` when the artifact repo is writable. |
|
|
| ```bash |
| export HF_TOKEN="<write-token>" |
| .venv/bin/python scripts/deploy_evidence_space.py \ |
| --repo-id TheJackBright/polyguard-openenv-evidence \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \ |
| --training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \ |
| --models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \ |
| --hardware cpu-basic |
| ``` |
|
|
| Evidence URLs and folders: |
|
|
| - Evidence Space: `https://huggingface.co/spaces/TheJackBright/polyguard-openenv-evidence` |
| - Training Space status source: `https://thejackbright-polyguard-openenv-training-full.hf.space` |
| - Active implementation bundle: `https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke` |
| - Local tracked bundle: `docs/results/submission_evidence_qwen_0_5b_1_5b/` |
| - Local zip: `submission_bundle/qwen_0_5b_1_5b_evidence.zip` |
|
|
| Pull the evidence bundle after the evidence Space uploads it: |
|
|
| ```bash |
| .venv/bin/python scripts/pull_submission_evidence.py \ |
| --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts |
| ``` |
|
|
| As of the April 26, 2026 live check, the training Space status confirms Qwen 0.5B and 1.5B SFT, GRPO, GRPO post-save inference, and policy ablations completed. The artifact repo still lists only `.gitattributes`, so per-run GRPO histories/checkpoints remain `remote_completed_pending_artifact_upload` in the evidence report until upload completes. |
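The `remote_completed_pending_artifact_upload` state reduces to: the training status reports a phase complete, but the artifact repo listing contains nothing beyond `.gitattributes`. A hypothetical classifier for that logic:

```python
def artifact_status(phase_complete: bool, repo_files: list[str]) -> str:
    """Classify a run's artifact state from remote status plus repo file listing."""
    has_artifacts = any(f != ".gitattributes" for f in repo_files)
    if phase_complete and has_artifacts:
        return "remote_completed"
    if phase_complete:
        return "remote_completed_pending_artifact_upload"
    return "remote_incomplete"
```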
|
|
| ## Active Model Artifact Bundle |
|
|
| The current implementation-ready active model bundle is separate from the full remote sweep artifacts. It contains the local active Qwen 0.5B trained/smoke artifacts that the app can use now: |
|
|
| - `checkpoints/grpo_adapter/` |
| - `checkpoints/sft_adapter/` |
| - `checkpoints/merged/` |
| - `manifests/active_model_manifest.json` |
| - `reports/` |
|
|
| Local bundle: |
|
|
| ```text |
| submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke/ |
| submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke.zip |
| ``` |
|
|
| HF bundle: |
|
|
| ```text |
| https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke |
| ``` |
|
|
| Download and restore: |
|
|
| ```bash |
| export HF_TOKEN="$(cat ~/.cache/huggingface/token)" |
| ./.venv/bin/hf download TheJackBright/polyguard-openenv-training-full-artifacts \ |
| --repo-type model \ |
| --include 'usable_model_bundles/local-qwen-0-5b-active-smoke/**' \ |
| --local-dir ./hf_artifacts |
| |
| cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/grpo_adapter checkpoints/grpo_adapter |
| cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/sft_adapter checkpoints/sft_adapter |
| cp -R hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/checkpoints/merged checkpoints/merged |
| mkdir -p checkpoints/active |
| cp hf_artifacts/usable_model_bundles/local-qwen-0-5b-active-smoke/manifests/active_model_manifest.json checkpoints/active/active_model_manifest.json |
| curl http://127.0.0.1:8200/policy/model_status |
| ``` |
|
|
| Current public/tracked evidence should be described as a 3-model SFT-baseline sweep plus a top-level environment-backed GRPO run. Do not claim a full public per-model GRPO sweep unless the private artifacts have been pulled, mirrored into public evidence, and documented. Unauthenticated API checks against the private training artifact repos return an auth error by design. |
|
|
| Expected pulled artifacts include: |
|
|
| - `outputs/reports/hf_sweep_summary.json` |
| - `outputs/reports/anti_hacking_overfit_report.json` |
| - `outputs/reports/sweeps/<model>/sft_trl_run.json` |
| - `outputs/reports/sweeps/<model>/grpo_trl_run.json` |
| - `outputs/reports/sweeps/<model>/postsave_inference_sft.json` |
| - `outputs/reports/sweeps/<model>/postsave_inference_grpo.json` |
| - `outputs/plots/sft_vs_grpo_reward.png` |
| - `outputs/plots/sft_loss_curves.png` |
| - `outputs/plots/grpo_reward_curves.png` |
| - `outputs/plots/qwen_model_grpo_reward.png` |
| - `outputs/plots/reward_component_bars.png` |
| - `outputs/plots/anti_cheat_failure_rates.png` |
| - `outputs/plots/train_holdout_gap.png` |
| - `outputs/plots/inference_validity_reward.png` |
| - `outputs/plots/inference_latency_validity.png` |
|
|
| ## Local Services |
|
|
| ```bash |
| bash scripts/run_all_local.sh --quick --skip-train |
| ``` |
|
|
| This builds local data/model assets, skips TRL training, starts the environment/API/UI services, and runs smoke checks. Local inference defaults to the HF Transformers path; set `POLYGUARD_ENABLE_OLLAMA=true` only when a local Ollama runtime is intentionally available. |
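The `POLYGUARD_ENABLE_OLLAMA` gate amounts to a truthy env-flag check; a minimal sketch of that pattern (not the actual repo implementation):

```python
import os

def ollama_enabled() -> bool:
    """Treat common truthy spellings of POLYGUARD_ENABLE_OLLAMA as opt-in."""
    value = os.environ.get("POLYGUARD_ENABLE_OLLAMA", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```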
|
|
| For the active-model product path, start the API after activation and verify: |
|
|
| ```bash |
| curl http://127.0.0.1:8200/policy/model_status |
| curl -X POST http://127.0.0.1:8200/policy/infer |
| ``` |
|
|
| `/policy/model_status` reports the active run id, preferred artifact, local artifact availability, loaded source, and any model-load error. The Patient Workbench displays the same active/fallback state in the header. |
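An illustrative response shape for that status report (field names are assumptions based on the description above, not the exact API schema):

```json
{
  "active_run_id": "qwen-qwen2-5-0-5b-instruct",
  "preferred_artifact": "grpo_adapter",
  "local_artifacts_available": true,
  "loaded_source": "sweep",
  "model_load_error": null
}
```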
|
|
| ## Live Submission Link Validation |
|
|
| The normal acceptance gate stays offline-friendly and checks link presence/shape. After publishing the final story URL, run: |
|
|
| ```bash |
| uv run python scripts/validate_submission_links.py |
| ``` |
|
|
| This command performs live HTTP checks for public README URLs, skips localhost/dev URLs, and fails if the selected Hugging Face blog or YouTube story artifact is still unavailable. |
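The localhost-skipping rule can be sketched as a small predicate (a hypothetical helper; the real script may filter differently):

```python
from urllib.parse import urlparse

def should_live_check(url: str) -> bool:
    """Live-check only public http(s) links; skip localhost/dev URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        return False
    host = parsed.hostname or ""
    return host not in {"localhost", "127.0.0.1", "0.0.0.0"}
```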
|
|