# Submission Checklist
## Required Narrative
- Problem statement clearly states the capability gap: safe long-horizon polypharmacy action selection.
- Environment describes observation, action, state, episode termination, and OpenEnv endpoints.
- Agent capabilities cover med reconciliation, evidence, graph safety, dosing, candidate generation, planning, critique, and explanation.
- Tasks cover DDI risk, safer substitutions, taper/deprescribing, precision dosing, missing-data recovery, and new-drug decomposition.
- Reward/evaluation logic documents the 13 reward columns, 4 primary channels, anti-cheat checks, timeouts, and offline evaluation.
- Post-training/self-improvement strategy documents SFT warm start, GRPO with environment rewards, ablations, adapter export, and post-save inference validation.
## Required Deliverables
- GitHub repo with all required links in README.
- Hugging Face Space URL.
- Colab notebook URL.
- YouTube video URL or Hugging Face blog URL. The blog URL currently in the README is the intended target, but it returns 404 until the post is published.
- Tracked plots and compact reports under `docs/results/`.
- Successful `docs/results/hf_space_verification.json` with `passed: true`.
- Participant-guide traceability map in `docs/participant_guide_traceability.md`.
## Commands To Validate Before Submission
```bash
uv run pytest
uv run openenv validate .
bash scripts/bootstrap_openenv.sh --runtime-check
(cd app/ui/frontend && npm run build)
.venv/bin/python scripts/evaluate_baselines.py
.venv/bin/python scripts/evaluate_all.py
.venv/bin/python scripts/evaluate_compare_runs.py --baseline outputs/reports/baselines.json --candidate outputs/reports/benchmark_report.json --output outputs/reports/improvement_report.json
.venv/bin/python scripts/acceptance_gate.py
```
After the story artifact (the blog post or video) is published, run the opt-in live link checker:
```bash
uv run python scripts/validate_submission_links.py
```
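The kind of check a live link validator performs can be sketched in a few lines of Python. This is an illustration only, not the contents of `scripts/validate_submission_links.py`; the function names `http_status` and `find_broken_links` are hypothetical, and the sketch assumes a link counts as healthy when a GET request ends in a 2xx status.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a GET request, or 0 on a network failure."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 404 for a blog post that is not published yet
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def find_broken_links(
    urls: Iterable[str], fetch: Callable[[str], int] = http_status
) -> Dict[str, int]:
    """Map every URL that did not return a 2xx status to the status it returned."""
    broken = {}
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 300:
            broken[url] = status
    return broken
```

Passing the fetch function as a parameter keeps the check testable offline: a stub can stand in for the network call, while the default performs a real request.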
## Full Remote Training Evidence
```bash
export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_training_space.py \
--repo-id TheJackBright/polyguard-openenv-training-full \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--hardware a10g-large \
--model-sweep Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct \
--sft-epochs 2 \
--grpo-epochs 1 \
--sft-max-steps 0 \
--grpo-max-steps 0 \
--grpo-max-prompts 0
.venv/bin/python scripts/pull_training_artifacts.py \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
.venv/bin/python scripts/activate_sweep_model.py \
--source sweep \
--run-id qwen-qwen2-5-0-5b-instruct \
--preferred-artifact grpo_adapter
```
Final public artifacts should include `hf_sweep_summary.json`, `anti_hacking_overfit_report.json`, post-save inference reports, adapter evidence, `active_model_manifest.json`, and all relevant charts under `docs/results/` and `outputs/plots/`. Current tracked evidence includes a 3-model SFT-baseline sweep plus a top-level environment-backed GRPO run. Only claim a full public per-model GRPO sweep after those private artifacts are pulled, mirrored, and documented.
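The `--run-id` value used above (`qwen-qwen2-5-0-5b-instruct`) looks like a slug of the model ID `Qwen/Qwen2.5-0.5B-Instruct`. The sketch below reproduces that mapping under the assumption that run IDs are formed by lowercasing the model ID and replacing each run of non-alphanumeric characters with a hyphen. The rule is inferred from the examples in this checklist, and the helper name `run_id_from_model` is hypothetical; the sweep scripts may derive run IDs differently.

```python
import re


def run_id_from_model(model_id: str) -> str:
    """Lowercase the model ID and turn each non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")


if __name__ == "__main__":
    for model in (
        "Qwen/Qwen2.5-0.5B-Instruct",
        "Qwen/Qwen2.5-1.5B-Instruct",
        "Qwen/Qwen2.5-3B-Instruct",
    ):
        print(model, "->", run_id_from_model(model))
```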
## Qwen 0.5B/1.5B Submission Evidence
```bash
.venv/bin/python scripts/generate_submission_evidence.py \
--models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
--episodes 8
```
The generated files live in:
- `docs/results/submission_evidence_qwen_0_5b_1_5b/`
- `outputs/reports/submission_evidence/qwen_0_5b_1_5b/`
- `outputs/plots/submission_evidence/qwen_0_5b_1_5b/`
- `submission_bundle/qwen_0_5b_1_5b_evidence.zip`
The current live evidence confirms remote completion of SFT, GRPO, GRPO post-save inference, and policy ablations for the 0.5B and 1.5B models. It marks the per-run GRPO files and checkpoints as pending because they have not yet been uploaded to the private artifact repo.
The implementation-ready active model bundle is available separately:
```text
https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke/
```
It includes the local active Qwen 0.5B `grpo_adapter`, `sft_adapter`, `merged` model, manifests, and reports for immediate app integration while the full per-run remote sweep artifacts remain pending.
Deploy the evaluation-only HF Space without interrupting the training Space:
```bash
.venv/bin/python scripts/deploy_evidence_space.py \
--repo-id TheJackBright/polyguard-openenv-evidence \
--artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
--training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
--models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
--hardware cpu-basic
```
## Strict Final Gate
```bash
export POLYGUARD_ENFORCE_SUBMISSION_LINKS=true
.venv/bin/python scripts/acceptance_gate.py
```
Strict mode is expected to pass only when all of the following hold:
- README links are not placeholders.
- `docs/results/avg_reward.png` and `docs/results/policy_stack_avg_reward.png` exist.
- `docs/results/hf_space_verification.json` has `passed: true`.
- `outputs/reports/sft_trl_run.json` has `status: ok`, non-zero examples, a non-empty artifact path, and uses `trl_unsloth` or `trl_transformers`.
- `outputs/reports/grpo_trl_run.json` has `status: ok`, an accepted backend, and a non-empty `artifact_path`.
- `outputs/reports/postsave_inference.json` does not use `fallback_policy`.
- `outputs/reports/improvement_report.json` has `improved: true`.
- `outputs/reports/hf_sweep_summary.json` has at least one completed non-fallback model row.
- `outputs/reports/anti_hacking_overfit_report.json` has `passed: true`.
- `GET /policy/model_status` reports the intended active run and artifact availability.
Strict mode passed during the April 26, 2026 audit. It does not perform live HTTP status checks, so the final blog/video URL still needs explicit validation.
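Several of the conditions listed above are simple field checks on JSON reports, so they can be spot-checked before running the gate. The following is a minimal sketch, not the logic of `scripts/acceptance_gate.py`: it covers only the conditions whose key names appear verbatim in the list, assumes those keys sit at the top level of each report, and uses the hypothetical names `EXPECTED_FIELDS` and `failed_checks`.

```python
import json
from pathlib import Path
from typing import List

# (relative path, key, expected value) triples copied from the checklist.
# Assumption: each key is a top-level field of its JSON report.
EXPECTED_FIELDS = [
    ("docs/results/hf_space_verification.json", "passed", True),
    ("outputs/reports/sft_trl_run.json", "status", "ok"),
    ("outputs/reports/grpo_trl_run.json", "status", "ok"),
    ("outputs/reports/improvement_report.json", "improved", True),
    ("outputs/reports/anti_hacking_overfit_report.json", "passed", True),
]


def failed_checks(root: Path) -> List[str]:
    """Return one human-readable message per report that fails its field check."""
    failures = []
    for rel_path, key, expected in EXPECTED_FIELDS:
        path = root / rel_path
        if not path.is_file():
            failures.append(f"{rel_path}: missing file")
            continue
        try:
            report = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            failures.append(f"{rel_path}: not valid JSON")
            continue
        actual = report.get(key) if isinstance(report, dict) else None
        # Compare types as well, so that 1 is not accepted in place of true.
        if type(actual) is not type(expected) or actual != expected:
            failures.append(f"{rel_path}: expected {key}={expected!r}, got {actual!r}")
    return failures


if __name__ == "__main__":
    problems = failed_checks(Path("."))
    print("\n".join(problems) if problems else "all sketched checks passed")
```

Run from the repository root, the script prints one line per failing report. It leaves out the backend, example-count, fallback-policy, sweep-row, and `GET /policy/model_status` conditions, because their exact field names are not given here.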
## HF Auth Commands
```bash
./.venv/bin/hf auth login
./.venv/bin/hf auth whoami
export HF_SPACE_REPO_ID="TheJackBright/polyguard-openenv-workbench"
```
Use `./.venv/bin/hf`, not the global `hf` binary.
Private HF training artifact repositories require authentication and should not be used as judge-facing public links unless they are made public or mirrored into the repository/Space documentation.