# Participant Guide Traceability

This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.

## Covered Requirements

| Guide requirement | PolyGuard evidence |
| --- | --- |
| Build an OpenEnv environment with `reset`, `step`, `state`, observations, actions, rewards, and termination | `PolyGuardEnv`, `openenv.yaml`, `server/app.py`, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in `data/scenarios/` and task presets exposed through `/env/catalog` |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | `clamp_reward()` enforces `[0.001, 0.999]` rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | `scripts/bootstrap_data.py`, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or accepted HF TRL path | Current local artifacts use `trl_transformers`; Unsloth remains optional and is used when available |
| Export adapters safely and test inference | `scripts/merge_adapters_safe.py` and `scripts/test_inference_postsave.py` |
| Show results with plots and reports | `docs/results/*.json` and tracked reward/process/legal/success plot PNGs |
| Host the environment on Hugging Face Spaces | `scripts/deploy_space_api.py`, `scripts/deploy_space.sh`, Docker runtime, and `docs/results/hf_space_verification.json` after live validation |
| Include a Colab training notebook | `notebooks/09_training_loop.ipynb` |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
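
The bounded-reward row above can be illustrated with a minimal sketch. This is not PolyGuard's actual `clamp_reward()`; only the function name, the `[0.001, 0.999]` range, and the 3-decimal rounding come from the table. The parameter names and the choice to map NaN to the lower bound are assumptions made for illustration.

```python
import math


def clamp_reward(value: float, low: float = 0.001, high: float = 0.999) -> float:
    """Clamp a raw reward into [low, high] and round it to 3 decimals.

    Hypothetical sketch: the real PolyGuard helper may differ in signature
    and in how it treats invalid inputs.
    """
    value = float(value)
    # Assumption: treat NaN as the worst possible reward rather than propagate it.
    if math.isnan(value):
        return low
    # Clamp first, then round, so the result can never leave the range.
    return round(min(max(value, low), high), 3)


if __name__ == "__main__":
    for raw in (-2.0, 0.0, 0.12345, 0.5, 1.5):
        print(raw, "->", clamp_reward(raw))
```

Keeping the bounds strictly inside `(0, 1)` means a reward is never exactly 0 or 1, which makes a reward-range test a simple interval check on every component.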

## Current Evidence Status

- Local tests, OpenEnv validation, and frontend build are passing.
- SFT/GRPO reports are non-fallback only after a real TRL run or accepted artifact pull.
- The strict submission gate intentionally fails if SFT has no examples, GRPO has no artifact, post-save inference falls back, reward improvement is absent, or HF Space verification is missing.
- The available HF token writes to `TheJackBright`, so the live Space target is `TheJackBright/polyguard-openenv`.
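
The strict submission gate described in the list above could be sketched roughly as follows. This is an illustrative sketch, not the project's gate script: the `SubmissionEvidence` dataclass, its field names, and `strict_gate_failures` are hypothetical, and treating a non-positive reward delta as "improvement absent" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubmissionEvidence:
    """Hypothetical summary of the evidence the gate inspects."""
    sft_examples: int             # number of SFT training examples used
    grpo_artifact_present: bool   # a GRPO adapter/report artifact exists
    inference_fell_back: bool     # post-save inference used a fallback path
    reward_improvement: float     # trained reward minus baseline reward
    hf_space_verified: bool       # live HF Space verification is recorded


def strict_gate_failures(ev: SubmissionEvidence) -> List[str]:
    """Return one message per failed condition; an empty list means the gate passes."""
    failures = []
    if ev.sft_examples <= 0:
        failures.append("SFT has no examples")
    if not ev.grpo_artifact_present:
        failures.append("GRPO has no artifact")
    if ev.inference_fell_back:
        failures.append("post-save inference fell back")
    # Assumption: any non-positive delta counts as "no improvement".
    if ev.reward_improvement <= 0:
        failures.append("reward improvement is absent")
    if not ev.hf_space_verified:
        failures.append("HF Space verification is missing")
    return failures


if __name__ == "__main__":
    evidence = SubmissionEvidence(120, True, False, 0.04, False)
    for message in strict_gate_failures(evidence):
        print("FAIL:", message)
```

Collecting every failure instead of stopping at the first one lets a single run report all missing evidence at once.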

## Remaining Human-Owned External Step

Publish the story artifact at the README's Hugging Face blog URL or replace it with a YouTube/slide URL before final submission. The codebase can prepare links and drafts, but publishing that external content is a manual step that the local test suite neither performs nor verifies.