Participant Guide Traceability
This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.
Covered Requirements
| Guide requirement | PolyGuard evidence |
|---|---|
| Build an OpenEnv environment with reset, step, state, observations, actions, rewards, and termination | PolyGuardEnv, openenv.yaml, server/app.py, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in data/scenarios/ and task presets exposed through /env/catalog |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | clamp_reward() enforces [0.001, 0.999] rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | scripts/bootstrap_data.py, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | scripts/train_sft_trl.py, scripts/train_grpo_trl.py, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or accepted HF TRL path | Current local artifacts use trl_transformers; Unsloth remains optional and is used when available |
| Export adapters safely and test inference | scripts/merge_adapters_safe.py and scripts/test_inference_postsave.py |
| Show results with plots and reports | docs/results/*.json and tracked reward/process/legal/success plot PNGs |
| Host the environment on Hugging Face Spaces | scripts/deploy_space_api.py, scripts/deploy_space.sh, Docker runtime, and docs/results/hf_space_verification.json after live validation |
| Include a Colab training notebook | notebooks/09_training_loop.ipynb |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
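
The bounded-reward row above can be illustrated with a minimal sketch. It is not the repository's actual clamp_reward() implementation; it only assumes the behavior stated in the table (clamp to [0.001, 0.999], then round to 3 decimals). The constant names and the NaN handling are assumptions added for illustration.

```python
import math

# Bounds taken from the table row; the constant names are hypothetical.
REWARD_MIN = 0.001
REWARD_MAX = 0.999


def clamp_reward(value: float) -> float:
    """Clamp a raw reward into [0.001, 0.999] and round to 3 decimals."""
    # Assumption: NaN maps to the lower bound so the result stays numeric.
    if math.isnan(value):
        return REWARD_MIN
    # Clamp first, then round, so rounding can never push the value out of range.
    return round(min(REWARD_MAX, max(REWARD_MIN, value)), 3)
```

Clamping before rounding matters here: a raw value such as 0.9996 would round to 1.0 if the order were reversed, which would leave the stated range.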
Current Evidence Status
- Local tests, OpenEnv validation, and frontend build are passing.
- SFT/GRPO reports count as non-fallback only after a real TRL run or an accepted artifact pull.
- The strict submission gate intentionally fails if SFT has no examples, GRPO has no artifact, post-save inference falls back, reward improvement is absent, or HF Space verification is missing.
- The available HF token writes to TheJackBright, so the live Space target is TheJackBright/polyguard-openenv.
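
The strict submission gate described in the list above could be sketched roughly as follows. This is an illustrative outline only, not the project's gate script: the Evidence dataclass, its field names, and the "improvement must be positive" rule are all hypothetical stand-ins for whatever the real reports contain.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the artifacts the gate inspects."""
    sft_examples: int             # number of SFT training examples used
    grpo_artifact_present: bool   # a GRPO adapter/artifact exists
    inference_fell_back: bool     # post-save inference used a fallback path
    reward_improvement: float     # assumed: reward after training minus before
    hf_space_verified: bool       # HF Space verification report exists


def gate_failures(evidence: Evidence) -> list[str]:
    """Return every reason the gate should fail; an empty list means pass."""
    failures = []
    if evidence.sft_examples <= 0:
        failures.append("SFT has no examples")
    if not evidence.grpo_artifact_present:
        failures.append("GRPO has no artifact")
    if evidence.inference_fell_back:
        failures.append("post-save inference fell back")
    # Assumption: "absent" improvement means a non-positive delta.
    if evidence.reward_improvement <= 0:
        failures.append("reward improvement is absent")
    if not evidence.hf_space_verified:
        failures.append("HF Space verification is missing")
    return failures
```

Collecting all failures instead of stopping at the first one lets a single run report everything that still blocks submission.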
Remaining Human-Owned External Step
Publish the story artifact at the README's Hugging Face blog URL, or replace it with a YouTube/slide URL, before final submission. The codebase can prepare links and drafts, but publishing that external content is a manual step that the local test suite does not perform.