
Participant Guide Traceability

This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.

Covered Requirements

| Guide requirement | PolyGuard evidence |
| --- | --- |
| Build an OpenEnv environment with reset, step, state, observations, actions, rewards, and termination | PolyGuardEnv, openenv.yaml, server/app.py, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in data/scenarios/ and task presets exposed through /env/catalog |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | clamp_reward() enforces [0.001, 0.999] rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | scripts/bootstrap_data.py, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | scripts/train_sft_trl.py, scripts/train_grpo_trl.py, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or accepted HF TRL path | Current local artifacts use trl_transformers; Unsloth remains optional and is used when available |
| Export adapters safely and test inference | scripts/merge_adapters_safe.py and scripts/test_inference_postsave.py |
| Show results with plots and reports | docs/results/*.json and tracked reward/process/legal/success plot PNGs |
| Host the environment on Hugging Face Spaces | scripts/deploy_space_api.py, scripts/deploy_space.sh, Docker runtime, and docs/results/hf_space_verification.json after live validation |
| Include a Colab training notebook | notebooks/09_training_loop.ipynb |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
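
To illustrate the bounded-reward requirement, here is a minimal sketch of what a clamp of this kind might look like. Only the function name clamp_reward(), the range [0.001, 0.999], and the 3-decimal rounding come from this audit; the constant names and the NaN handling are assumptions, and the actual PolyGuard implementation may differ.

```python
import math

# Bounds taken from the audit; the constant names are hypothetical.
REWARD_MIN = 0.001
REWARD_MAX = 0.999


def clamp_reward(value: float) -> float:
    """Clamp a raw reward into [REWARD_MIN, REWARD_MAX] and round to 3 decimals."""
    value = float(value)
    if math.isnan(value):
        # Assumption: treat NaN as the worst possible reward so the output
        # always stays numeric and bounded.
        return REWARD_MIN
    return round(min(REWARD_MAX, max(REWARD_MIN, value)), 3)


if __name__ == "__main__":
    for raw in (-5.0, 0.12345, 0.5, 2.0):
        print(raw, "->", clamp_reward(raw))
```

Keeping the bounds strictly inside (0, 1) means a downstream check can tell a clamped reward apart from an exact 0 or 1, which often signals a missing or default value.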

Current Evidence Status

  • Local tests, OpenEnv validation, and frontend build are passing.
  • SFT/GRPO reports count as non-fallback only after a real TRL run or an accepted artifact pull.
  • The strict submission gate intentionally fails if SFT has no examples, GRPO has no artifact, post-save inference falls back, reward improvement is absent, or HF Space verification is missing.
  • The available HF token has write access to the TheJackBright account, so the live Space target is TheJackBright/polyguard-openenv.
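
The strict submission gate described above can be pictured as a set of independent checks that each contribute a failure reason. The sketch below is illustrative only: the SubmissionEvidence dataclass, its field names, and the function strict_gate_failures() are hypothetical and are not taken from the PolyGuard codebase. Treating "reward improvement is absent" as a non-positive delta is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubmissionEvidence:
    """Hypothetical summary of the evidence the gate inspects."""
    sft_example_count: int
    grpo_artifact_present: bool
    postsave_inference_fallback: bool
    reward_improvement: float  # assumed: trained reward minus baseline reward
    hf_space_verified: bool


def strict_gate_failures(evidence: SubmissionEvidence) -> List[str]:
    """Return every reason the gate fails; an empty list means it passes."""
    failures: List[str] = []
    if evidence.sft_example_count <= 0:
        failures.append("SFT has no examples")
    if not evidence.grpo_artifact_present:
        failures.append("GRPO has no artifact")
    if evidence.postsave_inference_fallback:
        failures.append("post-save inference fell back")
    if evidence.reward_improvement <= 0:
        # Assumption: a zero or negative delta counts as no improvement.
        failures.append("reward improvement is absent")
    if not evidence.hf_space_verified:
        failures.append("HF Space verification is missing")
    return failures


if __name__ == "__main__":
    evidence = SubmissionEvidence(0, True, False, 0.04, False)
    for reason in strict_gate_failures(evidence):
        print("FAIL:", reason)
```

Collecting all failure reasons, rather than stopping at the first, lets a single gate run report everything that still blocks submission.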

Remaining Human-Owned External Step

Publish the story artifact at the README's Hugging Face blog URL, or replace that link with a YouTube or slide URL, before final submission. The codebase can prepare links and drafts, but the local test suite cannot publish external content, so this step has to be done by a person.