Participant Guide Traceability

This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.

Covered Requirements

| Guide requirement | PolyGuard evidence |
| --- | --- |
| Build an OpenEnv environment with `reset`, `step`, `state`, observations, actions, rewards, and termination | `PolyGuardEnv`, `openenv.yaml`, `server/app.py`, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in `data/scenarios/` and task presets exposed through `/env/catalog` |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | `clamp_reward()` enforces `[0.001, 0.999]` rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | `scripts/bootstrap_data.py`, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or accepted HF TRL path | Current local artifacts use `trl_transformers`; Unsloth remains optional and is used when available |
| Export adapters safely and test inference | `scripts/merge_adapters_safe.py` and `scripts/test_inference_postsave.py` |
| Show results with plots and reports | `docs/results/*.json` and tracked reward/process/legal/success plot PNGs |
| Host the environment on Hugging Face Spaces | `scripts/deploy_space_api.py`, `scripts/deploy_space.sh`, Docker runtime, and `docs/results/hf_space_verification.json` after live validation |
| Include a Colab training notebook | `notebooks/09_training_loop.ipynb` |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
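
The bounded-reward requirement in the table can be illustrated with a minimal sketch. This is not PolyGuard's actual `clamp_reward()` source; it assumes only the behavior stated in the table (clamp into `[0.001, 0.999]`, then round to 3 decimals). The `low`/`high` parameters and the treatment of NaN as the lower bound are illustrative assumptions.

```python
import math


def clamp_reward(raw: float, low: float = 0.001, high: float = 0.999) -> float:
    """Clamp a raw score into [low, high] and round it to 3 decimals.

    Assumption: NaN inputs are mapped to the lower bound so that a broken
    reward component can never produce an unbounded or undefined reward.
    """
    value = float(raw)
    if math.isnan(value):
        return low
    # Clamp first, then round; the bounds themselves already have 3 decimals.
    return round(min(high, max(low, value)), 3)


if __name__ == "__main__":
    for raw in (-5.0, 0.12345, 0.5, 2.0, float("nan")):
        print(raw, "->", clamp_reward(raw))
```

Keeping the bounds strictly inside `(0, 1)` means a reward is never exactly 0 or 1, which a reward-range test can assert directly.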

Current Evidence Status

Local tests, OpenEnv validation, and frontend build are passing.
SFT/GRPO reports count as non-fallback only after a real TRL run or an accepted artifact pull.
The strict submission gate intentionally fails if SFT has no examples, GRPO has no artifact, post-save inference falls back, reward improvement is absent, or HF Space verification is missing.
The available HF token has write access to the TheJackBright namespace, so the live Space target is TheJackBright/polyguard-openenv.
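
The strict submission gate described above can be sketched as a function that collects every failing condition. This is a hypothetical illustration, not the repository's gate script: the `SubmissionEvidence` fields, the `strict_gate_failures` name, and the reading of "reward improvement is absent" as an improvement of zero or less are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubmissionEvidence:
    """Hypothetical summary of the evidence the gate inspects."""
    sft_examples: int                  # number of SFT training examples
    grpo_artifact_present: bool        # a GRPO artifact exists
    postsave_inference_fallback: bool  # post-save inference used a fallback
    reward_improvement: float          # change in reward after training
    hf_space_verified: bool            # HF Space verification is recorded


def strict_gate_failures(evidence: SubmissionEvidence) -> List[str]:
    """Return one message per failed condition; an empty list means pass."""
    failures: List[str] = []
    if evidence.sft_examples <= 0:
        failures.append("SFT has no examples")
    if not evidence.grpo_artifact_present:
        failures.append("GRPO has no artifact")
    if evidence.postsave_inference_fallback:
        failures.append("post-save inference fell back")
    if evidence.reward_improvement <= 0:
        failures.append("reward improvement is absent")
    if not evidence.hf_space_verified:
        failures.append("HF Space verification is missing")
    return failures


if __name__ == "__main__":
    evidence = SubmissionEvidence(120, True, False, 0.04, False)
    problems = strict_gate_failures(evidence)
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Reporting all failures at once, rather than stopping at the first, makes it easier to see which evidence is still outstanding before hand-in.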

Remaining Human-Owned External Step

Publish the story artifact at the README's Hugging Face blog URL, or replace that link with a YouTube or slide URL, before final submission. The codebase can prepare links and drafts, but publishing the external content is a manual step that the local test suite does not cover.