
Participant Guide Traceability

This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.

Covered Requirements

| Guide requirement | PolyGuard evidence |
| --- | --- |
| Build an OpenEnv environment with reset, step, state, observations, actions, rewards, and termination | PolyGuardEnv, openenv.yaml, server/app.py, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in data/scenarios/ and task presets exposed through /env/catalog |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | clamp_reward() enforces [0.001, 0.999] rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | scripts/bootstrap_data.py, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | scripts/train_sft_trl.py, scripts/train_grpo_trl.py, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or an accepted HF TRL path | Current artifacts use trl_transformers; Unsloth is wired as an optional acceleration path and is used when available |
| Run full remote training when local GPU/Ollama is unavailable | scripts/deploy_training_space.py deploys private HF training Spaces with massive corpus build, Qwen sweeps, SFT baseline, and GRPO training support; private artifact repos require auth and are not public judge links |
| Export adapters safely and test inference | scripts/merge_adapters_safe.py and scripts/test_inference_postsave.py |
| Show results with plots and reports | docs/results/*.json, tracked reward/process/legal/success/sweep plot PNGs, a 3-model SFT-baseline sweep, and a top-level environment-backed GRPO run |
| Host the environment on Hugging Face Spaces | scripts/deploy_space_api.py, scripts/deploy_space.sh, Docker runtime, docs/results/hf_space_verification.json, and live Space health/metadata checks |
| Include a Colab training notebook | notebooks/09_training_loop.ipynb |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
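The reward-bounding contract above ([0.001, 0.999], rounded to 3 decimals) can be sketched as a small helper. This is a hypothetical reconstruction for illustration; the actual clamp_reward() in the PolyGuard codebase may differ in detail:

```python
# Hypothetical reconstruction of the reward-bounding rule described above;
# the real clamp_reward() in the repository may differ in implementation.

def clamp_reward(value: float) -> float:
    """Clamp a raw reward into [0.001, 0.999] and round to 3 decimals."""
    bounded = min(max(float(value), 0.001), 0.999)
    return round(bounded, 3)
```

Any reward channel passed through such a helper is guaranteed numeric, strictly positive, and bounded, which is what the reward-range tests assert.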

Current Evidence Status

  • Local tests, OpenEnv validation, strict acceptance, and frontend build evidence are present.
  • Current tracked reports include a non-fallback SFT run, a top-level non-fallback GRPO run, post-save inference, improvement reports, anti-hacking reports, and a 3-model SFT-baseline sweep.
  • The optional private remote artifact pull checks reward bounds, reward precision, missing charts, GRPO adapter paths, and the anti-hacking/overfit report. Do not describe private artifacts as public judge-facing links unless mirrored.
  • The strict submission gate passes as of April 26, 2026, but it validates link presence/shape, not live HTTP status.
  • The live public Space target is TheJackBright/polyguard-openenv; /health returned {"status":"healthy"} during this audit.
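The health check above can be reproduced with a short script. The direct URL below assumes the standard owner-name.hf.space pattern for the TheJackBright/polyguard-openenv Space; only the payload-parsing helper is exercised offline:

```python
# Minimal sketch of the Space health probe described in the audit.
# SPACE_HEALTH_URL is an assumption based on the usual *.hf.space URL shape.
import json
from urllib.request import urlopen

SPACE_HEALTH_URL = "https://thejackbright-polyguard-openenv.hf.space/health"

def is_healthy(payload: str) -> bool:
    """Return True when the payload matches the audited {"status": "healthy"} response."""
    try:
        return json.loads(payload).get("status") == "healthy"
    except json.JSONDecodeError:
        return False

def check_space(url: str = SPACE_HEALTH_URL, timeout: float = 10.0) -> bool:
    """Fetch /health and verify both the HTTP status and the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read().decode())
```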

Remaining Human-Owned External Step

Publish the story artifact at the README's Hugging Face blog URL, or replace the link with a YouTube or slide URL, before final submission; the blog URL returns 404 until the post is published. After publishing, re-run `uv run python scripts/validate_submission_links.py`.
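Because the strict gate checks link presence and shape rather than live HTTP status, a separate liveness pass is worth running before hand-in. This is a hypothetical sketch, not the logic of scripts/validate_submission_links.py:

```python
# Hypothetical live-link checker; the actual validate_submission_links.py
# logic is not shown in this audit, and this sketch only illustrates the idea.
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def link_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 on network failure."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "link-audit"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        return exc.code
    except URLError:
        return 0

def classify(code: int) -> str:
    """Map an HTTP status code to an audit verdict."""
    if 200 <= code < 400:
        return "live"
    if code == 404:
        return "missing"
    return "error"
```

A "missing" verdict on the story URL means the blog post has not been published yet; "live" means the 404 noted above has been resolved.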