
Final Submission Audit

Audit date: April 26, 2026.

Status Summary

PolyGuard implements the participant-guide stack end to end: dataset acquisition, OpenEnv environment, rewards, SFT, GRPO, inference, UI/API product, evaluation, and Hugging Face Space deployment. The public environment Space is live at https://huggingface.co/spaces/TheJackBright/polyguard-openenv and its runtime health endpoint returned {"status":"healthy"} during this audit.

The only known judge-facing blocker is external storytelling: the README blog URL https://huggingface.co/blog/TheJackBright/polyguard-openenv currently returns 404. It will keep returning 404 until docs/hf_blog_draft.md is published there; alternatively, the README can be updated with a real YouTube, slide, or blog URL.

Requirement Matrix

| Requirement area | Status | Evidence |
| --- | --- | --- |
| Problem statement and theme fit | Implemented | README describes safe long-horizon polypharmacy action selection under World Modeling / Professional Tasks. |
| OpenEnv environment | Implemented | openenv.yaml, PolyGuardEnv, FastAPI /reset, /step, /state, /metadata, /schema, /mcp, and /ws; uv run openenv validate . passes. |
| Dataset acquisition and preprocessing | Implemented | scripts/bootstrap_data.py, scripts/ingest_open_drug_sources.py, scripts/build_training_corpus.py, data/processed/*, data/scenarios/*, and docs/dataset_report.md. |
| Easy/medium/hard curriculum | Implemented | Scenario JSON/JSONL sets plus task presets exposed through /env/catalog. |
| Rewards and anti-hacking | Implemented | 13 reward components, 4 primary channels, bounded reward scaling, timeout handling, app/env/anti_cheat.py, and reward/anti-cheat tests. |
| Training loop | Implemented | scripts/train_sft_trl.py, scripts/train_grpo_trl.py, app/training/grpo_trl.py, and app/hf_space/training_runner.py. |
| TRL / Unsloth stack | Implemented with fallback reality documented | TRL path is active and reports trl_transformers; Unsloth is wired as optional but was unavailable in current reports. |
| Post-training export and inference | Implemented | scripts/merge_adapters_safe.py, scripts/test_inference_postsave.py, active model manifest, and API/UI model status path. |
| Product/demo | Implemented | FastAPI product API, React/Vite workbench, policy lab, training monitor, replay, dosing, and safety views. |
| Results and plots | Implemented | Tracked docs/results/*.json and PNG plots, including SFT baseline sweep evidence and top-level environment-backed GRPO evidence. |
| HF Space deployment | Implemented | Public Space is running on CPU basic, Space metadata is available, and tracked docs/results/hf_space_verification.json reports OpenEnv validation passed. |
| Colab notebook | Implemented | README links notebooks/09_training_loop.ipynb through Colab. |
| Story artifact | Pending external publication | docs/hf_blog_draft.md exists, but the README blog URL returns 404 until published. |
| Full public per-model GRPO sweep | Not claimed | Current public/tracked evidence is a 3-model SFT-baseline sweep plus a top-level GRPO run. Private training artifact repos require auth and must be mirrored before being used as public evidence. |
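
The "bounded reward scaling" entry can be illustrated with a minimal sketch. It assumes that component scores are combined as a weighted average and then clamped into the range [0.001, 0.999] that the /metadata endpoint reported during this audit. The component names, the weights, and the `bound_reward` helper are hypothetical; the actual PolyGuard reward code is not reproduced here and may combine its 13 components differently.

```python
from typing import Mapping

REWARD_MIN = 0.001  # lower bound reported by /metadata during this audit
REWARD_MAX = 0.999  # upper bound reported by /metadata during this audit


def bound_reward(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of component scores, clamped into [REWARD_MIN, REWARD_MAX].

    Hypothetical helper: the aggregation rule is an assumption for illustration.
    """
    # Only components that have a weight contribute to the average.
    total_weight = sum(weights.get(name, 0.0) for name in components)
    if total_weight <= 0:
        # No usable weights: fall back to the lowest allowed reward.
        return REWARD_MIN
    raw = sum(score * weights.get(name, 0.0) for name, score in components.items())
    raw /= total_weight
    # Clamping keeps the reward strictly inside the open interval (0, 1).
    return min(REWARD_MAX, max(REWARD_MIN, raw))
```

Clamping to a range strictly inside (0, 1) means a policy can never observe an exact 0 or 1, so a single exploit cannot saturate the signal at either extreme.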

Fresh Verification

  • uv run pytest: 49 tests passed.
  • uv run openenv validate .: local OpenEnv validation passed.
  • POLYGUARD_ENFORCE_SUBMISSION_LINKS=true uv run python scripts/acceptance_gate.py: strict gate passed.
  • curl -s https://thejackbright-polyguard-openenv.hf.space/health: returned {"status":"healthy"}.
  • curl -s https://thejackbright-polyguard-openenv.hf.space/metadata: returned PolyGuard OpenEnv metadata with reward range [0.001, 0.999].
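
The two curl checks above could be repeated from Python with only the standard library, as in the sketch below. The host and the /health and /metadata paths come from this audit. The helper names (`fetch_json`, `is_healthy`, `main`) are illustrative, and the metadata payload is only printed because its exact schema is not documented here.

```python
import json
import urllib.request
from typing import Any

# Runtime host taken from the curl commands in this audit.
BASE_URL = "https://thejackbright-polyguard-openenv.hf.space"


def fetch_json(path: str, timeout: float = 10.0) -> Any:
    """GET BASE_URL + path and decode the response body as JSON."""
    with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(payload: Any) -> bool:
    """True only for the exact payload shape observed in this audit."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def main() -> int:
    """Print both payloads; return 0 if the Space reports healthy, else 1."""
    health = fetch_json("/health")
    print("health:", health)
    metadata = fetch_json("/metadata")
    print("metadata:", json.dumps(metadata, indent=2))
    return 0 if is_healthy(health) else 1
```

Calling `main()` performs the network requests. The return value follows the usual exit-code convention, so it can be passed to `sys.exit` in a CI step.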

Submission Notes

  • Publish the Hugging Face blog draft or replace the story URL before final hand-in.
  • Run uv run python scripts/validate_submission_links.py after publication to catch broken README URLs.
  • Do not add private HF artifact repos as judge-facing links unless they are made public or their outputs are mirrored into the repository/Space documentation.
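
For a quick manual check of README links before hand-in, a sketch along the following lines may help. It is not the contents of scripts/validate_submission_links.py, which is not reproduced in this audit. The `extract_urls` and `url_status` helpers are hypothetical, and the URL pattern is a deliberately simple assumption that will miss unusual link syntax.

```python
import re
import urllib.error
import urllib.request
from typing import List, Optional

# Simple pattern: stop at whitespace and at common Markdown/HTML delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_urls(text: str) -> List[str]:
    """Return unique http(s) URLs in order of first appearance."""
    # Trailing sentence punctuation is stripped so "…/foo." yields "…/foo".
    found = (match.rstrip(".,;:") for match in URL_PATTERN.findall(text))
    return list(dict.fromkeys(found))


def url_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still informative.
        return err.code
    except urllib.error.URLError:
        return None
```

One way to use it is to read README.md, pass its text to `extract_urls`, and flag every URL whose `url_status` is `None` or 400 and above. Some servers reject HEAD requests, so a flagged link is worth confirming in a browser before treating it as broken.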