
Participant Guide Traceability

This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence.

Covered Requirements

| Guide requirement | PolyGuard evidence |
| --- | --- |
| Build an OpenEnv environment with reset, step, state, observations, actions, rewards, and termination | PolyGuardEnv, openenv.yaml, server/app.py, FastAPI/OpenEnv endpoints, and OpenEnv validation |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over DDI, regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| Provide easy, medium, and hard curriculum tasks | Scenario data in data/scenarios/ and task presets exposed through /env/catalog |
| Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests |
| Keep rewards numeric and bounded | clamp_reward() enforces [0.001, 0.999] rounded to 3 decimals across environment, training rewards, and API tests |
| Build dataset acquisition and preprocessing | scripts/bootstrap_data.py, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora |
| Provide SFT warm start and GRPO/RLVR-style training | scripts/train_sft_trl.py, scripts/train_grpo_trl.py, TRL integration, LoRA/adapter saving, and environment-backed reward verifier |
| Use TRL/Unsloth or an accepted HF TRL path | Current artifacts use trl_transformers; Unsloth is wired as an optional acceleration path and is used when available |
| Run full remote training when local GPU/Ollama is unavailable | scripts/deploy_training_space.py deploys private HF training Spaces with massive corpus build, Qwen sweeps, SFT baseline, and GRPO training support; private artifact repos require auth and are not public judge links |
| Export adapters safely and test inference | scripts/merge_adapters_safe.py and scripts/test_inference_postsave.py |
| Show results with plots and reports | docs/results/*.json, tracked reward/process/legal/success/sweep plot PNGs, a 3-model SFT-baseline sweep, and a top-level environment-backed GRPO run |
| Host the environment on Hugging Face Spaces | scripts/deploy_space_api.py, scripts/deploy_space.sh, Docker runtime, docs/results/hf_space_verification.json, and live Space health/metadata checks |
| Include a Colab training notebook | notebooks/09_training_loop.ipynb |
| Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 |
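The reward-bounding contract above ([0.001, 0.999], rounded to 3 decimals) can be sketched as a small helper. This is a hypothetical reconstruction for illustration; the actual clamp_reward() in the PolyGuard codebase may differ in detail:

```python
# Hypothetical reconstruction of the reward-bounding rule described above;
# the real clamp_reward() in the repository may differ in implementation.

def clamp_reward(value: float) -> float:
    """Clamp a raw reward into [0.001, 0.999] and round to 3 decimals."""
    bounded = min(max(float(value), 0.001), 0.999)
    return round(bounded, 3)
```

Any reward channel passed through such a helper is guaranteed numeric, strictly positive, and bounded, which is what the reward-range tests assert.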

Current Evidence Status

  • Local tests, OpenEnv validation, strict acceptance, and frontend build evidence are present.
  • Current tracked reports include a non-fallback SFT run, a top-level non-fallback GRPO run, post-save inference, improvement reports, anti-hacking reports, and a 3-model SFT-baseline sweep.
  • The optional private remote artifact pull checks reward bounds, reward precision, missing charts, GRPO adapter paths, and the anti-hacking/overfit report. Do not describe private artifacts as public judge-facing links unless mirrored.
  • The strict submission gate passes as of April 26, 2026, but it validates link presence/shape, not live HTTP status.
  • The live public Space target is TheJackBright/polyguard-openenv; /health returned {"status":"healthy"} during this audit.
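The health check above can be reproduced with a short script. The direct URL below assumes the standard owner-name.hf.space pattern for the TheJackBright/polyguard-openenv Space; only the payload-parsing helper is exercised offline:

```python
# Minimal sketch of the Space health probe described in the audit.
# SPACE_HEALTH_URL is an assumption based on the usual *.hf.space URL shape.
import json
from urllib.request import urlopen

SPACE_HEALTH_URL = "https://thejackbright-polyguard-openenv.hf.space/health"

def is_healthy(payload: str) -> bool:
    """Return True when the payload matches the audited {"status": "healthy"} response."""
    try:
        return json.loads(payload).get("status") == "healthy"
    except json.JSONDecodeError:
        return False

def check_space(url: str = SPACE_HEALTH_URL, timeout: float = 10.0) -> bool:
    """Fetch /health and verify both the HTTP status and the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read().decode())
```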

Remaining Human-Owned External Step

Publish the story artifact at the README's Hugging Face blog URL, or replace the link with a YouTube or slide URL, before final submission; the blog URL returns 404 until the post is published. After publishing, re-run `uv run python scripts/validate_submission_links.py`.
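Because the strict gate checks link presence and shape rather than live HTTP status, a separate liveness pass is worth running before hand-in. This is a hypothetical sketch, not the logic of scripts/validate_submission_links.py:

```python
# Hypothetical live-link checker; the actual validate_submission_links.py
# logic is not shown in this audit, and this sketch only illustrates the idea.
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def link_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 on network failure."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "link-audit"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        return exc.code
    except URLError:
        return 0

def classify(code: int) -> str:
    """Map an HTTP status code to an audit verdict."""
    if 200 <= code < 400:
        return "live"
    if code == 404:
        return "missing"
    return "error"
```

A "missing" verdict on the story URL means the blog post has not been published yet; "live" means the 404 noted above has been resolved.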