| # Participant Guide Traceability |
|
|
| This audit maps the hackathon guide, FAQ, and judging criteria to concrete PolyGuard implementation evidence. |
|
|
| ## Covered Requirements |
|
|
| | Guide requirement | PolyGuard evidence | |
| | --- | --- | |
| | Build an OpenEnv environment with `reset`, `step`, `state`, observations, actions, rewards, and termination | `PolyGuardEnv`, `openenv.yaml`, `server/app.py`, FastAPI/OpenEnv endpoints, and OpenEnv validation | |
| Use a verifiable, stateful, step-by-step task | Polypharmacy action selection over drug-drug interactions (DDI), regimen risk, precision dosing, deprescribing, evidence recovery, alternatives, and new-drug decomposition |
| | Provide easy, medium, and hard curriculum tasks | Scenario data in `data/scenarios/` and task presets exposed through `/env/catalog` | |
| | Use multiple independent reward checks and anti-hacking controls | 13 reward components, 4 primary channels, anti-cheat checks, timeout checks, candidate alignment, legality gates, and reward-range tests | |
| Keep rewards numeric and bounded | `clamp_reward()` clamps rewards to `[0.001, 0.999]` and rounds them to 3 decimals, enforced across the environment, training rewards, and API tests |
| | Build dataset acquisition and preprocessing | `scripts/bootstrap_data.py`, source ingestion/build scripts, synthetic patients, retrieval corpus, scenarios, and SFT/GRPO corpora | |
| | Provide SFT warm start and GRPO/RLVR-style training | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, TRL integration, LoRA/adapter saving, and environment-backed reward verifier | |
| Use TRL/Unsloth or an accepted Hugging Face TRL path | Current artifacts use `trl_transformers`; Unsloth is wired in as an optional acceleration path and is used when available |
| | Run full remote training when local GPU/Ollama is unavailable | `scripts/deploy_training_space.py` deploys private HF training Spaces with massive corpus build, Qwen sweeps, SFT baseline, and GRPO training support; private artifact repos require auth and are not public judge links | |
| | Export adapters safely and test inference | `scripts/merge_adapters_safe.py` and `scripts/test_inference_postsave.py` | |
| | Show results with plots and reports | `docs/results/*.json`, tracked reward/process/legal/success/sweep plot PNGs, a 3-model SFT-baseline sweep, and a top-level environment-backed GRPO run | |
| | Host the environment on Hugging Face Spaces | `scripts/deploy_space_api.py`, `scripts/deploy_space.sh`, Docker runtime, `docs/results/hf_space_verification.json`, and live Space health/metadata checks | |
| | Include a Colab training notebook | `notebooks/09_training_loop.ipynb` | |
| | Link story material from README | README links the selected Hugging Face blog/story URL; publish it before final hand-in if the external URL is still 404 | |
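The reward-bounding row above can be illustrated with a small helper. This is a hypothetical sketch reconstructed from the bounds the table states (`[0.001, 0.999]`, 3-decimal rounding); the real PolyGuard `clamp_reward()` may differ in signature and location.

```python
def clamp_reward(value: float) -> float:
    """Clamp a raw reward into [0.001, 0.999], then round to 3 decimals.

    Illustrative sketch of the bounding rule described in the table,
    not the project's actual implementation.
    """
    bounded = min(max(float(value), 0.001), 0.999)
    return round(bounded, 3)
```

Keeping the lower bound strictly above zero means every step emits a nonzero numeric reward, which is easy to assert in range tests.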
|
|
| ## Current Evidence Status |
|
|
| - Local tests, OpenEnv validation, strict acceptance, and frontend build evidence are present. |
| - Current tracked reports include a non-fallback SFT run, a top-level non-fallback GRPO run, post-save inference, improvement reports, anti-hacking reports, and a 3-model SFT-baseline sweep. |
| - The optional private remote artifact pull checks reward bounds, reward precision, missing charts, GRPO adapter paths, and the anti-hacking/overfit report. Do not describe private artifacts as public judge-facing links unless mirrored. |
- The strict submission gate passes as of April 26, 2026, but it validates only link presence and shape, not live HTTP status.
| - The live public Space target is `TheJackBright/polyguard-openenv`; `/health` returned `{"status":"healthy"}` during this audit. |
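The distinction between shape-only validation and a live check can be sketched as follows; the function names and the URL pattern are assumptions for illustration, not the gate's actual code.

```python
import re
import urllib.error
import urllib.request

# Assumed shape for a Hugging Face Space link (owner/space-name).
HF_SPACE_PATTERN = re.compile(r"^https://huggingface\.co/spaces/[\w.-]+/[\w.-]+$")

def link_shape_ok(url: str) -> bool:
    # Shape-only check, as the strict submission gate performs:
    # no request is made, so a 404 target would still pass.
    return bool(HF_SPACE_PATTERN.match(url))

def link_live_ok(url: str, timeout: float = 10.0) -> bool:
    # Live check: actually issue a GET and require HTTP 200.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

This is why the unpublished blog URL can pass the gate while still returning 404: only `link_live_ok`-style checking would catch it.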
|
|
| ## Remaining Human-Owned External Step |
|
|
| Publish the story artifact at the README's Hugging Face blog URL or replace it with a YouTube/slide URL before final submission. The current blog URL returns 404 until published. After publication, run `uv run python scripts/validate_submission_links.py`. |
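The post-publication verification can be sketched as two small helpers: one that reports the HTTP status of the blog URL, and one that checks a Space `/health` body against the `{"status": "healthy"}` payload observed during this audit. The helper names are assumptions; `scripts/validate_submission_links.py` remains the authoritative check.

```python
import json
import urllib.error
import urllib.request

def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET; -1 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a not-yet-published blog post
    except (urllib.error.URLError, OSError):
        return -1

def health_is_ok(raw_body: bytes) -> bool:
    """True when a /health response body matches {"status": "healthy"}."""
    return json.loads(raw_body).get("status") == "healthy"
```

Once `http_status(blog_url)` returns 200, re-running the validation script should clear the last external item.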
|
|