Submission Checklist

Required Narrative

Problem statement clearly states the capability gap: safe long-horizon polypharmacy action selection.
Environment describes observation, action, state, episode termination, and OpenEnv endpoints.
Agent capabilities cover med reconciliation, evidence, graph safety, dosing, candidate generation, planning, critique, and explanation.
Tasks cover DDI risk, safer substitutions, taper/deprescribing, precision dosing, missing-data recovery, and new-drug decomposition.
Reward/evaluation logic documents the 13 reward columns, 4 primary channels, anti-cheat checks, timeouts, and offline evaluation.
Post-training/self-improvement strategy documents SFT warm start, GRPO with environment rewards, ablations, adapter export, and post-save inference validation.

Required Deliverables

GitHub repo with all required links in README.
Hugging Face Space URL.
Colab notebook URL.
YouTube video URL or Hugging Face blog URL. The blog URL currently in the README is the intended target, but it returns 404 until the post is published.
Tracked plots and compact reports under docs/results/.
Successful docs/results/hf_space_verification.json with passed: true.
Participant-guide traceability map in docs/participant_guide_traceability.md.
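The `passed: true` requirement on docs/results/hf_space_verification.json can be spot-checked by hand before running the full gate. The following is a minimal sketch, not the project's own checker; the helper name `verification_passed` is hypothetical, and it assumes only that the file is a JSON object with a top-level boolean `passed` key.

```python
import json
from pathlib import Path


def verification_passed(path: Path) -> bool:
    """Return True only if the file exists, parses as JSON, and has passed == true."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A missing or unreadable file counts as "not passed".
        return False
    # Require the literal boolean true, not a truthy string such as "true".
    return isinstance(data, dict) and data.get("passed") is True


if __name__ == "__main__":
    report = Path("docs/results/hf_space_verification.json")
    print("passed" if verification_passed(report) else "missing or not passed")
```

Requiring the literal boolean avoids accepting a hand-edited file where `passed` was written as a string.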

Commands To Validate Before Submission

uv run pytest
uv run openenv validate .
bash scripts/bootstrap_openenv.sh --runtime-check
(cd app/ui/frontend && npm run build)
.venv/bin/python scripts/evaluate_baselines.py
.venv/bin/python scripts/evaluate_all.py
.venv/bin/python scripts/evaluate_compare_runs.py --baseline outputs/reports/baselines.json --candidate outputs/reports/benchmark_report.json --output outputs/reports/improvement_report.json
.venv/bin/python scripts/acceptance_gate.py
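The comparison command above takes a baseline report and a candidate report and writes outputs/reports/improvement_report.json, which the strict gate later expects to contain `improved: true`. As a rough illustration of that kind of comparison, here is a sketch under stated assumptions: the function name `compare_runs`, the `avg_reward` key, and the output fields other than `improved` are all hypothetical, and the real script's schema and logic may differ.

```python
def compare_runs(baseline: dict, candidate: dict, metric: str = "avg_reward") -> dict:
    """Build an improvement summary from two report dicts.

    ASSUMPTION: both reports expose the metric as a top-level number.
    The real report schema is not shown in this checklist.
    """
    base = float(baseline[metric])
    cand = float(candidate[metric])
    return {
        "metric": metric,
        "baseline": base,
        "candidate": cand,
        "delta": cand - base,
        # Strict inequality: an unchanged score is not an improvement.
        "improved": cand > base,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(compare_runs({"avg_reward": 0.5}, {"avg_reward": 0.75}))
```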

After the story artifact (the blog post or video) is published, run the opt-in live link checker:

uv run python scripts/validate_submission_links.py
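For readers who want to see what a live link check involves, the sketch below extracts URLs from a block of text and reports those that do not return a 2xx status. It is not the contents of scripts/validate_submission_links.py; the names `http_status` and `broken_links` are hypothetical, and it uses only the standard library. Some servers reject HEAD requests, so a real checker may need to fall back to GET.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Matches http(s) URLs up to whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL, or 0 if the request could not be made."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a blog post that is not yet published
    except (OSError, ValueError):
        # Covers DNS failures, timeouts, and malformed URLs.
        return 0


def broken_links(text: str, fetch: Callable[[str], int] = http_status) -> dict[str, int]:
    """Map each URL found in text to its status when that status is not 2xx."""
    urls = sorted({match.rstrip(".,;") for match in URL_PATTERN.findall(text)})
    statuses = {url: fetch(url) for url in urls}
    return {url: code for url, code in statuses.items() if not 200 <= code < 300}
```

Passing `fetch` as a parameter keeps the URL extraction testable offline: a dictionary lookup can stand in for the network call.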

Full Remote Training Evidence

export HF_TOKEN="<write-token>"
.venv/bin/python scripts/deploy_training_space.py \
  --repo-id TheJackBright/polyguard-openenv-training-full \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --hardware a10g-large \
  --model-sweep Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct \
  --sft-epochs 2 \
  --grpo-epochs 1 \
  --sft-max-steps 0 \
  --grpo-max-steps 0 \
  --grpo-max-prompts 0
.venv/bin/python scripts/pull_training_artifacts.py \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts
.venv/bin/python scripts/activate_sweep_model.py \
  --source sweep \
  --run-id qwen-qwen2-5-0-5b-instruct \
  --preferred-artifact grpo_adapter
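The `--run-id` value above, `qwen-qwen2-5-0-5b-instruct`, appears to be a slug of the model ID `Qwen/Qwen2.5-0.5B-Instruct` passed to `--model-sweep`. The sketch below reproduces that pattern so run IDs for the other swept models can be derived by hand. The mapping is inferred from the names in this checklist, not taken from the scripts, and `run_id_from_model` is a hypothetical helper.

```python
import re


def run_id_from_model(model_id: str) -> str:
    """Lowercase the model ID and collapse every non-alphanumeric run into one hyphen.

    ASSUMPTION: this mirrors the naming seen in this checklist; the training
    scripts may derive run IDs differently.
    """
    lowered = model_id.lower()
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


if __name__ == "__main__":
    for model in (
        "Qwen/Qwen2.5-0.5B-Instruct",
        "Qwen/Qwen2.5-1.5B-Instruct",
        "Qwen/Qwen2.5-3B-Instruct",
    ):
        print(model, "->", run_id_from_model(model))
```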

Final public artifacts should include hf_sweep_summary.json, anti_hacking_overfit_report.json, post-save inference reports, adapter evidence, active_model_manifest.json, and all relevant charts under docs/results/ and outputs/plots/. Current tracked evidence includes a 3-model SFT-baseline sweep plus a top-level environment-backed GRPO run. Only claim a full public per-model GRPO sweep after those private artifacts are pulled, mirrored, and documented.

Qwen 0.5B/1.5B Submission Evidence

.venv/bin/python scripts/generate_submission_evidence.py \
  --models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
  --episodes 8

The generated files live in:

docs/results/submission_evidence_qwen_0_5b_1_5b/
outputs/reports/submission_evidence/qwen_0_5b_1_5b/
outputs/plots/submission_evidence/qwen_0_5b_1_5b/
submission_bundle/qwen_0_5b_1_5b_evidence.zip

The current live evidence confirms remote completion of 0.5B/1.5B SFT, GRPO, GRPO post-save inference, and policy ablations. It marks the per-run GRPO files and checkpoints as pending because they have not yet been uploaded to the private artifact repo.

The implementation-ready active model bundle is available separately:

https://huggingface.co/TheJackBright/polyguard-openenv-training-full-artifacts/tree/main/usable_model_bundles/local-qwen-0-5b-active-smoke
submission_bundle/model_artifacts/local-qwen-0-5b-active-smoke/

It includes the local active Qwen 0.5B grpo_adapter, sft_adapter, merged model, manifests, and reports for immediate app integration while the full per-run remote sweep artifacts remain pending.

Deploy the evaluation-only HF Space without interrupting the training Space:

.venv/bin/python scripts/deploy_evidence_space.py \
  --repo-id TheJackBright/polyguard-openenv-evidence \
  --artifact-repo-id TheJackBright/polyguard-openenv-training-full-artifacts \
  --training-space-url https://thejackbright-polyguard-openenv-training-full.hf.space \
  --models qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct \
  --hardware cpu-basic

Strict Final Gate

export POLYGUARD_ENFORCE_SUBMISSION_LINKS=true
.venv/bin/python scripts/acceptance_gate.py

Strict mode is expected to pass only when all of the following hold:

README links are not placeholders.
docs/results/avg_reward.png and docs/results/policy_stack_avg_reward.png exist.
docs/results/hf_space_verification.json has passed: true.
outputs/reports/sft_trl_run.json has status: ok, non-zero examples, a non-empty artifact path, and uses trl_unsloth or trl_transformers.
outputs/reports/grpo_trl_run.json has status: ok, accepted backend, and non-empty artifact_path.
outputs/reports/postsave_inference.json does not use fallback_policy.
outputs/reports/improvement_report.json has improved: true.
outputs/reports/hf_sweep_summary.json has at least one completed non-fallback model row.
outputs/reports/anti_hacking_overfit_report.json has passed: true.
GET /policy/model_status reports the intended active run and artifact availability.
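Several of the conditions above are simple field checks on JSON reports, so they can be previewed before running the gate. The following sketch covers only a subset of the list and is not the logic of scripts/acceptance_gate.py. The function name `gate_failures`, the dictionary keys used to look up each report, and the `examples` and `backend` field names are assumptions; the checklist names `status`, `artifact_path`, `passed`, and `improved` but does not show the full schemas. It also assumes GRPO accepts the same backends as SFT.

```python
# Backends named in the checklist for the SFT report; reused for GRPO as an assumption.
ACCEPTED_BACKENDS = {"trl_unsloth", "trl_transformers"}


def gate_failures(reports: dict[str, dict]) -> list[str]:
    """Return a human-readable failure for each unmet condition (empty list = all met)."""
    failures: list[str] = []

    for name in ("sft_trl_run", "grpo_trl_run"):
        run = reports.get(name, {})
        if run.get("status") != "ok":
            failures.append(f"{name}: status is not ok")
        if not run.get("artifact_path"):
            failures.append(f"{name}: artifact_path is empty")
        if run.get("backend") not in ACCEPTED_BACKENDS:  # hypothetical key
            failures.append(f"{name}: backend is not accepted")

    # The non-zero examples requirement applies to the SFT report only.
    if int(reports.get("sft_trl_run", {}).get("examples") or 0) <= 0:  # hypothetical key
        failures.append("sft_trl_run: examples is zero")

    if reports.get("improvement_report", {}).get("improved") is not True:
        failures.append("improvement_report: improved is not true")

    for name in ("anti_hacking_overfit_report", "hf_space_verification"):
        if reports.get(name, {}).get("passed") is not True:
            failures.append(f"{name}: passed is not true")

    return failures
```

Returning a list of failures rather than a single boolean makes it easier to see every unmet condition in one run.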

Strict mode passed during the April 26, 2026 audit. It does not perform live HTTP status checks, so the final blog/video URL still needs explicit validation.

HF Auth Commands

./.venv/bin/hf auth login
./.venv/bin/hf auth whoami
export HF_SPACE_REPO_ID="TheJackBright/polyguard-openenv-workbench"

Use ./.venv/bin/hf, not the global hf binary.

Private HF training artifact repositories require authentication and should not be used as judge-facing public links unless they are made public or mirrored into the repository/Space documentation.