PolyGuard (OpenEnv implementation package)

Run all CLI commands from this directory (cd polyguard-rl). The repository root README.md carries the same submission narrative with paths adjusted for viewers landing on the GitHub repo home page.

Submission Links

GitHub Repo URL: https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK
HF Space URL: https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench
Colab Notebook URL: https://colab.research.google.com/github/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK/blob/master/polyguard-rl/PolyGuard_SFT_GRPO_One_Run_Runner.ipynb (see also notebooks/09_training_loop.ipynb for a modular training walkthrough)
YouTube Video URL: not used for this submission; the repository root README is the story artifact.
Story artifact: the repository root README.md is the final blog-style narrative and evidence map.

Shared Environment, Logs, And Scripts

The required environment files, training logs, and training scripts are shared in the repo and indexed in the Submission Artifact Index.

Environment/runtime: openenv.yaml, pyproject.toml, uv.lock, requirements*.txt, Dockerfile*, app/env/, server/app.py, and app/hf_space/Dockerfile.
Training scripts/notebooks: PolyGuard_SFT_GRPO_One_Run_Runner.ipynb, notebooks/09_training_loop.ipynb, scripts/train_sft_trl.py, scripts/train_grpo_trl.py, scripts/deploy_training_space.py, app/hf_space/training_runner.py, and app/training/.
Training logs/results: docs/results/final_submission_evidence/reports/, docs/results/sweeps/, docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/, and docs/results/qwen_completed_runs/reports/.
Final downloadable artifact Space: https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts.

Problem Statement

Polypharmacy decisions are long-horizon, partially observable, and safety-critical. PolyGuard is a research environment where an LLM agent selects constrained clinical actions, receives verifier-backed reward, and improves via SFT + GRPO rather than generic open-ended chat fine-tuning.

Environment

PolyGuardEnv exposes OpenEnv-style HTTP/WebSocket endpoints (/reset, /step, /state, /metadata, /schema, /mcp, /health, /ws). Sub-environments include DDI, bandit mining, regimen risk, precision dosing, longitudinal deprescribing, web-search missing data, alternative suggestion, and new-drug decomposition. See openenv.yaml, app/env/env_core.py, app/env/fastapi_app.py, and docs/environment_design.md.
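As a rough illustration of how a caller might talk to these endpoints, the sketch below wraps three of them in a thin client. Only the endpoint paths come from this README. The class name `PolyGuardClient`, the HTTP methods, the JSON payload shapes (`{"action": ...}` for `/step`, free-form options for `/reset`), and the base URL are assumptions; check `/schema` and app/env/fastapi_app.py for the real contract.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib import request

# Endpoint paths as listed in the README; everything else is illustrative.
ENDPOINTS = ("/reset", "/step", "/state", "/metadata", "/schema", "/mcp", "/health", "/ws")

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_transport(method: str, url: str, payload: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class PolyGuardClient:
    """Hypothetical thin client; not part of the PolyGuard package."""

    def __init__(self, base_url: str, transport: Transport = http_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport  # injectable so the client can be exercised offline

    def _url(self, path: str) -> str:
        if path not in ENDPOINTS:
            raise ValueError(f"unknown endpoint: {path}")
        return self.base_url + path

    def health(self) -> Dict[str, Any]:
        # Assumed: /health answers a plain GET.
        return self.transport("GET", self._url("/health"), None)

    def reset(self, **options: Any) -> Dict[str, Any]:
        # Assumed: /reset accepts a JSON object of options via POST.
        return self.transport("POST", self._url("/reset"), options)

    def step(self, action: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed: /step accepts {"action": ...} via POST.
        return self.transport("POST", self._url("/step"), {"action": action})
```

With a server running locally, usage would look like `PolyGuardClient("http://localhost:8000").reset()`, where the host and port are placeholders. The transport is passed in as a function so the request shapes can be checked without a live environment.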

Agent Capabilities

The agent stack covers medication reconciliation, evidence retrieval, graph safety, dosing guardrails, candidate generation, supervisor routing, a planner/critic stack, explanations, and contextual bandit ranking for ablations (app/agents/, docs/agents.md).

Tasks

Tasks cover DDI risk reduction, safe adds/substitutions, regimen optimization, taper/deprescribing sequences, precision dosing, missing-data recovery, and new-drug decomposition (data/scenarios/, app/env/catalog.py).

Reward Model / Evaluation Logic

Thirteen verifier-backed reward components roll up into four primary channels (safety_legality, clinical_improvement, dosing_quality, process_integrity), clamped to [0.001, 0.999], with anti-cheat and timeout logic (app/env/reward_router.py, app/env/anti_cheat.py, docs/reward_design.md).
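The clamping step can be sketched as follows. The four channel names and the [0.001, 0.999] bounds come from the text above. How the channels are combined into a scalar is not stated here, so the equal-weight mean and the function names `clamp` and `combine_channels` are assumptions for illustration only; app/env/reward_router.py is the authoritative logic.

```python
from typing import Mapping, Optional

# Channel names and bounds as given in the README.
CHANNELS = ("safety_legality", "clinical_improvement", "dosing_quality", "process_integrity")
REWARD_MIN, REWARD_MAX = 0.001, 0.999


def clamp(value: float, low: float = REWARD_MIN, high: float = REWARD_MAX) -> float:
    """Keep a reward strictly inside the open interval (0, 1)."""
    return max(low, min(high, value))


def combine_channels(
    channels: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of the four primary channels, then clamp.

    The weighted mean is an assumed aggregation, not the documented one.
    """
    missing = [name for name in CHANNELS if name not in channels]
    if missing:
        raise KeyError(f"missing channels: {missing}")
    if weights is None:
        weights = {name: 1.0 for name in CHANNELS}  # assumed equal weighting
    total_weight = sum(weights[name] for name in CHANNELS)
    raw = sum(weights[name] * channels[name] for name in CHANNELS) / total_weight
    return clamp(raw)
```

Clamping away from exactly 0 and 1 keeps every episode's reward finite and non-degenerate, which is a common reason to pick bounds like these; whether PolyGuard clamps per component, per channel, or only on the final value is not specified in this README.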

Training And Post-Training Strategy

The pipeline runs in this order: build corpora (scripts/bootstrap_data.py, scripts/build_training_corpus.py), run SFT with TRL (scripts/train_sft_trl.py), run GRPO with environment reward (scripts/train_grpo_trl.py), merge adapters (scripts/merge_adapters_safe.py), validate inference (scripts/test_inference_postsave.py), then evaluate and plot (scripts/evaluate_*.py, docs/results/). Optional HF GPU training uses scripts/deploy_training_space.py. Public reviewers should start with the repository root README.md, then read docs/training.md for implementation notes.
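The ordering of the stages can be captured in a small driver, sketched below. The script paths are the ones named in this section. The driver itself (`PIPELINE`, `build_commands`, `run_pipeline`) is hypothetical, and it passes no command-line arguments because none are documented here; the real scripts may require flags, so the default is a dry run that only prints the commands. The evaluation stage is left out because it is given only as the pattern scripts/evaluate_*.py.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Stage order as described in the README; arguments are intentionally omitted.
PIPELINE: Sequence[str] = (
    "scripts/bootstrap_data.py",
    "scripts/build_training_corpus.py",
    "scripts/train_sft_trl.py",
    "scripts/train_grpo_trl.py",
    "scripts/merge_adapters_safe.py",
    "scripts/test_inference_postsave.py",
)


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one argv list per stage, in pipeline order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(root: Path = Path("."), dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = build_commands()
    for command in commands:
        if dry_run:
            print(" ".join(command))
            continue
        # check=True raises CalledProcessError so later stages never run on bad input.
        subprocess.run(command, cwd=root, check=True)
    return commands
```

Calling `run_pipeline()` from the polyguard-rl directory prints the six commands in order; `run_pipeline(dry_run=False)` would execute them, assuming each script runs without extra arguments.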