# PolyGuard (OpenEnv implementation package)
Run all CLI commands from this directory (`cd polyguard-rl`). The repository root [`README.md`](../README.md) carries the same submission narrative with paths adjusted for viewers landing on the GitHub repo home page.
## Submission Links
- GitHub Repo URL: [https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK](https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK)
- HF Space URL: [https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench](https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench)
- Colab Notebook URL: [https://colab.research.google.com/github/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK/blob/master/polyguard-rl/PolyGuard_SFT_GRPO_One_Run_Runner.ipynb](https://colab.research.google.com/github/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK/blob/master/polyguard-rl/PolyGuard_SFT_GRPO_One_Run_Runner.ipynb) (see also `notebooks/09_training_loop.ipynb` for a modular training walkthrough)
- YouTube Video URL: not used for this submission; the repository root README is the story artifact.
- Story artifact: the repository root [`README.md`](../README.md) is the final blog-style narrative and evidence map.
## Shared Environment, Logs, And Scripts
The required environment files, training logs, and training scripts are shared
in the repo and indexed in [Submission Artifact Index](docs/submission_artifacts.md).
- Environment/runtime: `openenv.yaml`, `pyproject.toml`, `uv.lock`, `requirements*.txt`, `Dockerfile*`, `app/env/`, `server/app.py`, and `app/hf_space/Dockerfile`.
- Training scripts/notebooks: `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb`, `notebooks/09_training_loop.ipynb`, `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, `scripts/deploy_training_space.py`, `app/hf_space/training_runner.py`, and `app/training/`.
- Training logs/results: `docs/results/final_submission_evidence/reports/`, `docs/results/sweeps/`, `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/`, and `docs/results/qwen_completed_runs/reports/`.
- Final downloadable artifact Space: [https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts](https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts).
## Problem Statement
Polypharmacy decisions are long-horizon, partially observable, and safety-critical. PolyGuard is a research environment where an LLM agent selects constrained clinical actions, receives verifier-backed reward, and improves via SFT + GRPO, rather than via generic open-ended chat fine-tuning.
## Environment
`PolyGuardEnv` exposes OpenEnv-style HTTP/WebSocket endpoints (`/reset`, `/step`, `/state`, `/metadata`, `/schema`, `/mcp`, `/health`, `/ws`). Sub-environments include DDI, bandit mining, regimen risk, precision dosing, longitudinal deprescribing, web-search missing data, alternative suggestion, and new-drug decomposition. See `openenv.yaml`, `app/env/env_core.py`, `app/env/fastapi_app.py`, and `docs/environment_design.md`.
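For illustration, the request/response loop against these HTTP endpoints can be sketched as a minimal Python client. The base URL, payload shapes, and response keys (`reward`, `done`) are assumptions made for the sketch, not the documented wire format; `app/env/fastapi_app.py` and `docs/environment_design.md` define the real schemas.

```python
import json
from urllib import request

# Assumed local deployment address; adjust to your server.
BASE_URL = "http://localhost:8000"

def build_request(path: str, payload: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenEnv-style POST endpoint."""
    return f"{BASE_URL}{path}", json.dumps(payload).encode("utf-8")

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply."""
    url, body = build_request(path, payload)
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

def rollout(choose_action, max_steps: int = 8) -> float:
    """Reset, then step until the env reports done, summing per-step reward."""
    obs = post("/reset", {})
    total = 0.0
    for _ in range(max_steps):
        step = post("/step", {"action": choose_action(obs)})
        total += float(step.get("reward", 0.0))
        if step.get("done"):
            break
        obs = step
    return total
```

A trivial policy can then be exercised against a locally running server with something like `rollout(lambda obs: {"tool": "noop"})`.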
## Agent Capabilities
Medication reconciliation, evidence retrieval, graph safety, dosing guardrails, candidate generation, supervisor routing, planner/critic stack, explanations, and contextual bandit ranking for ablations (`app/agents/`, `docs/agents.md`).
## Tasks
DDI risk reduction, safe adds/substitutions, regimen optimization, taper/deprescribing sequences, precision dosing, missing-data recovery, and new-drug decomposition (`data/scenarios/`, `app/env/catalog.py`).
## Reward Model / Evaluation Logic
Thirteen verifier-backed reward components roll up into four primary channels (`safety_legality`, `clinical_improvement`, `dosing_quality`, `process_integrity`), clamped to `[0.001, 0.999]`, with anti-cheat and timeout logic (`app/env/reward_router.py`, `app/env/anti_cheat.py`, `docs/reward_design.md`).
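To make the rollup concrete, the clamped channel aggregation can be sketched as below. The component names, equal weighting, and mean aggregation are illustrative assumptions only; the actual weighting, anti-cheat, and timeout logic live in `app/env/reward_router.py` and `app/env/anti_cheat.py`.

```python
# Illustrative channel -> component mapping; NOT the repo's actual config.
CHANNELS = {
    "safety_legality": ["ddi_severity", "contraindication", "legality"],
    "clinical_improvement": ["symptom_delta", "guideline_adherence"],
    "dosing_quality": ["dose_range", "renal_adjustment"],
    "process_integrity": ["citation_validity", "format_compliance"],
}

def clamp(x: float, lo: float = 0.001, hi: float = 0.999) -> float:
    """Clamp a score into the open-ish interval [0.001, 0.999]."""
    return max(lo, min(hi, x))

def roll_up(component_scores: dict[str, float]) -> dict[str, float]:
    """Average each channel's verifier components, then clamp the result."""
    out = {}
    for channel, parts in CHANNELS.items():
        vals = [component_scores.get(p, 0.0) for p in parts]
        out[channel] = clamp(sum(vals) / len(vals))
    return out
```

The clamp keeps every channel strictly inside (0, 1), which avoids degenerate zero or saturated rewards downstream in GRPO.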
## Training And Post-Training Strategy
Build corpora (`scripts/bootstrap_data.py`, `scripts/build_training_corpus.py`), SFT with TRL (`scripts/train_sft_trl.py`), GRPO with environment reward (`scripts/train_grpo_trl.py`), merge adapters (`scripts/merge_adapters_safe.py`), validate inference (`scripts/test_inference_postsave.py`), evaluate and plot (`scripts/evaluate_*.py`, `docs/results/`). Optional HF GPU training uses `scripts/deploy_training_space.py`; public review should start with the repository root [`README.md`](../README.md), then `docs/training.md` for implementation notes.
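As one concrete piece of the GRPO stage, the group-relative advantage over environment rewards can be sketched as follows. This is a generic textbook formulation, not the TRL or repo implementation, and the `eps` stabilizer is an assumption.

```python
import statistics

def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's
    environment reward against its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above their group's mean get positive advantage and are reinforced; a group with identical rewards yields all-zero advantages and contributes no gradient signal.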
## Documentation Index
- [Architecture](docs/architecture.md)
- [Environment](docs/environment_design.md)
- [Rewards](docs/reward_design.md)
- [Training](docs/training.md)
- [Evaluation](docs/evaluation.md)
- [Deployment](docs/deployment.md)
- [Datasets](docs/datasets.md)
- [Participant guide traceability](docs/participant_guide_traceability.md)
- [Idea doc vs implementation](docs/idea_document_traceability.md)
- [Submission artifact index](docs/submission_artifacts.md)
- [Space UI demo script](docs/DEMO_RECORDING_SCRIPT.md)