---
title: PolyGuard OpenEnv
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8100
pinned: false
---

POLYGUARD-OPENENV

PolyGuard is an OpenEnv-compatible reinforcement-learning environment for polypharmacy safety, medication optimization, deprescribing, and precision dosing. The project turns medication decision-making into a stateful environment where an LLM agent observes a patient/regimen state, chooses constrained clinical actions, receives verifier-backed rewards, and improves through TRL/GRPO-style post-training.

Clinical safety note: this is a research environment and demo system for RL environment design. It is not a medical device and must not be used for patient care.

Submission Links

Current Readiness

Verified locally:

  • uv run pytest: 36 tests passed during the audit pass.
  • uv run openenv validate .: local OpenEnv packaging passed.
  • bash scripts/bootstrap_openenv.sh --runtime-check: runtime OpenEnv HTTP contract passed when localhost access was allowed.
  • npm run build in app/ui/frontend: production UI build passed.

Still required for final judge-ready submission:

  • Authenticate Hugging Face with ./.venv/bin/hf auth login.
  • Deploy and verify the HF Space.
  • Run real TRL/Unsloth SFT and GRPO on GPU/Colab so reports no longer show fallback paths.
  • Replace docs/results/hf_space_verification.json with a successful verification payload.
  • Regenerate final plots and reports with improvement_report.improved == true.
  • Run strict readiness: POLYGUARD_ENFORCE_SUBMISSION_LINKS=true ./.venv/bin/python scripts/acceptance_gate.py.

Problem Statement

Polypharmacy decisions are long-horizon, partially observable, and safety-critical. A useful LLM agent must do more than produce a plausible recommendation: it should identify drug-drug interaction risk, reason over comorbidities and labs, choose safe substitutions or deprescribing sequences, request review when uncertain, and expose why it acted.

PolyGuard targets the OpenEnv World Modeling / Professional Tasks theme, with multi-agent and self-improvement elements. It asks whether environment-backed feedback can make a model better at safe medication action selection than prompt-only or rule-only baselines.

Environment

The environment is implemented by PolyGuardEnv and exposed through FastAPI/OpenEnv-compatible endpoints:

  • POST /reset
  • POST /step
  • GET /state
  • GET /metadata
  • GET /schema
  • POST /mcp
  • GET /health
  • Backward-compatible aliases under /env/* plus /ws
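
A minimal episode loop against these endpoints might look like the following sketch. The payload field names (`observation`, `action`, `reward`, `done`) are assumptions about the response shape, not the confirmed schema, and the stub transport stands in for a real HTTP client such as requests or httpx.

```python
from typing import Callable

BASE_URL = "http://localhost:8100"  # matches app_port in the Space metadata

def run_episode(post: Callable[[str, dict], dict], max_steps: int = 5) -> float:
    """Drive one episode via POST /reset and POST /step.

    `post` is any callable that sends a JSON body to a path and returns the
    parsed JSON response; swap in a live HTTP client to hit a real server.
    """
    obs = post("/reset", {})
    total_reward = 0.0
    for _ in range(max_steps):
        # A real agent would choose an action from obs; the action name here
        # is an illustrative assumption, not a documented action type.
        result = post("/step", {"action": {"name": "KEEP_REGIMEN"}})
        total_reward += result.get("reward", 0.0)
        if result.get("done"):
            break
    return total_reward

# Stub transport so the sketch runs without a server.
def fake_post(path: str, body: dict) -> dict:
    if path == "/reset":
        return {"observation": {"patient": "demo"}}
    return {"reward": 0.5, "done": True}

print(run_episode(fake_post))  # prints 0.5 with the stub transport
```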

OpenEnv packaging lives at repo root:

  • openenv.yaml
  • __init__.py
  • client.py
  • models.py
  • server/app.py

Each episode samples a patient/regimen scenario and a sub-environment:

  • DDI
  • BANDIT_MINING
  • REGIMEN_RISK
  • PRECISION_DOSING
  • LONGITUDINAL_DEPRESCRIBING
  • WEB_SEARCH_MISSING_DATA
  • ALTERNATIVE_SUGGESTION
  • NEW_DRUG_DECOMPOSITION

Difficulty tracks are available as easy, medium, and hard scenario sets.
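
Episode sampling can be pictured as a draw over sub-environment and difficulty track. The names below mirror the lists above, but uniform sampling is an illustrative assumption; the real environment may weight tracks differently.

```python
import random

SUB_ENVS = [
    "DDI", "BANDIT_MINING", "REGIMEN_RISK", "PRECISION_DOSING",
    "LONGITUDINAL_DEPRESCRIBING", "WEB_SEARCH_MISSING_DATA",
    "ALTERNATIVE_SUGGESTION", "NEW_DRUG_DECOMPOSITION",
]
DIFFICULTIES = ["easy", "medium", "hard"]

def sample_scenario(seed=None):
    """Sample one (sub_env, difficulty) pair uniformly -- an assumption,
    not the project's actual scenario sampler."""
    rng = random.Random(seed)
    return {"sub_env": rng.choice(SUB_ENVS), "difficulty": rng.choice(DIFFICULTIES)}

scenario = sample_scenario(seed=7)
print(scenario["sub_env"] in SUB_ENVS and scenario["difficulty"] in DIFFICULTIES)
```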

Agent Capabilities

The agent stack is deliberately decomposed so reward, safety, and explanation can be inspected:

  • Medication reconciliation
  • Evidence retrieval and missing-data recovery
  • Graph safety analysis for DDI and side effects
  • Dosing guardrails
  • Candidate generation
  • Supervisor routing between regimen, dose, and review modes
  • Planner policy selection
  • Critic safety veto
  • Explanation generation
  • Contextual bandit ranking for policy-stack ablations
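
The contextual bandit ranker at the bottom of this stack could be sketched as epsilon-greedy over policy-stack arms; the arm names and the epsilon value below are illustrative assumptions, not the project's actual ablation setup.

```python
import random

class EpsilonGreedyRanker:
    """Rank policy-stack variants by running-mean reward, exploring with probability epsilon."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in arms}
        self.means = {a: 0.0 for a in arms}

    def select(self):
        # Explore a random arm with probability epsilon, else exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.means, key=self.means.get)

    def update(self, arm, reward):
        # Incremental mean update.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

ranker = EpsilonGreedyRanker(["full_stack", "no_critic", "planner_only"], epsilon=0.0, seed=0)
ranker.update("full_stack", 0.9)
ranker.update("no_critic", 0.4)
print(ranker.select())  # full_stack (greedy with epsilon=0)
```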

Tasks

PolyGuard evaluates these action-selection tasks:

  • Find bad drug combinations and reduce DDI/polypharmacy side-effect risk.
  • Recommend safe adds, substitutions, and alternatives.
  • Optimize regimens under uncertainty.
  • Produce taper/deprescribing sequences over time.
  • Choose precision dosing actions when organ function or dose sensitivity matters.
  • Fetch evidence when critical data is missing.
  • Decompose a new drug into components for first-pass safety reasoning.

Reward Model / Evaluation Logic

Rewards are verifier-backed and clamped to [0.001, 0.999]. The environment exposes 13 detailed reward columns and 4 primary channels:

  • safety_legality
  • clinical_improvement
  • dosing_quality
  • process_integrity

Reward logic combines:

  • Legal action checks
  • Safety delta and burden improvement
  • Dosing quality
  • Abstention quality under uncertainty
  • Format compliance
  • Process fidelity
  • Explanation grounding
  • Anti-cheat and timeout penalties
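
A hedged sketch of how such component scores might be combined into a clamped scalar reward. The channel names match the list above, but the uniform weights and simple weighted sum are assumptions, not the project's actual reward formula; only the [0.001, 0.999] clamp comes from the text.

```python
def clamp(x, lo=0.001, hi=0.999):
    """Clamp reward into [0.001, 0.999], as the environment specifies."""
    return max(lo, min(hi, x))

def combine_reward(channels, weights=None):
    """Weighted sum of per-channel scores; uniform weights are an assumption."""
    weights = weights or {k: 1.0 / len(channels) for k in channels}
    raw = sum(weights[k] * channels[k] for k in channels)
    return clamp(raw)

r = combine_reward({
    "safety_legality": 1.0,
    "clinical_improvement": 0.8,
    "dosing_quality": 0.6,
    "process_integrity": 1.0,
})
print(r)  # approximately 0.85 with uniform weights
```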

Anti-hacking checks block repeated action loops, review abuse, keep-regimen abuse, candidate ID mismatches, parser exploit patterns, and unsafe no-op behavior on known holdout DDIs.
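
One of these checks, flagging repeated action loops, could be sketched as a simple history window; the window size and repeat threshold below are illustrative assumptions, and the real check may also consider action arguments and state deltas.

```python
from collections import Counter

def repeated_action_loop(history, window=6, max_repeats=3):
    """Flag an episode when any single action dominates the recent window."""
    recent = history[-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count > max_repeats

print(repeated_action_loop(["REVIEW"] * 5))               # True: review-abuse pattern
print(repeated_action_loop(["SWAP", "TAPER", "REVIEW"]))  # False: no dominant action
```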

Training And Post-Training Strategy

The intended pipeline is:

  1. Build data assets from local knowledge, synthetic patients, scenario rollouts, optional HF instruction data, optional DDI API augmentation, and optional web fallback.
  2. Run SFT with TRL and optional Unsloth/QLoRA acceleration to teach action-selection format.
  3. Run GRPO with environment-backed reward verification.
  4. Track per-component reward columns and sampled generations.
  5. Run policy-stack ablations against baselines.
  6. Merge/export adapters safely.
  7. Validate post-save inference from the exported artifact.
  8. Deploy the OpenEnv environment to Hugging Face Spaces.

Core commands:

cd polyguard-rl
bash scripts/bootstrap_venv.sh
.venv/bin/python scripts/bootstrap_data.py
.venv/bin/python scripts/build_training_corpus.py --profile small --with-local --with-synthetic --with-hf
.venv/bin/python scripts/train_sft_trl.py --model-id Qwen/Qwen2.5-1.5B-Instruct --epochs 1 --max-steps 20 --use-unsloth
.venv/bin/python scripts/train_grpo_trl.py --model-id Qwen/Qwen2.5-1.5B-Instruct --max-steps 20 --num-generations 2 --use-unsloth
.venv/bin/python scripts/merge_adapters_safe.py --adapter-dir checkpoints/sft_adapter --output-dir checkpoints/merged
.venv/bin/python scripts/test_inference_postsave.py --samples 3
.venv/bin/python scripts/evaluate_all.py

Results

Tracked smoke/evaluation artifacts are mirrored in docs/results/ because outputs/ and checkpoints/ are intentionally git-ignored.

Plots tracked in docs/results/: average reward; policy-stack average reward.

Current smoke reports show the environment, evaluation, and plotting paths are wired, but final training is not yet judge-ready:

  • docs/results/sft_trl_run.json currently records a fallback backend.
  • docs/results/grpo_trl_run.json currently records an environment-reward fallback path.
  • docs/results/postsave_inference.json currently uses fallback inference.
  • docs/results/improvement_report.json currently records no positive improvement.
  • docs/results/hf_space_verification.json is blocked until HF auth/deployment succeeds.

Final submission should replace these with real GPU/Colab TRL/Unsloth artifacts.

Dataset Gather

Implemented data generation and packaging covers:

  • Normalized drug vocabulary and class tables
  • Interaction graph edges
  • Burden, taper, renal, hepatic, duplicate-therapy, and substitution rules
  • Synthetic patients
  • Easy/medium/hard scenario files
  • Retrieval corpus and local evidence index
  • Unified SFT and GRPO prompt corpora

The current local corpus summary is in data/processed/training_corpus_summary.json when generated.

Deployment

Use the repository-local HF CLI entrypoint. The global hf command on this machine is known to be incompatible with its installed Typer version.

./.venv/bin/hf auth login
./.venv/bin/hf auth whoami
export HF_SPACE_REPO_ID="Vishwa-docs/polyguard-openenv"
bash scripts/deploy_space.sh --repo-id "$HF_SPACE_REPO_ID"
./.venv/bin/hf spaces info "$HF_SPACE_REPO_ID"
openenv validate --url "https://Vishwa-docs-polyguard-openenv.hf.space"

After deployment, save the successful Space info plus OpenEnv validation payload into docs/results/hf_space_verification.json.
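
Merging the two payloads into that tracked artifact could look like the following sketch; the field names inside the combined payload are assumptions about what the Space info and OpenEnv validation output contain.

```python
import json
from pathlib import Path

def save_verification(space_info, validation,
                      out_path="docs/results/hf_space_verification.json"):
    """Merge Space info and OpenEnv validation output into one tracked JSON artifact."""
    payload = {"space_info": space_info, "openenv_validation": validation}
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))
    return path
```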

Strict Submission Gate

Non-strict local readiness:

.venv/bin/python scripts/acceptance_gate.py

Final submission readiness:

export POLYGUARD_ENFORCE_SUBMISSION_LINKS=true
.venv/bin/python scripts/acceptance_gate.py

Strict mode fails unless README links are real, tracked plots exist, HF Space verification passed, SFT/GRPO used real TRL/Unsloth paths, post-save inference uses the exported artifact, and measured improvement is positive.
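
The gate's strict/non-strict split can be sketched as follows. The check names mirror the conditions in the paragraph above, but their grouping into core versus strict-only sets is an assumption about the real scripts/acceptance_gate.py logic.

```python
import os

def strict_mode_enabled(env=None):
    """True when POLYGUARD_ENFORCE_SUBMISSION_LINKS is set to a truthy string."""
    env = env if env is not None else os.environ
    return env.get("POLYGUARD_ENFORCE_SUBMISSION_LINKS", "").lower() in {"1", "true", "yes"}

def gate(checks, strict):
    """Non-strict mode requires only core checks; strict mode requires all of them."""
    core = ["tests_pass", "openenv_packaging"]
    strict_only = ["submission_links", "hf_space_verified",
                   "real_trl_paths", "improvement_positive"]
    required = core + (strict_only if strict else [])
    return all(checks.get(name, False) for name in required)

checks = {"tests_pass": True, "openenv_packaging": True, "submission_links": False}
print(gate(checks, strict=False))  # True
print(gate(checks, strict=True))   # False until strict-only checks pass
```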

Documentation

Future Work

  • Medicine image/barcode ingestion for regimen capture
  • Larger model GRPO sweeps
  • Stronger real-world drug-label ingestion and calibration
  • More clinician-facing explanation studies
  • Published HF blog or short video walkthrough

License

MIT