Environment Design
PolyGuardEnv is a deterministic, seeded OpenEnv-style simulation for medication action selection under partial observability.
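One common way to make a simulation deterministic is to route every random draw in an episode through a single generator seeded at reset. The sketch below illustrates that idea under this assumption. `Patient`, `DRUG_POOL`, and `sample_patient` are hypothetical names for illustration only and are not part of PolyGuardEnv's documented API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical, heavily simplified patient record for illustration only.
    age: int
    egfr: float  # estimated glomerular filtration rate, mL/min/1.73m^2
    medications: tuple[str, ...]


# Hypothetical drug pool; a real environment would draw from its own catalog.
DRUG_POOL = ("warfarin", "amiodarone", "metformin", "lisinopril", "simvastatin")


def sample_patient(seed: int) -> Patient:
    # All randomness flows through one local, seeded generator, so the same
    # seed always yields the same patient regardless of global RNG state.
    rng = random.Random(seed)
    n_meds = rng.randint(2, 4)
    return Patient(
        age=rng.randint(65, 95),
        egfr=round(rng.uniform(15.0, 90.0), 1),
        medications=tuple(sorted(rng.sample(DRUG_POOL, n_meds))),
    )


if __name__ == "__main__":
    # Two resets with the same seed produce identical episodes.
    assert sample_patient(42) == sample_patient(42)
    print(sample_patient(42))
```

Using a local `random.Random` instance rather than the module-level `random` functions keeps episodes reproducible even if other code consumes or reseeds the global generator.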
State
The state tracks:
- patient demographics
- medications
- labs
- vitals
- comorbidities
- specialist conflicts
- action history
- cumulative reward
- difficulty
- sub-environment
- burden score
- risky-pair summary
- unresolved safety conflicts
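The state can be pictured as a single mutable record holding every tracked field. The dataclass below is a hypothetical sketch: the field names and types are assumptions chosen to mirror the fields listed above, not PolyGuardEnv's actual schema, and `record` is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class PolyGuardState:
    # Hypothetical field names mirroring the state description; types are assumptions.
    demographics: dict[str, object]
    medications: list[str]
    labs: dict[str, float]
    vitals: dict[str, float]
    comorbidities: list[str]
    specialist_conflicts: list[str] = field(default_factory=list)
    action_history: list[str] = field(default_factory=list)
    cumulative_reward: float = 0.0
    difficulty: str = "easy"
    sub_environment: str = "default"
    burden_score: float = 0.0
    risky_pairs: list[tuple[str, str]] = field(default_factory=list)
    unresolved_conflicts: list[str] = field(default_factory=list)

    def record(self, action_id: str, reward: float) -> None:
        # Append the taken action and accumulate its reward.
        self.action_history.append(action_id)
        self.cumulative_reward += reward


if __name__ == "__main__":
    s = PolyGuardState(
        demographics={"age": 78},
        medications=["warfarin", "amiodarone"],
        labs={"inr": 3.4},
        vitals={"sbp": 110.0},
        comorbidities=["atrial fibrillation"],
    )
    s.record("hold_dose:warfarin", 0.5)
    print(s.action_history, s.cumulative_reward)
```

`field(default_factory=list)` gives each state its own list; a bare `[]` default would be rejected by `dataclass` precisely because it would be shared across instances.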
Observation
The observation exposes only the agent-facing view:
- patient summary
- medication table
- comorbidities
- organ function and labs/vitals
- graph safety summary
- burden summary
- precision dosing flags
- unresolved conflicts
- candidate action set
- step budget
- action history
- warning summary
- abstention indicators
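Partial observability means the agent receives a projection of the state rather than the state itself. A minimal sketch, assuming the state is a plain dictionary and the observation is built from an explicit allow-list of keys. The key names, including the hidden `true_adverse_event_risk` field, are hypothetical.

```python
import copy
from typing import Any

# Hypothetical allow-list: only these state keys reach the agent.
OBSERVABLE_KEYS = (
    "patient_summary",
    "medications",
    "comorbidities",
    "labs",
    "vitals",
    "unresolved_conflicts",
    "candidate_actions",
    "steps_remaining",
    "action_history",
)


def observe(state: dict[str, Any]) -> dict[str, Any]:
    # Deep-copy allow-listed fields only, so the agent cannot mutate the
    # underlying state through its observation. Any key not on the list
    # stays hidden.
    return {k: copy.deepcopy(state[k]) for k in OBSERVABLE_KEYS if k in state}


if __name__ == "__main__":
    state = {
        "medications": ["warfarin", "amiodarone"],
        "steps_remaining": 5,
        "true_adverse_event_risk": 0.8,  # hypothetical hidden variable
    }
    print(observe(state))
```

An allow-list fails closed: a new field added to the state stays hidden until someone deliberately exposes it.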
Actions
Each action must be a valid PolyGuardAction and must reference one of the generated candidate IDs in the candidate action set. The agent can:
- keep the current regimen
- stop a drug
- substitute a drug within its class
- recommend alternatives
- adjust the dose bucket
- initiate or continue a taper
- hold a dose
- order monitoring
- fetch evidence
- decompose a new drug
- request specialist or pharmacist review
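The following sketch shows how an action could be checked against both an action type and the current candidate set. `ActionType` paraphrases the action kinds described for this environment. `Action` is a hypothetical stand-in for PolyGuardAction: its shape (a candidate ID plus a type) is an assumption, not the project's actual model.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    # Paraphrases the documented action kinds; string values are hypothetical.
    KEEP = "keep_regimen"
    STOP = "stop_drug"
    SUBSTITUTE = "substitute_within_class"
    RECOMMEND_ALTERNATIVE = "recommend_alternative"
    ADJUST_DOSE = "adjust_dose_bucket"
    TAPER = "taper"
    HOLD = "hold_dose"
    MONITOR = "order_monitoring"
    FETCH_EVIDENCE = "fetch_evidence"
    DECOMPOSE_NEW_DRUG = "decompose_new_drug"
    REQUEST_REVIEW = "request_review"


@dataclass(frozen=True)
class Action:
    # Hypothetical stand-in for PolyGuardAction.
    candidate_id: str
    action_type: ActionType


def validate(action: Action, candidates: dict[str, ActionType]) -> bool:
    # Valid only if the ID was generated as a candidate and the submitted
    # type matches the type attached to that candidate.
    return candidates.get(action.candidate_id) == action.action_type


if __name__ == "__main__":
    candidates = {"c1": ActionType.HOLD, "c2": ActionType.MONITOR}
    print(validate(Action("c1", ActionType.HOLD), candidates))   # True
    print(validate(Action("c9", ActionType.HOLD), candidates))   # False: unknown ID
```

Restricting the agent to pre-generated IDs means it can never construct an action the environment did not offer, which also makes "repeated invalid actions" easy to count.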
Episode End Conditions
Episodes terminate on any of the following:
- exploit detection
- exhausted step budget
- repeated invalid actions
- justified review escalation
- safety-veto threshold reached
- patient destabilization
- safe resolution
- per-step timeout
- episode wall-clock timeout
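The end conditions above can be evaluated as an ordered series of checks that returns the first matching reason. This is a sketch under stated assumptions: the `EpisodeStatus` fields, thresholds, and check order are hypothetical. The per-step timeout is left out because it would more naturally wrap the step call itself than be read from a status record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeStatus:
    # Hypothetical counters and flags; names and default thresholds are assumptions.
    exploit_detected: bool = False
    steps_used: int = 0
    step_budget: int = 10
    invalid_action_streak: int = 0
    max_invalid_actions: int = 3
    justified_escalation: bool = False
    safety_vetoes: int = 0
    veto_threshold: int = 2
    destabilized: bool = False
    all_conflicts_resolved: bool = False
    elapsed_s: float = 0.0
    episode_timeout_s: float = 300.0


def termination_reason(s: EpisodeStatus) -> Optional[str]:
    # Checked in a fixed order so that, when several conditions hold at once,
    # the reported reason is deterministic. Failures are checked first.
    checks = [
        (s.exploit_detected, "exploit_detected"),
        (s.destabilized, "patient_destabilized"),
        (s.safety_vetoes >= s.veto_threshold, "safety_veto_threshold"),
        (s.invalid_action_streak >= s.max_invalid_actions, "repeated_invalid_actions"),
        (s.elapsed_s >= s.episode_timeout_s, "episode_timeout"),
        (s.justified_escalation, "justified_review_escalation"),
        (s.all_conflicts_resolved, "safe_resolution"),
        (s.steps_used >= s.step_budget, "step_budget_exhausted"),
    ]
    for triggered, reason in checks:
        if triggered:
            return reason
    return None  # the episode continues


if __name__ == "__main__":
    print(termination_reason(EpisodeStatus(steps_used=10)))
```

Placing safety failures ahead of safe resolution means a step that resolves every conflict but also trips the veto threshold is reported as a veto, not a success.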
OpenEnv Surface
The runtime exposes the /reset, /step, /state, /metadata, /schema, /mcp, /health, and /ws endpoints, plus /env/* compatibility endpoints. Run `openenv validate .` to validate packaging, or `openenv validate --url ...` to validate a running server.
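The standard-library sketch below shows one way a client might address these endpoints. It rests on several assumptions: /reset and /step accept JSON POST bodies, /state and /health respond to GET, the base URL is hypothetical, and the step payload shape (an `"action"` key wrapping a candidate ID) is illustrative only. The /schema endpoint on a running server is the place to confirm the real request format.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, endpoint: str,
                  payload: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    # GET when there is no body; POST with a JSON body otherwise.
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    if payload is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:8000"  # hypothetical server address
    reset_req = build_request(base, "/reset", {})
    # Hypothetical payload shape; check /schema for the real one.
    step_req = build_request(base, "/step", {"action": {"candidate_id": "c1"}})
    print(reset_req.get_method(), reset_req.full_url)
    print(step_req.get_method(), step_req.full_url, step_req.data)
```

Separating request construction from sending keeps the URL and payload logic testable without a live server; `call(reset_req)` performs the actual round trip once a server is running.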