# Environment Design

`PolyGuardEnv` is a deterministic, seeded OpenEnv-style simulation for medication action selection under partial observability.

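To illustrate what "deterministic, seeded" means in practice, here is a minimal sketch using a toy environment. `ToySeededEnv` and `rollout` are hypothetical names and are not part of `PolyGuardEnv`; the only point is that all randomness flows from one RNG created at reset, so the same seed and the same actions reproduce the same trajectory.

```python
import random


class ToySeededEnv:
    """Toy stand-in (NOT PolyGuardEnv): all randomness comes from one seeded RNG."""

    def reset(self, seed: int) -> int:
        # A private RNG instance avoids interference from the global random state.
        self._rng = random.Random(seed)
        self._value = self._rng.randint(0, 100)
        return self._value

    def step(self, delta: int) -> int:
        # Noise is drawn from the seeded RNG, so it is reproducible per seed.
        noise = self._rng.randint(-1, 1)
        self._value += delta + noise
        return self._value


def rollout(seed: int, deltas: list[int]) -> list[int]:
    """Run one episode and return the sequence of observed values."""
    env = ToySeededEnv()
    trajectory = [env.reset(seed)]
    trajectory.extend(env.step(d) for d in deltas)
    return trajectory


# Same seed and same actions give an identical trajectory.
assert rollout(7, [1, 2, 3]) == rollout(7, [1, 2, 3])
```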

## State

The state tracks patient demographics, medications, labs, vitals, comorbidities, specialist conflicts, action history, cumulative reward, difficulty, sub-environment, burden score, risky-pair summary, and unresolved safety conflicts.

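One way to picture the state is as a single record with one field per tracked item. The sketch below is an assumption for illustration only: `StateSketch`, its field names, and its field types are hypothetical and may not match the actual state class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StateSketch:
    """Hypothetical container mirroring the tracked items; types are guesses."""

    demographics: dict[str, Any] = field(default_factory=dict)
    medications: list[dict[str, Any]] = field(default_factory=list)
    labs: dict[str, float] = field(default_factory=dict)
    vitals: dict[str, float] = field(default_factory=dict)
    comorbidities: list[str] = field(default_factory=list)
    specialist_conflicts: list[str] = field(default_factory=list)
    action_history: list[str] = field(default_factory=list)
    cumulative_reward: float = 0.0
    difficulty: str = ""        # placeholder; real difficulty levels are not specified here
    sub_environment: str = ""   # placeholder; real sub-environment names are not specified here
    burden_score: float = 0.0
    risky_pair_summary: list[tuple[str, str]] = field(default_factory=list)
    unresolved_safety_conflicts: list[str] = field(default_factory=list)


state = StateSketch()
state.action_history.append("example_action")  # illustrative value only
```

Using `default_factory` for the mutable fields keeps each instance's lists and dicts independent, which matters when several episodes run in one process.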

## Observation

The observation exposes only the agent-facing view:

- patient summary
- medication table
- comorbidities
- organ function and labs/vitals
- graph safety summary
- burden summary
- precision dosing flags
- unresolved conflicts
- candidate action set
- step budget
- action history
- warning summary
- abstention indicators

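Partial observability can be sketched as a whitelist projection from the full state to the agent-facing view. Everything named here is hypothetical: `AGENT_VISIBLE_KEYS`, `to_observation`, and the key spellings are assumptions derived from the list above, not the environment's actual schema.

```python
from typing import Any

# Hypothetical key names, one per item in the observation list above.
AGENT_VISIBLE_KEYS = (
    "patient_summary",
    "medication_table",
    "comorbidities",
    "organ_function_labs_vitals",
    "graph_safety_summary",
    "burden_summary",
    "precision_dosing_flags",
    "unresolved_conflicts",
    "candidate_actions",
    "step_budget",
    "action_history",
    "warning_summary",
    "abstention_indicators",
)


def to_observation(full_state: dict[str, Any]) -> dict[str, Any]:
    """Keep only whitelisted keys; anything else stays hidden from the agent."""
    return {key: full_state[key] for key in AGENT_VISIBLE_KEYS if key in full_state}


full_state = {
    "patient_summary": "illustrative summary",
    "step_budget": 3,
    "cumulative_reward": 1.5,  # tracked in state, not in the observation list
}
observation = to_observation(full_state)
```

A whitelist fails closed: a field added to the state later stays hidden until it is explicitly listed.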

## Actions

Actions are constrained by `PolyGuardAction` and generated candidate IDs. The agent can keep a regimen, stop a drug, substitute within class, recommend alternatives, adjust dose bucket, initiate/continue taper, hold dose, order monitoring, fetch evidence, decompose a new drug, or request specialist/pharmacist review.

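The two constraints described above (a closed set of action kinds plus a generated candidate ID) can be sketched as follows. This is not the real `PolyGuardAction`: `ActionKind`, `ActionSketch`, `is_valid`, and the example candidate IDs are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    """Hypothetical enumeration of the action kinds listed in the text."""

    KEEP_REGIMEN = "keep_regimen"
    STOP_DRUG = "stop_drug"
    SUBSTITUTE_WITHIN_CLASS = "substitute_within_class"
    RECOMMEND_ALTERNATIVE = "recommend_alternative"
    ADJUST_DOSE_BUCKET = "adjust_dose_bucket"
    TAPER = "taper"  # covers both initiating and continuing a taper
    HOLD_DOSE = "hold_dose"
    ORDER_MONITORING = "order_monitoring"
    FETCH_EVIDENCE = "fetch_evidence"
    DECOMPOSE_NEW_DRUG = "decompose_new_drug"
    REQUEST_REVIEW = "request_review"  # specialist or pharmacist


@dataclass(frozen=True)
class ActionSketch:
    """Illustrative action: a kind plus the ID of a generated candidate."""

    kind: ActionKind
    candidate_id: str


def is_valid(action: ActionSketch, candidate_ids: set[str]) -> bool:
    """An action is accepted only if its ID is in the current candidate set."""
    return action.candidate_id in candidate_ids


candidates = {"c1", "c2"}  # illustrative IDs only
accepted = is_valid(ActionSketch(ActionKind.STOP_DRUG, "c1"), candidates)
rejected = is_valid(ActionSketch(ActionKind.HOLD_DOSE, "unknown"), candidates)
```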

## Episode End Conditions

Episodes terminate on exploit detection, exhausted step budget, repeated invalid actions, justified review escalation, safety-veto threshold, patient destabilization, safe resolution, per-step timeout, or episode wall-clock timeout.

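A compact way to model these conditions is an enumeration of end reasons plus a function that reports the first one that fired. The names `EndReason` and `first_end_reason` are hypothetical, and the priority order (the order in which the text lists the conditions) is an assumption; the environment may resolve simultaneous conditions differently.

```python
from enum import Enum
from typing import Optional


class EndReason(Enum):
    """Hypothetical labels for the termination conditions named in the text."""

    EXPLOIT_DETECTED = "exploit_detected"
    STEP_BUDGET_EXHAUSTED = "step_budget_exhausted"
    REPEATED_INVALID_ACTIONS = "repeated_invalid_actions"
    REVIEW_ESCALATION = "review_escalation"
    SAFETY_VETO_THRESHOLD = "safety_veto_threshold"
    PATIENT_DESTABILIZED = "patient_destabilized"
    SAFE_RESOLUTION = "safe_resolution"
    STEP_TIMEOUT = "step_timeout"
    EPISODE_TIMEOUT = "episode_timeout"


def first_end_reason(flags: dict[EndReason, bool]) -> Optional[EndReason]:
    """Return the first triggered reason, or None if the episode continues."""
    # Iterating an Enum follows definition order, which serves as the
    # (assumed) priority when several conditions hold at once.
    for reason in EndReason:
        if flags.get(reason, False):
            return reason
    return None


reason = first_end_reason(
    {EndReason.SAFE_RESOLUTION: True, EndReason.STEP_BUDGET_EXHAUSTED: True}
)
```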

## OpenEnv Surface

The runtime exposes `/reset`, `/step`, `/state`, `/metadata`, `/schema`, `/mcp`, `/health`, `/ws`, and `/env/*` compatibility endpoints. `openenv validate .` validates packaging; `openenv validate --url ...` validates a running server.

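As a rough sketch of how a client might address `/reset` and `/step`, the snippet below only builds the requests with the standard library and does not send them. The base URL, the use of POST with a JSON body, and both payload shapes are assumptions; the authoritative shapes should come from the server's `/schema` endpoint.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: placeholder host and port


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an endpoint path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Both payloads are hypothetical; check /schema for the real request format.
reset_request = build_post("/reset", {"seed": 0})
step_request = build_post("/step", {"action": {"candidate_id": "c1"}})

# To send against a running server, something like:
# with urllib.request.urlopen(reset_request) as response:
#     first_observation = json.loads(response.read())
```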