Environment Design

PolyGuardEnv is a deterministic, seeded OpenEnv-style simulation for medication action selection under partial observability.
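The sketch below illustrates what "deterministic, seeded" means in practice. It assumes initial patient states are drawn from a private `random.Random(seed)` instance rather than from the global RNG. `ToyPatient`, `DRUG_POOL`, and `sample_patient` are hypothetical names and are not part of PolyGuardEnv.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical drug pool used only for this illustration.
DRUG_POOL = ("amiodarone", "lisinopril", "metformin", "omeprazole", "simvastatin", "warfarin")


@dataclass(frozen=True)
class ToyPatient:
    """Hypothetical initial state; frozen so instances compare by value."""
    age: int
    egfr: float  # kidney function (eGFR), mL/min/1.73 m^2
    medications: Tuple[str, ...]


def sample_patient(seed: int) -> ToyPatient:
    # A private RNG keeps sampling independent of any other code that
    # touches the global `random` module, so the seed alone fixes the result.
    rng = random.Random(seed)
    age = rng.randint(65, 95)
    egfr = round(rng.uniform(15.0, 90.0), 1)
    medications = tuple(sorted(rng.sample(DRUG_POOL, k=4)))
    return ToyPatient(age=age, egfr=egfr, medications=medications)


if __name__ == "__main__":
    # Same seed, same patient: episodes can be replayed exactly.
    assert sample_patient(42) == sample_patient(42)
    print(sample_patient(42))
```

Using a per-episode RNG object, rather than calling `random.seed`, means that concurrent environments or unrelated library code cannot disturb each other's random streams.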

State

The state tracks patient demographics, medications, labs, vitals, comorbidities, specialist conflicts, action history, cumulative reward, difficulty, sub-environment, burden score, risky-pair summary, and unresolved safety conflicts.
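The following is a minimal sketch of a container for the state fields listed above. All field names and types are assumptions, including representing the risky-pair summary as a list of drug pairs. `ToyEnvState` and `record_step` are hypothetical and do not reproduce PolyGuardEnv's actual state schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyEnvState:
    """Hypothetical state container mirroring the fields described above."""
    demographics: Dict[str, object]
    medications: List[str]
    labs: Dict[str, float]
    vitals: Dict[str, float]
    comorbidities: List[str]
    difficulty: str
    sub_environment: str
    specialist_conflicts: List[str] = field(default_factory=list)
    burden_score: float = 0.0
    risky_pairs: List[Tuple[str, str]] = field(default_factory=list)
    unresolved_safety_conflicts: List[str] = field(default_factory=list)
    action_history: List[str] = field(default_factory=list)
    cumulative_reward: float = 0.0

    def record_step(self, action_id: str, reward: float) -> None:
        # Append-only action history plus a running reward total.
        self.action_history.append(action_id)
        self.cumulative_reward += reward
```

`field(default_factory=list)` gives every state its own empty lists. A plain `[]` default would be rejected by `dataclasses` precisely because it would be shared across instances.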

Observation

The observation exposes only the agent-facing view:

  • patient summary
  • medication table
  • comorbidities
  • organ function and labs/vitals
  • graph safety summary
  • burden summary
  • precision dosing flags
  • unresolved conflicts
  • candidate action set
  • step budget
  • action history
  • warning summary
  • abstention indicators
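One way to enforce the agent-facing view is an explicit whitelist projection from the full state. The sketch below assumes the state is a plain dict and that the step budget is reported as steps remaining. The key names, `AGENT_VISIBLE_KEYS`, and `project_observation` are hypothetical. Difficulty and cumulative reward appear in the state but not in the observation list above, so the projection drops them.

```python
from typing import Any, Dict

# Hypothetical whitelist mirroring the agent-facing items listed above.
AGENT_VISIBLE_KEYS = (
    "patient_summary",
    "medications",
    "comorbidities",
    "organ_function",
    "graph_safety_summary",
    "burden_summary",
    "precision_dosing_flags",
    "unresolved_conflicts",
    "candidate_actions",
    "action_history",
    "warnings",
    "abstention_indicators",
)


def project_observation(full_state: Dict[str, Any], max_steps: int) -> Dict[str, Any]:
    """Return only agent-facing fields; anything not whitelisted stays hidden."""
    obs = {key: full_state[key] for key in AGENT_VISIBLE_KEYS if key in full_state}
    # The step budget is derived rather than copied: budget minus steps taken.
    steps_taken = len(full_state.get("action_history", []))
    obs["steps_remaining"] = max(0, max_steps - steps_taken)
    return obs
```

A whitelist fails closed: a newly added state field stays hidden from the agent until someone deliberately exposes it.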

Actions

Actions are constrained by PolyGuardAction and by the generated candidate IDs. The agent can keep a regimen, stop a drug, substitute within class, recommend alternatives, adjust a dose bucket, initiate or continue a taper, hold a dose, order monitoring, fetch evidence, decompose a new drug, or request specialist or pharmacist review.
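The sketch below shows how the action kinds and the candidate-ID constraint could be encoded. It assumes that each action names exactly one of the candidate IDs offered for the current step. `ActionType`, `ToyAction`, and `validate_action` are hypothetical and do not reproduce `PolyGuardAction`'s actual fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class ActionType(str, Enum):
    """Hypothetical enumeration of the action kinds described above."""
    KEEP_REGIMEN = "keep_regimen"
    STOP_DRUG = "stop_drug"
    SUBSTITUTE_WITHIN_CLASS = "substitute_within_class"
    RECOMMEND_ALTERNATIVE = "recommend_alternative"
    ADJUST_DOSE_BUCKET = "adjust_dose_bucket"
    INITIATE_TAPER = "initiate_taper"
    CONTINUE_TAPER = "continue_taper"
    HOLD_DOSE = "hold_dose"
    ORDER_MONITORING = "order_monitoring"
    FETCH_EVIDENCE = "fetch_evidence"
    DECOMPOSE_NEW_DRUG = "decompose_new_drug"
    REQUEST_SPECIALIST_REVIEW = "request_specialist_review"
    REQUEST_PHARMACIST_REVIEW = "request_pharmacist_review"


@dataclass(frozen=True)
class ToyAction:
    action_type: ActionType
    candidate_id: str
    target_drug: Optional[str] = None


def validate_action(action: ToyAction, candidate_ids: Set[str]) -> Optional[str]:
    """Return None if the action is allowed, else a short rejection reason."""
    if not isinstance(action.action_type, ActionType):
        return "unknown action type"
    if action.candidate_id not in candidate_ids:
        return f"candidate {action.candidate_id!r} was not offered this step"
    return None
```

Returning a reason string instead of raising lets the caller count rejections. Repeated invalid actions are one of the episode end conditions.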

Episode End Conditions

Episodes terminate on exploit detection, an exhausted step budget, repeated invalid actions, justified review escalation, reaching the safety-veto threshold, patient destabilization, safe resolution, a per-step timeout, or an episode wall-clock timeout.
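A single function can check all of the end conditions listed above and report which one fired. In this sketch, the check order (failures before success), the threshold defaults, and the names `EpisodeStatus` and `termination_reason` are all hypothetical assumptions, not PolyGuardEnv's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeStatus:
    """Hypothetical per-step snapshot of the signals the end conditions need."""
    exploit_detected: bool = False
    steps_used: int = 0
    step_budget: int = 10
    invalid_actions: int = 0
    max_invalid_actions: int = 3
    review_escalated_with_justification: bool = False
    safety_vetoes: int = 0
    safety_veto_threshold: int = 2
    patient_destabilized: bool = False
    safely_resolved: bool = False
    step_seconds: float = 0.0
    step_timeout_s: float = 30.0
    episode_seconds: float = 0.0
    episode_timeout_s: float = 600.0


def termination_reason(s: EpisodeStatus) -> Optional[str]:
    """Return the first end condition that holds, or None to keep going.

    The check order and every threshold default above are assumptions.
    """
    checks = (
        ("exploit_detected", s.exploit_detected),
        ("patient_destabilized", s.patient_destabilized),
        ("safety_veto_threshold", s.safety_vetoes >= s.safety_veto_threshold),
        ("repeated_invalid_actions", s.invalid_actions >= s.max_invalid_actions),
        ("step_timeout", s.step_seconds > s.step_timeout_s),
        ("episode_timeout", s.episode_seconds > s.episode_timeout_s),
        ("justified_review_escalation", s.review_escalated_with_justification),
        ("safe_resolution", s.safely_resolved),
        ("step_budget_exhausted", s.steps_used >= s.step_budget),
    )
    for name, triggered in checks:
        if triggered:
            return name
    return None
```

The failure conditions are checked before safe resolution. As a result, a step that both resolves the case and trips an exploit or safety check is never reported as a success.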

OpenEnv Surface

The runtime exposes `/reset`, `/step`, `/state`, `/metadata`, `/schema`, `/mcp`, `/health`, `/ws`, and `/env/*` compatibility endpoints. `openenv validate .` validates packaging; `openenv validate --url ...` validates a running server.
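The following is a minimal standard-library client sketch for the HTTP endpoints. It assumes JSON request bodies of the form `{"seed": ...}` for `/reset` and `{"action": {...}}` for `/step`. Those body shapes, `ToyEnvClient`, and the `transport` parameter are assumptions, so check `/schema` on a running server before relying on them. The `/ws`, `/mcp`, and `/env/*` endpoints are not covered.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

# (method, url, json_body) -> decoded JSON response
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


def urllib_transport(method: str, url: str, body: Optional[Dict[str, Any]]) -> Any:
    """Send one JSON request with the standard library and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


class ToyEnvClient:
    """Hypothetical thin client for /health, /reset, /step and /state.

    The JSON body shapes are assumptions; query /schema on a running
    server for the real request and response models.
    """

    def __init__(self, base_url: str, transport: Transport = urllib_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def health(self) -> Any:
        return self.transport("GET", f"{self.base_url}/health", None)

    def reset(self, seed: Optional[int] = None) -> Any:
        body = {} if seed is None else {"seed": seed}
        return self.transport("POST", f"{self.base_url}/reset", body)

    def step(self, action: Dict[str, Any]) -> Any:
        return self.transport("POST", f"{self.base_url}/step", {"action": action})

    def state(self) -> Any:
        return self.transport("GET", f"{self.base_url}/state", None)
```

Injecting the transport keeps the client testable without a server. In normal use, the default `urllib_transport` performs the real HTTP calls, for example `ToyEnvClient("http://localhost:8000").health()`, where the host and port are placeholders.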