Safety

PolyGuard is safety-first: the model can never apply an arbitrary free-text medication action directly to the environment state.

Guardrails

  • Strict PolyGuardAction schema.
  • Candidate IDs generated by the environment.
  • Legality verifier before state transition.
  • Critic veto before execution.
  • Anti-cheat checks for reward hacking.
  • Timeout and step-budget termination.
  • Uncertainty-based abstention and review escalation.
  • Failure reasons surfaced in traces and API responses.
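
As a rough illustration of how several of these guardrails could be chained, the sketch below runs the checks in order and blocks the state transition at the first failure. Every name in it (`Action`, `ActionType`, `StepResult`, `guarded_step`, and the failure-reason strings) is hypothetical and chosen for this example; it is not PolyGuard's actual `PolyGuardAction` schema or API. Anti-cheat checks, timeouts, and abstention are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, Callable, Optional


class ActionType(str, Enum):
    """Closed set of action kinds (hypothetical); anything else fails to parse."""
    STOP = "stop"
    SUBSTITUTE = "substitute"


@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for a strict action schema: no free-text fields."""
    action_type: ActionType
    candidate_id: str  # must be an ID the environment generated


@dataclass
class StepResult:
    applied: bool
    failure_reason: Optional[str] = None  # surfaced to traces and API responses


def guarded_step(
    action: Action,
    candidate_ids: AbstractSet[str],
    is_legal: Callable[[Action], bool],
    critic_vetoes: Callable[[Action], bool],
    steps_used: int,
    step_budget: int,
) -> StepResult:
    """Run the checks in order; the first failure blocks the state transition."""
    if steps_used >= step_budget:
        return StepResult(False, "step_budget_exhausted")
    if action.candidate_id not in candidate_ids:
        return StepResult(False, "unknown_candidate_id")
    if not is_legal(action):
        return StepResult(False, "illegal_action")
    if critic_vetoes(action):
        return StepResult(False, "critic_veto")
    # Only at this point would the environment apply the action to state.
    return StepResult(True)
```

In this sketch, an action that names a candidate ID the environment never issued is rejected with `unknown_candidate_id` before the legality verifier or critic is consulted, so free text cannot reach the state transition.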

Clinical Trust Signals

The environment reports:

  • legal/illegal action status
  • safety violations
  • drug-drug interaction (DDI) risk deltas
  • medication burden changes
  • uncertainty and abstention indicators
  • explanation grounding score
  • invalid action count
  • anti-cheat reasons

This makes reward improvements auditable instead of relying on a single opaque scalar.
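
To make the auditing idea concrete, here is a minimal sketch of a per-step report that carries the signals listed above next to the reward. The `TrustSignals` class, its field names and types, and the `audit_flags` helper are all assumptions made for illustration; the real environment may structure and name these values differently.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TrustSignals:
    """Hypothetical per-step report; fields mirror the list above, not a real API."""
    action_legal: bool
    safety_violations: List[str] = field(default_factory=list)
    ddi_risk_delta: float = 0.0           # assumed: negative means risk went down
    medication_burden_delta: int = 0      # assumed: change in number of medications
    uncertainty: float = 0.0
    abstained: bool = False
    explanation_grounding_score: float = 0.0
    invalid_action_count: int = 0
    anti_cheat_reasons: List[str] = field(default_factory=list)


def audit_flags(signals: TrustSignals) -> List[str]:
    """Return the reasons a reward from this step deserves a closer look."""
    flags = []
    if not signals.action_legal:
        flags.append("illegal_action")
    if signals.safety_violations:
        flags.append("safety_violation")
    if signals.anti_cheat_reasons:
        flags.append("anti_cheat")
    return flags


# Example: a step that earned reward but tripped a (made-up) anti-cheat reason.
step = TrustSignals(
    action_legal=True,
    ddi_risk_delta=-0.2,
    anti_cheat_reasons=["repeated_noop_actions"],
)
report = asdict(step)        # plain dict, ready to log alongside the reward
flags = audit_flags(step)    # non-empty, so this step gets flagged for review
```

Logging the full record per step lets a reviewer check whether a rising reward coincides with fewer violations and lower DDI risk, or with flagged behaviour.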

Explicit Non-Goals

PolyGuard does not produce clinical orders, patient-specific prescriptions, or medical advice. It is an RL environment and demonstration system for training and evaluating medication-safety agents.