Safety

PolyGuard is safety-first: the model is never allowed to apply an arbitrary free-text medication action directly to state.

Guardrails

Strict PolyGuardAction schema.
Candidate IDs generated by the environment.
Legality verifier before state transition.
Critic veto before execution.
Anti-cheat checks for reward hacking.
Timeout and step-budget termination.
Uncertainty-based abstention and review escalation.
Failure reasons surfaced in traces and API responses.
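
The ordering of these guardrails can be sketched as a single gated step. This is a minimal illustration, not PolyGuard's actual code: the `PolyGuardAction` fields, the `ActionType` values, the `guarded_step` function, and the failure-reason strings are all assumptions made for the example. Only the idea (strict schema, environment-issued candidate IDs, legality check, critic veto, step budget, surfaced failure reasons) comes from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Set


class ActionType(Enum):
    # Hypothetical action vocabulary; the real schema may differ.
    STOP = "stop"
    SUBSTITUTE = "substitute"


@dataclass(frozen=True)
class PolyGuardAction:
    """Assumed shape: a closed action type plus an environment-issued ID."""
    action_type: ActionType
    candidate_id: str


@dataclass
class StepResult:
    applied: bool
    failure_reason: Optional[str] = None


def parse_action(raw) -> PolyGuardAction:
    """Accept only the exact expected fields, so free text cannot pass."""
    if set(raw) != {"action_type", "candidate_id"}:
        raise ValueError("unexpected or missing fields")
    if not isinstance(raw["candidate_id"], str):
        raise ValueError("candidate_id must be a string")
    # ActionType(...) raises ValueError for values outside the enum.
    return PolyGuardAction(ActionType(raw["action_type"]), raw["candidate_id"])


def guarded_step(
    raw,
    candidates: Set[str],
    is_legal: Callable[[PolyGuardAction], bool],
    critic_allows: Callable[[PolyGuardAction], bool],
    steps_used: int,
    step_budget: int,
) -> StepResult:
    """Run every gate in order; state would change only if all gates pass."""
    if steps_used >= step_budget:
        return StepResult(False, "step_budget_exceeded")
    try:
        action = parse_action(raw)
    except (ValueError, TypeError) as exc:
        return StepResult(False, f"schema_error: {exc}")
    if action.candidate_id not in candidates:
        return StepResult(False, "unknown_candidate_id")
    if not is_legal(action):
        return StepResult(False, "illegal_action")
    if not critic_allows(action):
        return StepResult(False, "critic_veto")
    return StepResult(True)
```

In this sketch a free-text string such as `"stop warfarin"` fails at the schema gate, and a well-formed action that names an ID the environment never issued fails at the candidate gate. In both cases the caller receives a named failure reason rather than a silent no-op.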

Clinical Trust Signals

The environment reports:

legal/illegal action status
safety violations
drug-drug interaction (DDI) risk deltas
medication burden changes
uncertainty and abstention indicators
explanation grounding score
invalid action count
anti-cheat reasons

These signals make reward improvements auditable: a higher reward can be checked against the individual signals rather than trusted as a single opaque scalar.
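
One way to picture the reported signals is as a structured record attached to each step, which an audit routine can compare against the reward. The sketch below is hypothetical: the `TrustSignals` field names, their types, the sign convention (a positive `ddi_risk_delta` means risk went up), and the `audit_flags` helper are assumptions for illustration, not the environment's real response format.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TrustSignals:
    """Assumed per-step record mirroring the signals listed above."""
    action_legal: bool
    safety_violations: List[str] = field(default_factory=list)
    ddi_risk_delta: float = 0.0          # assumed: > 0 means risk increased
    medication_burden_delta: int = 0
    uncertainty: float = 0.0
    abstained: bool = False
    explanation_grounding_score: float = 0.0
    invalid_action_count: int = 0
    anti_cheat_reasons: List[str] = field(default_factory=list)


def audit_flags(reward_delta: float, signals: TrustSignals) -> List[str]:
    """List reasons a reward gain should not be taken at face value."""
    flags: List[str] = []
    if reward_delta <= 0:
        return flags  # only reward gains are audited in this sketch
    if not signals.action_legal:
        flags.append("reward rose on an illegal action")
    if signals.safety_violations:
        flags.append("reward rose despite safety violations")
    if signals.anti_cheat_reasons:
        flags.append("reward rose while anti-cheat checks fired")
    if signals.ddi_risk_delta > 0:
        flags.append("reward rose while DDI risk increased")
    return flags


def to_trace(signals: TrustSignals) -> dict:
    """Serialize the record so it can be logged next to the reward."""
    return asdict(signals)
```

An empty result from `audit_flags` means none of these particular checks objected to the reward gain. It does not by itself establish that the step was clinically sound.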

Explicit Non-Goals

PolyGuard does not produce clinical orders, patient-specific prescriptions, or medical advice. It is an RL environment and demonstration system for training and evaluating medication-safety agents.