# Safety
PolyGuard is safety-first: the model is never allowed to apply an arbitrary free-text medication action directly to the environment state.
## Guardrails
- Strict `PolyGuardAction` schema.
- Candidate IDs generated by the environment.
- Legality verifier before state transition.
- Critic veto before execution.
- Anti-cheat checks for reward hacking.
- Timeout and step-budget termination.
- Uncertainty-based abstention and review escalation.
- Failure reasons surfaced in traces and API responses.
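
The guardrails above can be read as an ordered chain of checks that an action must pass before it touches state. The following is a minimal sketch of that idea, not PolyGuard's actual implementation: the fields on `PolyGuardAction`, the `StepResult` type, the `guarded_step` function, and the failure-reason strings are all hypothetical names chosen for illustration. Anti-cheat checks and timeouts are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class PolyGuardAction:
    # Hypothetical fields; the real schema may differ.
    candidate_id: str
    abstain: bool = False


@dataclass
class StepResult:
    applied: bool
    failure_reason: Optional[str] = None


def guarded_step(
    raw: object,
    candidates: Set[str],
    is_legal: Callable[[PolyGuardAction], bool],
    critic_vetoes: Callable[[PolyGuardAction], bool],
    step: int,
    max_steps: int,
) -> StepResult:
    """Run each check in order and report the first failure reason."""
    # Step-budget termination.
    if step >= max_steps:
        return StepResult(False, "step_budget_exceeded")

    # Strict schema: free text or unexpected keys are rejected outright.
    allowed_keys = {"candidate_id", "abstain"}
    if not isinstance(raw, dict) or set(raw) - allowed_keys:
        return StepResult(False, "schema_violation")
    if not isinstance(raw.get("candidate_id"), str):
        return StepResult(False, "schema_violation")
    if not isinstance(raw.get("abstain", False), bool):
        return StepResult(False, "schema_violation")
    action = PolyGuardAction(raw["candidate_id"], raw.get("abstain", False))

    # Abstention escalates to review instead of changing state.
    if action.abstain:
        return StepResult(False, "abstained_for_review")

    # Only IDs generated by the environment are accepted.
    if action.candidate_id not in candidates:
        return StepResult(False, "unknown_candidate_id")

    # Legality verifier, then critic veto, both before any state transition.
    if not is_legal(action):
        return StepResult(False, "illegal_action")
    if critic_vetoes(action):
        return StepResult(False, "critic_veto")

    return StepResult(True)
```

In this sketch every rejection carries a reason string, which mirrors the last guardrail: a caller can log `failure_reason` in a trace or return it in an API response rather than failing silently.
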
## Clinical Trust Signals
The environment reports:
- legal/illegal action status
- safety violations
- drug-drug interaction (DDI) risk deltas
- medication burden changes
- uncertainty and abstention indicators
- explanation grounding score
- invalid action count
- anti-cheat reasons
Reporting these signals separately makes reward improvements auditable, rather than leaving them hidden behind a single opaque scalar.
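
One way to picture this is a per-step record that sits alongside the reward, so a reviewer can check whether a reward gain came with a safety cost. The sketch below is illustrative only: the `TrustSignals` class, its field names and types, and the `audit_flags` helper are assumptions made for this example and are not taken from the PolyGuard API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrustSignals:
    # Hypothetical field names mirroring the reported signals listed above.
    legal: bool
    safety_violations: List[str] = field(default_factory=list)
    ddi_risk_delta: float = 0.0
    medication_burden_delta: int = 0
    uncertainty: float = 0.0
    abstained: bool = False
    explanation_grounding: float = 0.0
    invalid_action_count: int = 0
    anti_cheat_reasons: List[str] = field(default_factory=list)


def audit_flags(reward: float, signals: TrustSignals) -> List[str]:
    """List reasons a positive reward should not be taken at face value."""
    flags: List[str] = []
    if reward <= 0:
        return flags
    if not signals.legal:
        flags.append("reward_on_illegal_action")
    if signals.safety_violations:
        flags.append("reward_with_safety_violation")
    if signals.anti_cheat_reasons:
        flags.append("reward_with_anti_cheat_hit")
    return flags
```

A step with a positive reward and an empty flag list is easier to trust than the same reward reported alone, which is the point of exposing the signals individually.
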
## Explicit Non-Goals
PolyGuard does not produce clinical orders, patient-specific prescriptions, or medical advice. It is an RL environment and demonstration system for training and evaluating medication-safety agents.