Agents

The orchestration graph runs once per environment step:

MedRec -> Evidence -> GraphSafety -> Dosing -> Candidate -> Supervisor -> Planner -> Critic -> Env -> Explainer
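As a rough illustration of "runs once per environment step", the sketch below threads a shared state dict through the stages in the order shown above. Only the stage order comes from the graph; `State`, `make_stub`, `run_step`, and the `trace` key are hypothetical names introduced for this example, and the stub stages do nothing beyond recording that they ran.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Stage = Callable[[State], State]

# Stage order copied from the graph above; everything else is hypothetical.
STAGE_ORDER: List[str] = [
    "MedRec", "Evidence", "GraphSafety", "Dosing", "Candidate",
    "Supervisor", "Planner", "Critic", "Env", "Explainer",
]


def make_stub(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        # Return a new dict so earlier states are never mutated.
        return {**state, "trace": trace}
    return stage


def run_step(stages: List[Tuple[str, Stage]], state: State) -> State:
    """Run every stage exactly once, in order, threading the state through."""
    for _name, stage in stages:
        state = stage(state)
    return state


stages = [(name, make_stub(name)) for name in STAGE_ORDER]
result = run_step(stages, {"trace": []})
print(" -> ".join(result["trace"]))
```

A real stage would read the fields written by earlier stages (for example, the planner reading the candidate list) instead of only appending to a trace.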

MedRecAgent: summarizes current regimen and medication burden.
EvidenceAgent: retrieves evidence from local sources, with a web fallback, when data is missing.
GraphSafetyAgent: scores high-risk drug pairs and duplicate/safety patterns.
DosingAgent: identifies dose-sensitive cases and dose-hold opportunities.
CandidateAgent: exposes legal candidate actions from the environment candidate builder.
SupervisorAgent: routes the planner toward regimen optimization, dose optimization, or review mode.
PlannerAgent: selects an action from candidates, optionally after contextual-bandit reranking.
CriticAgent: vetoes illegal or unsafe actions and can force safer review fallbacks.
ExplainerAgent: records grounded rationale for demo and audit.
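The planner and critic roles described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Action` fields, the `max_risk` threshold, and the `request_review` fallback name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    legal: bool   # hypothetical flag: allowed by the candidate builder?
    risk: float   # hypothetical safety score in [0, 1], higher is riskier
    score: float  # hypothetical planner preference, higher is better


# Hypothetical "safer review" action the critic can substitute.
REVIEW_FALLBACK = Action(name="request_review", legal=True, risk=0.0, score=0.0)

Reranker = Callable[[List[Action]], List[Action]]


def plan(candidates: List[Action], rerank: Optional[Reranker] = None) -> Action:
    """Pick the top candidate, optionally after a reranking pass."""
    if rerank is not None:
        ordered = rerank(candidates)
    else:
        ordered = sorted(candidates, key=lambda a: a.score, reverse=True)
    return ordered[0] if ordered else REVIEW_FALLBACK


def critic(action: Action, max_risk: float = 0.5) -> Action:
    """Veto illegal or too-risky actions by substituting the review fallback."""
    if not action.legal or action.risk > max_risk:
        return REVIEW_FALLBACK
    return action


candidates = [
    Action("stop_drug_a", legal=True, risk=0.8, score=0.9),
    Action("reduce_dose_b", legal=True, risk=0.2, score=0.7),
]
chosen = plan(candidates)
final = critic(chosen)
print(chosen.name, "->", final.name)  # stop_drug_a -> request_review
```

Passing a reranker that orders by risk, such as `lambda c: sorted(c, key=lambda a: a.risk)`, makes the planner pick `reduce_dose_b`, which the critic then lets through.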

Policy-stack ablations compare bandit-only, llm-only, and llm+bandit.
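One way to express the three ablation arms is as a pair of on/off flags per arm. The arm names come from the sentence above; the `PolicyStack` class, its flags, and the `describe` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PolicyStack:
    use_llm: bool     # assumed: an LLM proposes or ranks actions
    use_bandit: bool  # assumed: a contextual bandit picks or reranks actions


# Arm names are from the text; the flag layout is hypothetical.
ABLATIONS: Dict[str, PolicyStack] = {
    "bandit-only": PolicyStack(use_llm=False, use_bandit=True),
    "llm-only": PolicyStack(use_llm=True, use_bandit=False),
    "llm+bandit": PolicyStack(use_llm=True, use_bandit=True),
}


def describe(stack: PolicyStack) -> str:
    """Return a short label for which components an arm enables."""
    parts = []
    if stack.use_llm:
        parts.append("llm")
    if stack.use_bandit:
        parts.append("bandit")
    return "+".join(parts)


for arm, stack in ABLATIONS.items():
    print(f"{arm}: {describe(stack)}")
```

Keeping the arms in one table makes it harder for an evaluation loop to run them under different settings by accident.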