# Agents

The orchestration graph runs once per environment step:

```text
MedRec -> Evidence -> GraphSafety -> Dosing -> Candidate -> Supervisor -> Planner -> Critic -> Env -> Explainer
```

## Roles

- `MedRecAgent`: summarizes the current regimen and medication burden.
- `EvidenceAgent`: retrieves evidence from local sources, falling back to the web, when input data is missing.
- `GraphSafetyAgent`: scores high-risk drug pairs and flags duplicate-therapy and other safety patterns.
- `DosingAgent`: identifies dose-sensitive cases and opportunities to hold a dose.
- `CandidateAgent`: exposes the legal candidate actions produced by the environment's candidate builder.
- `SupervisorAgent`: routes the planner toward regimen optimization, dose optimization, or review mode.
- `PlannerAgent`: selects an action from the candidates, optionally after reranking them with a contextual bandit.
- `CriticAgent`: vetoes illegal or unsafe actions and can force a fallback to a safer review action.
- `ExplainerAgent`: records a grounded rationale for demos and audits.

## Coordination Modes

- `sequential_pipeline`
- `supervisor_routed`
- `replan_on_veto`
- `lightweight_debate`

Policy-stack ablations compare `bandit-only`, `llm-only`, and `llm+bandit`.
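The per-step flow described above, including the Critic veto, can be sketched in Python. This is a minimal, hypothetical sketch rather than this project's implementation: `StepContext`, `run_step`, the `stub` helper, the `stop:<drug>` action format, the toy routing rule, and the `UNSAFE_ACTIONS` set are all illustrative assumptions. Only two coordination modes are modeled: `replan_on_veto` (the Planner retries while skipping vetoed actions) and, as an assumption, any other mode such as `sequential_pipeline` falling straight back to review on a veto. `supervisor_routed` and `lightweight_debate` are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical per-step state shared by all agents; field names are illustrative.
@dataclass
class StepContext:
    regimen: List[str]
    candidates: List[str] = field(default_factory=list)
    route: str = ""
    action: Optional[str] = None
    vetoed: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


Agent = Callable[[StepContext], None]


def stub(name: str) -> Agent:
    # Placeholder for agents whose internals are out of scope here; it only records its name.
    def run(ctx: StepContext) -> None:
        ctx.trace.append(name)
    return run


def candidate_agent(ctx: StepContext) -> None:
    # Hypothetical legal actions; the real ones come from the environment candidate builder.
    ctx.candidates = [f"stop:{drug}" for drug in ctx.regimen] + ["review"]
    ctx.trace.append("Candidate")


def supervisor_agent(ctx: StepContext) -> None:
    # Toy routing rule: only two of the three documented routes are used here.
    ctx.route = "regimen_optimization" if len(ctx.regimen) > 1 else "review"
    ctx.trace.append("Supervisor")


def planner_agent(ctx: StepContext) -> None:
    # Stand-in for LLM and/or contextual-bandit ranking: take the first non-vetoed candidate.
    remaining = [c for c in ctx.candidates if c not in ctx.vetoed]
    ctx.action = remaining[0] if remaining else "review"
    ctx.trace.append("Planner")


UNSAFE_ACTIONS = {"stop:warfarin"}  # hypothetical safety rule for the example


def critic_agent(ctx: StepContext) -> bool:
    # Returns False (veto) when the action is illegal (not a candidate) or unsafe.
    ctx.trace.append("Critic")
    if ctx.action not in ctx.candidates or ctx.action in UNSAFE_ACTIONS:
        ctx.vetoed.append(ctx.action)
        return False
    return True


PRE_PLANNER: List[Agent] = [stub("MedRec"), stub("Evidence"), stub("GraphSafety"),
                            stub("Dosing"), candidate_agent, supervisor_agent]
POST_CRITIC: List[Agent] = [stub("Env"), stub("Explainer")]


def run_step(ctx: StepContext, mode: str = "replan_on_veto", max_replans: int = 2) -> str:
    for agent in PRE_PLANNER:
        agent(ctx)
    planner_agent(ctx)
    replans = 0
    while not critic_agent(ctx):
        if mode == "replan_on_veto" and replans < max_replans:
            replans += 1
            planner_agent(ctx)  # re-plan; vetoed actions are skipped
        else:
            ctx.action = "review"  # safer review fallback after a veto
            break
    for agent in POST_CRITIC:
        agent(ctx)
    return ctx.action


if __name__ == "__main__":
    ctx = StepContext(regimen=["warfarin", "aspirin"])
    print(run_step(ctx))  # stop:aspirin
    print(ctx.trace)
```

In this sketch the Critic sits between the Planner and `Env`, so a vetoed action never reaches the environment: under `replan_on_veto` the Planner gets a bounded number of retries, and once those run out (or in any other mode) the step degrades to `review`.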