# Agents

The orchestration graph runs once per environment step:

```text
MedRec -> Evidence -> GraphSafety -> Dosing -> Candidate -> Supervisor -> Planner -> Critic -> Env -> Explainer
```
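
The graph above can be read as a fixed chain of stages over shared state. A minimal sketch, assuming each agent is a callable that reads a state dict and returns a dict of updates; `run_step`, `PIPELINE_ORDER`, and the state-dict interface are illustrative names and not the project's actual API:

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Agent = Callable[[State], State]

# Stage order copied from the orchestration graph.
PIPELINE_ORDER: List[str] = [
    "MedRec", "Evidence", "GraphSafety", "Dosing", "Candidate",
    "Supervisor", "Planner", "Critic", "Env", "Explainer",
]


def run_step(agents: Dict[str, Agent], state: State) -> State:
    """Run every stage once, in graph order, for one environment step."""
    state = dict(state)  # copy so the caller's dict is not mutated
    trace: List[str] = []
    for name in PIPELINE_ORDER:
        # Each stage sees everything written by the stages before it.
        state.update(agents[name](state))
        trace.append(name)
    state["trace"] = trace  # hypothetical audit field
    return state
```

Calling `run_step` with a dict that maps each stage name to a stub function is enough to check the ordering before any real agent logic is wired in.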
## Roles

- `MedRecAgent`: summarizes the current regimen and medication burden.
- `EvidenceAgent`: retrieves local evidence, or web-fallback evidence, when data is missing.
- `GraphSafetyAgent`: scores high-risk drug pairs and duplicate/safety patterns.
- `DosingAgent`: identifies dose-sensitive cases and dose-hold opportunities.
- `CandidateAgent`: exposes legal candidate actions from the environment candidate builder.
- `SupervisorAgent`: routes the planner toward regimen optimization, dose optimization, or review mode.
- `PlannerAgent`: selects an action from the candidates, optionally after contextual-bandit reranking.
- `CriticAgent`: vetoes illegal or unsafe actions and can force safer review fallbacks.
- `ExplainerAgent`: records a grounded rationale for demo and audit.
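
The planner/critic hand-off described in the roles above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `Action` type, its `risk` field, the `max_risk` threshold, and the `request_review` fallback name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Action:
    name: str
    risk: float  # hypothetical risk score in [0, 1]


# Hypothetical safe fallback the critic can force.
REVIEW_FALLBACK = Action(name="request_review", risk=0.0)


def plan(candidates: Sequence[Action],
         rerank_scores: Optional[Dict[str, float]] = None) -> Action:
    """Pick a candidate; if rerank scores are given, take the top-scored one."""
    if not candidates:
        return REVIEW_FALLBACK
    if rerank_scores is None:
        return candidates[0]
    return max(candidates,
               key=lambda a: rerank_scores.get(a.name, float("-inf")))


def critique(proposed: Action, candidates: Sequence[Action],
             max_risk: float = 0.5) -> Action:
    """Veto actions that are not legal candidates or exceed the risk limit."""
    legal = proposed in candidates
    safe = proposed.risk <= max_risk
    return proposed if (legal and safe) else REVIEW_FALLBACK
```

Keeping the veto in a separate function means the planner can be swapped (for example, with or without reranking) without touching the safety check.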
## Coordination Modes

- `sequential_pipeline`
- `supervisor_routed`
- `replan_on_veto`
- `lightweight_debate`

Policy-stack ablations compare `bandit-only`, `llm-only`, and `llm+bandit`.
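
One way to keep the mode and policy-stack names consistent across experiment configs is to model them as enums. The sketch below is an assumption about how such a config could look: `CoordinationMode`, `PolicyStack`, `RunConfig`, and `ablation_grid` are hypothetical names, and crossing every mode with every policy stack is an illustrative choice rather than something the ablations are stated to do.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Iterable, List, Optional


class CoordinationMode(str, Enum):
    SEQUENTIAL_PIPELINE = "sequential_pipeline"
    SUPERVISOR_ROUTED = "supervisor_routed"
    REPLAN_ON_VETO = "replan_on_veto"
    LIGHTWEIGHT_DEBATE = "lightweight_debate"


class PolicyStack(str, Enum):
    BANDIT_ONLY = "bandit-only"
    LLM_ONLY = "llm-only"
    LLM_BANDIT = "llm+bandit"


@dataclass(frozen=True)
class RunConfig:
    mode: CoordinationMode
    policy: PolicyStack


def ablation_grid(
    modes: Optional[Iterable[CoordinationMode]] = None,
) -> List[RunConfig]:
    """Cross the chosen modes (default: all) with every policy stack."""
    chosen = list(modes) if modes is not None else list(CoordinationMode)
    return [RunConfig(m, p) for m, p in product(chosen, PolicyStack)]
```

Because the enums carry the exact strings listed above, `CoordinationMode("replan_on_veto")` parses a config value directly and raises `ValueError` on a misspelled mode.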