# Agents

The orchestration graph runs once per environment step:

```text
MedRec -> Evidence -> GraphSafety -> Dosing -> Candidate -> Supervisor -> Planner -> Critic -> Env -> Explainer
```
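As a rough illustration of the graph above, the sketch below threads one shared state dictionary through the stages in the listed order. The `State` type, `make_stub`, and `run_step` are hypothetical names used only for this example; the real agents and their interfaces may differ.

```python
from typing import Callable, Dict, List

# Hypothetical shared state passed from stage to stage.
State = Dict[str, object]
Agent = Callable[[State], State]

# Stage order copied from the orchestration graph.
STAGE_ORDER: List[str] = [
    "MedRec", "Evidence", "GraphSafety", "Dosing", "Candidate",
    "Supervisor", "Planner", "Critic", "Env", "Explainer",
]


def make_stub(name: str) -> Agent:
    """Build a placeholder agent that only records that it ran."""
    def run(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        # Return a new dict so earlier states are not mutated.
        return {**state, "trace": trace}
    return run


def run_step(agents: Dict[str, Agent], state: State) -> State:
    """Run every stage once, in graph order, for one environment step."""
    for name in STAGE_ORDER:
        state = agents[name](state)
    return state


agents = {name: make_stub(name) for name in STAGE_ORDER}
result = run_step(agents, {"step": 0})
```

Here `result["trace"]` lists the stages in the order they ran, which makes it easy to check that a step visited each stage exactly once.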

## Roles

- `MedRecAgent`: summarizes current regimen and medication burden.
- `EvidenceAgent`: retrieves local evidence, or web evidence as a fallback, when data is missing.
- `GraphSafetyAgent`: scores high-risk drug pairs and duplicate/safety patterns.
- `DosingAgent`: identifies dose-sensitive cases and dose-hold opportunities.
- `CandidateAgent`: exposes legal candidate actions from the environment candidate builder.
- `SupervisorAgent`: routes the planner toward regimen optimization, dose optimization, or review mode.
- `PlannerAgent`: selects an action from candidates, optionally after contextual-bandit reranking.
- `CriticAgent`: vetoes illegal or unsafe actions and can force safer review fallbacks.
- `ExplainerAgent`: records a grounded rationale for demo and audit purposes.
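
The planner and critic roles can be pictured with a minimal sketch like the one below. Everything in it is an assumption made for illustration: the `Candidate` fields, the `max_risk` threshold, and the `"review"` fallback action are hypothetical and do not describe the project's actual data model or safety rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    action: str
    score: float        # hypothetical planner preference, higher is better
    risk: float         # hypothetical safety score, higher is riskier
    legal: bool = True  # whether the environment allows the action


# Hypothetical safe fallback used when the critic vetoes a choice.
REVIEW_FALLBACK = Candidate(action="review", score=0.0, risk=0.0)


def plan(candidates: List[Candidate]) -> Candidate:
    """Planner: pick the highest-scoring candidate."""
    return max(candidates, key=lambda c: c.score)


def critique(choice: Candidate, max_risk: float = 0.5) -> Candidate:
    """Critic: veto illegal or too-risky choices and fall back to review."""
    if not choice.legal or choice.risk > max_risk:
        return REVIEW_FALLBACK
    return choice


candidates = [
    Candidate(action="stop_drug", score=0.9, risk=0.8),
    Candidate(action="hold_dose", score=0.6, risk=0.2),
]
final = critique(plan(candidates))
```

In this example the planner prefers `stop_drug`, but its risk exceeds the assumed threshold, so the critic replaces it with the review fallback.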

## Coordination Modes

- `sequential_pipeline`
- `supervisor_routed`
- `replan_on_veto`
- `lightweight_debate`
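
One way to represent these modes in code is an enum keyed by the mode names listed above. The sketch below also shows one possible reading of `replan_on_veto`, in which a vetoed choice triggers a bounded number of retries on the next-best action. That behavior, along with `select_action`, `max_replans`, and the `"review"` fallback, is an assumption for illustration, not a description of the actual implementation.

```python
from enum import Enum
from typing import Callable, List


class CoordinationMode(str, Enum):
    SEQUENTIAL_PIPELINE = "sequential_pipeline"
    SUPERVISOR_ROUTED = "supervisor_routed"
    REPLAN_ON_VETO = "replan_on_veto"
    LIGHTWEIGHT_DEBATE = "lightweight_debate"


def select_action(
    mode: CoordinationMode,
    ranked_actions: List[str],
    is_vetoed: Callable[[str], bool],
    max_replans: int = 2,
) -> str:
    """Return the first non-vetoed action, best first.

    Assumed semantics: only REPLAN_ON_VETO may try further actions after
    a veto; every other mode gets a single attempt.
    """
    extra = max_replans if mode is CoordinationMode.REPLAN_ON_VETO else 0
    for action in ranked_actions[: 1 + extra]:
        if not is_vetoed(action):
            return action
    return "review"  # hypothetical safe fallback


mode = CoordinationMode("replan_on_veto")
chosen = select_action(mode, ["a", "b", "c"], lambda a: a == "a")
```

Parsing the mode from its string name keeps configuration files aligned with the identifiers in the list above.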

Policy-stack ablations compare `bandit-only`, `llm-only`, and `llm+bandit`.
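
The three ablation settings can be read as two independent switches, one for the LLM planner and one for the contextual-bandit reranker. The sketch below encodes that reading; `PolicyStack`, `ABLATIONS`, and `describe` are hypothetical names, and the real configuration format may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PolicyStack:
    use_llm: bool     # assumed: LLM proposes or selects the action
    use_bandit: bool  # assumed: contextual bandit reranks candidates


# Ablation names taken from the text; flag values are the assumed reading.
ABLATIONS: Dict[str, PolicyStack] = {
    "bandit-only": PolicyStack(use_llm=False, use_bandit=True),
    "llm-only": PolicyStack(use_llm=True, use_bandit=False),
    "llm+bandit": PolicyStack(use_llm=True, use_bandit=True),
}


def describe(name: str) -> str:
    """List the enabled components of an ablation, joined by '+'."""
    stack = ABLATIONS[name]
    flags = (("llm", stack.use_llm), ("bandit", stack.use_bandit))
    return "+".join(label for label, enabled in flags if enabled)
```

For example, `describe("llm+bandit")` lists both components, while `describe("bandit-only")` lists only the bandit.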