| # Hierarchical RL | |
| The supervisor selects a macro mode (`REGIMEN_OPT`, `DOSE_OPT`, or `REVIEW`), the planner selects a constrained candidate action, and the dosing policy specializes dose-sensitive transitions. | |
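
The three-level control flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: only the mode names `REGIMEN_OPT`, `DOSE_OPT`, and `REVIEW` come from the text. The state keys, action labels, thresholds, and the hand-written rules standing in for learned policies are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Optional, Tuple


class MacroMode(Enum):
    REGIMEN_OPT = "REGIMEN_OPT"
    DOSE_OPT = "DOSE_OPT"
    REVIEW = "REVIEW"


@dataclass(frozen=True)
class Action:
    kind: str                      # hypothetical action label
    dose_sensitive: bool = False   # True if the dosing policy must refine it
    dose: Optional[float] = None   # filled in by the dosing policy


# Hypothetical constraint table: which candidate actions each mode allows.
CANDIDATES: Dict[MacroMode, List[Action]] = {
    MacroMode.REGIMEN_OPT: [Action("switch_regimen"), Action("keep_regimen")],
    MacroMode.DOSE_OPT: [Action("adjust_dose", dose_sensitive=True), Action("hold_dose")],
    MacroMode.REVIEW: [Action("flag_for_review")],
}


def supervisor(state: Dict[str, float]) -> MacroMode:
    """Top level: pick a macro mode (rule-based stand-in for a learned policy)."""
    if state["risk"] > 0.8:
        return MacroMode.REVIEW
    if state["regimen_fit"] < 0.5:
        return MacroMode.REGIMEN_OPT
    return MacroMode.DOSE_OPT


def planner(state: Dict[str, float], mode: MacroMode) -> Action:
    """Middle level: choose among the candidates the mode permits.

    Here it simply takes the first allowed candidate; a real planner
    would score the candidates against the state.
    """
    return CANDIDATES[mode][0]


def dosing_policy(state: Dict[str, float], action: Action,
                  step: float = 10.0, max_dose: float = 100.0) -> Action:
    """Bottom level: attach a concrete dose to a dose-sensitive action."""
    direction = 1.0 if state["response"] < state["target"] else -1.0
    dose = min(max(state["current_dose"] + direction * step, 0.0), max_dose)
    return replace(action, dose=dose)


def act(state: Dict[str, float]) -> Tuple[MacroMode, Action]:
    """Run one decision through all three levels."""
    mode = supervisor(state)
    action = planner(state, mode)
    if action.dose_sensitive:
        action = dosing_policy(state, action)
    return mode, action


if __name__ == "__main__":
    example = {"risk": 0.1, "regimen_fit": 0.9,
               "current_dose": 50.0, "response": 0.3, "target": 0.6}
    print(act(example))
```

Keeping the candidate table separate from the planner means the constraint (which actions a mode permits) is enforced by construction, and the dosing policy is only consulted for actions marked dose-sensitive.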