# Architecture

POLYGUARD-RL uses an OpenEnv-first monorepo architecture with six layers:

1. Data ingestion and retrieval index.
2. Predictive safety, graph, tabular risk, and dosing models.
3. Multi-agent orchestration graph.
4. Hierarchical RL training stack.
5. Safety governance and anti-cheat controls.
6. FastAPI, OpenEnv, and React deployment surfaces.

## Data Flow

```text
raw/local knowledge -> processed tables -> scenarios -> SFT/GRPO corpora
        |
        v
PolyGuardEnv reset/step/state
        |
        v
agent stack -> verifier reward -> training/evaluation reports
        |
        v
docs/results + README + HF Space
```

## Runtime Boundaries

- Environment code owns state transition, legality, rewards, anti-cheat, and traces.
- Agent code owns candidate interpretation, routing, planning, critique, and explanation.
- Training code owns SFT, GRPO, reward logging, adapters, and registry metadata.
- Evaluation code owns baselines, perturbations, reports, and plots.
- Deployment code owns OpenEnv validation and HF Space push.
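The first row of the Data Flow diagram (raw/local knowledge to processed tables to scenarios to SFT/GRPO corpora) can be illustrated with a minimal sketch. Everything here is hypothetical: the `items` and `action` fields, the `Scenario` record, and the `process`, `build_scenarios`, and `to_corpora` helpers are assumptions for illustration, not the project's actual schema or ingestion code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # Hypothetical scenario record; the real scenario schema is not shown here.
    scenario_id: str
    prompt: str
    reference_action: str


def process(raw_rows: list[dict]) -> list[dict]:
    """raw/local knowledge -> processed table: normalize fields, drop incomplete rows."""
    table = []
    for row in raw_rows:
        if not row.get("items") or not row.get("action"):
            continue  # skip rows missing the hypothetical required fields
        items = sorted(item.strip().lower() for item in row["items"])
        table.append({"items": items, "action": row["action"]})
    return table


def build_scenarios(table: list[dict]) -> list[Scenario]:
    """processed table -> scenarios with a prompt and a reference action."""
    return [
        Scenario(f"s{i}", "Assess: " + ", ".join(row["items"]), row["action"])
        for i, row in enumerate(table)
    ]


def to_corpora(scenarios: list[Scenario]) -> tuple[list[dict], list[dict]]:
    """scenarios -> SFT prompt/completion pairs and GRPO prompts."""
    sft = [{"prompt": s.prompt, "completion": s.reference_action} for s in scenarios]
    # GRPO records hold prompts only: completions are sampled during training and
    # scored by a verifier, which can look up the reference via scenario_id.
    grpo = [{"prompt": s.prompt, "scenario_id": s.scenario_id} for s in scenarios]
    return sft, grpo
```

The split between the two corpora reflects how the methods differ: SFT needs a target completion for each prompt, while GRPO generates its own completions and relies on a reward signal, so the reference answer stays out of the prompt.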
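The Runtime Boundaries list and the `reset/step/state` stage of the Data Flow diagram can be made concrete with a toy environment loop. This is a sketch under stated assumptions, not the PolyGuardEnv or OpenEnv API: `ToyEnv`, `toy_agent`, the `LEGAL_ACTIONS` vocabulary, and the exact-match reward are all hypothetical. The point it shows is the boundary itself: the environment checks legality, computes the reward, advances state, and records the trace, while the agent only proposes an action.

```python
# Hypothetical action vocabulary; the real PolyGuardEnv action schema is not shown here.
LEGAL_ACTIONS = {"keep", "reduce_dose", "stop", "substitute"}


class ToyEnv:
    """Illustrative reset/step/state environment (not the PolyGuardEnv or OpenEnv API).

    The environment, not the agent, checks legality, computes the reward,
    advances state, and appends to the trace.
    """

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.scenario: dict = {}
        self.steps = 0
        self.done = True
        self.trace: list = []

    def reset(self, scenario: dict) -> dict:
        self.scenario = scenario
        self.steps = 0
        self.done = False
        self.trace = []
        return self.state()

    def state(self) -> dict:
        return {"scenario_id": self.scenario.get("id"), "step": self.steps, "done": self.done}

    def step(self, action: str) -> tuple:
        if self.done:
            raise RuntimeError("episode is finished; call reset() first")
        legal = action in LEGAL_ACTIONS
        if not legal:
            reward = -1.0  # illegal actions are penalized and have no other effect
        elif action == self.scenario["reference_action"]:
            reward = 1.0  # hypothetical verifier: exact match with a reference action
        else:
            reward = 0.0
        self.steps += 1
        self.done = reward == 1.0 or self.steps >= self.max_steps
        self.trace.append({"step": self.steps, "action": action, "legal": legal, "reward": reward})
        return self.state(), reward, self.done


def toy_agent(observation: dict) -> str:
    """Agent side of the boundary: proposes an action, never scores it."""
    return "reduce_dose"


env = ToyEnv()
obs = env.reset({"id": "s1", "reference_action": "reduce_dose"})
total, done = 0.0, False
while not done:
    obs, reward, done = env.step(toy_agent(obs))
    total += reward
print(total, env.trace)
```

Keeping reward computation inside the environment means an agent cannot grade its own output, which is the same separation the anti-cheat layer depends on; the trace gives evaluation code a record of every action and its legality.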