# Architecture

POLYGUARD-RL uses an OpenEnv-first monorepo architecture with six layers:

1. Data ingestion and retrieval index.
2. Predictive safety, graph, tabular risk, and dosing models.
3. Multi-agent orchestration graph.
4. Hierarchical RL training stack.
5. Safety governance and anti-cheat controls.
6. FastAPI, OpenEnv, and React deployment surfaces.

## Data Flow

```text
raw/local knowledge -> processed tables -> scenarios -> SFT/GRPO corpora
                              |
                              v
                PolyGuardEnv reset/step/state
                              |
                              v
agent stack -> verifier reward -> training/evaluation reports
                              |
                              v
              docs/results + README + HF Space
```
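
The middle of the flow above (environment `reset`/`step`/`state`, then agent action, then verifier reward, then report) can be sketched as a small loop. This is a minimal illustration only: `ToyEnv`, `run_episode`, the `expected` scenario key, and the exact-match reward rule are all hypothetical stand-ins, not the actual `PolyGuardEnv` interface or reward logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for an environment exposing reset/step/state."""

    scenario: dict
    _steps: int = 0
    _trace: list = field(default_factory=list)

    def reset(self) -> dict:
        # Start a fresh episode and return the initial observation.
        self._steps = 0
        self._trace = []
        return self.state()

    def state(self) -> dict:
        # Read-only snapshot of the current episode.
        return {"scenario": self.scenario, "steps": self._steps}

    def step(self, action: str) -> tuple[dict, float, bool]:
        # Assumed verifier rule: reward 1.0 on an exact match, else 0.0.
        self._steps += 1
        reward = 1.0 if action == self.scenario["expected"] else 0.0
        self._trace.append((action, reward))
        done = True  # single-step episodes keep the sketch short
        return self.state(), reward, done


def run_episode(env: ToyEnv, agent) -> dict:
    """Roll out one episode and return a tiny evaluation report."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent(obs)  # the agent stack proposes an action
        obs, reward, done = env.step(action)  # the environment scores it
        total += reward
    return {"steps": obs["steps"], "total_reward": total}


if __name__ == "__main__":
    env = ToyEnv(scenario={"expected": "approve"})
    print(run_episode(env, lambda obs: "approve"))
```

In this sketch the report dictionary plays the role of the training/evaluation reports stage; a real pipeline would presumably log many episodes rather than one.
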

## Runtime Boundaries

- Environment code owns state transition, legality, rewards, anti-cheat, and traces.
- Agent code owns candidate interpretation, routing, planning, critique, and explanation.
- Training code owns SFT, GRPO, reward logging, adapters, and registry metadata.
- Evaluation code owns baselines, perturbations, reports, and plots.
- Deployment code owns OpenEnv validation and HF Space push.
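
One way to picture the environment/agent boundary listed above is an interface in which the agent can only propose an action, while the environment alone decides legality, assigns the reward, and records the trace. The sketch below assumes this split for illustration; `Agent`, `BoundaryEnv`, `FixedAgent`, the action names, and the reward values are hypothetical and do not come from the repository.

```python
from __future__ import annotations

from typing import Protocol


class Agent(Protocol):
    """Agent side of the boundary: interprets observations, proposes actions."""

    def propose(self, observation: dict) -> str: ...


class BoundaryEnv:
    """Environment side: owns legality checks, rewards, and traces."""

    LEGAL_ACTIONS = frozenset({"approve", "revise", "escalate"})  # assumed set

    def __init__(self) -> None:
        self.trace: list[dict] = []

    def step(self, action: str) -> float:
        legal = action in self.LEGAL_ACTIONS
        # Assumed rule: illegal actions are penalized rather than executed.
        reward = 1.0 if legal else -1.0
        self.trace.append({"action": action, "legal": legal, "reward": reward})
        return reward


class FixedAgent:
    """Trivial agent that always proposes the same action."""

    def __init__(self, action: str) -> None:
        self.action = action

    def propose(self, observation: dict) -> str:
        return self.action


if __name__ == "__main__":
    env = BoundaryEnv()
    for agent in (FixedAgent("revise"), FixedAgent("skip_checks")):
        print(env.step(agent.propose({})))
    print(env.trace)
```

Because the agent never touches `trace` or the reward computation, it has no direct way to alter its own score, which is the kind of separation the anti-cheat responsibility suggests.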