	# Architecture

	POLYGUARD-RL uses an OpenEnv-first monorepo architecture with six layers:

	1. Data ingestion and retrieval index.
	2. Predictive safety, graph, tabular risk, and dosing models.
	3. Multi-agent orchestration graph.
	4. Hierarchical RL training stack.
	5. Safety governance and anti-cheat controls.
	6. FastAPI, OpenEnv, and React deployment surfaces.

	## Data Flow

	```text
	raw/local knowledge -> processed tables -> scenarios -> SFT/GRPO corpora
	                                   |
	                                   v
	                     PolyGuardEnv reset/step/state
	                                   |
	                                   v
	     agent stack -> verifier reward -> training/evaluation reports
	                                   |
	                                   v
	                   docs/results + README + HF Space
	```
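
The middle of the flow above (scenarios feeding an environment that is reset, stepped, and inspected, with a verifier reward collected into a report) can be sketched as follows. This is a minimal illustration only: `ToyEnv`, `run_episodes`, the scenario fields, and the exact-match reward are hypothetical stand-ins, not the actual `PolyGuardEnv` or OpenEnv API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for PolyGuardEnv; not the project's real class."""

    scenarios: list
    index: int = -1
    steps: int = 0
    trace: list = field(default_factory=list)

    def reset(self) -> dict:
        # Move to the next scenario and return the initial observation.
        self.index = (self.index + 1) % len(self.scenarios)
        self.steps = 0
        self.trace = []
        return {"prompt": self.scenarios[self.index]["prompt"]}

    def step(self, action: str) -> tuple:
        # Toy verifier reward: exact match against the scenario's expected action.
        expected = self.scenarios[self.index]["expected"]
        reward = 1.0 if action == expected else 0.0
        self.steps += 1
        self.trace.append({"action": action, "reward": reward})
        done = True  # single-step episodes, purely for illustration
        return {"prompt": None}, reward, done

    def state(self) -> dict:
        # Snapshot of the episode, suitable for a report row.
        return {"index": self.index, "steps": self.steps, "trace": list(self.trace)}


def run_episodes(env: ToyEnv, agent, n: int) -> list:
    """Run n single-step episodes and collect one report row per episode."""
    report = []
    for _ in range(n):
        observation = env.reset()
        action = agent(observation)
        _, reward, _ = env.step(action)
        report.append({"state": env.state(), "reward": reward})
    return report
```

Here `agent` is any callable that maps an observation to an action, so a fixed-answer baseline and a trained policy can be run through the same loop.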

	## Runtime Boundaries

	- Environment code owns state transitions, legality checks, rewards, anti-cheat, and traces.
	- Agent code owns candidate interpretation, routing, planning, critique, and explanation.
	- Training code owns SFT, GRPO, reward logging, adapters, and registry metadata.
	- Evaluation code owns baselines, perturbations, reports, and plots.
	- Deployment code owns OpenEnv validation and HF Space push.
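
One way to picture the first two boundaries is a sketch in which the environment alone decides legality, reward, and trace contents, while the agent only chooses among the candidates it is shown. Every name here (`BoundaryEnv`, `FirstLegalAgent`, `rollout`, the action set, the reward values) is a hypothetical illustration and not taken from the PolyGuard codebase.

```python
from typing import Protocol


class Agent(Protocol):
    """Agent side of the boundary: maps an observation to a proposed action."""

    def act(self, observation: dict) -> str: ...


class BoundaryEnv:
    """Hypothetical environment that owns legality, reward, and the trace."""

    LEGAL = ("approve", "revise", "escalate")  # illustrative action set

    def __init__(self, target: str) -> None:
        self._target = target  # hidden from the agent
        self.trace: list[dict] = []

    def observe(self) -> dict:
        # Only the legal action names are exposed, never the target.
        return {"legal_actions": list(self.LEGAL)}

    def step(self, action: str) -> float:
        legal = action in self.LEGAL
        if not legal:
            reward = -1.0  # illegal proposals are penalized by the environment
        elif action == self._target:
            reward = 1.0
        else:
            reward = 0.0
        # The trace is written here, so the agent cannot alter it.
        self.trace.append({"action": action, "legal": legal, "reward": reward})
        return reward


class FirstLegalAgent:
    """Hypothetical agent that owns only the choice among candidates."""

    def act(self, observation: dict) -> str:
        return observation["legal_actions"][0]


def rollout(env: BoundaryEnv, agent: Agent) -> float:
    """One interaction: the agent proposes, the environment judges."""
    return env.step(agent.act(env.observe()))
```

Because the reward and trace are computed inside `step`, an agent that proposes an action outside the legal set is penalized and recorded rather than trusted, which is the kind of separation the list above describes.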