# Architecture

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                 HF Space / Docker Container                  │
│                                                              │
│  ┌──────────────┐    ┌──────────────────────────────────┐    │
│  │  Gradio UI   │    │  FastAPI Server                  │    │
│  │  (port 7860) │    │  POST /reset   GET /state        │    │
│  │              │    │  POST /step    GET /health       │    │
│  └──────┬───────┘    └──────────────┬───────────────────┘    │
│         │                           │                        │
│         └──────────┬────────────────┘                        │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │  InvoiceExceptionEnv    │                          │
│         │  reset() step() state() │                          │
│         │  grade()                │                          │
│         └──────────┬──────────────┘                          │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │  Task Registry          │                          │
│         │  task1_price_variance   │                          │
│         │  task2_duplicate_tax    │                          │
│         │  task3_compound_fraud   │                          │
│         └─────────────────────────┘                          │
└──────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

### FastAPI + Gradio in same process

HF Spaces requires a single port (7860). Gradio is mounted on FastAPI using `gr.mount_gradio_app()` so both the validator API and the interactive UI share the same process and port.

### Pydantic v2 for all models

Required by the OpenEnv spec. Every field is typed. No `Any` fields without explicit documentation of why.

### EpisodeData vs EnvironmentState

- **EpisodeData** is mutable internal state tracking what the agent has done
- **EnvironmentState** is the immutable snapshot returned to the agent
- Documents (PO, Invoice, GRN) are rebuilt from task factories each time, ensuring they are never accidentally mutated

### Separate task classes

Each task is a self-contained class with its own documents, simulators, and grader. This makes it trivial to add new tasks: just implement `BaseTask` and register in `TASK_REGISTRY`.

### Deterministic simulation

No randomness in simulators or graders. Same seed + same actions = same scores. The only randomness is in `action_space_sample()` for baseline agents.
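
The registry and determinism points above can be illustrated with a small, standard-library-only sketch. It is not the project's actual code: the method names `build_documents()` and `grade()`, the `PriceVarianceTask` class, the `ACTIONS` list, and the `run_baseline()` helper are all hypothetical stand-ins, and the real `BaseTask` interface may look different. Only `BaseTask`, `TASK_REGISTRY`, `action_space_sample()`, and the task name `task1_price_variance` come from this document.

```python
import random
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Hypothetical minimal task interface; the real BaseTask may differ."""

    task_id: str

    @abstractmethod
    def build_documents(self) -> dict:
        """Return fresh PO/Invoice/GRN documents on every call."""

    @abstractmethod
    def grade(self, actions: list[str]) -> float:
        """Score an action sequence without using any randomness."""


class PriceVarianceTask(BaseTask):
    """Illustrative task; document fields and action names are assumptions."""

    task_id = "task1_price_variance"

    def build_documents(self) -> dict:
        # Rebuilt from scratch each call, so no caller can mutate shared state.
        return {
            "po": {"unit_price": 10.0},
            "invoice": {"unit_price": 12.5},
            "grn": {"quantity": 4},
        }

    def grade(self, actions: list[str]) -> float:
        # Pure function of the actions: credit only for flagging the variance.
        return 1.0 if "flag_price_variance" in actions else 0.0


# Adding a task means implementing BaseTask and adding one entry here.
TASK_REGISTRY: dict[str, type[BaseTask]] = {
    PriceVarianceTask.task_id: PriceVarianceTask,
}

# Hypothetical action names, used only for this sketch.
ACTIONS = ["approve", "reject", "flag_price_variance"]


def action_space_sample(rng: random.Random) -> str:
    """The only random step: a baseline agent picking an action."""
    return rng.choice(ACTIONS)


def run_baseline(task_id: str, seed: int, steps: int = 3) -> tuple[list[str], float]:
    """Run a random baseline agent and return its actions and score."""
    task = TASK_REGISTRY[task_id]()
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    actions = [action_space_sample(rng) for _ in range(steps)]
    return actions, task.grade(actions)


if __name__ == "__main__":
    first = run_baseline("task1_price_variance", seed=0)
    second = run_baseline("task1_price_variance", seed=0)
    print(first == second)  # identical seed and task give identical results
```

Passing an explicit `random.Random(seed)` instance keeps the randomness confined to the baseline agent, which is one way to keep graders and simulators reproducible.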