# Architecture

## System Overview

```
┌────────────────────────────────────────────────────────────┐
│                HF Space / Docker Container                 │
│                                                            │
│  ┌──────────────┐  ┌──────────────────────────────────┐    │
│  │  Gradio UI   │  │          FastAPI Server          │    │
│  │ (port 7860)  │  │  POST /reset   GET /state        │    │
│  │              │  │  POST /step    GET /health       │    │
│  └──────┬───────┘  └────────────────┬─────────────────┘    │
│         │                           │                      │
│         └──────────┬────────────────┘                      │
│                    │                                       │
│         ┌──────────▼──────────────┐                        │
│         │   InvoiceExceptionEnv   │                        │
│         │ reset() step() state()  │                        │
│         │         grade()         │                        │
│         └──────────┬──────────────┘                        │
│                    │                                       │
│         ┌──────────▼──────────────┐                        │
│         │      Task Registry      │                        │
│         │  task1_price_variance   │                        │
│         │  task2_duplicate_tax    │                        │
│         │  task3_compound_fraud   │                        │
│         └─────────────────────────┘                        │
└────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

### FastAPI + Gradio in the same process

HF Spaces serves a Space through a single port (7860). Gradio is mounted on the
FastAPI app with `gr.mount_gradio_app()`, so the validator API and the
interactive UI share one process and one port.
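
The mounting step could look roughly like the sketch below. The `/ui` mount path, the `build_app` factory name, and the placeholder `/health` body are assumptions made for illustration and are not taken from the project. Imports sit inside the factory so the module loads even where FastAPI and Gradio are not installed.

```python
def build_app():
    """Return one ASGI app that serves both the API and the Gradio UI."""
    # Imported lazily; both packages must be installed before calling this.
    import gradio as gr
    from fastapi import FastAPI

    api = FastAPI()

    @api.get("/health")
    def health() -> dict:
        # Placeholder body; the real handler may report more detail.
        return {"status": "ok"}

    with gr.Blocks() as demo:
        gr.Markdown("Interactive UI goes here")

    # Attach the UI to the API app. "/ui" is an assumed path, not the project's.
    return gr.mount_gradio_app(api, demo, path="/ui")
```

Serving the returned app with something like `uvicorn.run(build_app(), host="0.0.0.0", port=7860)` would then expose both the API routes and the UI on port 7860.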

### Pydantic v2 for all models

Pydantic v2 is required by the OpenEnv spec. Every field is typed, and `Any` is
allowed only where the reason is documented explicitly.

### EpisodeData vs EnvironmentState

- **EpisodeData** is the mutable internal state that tracks what the agent has done.
- **EnvironmentState** is the immutable snapshot returned to the agent.
- Documents (PO, Invoice, GRN) are rebuilt from task factories each time,
  so they are never accidentally mutated.
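
The split between mutable bookkeeping and a read-only snapshot can be sketched as follows. To stay dependency-free, this sketch swaps in standard-library frozen dataclasses for the project's Pydantic v2 models. The field names, `make_invoice`, and `snapshot` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    unit_price: float


def make_invoice() -> Invoice:
    # Hypothetical task factory: builds a fresh document on every call.
    return Invoice(invoice_id="INV-001", unit_price=12.5)


@dataclass
class EpisodeData:
    # Mutable bookkeeping owned by the environment.
    step_count: int = 0
    actions_taken: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class EnvironmentState:
    # Read-only snapshot handed to the agent.
    step_count: int
    actions_taken: tuple[str, ...]
    invoice: Invoice


def snapshot(episode: EpisodeData) -> EnvironmentState:
    # Copy values out of the episode so later mutation cannot leak through.
    return EnvironmentState(
        step_count=episode.step_count,
        actions_taken=tuple(episode.actions_taken),
        invoice=make_invoice(),
    )


episode = EpisodeData()
episode.actions_taken.append("inspect_invoice")
episode.step_count += 1
state = snapshot(episode)
```

Because the snapshot copies values and rebuilds the document, further changes to `episode` leave an already returned `state` untouched.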

### Separate task classes

Each task is a self-contained class with its own documents, simulators, and
grader. This makes it trivial to add new tasks: implement `BaseTask` and
register the class in `TASK_REGISTRY`.
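
As a rough illustration of adding a task, the sketch below defines a minimal stand-in for `BaseTask` and registers one new class. The method names `build_documents` and `grade`, the dictionary-based registry, and the `task4_quantity_mismatch` task are all invented for this example. The project's actual interface may require more.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Hypothetical minimal interface; the real BaseTask may differ."""

    task_id: str

    @abstractmethod
    def build_documents(self) -> dict:
        """Return fresh PO / Invoice / GRN documents for this task."""

    @abstractmethod
    def grade(self, actions: list[str]) -> float:
        """Return a score for the agent's actions."""


# Assumed shape: task id mapped to the task class.
TASK_REGISTRY: dict[str, type[BaseTask]] = {}


class QuantityMismatchTask(BaseTask):
    task_id = "task4_quantity_mismatch"  # invented for illustration

    def build_documents(self) -> dict:
        return {"po": {"qty": 10}, "invoice": {"qty": 12}, "grn": {"qty": 10}}

    def grade(self, actions: list[str]) -> float:
        # Full credit only if the agent flagged the mismatch.
        return 1.0 if "flag_quantity_mismatch" in actions else 0.0


# Registration is a single dictionary entry.
TASK_REGISTRY[QuantityMismatchTask.task_id] = QuantityMismatchTask

task = TASK_REGISTRY["task4_quantity_mismatch"]()
score = task.grade(["flag_quantity_mismatch"])
```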

### Deterministic simulation

Simulators and graders contain no randomness: the same seed and the same
actions always produce the same scores. The only randomness is in
`action_space_sample()`, which baseline agents use.
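
The reproducibility claim can be illustrated with a small sketch. The action names, the `rng` parameter on `action_space_sample`, the toy `grade` rule, and `run_baseline` are assumptions for this example and do not reflect the project's actual signatures.

```python
from __future__ import annotations

import random

ACTIONS = ["approve", "reject", "escalate"]  # hypothetical action names


def action_space_sample(rng: random.Random) -> str:
    # The only random call: picks a baseline action from a seeded generator.
    return rng.choice(ACTIONS)


def grade(actions: list[str]) -> float:
    # Pure function of the action sequence: no clock, RNG, or global state.
    if not actions:
        return 0.0
    return sum(a == "escalate" for a in actions) / len(actions)


def run_baseline(seed: int, steps: int = 5) -> float:
    # A private generator per run keeps episodes independent of each other.
    rng = random.Random(seed)
    actions = [action_space_sample(rng) for _ in range(steps)]
    return grade(actions)
```

Calling `run_baseline` twice with the same seed yields the same score, because the seed fixes the sampled actions and the grader depends only on those actions.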