Architecture
System Overview
┌──────────────────────────────────────────────────────────────┐
│                 HF Space / Docker Container                  │
│                                                              │
│  ┌──────────────┐    ┌──────────────────────────────────┐    │
│  │  Gradio UI   │    │          FastAPI Server          │    │
│  │ (port 7860)  │    │  POST /reset      GET /state     │    │
│  │              │    │  POST /step       GET /health    │    │
│  └──────┬───────┘    └──────────────┬───────────────────┘    │
│         │                           │                        │
│         └──────────┬────────────────┘                        │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │  InvoiceExceptionEnv    │                          │
│         │  reset() step() state() │                          │
│         │  grade()                │                          │
│         └──────────┬──────────────┘                          │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │  Task Registry          │                          │
│         │  task1_price_variance   │                          │
│         │  task2_duplicate_tax    │                          │
│         │  task3_compound_fraud   │                          │
│         └─────────────────────────┘                          │
└──────────────────────────────────────────────────────────────┘
Key Design Decisions
FastAPI + Gradio in same process
An HF Space exposes a single port (7860), so Gradio is mounted on the FastAPI
app with gr.mount_gradio_app(). The validator API and the interactive UI then
share one process and one port.
Pydantic v2 for all models
Pydantic v2 is required by the OpenEnv spec. Every field is typed, and no field
is typed Any without explicit documentation of why.
EpisodeData vs EnvironmentState
- EpisodeData is mutable internal state tracking what the agent has done
- EnvironmentState is the immutable snapshot returned to the agent
- Documents (PO, Invoice, GRN) are rebuilt from task factories each time, ensuring they are never accidentally mutated
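The mutable/immutable split can be sketched with Pydantic v2's frozen model config. The field names and the snapshot() helper below are hypothetical; only the two class names come from this document:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class EpisodeData(BaseModel):
    """Mutable internal bookkeeping (field names are illustrative)."""

    task_id: str
    step_count: int = 0
    actions_taken: list[str] = Field(default_factory=list)


class EnvironmentState(BaseModel):
    """Read-only snapshot handed to the agent (field names are illustrative)."""

    model_config = ConfigDict(frozen=True)

    task_id: str
    step_count: int
    actions_taken: tuple[str, ...]  # tuple, so the contents cannot be appended to


def snapshot(episode: EpisodeData) -> EnvironmentState:
    # Copy values out so later changes to the episode do not leak into the snapshot.
    return EnvironmentState(
        task_id=episode.task_id,
        step_count=episode.step_count,
        actions_taken=tuple(episode.actions_taken),
    )


episode = EpisodeData(task_id="task1_price_variance")
episode.step_count += 1
episode.actions_taken.append("inspect_invoice")

state = snapshot(episode)

try:
    state.step_count = 99  # rejected because the model is frozen
    mutation_blocked = False
except ValidationError:
    mutation_blocked = True
```

Note that frozen=True only blocks attribute assignment; it does not deep-freeze nested containers, which is why the sketch stores the action history as a tuple.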
Separate task classes
Each task is a self-contained class with its own documents, simulators, and grader. This makes it trivial to add new tasks: just implement BaseTask and register it in TASK_REGISTRY.
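As a rough sketch of what adding a task could look like: the BaseTask methods, the register decorator, and the task4 example below are all hypothetical, since the real interface is not shown here. Only the names BaseTask and TASK_REGISTRY come from this document.

```python
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Hypothetical interface; the real BaseTask may expose different methods."""

    task_id: str

    @abstractmethod
    def build_documents(self) -> dict:
        """Return freshly built PO / invoice / GRN documents for this task."""

    @abstractmethod
    def grade(self, actions: list[str]) -> float:
        """Score an episode in the range 0.0 to 1.0."""


TASK_REGISTRY: dict[str, type[BaseTask]] = {}


def register(task_cls: type[BaseTask]) -> type[BaseTask]:
    """Class decorator that adds a task to the registry under its task_id."""
    TASK_REGISTRY[task_cls.task_id] = task_cls
    return task_cls


@register
class QuantityMismatchTask(BaseTask):
    task_id = "task4_quantity_mismatch"  # hypothetical new task

    def build_documents(self) -> dict:
        # Built from scratch on every call, so callers never share instances.
        return {
            "po": {"qty": 100},
            "invoice": {"qty": 120},
            "grn": {"qty": 100},
        }

    def grade(self, actions: list[str]) -> float:
        return 1.0 if "flag_quantity_mismatch" in actions else 0.0


task = TASK_REGISTRY["task4_quantity_mismatch"]()
score = task.grade(["flag_quantity_mismatch"])
```

Because build_documents() constructs new objects on each call, mutating one returned document cannot affect the next episode, which matches the factory approach described under EpisodeData vs EnvironmentState.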
Deterministic simulation
The simulators and graders contain no randomness, so the same seed and the same
actions always produce the same scores. The only randomness is in
action_space_sample(), which baseline agents use.
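The sketch below illustrates the idea under stated assumptions: the action names, the toy grading rule, and the signature of action_space_sample() (taking a seeded random.Random) are hypothetical and may differ from the real code.

```python
import random

ACTIONS = ["approve", "reject", "escalate"]  # illustrative action names


def grade(actions: list[str]) -> float:
    """Pure function of the action list: no clock, no RNG, no global state."""
    expected = ["escalate", "reject"]  # toy answer key for this sketch
    hits = sum(a == e for a, e in zip(actions, expected))
    return hits / len(expected)


def action_space_sample(rng: random.Random) -> str:
    """The only source of randomness; the caller owns the seeded generator."""
    return rng.choice(ACTIONS)


def run_random_baseline(seed: int, steps: int = 2) -> float:
    """Sample actions from a seeded generator, then grade them deterministically."""
    rng = random.Random(seed)
    actions = [action_space_sample(rng) for _ in range(steps)]
    return grade(actions)


# Replaying the same seed reproduces the same actions and therefore the same score.
assert run_random_baseline(seed=7) == run_random_baseline(seed=7)
```

Keeping the generator as an explicit argument, rather than using the module-level random functions, is one way to make sure no hidden global state affects a replay.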