# Architecture

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                     HF Space / Docker Container              │
│                                                              │
│  ┌──────────────┐    ┌──────────────────────────────────┐    │
│  │  Gradio UI   │    │         FastAPI Server           │    │
│  │  (port 7860) │    │  POST /reset  GET /state         │    │
│  │              │    │  POST /step   GET /health        │    │
│  └──────┬───────┘    └──────────────┬───────────────────┘    │
│         │                           │                        │
│         └──────────┬────────────────┘                        │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │   InvoiceExceptionEnv   │                          │
│         │  reset() step() state() │                          │
│         │  grade()                │                          │
│         └──────────┬──────────────┘                          │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │      Task Registry      │                          │
│         │  task1_price_variance   │                          │
│         │  task2_duplicate_tax    │                          │
│         │  task3_compound_fraud   │                          │
│         └─────────────────────────┘                          │
└──────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

### FastAPI + Gradio in same process
A Hugging Face Space exposes a single port (7860 by default). Gradio is
mounted onto the FastAPI app with `gr.mount_gradio_app()`, so the validator
API and the interactive UI run in the same process and share that port.

### Pydantic v2 for all models
Required by the OpenEnv spec. Every field is typed, and any field declared as
`Any` must carry an explicit note explaining why.
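One way to satisfy the "document every `Any`" rule is to record the reason in the field's `description`, so it also shows up in the generated JSON schema. A sketch, assuming Pydantic v2. `StepResult` and its fields are hypothetical names, not the project's models.

```python
from typing import Any

from pydantic import BaseModel, Field, ValidationError


class StepResult(BaseModel):
    """Hypothetical response model; field names are illustrative."""

    reward: float
    done: bool
    # Any is deliberate here, and the justification lives on the field itself.
    info: dict[str, Any] = Field(
        default_factory=dict,
        description="Free-form diagnostics whose shape varies by task, so values are Any.",
    )


result = StepResult(reward=0.5, done=False)
print(result.model_dump())

try:
    StepResult(reward="not a number", done=False)  # rejected: not coercible to float
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```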

### EpisodeData vs EnvironmentState
- **EpisodeData** is the mutable internal state that tracks what the agent has done
- **EnvironmentState** is the immutable snapshot returned to the agent
- Documents (purchase order, invoice, goods received note) are rebuilt from the
  task factories each time rather than stored, so they are never accidentally
  mutated
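The split can be sketched with a plain mutable model for bookkeeping and a frozen model for the snapshot. This assumes Pydantic v2's `ConfigDict(frozen=True)`. The field names, `Invoice`, `make_invoice()` and `snapshot()` are hypothetical illustrations, not the project's code.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Invoice(BaseModel):
    """Hypothetical, heavily simplified document."""

    model_config = ConfigDict(frozen=True)
    invoice_id: str
    amount: float


class EpisodeData(BaseModel):
    """Mutable internal bookkeeping (hypothetical fields)."""

    step_count: int = 0
    actions: list[str] = Field(default_factory=list)


class EnvironmentState(BaseModel):
    """Immutable snapshot handed to the agent."""

    model_config = ConfigDict(frozen=True)
    step_count: int
    actions: tuple[str, ...]  # tuple, because frozen does not deep-freeze a list
    invoice: Invoice


def make_invoice() -> Invoice:
    # Stand-in for a task factory: returns a fresh document on every call.
    return Invoice(invoice_id="INV-001", amount=1200.0)


def snapshot(episode: EpisodeData) -> EnvironmentState:
    return EnvironmentState(
        step_count=episode.step_count,
        actions=tuple(episode.actions),
        invoice=make_invoice(),
    )


episode = EpisodeData()
episode.step_count += 1
episode.actions.append("flag_price_variance")
state = snapshot(episode)

try:
    state.step_count = 2  # assignment on a frozen model raises
except ValidationError:
    print("snapshot is read-only")
```

Because the snapshot copies the action list into a tuple and pulls documents from the factory, nothing the agent receives aliases the environment's internal state.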

### Separate task classes
Each task is a self-contained class with its own documents, simulators, and
grader. Adding a new task only requires implementing `BaseTask` and
registering the class in `TASK_REGISTRY`.
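A sketch of the subclass-plus-registry pattern follows, using only the standard library. The method names on `BaseTask`, the document values and `make_task()` are hypothetical. The real `BaseTask` presumably also defines simulator hooks.

```python
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Hypothetical minimal interface; the real BaseTask may define more hooks."""

    task_id: str

    @abstractmethod
    def build_documents(self) -> dict[str, dict[str, float]]:
        """Return fresh PO / invoice / GRN documents for a new episode."""

    @abstractmethod
    def grade(self, actions: list[str]) -> float:
        """Score the agent's action sequence in [0.0, 1.0]."""


class PriceVarianceTask(BaseTask):
    task_id = "task1_price_variance"

    def build_documents(self) -> dict[str, dict[str, float]]:
        # Illustrative values: invoice unit price exceeds the PO price.
        return {
            "po": {"unit_price": 10.0, "qty": 100.0},
            "invoice": {"unit_price": 12.0, "qty": 100.0},
            "grn": {"qty": 100.0},
        }

    def grade(self, actions: list[str]) -> float:
        return 1.0 if "flag_price_variance" in actions else 0.0


# Adding a task = one subclass + one registry entry.
TASK_REGISTRY: dict[str, type[BaseTask]] = {
    PriceVarianceTask.task_id: PriceVarianceTask,
}


def make_task(task_id: str) -> BaseTask:
    # Unknown ids raise KeyError, which surfaces typos immediately.
    return TASK_REGISTRY[task_id]()
```

Storing classes rather than instances in the registry means every episode gets a fresh task object, which fits the "rebuild documents each time" rule above.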

### Deterministic simulation
Simulators and graders use no randomness: the same seed and the same action
sequence always produce the same scores. The only randomness is in
`action_space_sample()`, which baseline agents use to pick actions.
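Confining randomness to a seeded sampler keeps whole runs reproducible. The sketch below assumes the seed is passed to the sampler, and the real environment may take it elsewhere (for example in `reset()`). `ACTIONS`, `SeededSampler`, `grade()` and `run_baseline()` are hypothetical names.

```python
import random

# Hypothetical discrete action set; the real action space may differ.
ACTIONS = ("approve", "reject", "flag_price_variance", "request_credit_note")


class SeededSampler:
    """Sketch of a seeded action_space_sample() for baseline agents."""

    def __init__(self, seed: int) -> None:
        # Private RNG: unaffected by, and not affecting, the global `random` state.
        self._rng = random.Random(seed)

    def action_space_sample(self) -> str:
        return self._rng.choice(ACTIONS)


def grade(actions: list[str]) -> float:
    # Pure function of the actions (no RNG, clock or I/O), so replays score identically.
    return 1.0 if "flag_price_variance" in actions else 0.0


def run_baseline(seed: int, steps: int = 5) -> tuple[list[str], float]:
    sampler = SeededSampler(seed)
    actions = [sampler.action_space_sample() for _ in range(steps)]
    return actions, grade(actions)
```

Using a dedicated `random.Random` instance rather than `random.seed()` means other code touching the module-level RNG cannot shift the baseline's action sequence.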