# Architecture

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                     HF Space / Docker Container              │
│                                                              │
│  ┌──────────────┐    ┌──────────────────────────────────┐    │
│  │  Gradio UI   │    │         FastAPI Server           │    │
│  │  (port 7860) │    │  POST /reset  GET /state         │    │
│  │              │    │  POST /step   GET /health        │    │
│  └──────┬───────┘    └──────────────┬───────────────────┘    │
│         │                           │                        │
│         └──────────┬────────────────┘                        │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │   InvoiceExceptionEnv   │                          │
│         │  reset() step() state() │                          │
│         │  grade()                │                          │
│         └──────────┬──────────────┘                          │
│                    │                                         │
│         ┌──────────▼──────────────┐                          │
│         │      Task Registry      │                          │
│         │  task1_price_variance   │                          │
│         │  task2_duplicate_tax    │                          │
│         │  task3_compound_fraud   │                          │
│         └─────────────────────────┘                          │
└──────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

### FastAPI + Gradio in same process
HF Spaces exposes a single port (7860). Gradio is mounted on the FastAPI app
with `gr.mount_gradio_app()`, so the validator API and the interactive UI
share one process and one port.

### Pydantic v2 for all models
Required by the OpenEnv spec. Every field is typed, and `Any` is used only
where the reason is documented explicitly.
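
As an illustration of what a fully typed model buys, here is a small Pydantic v2 sketch. The `StepAction` class and its fields are hypothetical and are not taken from the project's real models.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class StepAction(BaseModel):
    """Hypothetical action model; the field names are illustrative only."""

    # Reject unknown keys instead of silently ignoring them.
    model_config = ConfigDict(extra="forbid")

    action_type: Literal["inspect", "approve", "reject"]
    line_item: int = Field(ge=0)
    note: str = ""


ok = StepAction(action_type="inspect", line_item=0)

# A non-numeric string for an int field fails validation up front.
try:
    StepAction(action_type="inspect", line_item="first")
    rejected = False
except ValidationError:
    rejected = True
```

With every field typed, malformed payloads fail at the model boundary rather than somewhere inside the environment logic.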

### EpisodeData vs EnvironmentState
- **EpisodeData** is mutable internal state tracking what the agent has done
- **EnvironmentState** is the immutable snapshot returned to the agent
- Documents (PO, Invoice, GRN) are rebuilt from task factories each time,
  ensuring they are never accidentally mutated
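
The split above can be sketched as follows. This version uses standard-library dataclasses to stay dependency-free, whereas the project itself uses Pydantic models; all field names and the `build_invoice` factory are assumptions made for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass
class EpisodeData:
    """Mutable internal bookkeeping (field names are hypothetical)."""

    step_count: int = 0
    actions_taken: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class EnvironmentState:
    """Read-only snapshot handed to the agent."""

    step_count: int
    actions_taken: tuple[str, ...]
    invoice_total: float


def build_invoice() -> dict:
    """Stand-in for a task factory: returns a fresh document on every call."""
    return {"invoice_id": "INV-001", "total": 120.0}


def snapshot(episode: EpisodeData) -> EnvironmentState:
    invoice = build_invoice()  # rebuilt each time, never shared between calls
    return EnvironmentState(
        step_count=episode.step_count,
        actions_taken=tuple(episode.actions_taken),
        invoice_total=invoice["total"],
    )


episode = EpisodeData()
episode.actions_taken.append("inspect_invoice")
episode.step_count += 1
state = snapshot(episode)

# Mutating one factory result does not leak into later snapshots.
build_invoice()["total"] = 0.0
later_total = snapshot(episode).invoice_total
```

Because the snapshot is frozen and the documents are rebuilt per call, an agent holding an old `EnvironmentState` cannot alter what the environment sees next.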

### Separate task classes
Each task is a self-contained class with its own documents, simulators, and
grader. This makes it trivial to add new tasks: implement `BaseTask` and
register the class in `TASK_REGISTRY`.
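
A rough sketch of how such a registry might look. `BaseTask` and `TASK_REGISTRY` are named in this document, but the method names, the `register` decorator, and the example grading rule are assumptions introduced for illustration.

```python
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Interface every task implements (method names here are assumptions)."""

    task_id: str

    @abstractmethod
    def build_documents(self) -> dict:
        """Return fresh PO / Invoice / GRN documents for this task."""

    @abstractmethod
    def grade(self, actions: list[str]) -> float:
        """Score the agent's actions in the range 0.0 to 1.0."""


TASK_REGISTRY: dict[str, type[BaseTask]] = {}


def register(cls: type[BaseTask]) -> type[BaseTask]:
    """Hypothetical decorator that adds a task class to the registry."""
    TASK_REGISTRY[cls.task_id] = cls
    return cls


@register
class PriceVarianceTask(BaseTask):
    task_id = "task1_price_variance"

    def build_documents(self) -> dict:
        return {
            "po": {"unit_price": 10.0},
            "invoice": {"unit_price": 12.5},
            "grn": {"quantity": 4},
        }

    def grade(self, actions: list[str]) -> float:
        # Illustrative rule: full credit only if the variance was flagged.
        return 1.0 if "flag_price_variance" in actions else 0.0


task = TASK_REGISTRY["task1_price_variance"]()
score = task.grade(["inspect_invoice", "flag_price_variance"])
```

Under this layout the environment only needs a task ID to look up everything else, so a new task touches no existing code.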

### Deterministic simulation
No randomness in simulators or graders: the same seed and the same actions always produce the same scores.
The only randomness is in `action_space_sample()` for baseline agents.
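
The determinism property can be sketched as below. The action names, the scoring rule, and passing an explicit `random.Random` into `action_space_sample` are assumptions for this example; the real signature is not shown in this document.

```python
import random

# Hypothetical action set for the sketch.
ACTIONS = ("inspect_po", "inspect_invoice", "inspect_grn", "approve", "reject")


def action_space_sample(rng: random.Random) -> str:
    """The only source of randomness: a baseline agent's action choice."""
    return rng.choice(ACTIONS)


def grade(actions: list[str]) -> float:
    """Pure function of the action list, so repeated calls always agree."""
    score = 0.0
    if "inspect_invoice" in actions:
        score += 0.5
    if actions and actions[-1] == "reject":
        score += 0.5
    return score


def run_baseline(seed: int, steps: int = 5) -> tuple[list[str], float]:
    rng = random.Random(seed)  # seeded generator, isolated from global state
    actions = [action_space_sample(rng) for _ in range(steps)]
    return actions, grade(actions)


first = run_baseline(seed=7)
second = run_baseline(seed=7)
```

Seeding a private generator per run keeps baseline rollouts reproducible without touching the global `random` state.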