```mermaid
flowchart TD
    subgraph Agent
        A[Your Training Code]
    end

    subgraph SQLEnv_Environment
        B[DOCKER CONTAINER]
        C[FastAPI Server app.py]
        D[SQLEnvironment]
        E[SQLite databases + question sets]
    end

    A -- "WebSocket (persistent session)" --> B
    B --> C
    C --> D
    B -- "Contains" --> E

    subgraph SQLEnvironment_Methods
        I[reset]
        J[step action loop]
        S[state]
    end

    D -- "Calls" --> I
    D -- "Calls" --> J
    D -- "Calls" --> S

    subgraph Episode_Lifecycle_Reset
        I_a[Pick random question]
        I_b[Load corresponding SQLite database read-only]
        I_c[Return initial observation SQLObservation]
    end

    I --> I_a
    I --> I_b
    I --> I_c

    subgraph Episode_Lifecycle_Step
        J_a[DESCRIBE action]
        J_b[SAMPLE action]
        J_c[QUERY action]
        J_d[ANSWER action]
    end

    J --> J_a
    J --> J_b
    J --> J_c
    J --> J_d

    I_c -- "Contains" --> F[SQLObservation typed, IDE-friendly]
    J -- "Returns" --> G[float reward]
    J -- "Returns" --> H[bool done]
    J_d -- "Compares to gold, computes terminal reward" --> G

    classDef entity fill:#f9f,stroke:#333,stroke-width:2px;
    class A,B,C,D,E,F,G,H,I,J,S,I_a,I_b,I_c,J_a,J_b,J_c,J_d entity;
```
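The episode lifecycle in the diagram can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `ToySQLEnvironment`, `ToyQuestion`, `make_demo_db`, the fields on `SQLObservation`, the `(observation, reward, done)` return shape, and the exact-match reward rule are all hypothetical and are not taken from the project's code. The sketch only shows the flow the diagram describes: `reset` picks a random question and opens its SQLite database read-only, and `step` dispatches on the four action types.

```python
import os
import random
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SQLObservation:
    """Hypothetical observation shape; the real SQLObservation may differ."""
    question: str
    result: str = ""


@dataclass
class ToyQuestion:
    text: str
    db_path: str
    gold_answer: str


class ToySQLEnvironment:
    """Illustrative stand-in for SQLEnvironment, not the actual class."""

    def __init__(self, questions):
        self.questions = questions
        self.current = None
        self.conn = None

    def reset(self) -> SQLObservation:
        # Pick a random question, then open its database in read-only mode.
        self.current = random.choice(self.questions)
        if self.conn is not None:
            self.conn.close()
        uri = Path(self.current.db_path).as_uri() + "?mode=ro"
        self.conn = sqlite3.connect(uri, uri=True)
        return SQLObservation(question=self.current.text)

    def step(self, action_type: str, argument: str):
        """Return (observation, reward, done) for a single action."""
        if action_type == "ANSWER":
            # Assumed terminal reward: 1.0 on exact match with the gold answer.
            reward = 1.0 if argument.strip() == self.current.gold_answer else 0.0
            return SQLObservation(self.current.text, "episode finished"), reward, True
        if action_type == "DESCRIBE":
            sql = f"PRAGMA table_info({argument})"    # column metadata
        elif action_type == "SAMPLE":
            sql = f"SELECT * FROM {argument} LIMIT 3"  # a few example rows
        elif action_type == "QUERY":
            sql = argument                             # arbitrary SQL from the agent
        else:
            raise ValueError(f"unknown action type: {action_type}")
        try:
            result = repr(self.conn.execute(sql).fetchall())
        except sqlite3.Error as exc:
            # Writes fail here because the connection is read-only.
            result = f"error: {exc}"
        return SQLObservation(self.current.text, result), 0.0, False


def make_demo_db() -> str:
    """Create a throwaway SQLite file with one small table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",)])
    conn.commit()
    conn.close()
    return path


db_path = make_demo_db()
env = ToySQLEnvironment([ToyQuestion("How many users are there?", db_path, "2")])
obs = env.reset()
obs, reward, done = env.step("QUERY", "SELECT COUNT(*) FROM users")
print(obs.result, reward, done)
obs, reward, done = env.step("ANSWER", "2")
print(reward, done)
```

In the real setup the training code reaches the environment over a WebSocket session to the FastAPI server in the container, so calls like these would go through a client rather than a direct method call. The sketch skips that layer to keep the lifecycle visible.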