---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---
# Gov Workflow OpenEnv
## Quick Links
- Hugging Face Space URL (placeholder, to be replaced with the final deployed demo link): [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL)
- Blog path in codebase: [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md)
Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
Main OpenEnv RL government workflow notebook used as the judge-facing criteria book. It contains the practical judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: [https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing](https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing)
Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: [https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing](https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing)
First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: [https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing](https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing)
Second-stage GRPO continuation where the same LLM agent is further trained and refined on the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
Phase 1 PPO training was run locally to establish an algorithm baseline before the Phase 2 notebook continued training.
- PPO Phase 2 training link: [https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing](https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing)
PPO phase 2 training notebook where the RL algorithm is further trained on the same environment for improved policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations.
It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is productionized for:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Why This Problem Matters
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays are usually caused by sequential operational decisions, not one single technical bug.
Typical daily decisions include:
- which queue to prioritize first
- where to allocate limited officers
- when to request missing documents
- when to use escalation budget
- how to reduce backlog without harming fairness across services
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
## How the Environment Works
At runtime, the environment follows the same loop for every task:
1. `reset(task_id, seed)`
Initializes a new episode with deterministic task configuration.
2. `step(action)`
Applies one operational action and advances system state.
3. `state()`
Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
4. `grade(state)`
Computes deterministic grader score in `[0.0, 1.0]` based on task-specific weighting.
This forms a transparent policy-evaluation loop:
`reset -> repeated step -> state -> grade`.
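As a sketch of this contract, the toy in-memory stand-in below mirrors the `reset -> step -> state -> grade` loop. It is not the real `GovWorkflowEnv` from `app/env.py`; field names and the scoring rule are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ToyGovEnv:
    """Toy stand-in for the reset -> step -> state -> grade contract.
    Field names are illustrative; the real kernel lives in app/env.py."""
    backlog: int = 0
    completed: int = 0
    sla_breaches: int = 0
    steps: int = 0

    def reset(self, task_id: str, seed: int) -> dict:
        # Deterministic init: the same seed yields the same starting backlog.
        self.backlog = 20 + (seed % 10)
        self.completed = self.sla_breaches = self.steps = 0
        return {"task_id": task_id, "backlog": self.backlog}

    def step(self, action: dict) -> dict:
        # Apply one operational action and advance the system state.
        self.steps += 1
        if action.get("type") == "advance_time" and self.backlog > 0:
            self.backlog -= 1
            self.completed += 1
        return self.state()

    def state(self) -> dict:
        # Episode-level metrics snapshot.
        return {"backlog": self.backlog, "completed": self.completed,
                "sla_breaches": self.sla_breaches, "steps": self.steps}

    def grade(self, state: dict) -> float:
        # Deterministic, bounded score in [0.0, 1.0].
        total = state["completed"] + state["backlog"]
        return state["completed"] / total if total else 0.0

env = ToyGovEnv()
env.reset("district_backlog_easy", seed=11)
for _ in range(10):
    env.step({"type": "advance_time"})
score = env.grade(env.state())
assert 0.0 <= score <= 1.0
```

Because both the reset and the grader are pure functions of task config, seed, and action sequence, repeated runs reproduce the same score.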
## Reward and Grading Logic
### Dense Reward (per step)
The reward function gives continuous learning signal across an episode:
- positive for stage progress and completions
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
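A per-step shaping function of this kind might look as follows. The coefficients and metric names here are invented for illustration, not the values used in `app/reward.py`.

```python
def dense_reward(prev: dict, curr: dict, invalid_action: bool,
                 idle_officers: int, fairness_threshold: float = 0.2) -> float:
    """Illustrative per-step reward shaping; weights are placeholders,
    not the coefficients in app/reward.py."""
    reward = 0.0
    # Positive signal for new completions this step.
    reward += 1.0 * (curr["completed"] - prev["completed"])
    # Penalty proportional to remaining backlog pressure.
    reward -= 0.05 * curr["backlog"]
    # Penalty for SLA breaches that appeared this step.
    reward -= 0.5 * (curr["sla_breaches"] - prev["sla_breaches"])
    # Penalty only for fairness gap beyond the tolerated threshold.
    excess = max(0.0, curr["fairness_gap"] - fairness_threshold)
    reward -= 1.0 * excess
    # Flat penalties for invalid actions and idle officer capacity.
    if invalid_action:
        reward -= 0.25
    reward -= 0.02 * idle_officers
    return reward
```

The key property is that every step carries signal, so the policy gets gradient long before the episode-end grade.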
### Deterministic Task Graders
Final scoring is deterministic and bounded in `[0.0, 1.0]`:
- Easy task prioritizes completion + SLA
- Medium balances completion, SLA, urgency handling, and fairness
- Hard emphasizes all-round performance including fairness and escalation discipline
Because grading is deterministic, repeated runs with the same seed are reproducible.
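Conceptually, each grader is a weighted sum of normalized metrics clipped to `[0.0, 1.0]`. The weights and metric names below are illustrative, not those in `app/graders.py`.

```python
def grade(state: dict, weights: dict) -> float:
    """Weighted, bounded grader sketch; metrics and weights are illustrative."""
    total_cases = state["completed"] + state["backlog"]
    completion = state["completed"] / total_cases if total_cases else 0.0
    # SLA term degrades with breaches relative to completed cases.
    sla = 1.0 - min(1.0, state["sla_breaches"] / max(1, state["completed"]))
    # Fairness term degrades with the cross-service fairness gap.
    fairness = 1.0 - min(1.0, state["fairness_gap"])
    score = (weights["completion"] * completion
             + weights["sla"] * sla
             + weights["fairness"] * fairness)
    return max(0.0, min(1.0, score))

# An easy task might weight completion + SLA heavily and fairness lightly,
# while hard tasks would spread weight across all terms (hypothetical split).
easy_weights = {"completion": 0.6, "sla": 0.3, "fairness": 0.1}
```

Since the inputs are deterministic episode metrics and the function has no randomness, identical seeds always grade identically.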
## Baseline Results (Current Main Branch Artifacts)
The following scores are from the current codebase artifact file:
- source: `results/smoke_test_results.json`
- policy: `backlog_clearance`
- fixed seeds from task config (`11`, `22`, `33`)
| Task | Steps | Score | Completed | Backlog |
|---|---:|---:|---:|---:|
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
Interpretation:
- Easy and hard both score above 0.65 in this run profile.
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
- Scores are not placeholders; they come from run artifacts in this repository.
### Supported Operational Actions
- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
- `assign_capacity`
- `request_missing_documents`
- `escalate_service`
- `advance_time`
- `reallocate_officers`
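The payloads below illustrate what `step(action)` requests for each action type might look like. The field names (`mode`, `service`, `officers`, `case_id`, ...) are hypothetical; `app/models.py` defines the real schema.

```python
# Hypothetical payload shapes for the six supported actions;
# consult app/models.py for the actual request schema.
actions = [
    {"type": "set_priority_mode", "mode": "backlog_clearance"},
    {"type": "assign_capacity", "service": "passport", "officers": 3},
    {"type": "request_missing_documents", "case_id": "C-1042"},
    {"type": "escalate_service", "service": "land_registration"},
    {"type": "advance_time"},
    {"type": "reallocate_officers", "from_service": "aadhaar_card",
     "to_service": "income_certificate", "officers": 2},
]
valid_types = {a["type"] for a in actions}
```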
### What an Agent Actually Optimizes
- Increase completions
- Keep SLA breaches low
- Preserve cross-service fairness
- Avoid invalid actions
- Use escalation budget carefully
## Current Main-Branch Status
This README is aligned to the current `main` branch code paths, including:
- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
## End-to-End Architecture
```mermaid
flowchart LR
UI["React UI"] --> API["FastAPI app.main"]
API --> ENV["GovWorkflowEnv app/env.py"]
API --> SIM["Simulation runtime app/simulator.py"]
API --> RL["RL train/eval rl/*"]
API --> STORE["PersistenceStore SQLite + filesystem"]
API --> STORY["Training Story router /training/*"]
API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`
Benchmark list used by APIs:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
## Service Coverage
`ServiceType` includes:
- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`
Medium and hard tasks currently run with:
- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
## Local Development
### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies
```bash
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment
```bash
cp .env.example .env        # Windows: copy .env.example .env
```
Populate as needed:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
### Run backend
```bash
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend
```bash
npm --prefix frontend/react run dev
```
Open:
- UI: `http://127.0.0.1:5173/ui`
- API docs: `http://127.0.0.1:7860/docs`
## Repository Layout
```text
app/
main.py FastAPI app + API routing + compatibility aliases
env.py GovWorkflowEnv kernel
models.py Typed Pydantic contracts
tasks.py Runtime task registry
reward.py Reward shaping
graders.py Deterministic graders
simulator.py Simulation runtime and live sessions
training_jobs.py Background RL training manager
persistence.py SQLite/filesystem persistence
  api_gateway.py    Direct/HTTP/auto environment transport layer
  story_router.py   Training story endpoints
rl/
gov_workflow_env.py Gym adapter
train_ppo.py PPO phase training entrypoint
evaluate.py Checkpoint evaluator
feature_builder.py RL feature engineering
action_mask.py Action mask logic
frontend/react/
src/ React modules/components/api hooks
scripts/
run_local.py Local FastAPI launcher
convert_grpo_csv.py Training CSV to JSON converter for story endpoints
openenv.yaml OpenEnv manifest metadata
baseline_openai.py Baseline and LLM runner
inference.py Submission-style inference runner
Dockerfile Docker image definition
```
## License
BSD-3-Clause