---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---
# Gov Workflow OpenEnv
## Quick Links
- Hugging Face Space URL (placeholder, to be replaced with the final deployed demo link): [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL)
- Blog path in codebase: [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md)
Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
Main OpenEnv RL government workflow notebook used as the judge-facing criteria book. It contains the practical judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: [https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing](https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing)
Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: [https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing](https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing)
First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: [https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing](https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing)
Second-stage GRPO continuation where the same LLM agent is further trained and refined on the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
Phase 1 PPO training was run locally to establish an algorithm baseline before the Phase 2 notebook continued training.
- PPO Phase 2 training link: [https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing](https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing)
PPO phase 2 training notebook where the RL algorithm is further trained on the same environment for improved policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations.
It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is productionized for:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Why This Problem Matters
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays are usually caused by sequential operational decisions, not one single technical bug.
Typical daily decisions include:
- which queue to prioritize first
- where to allocate limited officers
- when to request missing documents
- when to use escalation budget
- how to reduce backlog without harming fairness across services
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
## How the Environment Works
At runtime, the environment follows the same loop for every task:
1. `reset(task_id, seed)`
Initializes a new episode with deterministic task configuration.
2. `step(action)`
Applies one operational action and advances system state.
3. `state()`
Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
4. `grade(state)`
Computes deterministic grader score in `[0.0, 1.0]` based on task-specific weighting.
This forms a transparent policy-evaluation loop:
`reset -> repeated step -> state -> grade`.
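As a sketch of this contract, the toy in-memory stand-in below mirrors the `reset -> step -> state -> grade` loop. It is not the real `GovWorkflowEnv` from `app/env.py`; field names and the scoring rule are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ToyGovEnv:
    """Toy stand-in for the reset -> step -> state -> grade contract.
    Field names are illustrative; the real kernel lives in app/env.py."""
    backlog: int = 0
    completed: int = 0
    sla_breaches: int = 0
    steps: int = 0

    def reset(self, task_id: str, seed: int) -> dict:
        # Deterministic init: the same seed yields the same starting backlog.
        self.backlog = 20 + (seed % 10)
        self.completed = self.sla_breaches = self.steps = 0
        return {"task_id": task_id, "backlog": self.backlog}

    def step(self, action: dict) -> dict:
        # Apply one operational action and advance the system state.
        self.steps += 1
        if action.get("type") == "advance_time" and self.backlog > 0:
            self.backlog -= 1
            self.completed += 1
        return self.state()

    def state(self) -> dict:
        # Episode-level metrics snapshot.
        return {"backlog": self.backlog, "completed": self.completed,
                "sla_breaches": self.sla_breaches, "steps": self.steps}

    def grade(self, state: dict) -> float:
        # Deterministic, bounded score in [0.0, 1.0].
        total = state["completed"] + state["backlog"]
        return state["completed"] / total if total else 0.0

env = ToyGovEnv()
env.reset("district_backlog_easy", seed=11)
for _ in range(10):
    env.step({"type": "advance_time"})
score = env.grade(env.state())
assert 0.0 <= score <= 1.0
```

Because both the reset and the grader are pure functions of task config, seed, and action sequence, repeated runs reproduce the same score.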
## Reward and Grading Logic
### Dense Reward (per step)
The reward function gives continuous learning signal across an episode:
- positive for stage progress and completions
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
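A per-step shaping function of this kind might look as follows. The coefficients and metric names here are invented for illustration, not the values used in `app/reward.py`.

```python
def dense_reward(prev: dict, curr: dict, invalid_action: bool,
                 idle_officers: int, fairness_threshold: float = 0.2) -> float:
    """Illustrative per-step reward shaping; weights are placeholders,
    not the coefficients in app/reward.py."""
    reward = 0.0
    # Positive signal for new completions this step.
    reward += 1.0 * (curr["completed"] - prev["completed"])
    # Penalty proportional to remaining backlog pressure.
    reward -= 0.05 * curr["backlog"]
    # Penalty for SLA breaches that appeared this step.
    reward -= 0.5 * (curr["sla_breaches"] - prev["sla_breaches"])
    # Penalty only for fairness gap beyond the tolerated threshold.
    excess = max(0.0, curr["fairness_gap"] - fairness_threshold)
    reward -= 1.0 * excess
    # Flat penalties for invalid actions and idle officer capacity.
    if invalid_action:
        reward -= 0.25
    reward -= 0.02 * idle_officers
    return reward
```

The key property is that every step carries signal, so the policy gets gradient long before the episode-end grade.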
### Deterministic Task Graders
Final scoring is deterministic and bounded in `[0.0, 1.0]`:
- Easy task prioritizes completion + SLA
- Medium balances completion, SLA, urgency handling, and fairness
- Hard emphasizes all-round performance including fairness and escalation discipline
Because grading is deterministic, repeated runs with the same seed are reproducible.
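Conceptually, each grader is a weighted sum of normalized metrics clipped to `[0.0, 1.0]`. The weights and metric names below are illustrative, not those in `app/graders.py`.

```python
def grade(state: dict, weights: dict) -> float:
    """Weighted, bounded grader sketch; metrics and weights are illustrative."""
    total_cases = state["completed"] + state["backlog"]
    completion = state["completed"] / total_cases if total_cases else 0.0
    # SLA term degrades with breaches relative to completed cases.
    sla = 1.0 - min(1.0, state["sla_breaches"] / max(1, state["completed"]))
    # Fairness term degrades with the cross-service fairness gap.
    fairness = 1.0 - min(1.0, state["fairness_gap"])
    score = (weights["completion"] * completion
             + weights["sla"] * sla
             + weights["fairness"] * fairness)
    return max(0.0, min(1.0, score))

# An easy task might weight completion + SLA heavily and fairness lightly,
# while hard tasks would spread weight across all terms (hypothetical split).
easy_weights = {"completion": 0.6, "sla": 0.3, "fairness": 0.1}
```

Since the inputs are deterministic episode metrics and the function has no randomness, identical seeds always grade identically.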
## Baseline Results (Current Main Branch Artifacts)
The following scores are from the current codebase artifact file:
- source: `results/smoke_test_results.json`
- policy: `backlog_clearance`
- fixed seeds from task config (`11`, `22`, `33`)
| Task | Steps | Score | Completed | Backlog |
|---|---:|---:|---:|---:|
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
Interpretation:
- Easy and hard both score above 0.65 in this run profile.
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
- Scores are not placeholders; they come from run artifacts in this repository.
### Supported Operational Actions
- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
- `assign_capacity`
- `request_missing_documents`
- `escalate_service`
- `advance_time`
- `reallocate_officers`
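The payloads below illustrate what `step(action)` requests for each action type might look like. The field names (`mode`, `service`, `officers`, `case_id`, ...) are hypothetical; `app/models.py` defines the real schema.

```python
# Hypothetical payload shapes for the six supported actions;
# consult app/models.py for the actual request schema.
actions = [
    {"type": "set_priority_mode", "mode": "backlog_clearance"},
    {"type": "assign_capacity", "service": "passport", "officers": 3},
    {"type": "request_missing_documents", "case_id": "C-1042"},
    {"type": "escalate_service", "service": "land_registration"},
    {"type": "advance_time"},
    {"type": "reallocate_officers", "from_service": "aadhaar_card",
     "to_service": "income_certificate", "officers": 2},
]
valid_types = {a["type"] for a in actions}
```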
### What an Agent Actually Optimizes
- Increase completions
- Keep SLA breaches low
- Preserve cross-service fairness
- Avoid invalid actions
- Use escalation budget carefully
## Current Main-Branch Status
This README is aligned to the current `main` branch code paths, including:
- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
## End-to-End Architecture
```mermaid
flowchart LR
UI["React UI"] --> API["FastAPI app.main"]
API --> ENV["GovWorkflowEnv app/env.py"]
API --> SIM["Simulation runtime app/simulator.py"]
API --> RL["RL train/eval rl/*"]
API --> STORE["PersistenceStore SQLite + filesystem"]
API --> STORY["Training Story router /training/*"]
API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`
Benchmark list used by APIs:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
## Service Coverage
`ServiceType` includes:
- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`
Medium and hard tasks currently run with:
- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
## Local Development
### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies
```bash
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment
```bash
cp .env.example .env        # Windows: copy .env.example .env
```
Populate as needed:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
### Run backend
```bash
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend
```bash
npm --prefix frontend/react run dev
```
Open:
- UI: `http://127.0.0.1:5173/ui`
- API docs: `http://127.0.0.1:7860/docs`
## Repository Layout
```text
app/
main.py FastAPI app + API routing + compatibility aliases
env.py GovWorkflowEnv kernel
models.py Typed Pydantic contracts
tasks.py Runtime task registry
reward.py Reward shaping
graders.py Deterministic graders
simulator.py Simulation runtime and live sessions
training_jobs.py Background RL training manager
persistence.py SQLite/filesystem persistence
  api_gateway.py    Direct/HTTP/auto environment transport layer
  story_router.py   Training story endpoints
rl/
gov_workflow_env.py Gym adapter
train_ppo.py PPO phase training entrypoint
evaluate.py Checkpoint evaluator
feature_builder.py RL feature engineering
action_mask.py Action mask logic
frontend/react/
src/ React modules/components/api hooks
scripts/
run_local.py Local FastAPI launcher
convert_grpo_csv.py Training CSV to JSON converter for story endpoints
openenv.yaml OpenEnv manifest metadata
baseline_openai.py Baseline and LLM runner
inference.py Submission-style inference runner
Dockerfile Docker image definition
```
## License
BSD-3-Clause