---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---

# Gov Workflow OpenEnv
## Quick Links

- Hugging Face Space URL (dummy, update later): https://huggingface.co/spaces/Otter21/Gov_Workflow_RL
  This placeholder will be replaced with the final deployed demo link.
- Blog path in codebase: https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md
  Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
  Main OpenEnv RL government workflow notebook used as the judge-facing criteria book. It contains the practical judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing
  Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing
  First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing
  Second-stage GRPO continuation where the same LLM agent is further trained and refined in the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
  Phase 1 PPO baseline training was executed on the local system to establish the RL algorithm baseline before the phase-2 progression.
- PPO Phase 2 training link: https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing
  PPO phase 2 training notebook where the RL algorithm is trained further on the same environment for improved policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations. It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is productionized for:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Why This Problem Matters
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays are usually caused by sequential operational decisions, not one single technical bug.
Typical daily decisions include:
- which queue to prioritize first
- where to allocate limited officers
- when to request missing documents
- when to use escalation budget
- how to reduce backlog without harming fairness across services
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
## How the Environment Works
At runtime, the environment follows the same loop for every task:
- `reset(task_id, seed)`: initializes a new episode with deterministic task configuration.
- `step(action)`: applies one operational action and advances system state.
- `state()`: returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
- `grade(state)`: computes a deterministic grader score in `[0.0, 1.0]` based on task-specific weighting.
This forms a transparent policy-evaluation loop:
`reset` -> repeated `step` -> `state` -> `grade`.
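As a minimal, self-contained sketch of this contract, the loop can be mimicked with a toy in-memory environment. The class name, fields, and numbers below are illustrative assumptions, not the real `GovWorkflowEnv` API in `app/env.py`:

```python
class MiniEnv:
    """Toy illustration of the reset -> step -> state -> grade contract."""

    INITIAL_BACKLOG = 50  # assumed fixed initial workload

    def reset(self, task_id: str, seed: int) -> dict:
        # deterministic per-task, per-seed initialization
        self.task_id, self.seed = task_id, seed
        self.backlog = self.INITIAL_BACKLOG
        self.completed = 0
        self.steps = 0
        return self.state()

    def step(self, action: str) -> dict:
        # apply one operational action and advance system state
        self.steps += 1
        if action == "assign_capacity":
            done = min(self.backlog, 3)  # assumed per-step throughput
            self.backlog -= done
            self.completed += done
        return self.state()

    def state(self) -> dict:
        return {"backlog": self.backlog, "completed": self.completed, "steps": self.steps}

    def grade(self, state: dict) -> float:
        # deterministic score bounded in [0.0, 1.0]: fraction of initial backlog cleared
        return max(0.0, min(1.0, state["completed"] / self.INITIAL_BACKLOG))


env = MiniEnv()
env.reset("district_backlog_easy", seed=11)
for _ in range(10):
    env.step("assign_capacity")
score = env.grade(env.state())  # 30 of 50 cases cleared -> 0.6
```

Because initialization and scoring are pure functions of the task and seed, rerunning this loop always reproduces the same score, which is the property the real environment relies on for evaluation.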
## Reward and Grading Logic

### Dense Reward (per step)
The reward function gives continuous learning signal across an episode:
- positive for stage progress and completions
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
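A hedged sketch of a per-step reward shaped like the terms above; the term names and weights are illustrative assumptions, not the actual values in `app/reward.py`:

```python
def dense_reward(
    completions: int,        # cases completed this step
    stage_progress: int,     # cases that advanced a processing stage
    backlog: int,            # current backlog size
    new_sla_breaches: int,   # SLA breaches introduced this step
    fairness_gap: float,     # cross-service fairness gap
    fairness_threshold: float,
    invalid_actions: int,
    idle_officers: int,      # unused officer capacity
) -> float:
    """Continuous per-step signal: progress rewarded, pressure penalized."""
    reward = 1.0 * completions + 0.2 * stage_progress
    reward -= 0.01 * backlog                                      # backlog pressure
    reward -= 0.5 * new_sla_breaches                              # fresh SLA breaches
    reward -= 0.3 * max(0.0, fairness_gap - fairness_threshold)   # only excess above threshold
    reward -= 0.5 * invalid_actions
    reward -= 0.1 * idle_officers
    return reward
```

Note that fairness is only penalized beyond the threshold, so a policy is not punished for small, acceptable imbalances.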
### Deterministic Task Graders
Final scoring is deterministic and bounded in [0.0, 1.0]:
- Easy task prioritizes completion + SLA
- Medium balances completion, SLA, urgency handling, and fairness
- Hard emphasizes all-round performance including fairness and escalation discipline
Because grading is deterministic, repeated runs with the same seed are reproducible.
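A deterministic, bounded grader in this spirit reduces to a weighted sum clamped to `[0.0, 1.0]`. The metric names and weights below are assumptions for illustration, not the real easy-task weighting in `app/graders.py`:

```python
def grade_easy(completion_rate: float, sla_compliance: float) -> float:
    """Easy-task style grader: completion + SLA only, clamped to [0.0, 1.0].

    Weights (0.6 / 0.4) are assumed for illustration.
    """
    score = 0.6 * completion_rate + 0.4 * sla_compliance
    return max(0.0, min(1.0, score))
```

Since the function is pure, identical episode metrics always yield the identical score, which is what makes same-seed runs reproducible.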
## Baseline Results (Current Main Branch Artifacts)
The following scores are from the current codebase artifact file:
- source: `results/smoke_test_results.json`
- policy: `backlog_clearance`
- fixed seeds from task config (`11`, `22`, `33`)
| Task | Steps | Score | Completed | Backlog |
|---|---|---|---|---|
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
Interpretation:
- Easy and hard both clear the 0.65 neighborhood in this run profile.
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
- Scores are not placeholders; they come from run artifacts in this repository.
## Supported Operational Actions

- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
- `assign_capacity`
- `request_missing_documents`
- `escalate_service`
- `advance_time`
- `reallocate_officers`
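Since invalid actions are penalized rather than fatal, a policy runner can validate an action name against this vocabulary before dispatch. The payload shape (`{"type": ...}`) is an assumption for illustration, not the environment's actual action schema:

```python
# Supported action vocabulary, taken from the list above.
SUPPORTED_ACTIONS = {
    "set_priority_mode",   # mode: urgent_first | oldest_first | balanced | backlog_clearance
    "assign_capacity",
    "request_missing_documents",
    "escalate_service",
    "advance_time",
    "reallocate_officers",
}


def validate_action(action: dict) -> bool:
    """Return True only if the action names a supported operation."""
    return action.get("type") in SUPPORTED_ACTIONS
```

A step that fails this check would be counted against the episode's invalid-action metric instead of crashing the run.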
## What an Agent Actually Optimizes
- Increase completions
- Keep SLA breaches low
- Preserve cross-service fairness
- Avoid invalid actions
- Use escalation budget carefully
## Current Main-Branch Status
This README is aligned to the current main branch code paths, including:

- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
## End-to-End Architecture
```mermaid
flowchart LR
    UI["React UI"] --> API["FastAPI app.main"]
    API --> ENV["GovWorkflowEnv app/env.py"]
    API --> SIM["Simulation runtime app/simulator.py"]
    API --> RL["RL train/eval rl/*"]
    API --> STORE["PersistenceStore SQLite + filesystem"]
    API --> STORY["Training Story router /training/*"]
    API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:

- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`

Benchmark list used by APIs:

- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
## Service Coverage
`ServiceType` includes:

- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`

Medium and hard tasks currently run with:

- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
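As a sketch of how this vocabulary could be typed, here is a string-valued enum mirroring the list above; the real `ServiceType` lives in `app/models.py` and may differ in spelling or members:

```python
from enum import Enum


class ServiceType(str, Enum):
    """Illustrative mirror of the service vocabulary listed above."""
    PASSPORT = "passport"
    DRIVING_LICENSE = "driving_license"
    AADHAAR_CARD = "aadhaar_card"
    GST_REGISTRATION = "gst_registration"
    INCOME_CERTIFICATE = "income_certificate"
    CASTE_CERTIFICATE = "caste_certificate"
    BIRTH_CERTIFICATE = "birth_certificate"
    LAND_REGISTRATION = "land_registration"


# Assumed subset used by the medium and hard tasks, per the list above.
MEDIUM_HARD_SERVICES = [
    ServiceType.INCOME_CERTIFICATE,
    ServiceType.LAND_REGISTRATION,
    ServiceType.PASSPORT,
    ServiceType.DRIVING_LICENSE,
    ServiceType.AADHAAR_CARD,
]
```

Subclassing `str` keeps the enum JSON-serializable, which is convenient for Pydantic models and API payloads.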
## Local Development

### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies

```shell
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment

```shell
# Windows
copy .env.example .env
# macOS / Linux
cp .env.example .env
```
Populate as needed:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
### Run backend

```shell
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend

```shell
npm --prefix frontend/react run dev
```
Open:

- UI: http://127.0.0.1:5173/ui
- API docs: http://127.0.0.1:7860/docs
## Repository Layout
```text
app/
  main.py             FastAPI app + API routing + compatibility aliases
  env.py              GovWorkflowEnv kernel
  models.py           Typed Pydantic contracts
  tasks.py            Runtime task registry
  reward.py           Reward shaping
  graders.py          Deterministic graders
  simulator.py        Simulation runtime and live sessions
  training_jobs.py    Background RL training manager
  persistence.py      SQLite/filesystem persistence
  api_gateway.py      direct/http/auto environment transport layer
  story_router.py     Training story endpoints
rl/
  gov_workflow_env.py Gym adapter
  train_ppo.py        PPO phase training entrypoint
  evaluate.py         Checkpoint evaluator
  feature_builder.py  RL feature engineering
  action_mask.py      Action mask logic
frontend/react/
  src/                React modules/components/api hooks
scripts/
  run_local.py        Local FastAPI launcher
  convert_grpo_csv.py Training CSV to JSON converter for story endpoints
openenv.yaml          OpenEnv manifest metadata
baseline_openai.py    Baseline and LLM runner
inference.py          Submission-style inference runner
Dockerfile            Docker image definition
```
## License
BSD-3-Clause