Spaces:
Running
Running
| --- | |
| title: Gov Workflow OpenEnv | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # Gov Workflow OpenEnv | |
| ## Quick Links | |
| - Hugging Face Space URL (Dummy, update later): [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL) | |
| This placeholder will be replaced with the final deployed demo link. | |
| - Blog path in codebase: [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md) | |
| Project write-up and narrative documentation for design choices and outcomes. | |
| - Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb` | |
| Main OpenEnv RL government workflow notebook used as the judge-facing criteria book. It contains the practical judging context, environment setup, and the full end-to-end flow in one place. | |
| - Notebook Colab URL: [https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing](https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing) | |
| Cloud version of the same notebook so judges can run and review the complete workflow without local setup. | |
| - GRPO Phase 1 training link: [https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing](https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing) | |
| First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment. | |
| - GRPO Phase 2 training link: [https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing](https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing) | |
| Second-stage GRPO continuation where the same LLM agent is further trained and refined on the RL environment. | |
| - PPO Phase 1 training (local): `rl/train_ppo.py` | |
| Phase 1 PPO baseline training was executed on the local system to establish the RL algorithm baseline before phase-2 progression. | |
| - PPO Phase 2 training link: [https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing](https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing) | |
| PPO phase 2 training notebook where the RL algorithm is further trained on the same environment for improved policy performance. | |
| Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations. | |
| It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services. | |
| This repository is productionized for: | |
| - local development (FastAPI + Vite) | |
| - Docker runtime | |
| - Hugging Face Spaces (Docker SDK) | |
| ## Why This Problem Matters | |
| Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services. | |
| In real operations, delays are usually caused by sequential operational decisions, not one single technical bug. | |
| Typical daily decisions include: | |
| - which queue to prioritize first | |
| - where to allocate limited officers | |
| - when to request missing documents | |
| - when to use escalation budget | |
| - how to reduce backlog without harming fairness across services | |
| This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos. | |
| ## How the Environment Works | |
| At runtime, the environment follows the same loop for every task: | |
| 1. `reset(task_id, seed)` | |
| Initializes a new episode with deterministic task configuration. | |
| 2. `step(action)` | |
| Applies one operational action and advances system state. | |
| 3. `state()` | |
| Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions. | |
| 4. `grade(state)` | |
| Computes deterministic grader score in `[0.0, 1.0]` based on task-specific weighting. | |
| This forms a transparent policy-evaluation loop: | |
| `reset -> repeated step -> state -> grade`. | |
| ## Reward and Grading Logic | |
| ### Dense Reward (per step) | |
| The reward function gives continuous learning signal across an episode: | |
| - positive for stage progress and completions | |
| - penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity | |
| This avoids sparse “win/lose only at end” behavior and supports stable policy learning. | |
| ### Deterministic Task Graders | |
| Final scoring is deterministic and bounded in `[0.0, 1.0]`: | |
| - Easy task prioritizes completion + SLA | |
| - Medium balances completion, SLA, urgency handling, and fairness | |
| - Hard emphasizes all-round performance including fairness and escalation discipline | |
| Because grading is deterministic, repeated runs with the same seed are reproducible. | |
| ## Baseline Results (Current Main Branch Artifacts) | |
| The following scores are from the current codebase artifact file: | |
| - source: `results/smoke_test_results.json` | |
| - policy: `backlog_clearance` | |
| - fixed seeds from task config (`11`, `22`, `33`) | |
| | Task | Steps | Score | Completed | Backlog | | |
| |---|---:|---:|---:|---:| | |
| | `district_backlog_easy` | 33 | 0.6716 | 27 | 24 | | |
| | `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 | | |
| | `cross_department_hard` | 89 | 0.6522 | 73 | 92 | | |
| Interpretation: | |
| - Easy and hard both clear the 0.65 neighborhood in this run profile. | |
| - Medium remains the most difficult balance point due to mixed urgency and fairness pressure. | |
| - Scores are not placeholders; they come from run artifacts in this repository. | |
| ### Supported Operational Actions | |
| - `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`) | |
| - `assign_capacity` | |
| - `request_missing_documents` | |
| - `escalate_service` | |
| - `advance_time` | |
| - `reallocate_officers` | |
| ### What an Agent Actually Optimizes | |
| - Increase completions | |
| - Keep SLA breaches low | |
| - Preserve cross-service fairness | |
| - Avoid invalid actions | |
| - Use escalation budget carefully | |
| ## Current Main-Branch Status | |
| This README is aligned to the current `main` branch code paths, including: | |
| - `app.main:app` as primary server runtime | |
| - React UI served at `/ui` from built Vite assets when available | |
| - OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`) | |
| - frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`) | |
| - training story endpoints (`/training/*`) | |
| - simulation, RL, persistence, compliance, and history endpoints | |
| ## End-to-End Architecture | |
| ```mermaid | |
| flowchart LR | |
| UI["React UI"] --> API["FastAPI app.main"] | |
| API --> ENV["GovWorkflowEnv app/env.py"] | |
| API --> SIM["Simulation runtime app/simulator.py"] | |
| API --> RL["RL train/eval rl/*"] | |
| API --> STORE["PersistenceStore SQLite + filesystem"] | |
| API --> STORY["Training Story router /training/*"] | |
| API --> OPENENV["Optional OpenEnv adapter /openenv/*"] | |
| ``` | |
| ## Core Runtime Components | |
| - API server: `app/main.py` | |
| - Environment kernel: `app/env.py` | |
| - Typed models: `app/models.py` | |
| - Task registry: `app/tasks.py` | |
| - Reward shaping: `app/reward.py` | |
| - Deterministic graders: `app/graders.py` | |
| - Simulation runtime: `app/simulator.py` | |
| - Training jobs manager: `app/training_jobs.py` | |
| - Persistence layer: `app/persistence.py` | |
| - Transport gateway: `app/api_gateway.py` | |
| - React frontend: `frontend/react` | |
| ## Task Set (Current Runtime) | |
| Configured in `app/tasks.py`: | |
| - `district_backlog_easy` | |
| - `mixed_urgency_medium` | |
| - `cross_department_hard` | |
| - `district_backlog_easy_extreme` | |
| Benchmark list used by APIs: | |
| - `district_backlog_easy` | |
| - `mixed_urgency_medium` | |
| - `cross_department_hard` | |
| ## Service Coverage | |
| `ServiceType` includes: | |
| - `passport` | |
| - `driving_license` | |
| - `aadhaar_card` | |
| - `gst_registration` | |
| - `income_certificate` | |
| - `caste_certificate` | |
| - `birth_certificate` | |
| - `land_registration` | |
| Medium and hard tasks currently run with: | |
| - `income_certificate` | |
| - `land_registration` | |
| - `passport` | |
| - `driving_license` | |
| - `aadhaar_card` | |
| ## Local Development | |
| ### Prerequisites | |
| - Python 3.11+ | |
| - Node 20+ | |
| - Docker | |
| ### Install dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| pip install -r requirements_rl.txt | |
| pip install pytest pytest-asyncio | |
| npm --prefix frontend/react install | |
| ``` | |
| ### Configure environment | |
| ```bash | |
| copy .env.example .env | |
| ``` | |
| Populate as needed: | |
| - `API_BASE_URL` | |
| - `MODEL_NAME` | |
| - `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY` | |
| - optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`) | |
| - storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`) | |
| ### Run backend | |
| ```bash | |
| python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload | |
| ``` | |
| ### Run frontend | |
| ```bash | |
| npm --prefix frontend/react run dev | |
| ``` | |
| Open: | |
| - UI: `http://127.0.0.1:5173/ui` | |
| - API docs: `http://127.0.0.1:7860/docs` | |
| ## Repository Layout | |
| ```text | |
| app/ | |
| main.py FastAPI app + API routing + compatibility aliases | |
| env.py GovWorkflowEnv kernel | |
| models.py Typed Pydantic contracts | |
| tasks.py Runtime task registry | |
| reward.py Reward shaping | |
| graders.py Deterministic graders | |
| simulator.py Simulation runtime and live sessions | |
| training_jobs.py Background RL training manager | |
| persistence.py SQLite/filesystem persistence | |
| api_gateway.py direct/http/auto environment transport layer | |
| story_router.py training story endpoints | |
| rl/ | |
| gov_workflow_env.py Gym adapter | |
| train_ppo.py PPO phase training entrypoint | |
| evaluate.py Checkpoint evaluator | |
| feature_builder.py RL feature engineering | |
| action_mask.py Action mask logic | |
| frontend/react/ | |
| src/ React modules/components/api hooks | |
| scripts/ | |
| run_local.py Local FastAPI launcher | |
| convert_grpo_csv.py Training CSV to JSON converter for story endpoints | |
| openenv.yaml OpenEnv manifest metadata | |
| baseline_openai.py Baseline and LLM runner | |
| inference.py Submission-style inference runner | |
| Dockerfile Docker image definition | |
| ``` | |
| ## License | |
| BSD-3-Clause | |