---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---
# Gov Workflow OpenEnv
## Quick Links
- Hugging Face Space URL (placeholder, to be updated): [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL)
This placeholder will be replaced with the final deployed demo link.
- Blog path in codebase: [https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md](https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md)
Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
Main OpenEnv RL government workflow notebook, used as the judge-facing reference. It contains the practical judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: [https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing](https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing)
Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: [https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing](https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing)
First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: [https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing](https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing)
Second-stage GRPO continuation where the same LLM agent is further trained and refined in the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
Phase 1 PPO baseline training was run on the local system to establish the RL algorithm baseline before progressing to Phase 2.
- PPO Phase 2 training link: [https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing](https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing)
PPO phase 2 training notebook where the RL algorithm is further trained on the same environment for improved policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations.
It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is production-ready for:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Why This Problem Matters
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays are usually caused by sequential operational decisions, not one single technical bug.
Typical daily decisions include:
- which queue to prioritize first
- where to allocate limited officers
- when to request missing documents
- when to use escalation budget
- how to reduce backlog without harming fairness across services
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
## How the Environment Works
At runtime, the environment follows the same loop for every task:
1. `reset(task_id, seed)`
Initializes a new episode with deterministic task configuration.
2. `step(action)`
Applies one operational action and advances system state.
3. `state()`
Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
4. `grade(state)`
Computes deterministic grader score in `[0.0, 1.0]` based on task-specific weighting.
This forms a transparent policy-evaluation loop:
`reset -> repeated step -> state -> grade`.
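The loop above can be sketched with a minimal in-process stub that mirrors the four-method contract; the payload fields (`task_id`, `done`, `type`) are illustrative assumptions, not the exact schema in `app/env.py`.

```python
# Toy stand-in mirroring the reset -> step -> state -> grade contract.
# Field names and episode logic are illustrative assumptions only.

class StubGovWorkflowEnv:
    def reset(self, task_id: str, seed: int) -> dict:
        self._steps = 0
        self._completed = 0
        return {"task_id": task_id, "seed": seed, "done": False}

    def step(self, action: dict) -> dict:
        self._steps += 1
        if action.get("type") == "advance_time":
            self._completed += 1
        return {"done": self._steps >= 3}

    def state(self) -> dict:
        return {"steps": self._steps, "completed": self._completed}

    def grade(self, state: dict) -> float:
        # Deterministic score bounded in [0.0, 1.0].
        return max(0.0, min(1.0, state["completed"] / 3))


env = StubGovWorkflowEnv()
obs = env.reset(task_id="district_backlog_easy", seed=11)
while not obs["done"]:
    obs = env.step({"type": "advance_time"})
score = env.grade(env.state())
print(score)  # 1.0
```

Because both the transitions and the grader are deterministic functions of the seed and actions, replaying the same action sequence reproduces the same score.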
## Reward and Grading Logic
### Dense Reward (per step)
The reward function gives continuous learning signal across an episode:
- positive for stage progress and completions
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
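A shaped per-step reward of this kind might look like the sketch below; the weights, threshold, and metric field names are made-up assumptions for illustration, not the actual values in `app/reward.py`.

```python
def dense_reward(prev: dict, curr: dict, invalid_action: bool) -> float:
    """Illustrative shaped reward; all weights are assumptions."""
    r = 0.0
    r += 1.0 * (curr["completed"] - prev["completed"])        # stage progress / completions
    r -= 0.05 * curr["backlog"]                               # backlog pressure
    r -= 0.5 * (curr["sla_breaches"] - prev["sla_breaches"])  # new SLA breaches
    r -= 1.0 * max(0.0, curr["fairness_gap"] - 0.2)           # fairness excess beyond threshold
    if invalid_action:
        r -= 0.25                                             # invalid-action penalty
    r -= 0.02 * curr.get("idle_officers", 0)                  # idle officer capacity
    return r


prev = {"completed": 10, "backlog": 20, "sla_breaches": 1, "fairness_gap": 0.1}
curr = {"completed": 12, "backlog": 18, "sla_breaches": 1, "fairness_gap": 0.1,
        "idle_officers": 0}
print(round(dense_reward(prev, curr, invalid_action=False), 2))  # 1.1
```

The key property is that every step yields a nonzero gradient signal, so the policy gets feedback long before the episode-end grade.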
### Deterministic Task Graders
Final scoring is deterministic and bounded in `[0.0, 1.0]`:
- Easy task prioritizes completion + SLA
- Medium balances completion, SLA, urgency handling, and fairness
- Hard emphasizes all-round performance including fairness and escalation discipline
Because grading is deterministic, repeated runs with the same seed are reproducible.
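A deterministic, bounded grader of this shape could be sketched as follows; the metric names and the 0.6/0.4 weighting are illustrative assumptions, not the actual formula in `app/graders.py`.

```python
def grade_easy(state: dict) -> float:
    """Illustrative easy-task grader: completion + SLA, clamped to [0, 1].
    Weights are made-up for the sketch."""
    completion = state["completed"] / max(1, state["total_cases"])
    sla = 1.0 - state["sla_breaches"] / max(1, state["completed"])
    score = 0.6 * completion + 0.4 * sla
    return max(0.0, min(1.0, score))


state = {"completed": 40, "total_cases": 50, "sla_breaches": 4}
print(round(grade_easy(state), 2))  # 0.84
```

Since the score is a pure function of the final state, re-running a task with the same seed and action sequence always grades identically.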
## Baseline Results (Current Main Branch Artifacts)
The following scores are from the current codebase artifact file:
- source: `results/smoke_test_results.json`
- policy: `backlog_clearance`
- fixed seeds from task config (`11`, `22`, `33`)
| Task | Steps | Score | Completed | Backlog |
|---|---:|---:|---:|---:|
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
Interpretation:
- Easy and hard both score in the 0.65 neighborhood in this run profile.
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
- Scores are not placeholders; they come from run artifacts in this repository.
### Supported Operational Actions
- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
- `assign_capacity`
- `request_missing_documents`
- `escalate_service`
- `advance_time`
- `reallocate_officers`
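As a sketch, actions from the set above might be expressed as small tagged payloads with a validity check (invalid actions are penalized by the reward); the `type`/`mode` field names are assumptions, not the exact wire format.

```python
VALID_ACTIONS = {
    "set_priority_mode", "assign_capacity", "request_missing_documents",
    "escalate_service", "advance_time", "reallocate_officers",
}
PRIORITY_MODES = {"urgent_first", "oldest_first", "balanced", "backlog_clearance"}


def validate_action(action: dict) -> bool:
    """Reject malformed actions before they reach the environment step."""
    if action.get("type") not in VALID_ACTIONS:
        return False
    if action["type"] == "set_priority_mode":
        return action.get("mode") in PRIORITY_MODES
    return True


print(validate_action({"type": "set_priority_mode", "mode": "backlog_clearance"}))  # True
print(validate_action({"type": "set_priority_mode", "mode": "fifo"}))               # False
```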
### What an Agent Actually Optimizes
- Increase completions
- Keep SLA breaches low
- Preserve cross-service fairness
- Avoid invalid actions
- Use escalation budget carefully
## Current Main-Branch Status
This README is aligned to the current `main` branch code paths, including:
- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
## End-to-End Architecture
```mermaid
flowchart LR
UI["React UI"] --> API["FastAPI app.main"]
API --> ENV["GovWorkflowEnv app/env.py"]
API --> SIM["Simulation runtime app/simulator.py"]
API --> RL["RL train/eval rl/*"]
API --> STORE["PersistenceStore SQLite + filesystem"]
API --> STORY["Training Story router /training/*"]
API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`
Benchmark list used by APIs:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
## Service Coverage
`ServiceType` includes:
- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`
Medium and hard tasks currently run with:
- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
## Local Development
### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies
```bash
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment
```bash
cp .env.example .env
```
Populate as needed:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
### Run backend
```bash
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend
```bash
npm --prefix frontend/react run dev
```
Open:
- UI: `http://127.0.0.1:5173/ui`
- API docs: `http://127.0.0.1:7860/docs`
## Repository Layout
```text
app/
main.py FastAPI app + API routing + compatibility aliases
env.py GovWorkflowEnv kernel
models.py Typed Pydantic contracts
tasks.py Runtime task registry
reward.py Reward shaping
graders.py Deterministic graders
simulator.py Simulation runtime and live sessions
training_jobs.py Background RL training manager
persistence.py SQLite/filesystem persistence
api_gateway.py direct/http/auto environment transport layer
story_router.py training story endpoints
rl/
gov_workflow_env.py Gym adapter
train_ppo.py PPO phase training entrypoint
evaluate.py Checkpoint evaluator
feature_builder.py RL feature engineering
action_mask.py Action mask logic
frontend/react/
src/ React modules/components/api hooks
scripts/
run_local.py Local FastAPI launcher
convert_grpo_csv.py Training CSV to JSON converter for story endpoints
openenv.yaml OpenEnv manifest metadata
baseline_openai.py Baseline and LLM runner
inference.py Submission-style inference runner
Dockerfile Docker image definition
```
## License
BSD-3-Clause