---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---

# Gov Workflow OpenEnv
## Quick Links

- Hugging Face Space URL (dummy, update later): https://huggingface.co/spaces/Otter21/Gov_Workflow_RL
  This placeholder will be replaced with the final deployed demo link.
- Blog path in codebase: https://huggingface.co/spaces/Otter21/Gov_Workflow_RL/blob/main/Blog.md
  Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
  Main OpenEnv RL government workflow notebook used as the judge-facing criteria book. It contains the practical judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing
  Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing
  First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing
  Second-stage GRPO continuation where the same LLM agent is further trained and refined in the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
  Phase 1 PPO baseline training was executed on the local system to establish the RL algorithm baseline before the phase-2 progression.
- PPO Phase 2 training link: https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing
  PPO phase 2 training notebook where the RL algorithm is trained further on the same environment for improved policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations. It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is productionized for:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Why This Problem Matters
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays are usually caused by sequential operational decisions, not one single technical bug.
Typical daily decisions include:
- which queue to prioritize first
- where to allocate limited officers
- when to request missing documents
- when to use escalation budget
- how to reduce backlog without harming fairness across services
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
## How the Environment Works
At runtime, the environment follows the same loop for every task:
- `reset(task_id, seed)`: initializes a new episode with deterministic task configuration.
- `step(action)`: applies one operational action and advances system state.
- `state()`: returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
- `grade(state)`: computes a deterministic grader score in `[0.0, 1.0]` based on task-specific weighting.
This forms a transparent policy-evaluation loop:
`reset` -> repeated `step` -> `state` -> `grade`.
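As a minimal, self-contained sketch of this contract, the loop can be mimicked with a toy in-memory environment. The class name, fields, and numbers below are illustrative assumptions, not the real `GovWorkflowEnv` API in `app/env.py`:

```python
class MiniEnv:
    """Toy illustration of the reset -> step -> state -> grade contract."""

    INITIAL_BACKLOG = 50  # assumed fixed initial workload

    def reset(self, task_id: str, seed: int) -> dict:
        # deterministic per-task, per-seed initialization
        self.task_id, self.seed = task_id, seed
        self.backlog = self.INITIAL_BACKLOG
        self.completed = 0
        self.steps = 0
        return self.state()

    def step(self, action: str) -> dict:
        # apply one operational action and advance system state
        self.steps += 1
        if action == "assign_capacity":
            done = min(self.backlog, 3)  # assumed per-step throughput
            self.backlog -= done
            self.completed += done
        return self.state()

    def state(self) -> dict:
        return {"backlog": self.backlog, "completed": self.completed, "steps": self.steps}

    def grade(self, state: dict) -> float:
        # deterministic score bounded in [0.0, 1.0]: fraction of initial backlog cleared
        return max(0.0, min(1.0, state["completed"] / self.INITIAL_BACKLOG))


env = MiniEnv()
env.reset("district_backlog_easy", seed=11)
for _ in range(10):
    env.step("assign_capacity")
score = env.grade(env.state())  # 30 of 50 cases cleared -> 0.6
```

Because initialization and scoring are pure functions of the task and seed, rerunning this loop always reproduces the same score, which is the property the real environment relies on for evaluation.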
## Reward and Grading Logic

### Dense Reward (per step)
The reward function gives continuous learning signal across an episode:
- positive for stage progress and completions
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
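A hedged sketch of a per-step reward shaped like the terms above; the term names and weights are illustrative assumptions, not the actual values in `app/reward.py`:

```python
def dense_reward(
    completions: int,        # cases completed this step
    stage_progress: int,     # cases that advanced a processing stage
    backlog: int,            # current backlog size
    new_sla_breaches: int,   # SLA breaches introduced this step
    fairness_gap: float,     # cross-service fairness gap
    fairness_threshold: float,
    invalid_actions: int,
    idle_officers: int,      # unused officer capacity
) -> float:
    """Continuous per-step signal: progress rewarded, pressure penalized."""
    reward = 1.0 * completions + 0.2 * stage_progress
    reward -= 0.01 * backlog                                      # backlog pressure
    reward -= 0.5 * new_sla_breaches                              # fresh SLA breaches
    reward -= 0.3 * max(0.0, fairness_gap - fairness_threshold)   # only excess above threshold
    reward -= 0.5 * invalid_actions
    reward -= 0.1 * idle_officers
    return reward
```

Note that fairness is only penalized beyond the threshold, so a policy is not punished for small, acceptable imbalances.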
### Deterministic Task Graders
Final scoring is deterministic and bounded in [0.0, 1.0]:
- Easy task prioritizes completion + SLA
- Medium balances completion, SLA, urgency handling, and fairness
- Hard emphasizes all-round performance including fairness and escalation discipline
Because grading is deterministic, repeated runs with the same seed are reproducible.
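A deterministic, bounded grader in this spirit reduces to a weighted sum clamped to `[0.0, 1.0]`. The metric names and weights below are assumptions for illustration, not the real easy-task weighting in `app/graders.py`:

```python
def grade_easy(completion_rate: float, sla_compliance: float) -> float:
    """Easy-task style grader: completion + SLA only, clamped to [0.0, 1.0].

    Weights (0.6 / 0.4) are assumed for illustration.
    """
    score = 0.6 * completion_rate + 0.4 * sla_compliance
    return max(0.0, min(1.0, score))
```

Since the function is pure, identical episode metrics always yield the identical score, which is what makes same-seed runs reproducible.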
## Baseline Results (Current Main Branch Artifacts)
The following scores are from the current codebase artifact file:
- source: `results/smoke_test_results.json`
- policy: `backlog_clearance`
- fixed seeds from task config (`11`, `22`, `33`)
| Task | Steps | Score | Completed | Backlog |
|---|---|---|---|---|
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
Interpretation:
- Easy and hard both clear the 0.65 neighborhood in this run profile.
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
- Scores are not placeholders; they come from run artifacts in this repository.
## Supported Operational Actions

- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
- `assign_capacity`
- `request_missing_documents`
- `escalate_service`
- `advance_time`
- `reallocate_officers`
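Since invalid actions are penalized rather than fatal, a policy runner can validate an action name against this vocabulary before dispatch. The payload shape (`{"type": ...}`) is an assumption for illustration, not the environment's actual action schema:

```python
# Supported action vocabulary, taken from the list above.
SUPPORTED_ACTIONS = {
    "set_priority_mode",   # mode: urgent_first | oldest_first | balanced | backlog_clearance
    "assign_capacity",
    "request_missing_documents",
    "escalate_service",
    "advance_time",
    "reallocate_officers",
}


def validate_action(action: dict) -> bool:
    """Return True only if the action names a supported operation."""
    return action.get("type") in SUPPORTED_ACTIONS
```

A step that fails this check would be counted against the episode's invalid-action metric instead of crashing the run.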
## What an Agent Actually Optimizes
- Increase completions
- Keep SLA breaches low
- Preserve cross-service fairness
- Avoid invalid actions
- Use escalation budget carefully
## Current Main-Branch Status
This README is aligned to the current main branch code paths, including:

- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
## End-to-End Architecture
```mermaid
flowchart LR
    UI["React UI"] --> API["FastAPI app.main"]
    API --> ENV["GovWorkflowEnv app/env.py"]
    API --> SIM["Simulation runtime app/simulator.py"]
    API --> RL["RL train/eval rl/*"]
    API --> STORE["PersistenceStore SQLite + filesystem"]
    API --> STORY["Training Story router /training/*"]
    API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:

- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`

Benchmark list used by APIs:

- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
## Service Coverage
`ServiceType` includes:

- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`

Medium and hard tasks currently run with:

- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
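As a sketch of how this vocabulary could be typed, here is a string-valued enum mirroring the list above; the real `ServiceType` lives in `app/models.py` and may differ in spelling or members:

```python
from enum import Enum


class ServiceType(str, Enum):
    """Illustrative mirror of the service vocabulary listed above."""
    PASSPORT = "passport"
    DRIVING_LICENSE = "driving_license"
    AADHAAR_CARD = "aadhaar_card"
    GST_REGISTRATION = "gst_registration"
    INCOME_CERTIFICATE = "income_certificate"
    CASTE_CERTIFICATE = "caste_certificate"
    BIRTH_CERTIFICATE = "birth_certificate"
    LAND_REGISTRATION = "land_registration"


# Assumed subset used by the medium and hard tasks, per the list above.
MEDIUM_HARD_SERVICES = [
    ServiceType.INCOME_CERTIFICATE,
    ServiceType.LAND_REGISTRATION,
    ServiceType.PASSPORT,
    ServiceType.DRIVING_LICENSE,
    ServiceType.AADHAAR_CARD,
]
```

Subclassing `str` keeps the enum JSON-serializable, which is convenient for Pydantic models and API payloads.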
## Local Development

### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies

```shell
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment

```shell
# Windows
copy .env.example .env
# macOS / Linux
cp .env.example .env
```
Populate as needed:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
### Run backend

```shell
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend

```shell
npm --prefix frontend/react run dev
```
Open:

- UI: http://127.0.0.1:5173/ui
- API docs: http://127.0.0.1:7860/docs
## Repository Layout
```text
app/
  main.py             FastAPI app + API routing + compatibility aliases
  env.py              GovWorkflowEnv kernel
  models.py           Typed Pydantic contracts
  tasks.py            Runtime task registry
  reward.py           Reward shaping
  graders.py          Deterministic graders
  simulator.py        Simulation runtime and live sessions
  training_jobs.py    Background RL training manager
  persistence.py      SQLite/filesystem persistence
  api_gateway.py      direct/http/auto environment transport layer
  story_router.py     Training story endpoints
rl/
  gov_workflow_env.py Gym adapter
  train_ppo.py        PPO phase training entrypoint
  evaluate.py         Checkpoint evaluator
  feature_builder.py  RL feature engineering
  action_mask.py      Action mask logic
frontend/react/
  src/                React modules/components/api hooks
scripts/
  run_local.py        Local FastAPI launcher
  convert_grpo_csv.py Training CSV to JSON converter for story endpoints
openenv.yaml          OpenEnv manifest metadata
baseline_openai.py    Baseline and LLM runner
inference.py          Submission-style inference runner
Dockerfile            Docker image definition
```
## License
BSD-3-Clause