title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false

Gov Workflow OpenEnv

Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations. It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.

This repository is set up to run in three ways:

  • local development (FastAPI + Vite)
  • Docker runtime
  • Hugging Face Spaces (Docker SDK)

Why This Problem Matters

Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
In real operations, delays usually come from a sequence of operational decisions, not from a single technical bug.

Typical daily decisions include:

  • which queue to prioritize first
  • where to allocate limited officers
  • when to request missing documents
  • when to use escalation budget
  • how to reduce backlog without harming fairness across services

This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.

How the Environment Works

At runtime, the environment follows the same loop for every task:

  1. reset(task_id, seed)
    Initializes a new episode with deterministic task configuration.

  2. step(action)
    Applies one operational action and advances system state.

  3. state()
    Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.

  4. grade(state)
    Computes a deterministic grader score in [0.0, 1.0] based on task-specific weighting.

This forms a transparent policy-evaluation loop: reset -> repeated step -> state -> grade.
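
The loop above can be sketched with a toy environment. This is a minimal illustration, not the code in app/env.py: ToyQueueEnv, its fields, the single accepted action, and the grading formula are all hypothetical stand-ins chosen to show the reset -> step -> state -> grade shape and why a fixed seed makes runs repeatable.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyQueueEnv:
    """Hypothetical stand-in for the real environment (not app/env.py)."""

    backlog: int = 0
    completed: int = 0
    invalid_actions: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def reset(self, task_id: str, seed: int) -> dict:
        # The same (task_id, seed) pair always yields the same starting backlog.
        self.rng = random.Random(f"{task_id}:{seed}")
        self.backlog = self.rng.randint(20, 40)
        self.completed = 0
        self.invalid_actions = 0
        return self.state()

    def step(self, action: str) -> dict:
        # One operational action advances the state by one tick.
        if action == "advance_time":
            done_now = min(self.backlog, self.rng.randint(1, 3))
            self.backlog -= done_now
            self.completed += done_now
        else:
            # Anything else counts as invalid in this toy version.
            self.invalid_actions += 1
        return self.state()

    def state(self) -> dict:
        return {
            "backlog": self.backlog,
            "completed": self.completed,
            "invalid_actions": self.invalid_actions,
        }


def grade(state: dict) -> float:
    """Toy grader: completion share minus an invalid-action penalty, clamped to [0, 1]."""
    total = state["completed"] + state["backlog"]
    share = state["completed"] / total if total else 0.0
    return max(0.0, min(1.0, share - 0.05 * state["invalid_actions"]))


def run_episode(task_id: str, seed: int, steps: int = 10) -> float:
    env = ToyQueueEnv()
    env.reset(task_id, seed)
    for _ in range(steps):
        env.step("advance_time")
    return grade(env.state())
```

Calling run_episode("district_backlog_easy", 11) twice returns the same score, because all randomness flows from the seeded generator created in reset.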

Reward and Grading Logic

Dense Reward (per step)

The reward function gives a continuous learning signal across an episode:

  • positive reward for stage progress and completions
  • penalties for backlog pressure, new SLA breaches, fairness gap beyond a threshold, invalid actions, and idle officer capacity

This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
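
A per-step reward of this shape might look like the sketch below. The terms follow the list above, but the field names, the weights, and the fairness threshold are assumptions made for illustration; the actual shaping lives in app/reward.py and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    """What changed during one step (field names are illustrative)."""

    stage_advances: int
    completions: int
    backlog: int
    new_sla_breaches: int
    fairness_gap: float
    invalid_action: bool
    idle_officers: int


# Hypothetical threshold: fairness is only penalized above this gap.
FAIRNESS_THRESHOLD = 0.2


def dense_reward(o: StepOutcome) -> float:
    # Positive terms: progress through stages and finished cases.
    reward = 0.1 * o.stage_advances + 1.0 * o.completions
    # Penalty terms (all weights are made up for this sketch).
    reward -= 0.01 * o.backlog
    reward -= 0.5 * o.new_sla_breaches
    reward -= 2.0 * max(0.0, o.fairness_gap - FAIRNESS_THRESHOLD)
    reward -= 0.25 if o.invalid_action else 0.0
    reward -= 0.05 * o.idle_officers
    return reward
```

Because every step returns a value, an agent gets feedback on a bad allocation immediately instead of only at the end of the episode.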

Deterministic Task Graders

Final scoring is deterministic and bounded in [0.0, 1.0]:

  • Easy prioritizes completion and SLA compliance
  • Medium balances completion, SLA compliance, urgency handling, and fairness
  • Hard emphasizes all-round performance, including fairness and escalation discipline

Because grading is deterministic, repeated runs with the same seed are reproducible.
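
One way to get a bounded, task-specific score is a weighted sum of normalized metrics. The sketch below assumes each metric is already scaled to [0, 1]; the metric names and weights are hypothetical and are not taken from app/graders.py.

```python
# Hypothetical per-task weights; each row sums to 1.0 so the score stays in [0, 1].
TASK_WEIGHTS = {
    "easy": {"completion": 0.6, "sla": 0.4},
    "medium": {"completion": 0.3, "sla": 0.3, "urgency": 0.2, "fairness": 0.2},
    "hard": {
        "completion": 0.25,
        "sla": 0.25,
        "urgency": 0.15,
        "fairness": 0.2,
        "escalation": 0.15,
    },
}


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def grade(task: str, metrics: dict[str, float]) -> float:
    """Weighted sum of clamped metrics; missing metrics count as 0.0."""
    weights = TASK_WEIGHTS[task]
    score = sum(w * clamp01(metrics.get(name, 0.0)) for name, w in weights.items())
    return round(clamp01(score), 4)
```

The function has no randomness and no hidden state, so the same metrics always produce the same score.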

Baseline Results (Current Main Branch Artifacts)

The following scores are from the current codebase artifact file:

  • source: results/smoke_test_results.json
  • policy: backlog_clearance
  • fixed seeds from task config (11, 22, 33)

| Task | Steps | Score | Completed | Backlog |
|---|---|---|---|---|
| district_backlog_easy | 33 | 0.6716 | 27 | 24 |
| mixed_urgency_medium | 61 | 0.5867 | 49 | 53 |
| cross_department_hard | 89 | 0.6522 | 73 | 92 |

Interpretation:

  • Easy (0.6716) and hard (0.6522) both score above 0.65 in this run.
  • Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
  • Scores are not placeholders; they come from run artifacts in this repository.

Supported Operational Actions

  • set_priority_mode (urgent_first, oldest_first, balanced, backlog_clearance)
  • assign_capacity
  • request_missing_documents
  • escalate_service
  • advance_time
  • reallocate_officers

What an Agent Actually Optimizes

  • Increase completions
  • Keep SLA breaches low
  • Preserve cross-service fairness
  • Avoid invalid actions
  • Use escalation budget carefully
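
This README does not define how cross-service fairness is measured. As an illustration only, one plausible definition is the spread between the best-served and worst-served service by completion rate; the function and its arguments below are hypothetical.

```python
def fairness_gap(completed: dict[str, int], received: dict[str, int]) -> float:
    """Spread between the highest and lowest per-service completion rate."""
    # Services that received no cases are skipped to avoid dividing by zero.
    rates = [completed.get(s, 0) / n for s, n in received.items() if n > 0]
    return max(rates) - min(rates) if rates else 0.0
```

Under this definition a gap of 0.0 means every service is completing the same share of its cases, and a larger value means some service is being left behind.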

Current Main-Branch Status

This README is aligned to the current main branch code paths, including:

  • app.main:app as primary server runtime
  • React UI served at /ui from built Vite assets when available
  • OpenEnv contract endpoints (/reset, /step, /state, /grade)
  • frontend API aliases (/api/*) and versioned aliases (/api/v1/*)
  • training story endpoints (/training/*)
  • simulation, RL, persistence, compliance, and history endpoints
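
A client can reach the contract endpoints over plain HTTP. The sketch below uses only the standard library and builds a request for /reset. The POST method and the JSON body shape (task_id, seed) are assumptions inferred from reset(task_id, seed); the interactive API docs served by the running backend are the authoritative reference.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # local backend address used in this README


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an endpoint such as /reset."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# The body fields mirror reset(task_id, seed); the exact schema is an assumption.
reset_request = build_post("/reset", {"task_id": "district_backlog_easy", "seed": 11})
```

With the backend running, send(reset_request) would return the decoded response; nothing is sent until send is called.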

End-to-End Architecture

flowchart LR
  UI["React UI"] --> API["FastAPI app.main"]
  API --> ENV["GovWorkflowEnv app/env.py"]
  API --> SIM["Simulation runtime app/simulator.py"]
  API --> RL["RL train/eval rl/*"]
  API --> STORE["PersistenceStore SQLite + filesystem"]
  API --> STORY["Training Story router /training/*"]
  API --> OPENENV["Optional OpenEnv adapter /openenv/*"]

Core Runtime Components

  • API server: app/main.py
  • Environment kernel: app/env.py
  • Typed models: app/models.py
  • Task registry: app/tasks.py
  • Reward shaping: app/reward.py
  • Deterministic graders: app/graders.py
  • Simulation runtime: app/simulator.py
  • Training jobs manager: app/training_jobs.py
  • Persistence layer: app/persistence.py
  • Transport gateway: app/api_gateway.py
  • React frontend: frontend/react

Task Set (Current Runtime)

Configured in app/tasks.py:

  • district_backlog_easy
  • mixed_urgency_medium
  • cross_department_hard
  • district_backlog_easy_extreme

Benchmark list used by APIs:

  • district_backlog_easy
  • mixed_urgency_medium
  • cross_department_hard

Service Coverage

ServiceType includes:

  • passport
  • driving_license
  • aadhaar_card
  • gst_registration
  • income_certificate
  • caste_certificate
  • birth_certificate
  • land_registration

Medium and hard tasks currently run with:

  • income_certificate
  • land_registration
  • passport
  • driving_license
  • aadhaar_card
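
The service list maps naturally onto an enumeration. The member values below are the names listed above; modeling ServiceType as a string enum and the MEDIUM_HARD_SERVICES constant are assumptions for this sketch, so check app/models.py and app/tasks.py for the actual definitions.

```python
from enum import Enum


class ServiceType(str, Enum):
    PASSPORT = "passport"
    DRIVING_LICENSE = "driving_license"
    AADHAAR_CARD = "aadhaar_card"
    GST_REGISTRATION = "gst_registration"
    INCOME_CERTIFICATE = "income_certificate"
    CASTE_CERTIFICATE = "caste_certificate"
    BIRTH_CERTIFICATE = "birth_certificate"
    LAND_REGISTRATION = "land_registration"


# Hypothetical constant naming the subset used by the medium and hard tasks.
MEDIUM_HARD_SERVICES = frozenset(
    {
        ServiceType.INCOME_CERTIFICATE,
        ServiceType.LAND_REGISTRATION,
        ServiceType.PASSPORT,
        ServiceType.DRIVING_LICENSE,
        ServiceType.AADHAAR_CARD,
    }
)
```

Parsing a raw string such as ServiceType("passport") returns the matching member, and an unknown name raises ValueError.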

Local Development

Prerequisites

  • Python 3.11+
  • Node 20+
  • Docker

Install dependencies

pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install

Configure environment

On Windows:

copy .env.example .env

On macOS or Linux:

cp .env.example .env

Populate as needed:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN or OPENAI_API_KEY/API_KEY
  • optional NVIDIA keys (NVIDIA_API_KEY, NVIDIA_API_KEY_2)
  • storage settings (STORAGE_ENABLED, OPENENV_DATA_DIR)
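
Reading these variables in Python could look like the sketch below. The variable names come from the list above; the Settings class, the credential precedence (HF_TOKEN, then OPENAI_API_KEY, then API_KEY), and the way STORAGE_ENABLED is parsed as a boolean are assumptions, not the project's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: Optional[str]
    model_name: Optional[str]
    api_key: Optional[str]
    storage_enabled: bool
    data_dir: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # First non-empty credential wins; this precedence order is an assumption.
    api_key = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY") or env.get("API_KEY")
    # Treat "1", "true", and "yes" (any case) as enabled; default is disabled.
    storage_flag = env.get("STORAGE_ENABLED", "false").strip().lower()
    return Settings(
        api_base_url=env.get("API_BASE_URL"),
        model_name=env.get("MODEL_NAME"),
        api_key=api_key,
        storage_enabled=storage_flag in {"1", "true", "yes"},
        data_dir=env.get("OPENENV_DATA_DIR"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.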

Run backend

python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload

Run frontend

npm --prefix frontend/react run dev

Open:

  • UI: http://127.0.0.1:5173/ui
  • API docs: http://127.0.0.1:7860/docs

Repository Layout

app/
  main.py               FastAPI app + API routing + compatibility aliases
  env.py                GovWorkflowEnv kernel
  models.py             Typed Pydantic contracts
  tasks.py              Runtime task registry
  reward.py             Reward shaping
  graders.py            Deterministic graders
  simulator.py          Simulation runtime and live sessions
  training_jobs.py      Background RL training manager
  persistence.py        SQLite/filesystem persistence
  api_gateway.py        direct/http/auto environment transport layer
  story_router.py       training story endpoints
rl/
  gov_workflow_env.py   Gym adapter
  train_ppo.py          PPO phase training entrypoint
  evaluate.py           Checkpoint evaluator
  feature_builder.py    RL feature engineering
  action_mask.py        Action mask logic
frontend/react/
  src/                  React modules/components/api hooks
scripts/
  run_local.py          Local FastAPI launcher
  convert_grpo_csv.py   Training CSV to JSON converter for story endpoints
openenv.yaml            OpenEnv manifest metadata
baseline_openai.py      Baseline and LLM runner
inference.py            Submission-style inference runner
Dockerfile              Docker image definition

License

BSD-3-Clause