Siddharaj Shirke
---
title: Gov Workflow OpenEnv
sdk: docker
app_port: 7860
pinned: false
---
# Gov Workflow OpenEnv
## Quick Links
- Hugging Face Space URL (placeholder): [https://huggingface.co/spaces/your-username/your-space-name](https://huggingface.co/spaces/your-username/your-space-name)
To be replaced with the final deployed demo link.
- Blog path in codebase: `OPENENV_RL/Blog.md`
Project write-up and narrative documentation for design choices and outcomes.
- Notebook path: `OPENENV_RL/GovWorkflow_RL_ENV.ipynb`
Main notebook for the OpenEnv RL government workflow and the primary reference for judges. It covers the judging context, environment setup, and the full end-to-end flow in one place.
- Notebook Colab URL: [https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing](https://colab.research.google.com/drive/1ssTnxKoU1nOfSNA3nOeiNM8S4fKFpkby?usp=sharing)
Cloud version of the same notebook so judges can run and review the complete workflow without local setup.
- GRPO Phase 1 training link: [https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing](https://colab.research.google.com/drive/1ND_DZ6xcT2JuH7uGB2AYbiZ1dcHKFfIw?usp=sharing)
First-stage GRPO training run where the LLM agent starts learning policy behavior inside the RL environment.
- GRPO Phase 2 training link: [https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing](https://colab.research.google.com/drive/1ofxEADct_gTX5DGhcnk8lW6p31gFCIFV?usp=sharing)
Second-stage GRPO continuation where the same LLM agent is further trained and refined on the RL environment.
- PPO Phase 1 training (local): `rl/train_ppo.py`
Phase 1 PPO training was run locally to establish a baseline before Phase 2.
- PPO Phase 2 training link: [https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing](https://colab.research.google.com/drive/1RVXQs-QAuXLBw0YXJtN4cbEootCTfHO7?usp=sharing)
Phase 2 PPO notebook that continues training on the same environment to improve policy performance.
Gov Workflow OpenEnv is a FastAPI-first simulation environment for public service workflow operations.
It models queue prioritization, officer allocation, missing-document recovery, escalation usage, and fairness-aware SLA management across government services.
This repository is set up to run in three contexts:
- local development (FastAPI + Vite)
- Docker runtime
- Hugging Face Spaces (Docker SDK)
## Current Main-Branch Status
This README reflects the code paths on the current `main` branch, including:
- `app.main:app` as primary server runtime
- React UI served at `/ui` from built Vite assets when available
- OpenEnv contract endpoints (`/reset`, `/step`, `/state`, `/grade`)
- frontend API aliases (`/api/*`) and versioned aliases (`/api/v1/*`)
- training story endpoints (`/training/*`)
- simulation, RL, persistence, compliance, and history endpoints
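
The route families above can be sketched as a small URL helper. This is an illustration only: the helper names are hypothetical, and the idea that `/api/*` and `/api/v1/*` mirror the four contract endpoints one-to-one is an assumption that should be checked against `app/main.py`. HTTP methods and payload shapes are left out because this README does not specify them.

```python
from urllib.parse import urljoin

# Contract endpoints as listed in this README.
CONTRACT_ENDPOINTS = ("reset", "step", "state", "grade")

# Alias prefixes as listed in this README; "" is the bare contract path.
# ASSUMPTION: each alias prefix exposes the same four endpoint names.
PREFIXES = ("", "api", "api/v1")


def endpoint_url(base_url: str, name: str, prefix: str = "") -> str:
    """Build the URL for one contract endpoint under an optional alias prefix."""
    if name not in CONTRACT_ENDPOINTS:
        raise ValueError(f"unknown contract endpoint: {name!r}")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown alias prefix: {prefix!r}")
    path = "/".join(part for part in (prefix, name) if part)
    # Normalise the base so urljoin appends instead of replacing a path segment.
    return urljoin(base_url.rstrip("/") + "/", path)


def all_urls(base_url: str) -> dict:
    """Map each prefix ("contract" for the bare path) to its endpoint URLs."""
    return {
        prefix or "contract": [endpoint_url(base_url, n, prefix) for n in CONTRACT_ENDPOINTS]
        for prefix in PREFIXES
    }
```

For local development the base URL would be `http://127.0.0.1:7860`, matching the backend port used in the run instructions.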
## End-to-End Architecture
```mermaid
flowchart LR
UI["React UI"] --> API["FastAPI app.main"]
API --> ENV["GovWorkflowEnv app/env.py"]
API --> SIM["Simulation runtime app/simulator.py"]
API --> RL["RL train/eval rl/*"]
API --> STORE["PersistenceStore SQLite + filesystem"]
API --> STORY["Training Story router /training/*"]
API --> OPENENV["Optional OpenEnv adapter /openenv/*"]
```
## Core Runtime Components
- API server: `app/main.py`
- Environment kernel: `app/env.py`
- Typed models: `app/models.py`
- Task registry: `app/tasks.py`
- Reward shaping: `app/reward.py`
- Deterministic graders: `app/graders.py`
- Simulation runtime: `app/simulator.py`
- Training jobs manager: `app/training_jobs.py`
- Persistence layer: `app/persistence.py`
- Transport gateway: `app/api_gateway.py`
- React frontend: `frontend/react`
## Task Set (Current Runtime)
Configured in `app/tasks.py`:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
- `district_backlog_easy_extreme`
Benchmark list used by APIs:
- `district_backlog_easy`
- `mixed_urgency_medium`
- `cross_department_hard`
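
The relationship between the two lists can be checked with a short sketch. The task IDs are copied from this README; the constant and function names are hypothetical and do not necessarily match anything in `app/tasks.py`.

```python
# Task IDs copied from this README; names of these constants are illustrative.
CONFIGURED_TASKS = (
    "district_backlog_easy",
    "mixed_urgency_medium",
    "cross_department_hard",
    "district_backlog_easy_extreme",
)
BENCHMARK_TASKS = (
    "district_backlog_easy",
    "mixed_urgency_medium",
    "cross_department_hard",
)


def non_benchmark_tasks(configured, benchmark):
    """Return configured tasks that are not part of the benchmark list, in order."""
    bench = set(benchmark)
    return [task for task in configured if task not in bench]


def check_benchmark_is_subset(configured, benchmark):
    """Raise if the benchmark list names a task that is not configured."""
    known = set(configured)
    missing = [task for task in benchmark if task not in known]
    if missing:
        raise ValueError(f"benchmark tasks not configured: {missing}")
```

With the lists above, `district_backlog_easy_extreme` is the only configured task outside the benchmark list.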
## Service Coverage
`ServiceType` includes:
- `passport`
- `driving_license`
- `aadhaar_card`
- `gst_registration`
- `income_certificate`
- `caste_certificate`
- `birth_certificate`
- `land_registration`
Medium and hard tasks currently run with:
- `income_certificate`
- `land_registration`
- `passport`
- `driving_license`
- `aadhaar_card`
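
As a rough sketch of how this coverage could be modeled, the enum below uses the service values listed above. The real `ServiceType` lives in `app/models.py`; the member names, the `str`-based enum choice, and the `MEDIUM_HARD_SERVICES` constant here are assumptions made for illustration.

```python
from enum import Enum


class ServiceType(str, Enum):
    # Values are taken from this README; member names are illustrative.
    PASSPORT = "passport"
    DRIVING_LICENSE = "driving_license"
    AADHAAR_CARD = "aadhaar_card"
    GST_REGISTRATION = "gst_registration"
    INCOME_CERTIFICATE = "income_certificate"
    CASTE_CERTIFICATE = "caste_certificate"
    BIRTH_CERTIFICATE = "birth_certificate"
    LAND_REGISTRATION = "land_registration"


# Services that the medium and hard tasks currently run with, per this README.
MEDIUM_HARD_SERVICES = frozenset(
    {
        ServiceType.INCOME_CERTIFICATE,
        ServiceType.LAND_REGISTRATION,
        ServiceType.PASSPORT,
        ServiceType.DRIVING_LICENSE,
        ServiceType.AADHAAR_CARD,
    }
)


def services_outside_medium_hard():
    """Return the values of services not used by the medium and hard tasks, sorted."""
    return sorted(s.value for s in ServiceType if s not in MEDIUM_HARD_SERVICES)
```

Under these assumptions, three of the eight service types are not exercised by the medium and hard tasks.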
## Local Development
### Prerequisites
- Python 3.11+
- Node 20+
- Docker
### Install dependencies
```bash
pip install -r requirements.txt
pip install -r requirements_rl.txt
pip install pytest pytest-asyncio
npm --prefix frontend/react install
```
### Configure environment
```bash
cp .env.example .env  # Windows cmd: copy .env.example .env
```
Populate as needed:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN` or `OPENAI_API_KEY`/`API_KEY`
- optional NVIDIA keys (`NVIDIA_API_KEY`, `NVIDIA_API_KEY_2`)
- storage settings (`STORAGE_ENABLED`, `OPENENV_DATA_DIR`)
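
A minimal sketch of reading these variables follows. The variable names come from the list above; the `Settings` class, the token precedence (`HF_TOKEN`, then `OPENAI_API_KEY`, then `API_KEY`), the boolean parsing, and the `False` default for `STORAGE_ENABLED` are all assumptions and may differ from what the application actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; not the project's own settings class.
    api_base_url: Optional[str]
    model_name: Optional[str]
    api_token: Optional[str]
    nvidia_keys: Tuple[str, ...]
    storage_enabled: bool
    data_dir: Optional[str]


def _as_bool(raw: Optional[str], default: bool = False) -> bool:
    """Interpret common truthy strings; the accepted spellings are an assumption."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Collect the variables named in this README from a mapping such as os.environ."""
    # ASSUMPTION: the first non-empty token wins, in this order.
    token = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY") or env.get("API_KEY")
    nvidia = tuple(
        env[name] for name in ("NVIDIA_API_KEY", "NVIDIA_API_KEY_2") if env.get(name)
    )
    return Settings(
        api_base_url=env.get("API_BASE_URL"),
        model_name=env.get("MODEL_NAME"),
        api_token=token,
        nvidia_keys=nvidia,
        storage_enabled=_as_bool(env.get("STORAGE_ENABLED")),  # default is assumed
        data_dir=env.get("OPENENV_DATA_DIR"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.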
### Run backend
```bash
python scripts/run_local.py --host 127.0.0.1 --port 7860 --reload
```
### Run frontend
```bash
npm --prefix frontend/react run dev
```
Open:
- UI: `http://127.0.0.1:5173/ui`
- API docs: `http://127.0.0.1:7860/docs`
## Repository Layout
```text
app/
  main.py              FastAPI app + API routing + compatibility aliases
  env.py               GovWorkflowEnv kernel
  models.py            Typed Pydantic contracts
  tasks.py             Runtime task registry
  reward.py            Reward shaping
  graders.py           Deterministic graders
  simulator.py         Simulation runtime and live sessions
  training_jobs.py     Background RL training manager
  persistence.py       SQLite/filesystem persistence
  api_gateway.py       direct/http/auto environment transport layer
  story_router.py      training story endpoints
rl/
  gov_workflow_env.py  Gym adapter
  train_ppo.py         PPO phase training entrypoint
  evaluate.py          Checkpoint evaluator
  feature_builder.py   RL feature engineering
  action_mask.py       Action mask logic
frontend/react/
  src/                 React modules/components/api hooks
scripts/
  run_local.py         Local FastAPI launcher
  convert_grpo_csv.py  Training CSV to JSON converter for story endpoints
openenv.yaml           OpenEnv manifest metadata
baseline_openai.py     Baseline and LLM runner
inference.py           Submission-style inference runner
Dockerfile             Docker image definition
```
## License
BSD-3-Clause