Spaces: Otter21 / Gov_Workflow_RL


Otter21 committed 9 days ago · Commit 50770bb (verified) · 1 parent: 6898867

CORRECTED FAULTY QUICK LINKS TO THE NOTEBOOK


GOOGLE COLAB LINKS WERE FAULTY; EXCHANGED THEM FOR THE CORRECT ONES.

Files changed (1)

README.md +94 -0

README.md CHANGED

@@ -34,6 +34,100 @@ This repository is productionized for:
 - Docker runtime
 - Hugging Face Spaces (Docker SDK)
+## Why This Problem Matters
+Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
+In real operations, delays usually stem from a sequence of operational decisions rather than from a single technical bug.
+Typical daily decisions include:
+- which queue to prioritize first
+- where to allocate limited officers
+- when to request missing documents
+- when to use escalation budget
+- how to reduce backlog without harming fairness across services
+
+This project models those decisions as a deterministic RL/OpenEnv environment, so we can evaluate policy quality with measurable outcomes (throughput, SLA compliance, fairness, and operational discipline) rather than subjective demos.
+## How the Environment Works
+At runtime, the environment follows the same loop for every task:
+1. `reset(task_id, seed)`: initializes a new episode from a deterministic task configuration.
+2. `step(action)`: applies one operational action and advances the system state.
+3. `state()`: returns the full set of episode-level metrics, such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
+4. `grade(state)`: computes a deterministic grader score in `[0.0, 1.0]` using task-specific weights.
+
+Together these form a transparent policy-evaluation loop: `reset -> repeated step -> state -> grade`.
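The loop can be sketched with a tiny, self-contained stand-in environment. Several pieces are hypothetical and do not come from the repository:

- `ToyGovEnv` and its two actions
- the arrival and processing numbers
- the SLA limit
- the toy `grade` formula

They only show the `reset -> step -> state -> grade` call order and why a fixed seed makes a run reproducible. The real environment and grader will differ.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeState:
    backlog: int
    completed: int = 0
    sla_breaches: int = 0
    invalid_actions: int = 0


class ToyGovEnv:
    """Hypothetical stand-in for the real environment, for illustration only."""

    SLA_BACKLOG_LIMIT = 50  # illustrative: each time step above this adds a breach

    def reset(self, task_id: str, seed: int) -> EpisodeState:
        # Every random draw comes from one generator seeded by (task_id, seed),
        # so the same pair always replays exactly the same episode.
        self._rng = random.Random(f"{task_id}:{seed}")
        self._state = EpisodeState(backlog=self._rng.randint(40, 60))
        return self._state

    def step(self, action: str) -> None:
        s = self._state
        if action == "assign_capacity":
            done = min(s.backlog, self._rng.randint(1, 4))  # cases finished
            s.backlog -= done
            s.completed += done
        elif action == "advance_time":
            s.backlog += self._rng.randint(0, 3)  # new arrivals
            if s.backlog > self.SLA_BACKLOG_LIMIT:
                s.sla_breaches += 1
        else:
            s.invalid_actions += 1  # unknown actions are counted, not raised

    def state(self) -> EpisodeState:
        return self._state


def grade(state: EpisodeState) -> float:
    # Illustrative score in [0.0, 1.0]: share of cases completed,
    # discounted by SLA breaches and invalid actions.
    total = state.completed + state.backlog
    completion = state.completed / total if total else 1.0
    penalty = 1 + 0.1 * state.sla_breaches + 0.1 * state.invalid_actions
    return max(0.0, min(1.0, completion / penalty))


def run_episode(task_id: str, seed: int, max_steps: int = 40) -> float:
    """reset -> repeated step -> state -> grade, with a fixed alternating policy."""
    env = ToyGovEnv()
    env.reset(task_id, seed)
    for t in range(max_steps):
        env.step("assign_capacity" if t % 2 == 0 else "advance_time")
    return grade(env.state())
```

Calling `run_episode("district_backlog_easy", 11)` twice returns the same score. Reproducible evaluation relies on exactly this property: all randomness is drawn from the seeded generator, and the policy itself is deterministic.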
+## Reward and Grading Logic
+### Dense Reward (per step)
+The reward function provides a continuous learning signal throughout an episode:
+- positive reward for stage progress and completions
+- penalties for backlog pressure, new SLA breaches, fairness gaps beyond a threshold, invalid actions, and idle officer capacity
+
+This avoids a sparse reward that signals win or loss only at the end of an episode, and supports stable policy learning.
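One way to turn these signal terms into a per-step number is to compare consecutive metric snapshots. That way only completions, breaches, and invalid actions that happened during the current step are counted. The following are illustrative assumptions, not the repository's actual fields or coefficients:

- the `Snapshot` field names
- the `RewardWeights` values
- the fairness threshold

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Metrics read from state() after a step; field names are hypothetical."""
    stage_advances: int   # cumulative case-stage transitions
    completed: int        # cumulative completed cases
    backlog: int          # open cases right now
    sla_breaches: int     # cumulative SLA breaches
    fairness_gap: float   # e.g. spread of completion rates across services
    invalid_actions: int  # cumulative invalid actions
    idle_officers: int    # officers left without work this step


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative coefficients, not the repository's values."""
    progress: float = 0.1
    completion: float = 1.0
    backlog: float = 0.01
    sla_breach: float = 2.0
    fairness: float = 1.0
    fairness_threshold: float = 0.2
    invalid: float = 0.5
    idle: float = 0.05


def dense_reward(prev: Snapshot, curr: Snapshot,
                 w: RewardWeights = RewardWeights()) -> float:
    # Positive terms: progress and completions made during this step only.
    reward = w.progress * (curr.stage_advances - prev.stage_advances)
    reward += w.completion * (curr.completed - prev.completed)
    # Ongoing pressure: penalize the current backlog and idle capacity.
    reward -= w.backlog * curr.backlog
    reward -= w.idle * curr.idle_officers
    # Event penalties: only breaches and invalid actions from this step.
    reward -= w.sla_breach * (curr.sla_breaches - prev.sla_breaches)
    reward -= w.invalid * (curr.invalid_actions - prev.invalid_actions)
    # Fairness: penalize only the part of the gap above the threshold.
    reward -= w.fairness * max(0.0, curr.fairness_gap - w.fairness_threshold)
    return reward
```

Differencing cumulative counters means a breach is penalized once, when it happens. It is not penalized again on every later step.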
+### Deterministic Task Graders
+Final scoring is deterministic and bounded in `[0.0, 1.0]`:
+- Easy prioritizes completion and SLA compliance
+- Medium balances completion, SLA compliance, urgency handling, and fairness
+- Hard emphasizes all-round performance, including fairness and escalation discipline
+
+Because both the environment and the grader are deterministic, repeated runs of a deterministic policy with the same seed are reproducible.
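A bounded, deterministic grader can be as simple as a task-specific weighted average of per-criterion scores, each already normalized to `[0, 1]`. Only the task ids below come from this README. The criterion names and weights are hypothetical placeholders, chosen to mirror the easy/medium/hard emphasis in the list.

```python
from typing import Dict

# Hypothetical per-task weights; only the task ids come from this README.
TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "district_backlog_easy": {"completion": 0.5, "sla": 0.5},
    "mixed_urgency_medium": {
        "completion": 0.3, "sla": 0.3, "urgency": 0.2, "fairness": 0.2,
    },
    "cross_department_hard": {
        "completion": 0.25, "sla": 0.25, "urgency": 0.15,
        "fairness": 0.2, "escalation_discipline": 0.15,
    },
}


def grade(components: Dict[str, float], task_id: str) -> float:
    """Weighted average of per-criterion scores, clamped to [0.0, 1.0].

    `components` maps a criterion name to a score in [0, 1]
    (e.g. completion rate, or 1 - fairness gap). The function is pure
    arithmetic, so identical inputs always produce the identical grade.
    """
    weights = TASK_WEIGHTS[task_id]
    total_weight = sum(weights.values())
    score = sum(
        w * min(1.0, max(0.0, components[name])) for name, w in weights.items()
    ) / total_weight
    return min(1.0, max(0.0, score))
```

Dividing by the total weight keeps the result in range even if a task's weights do not sum exactly to 1.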
+## Baseline Results (Current Main Branch Artifacts)
+The following scores come from an artifact file in the current codebase:
+- source: `results/smoke_test_results.json`
+- policy: `backlog_clearance`
+- fixed seeds from task config (`11`, `22`, `33`)
+
+| Task | Steps | Score | Completed | Backlog |
+|---|---:|---:|---:|---:|
+| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
+| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
+| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
+
+Interpretation:
+- Easy (0.6716) and hard (0.6522) both score above 0.65 in this run profile.
+- Medium (0.5867) remains the hardest balance point because of mixed urgency and fairness pressure.
+- Scores are not placeholders; they come from run artifacts in this repository.
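The interpretation bullets can be checked directly against the table. The sketch below copies the three rows by hand, because the layout of `results/smoke_test_results.json` is not shown here. The `summarize` helper and its 0.65 threshold are illustrative.

```python
# Rows copied from the baseline table above (policy: backlog_clearance).
BASELINE = {
    "district_backlog_easy": {"steps": 33, "score": 0.6716, "completed": 27, "backlog": 24},
    "mixed_urgency_medium": {"steps": 61, "score": 0.5867, "completed": 49, "backlog": 53},
    "cross_department_hard": {"steps": 89, "score": 0.6522, "completed": 73, "backlog": 92},
}


def summarize(rows: dict, threshold: float = 0.65) -> dict:
    """Mean score, tasks at or above `threshold`, and the lowest-scoring task."""
    scores = {task: row["score"] for task, row in rows.items()}
    return {
        "mean_score": round(sum(scores.values()) / len(scores), 4),
        "at_or_above_threshold": sorted(t for t, s in scores.items() if s >= threshold),
        "lowest": min(scores, key=scores.get),
    }
```

`summarize(BASELINE)` reports:

- a mean score of 0.6368
- easy and hard at or above 0.65
- medium as the lowest-scoring task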
+### Supported Operational Actions
+- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
+- `assign_capacity`
+- `request_missing_documents`
+- `escalate_service`
+- `advance_time`
+- `reallocate_officers`
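These actions can be captured as a small typed schema, so malformed actions are caught before they reach `step(action)`. The action and priority-mode strings are copied from the list. The following are hypothetical assumptions, not the repository's action model:

- the `service` and `officers` fields
- the `escalations_left` argument
- the validity rules

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    SET_PRIORITY_MODE = "set_priority_mode"
    ASSIGN_CAPACITY = "assign_capacity"
    REQUEST_MISSING_DOCUMENTS = "request_missing_documents"
    ESCALATE_SERVICE = "escalate_service"
    ADVANCE_TIME = "advance_time"
    REALLOCATE_OFFICERS = "reallocate_officers"


class PriorityMode(str, Enum):
    URGENT_FIRST = "urgent_first"
    OLDEST_FIRST = "oldest_first"
    BALANCED = "balanced"
    BACKLOG_CLEARANCE = "backlog_clearance"


@dataclass(frozen=True)
class Action:
    kind: ActionType
    priority_mode: Optional[PriorityMode] = None  # hypothetical: set_priority_mode only
    service: Optional[str] = None                 # hypothetical: target service queue
    officers: int = 0                             # hypothetical: officers to assign/move


def is_valid(action: Action, escalations_left: int) -> bool:
    """Hypothetical validity rules, for illustration only."""
    kind = action.kind
    if kind is ActionType.SET_PRIORITY_MODE:
        return action.priority_mode is not None
    if kind in (ActionType.ASSIGN_CAPACITY, ActionType.REALLOCATE_OFFICERS):
        return action.service is not None and action.officers > 0
    if kind is ActionType.REQUEST_MISSING_DOCUMENTS:
        return action.service is not None
    if kind is ActionType.ESCALATE_SERVICE:
        # Escalation draws on a limited budget, so it is invalid once spent.
        return action.service is not None and escalations_left > 0
    return kind is ActionType.ADVANCE_TIME
```

Both enums subclass `str`, so values decoded from JSON convert with `ActionType("advance_time")` or `PriorityMode("balanced")`. An unknown string raises `ValueError`.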
+### What an Agent Actually Optimizes
+- Increase completions
+- Keep SLA breaches low
+- Preserve cross-service fairness
+- Avoid invalid actions
+- Use the escalation budget carefully
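As a concrete reading of these objectives, a hand-written baseline policy might apply one rule per objective, in a fixed order. The `Observation` fields, thresholds, and rule order below are hypothetical. This is not the repository's `backlog_clearance` baseline. It only shows how each objective can translate into a choice from the supported action list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    # Hypothetical observation fields; the real environment's state may differ.
    backlog: int
    idle_officers: int
    missing_document_cases: int
    cases_near_sla: int
    fairness_gap: float
    escalations_left: int
    priority_mode: str


def choose_action(obs: Observation,
                  fairness_threshold: float = 0.2,
                  escalate_at: int = 3) -> Tuple[str, Optional[str]]:
    """Return (action_name, argument), applying one simple rule per objective."""
    # Fairness: switch to balanced mode when the gap exceeds the threshold.
    if obs.fairness_gap > fairness_threshold and obs.priority_mode != "balanced":
        return "set_priority_mode", "balanced"
    # Completions: do not leave officers idle while work is waiting.
    if obs.idle_officers > 0 and obs.backlog > 0:
        return "assign_capacity", None
    # SLA + budget: escalate only when several cases are at risk and budget remains.
    if obs.cases_near_sla >= escalate_at and obs.escalations_left > 0:
        return "escalate_service", None
    # Unblock cases stalled on missing documents.
    if obs.missing_document_cases > 0:
        return "request_missing_documents", None
    return "advance_time", None
```

Rule order is the main design choice. Checking fairness first corrects a large gap before throughput is pursued. The budget check on `escalate_service` stops the policy from requesting an escalation once the budget is spent.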
 ## Current Main-Branch Status
 This README is aligned to the current `main` branch code paths, including: