CORRECTED FAULTY QUICK LINKS TO THE NOTEBOOK
GOOGLE COLAB LINKS WERE FAULTY; REPLACED THEM WITH THE CORRECT ONES.
README.md
CHANGED
|
@@ -34,6 +34,100 @@ This repository is productionized for:
|
|
| 34 |
- Docker runtime
|
| 35 |
- Hugging Face Spaces (Docker SDK)
|
| 36 |
|
| 37 |
## Current Main-Branch Status
|
| 38 |
|
| 39 |
This README is aligned to the current `main` branch code paths, including:
|
|
|
|
| 34 |
- Docker runtime
|
| 35 |
- Hugging Face Spaces (Docker SDK)
|
| 36 |
|
| 37 |
+
## Why This Problem Matters
|
| 38 |
+
|
| 39 |
+
Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
|
| 40 |
+
In real operations, delays usually stem from a sequence of operational decisions rather than a single technical bug.
|
| 41 |
+
|
| 42 |
+
Typical daily decisions include:
|
| 43 |
+
|
| 44 |
+
- which queue to prioritize first
|
| 45 |
+
- where to allocate limited officers
|
| 46 |
+
- when to request missing documents
|
| 47 |
+
- when to use escalation budget
|
| 48 |
+
- how to reduce backlog without harming fairness across services
|
| 49 |
+
|
| 50 |
+
This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
|
| 51 |
+
|
| 52 |
+
## How the Environment Works
|
| 53 |
+
|
| 54 |
+
At runtime, the environment follows the same loop for every task:
|
| 55 |
+
|
| 56 |
+
1. `reset(task_id, seed)`
|
| 57 |
+
Initializes a new episode with deterministic task configuration.
|
| 58 |
+
|
| 59 |
+
2. `step(action)`
|
| 60 |
+
Applies one operational action and advances system state.
|
| 61 |
+
|
| 62 |
+
3. `state()`
|
| 63 |
+
Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
|
| 64 |
+
|
| 65 |
+
4. `grade(state)`
|
| 66 |
+
Computes a deterministic grader score in `[0.0, 1.0]` using task-specific weights.
|
| 67 |
+
|
| 68 |
+
This forms a transparent policy-evaluation loop:
|
| 69 |
+
`reset -> repeated step -> state -> grade`.
|
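The `reset -> repeated step -> state -> grade` loop can be sketched with a toy stand-in. Everything here is an assumption made for illustration: `ToyQueueEnv`, its arrival and processing rules, the `"process"` action string, and the toy `grade` formula are hypothetical and are not the repository's classes or scoring. The sketch only shows why seeding at `reset` makes a whole episode reproducible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyQueueEnv:
    """Hypothetical stand-in for the real environment (illustration only)."""
    backlog: int = 0
    completed: int = 0
    steps: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def reset(self, task_id: str, seed: int) -> dict:
        # A private, seeded RNG makes the episode reproducible for a given seed.
        # task_id is accepted to mirror the documented signature but unused here.
        self.rng = random.Random(seed)
        self.backlog = self.rng.randint(20, 40)
        self.completed = 0
        self.steps = 0
        return self.state()

    def step(self, action: str) -> dict:
        # "process" clears up to 3 cases; any other action leaves the queue idle.
        if action == "process":
            done = min(3, self.backlog)
            self.backlog -= done
            self.completed += done
        self.backlog += self.rng.randint(0, 2)  # new arrivals this step
        self.steps += 1
        return self.state()

    def state(self) -> dict:
        return {"backlog": self.backlog, "completed": self.completed, "steps": self.steps}


def grade(state: dict) -> float:
    # Toy grader: share of cases completed, which always lies in [0.0, 1.0].
    total = state["completed"] + state["backlog"]
    return state["completed"] / total if total else 0.0


def run_episode(seed: int, max_steps: int = 30) -> float:
    env = ToyQueueEnv()
    env.reset("toy_task", seed)
    for _ in range(max_steps):
        env.step("process")
    return grade(env.state())
```

Calling `run_episode(11)` twice returns the same score, because the only source of randomness is the RNG seeded in `reset`.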
| 70 |
+
|
| 71 |
+
## Reward and Grading Logic
|
| 72 |
+
|
| 73 |
+
### Dense Reward (per step)
|
| 74 |
+
|
| 75 |
+
The reward function gives a continuous learning signal across an episode:
|
| 76 |
+
|
| 77 |
+
- positive for stage progress and completions
|
| 78 |
+
- penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
|
| 79 |
+
|
| 80 |
+
This avoids a sparse reward that arrives only as “win/lose” at the end of an episode, and it supports stable policy learning.
|
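A minimal sketch of how such a dense reward could combine the listed terms follows. The coefficient values, the `fairness_threshold` default, and the `StepOutcome` field names are all assumptions chosen for illustration; the README does not state the actual weights or data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    """Hypothetical summary of what happened during one step."""
    stage_advances: int
    completions: int
    backlog: int
    new_sla_breaches: int
    fairness_gap: float
    invalid_action: bool
    idle_officers: int


def dense_reward(o: StepOutcome, fairness_threshold: float = 0.2) -> float:
    # Positive terms: stage progress and completions.
    reward = 0.1 * o.stage_advances + 1.0 * o.completions
    # Penalty terms (all coefficients are illustrative assumptions).
    reward -= 0.01 * o.backlog                                     # backlog pressure
    reward -= 0.5 * o.new_sla_breaches                             # new SLA breaches
    reward -= 1.0 * max(0.0, o.fairness_gap - fairness_threshold)  # excess beyond threshold only
    reward -= 0.25 if o.invalid_action else 0.0                    # invalid action
    reward -= 0.05 * o.idle_officers                               # idle officer capacity
    return reward
```

Note that the fairness term penalizes only the portion of the gap above the threshold, matching the "fairness excess beyond threshold" wording above.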
| 81 |
+
|
| 82 |
+
### Deterministic Task Graders
|
| 83 |
+
|
| 84 |
+
Final scoring is deterministic and bounded in `[0.0, 1.0]`:
|
| 85 |
+
|
| 86 |
+
- Easy task prioritizes completion + SLA
|
| 87 |
+
- Medium balances completion, SLA, urgency handling, and fairness
|
| 88 |
+
- Hard emphasizes all-round performance including fairness and escalation discipline
|
| 89 |
+
|
| 90 |
+
Because grading is deterministic, repeated runs with the same seed are reproducible.
|
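One way to picture task-specific weighting is a normalized weighted sum that is clamped to `[0.0, 1.0]`. This is a sketch under stated assumptions: the metric names and every weight in `HYPOTHETICAL_WEIGHTS` are invented for illustration and only mirror which factors each task is described as emphasizing; the repository's real graders may be structured differently.

```python
def weighted_grade(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-metric scores, clamped to [0.0, 1.0]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Metrics missing from the input count as 0.0.
    score = sum(w * metrics.get(name, 0.0) for name, w in weights.items()) / total_weight
    return min(1.0, max(0.0, score))


# Illustrative weights only; not taken from the repository.
HYPOTHETICAL_WEIGHTS = {
    "easy": {"completion": 0.6, "sla": 0.4},
    "medium": {"completion": 0.35, "sla": 0.25, "urgency": 0.2, "fairness": 0.2},
    "hard": {"completion": 0.25, "sla": 0.2, "urgency": 0.15, "fairness": 0.2, "escalation": 0.2},
}
```

Because the function has no randomness and no hidden state, the same metrics always produce the same score.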
| 91 |
+
|
| 92 |
+
## Baseline Results (Current Main Branch Artifacts)
|
| 93 |
+
|
| 94 |
+
The following scores are from the current codebase artifact file:
|
| 95 |
+
|
| 96 |
+
- source: `results/smoke_test_results.json`
|
| 97 |
+
- policy: `backlog_clearance`
|
| 98 |
+
- fixed seeds from task config (`11`, `22`, `33`)
|
| 99 |
+
|
| 100 |
+
| Task | Steps | Score | Completed | Backlog |
|
| 101 |
+
|---|---:|---:|---:|---:|
|
| 102 |
+
| `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
|
| 103 |
+
| `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
|
| 104 |
+
| `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
|
| 105 |
+
|
| 106 |
+
Interpretation:
|
| 107 |
+
|
| 108 |
+
- Easy (0.6716) and hard (0.6522) both score above 0.65 in this run.
|
| 109 |
+
- Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
|
| 110 |
+
- Scores are not placeholders; they come from run artifacts in this repository.
|
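The interpretation can be checked directly from the table. The snippet below hardcodes the five table columns rather than reading `results/smoke_test_results.json`, since the JSON schema is not shown here; the tuple layout is an assumption made for this example.

```python
# (task, steps, score, completed, backlog), copied from the table above.
rows = [
    ("district_backlog_easy", 33, 0.6716, 27, 24),
    ("mixed_urgency_medium", 61, 0.5867, 49, 53),
    ("cross_department_hard", 89, 0.6522, 73, 92),
]

mean_score = sum(score for _, _, score, _, _ in rows) / len(rows)
lowest_task = min(rows, key=lambda r: r[2])[0]
above_065 = [task for task, _, score, _, _ in rows if score > 0.65]

print(f"mean score: {mean_score:.4f}")
print(f"lowest-scoring task: {lowest_task}")
print(f"tasks above 0.65: {above_065}")
```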
| 111 |
+
|
| 112 |
+
|
| 113 |
+
### Supported Operational Actions
|
| 114 |
+
|
| 115 |
+
- `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
|
| 116 |
+
- `assign_capacity`
|
| 117 |
+
- `request_missing_documents`
|
| 118 |
+
- `escalate_service`
|
| 119 |
+
- `advance_time`
|
| 120 |
+
- `reallocate_officers`
|
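A small validator illustrates how the action names and priority modes listed above could be checked before an action reaches `step`. The dictionary payload shape and the `"type"` and `"mode"` keys are assumptions for this sketch; the README lists the action names and modes but not the payload format.

```python
ACTION_TYPES = {
    "set_priority_mode",
    "assign_capacity",
    "request_missing_documents",
    "escalate_service",
    "advance_time",
    "reallocate_officers",
}
PRIORITY_MODES = {"urgent_first", "oldest_first", "balanced", "backlog_clearance"}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    errors = []
    kind = action.get("type")  # hypothetical key name
    if kind not in ACTION_TYPES:
        errors.append(f"unknown action type: {kind!r}")
    elif kind == "set_priority_mode" and action.get("mode") not in PRIORITY_MODES:
        errors.append(f"unknown priority mode: {action.get('mode')!r}")
    return errors
```

Rejecting malformed actions up front is one way to keep the invalid-action count low, which the reward and grading sections describe as penalized.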
| 121 |
+
|
| 122 |
+
### What an Agent Actually Optimizes
|
| 123 |
+
|
| 124 |
+
- Increase completions
|
| 125 |
+
- Keep SLA breaches low
|
| 126 |
+
- Preserve cross-service fairness
|
| 127 |
+
- Avoid invalid actions
|
| 128 |
+
- Use escalation budget carefully
|
| 129 |
+
|
| 130 |
+
|
| 131 |
## Current Main-Branch Status
|
| 132 |
|
| 133 |
This README is aligned to the current `main` branch code paths, including:
|