Otter21 committed on
Commit
50770bb
·
verified ·
1 Parent(s): 6898867

CORRECTED FAULTY QUICK LINKS TO THE NOTEBOOK


GOOGLE COLAB LINKS WERE FAULTY; REPLACED THEM WITH THE CORRECT ONES.

Files changed (1)
  1. README.md +94 -0
README.md CHANGED
@@ -34,6 +34,100 @@ This repository is productionized for:
34
  - Docker runtime
35
  - Hugging Face Spaces (Docker SDK)
36
 
37
+ ## Why This Problem Matters
38
+
39
+ Government service offices handle high-volume, high-stakes citizen requests such as income certificates, land registration, passports, driving licenses, and Aadhaar-linked services.
40
+ In practice, delays usually stem from a sequence of operational decisions, not from a single technical bug.
41
+
42
+ Typical daily decisions include:
43
+
44
+ - which queue to prioritize first
45
+ - where to allocate limited officers
46
+ - when to request missing documents
47
+ - when to use escalation budget
48
+ - how to reduce backlog without harming fairness across services
49
+
50
+ This project models those decisions as a deterministic RL/OpenEnv environment so we can evaluate policy quality using measurable outcomes (throughput, SLA compliance, fairness, and operational discipline), not subjective demos.
51
+
52
+ ## How the Environment Works
53
+
54
+ At runtime, the environment follows the same loop for every task:
55
+
56
+ 1. `reset(task_id, seed)`
57
+ Initializes a new episode with deterministic task configuration.
58
+
59
+ 2. `step(action)`
60
+ Applies one operational action and advances system state.
61
+
62
+ 3. `state()`
63
+ Returns full episode-level metrics such as backlog, completed cases, SLA breaches, fairness gap, and invalid actions.
64
+
65
+ 4. `grade(state)`
66
+ Computes a deterministic grader score in `[0.0, 1.0]` using task-specific weights.
67
+
68
+ This forms a transparent policy-evaluation loop:
69
+ `reset -> repeated step -> state -> grade`.
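The loop above can be sketched with a toy stand-in. This is a minimal illustration, not the repository's environment: `ToyEnv`, `ToyState`, `run_episode`, the `(reward, done)` return shape of `step`, and the 20-step cap are all assumptions made for the example. Only the four method names come from this README.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyState:
    """Hypothetical episode metrics; the real state carries more fields."""
    backlog: int
    completed: int = 0
    steps: int = 0


class ToyEnv:
    """Stand-in exposing the reset/step/state/grade methods (assumed shapes)."""

    def reset(self, task_id: str, seed: int) -> ToyState:
        # The starting backlog depends only on the seed, so episodes are deterministic.
        self._state = ToyState(backlog=10 + seed % 5)
        return self._state

    def step(self, action: str) -> Tuple[float, bool]:
        s = self._state
        s.steps += 1
        completed_now = 0
        if action == "advance_time" and s.backlog > 0:
            s.backlog -= 1
            s.completed += 1
            completed_now = 1
        done = s.backlog == 0 or s.steps >= 20  # hypothetical episode cap
        return float(completed_now), done

    def state(self) -> ToyState:
        return self._state

    def grade(self, state: ToyState) -> float:
        # Fraction of cases completed, which always lies in [0.0, 1.0].
        total = state.completed + state.backlog
        return state.completed / total if total else 0.0


def run_episode(env: ToyEnv, task_id: str, seed: int) -> float:
    """reset -> repeated step -> state -> grade."""
    env.reset(task_id, seed)
    done = False
    while not done:
        _, done = env.step("advance_time")
    return env.grade(env.state())
```

Calling `run_episode(ToyEnv(), "district_backlog_easy", 11)` twice returns the same score, which is the property the real loop relies on for reproducible evaluation.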
70
+
71
+ ## Reward and Grading Logic
72
+
73
+ ### Dense Reward (per step)
74
+
75
+ The reward function gives a continuous learning signal across an episode:
76
+
77
+ - positive for stage progress and completions
78
+ - penalties for backlog pressure, new SLA breaches, fairness excess beyond threshold, invalid actions, and idle officer capacity
79
+
80
+ This avoids sparse “win/lose only at end” behavior and supports stable policy learning.
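As a rough sketch of how such a per-step reward could be assembled, the snippet below combines the listed terms linearly. The field names, the coefficients, and `FAIRNESS_THRESHOLD` are hypothetical values chosen for illustration; the repository's actual reward function may weight or shape these terms differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepDelta:
    """Hypothetical summary of what changed during one step."""
    stage_progress: int      # cases that moved forward a stage
    completions: int         # cases fully completed
    backlog: int             # cases still waiting after the step
    new_sla_breaches: int    # breaches that occurred during this step
    fairness_gap: float      # spread in service levels across services
    invalid_action: bool     # whether the action was rejected
    idle_officers: int       # officers left without work


FAIRNESS_THRESHOLD = 0.2  # assumed value, not taken from the repository


def dense_reward(d: StepDelta) -> float:
    """Linear combination of progress bonuses and operational penalties."""
    reward = 0.1 * d.stage_progress + 1.0 * d.completions
    reward -= 0.01 * d.backlog
    reward -= 0.5 * d.new_sla_breaches
    # Only the fairness gap in excess of the threshold is penalized.
    reward -= 1.0 * max(0.0, d.fairness_gap - FAIRNESS_THRESHOLD)
    reward -= 0.25 if d.invalid_action else 0.0
    reward -= 0.05 * d.idle_officers
    return reward
```

Because every term is computed per step, an agent receives feedback immediately after each action instead of only at episode end.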
81
+
82
+ ### Deterministic Task Graders
83
+
84
+ Final scoring is deterministic and bounded in `[0.0, 1.0]`:
85
+
86
+ - Easy task prioritizes completion + SLA
87
+ - Medium balances completion, SLA, urgency handling, and fairness
88
+ - Hard emphasizes all-round performance including fairness and escalation discipline
89
+
90
+ Because grading is deterministic, repeated runs with the same seed are reproducible.
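One way to picture a bounded, task-weighted grader is shown below. The task IDs match the baseline table in this README, but the component names and every weight in `TASK_WEIGHTS` are invented for this sketch; the real graders may use different components and proportions.

```python
from typing import Dict


def clamp01(x: float) -> float:
    """Clamp a value into [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


# Hypothetical weights; each task's weights sum to 1.0.
TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "district_backlog_easy": {"completion": 0.6, "sla": 0.4},
    "mixed_urgency_medium": {
        "completion": 0.3, "sla": 0.3, "urgency": 0.2, "fairness": 0.2,
    },
    "cross_department_hard": {
        "completion": 0.25, "sla": 0.25, "urgency": 0.15,
        "fairness": 0.2, "escalation": 0.15,
    },
}


def grade(task_id: str, components: Dict[str, float]) -> float:
    """Weighted sum of per-component scores, each expected in [0.0, 1.0]."""
    weights = TASK_WEIGHTS[task_id]
    score = sum(
        w * clamp01(components.get(name, 0.0)) for name, w in weights.items()
    )
    # Final clamp guards against floating-point drift above 1.0.
    return clamp01(score)
```

The function has no randomness and no hidden state, so identical component scores always produce the identical grade.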
91
+
92
+ ## Baseline Results (Current Main Branch Artifacts)
93
+
94
+ The following scores come from an artifact file in the current codebase:
95
+
96
+ - source: `results/smoke_test_results.json`
97
+ - policy: `backlog_clearance`
98
+ - seeds: fixed per task in the task config (`11`, `22`, `33`)
99
+
100
+ | Task | Steps | Score | Completed | Backlog |
101
+ |---|---:|---:|---:|---:|
102
+ | `district_backlog_easy` | 33 | 0.6716 | 27 | 24 |
103
+ | `mixed_urgency_medium` | 61 | 0.5867 | 49 | 53 |
104
+ | `cross_department_hard` | 89 | 0.6522 | 73 | 92 |
105
+
106
+ Interpretation:
107
+
108
+ - Easy (0.6716) and hard (0.6522) both score above 0.65 in this run.
109
+ - Medium remains the most difficult balance point due to mixed urgency and fairness pressure.
110
+ - Scores are not placeholders; they come from run artifacts in this repository.
111
+
112
+
113
+ ### Supported Operational Actions
114
+
115
+ - `set_priority_mode` (`urgent_first`, `oldest_first`, `balanced`, `backlog_clearance`)
116
+ - `assign_capacity`
117
+ - `request_missing_documents`
118
+ - `escalate_service`
119
+ - `advance_time`
120
+ - `reallocate_officers`
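The action names above could be represented in client code roughly as follows. The action and priority-mode strings are taken from the list; the `Action` dataclass, the `params` dictionary, the `"mode"` key, and `is_valid` are hypothetical and only illustrate how an invalid action might be detected before it is sent to the environment.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class ActionType(str, Enum):
    SET_PRIORITY_MODE = "set_priority_mode"
    ASSIGN_CAPACITY = "assign_capacity"
    REQUEST_MISSING_DOCUMENTS = "request_missing_documents"
    ESCALATE_SERVICE = "escalate_service"
    ADVANCE_TIME = "advance_time"
    REALLOCATE_OFFICERS = "reallocate_officers"


PRIORITY_MODES = {"urgent_first", "oldest_first", "balanced", "backlog_clearance"}


@dataclass
class Action:
    type: ActionType
    # Hypothetical payload; the real schema is defined by the environment.
    params: Dict[str, Any] = field(default_factory=dict)


def is_valid(action: Action) -> bool:
    """Minimal client-side check; only the priority mode is validated here."""
    if action.type is ActionType.SET_PRIORITY_MODE:
        return action.params.get("mode") in PRIORITY_MODES
    return True
```

For example, `is_valid(Action(ActionType.SET_PRIORITY_MODE, {"mode": "balanced"}))` passes, while an unknown mode string is rejected.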
121
+
122
+ ### What an Agent Actually Optimizes
123
+
124
+ - Increase completions
125
+ - Keep SLA breaches low
126
+ - Preserve cross-service fairness
127
+ - Avoid invalid actions
128
+ - Use escalation budget carefully
129
+
130
+
131
  ## Current Main-Branch Status
132
 
133
  This README is aligned to the current `main` branch code paths, including: