# CivicAI: Real-World Problem Statement

## Problem Definition

> **AI-driven societal policy optimization under uncertainty**

Modern governments face a combinatorial decision-making problem: thousands of
interdependent policy levers (taxes, healthcare spending, education, policing,
subsidies, emergency responses) interact through complex causal chains to
produce emergent societal outcomes across economic, public-health, and social
cohesion dimensions, often with weeks-to-years of lag and high uncertainty.

No human decision-maker can simultaneously optimise all dimensions. AI agents
trained in CivicAI learn to:

1. Observe rich societal state (12+ indicators)
2. Act across a continuous multi-dimensional policy space
3. Receive delayed, multi-objective feedback
4. Adapt to unexpected shocks (pandemics, market crashes, social unrest)

---

## Real-World Domain Mapping

| CivicAI dimension | Real-world counterpart | Real data anchor |
|---|---|---|
| `gdp`, `gdp_growth`, `inflation` | Macroeconomic fiscal policy | World Bank GDP / IMF inflation data |
| `employment_rate` | Labour market policy | ILO unemployment statistics |
| `tax_rate`, `budget_balance` | Government revenue & deficit | OECD fiscal balance data |
| `health_index`, `infection_rate` | Public-health capacity & epidemics | WHO health expenditure / GHI |
| `crime_rate` | Rule-of-law & public safety | UNODC crime indices |
| `public_satisfaction` | Democratic legitimacy / approval | Edelman Trust Barometer |
| `emergent.wealth_inequality` | Distributional equity | Gini coefficient (World Bank) |
| `emergent.social_unrest` | Political stability | World Governance Indicators |
| `food_reserves`, `energy_reserves` | Strategic resource security | FAO / IEA stockpile data |
| `education_quality` | Human capital investment | UNESCO / PISA |

### Domain 1 β€” Governance (Fiscal Policy)

**Real-world problem:** Governments must set tax rates that raise revenue
without suppressing growth, and allocate budgets across competing public goods
(healthcare vs. education vs. security) while maintaining fiscal sustainability.

**CivicAI mapping:**
- Action: `tax_rate` ∈ [0, 1], `healthcare_budget`, `education_budget`, `police_budget`
- State: `gdp`, `inflation`, `employment_rate`, `budget_balance`
- Challenge: High taxes → GDP drag; low taxes → deficit spiral

### Domain 2 β€” Economy (Macroeconomic Stabilisation)

**Real-world problem:** Recessions require countercyclical stimulus, but
overspending triggers inflation. Optimal fiscal multipliers depend on the
current economic regime.

**CivicAI mapping:**
- Action: `subsidy_policy` ∈ {none, agriculture, industry, technology}
- State: `gdp_growth`, `inflation`, `employment_rate`
- Challenge: Technology subsidies boost long-run growth but worsen near-term
  inequality; agriculture subsidies improve food security but reduce GDP growth

### Domain 3 β€” Public Health (Epidemic Management)

**Real-world problem:** Pandemics create tradeoffs between infection
suppression (via lockdowns) and economic activity. Optimal policies depend on
medical supply capacity, infection dynamics, and public compliance.

**CivicAI mapping:**
- Action: `healthcare_budget`, `emergency_response` (lockdown / stimulus / open)
- State: `infection_rate`, `health_index`, `medical_supplies`, `gdp`
- Challenge: Lockdown reduces infection but crushes GDP; premature opening
  causes epidemic rebound

### Domain 4 β€” Social Cohesion (Crisis Management)

**Real-world problem:** Compound crises (unemployment + crime + inequality +
unrest) exhibit non-linear cascade dynamics: once social unrest exceeds a
threshold, even good economic data fails to restore stability.

**CivicAI mapping:**
- Action: All levers simultaneously; no single dominant strategy
- State: `public_satisfaction`, `crime_rate`, `emergent.wealth_inequality`,
  `emergent.social_unrest`
- Challenge: Inequality is a slow-moving structural variable; quick fixes
  (police budget) address symptoms, not causes
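
Taken together, the levers named in the four domain mappings could be collected into a single action record. The sketch below is illustrative only: the class names `PolicyAction`, `Subsidy`, and `Emergency` are hypothetical, and the non-negative budget check is an assumption, since the source gives a range only for `tax_rate`.

```python
from dataclasses import dataclass
from enum import Enum


class Subsidy(Enum):
    """Values listed for `subsidy_policy` in Domain 2."""
    NONE = "none"
    AGRICULTURE = "agriculture"
    INDUSTRY = "industry"
    TECHNOLOGY = "technology"


class Emergency(Enum):
    """Values listed for `emergency_response` in Domain 3."""
    LOCKDOWN = "lockdown"
    STIMULUS = "stimulus"
    OPEN = "open"


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical container for one turn's policy decision."""
    tax_rate: float
    healthcare_budget: float
    education_budget: float
    police_budget: float
    subsidy_policy: Subsidy = Subsidy.NONE
    emergency_response: Emergency = Emergency.OPEN

    def __post_init__(self) -> None:
        # The source states tax_rate lies in [0, 1].
        if not 0.0 <= self.tax_rate <= 1.0:
            raise ValueError("tax_rate must be in [0, 1]")
        # Assumption: budgets are non-negative; their units are not given.
        for name in ("healthcare_budget", "education_budget", "police_budget"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be non-negative")


action = PolicyAction(
    tax_rate=0.30,
    healthcare_budget=0.25,
    education_budget=0.20,
    police_budget=0.10,
    subsidy_policy=Subsidy.TECHNOLOGY,
)
```

Validating at construction time keeps out-of-range actions from reaching the environment, whatever the real action schema turns out to be.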

---

## Tasks

### Task 1 β€” Economic Stability `[EASY]`

**Objective:** Restore an economy in mild recession to fiscal stability.

| Criterion | Target | Failure |
|---|---|---|
| Inflation | < 6% | ≥ 15% |
| Employment | > 85% | ≤ 65% |
| GDP | > $400B | ≤ $250B |
| Budget Balance | Surplus preferred | ≤ −30% deficit |

**Initial conditions:** GDP $450B, inflation 7%, employment 82%, satisfaction 55%

**Deterministic grader** (`EconomicStabilityGrader`):

```
score = 0.40 × inflation_score
      + 0.40 × employment_score
      + 0.10 × gdp_score
      + 0.10 × budget_score

inflation_score  = linear_inv(inflation, ideal=3%, fail=15%)
                   × 0.40 if hyperinflation (>20%)
employment_score = linear(employment_rate, fail=65%, ideal=90%)
gdp_score        = linear(gdp, fail=$250B, ideal=$500B)
budget_score     = linear(budget_balance, fail=−30%, ideal=0%)

All linear() / linear_inv() produce values in [0.0, 1.0].
No random calls. Always deterministic.
```

**Success threshold:** score ≥ 0.75
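
As a rough illustration of the formula above, the following sketch assumes `linear()` and `linear_inv()` are clamped linear ramps between the `fail` and `ideal` points; the source only states that they return values in [0.0, 1.0], so the real helpers may differ. Rates are expressed as fractions, GDP in billions of dollars, and the starting `budget_balance` of −6% is a made-up value, since the initial conditions do not list one.

```python
def clamp01(x: float) -> float:
    """Clamp a value into [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def linear(value: float, fail: float, ideal: float) -> float:
    """Assumed ramp: 0.0 at `fail`, 1.0 at `ideal`; higher values are better."""
    return clamp01((value - fail) / (ideal - fail))


def linear_inv(value: float, ideal: float, fail: float) -> float:
    """Assumed ramp: 1.0 at `ideal`, 0.0 at `fail`; lower values are better."""
    return clamp01((fail - value) / (fail - ideal))


def economic_stability_score(
    inflation: float, employment_rate: float, gdp: float, budget_balance: float
) -> float:
    inflation_score = linear_inv(inflation, ideal=0.03, fail=0.15)
    if inflation > 0.20:
        # Hyperinflation multiplier from the formula. Under this clamped
        # reading the component is already 0.0 above 15%, so the multiplier
        # only matters if the real helper is unclamped or applied elsewhere.
        inflation_score *= 0.40
    employment_score = linear(employment_rate, fail=0.65, ideal=0.90)
    gdp_score = linear(gdp, fail=250.0, ideal=500.0)
    budget_score = linear(budget_balance, fail=-0.30, ideal=0.0)
    return (
        0.40 * inflation_score
        + 0.40 * employment_score
        + 0.10 * gdp_score
        + 0.10 * budget_score
    )


# Task 1 initial conditions, plus a hypothetical -6% budget balance.
initial_score = economic_stability_score(
    inflation=0.07, employment_rate=0.82, gdp=450.0, budget_balance=-0.06
)
print(round(initial_score, 3))   # 0.699
print(initial_score >= 0.75)     # False
```

Under these assumptions the starting state scores just below the threshold, so the agent has to improve inflation or employment, which together carry 80% of the weight.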

---

### Task 2 β€” Pandemic Management `[MEDIUM]`

**Objective:** Suppress an epidemic that starts at a 20% infection rate without
destroying the economy.

| Criterion | Target | Failure |
|---|---|---|
| Infection rate | < 10% | ≥ 30% |
| Health index | > 0.60 | ≤ 0.30 |
| GDP | > $300B | ≤ $200B |
| Medical supplies | > 0.60 | ≤ 0.20 |

**Initial conditions:** Infection 20%, health index 0.55, GDP $480B, medical supplies 0.50

**Deterministic grader** (`PandemicManagementGrader`):

```
score = 0.40 × infection_score
      + 0.30 × health_score
      + 0.20 × gdp_score
      + 0.10 × supplies_score

infection_score = linear_inv(infection_rate, ideal=2%, fail=30%)
                  × 0.50 if epidemic is out of control (OOC, ≥40%)
health_score    = linear(health_index, fail=0.30, ideal=0.80)
gdp_score       = linear(gdp, fail=$200B, ideal=$480B)
supplies_score  = linear(medical_supplies, fail=0.20, ideal=0.80)

No random calls. Always deterministic.
```

**Core tension:** Lockdown ↑ infection_score but ↓ gdp_score; the agent must
find the optimal tradeoff trajectory.

**Success threshold:** score ≥ 0.75
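
To make the tension concrete, the sketch below scores the Task 2 initial conditions against one hypothetical post-lockdown state. It assumes a clamped linear ramp for both `linear()` and `linear_inv()`, which the source does not confirm, and the post-lockdown numbers are invented for illustration rather than produced by the simulator.

```python
def ramp(value: float, fail: float, ideal: float) -> float:
    """Assumed scoring ramp: 0.0 at `fail`, 1.0 at `ideal`, clamped.

    Works in both directions, so it stands in for linear() when
    ideal > fail and for linear_inv() when ideal < fail.
    """
    return max(0.0, min(1.0, (value - fail) / (ideal - fail)))


def pandemic_score(
    infection_rate: float, health_index: float, gdp: float, medical_supplies: float
) -> float:
    infection_score = ramp(infection_rate, fail=0.30, ideal=0.02)
    if infection_rate >= 0.40:
        # Out-of-control multiplier from the formula; a no-op under the
        # clamped assumption because the component is already 0.0 here.
        infection_score *= 0.50
    return (
        0.40 * infection_score
        + 0.30 * ramp(health_index, fail=0.30, ideal=0.80)
        + 0.20 * ramp(gdp, fail=200.0, ideal=480.0)
        + 0.10 * ramp(medical_supplies, fail=0.20, ideal=0.80)
    )


# Task 2 initial conditions (GDP in billions of dollars).
start = pandemic_score(0.20, 0.55, 480.0, 0.50)

# Hypothetical state after a strict lockdown: infection falls sharply,
# GDP falls with it. These numbers are illustrative only.
after_lockdown = pandemic_score(0.05, 0.65, 320.0, 0.60)

print(round(start, 3))           # 0.543
print(round(after_lockdown, 3))  # 0.72
```

In this example the lockdown gains about 0.21 on the infection term but loses about 0.11 on GDP, and the total still falls short of 0.75, which is why a single blunt intervention is unlikely to pass the task.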

---

### Task 3 β€” Social Stability Crisis `[HARD]`

**Objective:** Restore social order from a compound multi-domain crisis with
cascading failure risk.

| Criterion | Target | Failure |
|---|---|---|
| Public satisfaction | > 50% | ≤ 15% |
| Crime rate | < 12% | ≥ 35% |
| Employment rate | > 80% | ≤ 55% |
| Wealth inequality (Gini) | < 0.40 | ≥ 0.70 |

**Initial conditions:** Employment 68%, crime 25%, satisfaction 30%, Gini 0.55, social unrest 0.45

**Deterministic grader** (`SocialCrisisGrader`):

```
score = ( 0.30 × satisfaction_score
        + 0.25 × crime_score
        + 0.25 × employment_score
        + 0.20 × inequality_score )
        × 0.60 if social_unrest > 0.65 (cascade penalty)

satisfaction_score  = linear(public_satisfaction, fail=0.15, ideal=0.70)
crime_score         = linear_inv(crime_rate, ideal=5%, fail=35%)
                      × 0.50 if crime_rate ≥ 40%
employment_score    = linear(employment_rate, fail=55%, ideal=88%)
inequality_score    = linear_inv(gini, ideal=0.20, fail=0.70)

No random calls. Always deterministic.
```

**Why it's hard:**
- Gini is structural: it requires sustained tax redistribution over many turns
- Social unrest cascade multiplier punishes instability even when individual
  metrics improve
- No single dominant strategy; agents must balance all four dimensions
  simultaneously

**Success threshold:** score ≥ 0.75
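
The effect of the cascade penalty can be sketched as follows. This assumes the 0.60 multiplier applies to the whole weighted sum, which matches the "cascade multiplier" description but is not confirmed by the formula's layout, and it assumes clamped linear ramps for the component scores. All inputs are fractions.

```python
def ramp(value: float, fail: float, ideal: float) -> float:
    """Assumed scoring ramp: 0.0 at `fail`, 1.0 at `ideal`, clamped."""
    return max(0.0, min(1.0, (value - fail) / (ideal - fail)))


def social_crisis_score(
    public_satisfaction: float,
    crime_rate: float,
    employment_rate: float,
    gini: float,
    social_unrest: float,
) -> float:
    crime_score = ramp(crime_rate, fail=0.35, ideal=0.05)
    if crime_rate >= 0.40:
        # Crime multiplier from the formula; a no-op under the clamped
        # assumption because the component is already 0.0 at this level.
        crime_score *= 0.50
    score = (
        0.30 * ramp(public_satisfaction, fail=0.15, ideal=0.70)
        + 0.25 * crime_score
        + 0.25 * ramp(employment_rate, fail=0.55, ideal=0.88)
        + 0.20 * ramp(gini, fail=0.70, ideal=0.20)
    )
    if social_unrest > 0.65:
        # Assumption: the cascade penalty scales the total, not one term.
        score *= 0.60
    return score


# Task 3 initial conditions: unrest 0.45 is below the cascade threshold.
start = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.45)

# Same metrics, but with unrest pushed past 0.65 (hypothetical).
in_cascade = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.70)

print(round(start, 3))       # 0.324
print(round(in_cascade, 3))  # 0.194
```

With the penalty active, even perfect component scores would cap the total at 0.60, below the success threshold, so under this reading an agent cannot pass Task 3 while unrest stays above 0.65.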

---

## Grader API

```python
from civicai.graders import grade, GradeResult

result: GradeResult = grade(state, task_id="stabilize_economy")

print(result.score)        # float ∈ [0.0, 1.0]
print(result.success)      # bool: True if score ≥ 0.75
print(result.summary)      # human-readable verdict
print(result.to_dict())    # full component breakdown (JSON-serializable)
```

Every `env.step()` call returns this grade in `info["task_grade"]`:

```python
obs, reward, done, info = env.step(action)
grade_result = info["task_grade"]   # dict: {score, success, components, ...}
```
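
A full episode might consume this per-step grade as sketched below. `StubEnv` is a hypothetical stand-in that only mimics the four-value return shape shown above, so the example runs without CivicAI installed; `run_episode` and the `policy` callable are likewise illustrative names, not part of the documented API.

```python
from typing import Any, Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical environment that replays a fixed list of grade scores."""

    def __init__(self, scores: List[float]) -> None:
        self._scores = list(scores)
        self._turn = 0

    def step(self, action: Any) -> Tuple[Dict, float, bool, Dict]:
        score = self._scores[self._turn]
        self._turn += 1
        done = self._turn >= len(self._scores)
        info = {"task_grade": {"score": score, "success": score >= 0.75}}
        return {}, 0.0, done, info


def run_episode(
    env: Any, policy: Callable[[Any], Any], max_turns: int = 50
) -> Dict[str, Any]:
    """Step until `done` or `max_turns`; return the last task grade seen."""
    obs: Any = None
    last_grade: Dict[str, Any] = {}
    for _ in range(max_turns):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        last_grade = info["task_grade"]
        if done:
            break
    return last_grade


# A trivial policy that ignores the observation (illustration only).
final = run_episode(StubEnv([0.40, 0.62, 0.78]), policy=lambda obs: {})
print(final["score"], final["success"])  # 0.78 True
```

Reading the grade from `info` on every step, rather than calling `grade()` once at the end, makes it easy to log how the score evolves over the episode.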

---

## Why This Is Non-Trivial

| Challenge | Description |
|---|---|
| **Multi-objective** | 5 rubric dimensions plus a task-specific grader; no single scalar fully captures the objective |
| **Long-horizon** | 50-turn episodes; many actions have a 5–10 turn lag before effects appear |
| **Non-linear dynamics** | Social unrest cascade, hyperinflation multiplier, epidemic OOC penalty |
| **Structural vs. tactical** | Gini responds slowly to redistribution; crime responds quickly to policing |
| **Real-world data** | GDP growth, inflation, unemployment, life expectancy anchored to World Bank baseline |
| **Emergent behaviour** | Wealth inequality → unrest → protest → GDP drag (3-step causal chain) |