# CivicAI: Real-World Problem Statement
## Problem Definition
> **AI-driven societal policy optimization under uncertainty**
Modern governments face a combinatorial decision-making problem: thousands of
interdependent policy levers (taxes, healthcare spending, education, policing,
subsidies, emergency responses) interact through complex causal chains to
produce emergent societal outcomes across economic, public-health, and social
cohesion dimensions, often with weeks-to-years of lag and high uncertainty.
No human decision-maker can simultaneously optimise all dimensions. AI agents
trained in CivicAI learn to:
1. Observe rich societal state (12+ indicators)
2. Act across a continuous multi-dimensional policy space
3. Receive delayed, multi-objective feedback
4. Adapt to unexpected shocks (pandemics, market crashes, social unrest)
---
## Real-World Domain Mapping
| CivicAI dimension | Real-world counterpart | Real data anchor |
|---|---|---|
| `gdp`, `gdp_growth`, `inflation` | Macroeconomic fiscal policy | World Bank GDP / IMF inflation data |
| `employment_rate` | Labour market policy | ILO unemployment statistics |
| `tax_rate`, `budget_balance` | Government revenue & deficit | OECD fiscal balance data |
| `health_index`, `infection_rate` | Public-health capacity & epidemics | WHO health expenditure / GHI |
| `crime_rate` | Rule-of-law & public safety | UNODC crime indices |
| `public_satisfaction` | Democratic legitimacy / approval | Edelman Trust Barometer |
| `emergent.wealth_inequality` | Distributional equity | Gini coefficient (World Bank) |
| `emergent.social_unrest` | Political stability | World Governance Indicators |
| `food_reserves`, `energy_reserves` | Strategic resource security | FAO / IEA stockpile data |
| `education_quality` | Human capital investment | UNESCO / PISA |
### Domain 1: Governance (Fiscal Policy)
**Real-world problem:** Governments must set tax rates that raise revenue
without suppressing growth, and allocate budgets across competing public goods
(healthcare vs. education vs. security) while maintaining fiscal sustainability.
**CivicAI mapping:**
- Action: `tax_rate` ∈ [0, 1], `healthcare_budget`, `education_budget`, `police_budget`
- State: `gdp`, `inflation`, `employment_rate`, `budget_balance`
- Challenge: High taxes → GDP drag; low taxes → deficit spiral
### Domain 2: Economy (Macroeconomic Stabilisation)
**Real-world problem:** Recessions require countercyclical stimulus, but
overspending triggers inflation. Optimal fiscal multipliers depend on the
current economic regime.
**CivicAI mapping:**
- Action: `subsidy_policy` ∈ {none, agriculture, industry, technology}
- State: `gdp_growth`, `inflation`, `employment_rate`
- Challenge: Technology subsidies boost long-run growth but worsen near-term
inequality; agriculture subsidies improve food security but reduce GDP growth
### Domain 3: Public Health (Epidemic Management)
**Real-world problem:** Pandemics create tradeoffs between infection
suppression (via lockdowns) and economic activity. Optimal policies depend on
medical supply capacity, infection dynamics, and public compliance.
**CivicAI mapping:**
- Action: `healthcare_budget`, `emergency_response` (lockdown / stimulus / open)
- State: `infection_rate`, `health_index`, `medical_supplies`, `gdp`
- Challenge: Lockdown reduces infection but crushes GDP; premature opening
causes epidemic rebound
### Domain 4: Social Cohesion (Crisis Management)
**Real-world problem:** Compound crises (unemployment + crime + inequality +
unrest) exhibit non-linear cascade dynamics: once social unrest exceeds a
threshold, even good economic data fails to restore stability.
**CivicAI mapping:**
- Action: All levers simultaneously; no single dominant strategy
- State: `public_satisfaction`, `crime_rate`, `emergent.wealth_inequality`,
`emergent.social_unrest`
- Challenge: Inequality is a slow-moving structural variable; quick fixes
(police budget) address symptoms, not causes
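
The levers named across the four domain mappings can be gathered into one action record. The sketch below is illustrative only: `PolicyAction` and its validation rules are hypothetical names introduced here, not the CivicAI API, and the budget values are made up because this document does not state their units or ranges.

```python
from dataclasses import dataclass

# Categorical options as listed in the domain mappings above.
SUBSIDY_POLICIES = {"none", "agriculture", "industry", "technology"}
EMERGENCY_RESPONSES = {"lockdown", "stimulus", "open"}


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical container for one turn's policy levers (not the real API)."""

    tax_rate: float            # documented range: [0, 1]
    healthcare_budget: float   # units and range are not specified in this document
    education_budget: float
    police_budget: float
    subsidy_policy: str = "none"
    emergency_response: str = "open"

    def __post_init__(self) -> None:
        # Only the constraints stated in the document are checked here.
        if not 0.0 <= self.tax_rate <= 1.0:
            raise ValueError(f"tax_rate must be in [0, 1], got {self.tax_rate}")
        if self.subsidy_policy not in SUBSIDY_POLICIES:
            raise ValueError(f"unknown subsidy_policy: {self.subsidy_policy!r}")
        if self.emergency_response not in EMERGENCY_RESPONSES:
            raise ValueError(f"unknown emergency_response: {self.emergency_response!r}")


# Illustrative values only.
action = PolicyAction(
    tax_rate=0.30,
    healthcare_budget=0.25,
    education_budget=0.20,
    police_budget=0.10,
    subsidy_policy="technology",
)
```

Validating at construction time keeps an out-of-range `tax_rate` or a misspelled subsidy category from reaching the environment.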
---
## Tasks
### Task 1: Economic Stability `[EASY]`
**Objective:** Restore a mild recession economy to fiscal stability.
| Criterion | Target | Failure |
|---|---|---|
| Inflation | < 6% | ≥ 15% |
| Employment | > 85% | ≤ 65% |
| GDP | > $400B | ≤ $250B |
| Budget Balance | Surplus preferred | ≤ −30% deficit |
**Initial conditions:** GDP $450B, inflation 7%, employment 82%, satisfaction 55%
**Deterministic grader** (`EconomicStabilityGrader`):
```
score = 0.40 × inflation_score
      + 0.40 × employment_score
      + 0.10 × gdp_score
      + 0.10 × budget_score

inflation_score  = linear_inv(inflation, ideal=3%, fail=15%)
                   × 0.40 if hyperinflation (>20%)
employment_score = linear(employment_rate, fail=65%, ideal=90%)
gdp_score        = linear(gdp, fail=$250B, ideal=$500B)
budget_score     = linear(budget_balance, fail=−30%, ideal=0%)

All linear() / linear_inv() produce values in [0.0, 1.0].
No random calls. Always deterministic.
```
**Success threshold:** score ≥ 0.75
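
The grader formula above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the document only says that `linear()` and `linear_inv()` return values in [0.0, 1.0], so the clamped linear interpolation used here is a guess at their behaviour, and `economic_stability_score` is a hypothetical name rather than the `EconomicStabilityGrader` implementation. The initial conditions do not give a budget balance, so the −10% used below is an invented value.

```python
def linear(x: float, fail: float, ideal: float) -> float:
    """Higher is better: 0.0 at `fail`, 1.0 at `ideal`, clamped (assumed form)."""
    return min(1.0, max(0.0, (x - fail) / (ideal - fail)))


def linear_inv(x: float, ideal: float, fail: float) -> float:
    """Lower is better: 1.0 at `ideal`, 0.0 at `fail`, clamped (assumed form)."""
    return min(1.0, max(0.0, (fail - x) / (fail - ideal)))


def economic_stability_score(
    inflation: float, employment_rate: float, gdp: float, budget_balance: float
) -> float:
    """Weighted sum from the Task 1 formula; rates are fractions, GDP is in $B."""
    inflation_score = linear_inv(inflation, ideal=0.03, fail=0.15)
    if inflation > 0.20:  # hyperinflation multiplier
        inflation_score *= 0.40
    employment_score = linear(employment_rate, fail=0.65, ideal=0.90)
    gdp_score = linear(gdp, fail=250.0, ideal=500.0)
    budget_score = linear(budget_balance, fail=-0.30, ideal=0.0)
    return (
        0.40 * inflation_score
        + 0.40 * employment_score
        + 0.10 * gdp_score
        + 0.10 * budget_score
    )


# Task 1 initial conditions; budget_balance=-0.10 is a made-up value.
initial_score = economic_stability_score(
    inflation=0.07, employment_rate=0.82, gdp=450.0, budget_balance=-0.10
)
print(f"{initial_score:.3f}")
```

With these assumptions the starting state scores roughly 0.69, below the 0.75 threshold. Note that under clamped interpolation `inflation_score` is already 0 once inflation passes 15%, so the hyperinflation multiplier cannot change the result; the real helpers may be defined differently.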
---
### Task 2: Pandemic Management `[MEDIUM]`
**Objective:** Suppress a 20% infection-rate epidemic without destroying the
economy.
| Criterion | Target | Failure |
|---|---|---|
| Infection rate | < 10% | ≥ 30% |
| Health index | > 0.60 | ≤ 0.30 |
| GDP | > $300B | ≤ $200B |
| Medical supplies | > 0.60 | ≤ 0.20 |
**Initial conditions:** Infection 20%, health index 0.55, GDP $480B, medical supplies 0.50
**Deterministic grader** (`PandemicManagementGrader`):
```
score = 0.40 × infection_score
      + 0.30 × health_score
      + 0.20 × gdp_score
      + 0.10 × supplies_score

infection_score = linear_inv(infection_rate, ideal=2%, fail=30%)
                  × 0.50 if epidemic OOC (≥40%)
health_score    = linear(health_index, fail=0.30, ideal=0.80)
gdp_score       = linear(gdp, fail=$200B, ideal=$480B)
supplies_score  = linear(medical_supplies, fail=0.20, ideal=0.80)

No random calls. Always deterministic.
```
**Core tension:** Lockdown raises `infection_score` but lowers `gdp_score`, so
the agent must find the best tradeoff trajectory between the two.
**Success threshold:** score ≥ 0.75
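
To make the infection-versus-GDP tradeoff concrete, the sketch below scores the Task 2 starting state and one hypothetical post-lockdown state. It assumes clamped linear interpolation for the scoring helpers (the document only guarantees outputs in [0.0, 1.0]); `pandemic_score` is an illustrative name, not `PandemicManagementGrader` itself, and the post-lockdown numbers are invented.

```python
def _unit(x: float) -> float:
    """Clamp to [0, 1]; the clamping behaviour is an assumption."""
    return min(1.0, max(0.0, x))


def pandemic_score(
    infection_rate: float, health_index: float, gdp: float, medical_supplies: float
) -> float:
    """Weighted sum from the Task 2 formula; GDP is in $B."""
    infection_score = _unit((0.30 - infection_rate) / (0.30 - 0.02))
    if infection_rate >= 0.40:  # epidemic out-of-control multiplier
        infection_score *= 0.50
    health_score = _unit((health_index - 0.30) / (0.80 - 0.30))
    gdp_score = _unit((gdp - 200.0) / (480.0 - 200.0))
    supplies_score = _unit((medical_supplies - 0.20) / (0.80 - 0.20))
    return (
        0.40 * infection_score
        + 0.30 * health_score
        + 0.20 * gdp_score
        + 0.10 * supplies_score
    )


# Initial conditions as given for Task 2.
start = pandemic_score(0.20, 0.55, 480.0, 0.50)

# Hypothetical outcome of a lockdown: infection falls sharply, GDP drops by $100B.
after_lockdown = pandemic_score(0.05, 0.65, 380.0, 0.60)

print(f"start={start:.3f} after_lockdown={after_lockdown:.3f}")
```

In this made-up scenario the infection gain outweighs the GDP loss because infection carries twice the weight of GDP; a deeper GDP drop for the same infection improvement would tip the balance the other way.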
---
### Task 3: Social Stability Crisis `[HARD]`
**Objective:** Restore social order from a compound multi-domain crisis with
cascading failure risk.
| Criterion | Target | Failure |
|---|---|---|
| Public satisfaction | > 50% | ≤ 15% |
| Crime rate | < 12% | ≥ 35% |
| Employment rate | > 80% | ≤ 55% |
| Wealth inequality (Gini) | < 0.40 | ≥ 0.70 |
**Initial conditions:** Employment 68%, crime 25%, satisfaction 30%, Gini 0.55, social unrest 0.45
**Deterministic grader** (`SocialCrisisGrader`):
```
score = 0.30 × satisfaction_score
      + 0.25 × crime_score
      + 0.25 × employment_score
      + 0.20 × inequality_score
      × 0.60 if social_unrest > 0.65 (cascade penalty)

satisfaction_score = linear(public_satisfaction, fail=0.15, ideal=0.70)
crime_score        = linear_inv(crime_rate, ideal=5%, fail=35%)
                     × 0.50 if crime_rate ≥ 40%
employment_score   = linear(employment_rate, fail=55%, ideal=88%)
inequality_score   = linear_inv(gini, ideal=0.20, fail=0.70)

No random calls. Always deterministic.
```
**Why it's hard:**
- Gini is structural: it requires sustained tax redistribution over many turns
- Social unrest cascade multiplier punishes instability even when individual
metrics improve
- No single dominant strategy; agents must balance all four dimensions
simultaneously
**Success threshold:** score ≥ 0.75
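
The cascade penalty is easiest to see numerically. The sketch below makes two assumptions that the document does not confirm: the scoring helpers are clamped linear interpolations, and the 0.60 multiplier applies to the whole weighted sum rather than to the inequality term alone. `social_crisis_score` is a hypothetical name, not the `SocialCrisisGrader` implementation.

```python
def _unit(x: float) -> float:
    """Clamp to [0, 1]; the clamping behaviour is an assumption."""
    return min(1.0, max(0.0, x))


def social_crisis_score(
    public_satisfaction: float,
    crime_rate: float,
    employment_rate: float,
    gini: float,
    social_unrest: float,
) -> float:
    """Weighted sum from the Task 3 formula, with the assumed whole-score cascade."""
    satisfaction_score = _unit((public_satisfaction - 0.15) / (0.70 - 0.15))
    crime_score = _unit((0.35 - crime_rate) / (0.35 - 0.05))
    if crime_rate >= 0.40:
        crime_score *= 0.50
    employment_score = _unit((employment_rate - 0.55) / (0.88 - 0.55))
    inequality_score = _unit((0.70 - gini) / (0.70 - 0.20))
    score = (
        0.30 * satisfaction_score
        + 0.25 * crime_score
        + 0.25 * employment_score
        + 0.20 * inequality_score
    )
    if social_unrest > 0.65:  # cascade penalty (assumed to scale the total)
        score *= 0.60
    return score


# Task 3 initial conditions (unrest 0.45 is below the cascade threshold).
calm = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.45)

# Same indicators, but with unrest pushed past 0.65 (hypothetical).
cascading = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.70)

print(f"calm={calm:.3f} cascading={cascading:.3f}")
```

Under this reading, an agent that lets unrest cross 0.65 needs a pre-penalty score of at least 1.25 to reach 0.75, which is impossible, so keeping unrest below the threshold is effectively a hard constraint.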
---
## Grader API
```python
from civicai.graders import grade, GradeResult
result: GradeResult = grade(state, task_id="stabilize_economy")
print(result.score) # float ∈ [0.0, 1.0]
print(result.success) # bool: True if score ≥ 0.75
print(result.summary) # human-readable verdict
print(result.to_dict()) # full component breakdown (JSON-serializable)
```
Every `env.step()` call returns this grade in `info["task_grade"]`:
```python
obs, reward, done, info = env.step(action)
grade_result = info["task_grade"] # dict: {score, success, components, ...}
```
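
An episode loop that tracks the per-step grade might look like the sketch below. Only the `env.step()` return shape and the `info["task_grade"]` key come from this document; `env.reset()`, the `run_episode` helper, and the `StubEnv` stand-in are assumptions made so the example runs without the real environment. The 50-turn cap follows the episode length mentioned in this document.

```python
from typing import Any, Callable, Dict


def run_episode(
    env: Any, policy: Callable[[Any], Any], max_turns: int = 50
) -> Dict[str, Any]:
    """Run one episode and return the last task grade dict seen."""
    obs = env.reset()  # assumption: the environment exposes reset()
    last_grade: Dict[str, Any] = {}
    for _ in range(max_turns):
        obs, reward, done, info = env.step(policy(obs))
        last_grade = info["task_grade"]
        if done:
            break
    return last_grade


class StubEnv:
    """Toy stand-in for the real environment, used only to exercise the loop."""

    def __init__(self) -> None:
        self.turn = 0

    def reset(self) -> Dict[str, int]:
        self.turn = 0
        return {"turn": self.turn}

    def step(self, action: Any):
        self.turn += 1
        score = min(1.0, 0.1 * self.turn)  # fake score that rises each turn
        grade = {"score": score, "success": score >= 0.75}
        done = self.turn >= 10
        return {"turn": self.turn}, 0.0, done, {"task_grade": grade}


# The policy here ignores the observation; a real agent would not.
final_grade = run_episode(StubEnv(), policy=lambda obs: None)
print(final_grade)
```

Swapping `StubEnv()` for the real environment object should leave `run_episode` unchanged as long as that environment follows the same `reset()` / `step()` shape.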
---
## Why This Is Non-Trivial
| Challenge | Description |
|---|---|
| **Multi-objective** | 5 rubric dimensions + task-specific grader; no single scalar fully captures the objective |
| **Long-horizon** | 50-turn episodes; many actions have a lag of 5–10 turns before effects appear |
| **Non-linear dynamics** | Social unrest cascade, hyperinflation multiplier, epidemic OOC penalty |
| **Structural vs. tactical** | Gini responds slowly to redistribution; crime responds quickly to policing |
| **Real-world data** | GDP growth, inflation, unemployment, life expectancy anchored to World Bank baseline |
| **Emergent behaviour** | Wealth inequality → unrest → protest → GDP drag (3-step causal chain) |