CivicAI: Real-World Problem Statement
Problem Definition
AI-driven societal policy optimization under uncertainty
Modern governments face a combinatorial decision-making problem: thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies, emergency responses) interact through complex causal chains to produce emergent societal outcomes across economic, public-health, and social cohesion dimensions, often with weeks-to-years of lag and high uncertainty.
No human decision-maker can simultaneously optimise all dimensions. AI agents trained in CivicAI learn to:
- Observe rich societal state (12+ indicators)
- Act across a continuous multi-dimensional policy space
- Receive delayed, multi-objective feedback
- Adapt to unexpected shocks (pandemics, market crashes, social unrest)
Real-World Domain Mapping
| CivicAI dimension | Real-world counterpart | Real data anchor |
|---|---|---|
| gdp, gdp_growth, inflation | Macroeconomic fiscal policy | World Bank GDP / IMF inflation data |
| employment_rate | Labour market policy | ILO unemployment statistics |
| tax_rate, budget_balance | Government revenue & deficit | OECD fiscal balance data |
| health_index, infection_rate | Public-health capacity & epidemics | WHO health expenditure / GHI |
| crime_rate | Rule-of-law & public safety | UNODC crime indices |
| public_satisfaction | Democratic legitimacy / approval | Edelman Trust Barometer |
| emergent.wealth_inequality | Distributional equity | Gini coefficient (World Bank) |
| emergent.social_unrest | Political stability | World Governance Indicators |
| food_reserves, energy_reserves | Strategic resource security | FAO / IEA stockpile data |
| education_quality | Human capital investment | UNESCO / PISA |
Domain 1: Governance (Fiscal Policy)
Real-world problem: Governments must set tax rates that raise revenue without suppressing growth, and allocate budgets across competing public goods (healthcare vs. education vs. security) while maintaining fiscal sustainability.
CivicAI mapping:
- Action: tax_rate ∈ [0, 1], healthcare_budget, education_budget, police_budget
- State: gdp, inflation, employment_rate, budget_balance
- Challenge: High taxes → GDP drag; low taxes → deficit spiral
Domain 2: Economy (Macroeconomic Stabilisation)
Real-world problem: Recessions require countercyclical stimulus, but overspending triggers inflation. Optimal fiscal multipliers depend on the current economic regime.
CivicAI mapping:
- Action: subsidy_policy ∈ {none, agriculture, industry, technology}
- State: gdp_growth, inflation, employment_rate
- Challenge: Technology subsidies boost long-run growth but worsen near-term inequality; agriculture subsidies improve food security but reduce GDP growth
Domain 3: Public Health (Epidemic Management)
Real-world problem: Pandemics create tradeoffs between infection suppression (via lockdowns) and economic activity. Optimal policies depend on medical supply capacity, infection dynamics, and public compliance.
CivicAI mapping:
- Action: healthcare_budget, emergency_response (lockdown / stimulus / open)
- State: infection_rate, health_index, medical_supplies, gdp
- Challenge: Lockdown reduces infection but crushes GDP; premature opening causes epidemic rebound
Domain 4: Social Cohesion (Crisis Management)
Real-world problem: Compound crises (unemployment + crime + inequality + unrest) exhibit non-linear cascade dynamics: once social unrest exceeds a threshold, even good economic data fails to restore stability.
CivicAI mapping:
- Action: All levers simultaneously; no single dominant strategy
- State: public_satisfaction, crime_rate, emergent.wealth_inequality, emergent.social_unrest
- Challenge: Inequality is a slow-moving structural variable; quick fixes (police budget) address symptoms, not causes
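The levers named across the four domain mappings can be gathered into one validated structure. The sketch below is illustrative only: the class name `PolicyAction`, the string spellings of the categorical values, and the non-negative budget check are assumptions, not CivicAI's actual action schema.

```python
from dataclasses import dataclass

# Categorical options as listed in the domain mappings; the exact string
# spellings CivicAI expects are an assumption.
SUBSIDY_POLICIES = ("none", "agriculture", "industry", "technology")
EMERGENCY_RESPONSES = ("lockdown", "stimulus", "open")


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical container for one turn's policy levers."""

    tax_rate: float           # documented range: [0, 1]
    healthcare_budget: float  # units are not specified in this document
    education_budget: float
    police_budget: float
    subsidy_policy: str
    emergency_response: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.tax_rate <= 1.0:
            raise ValueError(f"tax_rate must be in [0, 1], got {self.tax_rate}")
        for name in ("healthcare_budget", "education_budget", "police_budget"):
            if getattr(self, name) < 0:  # assumed constraint
                raise ValueError(f"{name} must be non-negative")
        if self.subsidy_policy not in SUBSIDY_POLICIES:
            raise ValueError(f"unknown subsidy_policy: {self.subsidy_policy!r}")
        if self.emergency_response not in EMERGENCY_RESPONSES:
            raise ValueError(
                f"unknown emergency_response: {self.emergency_response!r}"
            )


# Example values are arbitrary and chosen only to pass validation.
action = PolicyAction(
    tax_rate=0.30,
    healthcare_budget=0.25,
    education_budget=0.20,
    police_budget=0.10,
    subsidy_policy="technology",
    emergency_response="open",
)
```

Validating at construction time surfaces an out-of-range lever before it reaches the environment, which matters when a learned policy emits continuous values.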
Tasks
Task 1: Economic Stability [EASY]
Objective: Restore an economy in mild recession to fiscal stability.
| Criterion | Target | Failure |
|---|---|---|
| Inflation | < 6% | ≥ 15% |
| Employment | > 85% | ≤ 65% |
| GDP | > $400B | ≤ $250B |
| Budget Balance | Surplus preferred | ≤ -30% deficit |
Initial conditions: GDP $450B, inflation 7%, employment 82%, satisfaction 55%
Deterministic grader (EconomicStabilityGrader):
score = 0.40 × inflation_score
      + 0.40 × employment_score
      + 0.10 × gdp_score
      + 0.10 × budget_score
inflation_score = linear_inv(inflation, ideal=3%, fail=15%)
                  × 0.40 if hyperinflation (> 20%)
employment_score = linear(employment_rate, fail=65%, ideal=90%)
gdp_score = linear(gdp, fail=$250B, ideal=$500B)
budget_score = linear(budget_balance, fail=-30%, ideal=0%)
All linear() / linear_inv() produce values in [0.0, 1.0].
No random calls. Always deterministic.
Success threshold: score ≥ 0.75
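The Task 1 formula can be sketched in a few lines of Python. This is a reading of the formula above, not the EconomicStabilityGrader source: it assumes `linear` and `linear_inv` interpolate between `fail` and `ideal` and clamp to [0.0, 1.0], that rates are expressed as fractions, and that GDP is in billions of dollars. The budget balance of 0.0 in the example is also an assumption, since the initial conditions do not list one.

```python
def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def linear(x: float, fail: float, ideal: float) -> float:
    """Higher is better: 0.0 at `fail`, 1.0 at `ideal`, clamped."""
    return _clamp01((x - fail) / (ideal - fail))


def linear_inv(x: float, ideal: float, fail: float) -> float:
    """Lower is better: 1.0 at `ideal`, 0.0 at `fail`, clamped."""
    return _clamp01((fail - x) / (fail - ideal))


def economic_stability_score(
    inflation: float, employment_rate: float, gdp: float, budget_balance: float
) -> float:
    inflation_score = linear_inv(inflation, ideal=0.03, fail=0.15)
    if inflation > 0.20:
        # Applied to the component, as the formula is laid out. Under clamping
        # this component is already 0.0 beyond the 15% fail point, so the real
        # grader may apply the multiplier elsewhere (for example to the total).
        inflation_score *= 0.40
    employment_score = linear(employment_rate, fail=0.65, ideal=0.90)
    gdp_score = linear(gdp, fail=250.0, ideal=500.0)
    budget_score = linear(budget_balance, fail=-0.30, ideal=0.0)
    return (
        0.40 * inflation_score
        + 0.40 * employment_score
        + 0.10 * gdp_score
        + 0.10 * budget_score
    )


# Stated initial conditions, plus an assumed balanced budget.
initial = economic_stability_score(
    inflation=0.07, employment_rate=0.82, gdp=450.0, budget_balance=0.0
)
print(f"initial score: {initial:.3f}")
```

Under these assumptions the starting state scores roughly 0.72, just short of the 0.75 threshold, so the agent has to improve inflation or employment (the two 0.40-weighted terms) rather than merely hold the status quo.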
Task 2: Pandemic Management [MEDIUM]
Objective: Suppress an epidemic that starts at a 20% infection rate without destroying the economy.
| Criterion | Target | Failure |
|---|---|---|
| Infection rate | < 10% | ≥ 30% |
| Health index | > 0.60 | ≤ 0.30 |
| GDP | > $300B | ≤ $200B |
| Medical supplies | > 0.60 | ≤ 0.20 |
Initial conditions: Infection 20%, health index 0.55, GDP $480B, medical supplies 0.50
Deterministic grader (PandemicManagementGrader):
score = 0.40 × infection_score
      + 0.30 × health_score
      + 0.20 × gdp_score
      + 0.10 × supplies_score
infection_score = linear_inv(infection_rate, ideal=2%, fail=30%)
                  × 0.50 if the epidemic is out of control (OOC, ≥ 40%)
health_score = linear(health_index, fail=0.30, ideal=0.80)
gdp_score = linear(gdp, fail=$200B, ideal=$480B)
supplies_score = linear(medical_supplies, fail=0.20, ideal=0.80)
No random calls. Always deterministic.
Core tension: Lockdown raises infection_score but lowers gdp_score, so the agent must find the best tradeoff trajectory between the two.
Success threshold: score ≥ 0.75
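To make the lockdown tradeoff concrete, the Task 2 formula can be evaluated on a few end states. The sketch assumes clamped linear interpolation for `linear` and `linear_inv`, fractions for rates, and GDP in billions; the two comparison states are hypothetical illustrations, not simulator output.

```python
def linear(x: float, fail: float, ideal: float) -> float:
    """0.0 at `fail`, 1.0 at `ideal`, clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, (x - fail) / (ideal - fail)))


def linear_inv(x: float, ideal: float, fail: float) -> float:
    """1.0 at `ideal`, 0.0 at `fail`, clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, (fail - x) / (fail - ideal)))


def pandemic_score(
    infection_rate: float, health_index: float, gdp: float, medical_supplies: float
) -> float:
    infection_score = linear_inv(infection_rate, ideal=0.02, fail=0.30)
    if infection_rate >= 0.40:
        # Out-of-control multiplier, placed on the component as written. The
        # component is already 0.0 past the 30% fail point under clamping.
        infection_score *= 0.50
    return (
        0.40 * infection_score
        + 0.30 * linear(health_index, fail=0.30, ideal=0.80)
        + 0.20 * linear(gdp, fail=200.0, ideal=480.0)
        + 0.10 * linear(medical_supplies, fail=0.20, ideal=0.80)
    )


# Initial conditions as stated for Task 2.
start = pandemic_score(0.20, 0.55, 480.0, 0.50)

# Hypothetical end states, invented for illustration only.
strict_lockdown = pandemic_score(0.04, 0.70, 330.0, 0.70)
early_reopening = pandemic_score(0.25, 0.50, 470.0, 0.40)

for label, value in [
    ("start", start),
    ("strict lockdown", strict_lockdown),
    ("early reopening", early_reopening),
]:
    print(f"{label:16s} {value:.3f}")
```

Because infection and health together carry 0.70 of the weight against 0.20 for GDP, a state that gives up a sizeable share of GDP to suppress infection can still outscore one that protects GDP while the epidemic persists.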
Task 3: Social Stability Crisis [HARD]
Objective: Restore social order from a compound multi-domain crisis with cascading failure risk.
| Criterion | Target | Failure |
|---|---|---|
| Public satisfaction | > 50% | ≤ 15% |
| Crime rate | < 12% | ≥ 35% |
| Employment rate | > 80% | ≤ 55% |
| Wealth inequality (Gini) | < 0.40 | ≥ 0.70 |
Initial conditions: Employment 68%, crime 25%, satisfaction 30%, Gini 0.55, social unrest 0.45
Deterministic grader (SocialCrisisGrader):
score = 0.30 × satisfaction_score
      + 0.25 × crime_score
      + 0.25 × employment_score
      + 0.20 × inequality_score
      × 0.60 if social_unrest > 0.65 (cascade penalty)
satisfaction_score = linear(public_satisfaction, fail=0.15, ideal=0.70)
crime_score = linear_inv(crime_rate, ideal=5%, fail=35%)
              × 0.50 if crime_rate ≥ 40%
employment_score = linear(employment_rate, fail=55%, ideal=88%)
inequality_score = linear_inv(gini, ideal=0.20, fail=0.70)
No random calls. Always deterministic.
Why it's hard:
- Gini is structural: it requires sustained tax redistribution over many turns
- Social unrest cascade multiplier punishes instability even when individual metrics improve
- No single dominant strategy; agents must balance all four dimensions simultaneously
Success threshold: score ≥ 0.75
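The effect of the cascade penalty is easiest to see by scoring the same metrics at two unrest levels. The sketch below assumes the 0.60 multiplier applies to the whole weighted sum (the formula layout leaves this ambiguous), that `linear` and `linear_inv` clamp to [0.0, 1.0], and that all rates are fractions. The 0.70 unrest value is a hypothetical chosen to cross the 0.65 threshold.

```python
def linear(x: float, fail: float, ideal: float) -> float:
    """0.0 at `fail`, 1.0 at `ideal`, clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, (x - fail) / (ideal - fail)))


def linear_inv(x: float, ideal: float, fail: float) -> float:
    """1.0 at `ideal`, 0.0 at `fail`, clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, (fail - x) / (fail - ideal)))


def social_crisis_score(
    public_satisfaction: float,
    crime_rate: float,
    employment_rate: float,
    gini: float,
    social_unrest: float,
) -> float:
    crime_score = linear_inv(crime_rate, ideal=0.05, fail=0.35)
    if crime_rate >= 0.40:
        crime_score *= 0.50  # as written; already 0.0 here under clamping
    score = (
        0.30 * linear(public_satisfaction, fail=0.15, ideal=0.70)
        + 0.25 * crime_score
        + 0.25 * linear(employment_rate, fail=0.55, ideal=0.88)
        + 0.20 * linear_inv(gini, ideal=0.20, fail=0.70)
    )
    if social_unrest > 0.65:
        score *= 0.60  # cascade penalty, assumed to scale the total
    return score


# Stated initial conditions (unrest 0.45, below the cascade threshold).
calm = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.45)
# Same metrics with a hypothetical unrest level above the threshold.
cascade = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.70)

print(f"unrest 0.45: {calm:.3f}")
print(f"unrest 0.70: {cascade:.3f}")
```

Under this reading the starting state scores about 0.32, and once unrest passes 0.65 the best attainable score is 0.60, which puts the 0.75 success threshold out of reach until unrest falls again.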
Grader API
from civicai.graders import grade, GradeResult
result: GradeResult = grade(state, task_id="stabilize_economy")
print(result.score) # float in [0.0, 1.0]
print(result.success) # bool: True if score ≥ 0.75
print(result.summary) # human-readable verdict
print(result.to_dict()) # full component breakdown (JSON-serializable)
Every env.step() call returns this grade in info["task_grade"]:
obs, reward, done, info = env.step(action)
grade_result = info["task_grade"] # dict: {score, success, components, ...}
Why This Is Non-Trivial
| Challenge | Description |
|---|---|
| Multi-objective | 5 rubric dimensions + task-specific grader; no single scalar fully captures the objective |
| Long-horizon | 50-turn episodes; many actions have a 5–10 turn lag before effects appear |
| Non-linear dynamics | Social unrest cascade, hyperinflation multiplier, epidemic OOC penalty |
| Structural vs. tactical | Gini responds slowly to redistribution; crime responds quickly to policing |
| Real-world data | GDP growth, inflation, unemployment, life expectancy anchored to World Bank baseline |
| Emergent behaviour | Wealth inequality → unrest → protest → GDP drag (3-step causal chain) |