CivicAI: Real-World Problem Statement

Problem Definition

AI-driven societal policy optimization under uncertainty

Modern governments face a combinatorial decision-making problem: thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies, emergency responses) interact through complex causal chains to produce emergent societal outcomes across economic, public-health, and social cohesion dimensions, often with weeks-to-years of lag and high uncertainty.

No human decision-maker can simultaneously optimise all dimensions. AI agents trained in CivicAI learn to:

  1. Observe rich societal state (12+ indicators)
  2. Act across a continuous multi-dimensional policy space
  3. Receive delayed, multi-objective feedback
  4. Adapt to unexpected shocks (pandemics, market crashes, social unrest)

Real-World Domain Mapping

| CivicAI dimension | Real-world counterpart | Real data anchor |
|---|---|---|
| gdp, gdp_growth, inflation | Macroeconomic fiscal policy | World Bank GDP / IMF inflation data |
| employment_rate | Labour market policy | ILO unemployment statistics |
| tax_rate, budget_balance | Government revenue & deficit | OECD fiscal balance data |
| health_index, infection_rate | Public-health capacity & epidemics | WHO health expenditure / GHI |
| crime_rate | Rule-of-law & public safety | UNODC crime indices |
| public_satisfaction | Democratic legitimacy / approval | Edelman Trust Barometer |
| emergent.wealth_inequality | Distributional equity | Gini coefficient (World Bank) |
| emergent.social_unrest | Political stability | World Governance Indicators |
| food_reserves, energy_reserves | Strategic resource security | FAO / IEA stockpile data |
| education_quality | Human capital investment | UNESCO / PISA |

Domain 1: Governance (Fiscal Policy)

Real-world problem: Governments must set tax rates that raise revenue without suppressing growth, and allocate budgets across competing public goods (healthcare vs. education vs. security) while maintaining fiscal sustainability.

CivicAI mapping:

  • Action: tax_rate ∈ [0, 1], healthcare_budget, education_budget, police_budget
  • State: gdp, inflation, employment_rate, budget_balance
  • Challenge: High taxes → GDP drag; low taxes → deficit spiral

Domain 2: Economy (Macroeconomic Stabilisation)

Real-world problem: Recessions require countercyclical stimulus, but overspending triggers inflation. Optimal fiscal multipliers depend on the current economic regime.

CivicAI mapping:

  • Action: subsidy_policy ∈ {none, agriculture, industry, technology}
  • State: gdp_growth, inflation, employment_rate
  • Challenge: Technology subsidies boost long-run growth but worsen near-term inequality; agriculture subsidies improve food security but reduce GDP growth

Domain 3: Public Health (Epidemic Management)

Real-world problem: Pandemics create tradeoffs between infection suppression (via lockdowns) and economic activity. Optimal policies depend on medical supply capacity, infection dynamics, and public compliance.

CivicAI mapping:

  • Action: healthcare_budget, emergency_response (lockdown / stimulus / open)
  • State: infection_rate, health_index, medical_supplies, gdp
  • Challenge: Lockdown reduces infection but crushes GDP; premature opening causes epidemic rebound

Domain 4: Social Cohesion (Crisis Management)

Real-world problem: Compound crises (unemployment + crime + inequality + unrest) exhibit non-linear cascade dynamics: once social unrest exceeds a threshold, even good economic data fails to restore stability.

CivicAI mapping:

  • Action: All levers simultaneously; no single dominant strategy
  • State: public_satisfaction, crime_rate, emergent.wealth_inequality, emergent.social_unrest
  • Challenge: Inequality is a slow-moving structural variable; quick fixes (police budget) address symptoms, not causes
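The action fields named in the four domain mappings above can be gathered into a single structure. The following is a minimal sketch, not the CivicAI action type: the class names `PolicyAction`, `SubsidyPolicy` and `EmergencyResponse`, the default values, and the non-negative budget check are all assumptions made for illustration. Only the field names, the `tax_rate` range, and the two sets of categorical options come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class SubsidyPolicy(Enum):
    # Options listed under Domain 2.
    NONE = "none"
    AGRICULTURE = "agriculture"
    INDUSTRY = "industry"
    TECHNOLOGY = "technology"


class EmergencyResponse(Enum):
    # Options listed under Domain 3.
    LOCKDOWN = "lockdown"
    STIMULUS = "stimulus"
    OPEN = "open"


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical container for one turn's policy levers."""

    tax_rate: float
    healthcare_budget: float
    education_budget: float
    police_budget: float
    # Defaults are assumptions; the source does not state any.
    subsidy_policy: SubsidyPolicy = SubsidyPolicy.NONE
    emergency_response: EmergencyResponse = EmergencyResponse.OPEN

    def __post_init__(self) -> None:
        # tax_rate is documented as lying in [0, 1].
        if not 0.0 <= self.tax_rate <= 1.0:
            raise ValueError("tax_rate must lie in [0, 1]")
        # Assumption: budgets cannot be negative (their range is not documented).
        for name in ("healthcare_budget", "education_budget", "police_budget"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be non-negative")


action = PolicyAction(
    tax_rate=0.30,
    healthcare_budget=0.25,
    education_budget=0.20,
    police_budget=0.10,
    subsidy_policy=SubsidyPolicy.TECHNOLOGY,
)
```

Validating at construction time keeps out-of-range actions from reaching the simulator, which matters when a learned policy emits unclipped continuous values.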

Tasks

Task 1: Economic Stability [EASY]

Objective: Restore a mild recession economy to fiscal stability.

| Criterion | Target | Failure |
|---|---|---|
| Inflation | < 6% | ≥ 15% |
| Employment | > 85% | ≤ 65% |
| GDP | > $400B | ≤ $250B |
| Budget Balance | Surplus preferred | ≤ −30% deficit |

Initial conditions: GDP $450B, inflation 7%, employment 82%, satisfaction 55%

Deterministic grader (EconomicStabilityGrader):

score = 0.40 × inflation_score
      + 0.40 × employment_score
      + 0.10 × gdp_score
      + 0.10 × budget_score

inflation_score  = linear_inv(inflation, ideal=3%, fail=15%)
                   × 0.40 if hyperinflation (>20%)
employment_score = linear(employment_rate, fail=65%, ideal=90%)
gdp_score        = linear(gdp, fail=$250B, ideal=$500B)
budget_score     = linear(budget_balance, fail=−30%, ideal=0%)

All linear() / linear_inv() produce values in [0.0, 1.0].
No random calls. Always deterministic.

Success threshold: score ≥ 0.75
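The Task 1 formula can be worked through by hand. The sketch below is not the `EconomicStabilityGrader` source: it assumes `linear()` and `linear_inv()` are clamped linear interpolations between the `fail` and `ideal` anchors, that percentages are passed as fractions, and that GDP is in billions of dollars. The `budget_balance` value of −0.05 is also an assumption, since the initial conditions do not list one.

```python
def clamp01(x: float) -> float:
    """Restrict x to the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def linear(value: float, fail: float, ideal: float) -> float:
    """Higher is better: 0.0 at `fail`, 1.0 at `ideal` (assumed shape)."""
    return clamp01((value - fail) / (ideal - fail))


def linear_inv(value: float, ideal: float, fail: float) -> float:
    """Lower is better: 1.0 at `ideal`, 0.0 at `fail` (assumed shape)."""
    return clamp01((fail - value) / (fail - ideal))


def economic_stability_score(
    inflation: float, employment_rate: float, gdp: float, budget_balance: float
) -> float:
    inflation_score = linear_inv(inflation, ideal=0.03, fail=0.15)
    if inflation > 0.20:
        # Hyperinflation factor from the formula. Under this clamped reading
        # the base score is already 0.0 above 15%, so the factor only has an
        # effect if the real helper is shaped differently.
        inflation_score *= 0.40
    employment_score = linear(employment_rate, fail=0.65, ideal=0.90)
    gdp_score = linear(gdp, fail=250.0, ideal=500.0)
    budget_score = linear(budget_balance, fail=-0.30, ideal=0.0)
    return (
        0.40 * inflation_score
        + 0.40 * employment_score
        + 0.10 * gdp_score
        + 0.10 * budget_score
    )


# Task 1 initial conditions, plus the assumed budget balance of -5%.
initial_score = economic_stability_score(
    inflation=0.07, employment_rate=0.82, gdp=450.0, budget_balance=-0.05
)
passes = initial_score >= 0.75
```

Under these assumptions the starting state scores about 0.70, just short of the 0.75 threshold, so the agent has to improve inflation or employment (each weighted 0.40) to succeed.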


Task 2: Pandemic Management [MEDIUM]

Objective: Suppress a 20% infection-rate epidemic without destroying the economy.

| Criterion | Target | Failure |
|---|---|---|
| Infection rate | < 10% | ≥ 30% |
| Health index | > 0.60 | ≤ 0.30 |
| GDP | > $300B | ≤ $200B |
| Medical supplies | > 0.60 | ≤ 0.20 |

Initial conditions: Infection 20%, health index 0.55, GDP $480B, medical supplies 0.50

Deterministic grader (PandemicManagementGrader):

score = 0.40 × infection_score
      + 0.30 × health_score
      + 0.20 × gdp_score
      + 0.10 × supplies_score

infection_score = linear_inv(infection_rate, ideal=2%, fail=30%)
                  × 0.50 if the epidemic is out of control (OOC, ≥40%)
health_score    = linear(health_index, fail=0.30, ideal=0.80)
gdp_score       = linear(gdp, fail=$200B, ideal=$480B)
supplies_score  = linear(medical_supplies, fail=0.20, ideal=0.80)

No random calls. Always deterministic.

Core tension: Lockdown ↑ infection_score but ↓ gdp_score, so the agent must find the optimal tradeoff trajectory.

Success threshold: score ≥ 0.75
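The lockdown tradeoff shows up directly in the Task 2 arithmetic. This sketch makes the same assumptions as a hand calculation would: clamped linear helpers, rates as fractions, GDP in billions of dollars. The function name `pandemic_score` and the `after_lockdown` state are hypothetical. Only the initial conditions and the weights are taken from the text.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def linear(value: float, fail: float, ideal: float) -> float:
    """Higher is better: 0.0 at `fail`, 1.0 at `ideal` (assumed shape)."""
    return clamp01((value - fail) / (ideal - fail))


def linear_inv(value: float, ideal: float, fail: float) -> float:
    """Lower is better: 1.0 at `ideal`, 0.0 at `fail` (assumed shape)."""
    return clamp01((fail - value) / (fail - ideal))


def pandemic_score(
    infection_rate: float, health_index: float, gdp: float, medical_supplies: float
) -> float:
    infection_score = linear_inv(infection_rate, ideal=0.02, fail=0.30)
    if infection_rate >= 0.40:
        # Out-of-control factor from the formula above.
        infection_score *= 0.50
    return (
        0.40 * infection_score
        + 0.30 * linear(health_index, fail=0.30, ideal=0.80)
        + 0.20 * linear(gdp, fail=200.0, ideal=480.0)
        + 0.10 * linear(medical_supplies, fail=0.20, ideal=0.80)
    )


# Documented starting point for Task 2.
initial = pandemic_score(
    infection_rate=0.20, health_index=0.55, gdp=480.0, medical_supplies=0.50
)

# Hypothetical state after a lockdown: infection falls, but so does GDP.
after_lockdown = pandemic_score(
    infection_rate=0.05, health_index=0.60, gdp=380.0, medical_supplies=0.60
)
```

With these inputs the initial state scores roughly 0.54. The hypothetical post-lockdown state scores higher despite losing $100B of GDP, because infection carries twice the weight of GDP, yet it still falls short of 0.75 without further recovery.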


Task 3: Social Stability Crisis [HARD]

Objective: Restore social order from a compound multi-domain crisis with cascading failure risk.

| Criterion | Target | Failure |
|---|---|---|
| Public satisfaction | > 50% | ≤ 15% |
| Crime rate | < 12% | ≥ 35% |
| Employment rate | > 80% | ≤ 55% |
| Wealth inequality (Gini) | < 0.40 | ≥ 0.70 |

Initial conditions: Employment 68%, crime 25%, satisfaction 30%, Gini 0.55, social unrest 0.45

Deterministic grader (SocialCrisisGrader):

score = 0.30 × satisfaction_score
      + 0.25 × crime_score
      + 0.25 × employment_score
      + 0.20 × inequality_score
      × 0.60 if social_unrest > 0.65 (cascade penalty)

satisfaction_score  = linear(public_satisfaction, fail=0.15, ideal=0.70)
crime_score         = linear_inv(crime_rate, ideal=5%, fail=35%)
                      × 0.50 if crime_rate ≥ 40%
employment_score    = linear(employment_rate, fail=55%, ideal=88%)
inequality_score    = linear_inv(gini, ideal=0.20, fail=0.70)

No random calls. Always deterministic.

Why it's hard:

  • Gini is structural: it requires sustained tax redistribution over many turns
  • Social unrest cascade multiplier punishes instability even when individual metrics improve
  • No single dominant strategy; agents must balance all four dimensions simultaneously

Success threshold: score ≥ 0.75
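To see how the cascade penalty interacts with the component scores, the Task 3 formula can be sketched as follows. Two assumptions are made here and should be checked against the actual `SocialCrisisGrader`: the helpers are clamped linear interpolations, and the × 0.60 cascade factor multiplies the whole weighted sum rather than only the last term. The function name `social_crisis_score` is hypothetical.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def linear(value: float, fail: float, ideal: float) -> float:
    """Higher is better: 0.0 at `fail`, 1.0 at `ideal` (assumed shape)."""
    return clamp01((value - fail) / (ideal - fail))


def linear_inv(value: float, ideal: float, fail: float) -> float:
    """Lower is better: 1.0 at `ideal`, 0.0 at `fail` (assumed shape)."""
    return clamp01((fail - value) / (fail - ideal))


def social_crisis_score(
    public_satisfaction: float,
    crime_rate: float,
    employment_rate: float,
    gini: float,
    social_unrest: float,
) -> float:
    crime_score = linear_inv(crime_rate, ideal=0.05, fail=0.35)
    if crime_rate >= 0.40:
        crime_score *= 0.50
    score = (
        0.30 * linear(public_satisfaction, fail=0.15, ideal=0.70)
        + 0.25 * crime_score
        + 0.25 * linear(employment_rate, fail=0.55, ideal=0.88)
        + 0.20 * linear_inv(gini, ideal=0.20, fail=0.70)
    )
    if social_unrest > 0.65:
        # Assumption: the cascade penalty scales the total, not one component.
        score *= 0.60
    return score


# Documented initial conditions (unrest 0.45 is below the cascade threshold).
calm = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.45)

# Same indicators, but with unrest pushed past 0.65.
cascading = social_crisis_score(0.30, 0.25, 0.68, 0.55, social_unrest=0.70)
```

Under these assumptions the starting state scores about 0.32, far from the 0.75 threshold, and the same indicators with unrest above 0.65 lose a further 40% of that score.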


Grader API

from civicai.graders import grade, GradeResult

result: GradeResult = grade(state, task_id="stabilize_economy")

print(result.score)        # float ∈ [0.0, 1.0]
print(result.success)      # bool: True if score ≥ 0.75
print(result.summary)      # human-readable verdict
print(result.to_dict())    # full component breakdown (JSON-serializable)

Every env.step() call returns this grade in info["task_grade"]:

obs, reward, done, info = env.step(action)
grade_result = info["task_grade"]   # dict: {score, success, components, ...}
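An episode loop that collects the per-step grade might look like the sketch below. It relies only on the four-tuple returned by `env.step()` and the `info["task_grade"]` key shown above. The `reset()` method, the `run_episode` helper, and the `StubEnv` class are assumptions: `StubEnv` is a stand-in so the example runs without CivicAI installed, and its scores are invented.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_episode(
    env: Any, policy: Callable[[Any], Any], max_turns: int = 50
) -> List[Dict[str, Any]]:
    """Step `env` with `policy` and return the task grade from every turn."""
    obs = env.reset()  # assumption: Gym-style reset() returning the first observation
    grades: List[Dict[str, Any]] = []
    for _ in range(max_turns):
        obs, reward, done, info = env.step(policy(obs))
        grades.append(info["task_grade"])
        if done:
            break
    return grades


class StubEnv:
    """Stand-in environment for illustration only; not the CivicAI simulator."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.turn = 0

    def reset(self) -> Dict[str, int]:
        self.turn = 0
        return {"turn": self.turn}

    def step(self, action: Any) -> Tuple[Dict[str, int], float, bool, Dict[str, Any]]:
        self.turn += 1
        score = min(1.0, 0.25 * self.turn)  # invented score trajectory
        grade = {"score": score, "success": score >= 0.75, "components": {}}
        done = self.turn >= self.horizon
        return {"turn": self.turn}, score, done, {"task_grade": grade}


# A trivial policy that ignores the observation.
grades = run_episode(StubEnv(horizon=3), policy=lambda obs: None)
final_grade = grades[-1]
```

Keeping every turn's grade, rather than only the last, makes it possible to inspect how the score evolves over the 50-turn horizon.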

Why This Is Non-Trivial

| Challenge | Description |
|---|---|
| Multi-objective | 5 rubric dimensions + task-specific grader; no single scalar fully captures the objective |
| Long-horizon | 50-turn episodes; many actions have a 5-10 turn lag before effects appear |
| Non-linear dynamics | Social unrest cascade, hyperinflation multiplier, epidemic OOC penalty |
| Structural vs. tactical | Gini responds slowly to redistribution; crime responds quickly to policing |
| Real-world data | GDP growth, inflation, unemployment, life expectancy anchored to World Bank baseline |
| Emergent behaviour | Wealth inequality → unrest → protest → GDP drag (3-step causal chain) |