# CivicAI: Real-World Problem Statement

## Problem Definition

> **AI-driven societal policy optimization under uncertainty**

Modern governments face a combinatorial decision-making problem: thousands of
interdependent policy levers (taxes, healthcare spending, education, policing,
subsidies, emergency responses) interact through complex causal chains to
produce emergent societal outcomes across economic, public-health, and social
cohesion dimensions, often with weeks-to-years of lag and high uncertainty.

No human decision-maker can simultaneously optimise all dimensions. AI agents
trained in CivicAI learn to:

1. Observe rich societal state (12+ indicators)
2. Act across a continuous multi-dimensional policy space
3. Receive delayed, multi-objective feedback
4. Adapt to unexpected shocks (pandemics, market crashes, social unrest)

---

## Real-World Domain Mapping

| CivicAI dimension | Real-world counterpart | Real data anchor |
|---|---|---|
| `gdp`, `gdp_growth`, `inflation` | Macroeconomic fiscal policy | World Bank GDP / IMF inflation data |
| `employment_rate` | Labour market policy | ILO unemployment statistics |
| `tax_rate`, `budget_balance` | Government revenue & deficit | OECD fiscal balance data |
| `health_index`, `infection_rate` | Public-health capacity & epidemics | WHO health expenditure / GHI |
| `crime_rate` | Rule-of-law & public safety | UNODC crime indices |
| `public_satisfaction` | Democratic legitimacy / approval | Edelman Trust Barometer |
| `emergent.wealth_inequality` | Distributional equity | Gini coefficient (World Bank) |
| `emergent.social_unrest` | Political stability | World Governance Indicators |
| `food_reserves`, `energy_reserves` | Strategic resource security | FAO / IEA stockpile data |
| `education_quality` | Human capital investment | UNESCO / PISA |

### Domain 1: Governance (Fiscal Policy)

**Real-world problem:** Governments must set tax rates that raise revenue
without suppressing growth, and allocate budgets across competing public goods
(healthcare vs. education vs. security) while maintaining fiscal sustainability.

**CivicAI mapping:**

- Action: `tax_rate` ∈ [0, 1], `healthcare_budget`, `education_budget`, `police_budget`
- State: `gdp`, `inflation`, `employment_rate`, `budget_balance`
- Challenge: High taxes → GDP drag; low taxes → deficit spiral

### Domain 2: Economy (Macroeconomic Stabilisation)

**Real-world problem:** Recessions require countercyclical stimulus, but
overspending triggers inflation. Optimal fiscal multipliers depend on the
current economic regime.

**CivicAI mapping:**

- Action: `subsidy_policy` ∈ {none, agriculture, industry, technology}
- State: `gdp_growth`, `inflation`, `employment_rate`
- Challenge: Technology subsidies boost long-run growth but worsen near-term
  inequality; agriculture subsidies improve food security but reduce GDP growth

### Domain 3: Public Health (Epidemic Management)

**Real-world problem:** Pandemics create tradeoffs between infection
suppression (via lockdowns) and economic activity. Optimal policies depend on
medical supply capacity, infection dynamics, and public compliance.

**CivicAI mapping:**

- Action: `healthcare_budget`, `emergency_response` (lockdown / stimulus / open)
- State: `infection_rate`, `health_index`, `medical_supplies`, `gdp`
- Challenge: Lockdown reduces infection but crushes GDP; premature opening
  causes epidemic rebound

### Domain 4: Social Cohesion (Crisis Management)

**Real-world problem:** Compound crises (unemployment + crime + inequality +
unrest) exhibit non-linear cascade dynamics: once social unrest exceeds a
threshold, even good economic data fails to restore stability.

**CivicAI mapping:**

- Action: All levers simultaneously; no single dominant strategy
- State: `public_satisfaction`, `crime_rate`, `emergent.wealth_inequality`,
  `emergent.social_unrest`
- Challenge: Inequality is a slow-moving structural variable; quick fixes
  (police budget) address symptoms, not causes
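
The action fields listed across the four domains can be gathered into a single typed record. The following is a minimal sketch, not the CivicAI action class: the names `PolicyAction`, `Subsidy`, and `Emergency` are hypothetical, the defaults are illustrative, and the non-negative budget check is an assumption, since only the `tax_rate` range is stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Subsidy(Enum):
    """Subsidy targets named under Domain 2."""
    NONE = "none"
    AGRICULTURE = "agriculture"
    INDUSTRY = "industry"
    TECHNOLOGY = "technology"


class Emergency(Enum):
    """Emergency responses named under Domain 3."""
    LOCKDOWN = "lockdown"
    STIMULUS = "stimulus"
    OPEN = "open"


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical container for one turn's policy levers."""
    tax_rate: float
    healthcare_budget: float
    education_budget: float
    police_budget: float
    subsidy_policy: Subsidy = Subsidy.NONE        # illustrative default
    emergency_response: Emergency = Emergency.OPEN  # illustrative default

    def __post_init__(self) -> None:
        # tax_rate in [0, 1] comes from the Domain 1 mapping.
        if not 0.0 <= self.tax_rate <= 1.0:
            raise ValueError("tax_rate must lie in [0, 1]")
        # Assumption: budgets are non-negative; the text gives no range.
        for name in ("healthcare_budget", "education_budget", "police_budget"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be non-negative")


action = PolicyAction(
    tax_rate=0.25,
    healthcare_budget=0.30,
    education_budget=0.20,
    police_budget=0.10,
    subsidy_policy=Subsidy.TECHNOLOGY,
)
```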

---

## Tasks

### Task 1: Economic Stability `[EASY]`

**Objective:** Restore a mild recession economy to fiscal stability.

| Criterion | Target | Failure |
|---|---|---|
| Inflation | < 6% | ≥ 15% |
| Employment | > 85% | ≤ 65% |
| GDP | > $400B | ≤ $250B |
| Budget Balance | Surplus preferred | ≤ −30% deficit |

**Initial conditions:** GDP $450B, inflation 7%, employment 82%, satisfaction 55%

**Deterministic grader** (`EconomicStabilityGrader`):

```
score = 0.40 × inflation_score
      + 0.40 × employment_score
      + 0.10 × gdp_score
      + 0.10 × budget_score

inflation_score  = linear_inv(inflation, ideal=3%, fail=15%)
                   × 0.40 if hyperinflation (>20%)
employment_score = linear(employment_rate, fail=65%, ideal=90%)
gdp_score        = linear(gdp, fail=$250B, ideal=$500B)
budget_score     = linear(budget_balance, fail=−30%, ideal=0%)

All linear() / linear_inv() produce values in [0.0, 1.0].
No random calls. Always deterministic.
```

**Success threshold:** score ≥ 0.75
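
To make the Task 1 formula concrete, here is a minimal sketch of how it could be evaluated. It assumes `linear` and `linear_inv` are clamped linear ramps between the `fail` and `ideal` anchors (the text only states that they return values in [0.0, 1.0]), expresses rates as fractions and GDP in $B, and leaves out the hyperinflation multiplier because the formula does not pin down exactly how it is applied. The function `economic_stability_score` is illustrative, not the real `EconomicStabilityGrader`. Under these assumptions, and assuming a balanced starting budget, the initial conditions score roughly 0.72, just short of the 0.75 threshold.

```python
def linear(x: float, fail: float, ideal: float) -> float:
    """Ramp from 0.0 at `fail` up to 1.0 at `ideal`, clamped to [0.0, 1.0]."""
    return min(1.0, max(0.0, (x - fail) / (ideal - fail)))


def linear_inv(x: float, ideal: float, fail: float) -> float:
    """Ramp from 1.0 at `ideal` down to 0.0 at `fail` (lower is better)."""
    return min(1.0, max(0.0, (fail - x) / (fail - ideal)))


def economic_stability_score(
    inflation: float, employment_rate: float, gdp: float, budget_balance: float
) -> float:
    """Weighted sum from the Task 1 formula (hyperinflation multiplier omitted)."""
    return (
        0.40 * linear_inv(inflation, ideal=0.03, fail=0.15)
        + 0.40 * linear(employment_rate, fail=0.65, ideal=0.90)
        + 0.10 * linear(gdp, fail=250.0, ideal=500.0)
        + 0.10 * linear(budget_balance, fail=-0.30, ideal=0.0)
    )


# Task 1 initial conditions. The starting budget balance is not given in the
# text, so a balanced budget (0.0) is assumed here.
initial_score = economic_stability_score(
    inflation=0.07, employment_rate=0.82, gdp=450.0, budget_balance=0.0
)
print(f"initial score: {initial_score:.3f}, success: {initial_score >= 0.75}")
```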

---

### Task 2: Pandemic Management `[MEDIUM]`

**Objective:** Suppress a 20% infection-rate epidemic without destroying the
economy.

| Criterion | Target | Failure |
|---|---|---|
| Infection rate | < 10% | ≥ 30% |
| Health index | > 0.60 | ≤ 0.30 |
| GDP | > $300B | ≤ $200B |
| Medical supplies | > 0.60 | ≤ 0.20 |

**Initial conditions:** Infection 20%, health index 0.55, GDP $480B, medical supplies 0.50

**Deterministic grader** (`PandemicManagementGrader`):

```
score = 0.40 × infection_score
      + 0.30 × health_score
      + 0.20 × gdp_score
      + 0.10 × supplies_score

infection_score = linear_inv(infection_rate, ideal=2%, fail=30%)
                  × 0.50 if epidemic OOC (≥40%)
health_score    = linear(health_index, fail=0.30, ideal=0.80)
gdp_score       = linear(gdp, fail=$200B, ideal=$480B)
supplies_score  = linear(medical_supplies, fail=0.20, ideal=0.80)

No random calls. Always deterministic.
```

**Core tension:** Lockdown ↑ infection_score but ↓ gdp_score; the agent must
find the optimal tradeoff trajectory.

**Success threshold:** score ≥ 0.75
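
The core tension can be seen numerically with a small sketch of the Task 2 formula. It assumes the scoring functions are clamped linear ramps, so a single helper (here called `ramp`, a hypothetical name) covers both `linear` and `linear_inv` by swapping which anchor is larger. The out-of-control multiplier is omitted, and the `after_lockdown` state is an invented illustration, not simulator output.

```python
def ramp(x: float, fail: float, ideal: float) -> float:
    """0.0 at `fail`, 1.0 at `ideal`, clamped; works in either direction."""
    return min(1.0, max(0.0, (x - fail) / (ideal - fail)))


# Component weights from the Task 2 formula.
WEIGHTS = {"infection": 0.40, "health": 0.30, "gdp": 0.20, "supplies": 0.10}


def pandemic_components(infection_rate, health_index, gdp, medical_supplies):
    """Per-component scores; rates are fractions and GDP is in $B."""
    return {
        "infection": ramp(infection_rate, fail=0.30, ideal=0.02),
        "health": ramp(health_index, fail=0.30, ideal=0.80),
        "gdp": ramp(gdp, fail=200.0, ideal=480.0),
        "supplies": ramp(medical_supplies, fail=0.20, ideal=0.80),
    }


def pandemic_score(**state) -> float:
    """Weighted sum of the components (OOC multiplier omitted)."""
    parts = pandemic_components(**state)
    return sum(WEIGHTS[name] * value for name, value in parts.items())


# Initial conditions from the task description.
start = dict(infection_rate=0.20, health_index=0.55, gdp=480.0, medical_supplies=0.50)

# Hypothetical post-lockdown state, for illustration only:
# infection falls, and GDP falls with it.
after_lockdown = dict(infection_rate=0.08, health_index=0.55, gdp=400.0, medical_supplies=0.50)

print(f"start:          {pandemic_score(**start):.3f}")
print(f"after lockdown: {pandemic_score(**after_lockdown):.3f}")
```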

---

### Task 3: Social Stability Crisis `[HARD]`

**Objective:** Restore social order from a compound multi-domain crisis with
cascading failure risk.

| Criterion | Target | Failure |
|---|---|---|
| Public satisfaction | > 50% | ≤ 15% |
| Crime rate | < 12% | ≥ 35% |
| Employment rate | > 80% | ≤ 55% |
| Wealth inequality (Gini) | < 0.40 | ≥ 0.70 |

**Initial conditions:** Employment 68%, crime 25%, satisfaction 30%, Gini 0.55, social unrest 0.45

**Deterministic grader** (`SocialCrisisGrader`):

```
score = 0.30 × satisfaction_score
      + 0.25 × crime_score
      + 0.25 × employment_score
      + 0.20 × inequality_score
      × 0.60 if social_unrest > 0.65 (cascade penalty)

satisfaction_score = linear(public_satisfaction, fail=0.15, ideal=0.70)
crime_score        = linear_inv(crime_rate, ideal=5%, fail=35%)
                     × 0.50 if crime_rate ≥ 40%
employment_score   = linear(employment_rate, fail=55%, ideal=88%)
inequality_score   = linear_inv(gini, ideal=0.20, fail=0.70)

No random calls. Always deterministic.
```

**Why it's hard:**

- Gini is structural: it requires sustained tax redistribution over many turns
- Social unrest cascade multiplier punishes instability even when individual
  metrics improve
- No single dominant strategy; agents must balance all four dimensions
  simultaneously

**Success threshold:** score ≥ 0.75
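
The effect of the cascade penalty can be sketched as follows. This is an illustration under stated assumptions, not the real `SocialCrisisGrader`: the scoring functions are taken to be clamped linear ramps (the helper `ramp` is a hypothetical name), the 0.60 multiplier is assumed to scale the whole weighted sum, and the crime-rate multiplier is omitted. The second call changes only `social_unrest`, to show how the penalty lowers the score while every other metric is unchanged.

```python
def ramp(x: float, fail: float, ideal: float) -> float:
    """0.0 at `fail`, 1.0 at `ideal`, clamped; works in either direction."""
    return min(1.0, max(0.0, (x - fail) / (ideal - fail)))


def social_crisis_score(
    public_satisfaction: float,
    crime_rate: float,
    employment_rate: float,
    gini: float,
    social_unrest: float,
) -> float:
    """Weighted sum from the Task 3 formula; all inputs are fractions."""
    base = (
        0.30 * ramp(public_satisfaction, fail=0.15, ideal=0.70)
        + 0.25 * ramp(crime_rate, fail=0.35, ideal=0.05)
        + 0.25 * ramp(employment_rate, fail=0.55, ideal=0.88)
        + 0.20 * ramp(gini, fail=0.70, ideal=0.20)
    )
    # Assumption: the cascade penalty multiplies the total, not one component.
    return base * 0.60 if social_unrest > 0.65 else base


# Initial conditions from the task description.
initial = dict(
    public_satisfaction=0.30,
    crime_rate=0.25,
    employment_rate=0.68,
    gini=0.55,
    social_unrest=0.45,
)

calm = social_crisis_score(**initial)
# Same metrics, but unrest pushed past the 0.65 cascade threshold.
cascading = social_crisis_score(**{**initial, "social_unrest": 0.70})

print(f"unrest 0.45: {calm:.3f}")
print(f"unrest 0.70: {cascading:.3f}")
```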

---

## Grader API

```python
from civicai.graders import grade, GradeResult

result: GradeResult = grade(state, task_id="stabilize_economy")
print(result.score)      # float ∈ [0.0, 1.0]
print(result.success)    # bool: True if score ≥ 0.75
print(result.summary)    # human-readable verdict
print(result.to_dict())  # full component breakdown (JSON-serializable)
```

Every `env.step()` call returns this grade in `info["task_grade"]`:

```python
obs, reward, done, info = env.step(action)
grade_result = info["task_grade"]  # dict: {score, success, components, ...}
```
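
The per-step grade can be tracked across a whole episode. The sketch below uses a stand-in environment so that it runs on its own: `StubEnv`, its `reset()` method, and its linearly climbing score are all hypothetical, and only the 4-tuple returned by `step()` and the `info["task_grade"]` keys follow the snippet above.

```python
from typing import Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in for the CivicAI environment (not the real class)."""

    def __init__(self, horizon: int = 50) -> None:
        self.horizon = horizon  # 50-turn episodes, as described in this document
        self.turn = 0

    def reset(self) -> Dict[str, float]:
        self.turn = 0
        return {}

    def step(self, action: Dict[str, float]) -> Tuple[dict, float, bool, dict]:
        self.turn += 1
        # Placeholder score that climbs linearly; a real grader reads the state.
        score = self.turn / self.horizon
        info = {"task_grade": {"score": score, "success": score >= 0.75}}
        return {}, 0.0, self.turn >= self.horizon, info


def run_episode(env, policy: Callable[[dict], dict]) -> Tuple[List[float], bool]:
    """Step until done, recording the task score reported on every turn."""
    obs = env.reset()
    scores: List[float] = []
    grade = {"success": False}
    done = False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        grade = info["task_grade"]
        scores.append(grade["score"])
    return scores, grade["success"]


# A constant policy is enough to exercise the loop.
scores, success = run_episode(StubEnv(), policy=lambda obs: {"tax_rate": 0.3})
print(len(scores), scores[-1], success)
```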

---

## Why This Is Non-Trivial

| Challenge | Description |
|---|---|
| **Multi-objective** | 5 rubric dimensions + task-specific grader; no single scalar fully captures the objective |
| **Long-horizon** | 50-turn episodes; many actions have 5–10 turn lag before effects appear |
| **Non-linear dynamics** | Social unrest cascade, hyperinflation multiplier, epidemic OOC penalty |
| **Structural vs. tactical** | Gini responds slowly to redistribution; crime responds quickly to policing |
| **Real-world data** | GDP growth, inflation, unemployment, life expectancy anchored to World Bank baseline |
| **Emergent behaviour** | Wealth inequality → unrest → protest → GDP drag (3-step causal chain) |