---
title: CivicAI Society Simulator
emoji: 🏛️
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
app_file: server/app.py
pinned: false
---

πŸ›οΈ CivicAI: AI-Driven Societal Policy Optimization Under Uncertainty


Governing a society of 10 million people is not a game of chess. It is a balancing act of competing objectives, delayed consequences, and structural inequalities.

CivicAI is a production-grade, multi-agent societal decision-making environment designed for the OpenEnv Hackathon. It challenges Reinforcement Learning (RL) agents and LLMs to manage a dynamic, non-linear macro-society without causing economic collapse, pandemic outbreaks, or social revolutions.


## 🎯 The Problem

What real-world problem do we solve?

Modern governments face a combinatorial decision-making problem. Thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies) interact through complex causal chains to produce emergent societal outcomes, often with weeks-to-years of lag and high uncertainty.

Current AI agents excel at static datasets, text completion, and simple video games. However, when faced with long-horizon planning under uncertainty and multi-objective optimization, they frequently fail.

CivicAI bridges this capability gap. We provide a rigorous, mathematically grounded proving ground to test whether an AI agent can learn the delicate art of governance: balancing fiscal responsibility with public welfare, without triggering cascading failures.

## 🚀 Why This Environment Is Novel

CivicAI is not a grid-world or static dataset problem. It introduces:

- Long-horizon decision making (50 steps)
- Delayed consequences (policy effects unfold over time)
- Multi-objective optimization (economy + health + society)
- Emergent behavior (crime, inequality, unrest)

👉 This makes it suitable for training real-world decision-making agents, not just for solving toy environments.


βš™οΈ OpenEnv Compliance (MANDATORY API)

CivicAI fully follows the OpenEnv specification:

- `reset()` → initializes the environment with task-specific conditions
- `step(action)` → returns `(observation, reward, done, info)`
- `state()` → returns the full internal state
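These three calls are all an agent needs to drive an episode. Here is a minimal, runnable sketch of that loop, using a toy stand-in for `CivicAIEnv` (the class name, dynamics, field names, and numbers below are invented for illustration only; the real environment lives in `server/app.py`):

```python
import random

class ToyCivicEnv:
    """Toy stand-in for CivicAIEnv exposing the same reset/step/state surface."""
    MAX_TURNS = 50

    def reset(self, task="stabilize_economy"):
        self.turn = 0
        self._state = {"task": task, "gdp": 500.0, "inflation": 0.06}
        return dict(self._state)  # observation

    def step(self, action):
        self.turn += 1
        # Toy dynamics: taxation cools inflation but drags GDP.
        self._state["inflation"] -= 0.01 * action["tax_rate"]
        self._state["gdp"] *= 1.0 - 0.05 * action["tax_rate"]
        # Normalized reward in [0.0, 1.0], peaking near 2% inflation.
        reward = max(0.0, min(1.0, 1.0 - abs(self._state["inflation"] - 0.02)))
        done = self.turn >= self.MAX_TURNS
        return dict(self._state), reward, done, {"turn": self.turn}

    def state(self):
        return dict(self._state)

env = ToyCivicEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    action = {"tax_rate": random.uniform(0.1, 0.4)}  # a random-policy baseline
    obs, reward, done, info = env.step(action)
    total += reward
print(f"episode return over {env.turn} turns: {total:.2f}")
```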

Typed Models (Pydantic):

- `Observation`: structured societal metrics
- `Action`: policy vector (tax, budgets, subsidies)
- `Reward`: normalized score in `[0.0, 1.0]`
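The project's models are Pydantic; the stdlib-dataclass sketch below only mirrors their likely shape for illustration (every field name here is an assumption, not the repository's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Structured societal metrics (field names are illustrative)."""
    gdp: float            # billions USD
    gdp_growth: float     # %
    inflation: float      # %
    employment: float     # %
    health_index: float   # 0-1
    infection_rate: float # %
    satisfaction: float   # 0-1
    crime_rate: float     # %
    gini: float           # wealth inequality

@dataclass
class Action:
    """Policy vector: tax, budgets, subsidies (field names are illustrative)."""
    tax_rate: float = 0.25          # 0.0-1.0
    healthcare_budget: float = 0.3  # fraction of total budget
    education_budget: float = 0.3
    police_budget: float = 0.2
    subsidy: str = "none"           # none | agriculture | industry | technology
    emergency: str = "none"         # none | lockdown | stimulus

def clamp_reward(raw: float) -> float:
    """Rewards are normalized into [0.0, 1.0] before being returned."""
    return max(0.0, min(1.0, raw))
```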

`openenv.yaml` includes:

- Environment metadata
- Action/Observation schema
- Task definitions (easy → hard)
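A manifest along these lines would cover those three items. This fragment is a hypothetical shape sketched from the list above, not a copy of the repository's `openenv.yaml`:

```yaml
# Illustrative shape only — field names are assumptions.
name: civicai
version: "1.0"
tasks:
  - id: stabilize_economy
    difficulty: easy
  - id: manage_pandemic
    difficulty: medium
  - id: control_crisis
    difficulty: hard
action_space:
  tax_rate: {type: float, min: 0.0, max: 1.0}
  subsidy: {type: enum, values: [none, agriculture, industry, technology]}
```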

## 🌍 The Environment

The agent acts as the central policy-maker for a society over a 50-turn episode (where 1 turn = 1 quarter).

### 🔍 Observation Space (12+ Indicators)

Agents observe a dense, continuous state space mapped to real-world equivalents:

- Macroeconomics: GDP ($), GDP Growth (%), Inflation Rate (%), Employment Rate (%)
- Public Health & Resources: Health Index (0–1), Infection Rate (%), Medical/Food/Energy Supplies
- Social Cohesion: Public Satisfaction (0–1), Crime Rate (%), Wealth Inequality (Gini coefficient), Social Unrest

βš™οΈ Action Space (Continuous & Categorical)

Agents control federal budgets and policy levers at every turn:

  • Tax Rate (0.0 - 1.0): Raises revenue but creates economic drag.
  • Budget Allocations (0.0 - 1.0): Healthcare, Education, and Police budgets.
  • Subsidy Policy: none, agriculture, industry, or technology.
  • Emergency Response: Lockdowns or stimulus packages.
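Put together, a single turn's decision can be serialized as one JSON payload, with the kind of sanity checks an env wrapper might run before stepping (the key names here are assumptions, not CivicAI's exact schema):

```python
import json

# A hypothetical one-turn action covering the levers listed above.
action = {
    "tax_rate": 0.28,                                  # 0.0-1.0
    "budgets": {"healthcare": 0.35, "education": 0.30, "police": 0.15},
    "subsidy": "technology",                           # none|agriculture|industry|technology
    "emergency": "none",                               # none|lockdown|stimulus
}

# Basic validation before the action is handed to step().
assert 0.0 <= action["tax_rate"] <= 1.0
assert sum(action["budgets"].values()) <= 1.0          # no budget overcommitment
payload = json.dumps(action)
```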

βš–οΈ Reward Logic (Dense & Hard-to-Game)

We abandoned naive binary 0/1 rewards in favor of a dense, continuous, anti-exploitation OpenEnv rubric system. The reward function is explicitly designed to prevent agents from gaming the metrics:

  1. Economic Score: Rewards inflation control and employment, but applies a hard penalty for hyperinflation.
  2. Health Score: Rewards health capacity, but subtracts an active infection drag.
  3. Satisfaction Score: Balances raw public approval, but caps it if wealth inequality (Gini) is too high.
  4. Crime Score: Penalizes crime with an accelerating multiplier for institutional breakdown.
  5. Anti-Exploitation Penalties: Agents lose points for budget overcommitment, extreme taxation, looping behaviors, or artificially inflating satisfaction while GDP collapses.
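The five rules above can be sketched as a single scoring function. The weights and thresholds below are illustrative stand-ins, not the repository's actual constants:

```python
def rubric_reward(state: dict) -> float:
    """Rubric-style dense reward sketch; all coefficients are invented."""
    # 1. Economic score with a hard hyperinflation penalty.
    econ = 0.5 * state["employment"] + 0.5 * (1.0 - min(state["inflation"] / 0.10, 1.0))
    if state["inflation"] > 0.25:  # hyperinflation wipes out the economic score
        econ = 0.0
    # 2. Health capacity minus an active-infection drag.
    health = state["health_index"] - 0.5 * state["infection_rate"]
    # 3. Satisfaction, capped when inequality (Gini) is too high.
    satisfaction = state["satisfaction"]
    if state["gini"] > 0.5:
        satisfaction = min(satisfaction, 0.5)
    # 4. Crime, with an accelerating multiplier past a breakdown threshold.
    crime = 1.0 - state["crime_rate"] * (2.0 if state["crime_rate"] > 0.2 else 1.0)
    # Weighted blend, clamped to the normalized [0.0, 1.0] range.
    raw = 0.3 * econ + 0.25 * health + 0.25 * satisfaction + 0.2 * crime
    return max(0.0, min(1.0, raw))
```

The anti-exploitation penalties (item 5) would subtract from `raw` before clamping; they are omitted here to keep the sketch short.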

## 📋 Tasks & Grader Logic

CivicAI features three difficulty-tiered tasks with distinct initial conditions and deterministic grading logic:

### 🟢 Easy: Economic Stability (`stabilize_economy`)

- Scenario: A mild recession is underway.
- Success Criteria: Inflation < 6%, Employment > 85%, maintain GDP without deficit spending.
- Grader Score: Continuous reward based on deviation from targets.

### 🟡 Medium: Pandemic Management (`manage_pandemic`)

- Scenario: A severe virus is sweeping the nation with a 20% infection rate.
- Success Criteria: Infection rate < 10%, GDP > $300B.
- Grader Score: Trade-off scoring that balances health capacity against economic damage from lockdowns.

### 🔴 Hard: Social Crisis (`control_crisis`)

- Scenario: Compound multi-domain crisis: high unemployment (32%), high crime (25%), and deep wealth inequality.
- Success Criteria: Crime < 12%, Inequality reduced, Employment > 80%.
- Grader Penalty: Cascade failure triggered if social unrest breaches its threshold.
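The deterministic success criteria above fit in a small lookup table. The structure below is a sketch, with thresholds taken from the task descriptions (rates as fractions, GDP in billions):

```python
# Grader sketch. The "inequality reduced" check for control_crisis is
# omitted because it needs the episode's starting Gini for comparison.
CRITERIA = {
    "stabilize_economy": lambda s: s["inflation"] < 0.06 and s["employment"] > 0.85,
    "manage_pandemic":   lambda s: s["infection_rate"] < 0.10 and s["gdp"] > 300.0,
    "control_crisis":    lambda s: s["crime_rate"] < 0.12 and s["employment"] > 0.80,
}

def grade(task: str, final_state: dict) -> bool:
    """Return True when the final state meets the task's success criteria."""
    return CRITERIA[task](final_state)
```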

## 📈 Training Results (Quantitative)

We trained a GPT-2 policy agent with Hugging Face TRL (Proximal Policy Optimization, PPO) directly in the CivicAI environment.

Key Results (Economic Stability Task):

- Baseline reward: 0.42
- Trained agent reward: 0.68
- Improvement: +0.26 (+61%)

👉 This demonstrates measurable learning, not random behavior.

### Reward Curve

*(figure: training reward curve)*

The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.

### Baseline vs. Trained Comparison

*(figure: baseline vs. trained comparison chart)*

The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.


## 🧪 Reproducibility

You can reproduce the results in under 5 minutes:

1. Open the Colab notebook
2. Enable the GPU runtime
3. Run all cells
4. Observe the reward improvement

Notes:

- The training script uses standard TRL PPO.
- The environment is not static; the agent interacts with it live.
- Plots are generated and saved automatically to `/assets`.

## 📖 Complete Guide: How It Works (Step-by-Step)

1. Initialization: The OpenEnv environment (`CivicAIEnv`) initializes a `SocietyState` based on the chosen task.
2. Observation: The agent receives the current state of the nation. In the dashboard, you see this visually; in training, the LLM receives it as a text prompt.
3. Action / Debate:
   - In training: the LLM policy outputs a JSON action.
   - In the dashboard: a multi-agent orchestrator facilitates a debate among specialized agents (Economic, Health, Citizen, Ethics) before proposing a consensus action.
4. Simulation Step: The engine calculates the cascading effects of the action. For example, high taxes increase revenue but lower GDP growth; high healthcare spending raises the health index and lowers infection rates but drains the budget.
5. Emergent Dynamics: The `EmergentTracker` calculates second-order effects: high unemployment leads to crime; sustained wealth inequality leads to social unrest.
6. Reward Calculation: The dense rubric evaluates the new state and returns a reward in `[0.0, 1.0]`, alongside explicit penalties for bad governance.
7. Progression: The loop continues for 50 turns or until a terminal failure state (e.g., mass unemployment, societal collapse) is reached.
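Step 5 is what makes the environment hard to game. A sketch of such second-order updates (the coefficients and thresholds are invented for illustration; the real `EmergentTracker` logic lives in the repository):

```python
def emergent_update(state: dict) -> dict:
    """Apply illustrative second-order effects to a copy of the state."""
    s = dict(state)
    unemployment = 1.0 - s["employment"]
    # High unemployment feeds crime (only above a 10% unemployment floor).
    s["crime_rate"] = min(1.0, s["crime_rate"] + 0.1 * max(0.0, unemployment - 0.1))
    # Sustained wealth inequality feeds social unrest.
    s["unrest"] = min(1.0, s["unrest"] + 0.2 * max(0.0, s["gini"] - 0.45))
    return s
```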

## 🎭 Storytelling: What the Agent Learned

Initially, the agent exploited short-term gains: cutting taxes and overspending to inflate satisfaction.

This strategy collapsed under delayed consequences: GDP contraction, rising crime, and systemic instability.

Through PPO training, the agent learned policy discipline:

- Maintain sustainable taxation
- Allocate budgets efficiently
- Avoid extreme oscillations

👉 The agent did not just optimize rewards; it learned stable governance strategies under uncertainty.


## 🌍 Why This Matters

CivicAI demonstrates that:

- AI can learn policy trade-offs, not just predictions.
- Reward design can enforce ethical and stable behavior.
- Simulation environments can act as safe testing grounds for governance.

👉 This opens pathways for:

- Policy simulation tools
- Economic modeling
- Crisis response planning

## 🔗 Links & Resources