---
title: CivicAI Society Simulator
emoji: 🏛️
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
app_file: server/app.py
pinned: false
---
# 🏛️ CivicAI: AI-Driven Societal Policy Optimization Under Uncertainty
Governing a society of 10 million people is not a game of chess. It is a balancing act of competing objectives, delayed consequences, and structural inequalities.
CivicAI is a production-grade, multi-agent societal decision-making environment designed for the OpenEnv Hackathon. It challenges Reinforcement Learning (RL) agents and LLMs to manage a dynamic, non-linear macro-society without causing economic collapse, pandemic outbreaks, or social revolutions.
## 🎯 The Problem
What real-world problem do we solve?
Modern governments face a combinatorial decision-making problem. Thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies) interact through complex causal chains to produce emergent societal outcomes, often with weeks-to-years of lag and high uncertainty.
Current AI agents excel at static datasets, text completion, or simple video games. However, when faced with long-horizon planning under uncertainty and multi-objective optimization, they frequently fail.
CivicAI bridges this capability gap. We provide a rigorous, mathematically grounded proving ground to test whether an AI agent can learn the delicate art of governance: balancing fiscal responsibility with public welfare, without triggering cascading failures.
## 🚀 Why This Environment Is Novel
CivicAI is not a grid-world or static dataset problem. It introduces:
- Long-horizon decision making (50 steps)
- Delayed consequences (policy effects over time)
- Multi-objective optimization (economy + health + society)
- Emergent behavior (crime, inequality, unrest)
✅ This makes CivicAI a training ground for real-world decision-making agents, not a toy environment.
## ⚙️ OpenEnv Compliance (MANDATORY API)
CivicAI fully follows the OpenEnv specification (a minimal usage sketch closes this section):
- `reset()` → initializes the environment with task-specific conditions
- `step(action)` → returns `(observation, reward, done, info)`
- `state()` → returns the full internal state
Typed Models (Pydantic):
- `Observation`: structured societal metrics
- `Action`: policy vector (tax, budgets, subsidies)
- `Reward`: normalized score in `[0.0, 1.0]`
`openenv.yaml` includes:
- Environment metadata
- Action/Observation schema
- Task definitions (easy → hard)
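
For orientation, a minimal episode loop against this API might look like the sketch below. The import path, constructor arguments, and the hand-written default action are illustrative assumptions, not the project's exact interface.

```python
# Minimal sketch of an OpenEnv-style episode loop (names assumed; see lead-in).
from civicai.env import CivicAIEnv  # hypothetical import path

env = CivicAIEnv(task="stabilize_economy")
obs = env.reset()  # task-specific initial conditions

done, total_reward = False, 0.0
while not done:
    # A fixed, hand-written policy just to exercise the API.
    action = {
        "tax_rate": 0.25,
        "budgets": {"healthcare": 0.4, "education": 0.3, "police": 0.3},
        "subsidy": "none",
        "emergency": None,
    }
    obs, reward, done, info = env.step(action)  # the mandatory 4-tuple
    total_reward += reward

print(f"episode return: {total_reward:.2f}")
print(env.state())  # full internal state, per the spec above
```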
## 🌍 The Environment
The agent acts as the central policy-maker for a society over a 50-turn episode (where 1 turn = 1 quarter).
### 📊 Observation Space (12+ Indicators)
Agents observe a dense, continuous state space mapped to real-world equivalents (typed roughly in the sketch after this list):
- Macroeconomics: GDP ($), GDP Growth (%), Inflation Rate (%), Employment Rate (%).
- Public Health & Resources: Health Index (0-1), Infection Rate (%), Medical/Food/Energy Supplies.
- Social Cohesion: Public Satisfaction (0-1), Crime Rate (%), Wealth Inequality (Gini coefficient), Social Unrest.
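
Typed with Pydantic, the observation could look roughly like this. Field names and groupings are assumptions inferred from the indicator list above, not the project's exact schema.

```python
from pydantic import BaseModel, Field

class Observation(BaseModel):
    """Societal indicators the agent sees each turn (illustrative names)."""
    # Macroeconomics
    gdp: float                # $
    gdp_growth: float         # %
    inflation_rate: float     # %
    employment_rate: float    # %
    # Public health & resources
    health_index: float = Field(..., ge=0.0, le=1.0)
    infection_rate: float     # %
    medical_supply: float
    food_supply: float
    energy_supply: float
    # Social cohesion
    public_satisfaction: float = Field(..., ge=0.0, le=1.0)
    crime_rate: float         # %
    gini: float = Field(..., ge=0.0, le=1.0)  # wealth inequality
    social_unrest: float
```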
### ⚙️ Action Space (Continuous & Categorical)
Agents control federal budgets and policy levers at every turn (see the sketch after this list):
- Tax Rate (`0.0 - 1.0`): raises revenue but creates economic drag.
- Budget Allocations (`0.0 - 1.0`): Healthcare, Education, and Police budgets.
- Subsidy Policy: `none`, `agriculture`, `industry`, or `technology`.
- Emergency Response: lockdowns or stimulus packages.
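
A Pydantic sketch of the action vector, under the same caveat: field and enum names are assumptions based on the levers above.

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class Subsidy(str, Enum):
    NONE = "none"
    AGRICULTURE = "agriculture"
    INDUSTRY = "industry"
    TECHNOLOGY = "technology"

class Action(BaseModel):
    """Policy vector applied once per turn (illustrative names)."""
    tax_rate: float = Field(..., ge=0.0, le=1.0)          # revenue vs. economic drag
    healthcare_budget: float = Field(..., ge=0.0, le=1.0)
    education_budget: float = Field(..., ge=0.0, le=1.0)
    police_budget: float = Field(..., ge=0.0, le=1.0)
    subsidy: Subsidy = Subsidy.NONE
    emergency: Optional[str] = None                       # e.g. "lockdown", "stimulus"
```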
### ⚖️ Reward Logic (Dense & Hard-to-Game)
We abandoned naive 0/1 binary rewards in favor of a dense, continuous, anti-exploitation OpenEnv Rubric System. The reward function is explicitly designed to prevent "gaming" the metrics (a sketch follows this list):
- Economic Score: Rewards inflation control and employment, but applies a hard penalty for hyperinflation.
- Health Score: Rewards health capacity, but subtracts an active infection drag.
- Satisfaction Score: Balances raw public approval, but caps it if wealth inequality (Gini) is too high.
- Crime Score: Penalizes crime with an accelerating multiplier for institutional breakdown.
- Anti-Exploitation Penalties: Agents lose points for budget overcommitment, extreme taxation, looping behaviors, or artificially inflating satisfaction while GDP collapses.
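
To make the rubric concrete, here is a hedged sketch of how such sub-scores and penalties could compose. All weights and thresholds are illustrative assumptions, not the shipped reward function; `Observation` is the model sketched earlier.

```python
def rubric_reward(s: "Observation") -> float:
    """Illustrative dense rubric: weighted sub-scores with hard penalties."""
    # Economic: reward inflation control and employment; hard-penalize hyperinflation.
    economic = max(0.0, 1.0 - abs(s.inflation_rate) / 10.0) * (s.employment_rate / 100.0)
    if s.inflation_rate > 20.0:
        economic = 0.0  # hyperinflation hard penalty

    # Health: capacity minus an active-infection drag.
    health = max(0.0, s.health_index - 0.5 * (s.infection_rate / 100.0))

    # Satisfaction: raw approval, capped when inequality (Gini) is too high.
    satisfaction = s.public_satisfaction
    if s.gini > 0.6:
        satisfaction = min(satisfaction, 0.5)

    # Crime: penalty accelerates as institutions break down.
    c = s.crime_rate / 100.0
    crime = max(0.0, 1.0 - c * (1.0 + 2.0 * c))

    # Anti-exploitation penalties (overcommitted budgets, extreme taxation,
    # looping behavior) would subtract from the weighted sum here.
    score = 0.3 * economic + 0.25 * health + 0.25 * satisfaction + 0.2 * crime
    return max(0.0, min(1.0, score))
```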
## 🏆 Tasks & Grader Logic
CivicAI features three difficulty-tiered tasks with distinct initial conditions and deterministic grading logic (an illustrative grader sketch follows the task list):
### 🟢 Easy: Economic Stability (`stabilize_economy`)
- Scenario: A mild recession is underway.
- Success Criteria: Inflation < 6%, Employment > 85%, maintain GDP without deficit spending.
- Grader Score: Continuous reward based on deviation from targets.
### 🟡 Medium: Pandemic Management (`manage_pandemic`)
- Scenario: A severe virus is sweeping the nation with a 20% infection rate.
- Success Criteria: Infection rate < 10%, GDP > $300B.
- Grader Score: Trade-off scoring that balances health capacity against economic damage from lockdowns.
### 🔴 Hard: Social Crisis (`control_crisis`)
- Scenario: A compound multi-domain crisis with high unemployment (32%), high crime (25%), and deep wealth inequality.
- Success Criteria: Crime < 12%, Inequality reduced, Employment > 80%.
- Grader Penalty: A cascade failure is triggered if social unrest breaches its threshold.
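
As an illustration of the deterministic grading, the easy task could be scored like the sketch below. The targets come from the criteria above, but the shaping (linear deviation terms, equal weighting) is an assumption.

```python
def grade_stabilize_economy(final: "Observation") -> float:
    """Illustrative continuous grader for `stabilize_economy`."""
    # Linear credit for beating each target, clipped to [0, 1].
    inflation_score = min(1.0, max(0.0, (6.0 - final.inflation_rate) / 6.0))
    employment_score = min(1.0, max(0.0, (final.employment_rate - 85.0) / 15.0))
    gdp_score = 1.0 if final.gdp_growth >= 0.0 else 0.0  # GDP maintained
    return (inflation_score + employment_score + gdp_score) / 3.0
```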
## 📈 Training Results (Quantitative)
We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimization - PPO) directly in the CivicAI environment.
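
Condensed to its core, the training loop might look like this sketch, assuming the legacy `trl` `PPOTrainer` API (pre-0.12). `env` is the `CivicAIEnv` from the earlier sketch, and the prompt/parse helpers defined inline are hypothetical, not CivicAI's exact code.

```python
import json
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

def obs_to_prompt(obs) -> str:
    # Hypothetical: serialize the observation into an instruction prompt.
    return f"Nation state: {obs}. Reply with a JSON policy action."

def parse_json_action(text: str) -> dict:
    # Hypothetical: extract the first JSON object; fall back to a mild default.
    try:
        return json.loads(text[text.index("{"): text.rindex("}") + 1])
    except ValueError:
        return {"tax_rate": 0.2, "budgets": {}, "subsidy": "none", "emergency": None}

config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

trainer = PPOTrainer(config, model, ref_model, tokenizer)

for episode in range(50):
    obs, done = env.reset(), False  # `env` as in the earlier CivicAIEnv sketch
    while not done:
        query = tokenizer(obs_to_prompt(obs), return_tensors="pt").input_ids[0]
        response = trainer.generate(
            query, return_prompt=False, max_new_tokens=64, do_sample=True
        )[0]
        obs, reward, done, info = env.step(parse_json_action(tokenizer.decode(response)))
        # One PPO update on this (query, response, reward) triple.
        trainer.step([query], [response], [torch.tensor(float(reward))])
```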
Key Results (Economic Stability Task):
- Baseline reward: `0.42`
- Trained agent reward: `0.68`
- Improvement: `+0.26` (+61%)
✅ This demonstrates measurable learning, not random behavior.
### Reward Curve

*Figure: training reward curve.*
The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.
### Baseline vs. Trained Comparison

*Figure: baseline vs. trained agent comparison.*
The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.
## 🧪 Reproducibility
You can reproduce results in under 5 minutes:
1. Open the Colab notebook
2. Enable GPU
3. Run all cells
4. Observe reward improvement
- The training script uses standard TRL PPO.
- The environment is not static: the agent interacts with it live.
- Plots are generated and saved automatically to `/assets`.
## 📖 Complete Guide: How It Works (Step-by-Step)
1. Initialization: The OpenEnv environment (`CivicAIEnv`) initializes a `SocietyState` based on the chosen task.
2. Observation: The agent receives the current state of the nation. In the dashboard, you see this visually; in training, the LLM receives it as a text prompt.
3. Action / Debate:
   - In training: the LLM policy outputs a JSON action.
   - In the dashboard: a multi-agent orchestrator facilitates a debate among specialized agents (Economic, Health, Citizen, Ethics) before proposing an optimal consensus action.
4. Simulation Step: The engine calculates the cascading effects of the action. E.g., high taxes increase revenue but lower GDP growth; high healthcare spending increases the health index and lowers infection rates but drains the budget.
5. Emergent Dynamics: The `EmergentTracker` calculates second-order effects: high unemployment leads to crime; sustained wealth inequality leads to social unrest (a sketch follows this list).
6. Reward Calculation: The dense rubric evaluates the new state and returns a reward score in `[0.0, 1.0]`, alongside explicit penalties for bad governance.
7. Progression: The loop continues for 50 turns or until a terminal failure state (e.g., mass unemployment, societal collapse) is reached.
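
To make step 5 concrete, the second-order dynamics could be shaped like the sketch below. Coefficients and field names are illustrative assumptions, not the actual `EmergentTracker` internals.

```python
def emergent_update(s: "Observation") -> "Observation":
    """Illustrative second-order effects applied after direct policy effects."""
    unemployment = 100.0 - s.employment_rate
    # High unemployment feeds crime.
    s.crime_rate += 0.05 * max(0.0, unemployment - 10.0)
    # Sustained wealth inequality feeds social unrest.
    s.social_unrest += 0.1 * max(0.0, s.gini - 0.45)
    # Unrest and crime, in turn, erode public satisfaction.
    s.public_satisfaction = max(
        0.0, s.public_satisfaction - 0.02 * s.social_unrest - 0.001 * s.crime_rate
    )
    return s
```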
## 📜 Storytelling: What the Agent Learned
Initially, the agent exploited short-term gains: cutting taxes and overspending to inflate satisfaction.
This strategy collapsed under delayed consequences: GDP contraction, rising crime, and systemic instability.
Through PPO training, the agent learned policy discipline:
- Maintain sustainable taxation
- Allocate budgets efficiently
- Avoid extreme oscillations
✅ The agent did not just optimize rewards; it learned stable governance strategies under uncertainty.
## 🌍 Why This Matters
CivicAI demonstrates that:
- AI can learn policy trade-offs, not just predictions.
- Reward design can enforce ethical and stable behavior.
- Simulation environments can act as safe testing grounds for governance.
👉 This opens pathways for:
- Policy simulation tools
- Economic modeling
- Crisis response planning
## 🔗 Links & Resources
- 🚀 Demo (HuggingFace Space): https://huggingface.co/spaces/mahammadaftab/CivicAI/
- 📓 Training Notebook (Colab): https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing
- 📝 Write-up / HuggingFace Blog: Read the HF Blog Post

