---
title: CivicAI Society Simulator
emoji: 🏛️
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
app_file: server/app.py
pinned: false
---
# 🏛️ CivicAI: AI-Driven Societal Policy Optimization Under Uncertainty
Governing a society of 10 million people is not a game of chess. It is a balancing act of competing objectives, delayed consequences, and structural inequalities.
CivicAI is a production-grade, multi-agent societal decision-making environment designed for the OpenEnv Hackathon. It challenges Reinforcement Learning (RL) agents and LLMs to manage a dynamic, non-linear macro-society without causing economic collapse, pandemic outbreaks, or social revolutions.
## 🎯 The Problem
What real-world problem do we solve?
Modern governments face a combinatorial decision-making problem. Thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies) interact through complex causal chains to produce emergent societal outcomes, often with weeks-to-years of lag and high uncertainty.
Current AI agents excel at static datasets, text completion, or simple video games. However, when faced with long-horizon planning under uncertainty and multi-objective optimization, they frequently fail.
CivicAI bridges this capability gap. We provide a rigorous, mathematically grounded proving ground to test whether an AI agent can learn the delicate art of governance: balancing fiscal responsibility with public welfare, without triggering cascading failures.
## 🚀 Why This Environment Is Novel
CivicAI is not a grid-world or static dataset problem. It introduces:
- Long-horizon decision making (50 steps)
- Delayed consequences (policy effects over time)
- Multi-objective optimization (economy + health + society)
- Emergent behavior (crime, inequality, unrest)
✅ This makes CivicAI a training ground for real-world decision-making agents, not a toy environment.
## ⚙️ OpenEnv Compliance (MANDATORY API)
CivicAI fully follows the OpenEnv specification (a minimal usage sketch closes this section):
- `reset()` → initializes the environment with task-specific conditions
- `step(action)` → returns `(observation, reward, done, info)`
- `state()` → returns the full internal state
Typed Models (Pydantic):
- `Observation`: structured societal metrics
- `Action`: policy vector (tax, budgets, subsidies)
- `Reward`: normalized score in `[0.0, 1.0]`
`openenv.yaml` includes:
- Environment metadata
- Action/Observation schema
- Task definitions (easy → hard)
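
For orientation, a minimal episode loop against this API might look like the sketch below. The import path, constructor arguments, and the hand-written default action are illustrative assumptions, not the project's exact interface.

```python
# Minimal sketch of an OpenEnv-style episode loop (names assumed; see lead-in).
from civicai.env import CivicAIEnv  # hypothetical import path

env = CivicAIEnv(task="stabilize_economy")
obs = env.reset()  # task-specific initial conditions

done, total_reward = False, 0.0
while not done:
    # A fixed, hand-written policy just to exercise the API.
    action = {
        "tax_rate": 0.25,
        "budgets": {"healthcare": 0.4, "education": 0.3, "police": 0.3},
        "subsidy": "none",
        "emergency": None,
    }
    obs, reward, done, info = env.step(action)  # the mandatory 4-tuple
    total_reward += reward

print(f"episode return: {total_reward:.2f}")
print(env.state())  # full internal state, per the spec above
```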
## 🌍 The Environment
The agent acts as the central policy-maker for a society over a 50-turn episode (where 1 turn = 1 quarter).
### 📊 Observation Space (12+ Indicators)
Agents observe a dense, continuous state space mapped to real-world equivalents (typed roughly in the sketch after this list):
- Macroeconomics: GDP ($), GDP Growth (%), Inflation Rate (%), Employment Rate (%).
- Public Health & Resources: Health Index (0-1), Infection Rate (%), Medical/Food/Energy Supplies.
- Social Cohesion: Public Satisfaction (0-1), Crime Rate (%), Wealth Inequality (Gini coefficient), Social Unrest.
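
Typed with Pydantic, the observation could look roughly like this. Field names and groupings are assumptions inferred from the indicator list above, not the project's exact schema.

```python
from pydantic import BaseModel, Field

class Observation(BaseModel):
    """Societal indicators the agent sees each turn (illustrative names)."""
    # Macroeconomics
    gdp: float                # $
    gdp_growth: float         # %
    inflation_rate: float     # %
    employment_rate: float    # %
    # Public health & resources
    health_index: float = Field(..., ge=0.0, le=1.0)
    infection_rate: float     # %
    medical_supply: float
    food_supply: float
    energy_supply: float
    # Social cohesion
    public_satisfaction: float = Field(..., ge=0.0, le=1.0)
    crime_rate: float         # %
    gini: float = Field(..., ge=0.0, le=1.0)  # wealth inequality
    social_unrest: float
```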
### ⚙️ Action Space (Continuous & Categorical)
Agents control federal budgets and policy levers at every turn (see the sketch after this list):
- Tax Rate (`0.0 - 1.0`): raises revenue but creates economic drag.
- Budget Allocations (`0.0 - 1.0`): Healthcare, Education, and Police budgets.
- Subsidy Policy: `none`, `agriculture`, `industry`, or `technology`.
- Emergency Response: lockdowns or stimulus packages.
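
A Pydantic sketch of the action vector, under the same caveat: field and enum names are assumptions based on the levers above.

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class Subsidy(str, Enum):
    NONE = "none"
    AGRICULTURE = "agriculture"
    INDUSTRY = "industry"
    TECHNOLOGY = "technology"

class Action(BaseModel):
    """Policy vector applied once per turn (illustrative names)."""
    tax_rate: float = Field(..., ge=0.0, le=1.0)          # revenue vs. economic drag
    healthcare_budget: float = Field(..., ge=0.0, le=1.0)
    education_budget: float = Field(..., ge=0.0, le=1.0)
    police_budget: float = Field(..., ge=0.0, le=1.0)
    subsidy: Subsidy = Subsidy.NONE
    emergency: Optional[str] = None                       # e.g. "lockdown", "stimulus"
```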
### ⚖️ Reward Logic (Dense & Hard-to-Game)
We abandoned naive 0/1 binary rewards in favor of a dense, continuous, anti-exploitation OpenEnv Rubric System. The reward function is explicitly designed to prevent "gaming" the metrics (a sketch follows this list):
- Economic Score: Rewards inflation control and employment, but applies a hard penalty for hyperinflation.
- Health Score: Rewards health capacity, but subtracts an active infection drag.
- Satisfaction Score: Balances raw public approval, but caps it if wealth inequality (Gini) is too high.
- Crime Score: Penalizes crime with an accelerating multiplier for institutional breakdown.
- Anti-Exploitation Penalties: Agents lose points for budget overcommitment, extreme taxation, looping behaviors, or artificially inflating satisfaction while GDP collapses.
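
To make the rubric concrete, here is a hedged sketch of how such sub-scores and penalties could compose. All weights and thresholds are illustrative assumptions, not the shipped reward function; `Observation` is the model sketched earlier.

```python
def rubric_reward(s: "Observation") -> float:
    """Illustrative dense rubric: weighted sub-scores with hard penalties."""
    # Economic: reward inflation control and employment; hard-penalize hyperinflation.
    economic = max(0.0, 1.0 - abs(s.inflation_rate) / 10.0) * (s.employment_rate / 100.0)
    if s.inflation_rate > 20.0:
        economic = 0.0  # hyperinflation hard penalty

    # Health: capacity minus an active-infection drag.
    health = max(0.0, s.health_index - 0.5 * (s.infection_rate / 100.0))

    # Satisfaction: raw approval, capped when inequality (Gini) is too high.
    satisfaction = s.public_satisfaction
    if s.gini > 0.6:
        satisfaction = min(satisfaction, 0.5)

    # Crime: penalty accelerates as institutions break down.
    c = s.crime_rate / 100.0
    crime = max(0.0, 1.0 - c * (1.0 + 2.0 * c))

    # Anti-exploitation penalties (overcommitted budgets, extreme taxation,
    # looping behavior) would subtract from the weighted sum here.
    score = 0.3 * economic + 0.25 * health + 0.25 * satisfaction + 0.2 * crime
    return max(0.0, min(1.0, score))
```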
## 🏆 Tasks & Grader Logic
CivicAI features three difficulty-tiered tasks with distinct initial conditions and deterministic grading logic (an illustrative grader sketch follows the task list):
### 🟢 Easy: Economic Stability (`stabilize_economy`)
- Scenario: A mild recession is underway.
- Success Criteria: Inflation < 6%, Employment > 85%, maintain GDP without deficit spending.
- Grader Score: Continuous reward based on deviation from targets.
### 🟡 Medium: Pandemic Management (`manage_pandemic`)
- Scenario: A severe virus is sweeping the nation with a 20% infection rate.
- Success Criteria: Infection rate < 10%, GDP > $300B.
- Grader Score: Trade-off scoring that balances health capacity against economic damage from lockdowns.
### 🔴 Hard: Social Crisis (`control_crisis`)
- Scenario: A compound multi-domain crisis with high unemployment (32%), high crime (25%), and deep wealth inequality.
- Success Criteria: Crime < 12%, Inequality reduced, Employment > 80%.
- Grader Penalty: A cascade failure is triggered if social unrest breaches its threshold.
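
As an illustration of the deterministic grading, the easy task could be scored like the sketch below. The targets come from the criteria above, but the shaping (linear deviation terms, equal weighting) is an assumption.

```python
def grade_stabilize_economy(final: "Observation") -> float:
    """Illustrative continuous grader for `stabilize_economy`."""
    # Linear credit for beating each target, clipped to [0, 1].
    inflation_score = min(1.0, max(0.0, (6.0 - final.inflation_rate) / 6.0))
    employment_score = min(1.0, max(0.0, (final.employment_rate - 85.0) / 15.0))
    gdp_score = 1.0 if final.gdp_growth >= 0.0 else 0.0  # GDP maintained
    return (inflation_score + employment_score + gdp_score) / 3.0
```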
## 📈 Training Results (Quantitative)
We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimization - PPO) directly in the CivicAI environment.
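
Condensed to its core, the training loop might look like this sketch, assuming the legacy `trl` `PPOTrainer` API (pre-0.12). `env` is the `CivicAIEnv` from the earlier sketch, and the prompt/parse helpers defined inline are hypothetical, not CivicAI's exact code.

```python
import json
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

def obs_to_prompt(obs) -> str:
    # Hypothetical: serialize the observation into an instruction prompt.
    return f"Nation state: {obs}. Reply with a JSON policy action."

def parse_json_action(text: str) -> dict:
    # Hypothetical: extract the first JSON object; fall back to a mild default.
    try:
        return json.loads(text[text.index("{"): text.rindex("}") + 1])
    except ValueError:
        return {"tax_rate": 0.2, "budgets": {}, "subsidy": "none", "emergency": None}

config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

trainer = PPOTrainer(config, model, ref_model, tokenizer)

for episode in range(50):
    obs, done = env.reset(), False  # `env` as in the earlier CivicAIEnv sketch
    while not done:
        query = tokenizer(obs_to_prompt(obs), return_tensors="pt").input_ids[0]
        response = trainer.generate(
            query, return_prompt=False, max_new_tokens=64, do_sample=True
        )[0]
        obs, reward, done, info = env.step(parse_json_action(tokenizer.decode(response)))
        # One PPO update on this (query, response, reward) triple.
        trainer.step([query], [response], [torch.tensor(float(reward))])
```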
Key Results (Economic Stability Task):
- Baseline reward: `0.42`
- Trained agent reward: `0.68`
- Improvement: `+0.26` (+61%)
✅ This demonstrates measurable learning, not random behavior.
### Reward Curve

*Figure: training reward curve.*
The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.
### Baseline vs. Trained Comparison

*Figure: baseline vs. trained agent comparison.*
The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.
## 🧪 Reproducibility
You can reproduce results in under 5 minutes:
1. Open the Colab notebook
2. Enable GPU
3. Run all cells
4. Observe reward improvement
- The training script uses standard TRL PPO.
- The environment is not static: the agent interacts with it live.
- Plots are generated and saved automatically to `/assets`.
## 📖 Complete Guide: How It Works (Step-by-Step)
1. Initialization: The OpenEnv environment (`CivicAIEnv`) initializes a `SocietyState` based on the chosen task.
2. Observation: The agent receives the current state of the nation. In the dashboard, you see this visually; in training, the LLM receives it as a text prompt.
3. Action / Debate:
   - In training: the LLM policy outputs a JSON action.
   - In the dashboard: a multi-agent orchestrator facilitates a debate among specialized agents (Economic, Health, Citizen, Ethics) before proposing an optimal consensus action.
4. Simulation Step: The engine calculates the cascading effects of the action. E.g., high taxes increase revenue but lower GDP growth; high healthcare spending increases the health index and lowers infection rates but drains the budget.
5. Emergent Dynamics: The `EmergentTracker` calculates second-order effects: high unemployment leads to crime; sustained wealth inequality leads to social unrest (a sketch follows this list).
6. Reward Calculation: The dense rubric evaluates the new state and returns a reward score in `[0.0, 1.0]`, alongside explicit penalties for bad governance.
7. Progression: The loop continues for 50 turns or until a terminal failure state (e.g., mass unemployment, societal collapse) is reached.
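
To make step 5 concrete, the second-order dynamics could be shaped like the sketch below. Coefficients and field names are illustrative assumptions, not the actual `EmergentTracker` internals.

```python
def emergent_update(s: "Observation") -> "Observation":
    """Illustrative second-order effects applied after direct policy effects."""
    unemployment = 100.0 - s.employment_rate
    # High unemployment feeds crime.
    s.crime_rate += 0.05 * max(0.0, unemployment - 10.0)
    # Sustained wealth inequality feeds social unrest.
    s.social_unrest += 0.1 * max(0.0, s.gini - 0.45)
    # Unrest and crime, in turn, erode public satisfaction.
    s.public_satisfaction = max(
        0.0, s.public_satisfaction - 0.02 * s.social_unrest - 0.001 * s.crime_rate
    )
    return s
```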
## 📜 Storytelling: What the Agent Learned
Initially, the agent exploited short-term gains: cutting taxes and overspending to inflate satisfaction.
This strategy collapsed under delayed consequences: GDP contraction, rising crime, and systemic instability.
Through PPO training, the agent learned policy discipline:
- Maintain sustainable taxation
- Allocate budgets efficiently
- Avoid extreme oscillations
✅ The agent did not just optimize rewards; it learned stable governance strategies under uncertainty.
## 🌍 Why This Matters
CivicAI demonstrates that:
- AI can learn policy trade-offs, not just predictions.
- Reward design can enforce ethical and stable behavior.
- Simulation environments can act as safe testing grounds for governance.
👉 This opens pathways for:
- Policy simulation tools
- Economic modeling
- Crisis response planning
## 🔗 Links & Resources
- 🚀 Demo (HuggingFace Space): https://huggingface.co/spaces/mahammadaftab/CivicAI/
- 📓 Training Notebook (Colab): https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing
- 📝 Write-up / HuggingFace Blog: Read the HF Blog Post

