---
title: CivicAI Society Simulator
emoji: 🏛️
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
app_file: server/app.py
pinned: false
---
# 🏛️ CivicAI: AI-Driven Societal Policy Optimization Under Uncertainty
[OpenEnv](https://github.com/meta-pytorch/OpenEnv) · [Python](https://python.org) · [License](LICENSE)
> **Governing a society of 10 million people is not a game of chess. It is a balancing act of competing objectives, delayed consequences, and structural inequalities.**
CivicAI is a production-grade, multi-agent societal decision-making environment designed for the **OpenEnv Hackathon**. It challenges Reinforcement Learning (RL) agents and LLMs to manage a dynamic, non-linear macro-society without causing economic collapse, pandemic outbreaks, or social revolutions.
---
## 🎯 The Problem
**What real-world problem do we solve?**
Modern governments face a combinatorial decision-making problem. Thousands of interdependent policy levers (taxes, healthcare spending, education, policing, subsidies) interact through complex causal chains to produce emergent societal outcomes—often with weeks-to-years of lag and high uncertainty.
Current AI agents excel at static datasets, text completion, or simple video games. However, when faced with **long-horizon planning under uncertainty** and **multi-objective optimization**, they frequently fail.
CivicAI bridges this capability gap. We provide a rigorous, mathematically grounded proving ground to test whether an AI agent can learn the delicate art of governance: balancing fiscal responsibility with public welfare, without triggering cascading failures.
### 🚀 Why This Environment Is Novel
CivicAI is not a grid-world or static dataset problem. It introduces:
* **Long-horizon decision making** (50 steps)
* **Delayed consequences** (policy effects over time)
* **Multi-objective optimization** (economy + health + society)
* **Emergent behavior** (crime, inequality, unrest)
👉 **This makes it suitable for training real-world decision-making agents, not toy environments.**
---
## ⚙️ OpenEnv Compliance (MANDATORY API)
CivicAI fully follows the OpenEnv specification:
* `reset()` → initializes environment with task-specific conditions
* `step(action)` → returns `(observation, reward, done, info)`
* `state()` → returns full internal state
**Typed Models (Pydantic):**
* `Observation`: structured societal metrics
* `Action`: policy vector (tax, budgets, subsidies)
* `Reward`: normalized score `[0.0 – 1.0]`
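The contract above can be sketched with a minimal, self-contained stub. Everything below is illustrative: the class name `CivicAIEnvStub`, the field names, and the toy dynamics are stand-ins for the real `CivicAIEnv` and its Pydantic models (plain dataclasses are used here to keep the sketch dependency-free):

```python
from dataclasses import dataclass, asdict

@dataclass
class Observation:
    """Illustrative subset of the societal metrics the agent observes."""
    gdp: float
    inflation: float
    employment: float

class CivicAIEnvStub:
    """Hypothetical stand-in showing the OpenEnv reset/step/state contract."""
    MAX_TURNS = 50

    def reset(self):
        self._turn = 0
        self._state = Observation(gdp=400.0, inflation=0.05, employment=0.88)
        return self._state

    def step(self, action):
        self._turn += 1
        # Toy dynamics: higher taxes cool inflation but drag on GDP.
        self._state.gdp *= 1.0 - 0.1 * action["tax_rate"]
        self._state.inflation = max(0.0, self._state.inflation - 0.01 * action["tax_rate"])
        # Reward normalized into [0.0, 1.0], per the typed-model contract.
        reward = max(0.0, min(1.0, self._state.employment - self._state.inflation))
        done = self._turn >= self.MAX_TURNS
        return self._state, reward, done, {"turn": self._turn}

    def state(self):
        """Full internal state, serialized."""
        return asdict(self._state)

env = CivicAIEnvStub()
obs = env.reset()
obs, reward, done, info = env.step({"tax_rate": 0.3})
```

The same four-tuple return shape (`observation, reward, done, info`) is what the training loop consumes.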
**`openenv.yaml` includes:**
* Environment metadata
* Action/Observation schema
* Task definitions (easy → hard)
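A hypothetical fragment showing the shape of this metadata; the keys and values below are illustrative, not the Space's actual manifest:

```yaml
# Illustrative openenv.yaml sketch (not the real file)
name: civicai-society-simulator
action_space:
  tax_rate: {type: float, low: 0.0, high: 1.0}
  subsidy: {type: categorical, values: [none, agriculture, industry, technology]}
observation_space:
  gdp: {type: float}
  inflation: {type: float}
tasks:
  - id: stabilize_economy
    difficulty: easy
  - id: manage_pandemic
    difficulty: medium
  - id: control_crisis
    difficulty: hard
```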
---
## 🌍 The Environment
The agent acts as the central policy-maker for a society over a 50-turn episode (where 1 turn = 1 quarter).
### 🔍 Observation Space (12+ Indicators)
Agents observe a dense, continuous state space mapped to real-world equivalents:
- **Macroeconomics:** GDP ($), GDP Growth (%), Inflation Rate (%), Employment Rate (%).
- **Public Health & Resources:** Health Index (0-1), Infection Rate (%), Medical/Food/Energy Supplies.
- **Social Cohesion:** Public Satisfaction (0-1), Crime Rate (%), Wealth Inequality (Gini coefficient), Social Unrest.
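A hypothetical snapshot of one observation, grouped by the three categories above (the exact key names are illustrative):

```python
# Illustrative 13-indicator observation; keys are assumptions, not the real schema.
observation = {
    # Macroeconomics
    "gdp": 412.5e9, "gdp_growth": 0.012, "inflation": 0.054, "employment": 0.87,
    # Public health & resources
    "health_index": 0.71, "infection_rate": 0.03,
    "medical_supply": 0.65, "food_supply": 0.80, "energy_supply": 0.74,
    # Social cohesion
    "satisfaction": 0.62, "crime_rate": 0.09, "gini": 0.41, "unrest": 0.12,
}

# Flatten into a fixed-order vector for an RL policy.
vector = [observation[k] for k in sorted(observation)]
```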
### ⚙️ Action Space (Continuous & Categorical)
Agents control federal budgets and policy levers at every turn:
- **Tax Rate** (`0.0 - 1.0`): Raises revenue but creates economic drag.
- **Budget Allocations** (`0.0 - 1.0`): Healthcare, Education, and Police budgets.
- **Subsidy Policy**: `none`, `agriculture`, `industry`, or `technology`.
- **Emergency Response**: Lockdowns or stimulus packages.
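An action is a small mixed continuous/categorical record, so an LLM policy can emit it as JSON. The field names and validation rules below are illustrative assumptions:

```python
import json

# Hypothetical JSON action an LLM policy might emit; keys are illustrative.
raw = ('{"tax_rate": 0.28, "healthcare": 0.40, "education": 0.35, '
       '"police": 0.15, "subsidy": "technology", "emergency": "none"}')
action = json.loads(raw)

def validate(a):
    """Reject out-of-range levers before they reach the simulator."""
    assert 0.0 <= a["tax_rate"] <= 1.0
    assert a["subsidy"] in {"none", "agriculture", "industry", "technology"}
    # Budget shares must not overcommit the federal budget.
    assert a["healthcare"] + a["education"] + a["police"] <= 1.0
    return a

validate(action)
```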
### ⚖️ Reward Logic (Dense & Hard-to-Game)
We abandoned naive 0/1 binary rewards in favor of a **dense, anti-exploitation OpenEnv Rubric System**. The reward function is explicitly designed to prevent "gaming" the metrics:
1. **Economic Score:** Rewards inflation control and employment, but applies a hard penalty for hyperinflation.
2. **Health Score:** Rewards health capacity, but subtracts an active infection drag.
3. **Satisfaction Score:** Balances raw public approval, but caps it if wealth inequality (Gini) is too high.
4. **Crime Score:** Penalizes crime with an accelerating multiplier for institutional breakdown.
5. **Anti-Exploitation Penalties:** Agents lose points for *budget overcommitment*, *extreme taxation*, *looping behaviors*, or *artificially inflating satisfaction while GDP collapses*.
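The five rubric components can be sketched as follows. All weights, thresholds, and field names here are illustrative assumptions, not the environment's actual coefficients:

```python
def rubric_reward(s):
    """Hypothetical dense rubric; weights and thresholds are illustrative."""
    # 1. Economic score, with a hard hyperinflation penalty.
    econ = 0.5 * s["employment"] + 0.5 * max(0.0, 1.0 - 10 * s["inflation"])
    if s["inflation"] > 0.20:
        econ = 0.0
    # 2. Health capacity minus an active-infection drag.
    health = max(0.0, s["health_index"] - 2.0 * s["infection_rate"])
    # 3. Satisfaction, capped when inequality (Gini) is too high.
    satisfaction = s["satisfaction"]
    if s["gini"] > 0.5:
        satisfaction = min(satisfaction, 0.5)
    # 4. Crime, with an accelerating multiplier past a breakdown threshold.
    mult = 2.0 if s["crime_rate"] > 0.2 else 1.0
    crime = max(0.0, 1.0 - s["crime_rate"] * mult)
    # Weighted sum, clamped to the normalized [0.0, 1.0] range.
    score = 0.3 * econ + 0.25 * health + 0.25 * satisfaction + 0.2 * crime
    return max(0.0, min(1.0, score))

state = {"employment": 0.88, "inflation": 0.05, "health_index": 0.70,
         "infection_rate": 0.02, "satisfaction": 0.60, "gini": 0.42,
         "crime_rate": 0.08}
```

In the real environment, the anti-exploitation penalties (budget overcommitment, extreme taxation, looping behaviors) are subtracted on top of this weighted sum.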
---
## 📋 Tasks & Grader Logic
CivicAI features three difficulty-tiered tasks with distinct initial conditions and deterministic grading logic:
**🟢 Easy: Economic Stability (`stabilize_economy`)**
* **Scenario:** A mild recession is underway.
* **Success Criteria:** Inflation < 6%, Employment > 85%, maintain GDP without deficit spending.
* **Grader Score:** Continuous reward based on deviation from targets.
**🟡 Medium: Pandemic Management (`manage_pandemic`)**
* **Scenario:** A severe virus is sweeping the nation with a 20% infection rate.
* **Success Criteria:** Infection rate < 10%, GDP > $300B.
* **Grader Score:** Tradeoff scoring—balances health capacity vs economic damage from lockdowns.
**🔴 Hard: Social Crisis (`control_crisis`)**
* **Scenario:** Compound multi-domain crisis—high unemployment (32%), high crime (25%), and deep wealth inequality.
* **Success Criteria:** Crime < 12%, Inequality reduced, Employment > 80%.
* **Grader Penalty:** Cascade failure triggered if social unrest breaches threshold.
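As a sketch of the deterministic grading logic, a grader for the easy task might check the two headline thresholds and score deviation from targets continuously. The function name and scoring formula are illustrative assumptions:

```python
def grade_stabilize_economy(final_state):
    """Hypothetical grader for `stabilize_economy`; thresholds follow the
    stated success criteria (inflation < 6%, employment > 85%)."""
    ok = final_state["inflation"] < 0.06 and final_state["employment"] > 0.85
    # Continuous score: reward proximity to targets rather than 0/1 success.
    inflation_score = max(0.0, 1.0 - final_state["inflation"] / 0.06)
    employment_score = min(1.0, final_state["employment"] / 0.85)
    return ok, 0.5 * (inflation_score + employment_score)
```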
---
## 📈 Training Results (Quantitative)
We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimization - PPO) directly in the CivicAI environment.
**Key Results (Economic Stability Task):**
* **Baseline reward:** `0.42`
* **Trained agent reward:** `0.68`
* **Improvement:** `+0.26` (`+61%`)
👉 **This demonstrates measurable learning, not random behavior.**
### Reward Curve
*(Figure: PPO training reward curve)*
*The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
### Baseline vs. Trained Comparison
*(Figure: baseline vs. trained agent comparison)*
*The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
---
## 🧪 Reproducibility
**You can reproduce results in under 5 minutes:**
1. Open the [Colab notebook](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
2. Enable GPU
3. Run all cells
4. Observe reward improvement
* The training script uses Hugging Face TRL's standard PPO trainer.
* The environment is not static — the agent interacts live.
* Plots are generated and saved automatically to `/assets`.
---
## 📖 Complete Guide: How It Works (Step-by-Step)
1. **Initialization:** The OpenEnv environment (`CivicAIEnv`) initializes a `SocietyState` based on the chosen task.
2. **Observation:** The agent receives the current state of the nation. In the dashboard, you see this visually. In training, the LLM receives this as a text prompt.
3. **Action / Debate:**
- *In Training:* The LLM policy outputs a JSON action.
- *In Dashboard:* A multi-agent orchestrator facilitates a debate among specialized agents (Economic, Health, Citizen, Ethics) before proposing an optimal consensus action.
4. **Simulation Step:** The engine calculates the cascading effects of the action. For example, high taxes increase revenue but lower GDP growth; high healthcare spending raises the health index and lowers infection rates but drains the budget.
5. **Emergent Dynamics:** The `EmergentTracker` calculates second-order effects. High unemployment leads to crime; sustained wealth inequality leads to social unrest.
6. **Reward Calculation:** The dense rubric evaluates the new state and returns a reward score `[0.0, 1.0]`, alongside explicit penalties for bad governance.
7. **Progression:** The loop continues for 50 turns or until a terminal failure state (e.g., mass unemployment, societal collapse) is reached.
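The seven steps above reduce to a standard episode loop. The sketch below uses a toy stand-in environment so it runs end to end; `_ToyEnv` and the lambda policy are assumptions, not the real `CivicAIEnv` or trained agent:

```python
def run_episode(env, policy, max_turns=50):
    """Run one 50-turn episode: reset, act, step, accumulate reward."""
    obs = env.reset()                                # 1. task-specific init
    total = 0.0
    for _ in range(max_turns):                       # 7. up to 50 quarters
        action = policy(obs)                         # 3. policy proposes an action
        obs, reward, done, info = env.step(action)   # 4-6. simulate + score
        total += reward
        if done:                                     # terminal failure or horizon
            break
    return total

class _ToyEnv:
    """Minimal stand-in so the loop runs end to end."""
    def reset(self):
        self.t = 0
        return {"employment": 0.88}
    def step(self, action):
        self.t += 1
        return {"employment": 0.88}, 0.5, self.t >= 50, {}

total = run_episode(_ToyEnv(), lambda obs: {"tax_rate": 0.3})
```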
---
## 🎭 Storytelling: What the Agent Learned
Initially, the agent exploited short-term gains—cutting taxes and overspending to inflate satisfaction.
This strategy collapsed under delayed consequences: GDP contraction, rising crime, and systemic instability.
Through PPO training, the agent learned policy discipline:
* Maintain sustainable taxation
* Allocate budgets efficiently
* Avoid extreme oscillations
👉 **The agent did not just optimize rewards—it learned stable governance strategies under uncertainty.**
---
## 🌍 Why This Matters
CivicAI demonstrates that:
* **AI can learn policy trade-offs**, not just predictions.
* **Reward design can enforce ethical and stable behavior.**
* **Simulation environments can act as safe testing grounds** for governance.
👉 **This opens pathways for:**
* Policy simulation tools
* Economic modeling
* Crisis response planning
---
## 🔗 Links & Resources
- 🚀 **Demo (HuggingFace Space):** [https://huggingface.co/spaces/mahammadaftab/CivicAI/](https://huggingface.co/spaces/mahammadaftab/CivicAI/)
- 📓 **Training Notebook (Colab):** [https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
- 📝 **Write-up / HuggingFace Blog:** [Read the HF Blog Post](https://huggingface.co/spaces/mahammadaftab/CivicAI/blob/main/BLOG.md)