# ๐Ÿ› CivicAI: Teaching AI to Govern a Society > *Can AI learn to balance economics, healthcare, crime, and public satisfaction โ€” all at once?* ## ๐ŸŽฏ 1. The Problem: The Capability Gap The world doesn't need another polished, simple game environment. Real-world governance is **messy, ambitious, and impossibly hard**. Every policy decision has: - **Competing objectives** (e.g., raising taxes funds healthcare but hurts economic growth). - **Delayed consequences** (e.g., education spending takes years to show results). - **Cascading failures** (e.g., unemployment โ†’ crime โ†’ protests โ†’ satisfaction collapse). We built **CivicAI** โ€” an OpenEnv-compliant simulation where AI agents learn to govern a society of 10 million people. It is designed to test if LLMs and RL agents can actually handle **long-horizon planning under uncertainty** and **multi-objective optimization**, a capability gap in modern token-predicting LLMs. --- ## ๐ŸŒ 2. The Environment: What Agents See and Do CivicAI simulates a complete society with interconnected metrics over a 50-turn episode: **What the agent sees:** - ๐Ÿ’ฐ **GDP & Inflation** (Economic output and price stability) - ๐Ÿ’ผ **Employment** (Job market health) - ๐Ÿ˜Š **Satisfaction & Health** (Public approval and healthcare quality) - ๐Ÿšจ **Crime & Events** (Security and random events like pandemics) **What the agent does:** - Sets tax rates, healthcare budgets, education budgets, and police budgets. - Dictates active sector subsidies and emergency responses (like lockdowns). **How it gets rewarded (Hard-to-Game Rubrics):** The agent is judged by a strict set of OpenEnv Rubrics that prevent gaming. For example, the agent gets heavily penalized for achieving a high GDP if it causes hyperinflation, or for keeping satisfaction high while letting wealth inequality spiral out of control. --- ## ๐Ÿ“Š 3. The Results: What Changed After Training? We didn't just build the environment; we proved it is learnable. We trained a PyTorch REINFORCE policy against the CivicAI environment. ### Agent vs Random Performance | Task | Random Baseline | RL Agent | Improvement | |------|-------------|-------------|-------------| | Economic Stability | 0.6360 | 0.7725 (Peak) | **+0.1365 (Peak)** | | Pandemic Management | 0.5494 | 0.5768 | **+0.0274** | | Social Crisis | 0.4649 | 0.4881 | **+0.0232** | ### Training Reward Curve ![Training Reward Curve and Agent Comparison](assets/reward_curve.png) *Left: The agent's reward curve over 150 epochs on the Economic Stability task, showing clear policy improvement before stabilization/degradation. Right: Agent vs random baseline performance across the three difficulty tiers.* The curve clearly demonstrates that the agent learns to stabilize its policy choices over time, avoiding extreme tax rates and balancing the budget allocations. --- ## ๐Ÿ’ก 4. Why Does It Matter? This matters because it pushes agents out of the "chatbot" paradigm. We are building the proving ground for **AI system managers**. - **Safety researchers** can test how agents handle complex moral tradeoffs. - **RL researchers** get a non-linear, delayed-reward benchmark that is far more realistic than block-world games. - **Policy makers** get a primitive simulation to understand how AI might approach governance. 
---

## 🚀 Try It Yourself

- 🖥 **[Live Dashboard](https://huggingface.co/spaces/mahammadaftab/CivicAI)**: Interactive society simulation
- 💻 **GitHub Code**: Full source code

---

## 🏷 Tags

`openenv` · `multi-agent` · `society-simulation` · `reinforcement-learning`

---

*Built for the OpenEnv Competition 🏆*