# πŸ› CivicAI: Teaching AI to Govern a Society
> *Can AI learn to balance economics, healthcare, crime, and public satisfaction β€” all at once?*
## 🎯 1. The Problem: The Capability Gap
The world doesn't need another polished toy environment. Real-world governance is **messy, interconnected, and impossibly hard**. Every policy decision has:
- **Competing objectives** (e.g., raising taxes funds healthcare but hurts economic growth).
- **Delayed consequences** (e.g., education spending takes years to show results).
- **Cascading failures** (e.g., unemployment β†’ crime β†’ protests β†’ satisfaction collapse).
We built **CivicAI** β€” an OpenEnv-compliant simulation where AI agents learn to govern a society of 10 million people. It is designed to test whether LLMs and RL agents can actually handle **long-horizon planning under uncertainty** and **multi-objective optimization**, capabilities that remain a gap for modern token-predicting LLMs.
---
## 🌍 2. The Environment: What Agents See and Do
CivicAI simulates a complete society with interconnected metrics over a 50-turn episode:
**What the agent sees:**
- πŸ’° **GDP & Inflation** (Economic output and price stability)
- πŸ’Ό **Employment** (Job market health)
- 😊 **Satisfaction & Health** (Public approval and healthcare quality)
- 🚨 **Crime & Events** (Security and random events like pandemics)
**What the agent does:**
- Sets tax rates, healthcare budgets, education budgets, and police budgets.
- Activates sector subsidies and triggers emergency responses (such as lockdowns).
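Under an OpenEnv-style `reset`/`step` interface, a 50-turn episode might look like the sketch below. This is illustrative only: the class and field names (`CivicEnvSketch`, `Observation`, `Action`) and the toy dynamics are assumptions, not CivicAI's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    gdp: float           # economic output index
    inflation: float     # price stability
    employment: float    # fraction employed
    satisfaction: float  # public approval, 0..1
    health: float        # healthcare quality, 0..1
    crime: float         # crime index, 0..1

@dataclass
class Action:
    tax_rate: float          # 0..1
    healthcare_budget: float
    education_budget: float
    police_budget: float

class CivicEnvSketch:
    """Toy stand-in for the CivicAI environment (illustrative only)."""
    HORIZON = 50  # 50-turn episode, as in CivicAI

    def reset(self) -> Observation:
        self.turn = 0
        self.obs = Observation(1.0, 0.02, 0.95, 0.6, 0.6, 0.3)
        return self.obs

    def step(self, action: Action):
        self.turn += 1
        # Stylized dynamics: taxes fund services but dampen GDP growth.
        self.obs.gdp *= 1.0 + 0.02 - 0.03 * action.tax_rate
        self.obs.health = min(1.0, self.obs.health + 0.01 * action.healthcare_budget)
        self.obs.crime = max(0.0, self.obs.crime - 0.01 * action.police_budget)
        reward = 0.5 * self.obs.satisfaction + 0.3 * self.obs.health - 0.2 * self.obs.crime
        done = self.turn >= self.HORIZON
        return self.obs, reward, done

env = CivicEnvSketch()
obs = env.reset()
done, total = False, 0.0
while not done:
    act = Action(tax_rate=0.3, healthcare_budget=1.0,
                 education_budget=1.0, police_budget=1.0)
    obs, reward, done = env.step(act)
    total += reward
```

The key structural point is that each lever feeds back into multiple metrics, so a greedy one-metric policy is rarely optimal over the full horizon.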
**How it gets rewarded (Hard-to-Game Rubrics):**
The agent is judged by a strict set of OpenEnv Rubrics that prevent gaming. For example, the agent gets heavily penalized for achieving a high GDP if it causes hyperinflation, or for keeping satisfaction high while letting wealth inequality spiral out of control.
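The "hard-to-game" idea can be sketched as a scoring function that discounts good headline numbers when they come at the cost of stability or equity. The thresholds, weights, and function name below are illustrative assumptions, not CivicAI's actual rubric values.

```python
def rubric_score(gdp_growth: float, inflation: float,
                 satisfaction: float, gini: float) -> float:
    """Illustrative hard-to-game rubric: headline metrics are
    multiplied down when they are bought with instability or inequality."""
    gdp_score = min(gdp_growth / 0.05, 1.0)        # saturate at 5% growth
    if inflation > 0.10:                           # hyperinflation guard
        gdp_score *= max(0.0, 1.0 - (inflation - 0.10) * 5.0)
    sat_score = satisfaction
    if gini > 0.45:                                # inequality guard
        sat_score *= max(0.0, 1.0 - (gini - 0.45) * 2.0)
    return 0.5 * gdp_score + 0.5 * sat_score

# High GDP growth with hyperinflation scores worse than modest, stable growth:
inflated = rubric_score(0.08, 0.25, 0.7, 0.40)
stable = rubric_score(0.03, 0.02, 0.7, 0.40)
```

Because the penalties are multiplicative, an agent cannot trade unbounded gains on one axis against losses on another β€” the strategy the rubric is designed to block.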
---
## πŸ“Š 3. The Results: What Changed After Training?
We didn't just build the environment; we showed it is learnable. We trained a PyTorch REINFORCE policy against the CivicAI environment.
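A vanilla REINFORCE setup for continuous policy levers can be sketched as follows. This is a minimal sketch, not the competition training code: the network shape, the Beta-distribution parameterization, and the `toy_step` dynamics are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# A small MLP maps the 6-metric observation to Beta-distribution parameters
# for 4 continuous policy levers (tax rate + three budget shares in [0, 1]).
OBS_DIM, ACT_DIM = 6, 4

policy = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.Tanh(),
                       nn.Linear(32, 2 * ACT_DIM), nn.Softplus())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout(env_step, horizon=50):
    """Collect one episode of action log-probs and rewards."""
    obs = torch.zeros(OBS_DIM)
    log_probs, rewards = [], []
    for _ in range(horizon):
        params = policy(obs) + 1e-3          # keep Beta params strictly positive
        alpha, beta = params[:ACT_DIM], params[ACT_DIM:]
        dist = torch.distributions.Beta(alpha, beta)
        action = dist.sample()               # all levers land in (0, 1)
        log_probs.append(dist.log_prob(action).sum())
        obs, reward = env_step(obs, action)  # caller-supplied dynamics
        rewards.append(reward)
    return log_probs, rewards

def reinforce_update(log_probs, rewards, gamma=0.99):
    """Vanilla REINFORCE: ascend sum of log pi(a|s) * normalized return."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy dynamics stand-in: reward favors a moderate tax rate (first lever).
def toy_step(obs, action):
    return obs, float(1.0 - (action[0] - 0.3).abs())

lp, rs = rollout(toy_step)
loss = reinforce_update(lp, rs)
```

Normalizing the returns acts as a simple baseline, which keeps gradient variance manageable over the 50-step horizon.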
### Agent vs Random Performance
| Task | Random Baseline | RL Agent | Improvement |
|------|-------------|-------------|-------------|
| Economic Stability | 0.6360 | 0.7725 (Peak) | **+0.1365 (Peak)** |
| Pandemic Management | 0.5494 | 0.5768 | **+0.0274** |
| Social Crisis | 0.4649 | 0.4881 | **+0.0232** |
### Training Reward Curve
![Training Reward Curve and Agent Comparison](assets/reward_curve.png)
*Left: The agent's reward curve over 150 epochs on the Economic Stability task, showing clear policy improvement before the curve plateaus (and later degrades). Right: Agent vs random baseline performance across the three difficulty tiers.*
The curve shows the agent learning to stabilize its policy choices over time: it avoids extreme tax rates and balances its budget allocations.
---
## πŸ’‘ 4. Why Does It Matter?
This matters because it pushes agents out of the "chatbot" paradigm. We are building the proving ground for **AI system managers**.
- **Safety researchers** can test how agents handle complex moral tradeoffs.
- **RL researchers** get a non-linear, delayed-reward benchmark that is far more realistic than block-world games.
- **Policy makers** get a primitive simulation to understand how AI might approach governance.
---
## πŸš€ Try It Yourself
- πŸ–₯ **[Live Dashboard](https://huggingface.co/spaces/mahammadaftab/CivicAI)** β€” Interactive society simulation
- πŸ’» **GitHub Code** β€” Full source code
---
## 🏷 Tags
`openenv` Β· `multi-agent` Β· `society-simulation` Β· `reinforcement-learning`
---
*Built for the OpenEnv Competition πŸ†*