# 🏛 CivicAI: Teaching AI to Govern a Society

> *Can AI learn to balance economics, healthcare, crime, and public satisfaction, all at once?*

## 🎯 1. The Problem: The Capability Gap

The world doesn't need another polished, simplified game environment. Real-world governance is **messy, interconnected, and extremely hard**. Every policy decision involves:
- **Competing objectives** (e.g., raising taxes funds healthcare but slows economic growth).
- **Delayed consequences** (e.g., education spending takes years to show results).
- **Cascading failures** (e.g., unemployment → crime → protests → a collapse in satisfaction).

We built **CivicAI**, an OpenEnv-compliant simulation in which AI agents learn to govern a society of 10 million people. It is designed to test whether LLMs and RL agents can handle **long-horizon planning under uncertainty** and **multi-objective optimization**, capabilities where today's next-token-predicting LLMs still fall short.
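To make "delayed consequences" and "cascading failures" concrete, here is a toy sketch of how such effects can be wired into a turn-based simulation. The lag length, coefficients, and the `simulate` function are hypothetical and are not CivicAI's actual equations: education spending only affects employment after `EDUCATION_LAG` turns, and unemployment feeds crime, which in turn drags down satisfaction.

```python
from collections import deque

EDUCATION_LAG = 3  # hypothetical: turns before education spending pays off


def simulate(education_budgets, start_employment=0.90):
    """Toy dynamics, NOT CivicAI's real model: (employment, crime, satisfaction) per turn."""
    pipeline = deque([0.0] * EDUCATION_LAG)  # spending that has not matured yet
    employment = start_employment
    history = []
    for budget in education_budgets:
        pipeline.append(budget)
        matured = pipeline.popleft()  # spending made EDUCATION_LAG turns ago
        # Delayed consequence: only matured spending creates jobs; jobs also erode each turn.
        employment = min(1.0, employment + 0.02 * matured - 0.01)
        unemployment = 1.0 - employment
        # Cascade: unemployment drives crime, and both drag satisfaction down.
        crime = 0.5 * unemployment
        satisfaction = max(0.0, 1.0 - 2.0 * crime - unemployment)
        history.append((employment, crime, satisfaction))
    return history
```

Running `simulate([1.0] * 5)` shows employment slipping for the first three turns before the first education spending matures in turn four: the kind of credit-assignment problem a long-horizon agent has to solve.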

---

## 🌍 2. The Environment: What Agents See and Do

CivicAI simulates a society through interconnected metrics that evolve over a 50-turn episode:

**What the agent sees:**
- 💰 **GDP & Inflation** (Economic output and price stability)
- 💼 **Employment** (Job market health)
- 😊 **Satisfaction & Health** (Public approval and healthcare quality)
- 🚨 **Crime & Events** (Security and random events like pandemics)

**What the agent does:**
- Sets tax rates, healthcare budgets, education budgets, and police budgets.
- Chooses which sectors receive subsidies and which emergency responses (such as lockdowns) are active.
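One way to type these observation and action spaces in Python is sketched below. All field names, ranges, and the `SECTORS` list are assumptions for illustration, not CivicAI's actual schema; `random_action` shows what a random, budget-feasible policy (in the spirit of the random baseline in the results) could look like.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Observation:
    # Hypothetical field names mirroring the metrics the agent sees.
    gdp: float
    inflation: float
    employment: float
    satisfaction: float
    health: float
    crime: float
    active_events: list[str] = field(default_factory=list)  # e.g. ["pandemic"]


@dataclass
class Action:
    tax_rate: float           # assumed range 0.0 to 0.6
    healthcare_budget: float  # budget shares, assumed to sum to at most 1.0
    education_budget: float
    police_budget: float
    subsidized_sectors: list[str] = field(default_factory=list)
    lockdown: bool = False    # example emergency response


SECTORS = ["agriculture", "manufacturing", "technology"]  # hypothetical sector names


def random_action(rng: random.Random) -> Action:
    """Sample a random, budget-feasible policy (random-baseline style)."""
    shares = [rng.random() for _ in range(3)]
    total = sum(shares) or 1.0  # guard against an (extremely unlikely) all-zero draw
    healthcare, education, police = (s / total for s in shares)
    return Action(
        tax_rate=rng.uniform(0.0, 0.6),
        healthcare_budget=healthcare,
        education_budget=education,
        police_budget=police,
        subsidized_sectors=[s for s in SECTORS if rng.random() < 0.5],
        lockdown=rng.random() < 0.1,
    )
```

Normalizing the three budget shares keeps every sampled action feasible, so a random baseline never "wins" by overspending.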

**How it gets rewarded (Hard-to-Game Rubrics):**
The agent is scored by a strict set of OpenEnv Rubrics designed to resist reward gaming. For example, the agent is heavily penalized for achieving high GDP at the cost of hyperinflation, or for keeping satisfaction high while letting wealth inequality spiral out of control.
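The idea behind gaming-resistant scoring can be shown with a toy rubric: penalties are applied multiplicatively, so no single strong metric can buy back a violation elsewhere. The thresholds (10% inflation, a Gini coefficient of 0.5), the weights, and the `rubric_score` function are hypothetical; CivicAI's actual OpenEnv Rubrics may be structured differently.

```python
def rubric_score(gdp_growth, inflation, satisfaction, gini):
    """Toy multi-objective score in [0, 1]; thresholds and weights are assumptions.

    satisfaction is expected in [0, 1]; gini is a wealth-inequality coefficient.
    """
    # Reward growth up to 5% per turn, plus public satisfaction.
    base = 0.4 * min(max(gdp_growth / 0.05, 0.0), 1.0) + 0.6 * satisfaction
    penalty = 1.0
    if inflation > 0.10:   # runaway-inflation threshold (hypothetical)
        penalty *= 0.3
    if gini > 0.50:        # runaway-inequality threshold (hypothetical)
        penalty *= 0.5
    # Multiplicative gating: a violation scales down the whole score.
    return base * penalty
```

With these assumed numbers, 5% growth at 20% inflation scores 0.264, well below the 0.64 earned by 2% growth with stable prices, so overheating the economy does not pay.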

---

## 📊 3. The Results: What Changed After Training?

We didn't just build the environment; we also checked that it is learnable. We trained a PyTorch REINFORCE policy on the CivicAI environment and compared it against a random baseline on all three tasks.
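The post trains a PyTorch REINFORCE policy; the dependency-free sketch below shows the same algorithm in plain Python so the mechanics are visible. Everything environment-specific is an assumption: the policy is state-free and chooses among hypothetical discrete `TAX_RATES`, and `toy_reward` stands in for the real CivicAI reward. The update is the standard REINFORCE estimator with a per-timestep moving-average baseline to reduce variance.

```python
import math
import random

TAX_RATES = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical discrete action set


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def toy_reward(tax_rate):
    # Hypothetical stand-in for the environment: a 30% tax rate scores best.
    return 1.0 - 2.0 * abs(tax_rate - 0.3)


def reinforce(episodes=2000, horizon=5, lr=0.05, gamma=0.99, seed=0):
    """REINFORCE with a per-timestep moving-average baseline."""
    rng = random.Random(seed)
    n = len(TAX_RATES)
    logits = [0.0] * n
    baselines = [0.0] * horizon
    for _ in range(episodes):
        probs = softmax(logits)  # policy used to sample this whole episode
        actions, rewards = [], []
        for _ in range(horizon):
            a = rng.choices(range(n), weights=probs)[0]
            actions.append(a)
            rewards.append(toy_reward(TAX_RATES[a]))
        # Discounted return G_t from each timestep to the end of the episode.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        # Gradient ascent on sum_t (G_t - b_t) * grad log pi(a_t);
        # for a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi_i.
        for t, (a, g_t) in enumerate(zip(actions, returns)):
            advantage = g_t - baselines[t]
            for i in range(n):
                logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
            baselines[t] += 0.1 * (g_t - baselines[t])  # update after use
    return softmax(logits)
```

Calling `reinforce()` returns action probabilities that concentrate on the 30% rate, the optimum of the toy reward. In the real environment the policy would condition on the observation (for example via a small neural network), but the return computation and the gradient step have the same shape.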

### Agent vs Random Performance

| Task | Random Baseline | RL Agent | Improvement |
|------|-------------|-------------|-------------|
| Economic Stability | 0.6360 | 0.7725 (Peak) | **+0.1365 (Peak)** |
| Pandemic Management | 0.5494 | 0.5768 | **+0.0274** |
| Social Crisis | 0.4649 | 0.4881 | **+0.0232** |
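The Improvement column is the absolute difference between the two scores. The snippet below recomputes it from the table and adds the relative gain over the random baseline; the `improvements` helper is only for illustration.

```python
# Scores copied from the table above (Economic Stability uses the agent's peak score).
RESULTS = {
    "Economic Stability": (0.6360, 0.7725),
    "Pandemic Management": (0.5494, 0.5768),
    "Social Crisis": (0.4649, 0.4881),
}


def improvements(results):
    """Map each task to (absolute gain, relative gain over the random baseline)."""
    return {
        task: (agent - baseline, (agent - baseline) / baseline)
        for task, (baseline, agent) in results.items()
    }


for task, (absolute, relative) in improvements(RESULTS).items():
    print(f"{task}: {absolute:+.4f} ({relative:+.1%})")
```

In relative terms that is roughly a 21% gain on Economic Stability (at the peak) and about 5% on each of the other two tasks.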

### Training Reward Curve

![Training Reward Curve and Agent Comparison](assets/reward_curve.png)
*Left: the agent's reward over 150 training epochs on the Economic Stability task, showing clear policy improvement before the reward stabilizes or degrades. Right: agent vs. random baseline performance across the three difficulty tiers.*

The curve shows the agent improving its policy over training; the learned policy tends to avoid extreme tax rates and to balance its budget allocations.
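Because the curve can degrade after its peak, and the table reports the Economic Stability score at its peak, it helps to report both the best smoothed epoch and the final one. A minimal sketch, assuming per-epoch rewards are available as a list; the `moving_average` and `peak_and_final` helpers and the window size are hypothetical and not necessarily how the reported peak was computed.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 points average fewer values."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def peak_and_final(rewards, window=10):
    """Return (peak epoch, smoothed peak reward, smoothed final reward)."""
    smoothed = moving_average(rewards, window)
    peak_epoch = max(range(len(smoothed)), key=smoothed.__getitem__)
    return peak_epoch, smoothed[peak_epoch], smoothed[-1]
```

A large gap between the smoothed peak and the smoothed final reward is a signal that keeping the best checkpoint (early stopping) matters more than training longer.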

---

## 💡 4. Why Does It Matter?

This matters because it pushes agents beyond the "chatbot" paradigm: we are building a proving ground for **AI system managers**.
- **Safety researchers** can test how agents handle complex moral trade-offs.
- **RL researchers** get a nonlinear, delayed-reward benchmark that is far more realistic than block-world games.
- **Policymakers** get a simplified simulation for exploring how AI might approach governance.

---

## 🚀 Try It Yourself

- 🖥 **[Live Dashboard](https://huggingface.co/spaces/mahammadaftab/CivicAI)**: interactive society simulation
- 💻 **GitHub Code**: full source code

---

## 🏷 Tags

`openenv` · `multi-agent` · `society-simulation` · `reinforcement-learning`

---

*Built for the OpenEnv Competition 🏆*