# CivicAI: Teaching AI to Govern a Society

> *Can AI learn to balance economics, healthcare, crime, and public satisfaction, all at once?*
## 1. The Problem: The Capability Gap

The world doesn't need another polished, simple game environment. Real-world governance is **messy, ambiguous, and impossibly hard**. Every policy decision has:

- **Competing objectives** (e.g., raising taxes funds healthcare but hurts economic growth).
- **Delayed consequences** (e.g., education spending takes years to show results).
- **Cascading failures** (e.g., unemployment → crime → protests → satisfaction collapse).

We built **CivicAI**, an OpenEnv-compliant simulation where AI agents learn to govern a society of 10 million people. It is designed to test whether LLMs and RL agents can actually handle **long-horizon planning under uncertainty** and **multi-objective optimization**, a capability gap in today's token-predicting LLMs.
---
## 2. The Environment: What Agents See and Do

CivicAI simulates a complete society with interconnected metrics over a 50-turn episode.

**What the agent sees:**

- **GDP & Inflation** (economic output and price stability)
- **Employment** (job market health)
- **Satisfaction & Health** (public approval and healthcare quality)
- **Crime & Events** (security and random shocks such as pandemics)
**What the agent does:**

- Sets tax rates and healthcare, education, and police budgets.
- Activates sector subsidies and triggers emergency responses (like lockdowns).
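This observe-and-act cycle can be sketched as a standard environment loop. The class, metric names, and toy dynamics below are illustrative assumptions for the sketch, not CivicAI's actual API:

```python
import random
from dataclasses import dataclass

@dataclass
class PolicyAction:
    tax_rate: float           # fraction of income collected as tax
    healthcare_budget: float  # budget shares of collected revenue
    education_budget: float
    police_budget: float

class ToySocietyEnv:
    """Stand-in environment with crudely interconnected metrics."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.turn = 0
        self.state = {"gdp": 1.0, "inflation": 0.02, "employment": 0.95,
                      "satisfaction": 0.70, "health": 0.70, "crime": 0.10}
        return dict(self.state)

    def step(self, a: PolicyAction):
        s = self.state
        # Competing objectives: taxes fund services but drag on growth.
        s["gdp"] *= 1.02 - 0.05 * a.tax_rate + self.rng.uniform(-0.01, 0.01)
        s["health"] = min(1.0, s["health"] + 0.05 * a.healthcare_budget - 0.01)
        s["crime"] = max(0.0, s["crime"] - 0.05 * a.police_budget + 0.005)
        s["satisfaction"] += 0.02 * s["health"] - 0.01 * a.tax_rate
        self.turn += 1
        reward = s["satisfaction"] + s["gdp"] - s["crime"]
        return dict(s), reward, self.turn >= 50, {}

# Roll one 50-turn episode with a fixed (non-learning) policy.
env = ToySocietyEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = PolicyAction(tax_rate=0.3, healthcare_budget=0.4,
                          education_budget=0.3, police_budget=0.3)
    obs, r, done, _ = env.step(action)
    total += r
print(f"50-turn return: {total:.2f}")
```

Even this toy version shows the cascading structure: a low police budget lets crime creep up, which directly eats into the per-turn reward.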
**How it gets rewarded (hard-to-game rubrics):**

The agent is judged by a strict set of OpenEnv rubrics that prevent reward gaming. For example, the agent is heavily penalized for achieving high GDP at the cost of hyperinflation, or for keeping satisfaction high while letting wealth inequality spiral out of control.
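A rubric of this shape can be sketched as conditional scoring, where a metric only earns credit while its side-effect metrics stay bounded. The thresholds and metric names below are illustrative assumptions, not the actual CivicAI rubrics:

```python
def rubric_score(metrics: dict) -> float:
    """Score a turn so that gaming one metric at another's expense backfires."""
    score = 0.0
    # GDP growth only counts while inflation stays bounded.
    if metrics["inflation"] < 0.10:
        score += metrics["gdp_growth"]
    else:
        score -= 1.0  # heavy penalty: growth via hyperinflation is worthless
    # Satisfaction only counts while inequality (Gini) stays bounded.
    if metrics["gini"] < 0.45:
        score += metrics["satisfaction"]
    else:
        score -= 0.5
    return score

# A healthy economy scores well; an inflationary, unequal one is punished.
print(round(rubric_score({"inflation": 0.03, "gdp_growth": 0.02,
                          "gini": 0.35, "satisfaction": 0.80}), 2))  # -> 0.82
print(rubric_score({"inflation": 0.25, "gdp_growth": 0.10,
                    "gini": 0.55, "satisfaction": 0.90}))            # -> -1.5
```

The key design choice is that the penalty branches are larger in magnitude than the reward branches, so there is no budget mix where gaming a single metric is the optimal strategy.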
---
## 3. The Results: What Changed After Training?

We didn't just build the environment; we showed it is learnable. We trained a PyTorch REINFORCE policy against the CivicAI environment.
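The training setup follows the classic REINFORCE rule: raise the log-probability of sampled actions in proportion to the reward they earn. The actual policy is a PyTorch network; the dependency-free sketch below shows the same update on a two-action toy problem (the reward function and hyperparameters are illustrative assumptions):

```python
import math
import random

random.seed(0)
theta = [0.0, 0.0]  # one logit per action
lr = 0.1

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reward(action):
    # Action 1 is better on average (a stand-in for a good policy choice).
    return 1.0 if action == 1 else 0.2

for _ in range(500):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]
    r = reward(a)
    # REINFORCE gradient: d log pi(a) / d theta_k = 1[k == a] - pi(k)
    for k in range(2):
        grad = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += lr * r * grad

print(f"learned P(better action) = {softmax(theta)[1]:.2f}")
```

The same update scales to CivicAI's case, where the policy outputs budget allocations instead of two discrete actions and the reward arrives over a 50-turn horizon.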
### Agent vs Random Performance

| Task | Random Baseline | RL Agent | Improvement |
|------|-----------------|----------|-------------|
| Economic Stability | 0.6360 | 0.7725 (peak) | **+0.1365 (peak)** |
| Pandemic Management | 0.5494 | 0.5768 | **+0.0274** |
| Social Crisis | 0.4649 | 0.4881 | **+0.0232** |
### Training Reward Curve

*Left: the agent's reward curve over 150 epochs on the Economic Stability task, showing clear policy improvement before stabilization/degradation. Right: agent vs. random baseline performance across the three difficulty tiers.*

The curve clearly shows that the agent learns to stabilize its policy choices over time, avoiding extreme tax rates and balancing its budget allocations.
---
## 4. Why Does It Matter?

This matters because it pushes agents out of the "chatbot" paradigm. We are building a proving ground for **AI system managers**.

- **Safety researchers** can test how agents handle complex moral tradeoffs.
- **RL researchers** get a non-linear, delayed-reward benchmark that is far more realistic than block-world games.
- **Policy makers** get a primitive simulation for understanding how AI might approach governance.
---
## Try It Yourself

- **[Live Dashboard](https://huggingface.co/spaces/mahammadaftab/CivicAI)** – interactive society simulation
- **GitHub Code** – full source code
---
## Tags

`openenv` · `multi-agent` · `society-simulation` · `reinforcement-learning`
---
*Built for the OpenEnv Competition*