# ๐Ÿ› CivicAI: Teaching AI to Govern a Society > *Can AI learn to balance economics, healthcare, crime, and public satisfaction โ€” all at once?* ## ๐ŸŽฏ 1. The Problem: The Capability Gap The world doesn't need another polished, simple game environment. Real-world governance is **messy, ambitious, and impossibly hard**. Every policy decision has: - **Competing objectives** (e.g., raising taxes funds healthcare but hurts economic growth). - **Delayed consequences** (e.g., education spending takes years to show results). - **Cascading failures** (e.g., unemployment โ†’ crime โ†’ protests โ†’ satisfaction collapse). We built **CivicAI** โ€” an OpenEnv-compliant simulation where AI agents learn to govern a society of 10 million people. It is designed to test if LLMs and RL agents can actually handle **long-horizon planning under uncertainty** and **multi-objective optimization**, a capability gap in modern token-predicting LLMs. --- ## ๐ŸŒ 2. The Environment: What Agents See and Do CivicAI simulates a complete society with interconnected metrics over a 50-turn episode: **What the agent sees:** - ๐Ÿ’ฐ **GDP & Inflation** (Economic output and price stability) - ๐Ÿ’ผ **Employment** (Job market health) - ๐Ÿ˜Š **Satisfaction & Health** (Public approval and healthcare quality) - ๐Ÿšจ **Crime & Events** (Security and random events like pandemics) **What the agent does:** - Sets tax rates, healthcare budgets, education budgets, and police budgets. - Dictates active sector subsidies and emergency responses (like lockdowns). **How it gets rewarded (Hard-to-Game Rubrics):** The agent is judged by a strict set of OpenEnv Rubrics that prevent gaming. For example, the agent gets heavily penalized for achieving a high GDP if it causes hyperinflation, or for keeping satisfaction high while letting wealth inequality spiral out of control. --- ## ๐Ÿ“Š 3. The Results: What Changed After Training? We didn't just build the environment; we proved it is learnable. We trained a PyTorch REINFORCE policy against the CivicAI environment. ### Agent vs Random Performance | Task | Random Baseline | RL Agent | Improvement | |------|-------------|-------------|-------------| | Economic Stability | 0.6360 | 0.7725 (Peak) | **+0.1365 (Peak)** | | Pandemic Management | 0.5494 | 0.5768 | **+0.0274** | | Social Crisis | 0.4649 | 0.4881 | **+0.0232** | ### Training Reward Curve ![Training Reward Curve and Agent Comparison](assets/reward_curve.png) *Left: The agent's reward curve over 150 epochs on the Economic Stability task, showing clear policy improvement before stabilization/degradation. Right: Agent vs random baseline performance across the three difficulty tiers.* The curve clearly demonstrates that the agent learns to stabilize its policy choices over time, avoiding extreme tax rates and balancing the budget allocations. --- ## ๐Ÿ’ก 4. Why Does It Matter? This matters because it pushes agents out of the "chatbot" paradigm. We are building the proving ground for **AI system managers**. - **Safety researchers** can test how agents handle complex moral tradeoffs. - **RL researchers** get a non-linear, delayed-reward benchmark that is far more realistic than block-world games. - **Policy makers** get a primitive simulation to understand how AI might approach governance. 
---

## 🚀 Try It Yourself

- 🖥 **[Live Dashboard](https://huggingface.co/spaces/mahammadaftab/CivicAI)**: Interactive society simulation
- 💻 **GitHub Code**: Full source code

---

## 🏷 Tags

`openenv` · `multi-agent` · `society-simulation` · `reinforcement-learning`

---

*Built for the OpenEnv Competition 🏆*