Update README.md

README.md
We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimization).
**This demonstrates measurable learning, not random behavior.**

### Reward Curve
*The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
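The "multi-objective reward" is not spelled out in this section, so as a hedged illustration only: a reward of this shape is typically a weighted combination of fiscal objectives. The objective names, target values, and weights below are assumptions for the sketch, not the project's actual reward terms.

```python
def multi_objective_reward(gdp_growth: float, inflation: float, deficit: float,
                           weights=(1.0, 1.0, 0.5)) -> float:
    """Combine fiscal objectives into one scalar reward (illustrative).

    Rewards GDP growth, penalizes deviation of inflation from an assumed
    2% target, and penalizes running a deficit.
    """
    w_growth, w_infl, w_deficit = weights
    return (w_growth * gdp_growth
            - w_infl * abs(inflation - 0.02)
            - w_deficit * max(deficit, 0.0))
```

With this shape, a policy that hits the inflation target with positive growth and no deficit scores strictly higher than one that misses the target, which is what gives PPO a usable gradient signal.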
### Baseline vs. Trained Comparison
*The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
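The comparison above can be sketched as a simple evaluation loop over many episodes. The toy task and both policies below are illustrative stand-ins, not the project's actual environment or agent.

```python
import random

def evaluate(policy, episodes=100, seed=0):
    """Average episode reward of `policy` on a toy 'match the state' task."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state = rng.random()                # scalar observation in [0, 1)
        action = policy(state, rng)         # action in [0, 1)
        total += 1.0 - abs(action - state)  # reward peaks when action == state
    return total / episodes

def random_baseline(state, rng):
    return rng.random()                     # ignores the observation

def trained_policy(state, rng):
    return state                            # has learned to match the state

assert evaluate(trained_policy) > evaluate(random_baseline)
```

The same protocol applies per difficulty tier: fix the evaluation seed, run both policies over the same episodes, and report the gap in mean reward.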
---
## 🧪 Reproducibility

**You can reproduce the results in under 5 minutes:**
1. Open the [Colab notebook](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
2. Enable GPU
3. Run all cells
4. Observe reward improvement
## 🔗 Links & Resources
- **Demo (HuggingFace Space):** [https://huggingface.co/spaces/mahammadaftab/CivicAI/](https://huggingface.co/spaces/mahammadaftab/CivicAI/)
- **Training Notebook (Colab):** [https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
- **Write-up / HuggingFace Blog:** [Read the HF Blog Post](https://huggingface.co/spaces/mahammadaftab/CivicAI/blob/main/BLOG.md)