mahammadaftab committed · Commit 04588be · verified · 1 Parent(s): 35fabb8

Update README.md

Files changed (1)
  1. README.md +14 -6
README.md CHANGED
@@ -122,11 +122,11 @@ We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimizat
 👉 **This demonstrates measurable learning, not random behavior.**
 
 ### Reward Curve
-![Training Reward Curve](assets/reward_curve.png)
+![Training Reward Curve](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/J_jMixXqJNBc7AEYp4hxr.png)
 *The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
 
 ### Baseline vs. Trained Comparison
-![Comparison Chart](assets/comparison_chart.png)
+![Comparison Chart](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/tNnWRZDymTsXVTPfVbtAt.png)
 *The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
 
 ---
@@ -134,7 +134,7 @@ We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimizat
 ## 🧪 Reproducibility
 
 **You can reproduce results in under 5 minutes:**
-1. Open the [Colab notebook](https://colab.research.google.com/drive/1examplelinkplaceholder123)
+1. Open the [Colab notebook](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
 2. Enable GPU
 3. Run all cells
 4. Observe reward improvement
@@ -190,6 +190,6 @@ CivicAI demonstrates that:
 
 ## 🔗 Links & Resources
 
-- 🚀 **Demo (HuggingFace Space):** [https://huggingface.co/spaces/mahammadaftab/AI_Society_Simulator](https://huggingface.co/spaces/mahammadaftab/AI_Society_Simulator)
-- 📓 **Training Notebook (Colab):** [https://colab.research.google.com/drive/1examplelinkplaceholder123](https://colab.research.google.com/drive/1examplelinkplaceholder123)
-- 📝 **Write-up / HuggingFace Blog:** [Read the HF Blog Post](BLOG.md)
+- 🚀 **Demo (HuggingFace Space):** [https://huggingface.co/spaces/mahammadaftab/CivicAI/](https://huggingface.co/spaces/mahammadaftab/CivicAI/)
+- 📓 **Training Notebook (Colab):** [https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
+- 📝 **Write-up / HuggingFace Blog:** [Read the HF Blog Post](https://huggingface.co/spaces/mahammadaftab/CivicAI/blob/main/BLOG.md)
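The figure captions refer to a "multi-objective reward" that the PPO agent maximizes. As a minimal sketch of what such a reward might look like, the function below combines a few fiscal indicators into one scalar; the indicator names, weights, and 2% inflation target are hypothetical illustrations, not the project's actual implementation (see the Colab notebook for that):

```python
def fiscal_reward(gdp_growth: float, inflation: float, gini: float,
                  w_growth: float = 1.0, w_inflation: float = 2.0,
                  w_equity: float = 0.5) -> float:
    """Hypothetical multi-objective reward: reward GDP growth while
    penalizing deviation from a 2% inflation target and high inequality."""
    inflation_penalty = abs(inflation - 0.02)  # distance from target
    return (w_growth * gdp_growth
            - w_inflation * inflation_penalty
            - w_equity * gini)

# A stable policy (3% growth, on-target inflation, moderate inequality)
# scores higher than a high-growth but inflationary, unequal one.
stable = fiscal_reward(gdp_growth=0.03, inflation=0.02, gini=0.30)
volatile = fiscal_reward(gdp_growth=0.05, inflation=0.10, gini=0.45)
print(stable > volatile)  # True under these hypothetical weights
```

Scalarizing several objectives with fixed weights like this is the simplest way to hand a multi-objective task to a single-reward PPO loop; the trade-off between growth, price stability, and equity is then baked into the weights.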