Spaces: mahammadaftab/CivicAI

mahammadaftab committed 28 days ago

Commit e97e92a (verified) · 1 Parent(s): 04588be

Update README.md

Files changed (1)

README.md CHANGED

@@ -122,19 +122,19 @@ We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimizat
 👉 **This demonstrates measurable learning, not random behavior.**
 ### Reward Curve
-![Training Reward Curve](
 ![Screenshot 2026-04-26 163716](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/J_jMixXqJNBc7AEYp4hxr.png)
-)
 *The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
 ### Baseline vs. Trained Comparison
-![Comparison Chart](
 ![Screenshot 2026-04-26 164009](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/tNnWRZDymTsXVTPfVbtAt.png)
-)
 *The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
 ---

 👉 **This demonstrates measurable learning, not random behavior.**
 ### Reward Curve
+Training Reward Curve
 ![Screenshot 2026-04-26 163716](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/J_jMixXqJNBc7AEYp4hxr.png)
 *The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
 ### Baseline vs. Trained Comparison
+Comparison Chart
 ![Screenshot 2026-04-26 164009](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/tNnWRZDymTsXVTPfVbtAt.png)
 *The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
 ---