mahammadaftab committed on
Commit
e97e92a
·
verified ·
1 Parent(s): 04588be

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -122,19 +122,19 @@ We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimizat
  👉 **This demonstrates measurable learning, not random behavior.**

  ### Reward Curve
- ![Training Reward Curve](
+ Training Reward Curve

  ![Screenshot 2026-04-26 163716](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/J_jMixXqJNBc7AEYp4hxr.png)

- )
+
  *The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*

  ### Baseline vs. Trained Comparison
- ![Comparison Chart](
+ Comparison Chart

  ![Screenshot 2026-04-26 164009](https://cdn-uploads.huggingface.co/production/uploads/68e1066110db6d257dfceb12/tNnWRZDymTsXVTPfVbtAt.png)

- )
+
  *The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*

  ---