Update README.md
README.md
CHANGED
@@ -122,19 +122,19 @@ We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimizat
 👉 **This demonstrates measurable learning, not random behavior.**

 ### Reward Curve
-
+Training Reward Curve

 [image]

-
+
 *The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*

 ### Baseline vs. Trained Comparison
-
+Comparison Chart

 [image]

-
+
 *The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*

 ---