Update README.md

README.md
We trained a GPT-2 policy agent using HuggingFace TRL (Proximal Policy Optimization).
**This demonstrates measurable learning, not random behavior.**

### Reward Curve
*The PPO agent successfully learns to outperform the random baseline, finding stable fiscal policies that maximize the multi-objective reward.*
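The "multi-objective reward" is not spelled out in this section, so as a hedged illustration only: a reward of this shape is typically a weighted combination of fiscal objectives. The objective names, target values, and weights below are assumptions for the sketch, not the project's actual reward terms.

```python
def multi_objective_reward(gdp_growth: float, inflation: float, deficit: float,
                           weights=(1.0, 1.0, 0.5)) -> float:
    """Combine fiscal objectives into one scalar reward (illustrative).

    Rewards GDP growth, penalizes deviation of inflation from an assumed
    2% target, and penalizes running a deficit.
    """
    w_growth, w_infl, w_deficit = weights
    return (w_growth * gdp_growth
            - w_infl * abs(inflation - 0.02)
            - w_deficit * max(deficit, 0.0))
```

With this shape, a policy that hits the inflation target with positive growth and no deficit scores strictly higher than one that misses the target, which is what gives PPO a usable gradient signal.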
### Baseline vs. Trained Comparison
*The trained agent demonstrates significant improvement across all difficulty tiers, particularly in the macroeconomic stabilization task.*
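The comparison above can be sketched as a simple evaluation loop over many episodes. The toy task and both policies below are illustrative stand-ins, not the project's actual environment or agent.

```python
import random

def evaluate(policy, episodes=100, seed=0):
    """Average episode reward of `policy` on a toy 'match the state' task."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state = rng.random()                # scalar observation in [0, 1)
        action = policy(state, rng)         # action in [0, 1)
        total += 1.0 - abs(action - state)  # reward peaks when action == state
    return total / episodes

def random_baseline(state, rng):
    return rng.random()                     # ignores the observation

def trained_policy(state, rng):
    return state                            # has learned to match the state

assert evaluate(trained_policy) > evaluate(random_baseline)
```

The same protocol applies per difficulty tier: fix the evaluation seed, run both policies over the same episodes, and report the gap in mean reward.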
---
## 🧪 Reproducibility

**You can reproduce the results in under 5 minutes:**
1. Open the [Colab notebook](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
2. Enable GPU
3. Run all cells
4. Observe reward improvement
## 🔗 Links & Resources
- **Demo (HuggingFace Space):** [https://huggingface.co/spaces/mahammadaftab/CivicAI/](https://huggingface.co/spaces/mahammadaftab/CivicAI/)
- **Training Notebook (Colab):** [https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing](https://colab.research.google.com/drive/1VhW1LdFTEuQ9i9h65EDxl5_H3qmD1H0v?usp=sharing)
- **Write-up / HuggingFace Blog:** [Read the HF Blog Post](https://huggingface.co/spaces/mahammadaftab/CivicAI/blob/main/BLOG.md)