Spaces:
Paused
Paused
Commit ·
e2c547b
1
Parent(s): fc3950d
la la la --123
Browse files- .gitignore +2 -1
- SIMULATION_REPORT.md +0 -276
- plots/.gitkeep +0 -0
- plots/baseline_leaderboard.png +3 -0
- plots/baseline_trajectories.png +3 -0
- plots/before_after.png +3 -0
- plots/reward_curve.png +3 -0
- plots/training_log.csv +5 -0
- plots/training_summary.json +271 -0
- plots/training_trajectories.png +3 -0
- pyproject.toml +2 -9
- server/app.py +60 -0
- server/dashboard.html +12 -9
- server/simulation_history.json +1 -1802
- server/training.html +369 -0
- server/viraltest_environment.py +28 -4
- test_scenarios.py +3 -3
- training/run_llm_training.py +634 -0
- training/run_training_evidence.py +580 -0
- training/train_grpo.ipynb +925 -93
.gitignore
CHANGED
|
@@ -4,8 +4,9 @@
|
|
| 4 |
!.env.example
|
| 5 |
|
| 6 |
# Generated visualization outputs (regenerate: python visualize_optimal.py)
|
| 7 |
-
# Hugging Face Spaces rejects plain-git binary files; keep charts local or use Git LFS elsewhere.
|
| 8 |
*.png
|
|
|
|
|
|
|
| 9 |
|
| 10 |
__pycache__/
|
| 11 |
*.py[cod]
|
|
|
|
| 4 |
!.env.example
|
| 5 |
|
| 6 |
# Generated visualization outputs (regenerate: python visualize_optimal.py)
|
|
|
|
| 7 |
*.png
|
| 8 |
+
# But keep training evidence plots
|
| 9 |
+
!plots/*.png
|
| 10 |
|
| 11 |
__pycache__/
|
| 12 |
*.py[cod]
|
SIMULATION_REPORT.md
DELETED
|
@@ -1,276 +0,0 @@
|
|
| 1 |
-
# Viraltest Simulation Report
|
| 2 |
-
|
| 3 |
-
**Task:** Hard — Competitive (weekly_competitive)
|
| 4 |
-
**Episode Length:** 168 steps (7 days x 24 hours)
|
| 5 |
-
**Starting Followers:** 10,000 | **Starting Energy:** 1.00
|
| 6 |
-
|
| 7 |
-
---
|
| 8 |
-
|
| 9 |
-
## Executive Summary
|
| 10 |
-
|
| 11 |
-
11 agent strategies were evaluated on the Hard — Competitive task. The **Balanced Creator** (0.8775) and **Smart Agent** (0.8745) achieved the highest scores by combining strategic posting, energy management, and tag diversity. Two agents (**Spam Post**, **No Rest**) burned out within 8 steps, scoring 0.0000. The **Always Rest** agent lost 45% of its followers from inactivity.
|
| 12 |
-
|
| 13 |
-
---
|
| 14 |
-
|
| 15 |
-
## Leaderboard
|
| 16 |
-
|
| 17 |
-
| Rank | Scenario | Score | Followers | Delta | Energy | Burned Out |
|
| 18 |
-
|------|----------|-------|-----------|-------|--------|------------|
|
| 19 |
-
| 1 | Balanced Creator | **0.8775** | 12,534 | +2,534 (+25.3%) | 1.00 | No |
|
| 20 |
-
| 2 | Smart Agent | **0.8745** | 12,200 | +2,200 (+22.0%) | 1.00 | No |
|
| 21 |
-
| 3 | Tag Explorer | **0.8323** | 11,351 | +1,351 (+13.5%) | 0.94 | No |
|
| 22 |
-
| 4 | Copycat | **0.6136** | 11,589 | +1,589 (+15.9%) | 1.00 | No |
|
| 23 |
-
| 5 | Burst Poster | **0.6111** | 11,701 | +1,701 (+17.0%) | 0.44 | No |
|
| 24 |
-
| 6 | Queue Optimizer | **0.3520** | 11,215 | +1,215 (+12.2%) | 1.00 | No |
|
| 25 |
-
| 7 | Weekend Warrior | **0.1257** | 7,659 | -2,341 (-23.4%) | 1.00 | No |
|
| 26 |
-
| 8 | Night Poster | **0.0937** | 10,237 | +237 (+2.4%) | 0.59 | No |
|
| 27 |
-
| 9 | Always Rest | **0.0350** | 5,497 | -4,503 (-45.0%) | 1.00 | No |
|
| 28 |
-
| 10 | Spam Post | **0.0000** | 10,625 | +625 (+6.3%) | 0.00 | **YES** |
|
| 29 |
-
| 11 | No Rest | **0.0000** | 10,213 | +213 (+2.1%) | 0.00 | **YES** |
|
| 30 |
-
|
| 31 |
-
---
|
| 32 |
-
|
| 33 |
-
## Detailed Agent Analysis
|
| 34 |
-
|
| 35 |
-
### 1. Balanced Creator — Score: 0.8775 (BEST)
|
| 36 |
-
|
| 37 |
-
| Metric | Value |
|
| 38 |
-
|--------|-------|
|
| 39 |
-
| Steps Completed | 168 / 168 |
|
| 40 |
-
| Final Energy | 1.00 |
|
| 41 |
-
| Final Followers | 12,534 (+25.3%) |
|
| 42 |
-
| Engagement Rate | 0.827 |
|
| 43 |
-
| Total Posts | 28 |
|
| 44 |
-
| Total Rests | 84 |
|
| 45 |
-
| Content Created | 56 |
|
| 46 |
-
| Unique Tags | 19 |
|
| 47 |
-
| Min Energy | 0.795 (never dipped below safe zone) |
|
| 48 |
-
| Avg Reward | 0.219 |
|
| 49 |
-
| Max Reward | 0.738 |
|
| 50 |
-
|
| 51 |
-
**Strategy:** Create → Post → Rest cycle. Uses the content queue (56 items created, 28 posted from queue at 50% energy cost). Posts during peak hours with trending topics. Never risks burnout.
|
| 52 |
-
|
| 53 |
-
**Top Tags:** #food (1.32), #election (1.31), #coding (1.16), #saas (1.03), #crypto (1.02)
|
| 54 |
-
|
| 55 |
-
**Why it won:** Highest follower growth (+2,534), perfect energy management (never below 0.795), excellent tag diversity (19 unique), and consistent daily posting.
|
| 56 |
-
|
| 57 |
-
---
|
| 58 |
-
|
| 59 |
-
### 2. Smart Agent — Score: 0.8745
|
| 60 |
-
|
| 61 |
-
| Metric | Value |
|
| 62 |
-
|--------|-------|
|
| 63 |
-
| Steps Completed | 168 / 168 |
|
| 64 |
-
| Final Energy | 1.00 |
|
| 65 |
-
| Final Followers | 12,200 (+22.0%) |
|
| 66 |
-
| Engagement Rate | 1.556 |
|
| 67 |
-
| Total Posts | 14 |
|
| 68 |
-
| Total Rests | 154 |
|
| 69 |
-
| Unique Tags | 19 |
|
| 70 |
-
| Min Energy | 0.55 |
|
| 71 |
-
| Avg Reward | 0.230 |
|
| 72 |
-
| Max Reward | 0.760 |
|
| 73 |
-
|
| 74 |
-
**Strategy:** Posts only during peak hours (9-20) when energy > 0.4 and posts < 2/day. Uses trending topics and tags. Rests aggressively.
|
| 75 |
-
|
| 76 |
-
**Top Tags:** #ai (3.56), #wellness (2.55), #summer (2.36), #crypto (2.18), #newyear (2.01)
|
| 77 |
-
|
| 78 |
-
**Why it's strong:** Highest individual tag performance (#ai at 3.56), highest engagement rate (1.556), but fewer posts (14 vs 28) cost it the top spot.
|
| 79 |
-
|
| 80 |
-
---
|
| 81 |
-
|
| 82 |
-
### 3. Tag Explorer — Score: 0.8323
|
| 83 |
-
|
| 84 |
-
| Metric | Value |
|
| 85 |
-
|--------|-------|
|
| 86 |
-
| Steps Completed | 168 / 168 |
|
| 87 |
-
| Final Energy | 0.94 |
|
| 88 |
-
| Final Followers | 11,351 (+13.5%) |
|
| 89 |
-
| Engagement Rate | 0.774 |
|
| 90 |
-
| Total Posts | 15 |
|
| 91 |
-
| Unique Tags | **30** (highest) |
|
| 92 |
-
| Min Energy | 0.69 |
|
| 93 |
-
|
| 94 |
-
**Strategy:** New tag combination every post. Maximizes tag discovery — 30 unique tags used (the highest of all agents).
|
| 95 |
-
|
| 96 |
-
**Why it scored high:** The grading formula rewards tag diversity heavily. 30 unique tags gave a massive tag_discovery bonus.
|
| 97 |
-
|
| 98 |
-
---
|
| 99 |
-
|
| 100 |
-
### 4. Copycat — Score: 0.6136
|
| 101 |
-
|
| 102 |
-
| Metric | Value |
|
| 103 |
-
|--------|-------|
|
| 104 |
-
| Steps Completed | 168 / 168 |
|
| 105 |
-
| Final Energy | 1.00 |
|
| 106 |
-
| Final Followers | 11,589 (+15.9%) |
|
| 107 |
-
| Total Posts | 21 |
|
| 108 |
-
| Unique Tags | 8 |
|
| 109 |
-
| Min Energy | 0.10 (dangerous dip!) |
|
| 110 |
-
|
| 111 |
-
**Strategy:** Copies competitor topics and content types. Posts when competitors are active.
|
| 112 |
-
|
| 113 |
-
**Weakness:** High niche saturation from copying rivals. Only 8 unique tags (penalized). Min energy hit 0.10 — nearly burned out.
|
| 114 |
-
|
| 115 |
-
---
|
| 116 |
-
|
| 117 |
-
### 5. Burst Poster — Score: 0.6111
|
| 118 |
-
|
| 119 |
-
| Metric | Value |
|
| 120 |
-
|--------|-------|
|
| 121 |
-
| Steps Completed | 168 / 168 |
|
| 122 |
-
| Final Energy | 0.44 |
|
| 123 |
-
| Final Followers | 11,701 (+17.0%) |
|
| 124 |
-
| Total Posts | **57** (highest) |
|
| 125 |
-
| Unique Tags | 13 |
|
| 126 |
-
| Min Energy | 0.25 |
|
| 127 |
-
|
| 128 |
-
**Strategy:** 3 posts in rapid succession, then rests until recovered. Repeat.
|
| 129 |
-
|
| 130 |
-
**Weakness:** Ended with only 0.44 energy. 57 posts caused audience fatigue (posts > 3/day get heavy penalty). Low per-post engagement (0.208) despite high volume.
|
| 131 |
-
|
| 132 |
-
---
|
| 133 |
-
|
| 134 |
-
### 6. Queue Optimizer — Score: 0.3520
|
| 135 |
-
|
| 136 |
-
| Metric | Value |
|
| 137 |
-
|--------|-------|
|
| 138 |
-
| Steps Completed | 168 / 168 |
|
| 139 |
-
| Final Energy | 1.00 |
|
| 140 |
-
| Final Followers | 11,215 (+12.2%) |
|
| 141 |
-
| Total Posts | 14 |
|
| 142 |
-
| Content Created | 17 |
|
| 143 |
-
| Unique Tags | 12 |
|
| 144 |
-
|
| 145 |
-
**Strategy:** Creates content first (builds queue), then posts from queue at half energy cost.
|
| 146 |
-
|
| 147 |
-
**Weakness:** Spent too long in "prep" phase creating content. Only 14 actual posts despite 17 items queued. Score penalized for under-utilizing the queue.
|
| 148 |
-
|
| 149 |
-
---
|
| 150 |
-
|
| 151 |
-
### 7. Weekend Warrior — Score: 0.1257
|
| 152 |
-
|
| 153 |
-
| Metric | Value |
|
| 154 |
-
|--------|-------|
|
| 155 |
-
| Steps Completed | 168 / 168 |
|
| 156 |
-
| Final Followers | 7,659 **(-23.4%)** |
|
| 157 |
-
| Total Posts | 6 |
|
| 158 |
-
| Unique Tags | 6 |
|
| 159 |
-
|
| 160 |
-
**Strategy:** Only posts on Saturday and Sunday. Rests Mon-Fri.
|
| 161 |
-
|
| 162 |
-
**Weakness:** 5 days of inactivity triggered follower decay (-2,341) and algorithm penalty. Only 6 posts total. Weekend posting also gets a 0.7x penalty multiplier.
|
| 163 |
-
|
| 164 |
-
---
|
| 165 |
-
|
| 166 |
-
### 8. Night Poster — Score: 0.0937
|
| 167 |
-
|
| 168 |
-
| Metric | Value |
|
| 169 |
-
|--------|-------|
|
| 170 |
-
| Steps Completed | 168 / 168 |
|
| 171 |
-
| Final Followers | 10,237 (+2.4%) |
|
| 172 |
-
| Total Posts | 49 |
|
| 173 |
-
| Unique Tags | 2 |
|
| 174 |
-
| Engagement Rate | 0.036 |
|
| 175 |
-
|
| 176 |
-
**Strategy:** Posts exclusively at night (23:00-06:00) with boring topics.
|
| 177 |
-
|
| 178 |
-
**Weakness:** Night hours get 0.5x multiplier. Only 2 unique tags (#stoic, #minimalism) — severe tag penalty. Despite 49 posts, engagement was near-zero (0.036).
|
| 179 |
-
|
| 180 |
-
---
|
| 181 |
-
|
| 182 |
-
### 9. Always Rest — Score: 0.0350
|
| 183 |
-
|
| 184 |
-
| Metric | Value |
|
| 185 |
-
|--------|-------|
|
| 186 |
-
| Steps Completed | 168 / 168 |
|
| 187 |
-
| Final Followers | 5,497 **(-45.0%)** |
|
| 188 |
-
| Total Posts | 0 |
|
| 189 |
-
| Engagement Rate | 0.000 |
|
| 190 |
-
|
| 191 |
-
**Strategy:** Never posts. Rests every step.
|
| 192 |
-
|
| 193 |
-
**Result:** Zero engagement. Lost 4,503 followers (45%) to decay. Algorithm penalty stacked from inactivity. Energy stayed at 1.00 — completely wasted.
|
| 194 |
-
|
| 195 |
-
---
|
| 196 |
-
|
| 197 |
-
### 10. Spam Post — Score: 0.0000
|
| 198 |
-
|
| 199 |
-
| Metric | Value |
|
| 200 |
-
|--------|-------|
|
| 201 |
-
| Steps Completed | **4** / 168 |
|
| 202 |
-
| Final Energy | **0.00 (BURNED OUT)** |
|
| 203 |
-
| Final Followers | 10,625 (+6.3%) |
|
| 204 |
-
|
| 205 |
-
**Strategy:** Posts the same reel with "AI tools" topic every step. No rest.
|
| 206 |
-
|
| 207 |
-
**Result:** Burned out at step 4. Each reel costs 0.25 energy. 4 reels = 1.00 energy drained. Episode ended at step 4 with score 0.0000 (burnout = automatic fail on competitive task).
|
| 208 |
-
|
| 209 |
-
---
|
| 210 |
-
|
| 211 |
-
### 11. No Rest — Score: 0.0000
|
| 212 |
-
|
| 213 |
-
| Metric | Value |
|
| 214 |
-
|--------|-------|
|
| 215 |
-
| Steps Completed | **8** / 168 |
|
| 216 |
-
| Final Energy | **0.00 (BURNED OUT)** |
|
| 217 |
-
| Final Followers | 10,213 (+2.1%) |
|
| 218 |
-
|
| 219 |
-
**Strategy:** Posts varied content types but never rests.
|
| 220 |
-
|
| 221 |
-
**Result:** Burned out at step 8. Mixed content types (reel, carousel, story, text_post) averaged ~0.125 energy cost. 8 posts without rest = burnout. Score: 0.0000.
|
| 222 |
-
|
| 223 |
-
---
|
| 224 |
-
|
| 225 |
-
## Key Metrics Comparison
|
| 226 |
-
|
| 227 |
-
### Energy Management
|
| 228 |
-
| Agent | Min Energy | Final Energy | Energy Safety |
|
| 229 |
-
|-------|-----------|--------------|---------------|
|
| 230 |
-
| Always Rest | 1.000 | 1.00 | Wasted |
|
| 231 |
-
| Balanced | 0.795 | 1.00 | Excellent |
|
| 232 |
-
| Tag Explorer | 0.690 | 0.94 | Good |
|
| 233 |
-
| Queue Optimizer | 0.610 | 1.00 | Good |
|
| 234 |
-
| Smart Agent | 0.550 | 1.00 | Good |
|
| 235 |
-
| Burst Poster | 0.250 | 0.44 | Risky |
|
| 236 |
-
| Night Poster | 0.230 | 0.59 | Dangerous |
|
| 237 |
-
| Copycat | 0.100 | 1.00 | Near-fatal dip |
|
| 238 |
-
| Weekend | 0.100 | 1.00 | Near-fatal dip |
|
| 239 |
-
| No Rest | 0.000 | 0.00 | BURNED OUT |
|
| 240 |
-
| Spam Post | 0.000 | 0.00 | BURNED OUT |
|
| 241 |
-
|
| 242 |
-
### Posting Volume vs Quality
|
| 243 |
-
| Agent | Posts | Engagement Rate | Engagement per Post |
|
| 244 |
-
|-------|-------|----------------|---------------------|
|
| 245 |
-
| Burst | 57 | 0.208 | Low (fatigue) |
|
| 246 |
-
| Night Poster | 49 | 0.036 | Very low (timing) |
|
| 247 |
-
| Balanced | 28 | 0.827 | High |
|
| 248 |
-
| Copycat | 21 | 0.497 | Medium |
|
| 249 |
-
| Tag Explorer | 15 | 0.774 | High |
|
| 250 |
-
| Smart Agent | 14 | 1.556 | Very high |
|
| 251 |
-
| Queue Opt | 14 | 0.870 | High |
|
| 252 |
-
| Weekend | 6 | 0.635 | Medium |
|
| 253 |
-
| Spam | 4 | 1.567 | High (but burned out) |
|
| 254 |
-
|
| 255 |
-
---
|
| 256 |
-
|
| 257 |
-
## Lessons Learned
|
| 258 |
-
|
| 259 |
-
1. **Burnout is fatal** — On the competitive task, burnout = score 0.0000. Energy management is the #1 priority.
|
| 260 |
-
|
| 261 |
-
2. **Quality > Quantity** — Smart Agent posted only 14 times but had the highest engagement rate (1.556). Burst posted 57 times but scored lower.
|
| 262 |
-
|
| 263 |
-
3. **Tag diversity matters** — Tag Explorer's 30 unique tags boosted its score to 0.8323 despite moderate engagement. Night Poster's 2 tags destroyed its score.
|
| 264 |
-
|
| 265 |
-
4. **Content queue is powerful** — Balanced Creator used create_content (56 times) to build a queue, then posted at half energy cost. This enabled 28 posts while maintaining 0.795+ energy.
|
| 266 |
-
|
| 267 |
-
5. **Timing is critical** — Night Poster proved that posting at wrong hours (0.5x multiplier) wastes energy for near-zero engagement.
|
| 268 |
-
|
| 269 |
-
6. **Copying competitors backfires** — Copycat achieved decent followers but niche saturation penalty and low tag diversity (8) capped its score at 0.6136.
|
| 270 |
-
|
| 271 |
-
7. **Consistency beats bursts** — Posting 1-2/day consistently (Balanced, Smart) scored higher than bursting 3+ posts then resting (Burst).
|
| 272 |
-
|
| 273 |
-
---
|
| 274 |
-
|
| 275 |
-
*Report generated from Viraltest Creator Intelligence Center*
|
| 276 |
-
*Task: weekly_competitive | 168 hourly steps | 3 competitor profiles*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
plots/.gitkeep
ADDED
|
File without changes
|
plots/baseline_leaderboard.png
ADDED
|
Git LFS Details
|
plots/baseline_trajectories.png
ADDED
|
Git LFS Details
|
plots/before_after.png
ADDED
|
Git LFS Details
|
plots/reward_curve.png
ADDED
|
Git LFS Details
|
plots/training_log.csv
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
round,avg_grader,max_grader,min_grader,avg_reward,max_reward,min_reward,best_temperature
|
| 2 |
+
1,0.4958,0.7391,0.3698,6.07,6.104,6.037,1.4
|
| 3 |
+
2,0.4912,0.7236,0.2527,6.093,6.1,6.076,1.0
|
| 4 |
+
3,0.6015,0.7529,0.382,6.418,6.481,6.343,0.7
|
| 5 |
+
4,0.5548,0.7705,0.3764,6.467,6.527,6.366,0.7
|
plots/training_summary.json
ADDED
|
@@ -0,0 +1,271 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"model": "qwen2.5:3b-instruct-q4_K_M",
|
| 3 |
+
"device": "M4 Mac (Ollama local)",
|
| 4 |
+
"training_rounds": 4,
|
| 5 |
+
"episodes_per_round": 6,
|
| 6 |
+
"before": {
|
| 7 |
+
"monthly_engage": 0.3548,
|
| 8 |
+
"monthly_strategic": 0.6795,
|
| 9 |
+
"monthly_competitive": 0.3738
|
| 10 |
+
},
|
| 11 |
+
"after": {
|
| 12 |
+
"monthly_engage": 0.4086,
|
| 13 |
+
"monthly_strategic": 0.6273,
|
| 14 |
+
"monthly_competitive": 0.5101
|
| 15 |
+
},
|
| 16 |
+
"smart_heuristic": {
|
| 17 |
+
"monthly_engage": 0.4312,
|
| 18 |
+
"monthly_strategic": 0.7682,
|
| 19 |
+
"monthly_competitive": 0.8094
|
| 20 |
+
},
|
| 21 |
+
"improvement": {
|
| 22 |
+
"monthly_engage": 0.053800000000000014,
|
| 23 |
+
"monthly_strategic": -0.052200000000000024,
|
| 24 |
+
"monthly_competitive": 0.13629999999999998
|
| 25 |
+
},
|
| 26 |
+
"training_log": {
|
| 27 |
+
"round": [
|
| 28 |
+
1,
|
| 29 |
+
2,
|
| 30 |
+
3,
|
| 31 |
+
4
|
| 32 |
+
],
|
| 33 |
+
"avg_grader": [
|
| 34 |
+
0.4958,
|
| 35 |
+
0.4912,
|
| 36 |
+
0.6015,
|
| 37 |
+
0.5548
|
| 38 |
+
],
|
| 39 |
+
"max_grader": [
|
| 40 |
+
0.7391,
|
| 41 |
+
0.7236,
|
| 42 |
+
0.7529,
|
| 43 |
+
0.7705
|
| 44 |
+
],
|
| 45 |
+
"min_grader": [
|
| 46 |
+
0.3698,
|
| 47 |
+
0.2527,
|
| 48 |
+
0.382,
|
| 49 |
+
0.3764
|
| 50 |
+
],
|
| 51 |
+
"avg_reward": [
|
| 52 |
+
6.07,
|
| 53 |
+
6.093,
|
| 54 |
+
6.418,
|
| 55 |
+
6.467
|
| 56 |
+
],
|
| 57 |
+
"max_reward": [
|
| 58 |
+
6.104,
|
| 59 |
+
6.1,
|
| 60 |
+
6.481,
|
| 61 |
+
6.527
|
| 62 |
+
],
|
| 63 |
+
"min_reward": [
|
| 64 |
+
6.037,
|
| 65 |
+
6.076,
|
| 66 |
+
6.343,
|
| 67 |
+
6.366
|
| 68 |
+
],
|
| 69 |
+
"best_temperature": [
|
| 70 |
+
1.4,
|
| 71 |
+
1.0,
|
| 72 |
+
0.7,
|
| 73 |
+
0.7
|
| 74 |
+
]
|
| 75 |
+
},
|
| 76 |
+
"all_episodes": [
|
| 77 |
+
{
|
| 78 |
+
"round": 1,
|
| 79 |
+
"task": "monthly_engage",
|
| 80 |
+
"seed": 42,
|
| 81 |
+
"grader_score": 0.4395,
|
| 82 |
+
"total_reward": 6.1044,
|
| 83 |
+
"temperature": 1.4
|
| 84 |
+
},
|
| 85 |
+
{
|
| 86 |
+
"round": 1,
|
| 87 |
+
"task": "monthly_strategic",
|
| 88 |
+
"seed": 43,
|
| 89 |
+
"grader_score": 0.6758,
|
| 90 |
+
"total_reward": 6.0373,
|
| 91 |
+
"temperature": 1.4
|
| 92 |
+
},
|
| 93 |
+
{
|
| 94 |
+
"round": 1,
|
| 95 |
+
"task": "monthly_competitive",
|
| 96 |
+
"seed": 44,
|
| 97 |
+
"grader_score": 0.3698,
|
| 98 |
+
"total_reward": 6.0686,
|
| 99 |
+
"temperature": 1.4
|
| 100 |
+
},
|
| 101 |
+
{
|
| 102 |
+
"round": 1,
|
| 103 |
+
"task": "monthly_engage",
|
| 104 |
+
"seed": 45,
|
| 105 |
+
"grader_score": 0.3806,
|
| 106 |
+
"total_reward": 6.0643,
|
| 107 |
+
"temperature": 1.4
|
| 108 |
+
},
|
| 109 |
+
{
|
| 110 |
+
"round": 1,
|
| 111 |
+
"task": "monthly_strategic",
|
| 112 |
+
"seed": 46,
|
| 113 |
+
"grader_score": 0.7391,
|
| 114 |
+
"total_reward": 6.096,
|
| 115 |
+
"temperature": 1.4
|
| 116 |
+
},
|
| 117 |
+
{
|
| 118 |
+
"round": 1,
|
| 119 |
+
"task": "monthly_competitive",
|
| 120 |
+
"seed": 47,
|
| 121 |
+
"grader_score": 0.3699,
|
| 122 |
+
"total_reward": 6.0489999999999995,
|
| 123 |
+
"temperature": 1.4
|
| 124 |
+
},
|
| 125 |
+
{
|
| 126 |
+
"round": 2,
|
| 127 |
+
"task": "monthly_engage",
|
| 128 |
+
"seed": 142,
|
| 129 |
+
"grader_score": 0.4335,
|
| 130 |
+
"total_reward": 6.0995,
|
| 131 |
+
"temperature": 1.0
|
| 132 |
+
},
|
| 133 |
+
{
|
| 134 |
+
"round": 2,
|
| 135 |
+
"task": "monthly_strategic",
|
| 136 |
+
"seed": 143,
|
| 137 |
+
"grader_score": 0.7236,
|
| 138 |
+
"total_reward": 6.0992,
|
| 139 |
+
"temperature": 1.0
|
| 140 |
+
},
|
| 141 |
+
{
|
| 142 |
+
"round": 2,
|
| 143 |
+
"task": "monthly_competitive",
|
| 144 |
+
"seed": 144,
|
| 145 |
+
"grader_score": 0.3789,
|
| 146 |
+
"total_reward": 6.0943,
|
| 147 |
+
"temperature": 1.0
|
| 148 |
+
},
|
| 149 |
+
{
|
| 150 |
+
"round": 2,
|
| 151 |
+
"task": "monthly_engage",
|
| 152 |
+
"seed": 145,
|
| 153 |
+
"grader_score": 0.4356,
|
| 154 |
+
"total_reward": 6.0999,
|
| 155 |
+
"temperature": 1.0
|
| 156 |
+
},
|
| 157 |
+
{
|
| 158 |
+
"round": 2,
|
| 159 |
+
"task": "monthly_strategic",
|
| 160 |
+
"seed": 146,
|
| 161 |
+
"grader_score": 0.7232,
|
| 162 |
+
"total_reward": 6.0882,
|
| 163 |
+
"temperature": 1.0
|
| 164 |
+
},
|
| 165 |
+
{
|
| 166 |
+
"round": 2,
|
| 167 |
+
"task": "monthly_competitive",
|
| 168 |
+
"seed": 147,
|
| 169 |
+
"grader_score": 0.2527,
|
| 170 |
+
"total_reward": 6.0764,
|
| 171 |
+
"temperature": 1.0
|
| 172 |
+
},
|
| 173 |
+
{
|
| 174 |
+
"round": 3,
|
| 175 |
+
"task": "monthly_engage",
|
| 176 |
+
"seed": 242,
|
| 177 |
+
"grader_score": 0.382,
|
| 178 |
+
"total_reward": 6.4364,
|
| 179 |
+
"temperature": 0.7
|
| 180 |
+
},
|
| 181 |
+
{
|
| 182 |
+
"round": 3,
|
| 183 |
+
"task": "monthly_strategic",
|
| 184 |
+
"seed": 243,
|
| 185 |
+
"grader_score": 0.6426,
|
| 186 |
+
"total_reward": 6.4364,
|
| 187 |
+
"temperature": 0.7
|
| 188 |
+
},
|
| 189 |
+
{
|
| 190 |
+
"round": 3,
|
| 191 |
+
"task": "monthly_competitive",
|
| 192 |
+
"seed": 244,
|
| 193 |
+
"grader_score": 0.7529,
|
| 194 |
+
"total_reward": 6.3849,
|
| 195 |
+
"temperature": 0.7
|
| 196 |
+
},
|
| 197 |
+
{
|
| 198 |
+
"round": 3,
|
| 199 |
+
"task": "monthly_engage",
|
| 200 |
+
"seed": 245,
|
| 201 |
+
"grader_score": 0.3935,
|
| 202 |
+
"total_reward": 6.4805,
|
| 203 |
+
"temperature": 0.7
|
| 204 |
+
},
|
| 205 |
+
{
|
| 206 |
+
"round": 3,
|
| 207 |
+
"task": "monthly_strategic",
|
| 208 |
+
"seed": 246,
|
| 209 |
+
"grader_score": 0.724,
|
| 210 |
+
"total_reward": 6.4286,
|
| 211 |
+
"temperature": 0.7
|
| 212 |
+
},
|
| 213 |
+
{
|
| 214 |
+
"round": 3,
|
| 215 |
+
"task": "monthly_competitive",
|
| 216 |
+
"seed": 247,
|
| 217 |
+
"grader_score": 0.7138,
|
| 218 |
+
"total_reward": 6.3425,
|
| 219 |
+
"temperature": 0.7
|
| 220 |
+
},
|
| 221 |
+
{
|
| 222 |
+
"round": 4,
|
| 223 |
+
"task": "monthly_engage",
|
| 224 |
+
"seed": 342,
|
| 225 |
+
"grader_score": 0.3764,
|
| 226 |
+
"total_reward": 6.4858,
|
| 227 |
+
"temperature": 0.7
|
| 228 |
+
},
|
| 229 |
+
{
|
| 230 |
+
"round": 4,
|
| 231 |
+
"task": "monthly_strategic",
|
| 232 |
+
"seed": 343,
|
| 233 |
+
"grader_score": 0.6314,
|
| 234 |
+
"total_reward": 6.4636,
|
| 235 |
+
"temperature": 0.7
|
| 236 |
+
},
|
| 237 |
+
{
|
| 238 |
+
"round": 4,
|
| 239 |
+
"task": "monthly_competitive",
|
| 240 |
+
"seed": 344,
|
| 241 |
+
"grader_score": 0.7705,
|
| 242 |
+
"total_reward": 6.4934,
|
| 243 |
+
"temperature": 0.7
|
| 244 |
+
},
|
| 245 |
+
{
|
| 246 |
+
"round": 4,
|
| 247 |
+
"task": "monthly_engage",
|
| 248 |
+
"seed": 345,
|
| 249 |
+
"grader_score": 0.3851,
|
| 250 |
+
"total_reward": 6.4661,
|
| 251 |
+
"temperature": 0.7
|
| 252 |
+
},
|
| 253 |
+
{
|
| 254 |
+
"round": 4,
|
| 255 |
+
"task": "monthly_strategic",
|
| 256 |
+
"seed": 346,
|
| 257 |
+
"grader_score": 0.6755,
|
| 258 |
+
"total_reward": 6.5269,
|
| 259 |
+
"temperature": 0.7
|
| 260 |
+
},
|
| 261 |
+
{
|
| 262 |
+
"round": 4,
|
| 263 |
+
"task": "monthly_competitive",
|
| 264 |
+
"seed": 347,
|
| 265 |
+
"grader_score": 0.4897,
|
| 266 |
+
"total_reward": 6.3657,
|
| 267 |
+
"temperature": 0.7
|
| 268 |
+
}
|
| 269 |
+
],
|
| 270 |
+
"elapsed_seconds": 6034.9
|
| 271 |
+
}
|
plots/training_trajectories.png
ADDED
|
Git LFS Details
|
pyproject.toml
CHANGED
|
@@ -18,14 +18,7 @@ dependencies = [
|
|
| 18 |
# install from github
|
| 19 |
# "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
|
| 20 |
"openenv-core[core]>=0.2.2",
|
| 21 |
-
|
| 22 |
-
# Add all dependencies needed for your environment here
|
| 23 |
-
# Examples:
|
| 24 |
-
# "numpy>=1.19.0",
|
| 25 |
-
# "torch>=2.0.0",
|
| 26 |
-
# "gymnasium>=0.29.0",
|
| 27 |
-
# "openspiel>=1.0.0",
|
| 28 |
-
# "smolagents>=1.22.0,<2",
|
| 29 |
]
|
| 30 |
|
| 31 |
[project.optional-dependencies]
|
|
@@ -45,4 +38,4 @@ packages = ["viraltest", "viraltest.server"]
|
|
| 45 |
package-dir = { "viraltest" = ".", "viraltest.server" = "server" }
|
| 46 |
|
| 47 |
[tool.setuptools.package-data]
|
| 48 |
-
"viraltest.server" = ["*.html"]
|
|
|
|
| 18 |
# install from github
|
| 19 |
# "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
|
| 20 |
"openenv-core[core]>=0.2.2",
|
| 21 |
+
"openai>=1.0.0",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 22 |
]
|
| 23 |
|
| 24 |
[project.optional-dependencies]
|
|
|
|
| 38 |
package-dir = { "viraltest" = ".", "viraltest.server" = "server" }
|
| 39 |
|
| 40 |
[tool.setuptools.package-data]
|
| 41 |
+
"viraltest.server" = ["*.html", "data/*.json"]
|
server/app.py
CHANGED
|
@@ -41,6 +41,8 @@ except ImportError:
|
|
| 41 |
from server.viraltest_environment import TAG_POOL
|
| 42 |
|
| 43 |
_DASHBOARD_HTML = (Path(__file__).parent / "dashboard.html").read_text()
|
|
|
|
|
|
|
| 44 |
|
| 45 |
app = create_app(
|
| 46 |
ViraltestEnvironment,
|
|
@@ -337,6 +339,64 @@ async def dashboard_simulate(body: Dict[str, Any] = Body(...)):
|
|
| 337 |
return result
|
| 338 |
|
| 339 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 340 |
def main(host: str = "0.0.0.0", port: int = 8000):
|
| 341 |
import uvicorn
|
| 342 |
uvicorn.run(app, host=host, port=port)
|
|
|
|
| 41 |
from server.viraltest_environment import TAG_POOL
|
| 42 |
|
| 43 |
_DASHBOARD_HTML = (Path(__file__).parent / "dashboard.html").read_text()
|
| 44 |
+
_TRAINING_HTML_PATH = Path(__file__).parent / "training.html"
|
| 45 |
+
_TRAINING_HTML = _TRAINING_HTML_PATH.read_text() if _TRAINING_HTML_PATH.exists() else "<html><body>Training page not found</body></html>"
|
| 46 |
|
| 47 |
app = create_app(
|
| 48 |
ViraltestEnvironment,
|
|
|
|
| 339 |
return result
|
| 340 |
|
| 341 |
|
| 342 |
+
_TRAINING_TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]
|
| 343 |
+
|
| 344 |
+
@app.get("/dashboard/training-evidence")
|
| 345 |
+
async def training_evidence():
|
| 346 |
+
"""Run all baseline scenarios across all tasks and return structured comparison data."""
|
| 347 |
+
global _SIM_RNG
|
| 348 |
+
|
| 349 |
+
results = []
|
| 350 |
+
for scenario_id, (label, desc, plan_fn) in SCENARIOS.items():
|
| 351 |
+
for task in _TRAINING_TASKS:
|
| 352 |
+
_SIM_RNG = stdlib_random.Random(99)
|
| 353 |
+
env = ViraltestEnvironment()
|
| 354 |
+
obs = env.reset(task=task, seed=42)
|
| 355 |
+
obs_dict = obs.model_dump()
|
| 356 |
+
|
| 357 |
+
rewards: List[float] = []
|
| 358 |
+
energies: List[float] = [obs.creator_energy]
|
| 359 |
+
|
| 360 |
+
for day in range(1, 31):
|
| 361 |
+
action = plan_fn(obs_dict, day)
|
| 362 |
+
obs = env.step(action)
|
| 363 |
+
obs_dict = obs.model_dump()
|
| 364 |
+
r = obs.reward if obs.reward is not None else 0.0
|
| 365 |
+
rewards.append(r)
|
| 366 |
+
energies.append(obs.creator_energy)
|
| 367 |
+
if obs.done:
|
| 368 |
+
break
|
| 369 |
+
|
| 370 |
+
score = (obs.metadata or {}).get("grader_score", 0.0)
|
| 371 |
+
results.append({
|
| 372 |
+
"scenario_id": scenario_id,
|
| 373 |
+
"scenario": label,
|
| 374 |
+
"description": desc,
|
| 375 |
+
"task": task,
|
| 376 |
+
"grader_score": round(score, 4),
|
| 377 |
+
"total_reward": round(sum(rewards), 4),
|
| 378 |
+
"avg_reward": round(sum(rewards) / len(rewards), 4) if rewards else 0,
|
| 379 |
+
"steps": len(rewards),
|
| 380 |
+
"final_energy": round(obs.creator_energy, 3),
|
| 381 |
+
"min_energy": round(min(energies), 3),
|
| 382 |
+
"final_followers": obs.follower_count,
|
| 383 |
+
"follower_delta": obs.follower_count - 10000,
|
| 384 |
+
"burned_out": obs.creator_energy <= 0,
|
| 385 |
+
"rewards": [round(r, 4) for r in rewards],
|
| 386 |
+
"energies": [round(e, 3) for e in energies],
|
| 387 |
+
})
|
| 388 |
+
|
| 389 |
+
return JSONResponse(
|
| 390 |
+
content={"results": results, "tasks": _TRAINING_TASKS, "scenarios": list(SCENARIOS.keys())},
|
| 391 |
+
headers={"Cache-Control": "no-store, max-age=0, must-revalidate"},
|
| 392 |
+
)
|
| 393 |
+
|
| 394 |
+
|
| 395 |
+
@app.get("/dashboard/training", response_class=HTMLResponse)
|
| 396 |
+
async def training_dashboard():
|
| 397 |
+
return _TRAINING_HTML
|
| 398 |
+
|
| 399 |
+
|
| 400 |
def main(host: str = "0.0.0.0", port: int = 8000):
|
| 401 |
import uvicorn
|
| 402 |
uvicorn.run(app, host=host, port=port)
|
server/dashboard.html
CHANGED
|
@@ -35,12 +35,15 @@ body{background:#0b1326;color:#dae2fd;font-family:'Inter',sans-serif}
|
|
| 35 |
<aside class="flex flex-col sticky top-0 h-screen w-64 border-r border-white/5 bg-surface-lowest shadow-2xl shadow-slate-950/50 shrink-0 z-50">
|
| 36 |
<div class="p-6 pb-4">
|
| 37 |
<div class="text-xl font-black tracking-tighter text-transparent bg-clip-text bg-gradient-to-br from-primary to-primary-ctr mb-1">Growth Copilot</div>
|
| 38 |
-
<div class="text-[9px] font-label uppercase tracking-[.2em] text-on-surface-dim/50">
|
| 39 |
</div>
|
| 40 |
<nav class="flex-1 px-3 space-y-1">
|
| 41 |
<a href="/dashboard" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-primary font-bold border-r-2 border-primary bg-gradient-to-r from-primary/10 to-transparent transition-all">
|
| 42 |
<span class="material-symbols-outlined text-[20px]">dashboard</span><span class="font-label text-sm">Dashboard</span>
|
| 43 |
</a>
|
|
|
|
|
|
|
|
|
|
| 44 |
<a href="/web/" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-slate-400 font-medium hover:text-slate-200 hover:bg-white/5 transition-all">
|
| 45 |
<span class="material-symbols-outlined text-[20px]">web</span><span class="font-label text-sm">OpenEnv UI</span>
|
| 46 |
</a>
|
|
@@ -49,9 +52,9 @@ body{background:#0b1326;color:#dae2fd;font-family:'Inter',sans-serif}
|
|
| 49 |
<div class="p-4 border-t border-white/5 space-y-3">
|
| 50 |
<div class="text-[9px] font-label uppercase tracking-widest text-on-surface-dim/60 mb-1">Task</div>
|
| 51 |
<select id="taskSelect" onchange="refreshTaskScoreBlurb()" class="w-full bg-surface border border-outline/30 rounded-lg px-3 py-2 text-sm font-label focus:ring-1 focus:ring-primary focus:outline-none">
|
| 52 |
-
<option value="
|
| 53 |
-
<option value="
|
| 54 |
-
<option value="
|
| 55 |
</select>
|
| 56 |
<button onclick="doReset()" class="w-full py-3 rounded-lg bg-gradient-to-br from-primary to-primary-ctr text-[#23005c] font-bold text-sm hover:opacity-90 transition active:scale-[.97]">
|
| 57 |
<span class="material-symbols-outlined text-[16px] align-middle mr-1">restart_alt</span>Reset
|
|
@@ -358,7 +361,7 @@ body{background:#0b1326;color:#dae2fd;font-family:'Inter',sans-serif}
|
|
| 358 |
<div class="flex flex-col items-end gap-0.5">
|
| 359 |
<div class="flex items-center gap-2">
|
| 360 |
<span id="scenarioCount" class="text-[9px] font-label text-primary font-bold">…</span>
|
| 361 |
-
<span class="text-[9px] font-label text-on-surface-dim">
|
| 362 |
</div>
|
| 363 |
<span class="text-[8px] font-label text-on-surface-dim/70 max-w-[16rem] text-right leading-tight">All strategies below — scroll the grid or search. Count updates after load.</span>
|
| 364 |
</div>
|
|
@@ -489,7 +492,7 @@ body{background:#0b1326;color:#dae2fd;font-family:'Inter',sans-serif}
|
|
| 489 |
|
| 490 |
<script>
|
| 491 |
const API=window.location.origin;
|
| 492 |
-
const EPISODE_DAYS=
|
| 493 |
const DAYS=["Mon","Tue","Wed","Thu","Fri","Sat","Sun"];
|
| 494 |
function fmtAxisNum(v){
|
| 495 |
const a=Math.abs(v);
|
|
@@ -503,9 +506,9 @@ function refreshTaskScoreBlurb(){
|
|
| 503 |
const el=document.getElementById("taskScoreBlurb");
|
| 504 |
if(!el)return;
|
| 505 |
const t=document.getElementById("taskSelect").value;
|
| 506 |
-
if(t==="
|
| 507 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Easy (Engage):</span> final score = min(1, total episode engagement ÷ theoretical maximum). If energy hits 0 at the end, the score is multiplied by 0.3.";
|
| 508 |
-
}else if(t==="
|
| 509 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Medium (Strategic):</span> 35% normalized engagement + 25% tag mix (discovery + top-tag performance) + 25% average energy + 15% days with solid posts. Penalties if energy ever crashes low or you use fewer than 5 unique tags.";
|
| 510 |
}else{
|
| 511 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Hard (Competitive):</span> 25% engagement + 20% tags + 20% follower growth + 15% beating rival avg engagement + 10% differentiated topics + 10% minimum energy floor. Score is 0 if burned out; ×0.5 if fewer than 3 content types; ×0.7 if fewer than 8 unique tags.";
|
|
@@ -1203,7 +1206,7 @@ async function loadHistory(){
|
|
| 1203 |
const data=await r.json();
|
| 1204 |
const tb=document.getElementById("historyTable");
|
| 1205 |
if(!data.length){tb.innerHTML='<tr><td colspan="10" class="px-4 py-6 text-center text-on-surface-dim italic">No history yet — run a simulation</td></tr>';return}
|
| 1206 |
-
const taskLabels={weekly_engage:"Easy",weekly_strategic:"Medium",weekly_competitive:"Hard"};
|
| 1207 |
tb.innerHTML=data.slice().reverse().map(h=>{
|
| 1208 |
const dt=new Date(h.id);
|
| 1209 |
const time=dt.toLocaleDateString("en-US",{month:"short",day:"numeric"})+' '+dt.toLocaleTimeString("en-US",{hour:"2-digit",minute:"2-digit"});
|
|
|
|
| 35 |
<aside class="flex flex-col sticky top-0 h-screen w-64 border-r border-white/5 bg-surface-lowest shadow-2xl shadow-slate-950/50 shrink-0 z-50">
|
| 36 |
<div class="p-6 pb-4">
|
| 37 |
<div class="text-xl font-black tracking-tighter text-transparent bg-clip-text bg-gradient-to-br from-primary to-primary-ctr mb-1">Growth Copilot</div>
|
| 38 |
+
<div class="text-[9px] font-label uppercase tracking-[.2em] text-on-surface-dim/50">30-day creator simulation</div>
|
| 39 |
</div>
|
| 40 |
<nav class="flex-1 px-3 space-y-1">
|
| 41 |
<a href="/dashboard" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-primary font-bold border-r-2 border-primary bg-gradient-to-r from-primary/10 to-transparent transition-all">
|
| 42 |
<span class="material-symbols-outlined text-[20px]">dashboard</span><span class="font-label text-sm">Dashboard</span>
|
| 43 |
</a>
|
| 44 |
+
<a href="/dashboard/training" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-slate-400 font-medium hover:text-slate-200 hover:bg-white/5 transition-all">
|
| 45 |
+
<span class="material-symbols-outlined text-[20px]">science</span><span class="font-label text-sm">Training Evidence</span>
|
| 46 |
+
</a>
|
| 47 |
<a href="/web/" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-slate-400 font-medium hover:text-slate-200 hover:bg-white/5 transition-all">
|
| 48 |
<span class="material-symbols-outlined text-[20px]">web</span><span class="font-label text-sm">OpenEnv UI</span>
|
| 49 |
</a>
|
|
|
|
| 52 |
<div class="p-4 border-t border-white/5 space-y-3">
|
| 53 |
<div class="text-[9px] font-label uppercase tracking-widest text-on-surface-dim/60 mb-1">Task</div>
|
| 54 |
<select id="taskSelect" onchange="refreshTaskScoreBlurb()" class="w-full bg-surface border border-outline/30 rounded-lg px-3 py-2 text-sm font-label focus:ring-1 focus:ring-primary focus:outline-none">
|
| 55 |
+
<option value="monthly_engage">Easy — Engage</option>
|
| 56 |
+
<option value="monthly_strategic">Medium — Strategic</option>
|
| 57 |
+
<option value="monthly_competitive" selected>Hard — Competitive</option>
|
| 58 |
</select>
|
| 59 |
<button onclick="doReset()" class="w-full py-3 rounded-lg bg-gradient-to-br from-primary to-primary-ctr text-[#23005c] font-bold text-sm hover:opacity-90 transition active:scale-[.97]">
|
| 60 |
<span class="material-symbols-outlined text-[16px] align-middle mr-1">restart_alt</span>Reset
|
|
|
|
| 361 |
<div class="flex flex-col items-end gap-0.5">
|
| 362 |
<div class="flex items-center gap-2">
|
| 363 |
<span id="scenarioCount" class="text-[9px] font-label text-primary font-bold">…</span>
|
| 364 |
+
<span class="text-[9px] font-label text-on-surface-dim">30-day episode</span>
|
| 365 |
</div>
|
| 366 |
<span class="text-[8px] font-label text-on-surface-dim/70 max-w-[16rem] text-right leading-tight">All strategies below — scroll the grid or search. Count updates after load.</span>
|
| 367 |
</div>
|
|
|
|
| 492 |
|
| 493 |
<script>
|
| 494 |
const API=window.location.origin;
|
| 495 |
+
const EPISODE_DAYS=30;
|
| 496 |
const DAYS=["Mon","Tue","Wed","Thu","Fri","Sat","Sun"];
|
| 497 |
function fmtAxisNum(v){
|
| 498 |
const a=Math.abs(v);
|
|
|
|
| 506 |
const el=document.getElementById("taskScoreBlurb");
|
| 507 |
if(!el)return;
|
| 508 |
const t=document.getElementById("taskSelect").value;
|
| 509 |
+
if(t==="monthly_engage"){
|
| 510 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Easy (Engage):</span> final score = min(1, total episode engagement ÷ theoretical maximum). If energy hits 0 at the end, the score is multiplied by 0.3.";
|
| 511 |
+
}else if(t==="monthly_strategic"){
|
| 512 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Medium (Strategic):</span> 35% normalized engagement + 25% tag mix (discovery + top-tag performance) + 25% average energy + 15% days with solid posts. Penalties if energy ever crashes low or you use fewer than 5 unique tags.";
|
| 513 |
}else{
|
| 514 |
el.innerHTML="<span class=\"text-on-surface font-semibold\">Hard (Competitive):</span> 25% engagement + 20% tags + 20% follower growth + 15% beating rival avg engagement + 10% differentiated topics + 10% minimum energy floor. Score is 0 if burned out; ×0.5 if fewer than 3 content types; ×0.7 if fewer than 8 unique tags.";
|
|
|
|
| 1206 |
const data=await r.json();
|
| 1207 |
const tb=document.getElementById("historyTable");
|
| 1208 |
if(!data.length){tb.innerHTML='<tr><td colspan="10" class="px-4 py-6 text-center text-on-surface-dim italic">No history yet — run a simulation</td></tr>';return}
|
| 1209 |
+
const taskLabels={monthly_engage:"Easy",monthly_strategic:"Medium",monthly_competitive:"Hard",weekly_engage:"Easy",weekly_strategic:"Medium",weekly_competitive:"Hard"};
|
| 1210 |
tb.innerHTML=data.slice().reverse().map(h=>{
|
| 1211 |
const dt=new Date(h.id);
|
| 1212 |
const time=dt.toLocaleDateString("en-US",{month:"short",day:"numeric"})+' '+dt.toLocaleTimeString("en-US",{hour:"2-digit",minute:"2-digit"});
|
server/simulation_history.json
CHANGED
|
@@ -1,1802 +1 @@
|
|
| 1 |
-
[
|
| 2 |
-
{
|
| 3 |
-
"id": "2026-04-05T10:50:54.850500+00:00",
|
| 4 |
-
"scenario": "Always Rest",
|
| 5 |
-
"scenario_id": "always_rest",
|
| 6 |
-
"task": "weekly_competitive",
|
| 7 |
-
"score": 0.035,
|
| 8 |
-
"total_steps": 168,
|
| 9 |
-
"total_posts": 0,
|
| 10 |
-
"avg_reward": 0.15,
|
| 11 |
-
"final": {
|
| 12 |
-
"energy": 1.0,
|
| 13 |
-
"hours_since_sleep": 1,
|
| 14 |
-
"sleep_debt": 0.0,
|
| 15 |
-
"followers": 5497,
|
| 16 |
-
"engagement_rate": 0.0,
|
| 17 |
-
"burned_out": false
|
| 18 |
-
}
|
| 19 |
-
},
|
| 20 |
-
{
|
| 21 |
-
"id": "2026-04-05T10:50:54.859097+00:00",
|
| 22 |
-
"scenario": "Anti-Trend",
|
| 23 |
-
"scenario_id": "anti_trend",
|
| 24 |
-
"task": "weekly_competitive",
|
| 25 |
-
"score": 0.2316,
|
| 26 |
-
"total_steps": 168,
|
| 27 |
-
"total_posts": 14,
|
| 28 |
-
"avg_reward": 0.2201,
|
| 29 |
-
"final": {
|
| 30 |
-
"energy": 1.0,
|
| 31 |
-
"hours_since_sleep": 1,
|
| 32 |
-
"sleep_debt": 0.0,
|
| 33 |
-
"followers": 11125,
|
| 34 |
-
"engagement_rate": 0.747,
|
| 35 |
-
"burned_out": false
|
| 36 |
-
}
|
| 37 |
-
},
|
| 38 |
-
{
|
| 39 |
-
"id": "2026-04-05T10:50:54.868624+00:00",
|
| 40 |
-
"scenario": "Bad Timing",
|
| 41 |
-
"scenario_id": "bad_timing",
|
| 42 |
-
"task": "weekly_competitive",
|
| 43 |
-
"score": 0.0937,
|
| 44 |
-
"total_steps": 168,
|
| 45 |
-
"total_posts": 49,
|
| 46 |
-
"avg_reward": 0.1611,
|
| 47 |
-
"final": {
|
| 48 |
-
"energy": 0.59,
|
| 49 |
-
"hours_since_sleep": 5,
|
| 50 |
-
"sleep_debt": 0.0,
|
| 51 |
-
"followers": 10237,
|
| 52 |
-
"engagement_rate": 0.0358,
|
| 53 |
-
"burned_out": false
|
| 54 |
-
}
|
| 55 |
-
},
|
| 56 |
-
{
|
| 57 |
-
"id": "2026-04-05T10:50:54.878099+00:00",
|
| 58 |
-
"scenario": "Balanced Creator",
|
| 59 |
-
"scenario_id": "balanced",
|
| 60 |
-
"task": "weekly_competitive",
|
| 61 |
-
"score": 0.8775,
|
| 62 |
-
"total_steps": 168,
|
| 63 |
-
"total_posts": 28,
|
| 64 |
-
"avg_reward": 0.2187,
|
| 65 |
-
"final": {
|
| 66 |
-
"energy": 1.0,
|
| 67 |
-
"hours_since_sleep": 2,
|
| 68 |
-
"sleep_debt": 0.0,
|
| 69 |
-
"followers": 12534,
|
| 70 |
-
"engagement_rate": 0.8273,
|
| 71 |
-
"burned_out": false
|
| 72 |
-
}
|
| 73 |
-
},
|
| 74 |
-
{
|
| 75 |
-
"id": "2026-04-05T10:50:54.891038+00:00",
|
| 76 |
-
"scenario": "Burst Poster",
|
| 77 |
-
"scenario_id": "burst",
|
| 78 |
-
"task": "weekly_competitive",
|
| 79 |
-
"score": 0.6111,
|
| 80 |
-
"total_steps": 168,
|
| 81 |
-
"total_posts": 57,
|
| 82 |
-
"avg_reward": 0.2318,
|
| 83 |
-
"final": {
|
| 84 |
-
"energy": 0.44,
|
| 85 |
-
"hours_since_sleep": 1,
|
| 86 |
-
"sleep_debt": 0.0,
|
| 87 |
-
"followers": 11701,
|
| 88 |
-
"engagement_rate": 0.2076,
|
| 89 |
-
"burned_out": false
|
| 90 |
-
}
|
| 91 |
-
},
|
| 92 |
-
{
|
| 93 |
-
"id": "2026-04-05T10:50:54.901147+00:00",
|
| 94 |
-
"scenario": "Carousel Only",
|
| 95 |
-
"scenario_id": "carousel_only",
|
| 96 |
-
"task": "weekly_competitive",
|
| 97 |
-
"score": 0.417,
|
| 98 |
-
"total_steps": 168,
|
| 99 |
-
"total_posts": 14,
|
| 100 |
-
"avg_reward": 0.2353,
|
| 101 |
-
"final": {
|
| 102 |
-
"energy": 1.0,
|
| 103 |
-
"hours_since_sleep": 1,
|
| 104 |
-
"sleep_debt": 0.0,
|
| 105 |
-
"followers": 12074,
|
| 106 |
-
"engagement_rate": 1.3175,
|
| 107 |
-
"burned_out": false
|
| 108 |
-
}
|
| 109 |
-
},
|
| 110 |
-
{
|
| 111 |
-
"id": "2026-04-05T10:50:54.911264+00:00",
|
| 112 |
-
"scenario": "Competitor Avoider",
|
| 113 |
-
"scenario_id": "comp_avoider",
|
| 114 |
-
"task": "weekly_competitive",
|
| 115 |
-
"score": 0.446,
|
| 116 |
-
"total_steps": 168,
|
| 117 |
-
"total_posts": 14,
|
| 118 |
-
"avg_reward": 0.2365,
|
| 119 |
-
"final": {
|
| 120 |
-
"energy": 1.0,
|
| 121 |
-
"hours_since_sleep": 1,
|
| 122 |
-
"sleep_debt": 0.0,
|
| 123 |
-
"followers": 12678,
|
| 124 |
-
"engagement_rate": 1.8163,
|
| 125 |
-
"burned_out": false
|
| 126 |
-
}
|
| 127 |
-
},
|
| 128 |
-
{
|
| 129 |
-
"id": "2026-04-05T10:50:54.921231+00:00",
|
| 130 |
-
"scenario": "Conservative Energy",
|
| 131 |
-
"scenario_id": "conservative",
|
| 132 |
-
"task": "weekly_competitive",
|
| 133 |
-
"score": 0.2181,
|
| 134 |
-
"total_steps": 168,
|
| 135 |
-
"total_posts": 7,
|
| 136 |
-
"avg_reward": 0.1967,
|
| 137 |
-
"final": {
|
| 138 |
-
"energy": 1.0,
|
| 139 |
-
"hours_since_sleep": 1,
|
| 140 |
-
"sleep_debt": 0.0,
|
| 141 |
-
"followers": 10239,
|
| 142 |
-
"engagement_rate": 0.3439,
|
| 143 |
-
"burned_out": false
|
| 144 |
-
}
|
| 145 |
-
},
|
| 146 |
-
{
|
| 147 |
-
"id": "2026-04-05T10:50:54.931980+00:00",
|
| 148 |
-
"scenario": "Content Creator",
|
| 149 |
-
"scenario_id": "content_creator",
|
| 150 |
-
"task": "weekly_competitive",
|
| 151 |
-
"score": 0.6434,
|
| 152 |
-
"total_steps": 168,
|
| 153 |
-
"total_posts": 12,
|
| 154 |
-
"avg_reward": 0.2065,
|
| 155 |
-
"final": {
|
| 156 |
-
"energy": 0.309,
|
| 157 |
-
"hours_since_sleep": 28,
|
| 158 |
-
"sleep_debt": 0.017,
|
| 159 |
-
"followers": 10931,
|
| 160 |
-
"engagement_rate": 0.525,
|
| 161 |
-
"burned_out": false
|
| 162 |
-
}
|
| 163 |
-
},
|
| 164 |
-
{
|
| 165 |
-
"id": "2026-04-05T10:50:54.942037+00:00",
|
| 166 |
-
"scenario": "Copycat",
|
| 167 |
-
"scenario_id": "copycat",
|
| 168 |
-
"task": "weekly_competitive",
|
| 169 |
-
"score": 0.6136,
|
| 170 |
-
"total_steps": 168,
|
| 171 |
-
"total_posts": 21,
|
| 172 |
-
"avg_reward": 0.1887,
|
| 173 |
-
"final": {
|
| 174 |
-
"energy": 1.0,
|
| 175 |
-
"hours_since_sleep": 1,
|
| 176 |
-
"sleep_debt": 0.0,
|
| 177 |
-
"followers": 11589,
|
| 178 |
-
"engagement_rate": 0.497,
|
| 179 |
-
"burned_out": false
|
| 180 |
-
}
|
| 181 |
-
},
|
| 182 |
-
{
|
| 183 |
-
"id": "2026-04-05T10:50:54.951850+00:00",
|
| 184 |
-
"scenario": "Creator Economy",
|
| 185 |
-
"scenario_id": "creator_economy",
|
| 186 |
-
"task": "weekly_competitive",
|
| 187 |
-
"score": 0.2515,
|
| 188 |
-
"total_steps": 168,
|
| 189 |
-
"total_posts": 14,
|
| 190 |
-
"avg_reward": 0.2226,
|
| 191 |
-
"final": {
|
| 192 |
-
"energy": 1.0,
|
| 193 |
-
"hours_since_sleep": 1,
|
| 194 |
-
"sleep_debt": 0.0,
|
| 195 |
-
"followers": 11994,
|
| 196 |
-
"engagement_rate": 1.3918,
|
| 197 |
-
"burned_out": false
|
| 198 |
-
}
|
| 199 |
-
},
|
| 200 |
-
{
|
| 201 |
-
"id": "2026-04-05T10:50:54.961166+00:00",
|
| 202 |
-
"scenario": "Crypto/Web3",
|
| 203 |
-
"scenario_id": "crypto_niche",
|
| 204 |
-
"task": "weekly_competitive",
|
| 205 |
-
"score": 0.2879,
|
| 206 |
-
"total_steps": 168,
|
| 207 |
-
"total_posts": 14,
|
| 208 |
-
"avg_reward": 0.2324,
|
| 209 |
-
"final": {
|
| 210 |
-
"energy": 1.0,
|
| 211 |
-
"hours_since_sleep": 1,
|
| 212 |
-
"sleep_debt": 0.0,
|
| 213 |
-
"followers": 12444,
|
| 214 |
-
"engagement_rate": 1.6187,
|
| 215 |
-
"burned_out": false
|
| 216 |
-
}
|
| 217 |
-
},
|
| 218 |
-
{
|
| 219 |
-
"id": "2026-04-05T10:50:54.970461+00:00",
|
| 220 |
-
"scenario": "Double Peak",
|
| 221 |
-
"scenario_id": "double_peak",
|
| 222 |
-
"task": "weekly_competitive",
|
| 223 |
-
"score": 0.4519,
|
| 224 |
-
"total_steps": 168,
|
| 225 |
-
"total_posts": 14,
|
| 226 |
-
"avg_reward": 0.2352,
|
| 227 |
-
"final": {
|
| 228 |
-
"energy": 1.0,
|
| 229 |
-
"hours_since_sleep": 1,
|
| 230 |
-
"sleep_debt": 0.0,
|
| 231 |
-
"followers": 13138,
|
| 232 |
-
"engagement_rate": 2.0814,
|
| 233 |
-
"burned_out": false
|
| 234 |
-
}
|
| 235 |
-
},
|
| 236 |
-
{
|
| 237 |
-
"id": "2026-04-05T10:50:54.980718+00:00",
|
| 238 |
-
"scenario": "Early Bird",
|
| 239 |
-
"scenario_id": "early_bird",
|
| 240 |
-
"task": "weekly_competitive",
|
| 241 |
-
"score": 0.2075,
|
| 242 |
-
"total_steps": 168,
|
| 243 |
-
"total_posts": 16,
|
| 244 |
-
"avg_reward": 0.2284,
|
| 245 |
-
"final": {
|
| 246 |
-
"energy": 0.62,
|
| 247 |
-
"hours_since_sleep": 2,
|
| 248 |
-
"sleep_debt": 0.0,
|
| 249 |
-
"followers": 10818,
|
| 250 |
-
"engagement_rate": 0.4138,
|
| 251 |
-
"burned_out": false
|
| 252 |
-
}
|
| 253 |
-
},
|
| 254 |
-
{
|
| 255 |
-
"id": "2026-04-05T10:50:54.989979+00:00",
|
| 256 |
-
"scenario": "Energy Saver",
|
| 257 |
-
"scenario_id": "energy_saver",
|
| 258 |
-
"task": "weekly_competitive",
|
| 259 |
-
"score": 0.3744,
|
| 260 |
-
"total_steps": 168,
|
| 261 |
-
"total_posts": 7,
|
| 262 |
-
"avg_reward": 0.2111,
|
| 263 |
-
"final": {
|
| 264 |
-
"energy": 1.0,
|
| 265 |
-
"hours_since_sleep": 1,
|
| 266 |
-
"sleep_debt": 0.0,
|
| 267 |
-
"followers": 11080,
|
| 268 |
-
"engagement_rate": 1.5483,
|
| 269 |
-
"burned_out": false
|
| 270 |
-
}
|
| 271 |
-
},
|
| 272 |
-
{
|
| 273 |
-
"id": "2026-04-05T10:50:55.000118+00:00",
|
| 274 |
-
"scenario": "Engagement Chaser",
|
| 275 |
-
"scenario_id": "engagement_chaser",
|
| 276 |
-
"task": "weekly_competitive",
|
| 277 |
-
"score": 0.4194,
|
| 278 |
-
"total_steps": 168,
|
| 279 |
-
"total_posts": 21,
|
| 280 |
-
"avg_reward": 0.2224,
|
| 281 |
-
"final": {
|
| 282 |
-
"energy": 1.0,
|
| 283 |
-
"hours_since_sleep": 1,
|
| 284 |
-
"sleep_debt": 0.0,
|
| 285 |
-
"followers": 15287,
|
| 286 |
-
"engagement_rate": 2.2466,
|
| 287 |
-
"burned_out": false
|
| 288 |
-
}
|
| 289 |
-
},
|
| 290 |
-
{
|
| 291 |
-
"id": "2026-04-05T10:50:55.009873+00:00",
|
| 292 |
-
"scenario": "Events/News",
|
| 293 |
-
"scenario_id": "events",
|
| 294 |
-
"task": "weekly_competitive",
|
| 295 |
-
"score": 0.158,
|
| 296 |
-
"total_steps": 168,
|
| 297 |
-
"total_posts": 4,
|
| 298 |
-
"avg_reward": 0.1732,
|
| 299 |
-
"final": {
|
| 300 |
-
"energy": 1.0,
|
| 301 |
-
"hours_since_sleep": 1,
|
| 302 |
-
"sleep_debt": 0.0,
|
| 303 |
-
"followers": 7491,
|
| 304 |
-
"engagement_rate": 1.4388,
|
| 305 |
-
"burned_out": false
|
| 306 |
-
}
|
| 307 |
-
},
|
| 308 |
-
{
|
| 309 |
-
"id": "2026-04-05T10:50:55.018674+00:00",
|
| 310 |
-
"scenario": "Fashion Content",
|
| 311 |
-
"scenario_id": "fashion",
|
| 312 |
-
"task": "weekly_competitive",
|
| 313 |
-
"score": 0.2181,
|
| 314 |
-
"total_steps": 168,
|
| 315 |
-
"total_posts": 14,
|
| 316 |
-
"avg_reward": 0.2147,
|
| 317 |
-
"final": {
|
| 318 |
-
"energy": 1.0,
|
| 319 |
-
"hours_since_sleep": 1,
|
| 320 |
-
"sleep_debt": 0.0,
|
| 321 |
-
"followers": 11135,
|
| 322 |
-
"engagement_rate": 0.7898,
|
| 323 |
-
"burned_out": false
|
| 324 |
-
}
|
| 325 |
-
},
|
| 326 |
-
{
|
| 327 |
-
"id": "2026-04-05T10:50:55.027894+00:00",
|
| 328 |
-
"scenario": "Food Creator",
|
| 329 |
-
"scenario_id": "food_creator",
|
| 330 |
-
"task": "weekly_competitive",
|
| 331 |
-
"score": 0.2612,
|
| 332 |
-
"total_steps": 168,
|
| 333 |
-
"total_posts": 15,
|
| 334 |
-
"avg_reward": 0.2293,
|
| 335 |
-
"final": {
|
| 336 |
-
"energy": 0.7,
|
| 337 |
-
"hours_since_sleep": 2,
|
| 338 |
-
"sleep_debt": 0.0,
|
| 339 |
-
"followers": 12091,
|
| 340 |
-
"engagement_rate": 1.1978,
|
| 341 |
-
"burned_out": false
|
| 342 |
-
}
|
| 343 |
-
},
|
| 344 |
-
{
|
| 345 |
-
"id": "2026-04-05T10:50:55.037230+00:00",
|
| 346 |
-
"scenario": "Gaming Niche",
|
| 347 |
-
"scenario_id": "gaming_niche",
|
| 348 |
-
"task": "weekly_competitive",
|
| 349 |
-
"score": 0.2188,
|
| 350 |
-
"total_steps": 168,
|
| 351 |
-
"total_posts": 14,
|
| 352 |
-
"avg_reward": 0.2062,
|
| 353 |
-
"final": {
|
| 354 |
-
"energy": 1.0,
|
| 355 |
-
"hours_since_sleep": 1,
|
| 356 |
-
"sleep_debt": 0.0,
|
| 357 |
-
"followers": 11364,
|
| 358 |
-
"engagement_rate": 0.9138,
|
| 359 |
-
"burned_out": false
|
| 360 |
-
}
|
| 361 |
-
},
|
| 362 |
-
{
|
| 363 |
-
"id": "2026-04-05T10:50:55.047589+00:00",
|
| 364 |
-
"scenario": "Growth Focus",
|
| 365 |
-
"scenario_id": "growth_focus",
|
| 366 |
-
"task": "weekly_competitive",
|
| 367 |
-
"score": 0.2764,
|
| 368 |
-
"total_steps": 168,
|
| 369 |
-
"total_posts": 14,
|
| 370 |
-
"avg_reward": 0.2205,
|
| 371 |
-
"final": {
|
| 372 |
-
"energy": 1.0,
|
| 373 |
-
"hours_since_sleep": 1,
|
| 374 |
-
"sleep_debt": 0.0,
|
| 375 |
-
"followers": 12621,
|
| 376 |
-
"engagement_rate": 1.7101,
|
| 377 |
-
"burned_out": false
|
| 378 |
-
}
|
| 379 |
-
},
|
| 380 |
-
{
|
| 381 |
-
"id": "2026-04-05T10:50:55.059854+00:00",
|
| 382 |
-
"scenario": "High Frequency",
|
| 383 |
-
"scenario_id": "high_freq",
|
| 384 |
-
"task": "weekly_competitive",
|
| 385 |
-
"score": 0.8611,
|
| 386 |
-
"total_steps": 168,
|
| 387 |
-
"total_posts": 22,
|
| 388 |
-
"avg_reward": 0.2058,
|
| 389 |
-
"final": {
|
| 390 |
-
"energy": 0.92,
|
| 391 |
-
"hours_since_sleep": 2,
|
| 392 |
-
"sleep_debt": 0.0,
|
| 393 |
-
"followers": 12654,
|
| 394 |
-
"engagement_rate": 1.079,
|
| 395 |
-
"burned_out": false
|
| 396 |
-
}
|
| 397 |
-
},
|
| 398 |
-
{
|
| 399 |
-
"id": "2026-04-05T10:50:55.072522+00:00",
|
| 400 |
-
"scenario": "Lifestyle Niche",
|
| 401 |
-
"scenario_id": "lifestyle_niche",
|
| 402 |
-
"task": "weekly_competitive",
|
| 403 |
-
"score": 0.2612,
|
| 404 |
-
"total_steps": 168,
|
| 405 |
-
"total_posts": 14,
|
| 406 |
-
"avg_reward": 0.2288,
|
| 407 |
-
"final": {
|
| 408 |
-
"energy": 1.0,
|
| 409 |
-
"hours_since_sleep": 1,
|
| 410 |
-
"sleep_debt": 0.0,
|
| 411 |
-
"followers": 12251,
|
| 412 |
-
"engagement_rate": 1.6295,
|
| 413 |
-
"burned_out": false
|
| 414 |
-
}
|
| 415 |
-
},
|
| 416 |
-
{
|
| 417 |
-
"id": "2026-04-05T10:50:55.081957+00:00",
|
| 418 |
-
"scenario": "Low Frequency",
|
| 419 |
-
"scenario_id": "low_freq",
|
| 420 |
-
"task": "weekly_competitive",
|
| 421 |
-
"score": 0.3241,
|
| 422 |
-
"total_steps": 168,
|
| 423 |
-
"total_posts": 4,
|
| 424 |
-
"avg_reward": 0.1768,
|
| 425 |
-
"final": {
|
| 426 |
-
"energy": 1.0,
|
| 427 |
-
"hours_since_sleep": 1,
|
| 428 |
-
"sleep_debt": 0.0,
|
| 429 |
-
"followers": 10461,
|
| 430 |
-
"engagement_rate": 1.1563,
|
| 431 |
-
"burned_out": false
|
| 432 |
-
}
|
| 433 |
-
},
|
| 434 |
-
{
|
| 435 |
-
"id": "2026-04-05T10:50:55.089553+00:00",
|
| 436 |
-
"scenario": "Marathon Runner",
|
| 437 |
-
"scenario_id": "marathon",
|
| 438 |
-
"task": "weekly_competitive",
|
| 439 |
-
"score": 0.0,
|
| 440 |
-
"total_steps": 50,
|
| 441 |
-
"total_posts": 9,
|
| 442 |
-
"avg_reward": 0.1323,
|
| 443 |
-
"final": {
|
| 444 |
-
"energy": 0.0,
|
| 445 |
-
"hours_since_sleep": 22,
|
| 446 |
-
"sleep_debt": 0.028,
|
| 447 |
-
"followers": 10137,
|
| 448 |
-
"engagement_rate": 0.157,
|
| 449 |
-
"burned_out": true
|
| 450 |
-
}
|
| 451 |
-
},
|
| 452 |
-
{
|
| 453 |
-
"id": "2026-04-05T10:50:55.095782+00:00",
|
| 454 |
-
"scenario": "Midday Focus",
|
| 455 |
-
"scenario_id": "midday",
|
| 456 |
-
"task": "weekly_competitive",
|
| 457 |
-
"score": 0.4317,
|
| 458 |
-
"total_steps": 168,
|
| 459 |
-
"total_posts": 14,
|
| 460 |
-
"avg_reward": 0.2306,
|
| 461 |
-
"final": {
|
| 462 |
-
"energy": 1.0,
|
| 463 |
-
"hours_since_sleep": 1,
|
| 464 |
-
"sleep_debt": 0.0,
|
| 465 |
-
"followers": 13537,
|
| 466 |
-
"engagement_rate": 2.3076,
|
| 467 |
-
"burned_out": false
|
| 468 |
-
}
|
| 469 |
-
},
|
| 470 |
-
{
|
| 471 |
-
"id": "2026-04-05T10:50:55.106103+00:00",
|
| 472 |
-
"scenario": "Minimal Poster",
|
| 473 |
-
"scenario_id": "minimal",
|
| 474 |
-
"task": "weekly_competitive",
|
| 475 |
-
"score": 0.3658,
|
| 476 |
-
"total_steps": 168,
|
| 477 |
-
"total_posts": 7,
|
| 478 |
-
"avg_reward": 0.2039,
|
| 479 |
-
"final": {
|
| 480 |
-
"energy": 1.0,
|
| 481 |
-
"hours_since_sleep": 1,
|
| 482 |
-
"sleep_debt": 0.0,
|
| 483 |
-
"followers": 10907,
|
| 484 |
-
"engagement_rate": 1.3002,
|
| 485 |
-
"burned_out": false
|
| 486 |
-
}
|
| 487 |
-
},
|
| 488 |
-
{
|
| 489 |
-
"id": "2026-04-05T10:50:55.116369+00:00",
|
| 490 |
-
"scenario": "ML/AI Deep Dive",
|
| 491 |
-
"scenario_id": "ml_deep",
|
| 492 |
-
"task": "weekly_competitive",
|
| 493 |
-
"score": 0.2266,
|
| 494 |
-
"total_steps": 168,
|
| 495 |
-
"total_posts": 14,
|
| 496 |
-
"avg_reward": 0.2197,
|
| 497 |
-
"final": {
|
| 498 |
-
"energy": 1.0,
|
| 499 |
-
"hours_since_sleep": 1,
|
| 500 |
-
"sleep_debt": 0.0,
|
| 501 |
-
"followers": 11180,
|
| 502 |
-
"engagement_rate": 0.7014,
|
| 503 |
-
"burned_out": false
|
| 504 |
-
}
|
| 505 |
-
},
|
| 506 |
-
{
|
| 507 |
-
"id": "2026-04-05T10:50:55.125451+00:00",
|
| 508 |
-
"scenario": "Monday Motivation",
|
| 509 |
-
"scenario_id": "monday",
|
| 510 |
-
"task": "weekly_competitive",
|
| 511 |
-
"score": 0.2606,
|
| 512 |
-
"total_steps": 168,
|
| 513 |
-
"total_posts": 4,
|
| 514 |
-
"avg_reward": 0.159,
|
| 515 |
-
"final": {
|
| 516 |
-
"energy": 0.75,
|
| 517 |
-
"hours_since_sleep": 2,
|
| 518 |
-
"sleep_debt": 0.0,
|
| 519 |
-
"followers": 5827,
|
| 520 |
-
"engagement_rate": 0.911,
|
| 521 |
-
"burned_out": false
|
| 522 |
-
}
|
| 523 |
-
},
|
| 524 |
-
{
|
| 525 |
-
"id": "2026-04-05T10:50:55.134737+00:00",
|
| 526 |
-
"scenario": "Napper",
|
| 527 |
-
"scenario_id": "napper",
|
| 528 |
-
"task": "weekly_competitive",
|
| 529 |
-
"score": 0.3623,
|
| 530 |
-
"total_steps": 168,
|
| 531 |
-
"total_posts": 14,
|
| 532 |
-
"avg_reward": 0.2264,
|
| 533 |
-
"final": {
|
| 534 |
-
"energy": 1.0,
|
| 535 |
-
"hours_since_sleep": 1,
|
| 536 |
-
"sleep_debt": 0.0,
|
| 537 |
-
"followers": 11322,
|
| 538 |
-
"engagement_rate": 0.8914,
|
| 539 |
-
"burned_out": false
|
| 540 |
-
}
|
| 541 |
-
},
|
| 542 |
-
{
|
| 543 |
-
"id": "2026-04-05T10:50:55.144641+00:00",
|
| 544 |
-
"scenario": "Night Owl",
|
| 545 |
-
"scenario_id": "night_owl",
|
| 546 |
-
"task": "weekly_competitive",
|
| 547 |
-
"score": 0.266,
|
| 548 |
-
"total_steps": 168,
|
| 549 |
-
"total_posts": 14,
|
| 550 |
-
"avg_reward": 0.194,
|
| 551 |
-
"final": {
|
| 552 |
-
"energy": 1.0,
|
| 553 |
-
"hours_since_sleep": 1,
|
| 554 |
-
"sleep_debt": 0.0,
|
| 555 |
-
"followers": 11927,
|
| 556 |
-
"engagement_rate": 1.328,
|
| 557 |
-
"burned_out": false
|
| 558 |
-
}
|
| 559 |
-
},
|
| 560 |
-
{
|
| 561 |
-
"id": "2026-04-05T10:50:55.153554+00:00",
|
| 562 |
-
"scenario": "Night Shift",
|
| 563 |
-
"scenario_id": "night_shift",
|
| 564 |
-
"task": "weekly_competitive",
|
| 565 |
-
"score": 0.2105,
|
| 566 |
-
"total_steps": 168,
|
| 567 |
-
"total_posts": 16,
|
| 568 |
-
"avg_reward": 0.2453,
|
| 569 |
-
"final": {
|
| 570 |
-
"energy": 1.0,
|
| 571 |
-
"hours_since_sleep": 1,
|
| 572 |
-
"sleep_debt": 0.0,
|
| 573 |
-
"followers": 11069,
|
| 574 |
-
"engagement_rate": 0.5602,
|
| 575 |
-
"burned_out": false
|
| 576 |
-
}
|
| 577 |
-
},
|
| 578 |
-
{
|
| 579 |
-
"id": "2026-04-05T10:50:55.159353+00:00",
|
| 580 |
-
"scenario": "No Rest",
|
| 581 |
-
"scenario_id": "no_rest",
|
| 582 |
-
"task": "weekly_competitive",
|
| 583 |
-
"score": 0.0,
|
| 584 |
-
"total_steps": 8,
|
| 585 |
-
"total_posts": 8,
|
| 586 |
-
"avg_reward": 0.2686,
|
| 587 |
-
"final": {
|
| 588 |
-
"energy": 0.0,
|
| 589 |
-
"hours_since_sleep": 10,
|
| 590 |
-
"sleep_debt": 0.0,
|
| 591 |
-
"followers": 10213,
|
| 592 |
-
"engagement_rate": 0.2732,
|
| 593 |
-
"burned_out": true
|
| 594 |
-
}
|
| 595 |
-
},
|
| 596 |
-
{
|
| 597 |
-
"id": "2026-04-05T10:50:55.164846+00:00",
|
| 598 |
-
"scenario": "Optimal Sleep",
|
| 599 |
-
"scenario_id": "optimal_sleep",
|
| 600 |
-
"task": "weekly_competitive",
|
| 601 |
-
"score": 0.3635,
|
| 602 |
-
"total_steps": 168,
|
| 603 |
-
"total_posts": 14,
|
| 604 |
-
"avg_reward": 0.2257,
|
| 605 |
-
"final": {
|
| 606 |
-
"energy": 0.9,
|
| 607 |
-
"hours_since_sleep": 3,
|
| 608 |
-
"sleep_debt": 0.0,
|
| 609 |
-
"followers": 11305,
|
| 610 |
-
"engagement_rate": 0.8729,
|
| 611 |
-
"burned_out": false
|
| 612 |
-
}
|
| 613 |
-
},
|
| 614 |
-
{
|
| 615 |
-
"id": "2026-04-05T10:50:55.174882+00:00",
|
| 616 |
-
"scenario": "Photography Focus",
|
| 617 |
-
"scenario_id": "photography",
|
| 618 |
-
"task": "weekly_competitive",
|
| 619 |
-
"score": 0.1838,
|
| 620 |
-
"total_steps": 168,
|
| 621 |
-
"total_posts": 16,
|
| 622 |
-
"avg_reward": 0.22,
|
| 623 |
-
"final": {
|
| 624 |
-
"energy": 0.5,
|
| 625 |
-
"hours_since_sleep": 3,
|
| 626 |
-
"sleep_debt": 0.0,
|
| 627 |
-
"followers": 10736,
|
| 628 |
-
"engagement_rate": 0.4388,
|
| 629 |
-
"burned_out": false
|
| 630 |
-
}
|
| 631 |
-
},
|
| 632 |
-
{
|
| 633 |
-
"id": "2026-04-05T10:50:55.184216+00:00",
|
| 634 |
-
"scenario": "Productivity Guru",
|
| 635 |
-
"scenario_id": "productivity",
|
| 636 |
-
"task": "weekly_competitive",
|
| 637 |
-
"score": 0.184,
|
| 638 |
-
"total_steps": 168,
|
| 639 |
-
"total_posts": 16,
|
| 640 |
-
"avg_reward": 0.227,
|
| 641 |
-
"final": {
|
| 642 |
-
"energy": 0.62,
|
| 643 |
-
"hours_since_sleep": 2,
|
| 644 |
-
"sleep_debt": 0.0,
|
| 645 |
-
"followers": 10741,
|
| 646 |
-
"engagement_rate": 0.3797,
|
| 647 |
-
"burned_out": false
|
| 648 |
-
}
|
| 649 |
-
},
|
| 650 |
-
{
|
| 651 |
-
"id": "2026-04-05T10:50:55.192896+00:00",
|
| 652 |
-
"scenario": "Queue Heavy",
|
| 653 |
-
"scenario_id": "queue_heavy",
|
| 654 |
-
"task": "weekly_competitive",
|
| 655 |
-
"score": 0.1933,
|
| 656 |
-
"total_steps": 168,
|
| 657 |
-
"total_posts": 8,
|
| 658 |
-
"avg_reward": 0.1923,
|
| 659 |
-
"final": {
|
| 660 |
-
"energy": 1.0,
|
| 661 |
-
"hours_since_sleep": 1,
|
| 662 |
-
"sleep_debt": 0.0,
|
| 663 |
-
"followers": 9453,
|
| 664 |
-
"engagement_rate": 0.781,
|
| 665 |
-
"burned_out": false
|
| 666 |
-
}
|
| 667 |
-
},
|
| 668 |
-
{
|
| 669 |
-
"id": "2026-04-05T10:50:55.202107+00:00",
|
| 670 |
-
"scenario": "Queue Optimizer",
|
| 671 |
-
"scenario_id": "queue_optimizer",
|
| 672 |
-
"task": "weekly_competitive",
|
| 673 |
-
"score": 0.352,
|
| 674 |
-
"total_steps": 168,
|
| 675 |
-
"total_posts": 14,
|
| 676 |
-
"avg_reward": 0.2233,
|
| 677 |
-
"final": {
|
| 678 |
-
"energy": 1.0,
|
| 679 |
-
"hours_since_sleep": 1,
|
| 680 |
-
"sleep_debt": 0.0,
|
| 681 |
-
"followers": 11215,
|
| 682 |
-
"engagement_rate": 0.8701,
|
| 683 |
-
"burned_out": false
|
| 684 |
-
}
|
| 685 |
-
},
|
| 686 |
-
{
|
| 687 |
-
"id": "2026-04-05T10:50:55.209453+00:00",
|
| 688 |
-
"scenario": "Random Actor",
|
| 689 |
-
"scenario_id": "random",
|
| 690 |
-
"task": "weekly_competitive",
|
| 691 |
-
"score": 0.0,
|
| 692 |
-
"total_steps": 22,
|
| 693 |
-
"total_posts": 11,
|
| 694 |
-
"avg_reward": 0.2318,
|
| 695 |
-
"final": {
|
| 696 |
-
"energy": 0.0,
|
| 697 |
-
"hours_since_sleep": 17,
|
| 698 |
-
"sleep_debt": 0.033,
|
| 699 |
-
"followers": 10159,
|
| 700 |
-
"engagement_rate": 0.087,
|
| 701 |
-
"burned_out": true
|
| 702 |
-
}
|
| 703 |
-
},
|
| 704 |
-
{
|
| 705 |
-
"id": "2026-04-05T10:50:55.215343+00:00",
|
| 706 |
-
"scenario": "Reel Maximizer",
|
| 707 |
-
"scenario_id": "reel_max",
|
| 708 |
-
"task": "weekly_competitive",
|
| 709 |
-
"score": 0.4344,
|
| 710 |
-
"total_steps": 168,
|
| 711 |
-
"total_posts": 14,
|
| 712 |
-
"avg_reward": 0.2295,
|
| 713 |
-
"final": {
|
| 714 |
-
"energy": 1.0,
|
| 715 |
-
"hours_since_sleep": 1,
|
| 716 |
-
"sleep_debt": 0.0,
|
| 717 |
-
"followers": 13314,
|
| 718 |
-
"engagement_rate": 2.1201,
|
| 719 |
-
"burned_out": false
|
| 720 |
-
}
|
| 721 |
-
},
|
| 722 |
-
{
|
| 723 |
-
"id": "2026-04-05T10:50:55.225542+00:00",
|
| 724 |
-
"scenario": "SaaS/Business",
|
| 725 |
-
"scenario_id": "saas",
|
| 726 |
-
"task": "weekly_competitive",
|
| 727 |
-
"score": 0.2015,
|
| 728 |
-
"total_steps": 168,
|
| 729 |
-
"total_posts": 14,
|
| 730 |
-
"avg_reward": 0.2182,
|
| 731 |
-
"final": {
|
| 732 |
-
"energy": 1.0,
|
| 733 |
-
"hours_since_sleep": 1,
|
| 734 |
-
"sleep_debt": 0.0,
|
| 735 |
-
"followers": 10958,
|
| 736 |
-
"engagement_rate": 0.6072,
|
| 737 |
-
"burned_out": false
|
| 738 |
-
}
|
| 739 |
-
},
|
| 740 |
-
{
|
| 741 |
-
"id": "2026-04-05T10:50:55.234793+00:00",
|
| 742 |
-
"scenario": "Sleep Conscious",
|
| 743 |
-
"scenario_id": "sleep_conscious",
|
| 744 |
-
"task": "weekly_competitive",
|
| 745 |
-
"score": 0.3635,
|
| 746 |
-
"total_steps": 168,
|
| 747 |
-
"total_posts": 14,
|
| 748 |
-
"avg_reward": 0.2257,
|
| 749 |
-
"final": {
|
| 750 |
-
"energy": 0.9,
|
| 751 |
-
"hours_since_sleep": 3,
|
| 752 |
-
"sleep_debt": 0.0,
|
| 753 |
-
"followers": 11305,
|
| 754 |
-
"engagement_rate": 0.8729,
|
| 755 |
-
"burned_out": false
|
| 756 |
-
}
|
| 757 |
-
},
|
| 758 |
-
{
|
| 759 |
-
"id": "2026-04-05T10:50:55.245249+00:00",
|
| 760 |
-
"scenario": "Sleep Debt Aware",
|
| 761 |
-
"scenario_id": "sleep_debt_aware",
|
| 762 |
-
"task": "weekly_competitive",
|
| 763 |
-
"score": 0.3745,
|
| 764 |
-
"total_steps": 168,
|
| 765 |
-
"total_posts": 14,
|
| 766 |
-
"avg_reward": 0.2293,
|
| 767 |
-
"final": {
|
| 768 |
-
"energy": 1.0,
|
| 769 |
-
"hours_since_sleep": 1,
|
| 770 |
-
"sleep_debt": 0.0,
|
| 771 |
-
"followers": 11412,
|
| 772 |
-
"engagement_rate": 0.9425,
|
| 773 |
-
"burned_out": false
|
| 774 |
-
}
|
| 775 |
-
},
|
| 776 |
-
{
|
| 777 |
-
"id": "2026-04-05T10:50:55.252673+00:00",
|
| 778 |
-
"scenario": "Sleep Deprived",
|
| 779 |
-
"scenario_id": "sleep_deprived",
|
| 780 |
-
"task": "weekly_competitive",
|
| 781 |
-
"score": 0.0,
|
| 782 |
-
"total_steps": 16,
|
| 783 |
-
"total_posts": 2,
|
| 784 |
-
"avg_reward": 0.2248,
|
| 785 |
-
"final": {
|
| 786 |
-
"energy": 0.0,
|
| 787 |
-
"hours_since_sleep": 18,
|
| 788 |
-
"sleep_debt": 0.045,
|
| 789 |
-
"followers": 10215,
|
| 790 |
-
"engagement_rate": 1.0806,
|
| 791 |
-
"burned_out": true
|
| 792 |
-
}
|
| 793 |
-
},
|
| 794 |
-
{
|
| 795 |
-
"id": "2026-04-05T10:50:55.258355+00:00",
|
| 796 |
-
"scenario": "Sleep Respecting",
|
| 797 |
-
"scenario_id": "sleep_respecting",
|
| 798 |
-
"task": "weekly_competitive",
|
| 799 |
-
"score": 0.3623,
|
| 800 |
-
"total_steps": 168,
|
| 801 |
-
"total_posts": 14,
|
| 802 |
-
"avg_reward": 0.2264,
|
| 803 |
-
"final": {
|
| 804 |
-
"energy": 1.0,
|
| 805 |
-
"hours_since_sleep": 1,
|
| 806 |
-
"sleep_debt": 0.0,
|
| 807 |
-
"followers": 11322,
|
| 808 |
-
"engagement_rate": 0.8914,
|
| 809 |
-
"burned_out": false
|
| 810 |
-
}
|
| 811 |
-
},
|
| 812 |
-
{
|
| 813 |
-
"id": "2026-04-05T10:50:55.268389+00:00",
|
| 814 |
-
"scenario": "Smart Agent",
|
| 815 |
-
"scenario_id": "smart",
|
| 816 |
-
"task": "weekly_competitive",
|
| 817 |
-
"score": 0.8745,
|
| 818 |
-
"total_steps": 168,
|
| 819 |
-
"total_posts": 14,
|
| 820 |
-
"avg_reward": 0.2301,
|
| 821 |
-
"final": {
|
| 822 |
-
"energy": 1.0,
|
| 823 |
-
"hours_since_sleep": 1,
|
| 824 |
-
"sleep_debt": 0.0,
|
| 825 |
-
"followers": 12200,
|
| 826 |
-
"engagement_rate": 1.5557,
|
| 827 |
-
"burned_out": false
|
| 828 |
-
}
|
| 829 |
-
},
|
| 830 |
-
{
|
| 831 |
-
"id": "2026-04-05T10:50:55.276258+00:00",
|
| 832 |
-
"scenario": "Spam Post",
|
| 833 |
-
"scenario_id": "spam",
|
| 834 |
-
"task": "weekly_competitive",
|
| 835 |
-
"score": 0.0,
|
| 836 |
-
"total_steps": 4,
|
| 837 |
-
"total_posts": 4,
|
| 838 |
-
"avg_reward": 0.387,
|
| 839 |
-
"final": {
|
| 840 |
-
"energy": 0.0,
|
| 841 |
-
"hours_since_sleep": 6,
|
| 842 |
-
"sleep_debt": 0.0,
|
| 843 |
-
"followers": 10625,
|
| 844 |
-
"engagement_rate": 1.567,
|
| 845 |
-
"burned_out": true
|
| 846 |
-
}
|
| 847 |
-
},
|
| 848 |
-
{
|
| 849 |
-
"id": "2026-04-05T10:50:55.281752+00:00",
|
| 850 |
-
"scenario": "Split Schedule",
|
| 851 |
-
"scenario_id": "split_schedule",
|
| 852 |
-
"task": "weekly_competitive",
|
| 853 |
-
"score": 0.385,
|
| 854 |
-
"total_steps": 168,
|
| 855 |
-
"total_posts": 15,
|
| 856 |
-
"avg_reward": 0.2347,
|
| 857 |
-
"final": {
|
| 858 |
-
"energy": 0.75,
|
| 859 |
-
"hours_since_sleep": 2,
|
| 860 |
-
"sleep_debt": 0.0,
|
| 861 |
-
"followers": 11689,
|
| 862 |
-
"engagement_rate": 0.9724,
|
| 863 |
-
"burned_out": false
|
| 864 |
-
}
|
| 865 |
-
},
|
| 866 |
-
{
|
| 867 |
-
"id": "2026-04-05T10:50:55.291899+00:00",
|
| 868 |
-
"scenario": "Stoic Philosophy",
|
| 869 |
-
"scenario_id": "stoic",
|
| 870 |
-
"task": "weekly_competitive",
|
| 871 |
-
"score": 0.1071,
|
| 872 |
-
"total_steps": 168,
|
| 873 |
-
"total_posts": 7,
|
| 874 |
-
"avg_reward": 0.2069,
|
| 875 |
-
"final": {
|
| 876 |
-
"energy": 1.0,
|
| 877 |
-
"hours_since_sleep": 1,
|
| 878 |
-
"sleep_debt": 0.0,
|
| 879 |
-
"followers": 10108,
|
| 880 |
-
"engagement_rate": 0.1578,
|
| 881 |
-
"burned_out": false
|
| 882 |
-
}
|
| 883 |
-
},
|
| 884 |
-
{
|
| 885 |
-
"id": "2026-04-05T10:50:55.301186+00:00",
|
| 886 |
-
"scenario": "Story Spammer",
|
| 887 |
-
"scenario_id": "story_spammer",
|
| 888 |
-
"task": "weekly_competitive",
|
| 889 |
-
"score": 0.1632,
|
| 890 |
-
"total_steps": 168,
|
| 891 |
-
"total_posts": 29,
|
| 892 |
-
"avg_reward": 0.1592,
|
| 893 |
-
"final": {
|
| 894 |
-
"energy": 0.87,
|
| 895 |
-
"hours_since_sleep": 2,
|
| 896 |
-
"sleep_debt": 0.0,
|
| 897 |
-
"followers": 10504,
|
| 898 |
-
"engagement_rate": 0.1285,
|
| 899 |
-
"burned_out": false
|
| 900 |
-
}
|
| 901 |
-
},
|
| 902 |
-
{
|
| 903 |
-
"id": "2026-04-05T10:50:55.310194+00:00",
|
| 904 |
-
"scenario": "Tag Exploiter",
|
| 905 |
-
"scenario_id": "tag_exploiter",
|
| 906 |
-
"task": "weekly_competitive",
|
| 907 |
-
"score": 0.2922,
|
| 908 |
-
"total_steps": 168,
|
| 909 |
-
"total_posts": 14,
|
| 910 |
-
"avg_reward": 0.2358,
|
| 911 |
-
"final": {
|
| 912 |
-
"energy": 1.0,
|
| 913 |
-
"hours_since_sleep": 1,
|
| 914 |
-
"sleep_debt": 0.0,
|
| 915 |
-
"followers": 13696,
|
| 916 |
-
"engagement_rate": 2.2487,
|
| 917 |
-
"burned_out": false
|
| 918 |
-
}
|
| 919 |
-
},
|
| 920 |
-
{
|
| 921 |
-
"id": "2026-04-05T10:50:55.320255+00:00",
|
| 922 |
-
"scenario": "Tag Explorer",
|
| 923 |
-
"scenario_id": "tag_explorer",
|
| 924 |
-
"task": "weekly_competitive",
|
| 925 |
-
"score": 0.8323,
|
| 926 |
-
"total_steps": 168,
|
| 927 |
-
"total_posts": 15,
|
| 928 |
-
"avg_reward": 0.2253,
|
| 929 |
-
"final": {
|
| 930 |
-
"energy": 0.94,
|
| 931 |
-
"hours_since_sleep": 2,
|
| 932 |
-
"sleep_debt": 0.0,
|
| 933 |
-
"followers": 11351,
|
| 934 |
-
"engagement_rate": 0.7735,
|
| 935 |
-
"burned_out": false
|
| 936 |
-
}
|
| 937 |
-
},
|
| 938 |
-
{
|
| 939 |
-
"id": "2026-04-05T10:50:55.333620+00:00",
|
| 940 |
-
"scenario": "Tech Niche",
|
| 941 |
-
"scenario_id": "tech_niche",
|
| 942 |
-
"task": "weekly_competitive",
|
| 943 |
-
"score": 0.2001,
|
| 944 |
-
"total_steps": 168,
|
| 945 |
-
"total_posts": 14,
|
| 946 |
-
"avg_reward": 0.215,
|
| 947 |
-
"final": {
|
| 948 |
-
"energy": 1.0,
|
| 949 |
-
"hours_since_sleep": 1,
|
| 950 |
-
"sleep_debt": 0.0,
|
| 951 |
-
"followers": 10770,
|
| 952 |
-
"engagement_rate": 0.533,
|
| 953 |
-
"burned_out": false
|
| 954 |
-
}
|
| 955 |
-
},
|
| 956 |
-
{
|
| 957 |
-
"id": "2026-04-05T10:50:55.343185+00:00",
|
| 958 |
-
"scenario": "Text Only",
|
| 959 |
-
"scenario_id": "text_only",
|
| 960 |
-
"task": "weekly_competitive",
|
| 961 |
-
"score": 0.1583,
|
| 962 |
-
"total_steps": 168,
|
| 963 |
-
"total_posts": 21,
|
| 964 |
-
"avg_reward": 0.1857,
|
| 965 |
-
"final": {
|
| 966 |
-
"energy": 1.0,
|
| 967 |
-
"hours_since_sleep": 1,
|
| 968 |
-
"sleep_debt": 0.0,
|
| 969 |
-
"followers": 10485,
|
| 970 |
-
"engagement_rate": 0.234,
|
| 971 |
-
"burned_out": false
|
| 972 |
-
}
|
| 973 |
-
},
|
| 974 |
-
{
|
| 975 |
-
"id": "2026-04-05T10:50:55.352680+00:00",
|
| 976 |
-
"scenario": "Travel Blogger",
|
| 977 |
-
"scenario_id": "travel",
|
| 978 |
-
"task": "weekly_competitive",
|
| 979 |
-
"score": 0.2975,
|
| 980 |
-
"total_steps": 168,
|
| 981 |
-
"total_posts": 14,
|
| 982 |
-
"avg_reward": 0.2307,
|
| 983 |
-
"final": {
|
| 984 |
-
"energy": 1.0,
|
| 985 |
-
"hours_since_sleep": 1,
|
| 986 |
-
"sleep_debt": 0.0,
|
| 987 |
-
"followers": 12749,
|
| 988 |
-
"engagement_rate": 1.9614,
|
| 989 |
-
"burned_out": false
|
| 990 |
-
}
|
| 991 |
-
},
|
| 992 |
-
{
|
| 993 |
-
"id": "2026-04-05T10:50:55.362329+00:00",
|
| 994 |
-
"scenario": "Trend Chaser",
|
| 995 |
-
"scenario_id": "trend_chaser",
|
| 996 |
-
"task": "weekly_competitive",
|
| 997 |
-
"score": 0.4344,
|
| 998 |
-
"total_steps": 168,
|
| 999 |
-
"total_posts": 14,
|
| 1000 |
-
"avg_reward": 0.2413,
|
| 1001 |
-
"final": {
|
| 1002 |
-
"energy": 1.0,
|
| 1003 |
-
"hours_since_sleep": 1,
|
| 1004 |
-
"sleep_debt": 0.0,
|
| 1005 |
-
"followers": 14148,
|
| 1006 |
-
"engagement_rate": 2.6985,
|
| 1007 |
-
"burned_out": false
|
| 1008 |
-
}
|
| 1009 |
-
},
|
| 1010 |
-
{
|
| 1011 |
-
"id": "2026-04-05T10:50:55.373024+00:00",
|
| 1012 |
-
"scenario": "Tuesday Thursday",
|
| 1013 |
-
"scenario_id": "tue_thu",
|
| 1014 |
-
"task": "weekly_competitive",
|
| 1015 |
-
"score": 0.1826,
|
| 1016 |
-
"total_steps": 168,
|
| 1017 |
-
"total_posts": 4,
|
| 1018 |
-
"avg_reward": 0.1731,
|
| 1019 |
-
"final": {
|
| 1020 |
-
"energy": 1.0,
|
| 1021 |
-
"hours_since_sleep": 1,
|
| 1022 |
-
"sleep_debt": 0.0,
|
| 1023 |
-
"followers": 9154,
|
| 1024 |
-
"engagement_rate": 3.4748,
|
| 1025 |
-
"burned_out": false
|
| 1026 |
-
}
|
| 1027 |
-
},
|
| 1028 |
-
{
|
| 1029 |
-
"id": "2026-04-05T10:50:55.382708+00:00",
|
| 1030 |
-
"scenario": "Weekday Only",
|
| 1031 |
-
"scenario_id": "weekday_only",
|
| 1032 |
-
"task": "weekly_competitive",
|
| 1033 |
-
"score": 0.2366,
|
| 1034 |
-
"total_steps": 168,
|
| 1035 |
-
"total_posts": 10,
|
| 1036 |
-
"avg_reward": 0.2046,
|
| 1037 |
-
"final": {
|
| 1038 |
-
"energy": 1.0,
|
| 1039 |
-
"hours_since_sleep": 1,
|
| 1040 |
-
"sleep_debt": 0.0,
|
| 1041 |
-
"followers": 9810,
|
| 1042 |
-
"engagement_rate": 1.0028,
|
| 1043 |
-
"burned_out": false
|
| 1044 |
-
}
|
| 1045 |
-
},
|
| 1046 |
-
{
|
| 1047 |
-
"id": "2026-04-05T10:50:55.392284+00:00",
|
| 1048 |
-
"scenario": "Weekend Warrior",
|
| 1049 |
-
"scenario_id": "weekend",
|
| 1050 |
-
"task": "weekly_competitive",
|
| 1051 |
-
"score": 0.1257,
|
| 1052 |
-
"total_steps": 168,
|
| 1053 |
-
"total_posts": 6,
|
| 1054 |
-
"avg_reward": 0.1648,
|
| 1055 |
-
"final": {
|
| 1056 |
-
"energy": 1.0,
|
| 1057 |
-
"hours_since_sleep": 1,
|
| 1058 |
-
"sleep_debt": 0.0,
|
| 1059 |
-
"followers": 7659,
|
| 1060 |
-
"engagement_rate": 0.635,
|
| 1061 |
-
"burned_out": false
|
| 1062 |
-
}
|
| 1063 |
-
},
|
| 1064 |
-
{
|
| 1065 |
-
"id": "2026-04-05T10:51:44.770556+00:00",
|
| 1066 |
-
"scenario": "Aggressive Energy",
|
| 1067 |
-
"scenario_id": "aggressive",
|
| 1068 |
-
"task": "weekly_competitive",
|
| 1069 |
-
"score": 0.8255,
|
| 1070 |
-
"total_steps": 168,
|
| 1071 |
-
"total_posts": 29,
|
| 1072 |
-
"avg_reward": 0.1875,
|
| 1073 |
-
"final": {
|
| 1074 |
-
"energy": 0.75,
|
| 1075 |
-
"hours_since_sleep": 2,
|
| 1076 |
-
"sleep_debt": 0.0,
|
| 1077 |
-
"followers": 13021,
|
| 1078 |
-
"engagement_rate": 0.8084,
|
| 1079 |
-
"burned_out": false
|
| 1080 |
-
}
|
| 1081 |
-
},
|
| 1082 |
-
{
|
| 1083 |
-
"id": "2026-04-06T14:25:47.636598+00:00",
|
| 1084 |
-
"scenario": "Sleep Respecting",
|
| 1085 |
-
"scenario_id": "sleep_respecting",
|
| 1086 |
-
"task": "weekly_competitive",
|
| 1087 |
-
"score": 0.3623,
|
| 1088 |
-
"total_steps": 168,
|
| 1089 |
-
"total_posts": 14,
|
| 1090 |
-
"avg_reward": 0.2264,
|
| 1091 |
-
"final": {
|
| 1092 |
-
"energy": 1.0,
|
| 1093 |
-
"hours_since_sleep": 1,
|
| 1094 |
-
"sleep_debt": 0.0,
|
| 1095 |
-
"followers": 11322,
|
| 1096 |
-
"engagement_rate": 0.8914,
|
| 1097 |
-
"burned_out": false
|
| 1098 |
-
}
|
| 1099 |
-
},
|
| 1100 |
-
{
|
| 1101 |
-
"id": "2026-04-06T14:26:41.631567+00:00",
|
| 1102 |
-
"scenario": "Creator Economy",
|
| 1103 |
-
"scenario_id": "creator_economy",
|
| 1104 |
-
"task": "weekly_competitive",
|
| 1105 |
-
"score": 0.2515,
|
| 1106 |
-
"total_steps": 168,
|
| 1107 |
-
"total_posts": 14,
|
| 1108 |
-
"avg_reward": 0.2226,
|
| 1109 |
-
"final": {
|
| 1110 |
-
"energy": 1.0,
|
| 1111 |
-
"hours_since_sleep": 1,
|
| 1112 |
-
"sleep_debt": 0.0,
|
| 1113 |
-
"followers": 11994,
|
| 1114 |
-
"engagement_rate": 1.3918,
|
| 1115 |
-
"burned_out": false
|
| 1116 |
-
}
|
| 1117 |
-
},
|
| 1118 |
-
{
|
| 1119 |
-
"id": "2026-04-06T14:27:32.195059+00:00",
|
| 1120 |
-
"scenario": "Weekday Only",
|
| 1121 |
-
"scenario_id": "weekday_only",
|
| 1122 |
-
"task": "weekly_competitive",
|
| 1123 |
-
"score": 0.2366,
|
| 1124 |
-
"total_steps": 168,
|
| 1125 |
-
"total_posts": 10,
|
| 1126 |
-
"avg_reward": 0.2046,
|
| 1127 |
-
"final": {
|
| 1128 |
-
"energy": 1.0,
|
| 1129 |
-
"hours_since_sleep": 1,
|
| 1130 |
-
"sleep_debt": 0.0,
|
| 1131 |
-
"followers": 9810,
|
| 1132 |
-
"engagement_rate": 1.0028,
|
| 1133 |
-
"burned_out": false
|
| 1134 |
-
}
|
| 1135 |
-
},
|
| 1136 |
-
{
|
| 1137 |
-
"id": "2026-04-06T14:28:12.547146+00:00",
|
| 1138 |
-
"scenario": "Weekday Only",
|
| 1139 |
-
"scenario_id": "weekday_only",
|
| 1140 |
-
"task": "weekly_competitive",
|
| 1141 |
-
"score": 0.2366,
|
| 1142 |
-
"total_steps": 168,
|
| 1143 |
-
"total_posts": 10,
|
| 1144 |
-
"avg_reward": 0.2046,
|
| 1145 |
-
"final": {
|
| 1146 |
-
"energy": 1.0,
|
| 1147 |
-
"hours_since_sleep": 1,
|
| 1148 |
-
"sleep_debt": 0.0,
|
| 1149 |
-
"followers": 9810,
|
| 1150 |
-
"engagement_rate": 1.0028,
|
| 1151 |
-
"burned_out": false
|
| 1152 |
-
}
|
| 1153 |
-
},
|
| 1154 |
-
{
|
| 1155 |
-
"id": "2026-04-06T14:29:19.356814+00:00",
|
| 1156 |
-
"scenario": "No Rest",
|
| 1157 |
-
"scenario_id": "no_rest",
|
| 1158 |
-
"task": "weekly_engage",
|
| 1159 |
-
"score": 0.027,
|
| 1160 |
-
"total_steps": 8,
|
| 1161 |
-
"total_posts": 8,
|
| 1162 |
-
"avg_reward": 0.2686,
|
| 1163 |
-
"final": {
|
| 1164 |
-
"energy": 0.0,
|
| 1165 |
-
"hours_since_sleep": 10,
|
| 1166 |
-
"sleep_debt": 0.0,
|
| 1167 |
-
"followers": 10213,
|
| 1168 |
-
"engagement_rate": 0.2732,
|
| 1169 |
-
"burned_out": true
|
| 1170 |
-
}
|
| 1171 |
-
},
|
| 1172 |
-
{
|
| 1173 |
-
"id": "2026-04-06T14:29:21.996045+00:00",
|
| 1174 |
-
"scenario": "No Rest",
|
| 1175 |
-
"scenario_id": "no_rest",
|
| 1176 |
-
"task": "weekly_engage",
|
| 1177 |
-
"score": 0.027,
|
| 1178 |
-
"total_steps": 8,
|
| 1179 |
-
"total_posts": 8,
|
| 1180 |
-
"avg_reward": 0.2686,
|
| 1181 |
-
"final": {
|
| 1182 |
-
"energy": 0.0,
|
| 1183 |
-
"hours_since_sleep": 10,
|
| 1184 |
-
"sleep_debt": 0.0,
|
| 1185 |
-
"followers": 10213,
|
| 1186 |
-
"engagement_rate": 0.2732,
|
| 1187 |
-
"burned_out": true
|
| 1188 |
-
}
|
| 1189 |
-
},
|
| 1190 |
-
{
|
| 1191 |
-
"id": "2026-04-06T14:29:33.742894+00:00",
|
| 1192 |
-
"scenario": "Text Only",
|
| 1193 |
-
"scenario_id": "text_only",
|
| 1194 |
-
"task": "weekly_engage",
|
| 1195 |
-
"score": 0.2049,
|
| 1196 |
-
"total_steps": 168,
|
| 1197 |
-
"total_posts": 21,
|
| 1198 |
-
"avg_reward": 0.1857,
|
| 1199 |
-
"final": {
|
| 1200 |
-
"energy": 1.0,
|
| 1201 |
-
"hours_since_sleep": 1,
|
| 1202 |
-
"sleep_debt": 0.0,
|
| 1203 |
-
"followers": 10485,
|
| 1204 |
-
"engagement_rate": 0.234,
|
| 1205 |
-
"burned_out": false
|
| 1206 |
-
}
|
| 1207 |
-
},
|
| 1208 |
-
{
|
| 1209 |
-
"id": "2026-04-06T14:29:39.176314+00:00",
|
| 1210 |
-
"scenario": "Gaming Niche",
|
| 1211 |
-
"scenario_id": "gaming_niche",
|
| 1212 |
-
"task": "weekly_engage",
|
| 1213 |
-
"score": 0.5658,
|
| 1214 |
-
"total_steps": 168,
|
| 1215 |
-
"total_posts": 14,
|
| 1216 |
-
"avg_reward": 0.2062,
|
| 1217 |
-
"final": {
|
| 1218 |
-
"energy": 1.0,
|
| 1219 |
-
"hours_since_sleep": 1,
|
| 1220 |
-
"sleep_debt": 0.0,
|
| 1221 |
-
"followers": 11364,
|
| 1222 |
-
"engagement_rate": 0.9138,
|
| 1223 |
-
"burned_out": false
|
| 1224 |
-
}
|
| 1225 |
-
},
|
| 1226 |
-
{
|
| 1227 |
-
"id": "2026-04-06T14:29:50.321368+00:00",
|
| 1228 |
-
"scenario": "Midday Focus",
|
| 1229 |
-
"scenario_id": "midday",
|
| 1230 |
-
"task": "weekly_engage",
|
| 1231 |
-
"score": 1.0,
|
| 1232 |
-
"total_steps": 168,
|
| 1233 |
-
"total_posts": 14,
|
| 1234 |
-
"avg_reward": 0.2306,
|
| 1235 |
-
"final": {
|
| 1236 |
-
"energy": 1.0,
|
| 1237 |
-
"hours_since_sleep": 1,
|
| 1238 |
-
"sleep_debt": 0.0,
|
| 1239 |
-
"followers": 13537,
|
| 1240 |
-
"engagement_rate": 2.3076,
|
| 1241 |
-
"burned_out": false
|
| 1242 |
-
}
|
| 1243 |
-
},
|
| 1244 |
-
{
|
| 1245 |
-
"id": "2026-04-06T17:52:48.224991+00:00",
|
| 1246 |
-
"scenario": "Double Peak",
|
| 1247 |
-
"scenario_id": "double_peak",
|
| 1248 |
-
"task": "weekly_competitive",
|
| 1249 |
-
"score": 0.4519,
|
| 1250 |
-
"total_steps": 168,
|
| 1251 |
-
"total_posts": 14,
|
| 1252 |
-
"avg_reward": 0.2352,
|
| 1253 |
-
"final": {
|
| 1254 |
-
"energy": 1.0,
|
| 1255 |
-
"hours_since_sleep": 1,
|
| 1256 |
-
"sleep_debt": 0.0,
|
| 1257 |
-
"followers": 13138,
|
| 1258 |
-
"engagement_rate": 2.0814,
|
| 1259 |
-
"burned_out": false
|
| 1260 |
-
}
|
| 1261 |
-
},
|
| 1262 |
-
{
|
| 1263 |
-
"id": "2026-04-06T17:53:45.401024+00:00",
|
| 1264 |
-
"scenario": "Photography Focus",
|
| 1265 |
-
"scenario_id": "photography",
|
| 1266 |
-
"task": "weekly_competitive",
|
| 1267 |
-
"score": 0.1838,
|
| 1268 |
-
"total_steps": 168,
|
| 1269 |
-
"total_posts": 16,
|
| 1270 |
-
"avg_reward": 0.22,
|
| 1271 |
-
"final": {
|
| 1272 |
-
"energy": 0.5,
|
| 1273 |
-
"hours_since_sleep": 3,
|
| 1274 |
-
"sleep_debt": 0.0,
|
| 1275 |
-
"followers": 10736,
|
| 1276 |
-
"engagement_rate": 0.4388,
|
| 1277 |
-
"burned_out": false
|
| 1278 |
-
}
|
| 1279 |
-
},
|
| 1280 |
-
{
|
| 1281 |
-
"id": "2026-04-06T17:54:16.540951+00:00",
|
| 1282 |
-
"scenario": "Burst Poster",
|
| 1283 |
-
"scenario_id": "burst",
|
| 1284 |
-
"task": "weekly_competitive",
|
| 1285 |
-
"score": 0.6111,
|
| 1286 |
-
"total_steps": 168,
|
| 1287 |
-
"total_posts": 57,
|
| 1288 |
-
"avg_reward": 0.2318,
|
| 1289 |
-
"final": {
|
| 1290 |
-
"energy": 0.44,
|
| 1291 |
-
"hours_since_sleep": 1,
|
| 1292 |
-
"sleep_debt": 0.0,
|
| 1293 |
-
"followers": 11701,
|
| 1294 |
-
"engagement_rate": 0.2076,
|
| 1295 |
-
"burned_out": false
|
| 1296 |
-
}
|
| 1297 |
-
},
|
| 1298 |
-
{
|
| 1299 |
-
"id": "2026-04-06T17:54:39.699482+00:00",
|
| 1300 |
-
"scenario": "Engagement Chaser",
|
| 1301 |
-
"scenario_id": "engagement_chaser",
|
| 1302 |
-
"task": "weekly_competitive",
|
| 1303 |
-
"score": 0.4194,
|
| 1304 |
-
"total_steps": 168,
|
| 1305 |
-
"total_posts": 21,
|
| 1306 |
-
"avg_reward": 0.2224,
|
| 1307 |
-
"final": {
|
| 1308 |
-
"energy": 1.0,
|
| 1309 |
-
"hours_since_sleep": 1,
|
| 1310 |
-
"sleep_debt": 0.0,
|
| 1311 |
-
"followers": 15287,
|
| 1312 |
-
"engagement_rate": 2.2466,
|
| 1313 |
-
"burned_out": false
|
| 1314 |
-
}
|
| 1315 |
-
},
|
| 1316 |
-
{
|
| 1317 |
-
"id": "2026-04-06T18:09:31.470202+00:00",
|
| 1318 |
-
"scenario": "Lifestyle Niche",
|
| 1319 |
-
"scenario_id": "lifestyle_niche",
|
| 1320 |
-
"task": "weekly_competitive",
|
| 1321 |
-
"score": 0.2612,
|
| 1322 |
-
"total_steps": 168,
|
| 1323 |
-
"total_posts": 14,
|
| 1324 |
-
"avg_reward": 0.2288,
|
| 1325 |
-
"final": {
|
| 1326 |
-
"energy": 1.0,
|
| 1327 |
-
"hours_since_sleep": 1,
|
| 1328 |
-
"sleep_debt": 0.0,
|
| 1329 |
-
"followers": 12251,
|
| 1330 |
-
"engagement_rate": 1.6295,
|
| 1331 |
-
"burned_out": false
|
| 1332 |
-
}
|
| 1333 |
-
},
|
| 1334 |
-
{
|
| 1335 |
-
"id": "2026-04-06T18:09:42.791462+00:00",
|
| 1336 |
-
"scenario": "Content Creator",
|
| 1337 |
-
"scenario_id": "content_creator",
|
| 1338 |
-
"task": "weekly_competitive",
|
| 1339 |
-
"score": 0.6434,
|
| 1340 |
-
"total_steps": 168,
|
| 1341 |
-
"total_posts": 12,
|
| 1342 |
-
"avg_reward": 0.2065,
|
| 1343 |
-
"final": {
|
| 1344 |
-
"energy": 0.309,
|
| 1345 |
-
"hours_since_sleep": 28,
|
| 1346 |
-
"sleep_debt": 0.017,
|
| 1347 |
-
"followers": 10931,
|
| 1348 |
-
"engagement_rate": 0.525,
|
| 1349 |
-
"burned_out": false
|
| 1350 |
-
}
|
| 1351 |
-
},
|
| 1352 |
-
{
|
| 1353 |
-
"id": "2026-04-06T18:25:35.360345+00:00",
|
| 1354 |
-
"scenario": "Anti-Trend",
|
| 1355 |
-
"scenario_id": "anti_trend",
|
| 1356 |
-
"task": "weekly_competitive",
|
| 1357 |
-
"score": 0.2316,
|
| 1358 |
-
"total_steps": 168,
|
| 1359 |
-
"total_posts": 14,
|
| 1360 |
-
"avg_reward": 0.2201,
|
| 1361 |
-
"final": {
|
| 1362 |
-
"energy": 1.0,
|
| 1363 |
-
"hours_since_sleep": 1,
|
| 1364 |
-
"sleep_debt": 0.0,
|
| 1365 |
-
"followers": 11125,
|
| 1366 |
-
"engagement_rate": 0.747,
|
| 1367 |
-
"burned_out": false
|
| 1368 |
-
}
|
| 1369 |
-
},
|
| 1370 |
-
{
|
| 1371 |
-
"id": "2026-04-06T18:28:21.455943+00:00",
|
| 1372 |
-
"scenario": "Fashion Content",
|
| 1373 |
-
"scenario_id": "fashion",
|
| 1374 |
-
"task": "weekly_competitive",
|
| 1375 |
-
"score": 0.2181,
|
| 1376 |
-
"total_steps": 168,
|
| 1377 |
-
"total_posts": 14,
|
| 1378 |
-
"avg_reward": 0.2147,
|
| 1379 |
-
"final": {
|
| 1380 |
-
"energy": 1.0,
|
| 1381 |
-
"hours_since_sleep": 1,
|
| 1382 |
-
"sleep_debt": 0.0,
|
| 1383 |
-
"followers": 11135,
|
| 1384 |
-
"engagement_rate": 0.7898,
|
| 1385 |
-
"burned_out": false
|
| 1386 |
-
}
|
| 1387 |
-
},
|
| 1388 |
-
{
|
| 1389 |
-
"id": "2026-04-06T18:28:26.860641+00:00",
|
| 1390 |
-
"scenario": "Low Frequency",
|
| 1391 |
-
"scenario_id": "low_freq",
|
| 1392 |
-
"task": "weekly_competitive",
|
| 1393 |
-
"score": 0.3241,
|
| 1394 |
-
"total_steps": 168,
|
| 1395 |
-
"total_posts": 4,
|
| 1396 |
-
"avg_reward": 0.1768,
|
| 1397 |
-
"final": {
|
| 1398 |
-
"energy": 1.0,
|
| 1399 |
-
"hours_since_sleep": 1,
|
| 1400 |
-
"sleep_debt": 0.0,
|
| 1401 |
-
"followers": 10461,
|
| 1402 |
-
"engagement_rate": 1.1563,
|
| 1403 |
-
"burned_out": false
|
| 1404 |
-
}
|
| 1405 |
-
},
|
| 1406 |
-
{
|
| 1407 |
-
"id": "2026-04-06T18:28:36.279972+00:00",
|
| 1408 |
-
"scenario": "Balanced Creator",
|
| 1409 |
-
"scenario_id": "balanced",
|
| 1410 |
-
"task": "weekly_competitive",
|
| 1411 |
-
"score": 0.8775,
|
| 1412 |
-
"total_steps": 168,
|
| 1413 |
-
"total_posts": 28,
|
| 1414 |
-
"avg_reward": 0.2187,
|
| 1415 |
-
"final": {
|
| 1416 |
-
"energy": 1.0,
|
| 1417 |
-
"hours_since_sleep": 2,
|
| 1418 |
-
"sleep_debt": 0.0,
|
| 1419 |
-
"followers": 12534,
|
| 1420 |
-
"engagement_rate": 0.8273,
|
| 1421 |
-
"burned_out": false
|
| 1422 |
-
}
|
| 1423 |
-
},
|
| 1424 |
-
{
|
| 1425 |
-
"id": "2026-04-06T18:29:19.542258+00:00",
|
| 1426 |
-
"scenario": "Napper",
|
| 1427 |
-
"scenario_id": "napper",
|
| 1428 |
-
"task": "weekly_competitive",
|
| 1429 |
-
"score": 0.3623,
|
| 1430 |
-
"total_steps": 168,
|
| 1431 |
-
"total_posts": 14,
|
| 1432 |
-
"avg_reward": 0.2264,
|
| 1433 |
-
"final": {
|
| 1434 |
-
"energy": 1.0,
|
| 1435 |
-
"hours_since_sleep": 1,
|
| 1436 |
-
"sleep_debt": 0.0,
|
| 1437 |
-
"followers": 11322,
|
| 1438 |
-
"engagement_rate": 0.8914,
|
| 1439 |
-
"burned_out": false
|
| 1440 |
-
}
|
| 1441 |
-
},
|
| 1442 |
-
{
|
| 1443 |
-
"id": "2026-04-06T19:48:37.931282+00:00",
|
| 1444 |
-
"scenario": "Optimal Sleep",
|
| 1445 |
-
"scenario_id": "optimal_sleep",
|
| 1446 |
-
"task": "weekly_competitive",
|
| 1447 |
-
"score": 0.3635,
|
| 1448 |
-
"total_steps": 168,
|
| 1449 |
-
"total_posts": 14,
|
| 1450 |
-
"avg_reward": 0.2257,
|
| 1451 |
-
"final": {
|
| 1452 |
-
"energy": 0.9,
|
| 1453 |
-
"hours_since_sleep": 3,
|
| 1454 |
-
"sleep_debt": 0.0,
|
| 1455 |
-
"followers": 11305,
|
| 1456 |
-
"engagement_rate": 0.8729,
|
| 1457 |
-
"burned_out": false
|
| 1458 |
-
}
|
| 1459 |
-
},
|
| 1460 |
-
{
|
| 1461 |
-
"id": "2026-04-06T19:49:01.327141+00:00",
|
| 1462 |
-
"scenario": "Marathon Runner",
|
| 1463 |
-
"scenario_id": "marathon",
|
| 1464 |
-
"task": "weekly_competitive",
|
| 1465 |
-
"score": 0.0,
|
| 1466 |
-
"total_steps": 50,
|
| 1467 |
-
"total_posts": 9,
|
| 1468 |
-
"avg_reward": 0.1323,
|
| 1469 |
-
"final": {
|
| 1470 |
-
"energy": 0.0,
|
| 1471 |
-
"hours_since_sleep": 22,
|
| 1472 |
-
"sleep_debt": 0.028,
|
| 1473 |
-
"followers": 10137,
|
| 1474 |
-
"engagement_rate": 0.157,
|
| 1475 |
-
"burned_out": true
|
| 1476 |
-
}
|
| 1477 |
-
},
|
| 1478 |
-
{
|
| 1479 |
-
"id": "2026-04-06T19:49:13.972097+00:00",
|
| 1480 |
-
"scenario": "Balanced Creator",
|
| 1481 |
-
"scenario_id": "balanced",
|
| 1482 |
-
"task": "weekly_competitive",
|
| 1483 |
-
"score": 0.8775,
|
| 1484 |
-
"total_steps": 168,
|
| 1485 |
-
"total_posts": 28,
|
| 1486 |
-
"avg_reward": 0.2187,
|
| 1487 |
-
"final": {
|
| 1488 |
-
"energy": 1.0,
|
| 1489 |
-
"hours_since_sleep": 2,
|
| 1490 |
-
"sleep_debt": 0.0,
|
| 1491 |
-
"followers": 12534,
|
| 1492 |
-
"engagement_rate": 0.8273,
|
| 1493 |
-
"burned_out": false
|
| 1494 |
-
}
|
| 1495 |
-
},
|
| 1496 |
-
{
|
| 1497 |
-
"id": "2026-04-06T19:49:37.864235+00:00",
|
| 1498 |
-
"scenario": "Engagement Chaser",
|
| 1499 |
-
"scenario_id": "engagement_chaser",
|
| 1500 |
-
"task": "weekly_competitive",
|
| 1501 |
-
"score": 0.4194,
|
| 1502 |
-
"total_steps": 168,
|
| 1503 |
-
"total_posts": 21,
|
| 1504 |
-
"avg_reward": 0.2224,
|
| 1505 |
-
"final": {
|
| 1506 |
-
"energy": 1.0,
|
| 1507 |
-
"hours_since_sleep": 1,
|
| 1508 |
-
"sleep_debt": 0.0,
|
| 1509 |
-
"followers": 15287,
|
| 1510 |
-
"engagement_rate": 2.2466,
|
| 1511 |
-
"burned_out": false
|
| 1512 |
-
}
|
| 1513 |
-
},
|
| 1514 |
-
{
|
| 1515 |
-
"id": "2026-04-06T19:50:08.348742+00:00",
|
| 1516 |
-
"scenario": "Early Bird",
|
| 1517 |
-
"scenario_id": "early_bird",
|
| 1518 |
-
"task": "weekly_competitive",
|
| 1519 |
-
"score": 0.2075,
|
| 1520 |
-
"total_steps": 168,
|
| 1521 |
-
"total_posts": 16,
|
| 1522 |
-
"avg_reward": 0.2284,
|
| 1523 |
-
"final": {
|
| 1524 |
-
"energy": 0.62,
|
| 1525 |
-
"hours_since_sleep": 2,
|
| 1526 |
-
"sleep_debt": 0.0,
|
| 1527 |
-
"followers": 10818,
|
| 1528 |
-
"engagement_rate": 0.4138,
|
| 1529 |
-
"burned_out": false
|
| 1530 |
-
}
|
| 1531 |
-
},
|
| 1532 |
-
{
|
| 1533 |
-
"id": "2026-04-06T19:50:15.765261+00:00",
|
| 1534 |
-
"scenario": "Queue Heavy",
|
| 1535 |
-
"scenario_id": "queue_heavy",
|
| 1536 |
-
"task": "weekly_competitive",
|
| 1537 |
-
"score": 0.1933,
|
| 1538 |
-
"total_steps": 168,
|
| 1539 |
-
"total_posts": 8,
|
| 1540 |
-
"avg_reward": 0.1923,
|
| 1541 |
-
"final": {
|
| 1542 |
-
"energy": 1.0,
|
| 1543 |
-
"hours_since_sleep": 1,
|
| 1544 |
-
"sleep_debt": 0.0,
|
| 1545 |
-
"followers": 9453,
|
| 1546 |
-
"engagement_rate": 0.781,
|
| 1547 |
-
"burned_out": false
|
| 1548 |
-
}
|
| 1549 |
-
},
|
| 1550 |
-
{
|
| 1551 |
-
"id": "2026-04-06T19:50:26.015235+00:00",
|
| 1552 |
-
"scenario": "Balanced Creator",
|
| 1553 |
-
"scenario_id": "balanced",
|
| 1554 |
-
"task": "weekly_competitive",
|
| 1555 |
-
"score": 0.8775,
|
| 1556 |
-
"total_steps": 168,
|
| 1557 |
-
"total_posts": 28,
|
| 1558 |
-
"avg_reward": 0.2187,
|
| 1559 |
-
"final": {
|
| 1560 |
-
"energy": 1.0,
|
| 1561 |
-
"hours_since_sleep": 2,
|
| 1562 |
-
"sleep_debt": 0.0,
|
| 1563 |
-
"followers": 12534,
|
| 1564 |
-
"engagement_rate": 0.8273,
|
| 1565 |
-
"burned_out": false
|
| 1566 |
-
}
|
| 1567 |
-
},
|
| 1568 |
-
{
|
| 1569 |
-
"id": "2026-04-06T19:50:30.364460+00:00",
|
| 1570 |
-
"scenario": "High Frequency",
|
| 1571 |
-
"scenario_id": "high_freq",
|
| 1572 |
-
"task": "weekly_competitive",
|
| 1573 |
-
"score": 0.8611,
|
| 1574 |
-
"total_steps": 168,
|
| 1575 |
-
"total_posts": 22,
|
| 1576 |
-
"avg_reward": 0.2058,
|
| 1577 |
-
"final": {
|
| 1578 |
-
"energy": 0.92,
|
| 1579 |
-
"hours_since_sleep": 2,
|
| 1580 |
-
"sleep_debt": 0.0,
|
| 1581 |
-
"followers": 12654,
|
| 1582 |
-
"engagement_rate": 1.079,
|
| 1583 |
-
"burned_out": false
|
| 1584 |
-
}
|
| 1585 |
-
},
|
| 1586 |
-
{
|
| 1587 |
-
"id": "2026-04-06T19:50:38.185556+00:00",
|
| 1588 |
-
"scenario": "Sleep Conscious",
|
| 1589 |
-
"scenario_id": "sleep_conscious",
|
| 1590 |
-
"task": "weekly_competitive",
|
| 1591 |
-
"score": 0.3635,
|
| 1592 |
-
"total_steps": 168,
|
| 1593 |
-
"total_posts": 14,
|
| 1594 |
-
"avg_reward": 0.2257,
|
| 1595 |
-
"final": {
|
| 1596 |
-
"energy": 0.9,
|
| 1597 |
-
"hours_since_sleep": 3,
|
| 1598 |
-
"sleep_debt": 0.0,
|
| 1599 |
-
"followers": 11305,
|
| 1600 |
-
"engagement_rate": 0.8729,
|
| 1601 |
-
"burned_out": false
|
| 1602 |
-
}
|
| 1603 |
-
},
|
| 1604 |
-
{
|
| 1605 |
-
"id": "2026-04-06T19:50:44.256241+00:00",
|
| 1606 |
-
"scenario": "Burst Poster",
|
| 1607 |
-
"scenario_id": "burst",
|
| 1608 |
-
"task": "weekly_competitive",
|
| 1609 |
-
"score": 0.6111,
|
| 1610 |
-
"total_steps": 168,
|
| 1611 |
-
"total_posts": 57,
|
| 1612 |
-
"avg_reward": 0.2318,
|
| 1613 |
-
"final": {
|
| 1614 |
-
"energy": 0.44,
|
| 1615 |
-
"hours_since_sleep": 1,
|
| 1616 |
-
"sleep_debt": 0.0,
|
| 1617 |
-
"followers": 11701,
|
| 1618 |
-
"engagement_rate": 0.2076,
|
| 1619 |
-
"burned_out": false
|
| 1620 |
-
}
|
| 1621 |
-
},
|
| 1622 |
-
{
|
| 1623 |
-
"id": "2026-04-06T19:51:00.755964+00:00",
|
| 1624 |
-
"scenario": "Queue Optimizer",
|
| 1625 |
-
"scenario_id": "queue_optimizer",
|
| 1626 |
-
"task": "weekly_competitive",
|
| 1627 |
-
"score": 0.352,
|
| 1628 |
-
"total_steps": 168,
|
| 1629 |
-
"total_posts": 14,
|
| 1630 |
-
"avg_reward": 0.2233,
|
| 1631 |
-
"final": {
|
| 1632 |
-
"energy": 1.0,
|
| 1633 |
-
"hours_since_sleep": 1,
|
| 1634 |
-
"sleep_debt": 0.0,
|
| 1635 |
-
"followers": 11215,
|
| 1636 |
-
"engagement_rate": 0.8701,
|
| 1637 |
-
"burned_out": false
|
| 1638 |
-
}
|
| 1639 |
-
},
|
| 1640 |
-
{
|
| 1641 |
-
"id": "2026-04-07T19:19:06.982475+00:00",
|
| 1642 |
-
"scenario": "Easy: Afternoon story",
|
| 1643 |
-
"scenario_id": "easy_relaxed",
|
| 1644 |
-
"task": "weekly_engage",
|
| 1645 |
-
"score": 0.0776,
|
| 1646 |
-
"total_steps": 168,
|
| 1647 |
-
"total_posts": 7,
|
| 1648 |
-
"avg_reward": 0.1885,
|
| 1649 |
-
"final": {
|
| 1650 |
-
"energy": 1.0,
|
| 1651 |
-
"hours_since_sleep": 1,
|
| 1652 |
-
"sleep_debt": 0.0,
|
| 1653 |
-
"followers": 10185,
|
| 1654 |
-
"engagement_rate": 0.2689,
|
| 1655 |
-
"burned_out": false
|
| 1656 |
-
}
|
| 1657 |
-
},
|
| 1658 |
-
{
|
| 1659 |
-
"id": "2026-04-07T19:25:22.760913+00:00",
|
| 1660 |
-
"scenario": "Medium: Reel + carousel day",
|
| 1661 |
-
"scenario_id": "medium_two_format",
|
| 1662 |
-
"task": "weekly_engage",
|
| 1663 |
-
"score": 1.0,
|
| 1664 |
-
"total_steps": 168,
|
| 1665 |
-
"total_posts": 14,
|
| 1666 |
-
"avg_reward": 0.2305,
|
| 1667 |
-
"final": {
|
| 1668 |
-
"energy": 1.0,
|
| 1669 |
-
"hours_since_sleep": 1,
|
| 1670 |
-
"sleep_debt": 0.0,
|
| 1671 |
-
"followers": 13498,
|
| 1672 |
-
"engagement_rate": 2.3223,
|
| 1673 |
-
"burned_out": false
|
| 1674 |
-
}
|
| 1675 |
-
},
|
| 1676 |
-
{
|
| 1677 |
-
"id": "2026-04-07T19:37:07.163654+00:00",
|
| 1678 |
-
"scenario": "Easy: Morning story",
|
| 1679 |
-
"scenario_id": "easy_morning_story",
|
| 1680 |
-
"task": "weekly_engage",
|
| 1681 |
-
"score": 0.1126,
|
| 1682 |
-
"total_steps": 168,
|
| 1683 |
-
"total_posts": 7,
|
| 1684 |
-
"avg_reward": 0.2064,
|
| 1685 |
-
"final": {
|
| 1686 |
-
"energy": 1.0,
|
| 1687 |
-
"hours_since_sleep": 1,
|
| 1688 |
-
"sleep_debt": 0.0,
|
| 1689 |
-
"followers": 10269,
|
| 1690 |
-
"engagement_rate": 0.3903,
|
| 1691 |
-
"burned_out": false
|
| 1692 |
-
}
|
| 1693 |
-
},
|
| 1694 |
-
{
|
| 1695 |
-
"id": "2026-04-07T19:37:08.936466+00:00",
|
| 1696 |
-
"scenario": "Easy: One text at 1pm",
|
| 1697 |
-
"scenario_id": "easy_one_a_day",
|
| 1698 |
-
"task": "weekly_engage",
|
| 1699 |
-
"score": 0.0992,
|
| 1700 |
-
"total_steps": 168,
|
| 1701 |
-
"total_posts": 7,
|
| 1702 |
-
"avg_reward": 0.1933,
|
| 1703 |
-
"final": {
|
| 1704 |
-
"energy": 1.0,
|
| 1705 |
-
"hours_since_sleep": 1,
|
| 1706 |
-
"sleep_debt": 0.0,
|
| 1707 |
-
"followers": 10239,
|
| 1708 |
-
"engagement_rate": 0.3439,
|
| 1709 |
-
"burned_out": false
|
| 1710 |
-
}
|
| 1711 |
-
},
|
| 1712 |
-
{
|
| 1713 |
-
"id": "2026-04-07T19:37:10.555676+00:00",
|
| 1714 |
-
"scenario": "Easy: Afternoon story",
|
| 1715 |
-
"scenario_id": "easy_relaxed",
|
| 1716 |
-
"task": "weekly_engage",
|
| 1717 |
-
"score": 0.0776,
|
| 1718 |
-
"total_steps": 168,
|
| 1719 |
-
"total_posts": 7,
|
| 1720 |
-
"avg_reward": 0.1885,
|
| 1721 |
-
"final": {
|
| 1722 |
-
"energy": 1.0,
|
| 1723 |
-
"hours_since_sleep": 1,
|
| 1724 |
-
"sleep_debt": 0.0,
|
| 1725 |
-
"followers": 10185,
|
| 1726 |
-
"engagement_rate": 0.2689,
|
| 1727 |
-
"burned_out": false
|
| 1728 |
-
}
|
| 1729 |
-
},
|
| 1730 |
-
{
|
| 1731 |
-
"id": "2026-04-07T19:37:12.240540+00:00",
|
| 1732 |
-
"scenario": "Medium: Create then post",
|
| 1733 |
-
"scenario_id": "medium_queue_cycle",
|
| 1734 |
-
"task": "weekly_engage",
|
| 1735 |
-
"score": 0.8459,
|
| 1736 |
-
"total_steps": 168,
|
| 1737 |
-
"total_posts": 14,
|
| 1738 |
-
"avg_reward": 0.2318,
|
| 1739 |
-
"final": {
|
| 1740 |
-
"energy": 1.0,
|
| 1741 |
-
"hours_since_sleep": 1,
|
| 1742 |
-
"sleep_debt": 0.0,
|
| 1743 |
-
"followers": 12045,
|
| 1744 |
-
"engagement_rate": 1.3511,
|
| 1745 |
-
"burned_out": false
|
| 1746 |
-
}
|
| 1747 |
-
},
|
| 1748 |
-
{
|
| 1749 |
-
"id": "2026-04-07T19:37:14.032300+00:00",
|
| 1750 |
-
"scenario": "Medium: Trend + format rotation",
|
| 1751 |
-
"scenario_id": "medium_trend_rotate",
|
| 1752 |
-
"task": "weekly_engage",
|
| 1753 |
-
"score": 0.5524,
|
| 1754 |
-
"total_steps": 168,
|
| 1755 |
-
"total_posts": 14,
|
| 1756 |
-
"avg_reward": 0.2265,
|
| 1757 |
-
"final": {
|
| 1758 |
-
"energy": 1.0,
|
| 1759 |
-
"hours_since_sleep": 1,
|
| 1760 |
-
"sleep_debt": 0.0,
|
| 1761 |
-
"followers": 11332,
|
| 1762 |
-
"engagement_rate": 0.9003,
|
| 1763 |
-
"burned_out": false
|
| 1764 |
-
}
|
| 1765 |
-
},
|
| 1766 |
-
{
|
| 1767 |
-
"id": "2026-04-07T19:37:15.697454+00:00",
|
| 1768 |
-
"scenario": "Medium: Reel + carousel day",
|
| 1769 |
-
"scenario_id": "medium_two_format",
|
| 1770 |
-
"task": "weekly_engage",
|
| 1771 |
-
"score": 1.0,
|
| 1772 |
-
"total_steps": 168,
|
| 1773 |
-
"total_posts": 14,
|
| 1774 |
-
"avg_reward": 0.2305,
|
| 1775 |
-
"final": {
|
| 1776 |
-
"energy": 1.0,
|
| 1777 |
-
"hours_since_sleep": 1,
|
| 1778 |
-
"sleep_debt": 0.0,
|
| 1779 |
-
"followers": 13498,
|
| 1780 |
-
"engagement_rate": 2.3223,
|
| 1781 |
-
"burned_out": false
|
| 1782 |
-
}
|
| 1783 |
-
},
|
| 1784 |
-
{
|
| 1785 |
-
"id": "2026-04-07T19:38:24.165792+00:00",
|
| 1786 |
-
"scenario": "Easy: One text at 1pm",
|
| 1787 |
-
"scenario_id": "easy_one_a_day",
|
| 1788 |
-
"task": "weekly_engage",
|
| 1789 |
-
"score": 0.0992,
|
| 1790 |
-
"total_steps": 168,
|
| 1791 |
-
"total_posts": 7,
|
| 1792 |
-
"avg_reward": 0.1933,
|
| 1793 |
-
"final": {
|
| 1794 |
-
"energy": 1.0,
|
| 1795 |
-
"hours_since_sleep": 1,
|
| 1796 |
-
"sleep_debt": 0.0,
|
| 1797 |
-
"followers": 10239,
|
| 1798 |
-
"engagement_rate": 0.3439,
|
| 1799 |
-
"burned_out": false
|
| 1800 |
-
}
|
| 1801 |
-
}
|
| 1802 |
-
]
|
|
|
|
| 1 |
+
[]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
server/training.html
ADDED
|
@@ -0,0 +1,369 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html class="dark" lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="utf-8"/>
|
| 5 |
+
<meta content="width=device-width,initial-scale=1.0" name="viewport"/>
|
| 6 |
+
<title>Viraltest — Training Evidence</title>
|
| 7 |
+
<script src="https://cdn.tailwindcss.com?plugins=forms,container-queries"></script>
|
| 8 |
+
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800;900&family=Space+Grotesk:wght@400;500;700&display=swap" rel="stylesheet"/>
|
| 9 |
+
<link href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:wght,FILL@100..700,0..1&display=swap" rel="stylesheet"/>
|
| 10 |
+
<script>
|
| 11 |
+
tailwind.config={darkMode:"class",theme:{extend:{colors:{"surface":"#0b1326","surface-low":"#131b2e","surface-high":"#222a3d","surface-top":"#2d3449","surface-lowest":"#060e20","on-surface":"#dae2fd","on-surface-dim":"#cbc3d7","primary":"#d0bcff","primary-ctr":"#a078ff","secondary":"#7bd0ff","secondary-ctr":"#00a6e0","tertiary":"#ffb2b9","tertiary-ctr":"#ea6479","outline":"#494454","error":"#ffb4ab"},fontFamily:{headline:["Inter"],body:["Inter"],label:["Space Grotesk"]}}}}
|
| 12 |
+
</script>
|
| 13 |
+
<style>
|
| 14 |
+
body{background:#0b1326;color:#dae2fd;font-family:'Inter',sans-serif}
|
| 15 |
+
.material-symbols-outlined{font-variation-settings:'FILL' 0,'wght' 400,'GRAD' 0,'opsz' 24}
|
| 16 |
+
.glass-solid{background:#131b2e;border:1px solid rgba(73,68,84,.15)}
|
| 17 |
+
.fade-in{animation:fadeIn .3s ease}
|
| 18 |
+
@keyframes fadeIn{from{opacity:0;transform:translateY(4px)}to{opacity:1;transform:translateY(0)}}
|
| 19 |
+
::-webkit-scrollbar{width:6px}
|
| 20 |
+
::-webkit-scrollbar-track{background:transparent}
|
| 21 |
+
::-webkit-scrollbar-thumb{background:rgba(73,68,84,.4);border-radius:3px}
|
| 22 |
+
</style>
|
| 23 |
+
</head>
|
| 24 |
+
<body class="min-h-screen flex">
|
| 25 |
+
|
| 26 |
+
<aside class="flex flex-col sticky top-0 h-screen w-64 border-r border-white/5 bg-surface-lowest shadow-2xl shadow-slate-950/50 shrink-0 z-50">
|
| 27 |
+
<div class="p-6 pb-4">
|
| 28 |
+
<div class="text-xl font-black tracking-tighter text-transparent bg-clip-text bg-gradient-to-br from-primary to-primary-ctr mb-1">Growth Copilot</div>
|
| 29 |
+
<div class="text-[9px] font-label uppercase tracking-[.2em] text-on-surface-dim/50">Training evidence</div>
|
| 30 |
+
</div>
|
| 31 |
+
<nav class="flex-1 px-3 space-y-1">
|
| 32 |
+
<a href="/dashboard" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-slate-400 font-medium hover:text-slate-200 hover:bg-white/5 transition-all">
|
| 33 |
+
<span class="material-symbols-outlined text-[20px]">dashboard</span><span class="font-label text-sm">Dashboard</span>
|
| 34 |
+
</a>
|
| 35 |
+
<a href="/dashboard/training" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-primary font-bold border-r-2 border-primary bg-gradient-to-r from-primary/10 to-transparent transition-all">
|
| 36 |
+
<span class="material-symbols-outlined text-[20px]">science</span><span class="font-label text-sm">Training Evidence</span>
|
| 37 |
+
</a>
|
| 38 |
+
<a href="/web/" class="flex items-center gap-3 px-4 py-2.5 rounded-lg text-slate-400 font-medium hover:text-slate-200 hover:bg-white/5 transition-all">
|
| 39 |
+
<span class="material-symbols-outlined text-[20px]">web</span><span class="font-label text-sm">OpenEnv UI</span>
|
| 40 |
+
</a>
|
| 41 |
+
</nav>
|
| 42 |
+
<div class="p-4 border-t border-white/5">
|
| 43 |
+
<div class="text-[9px] font-label text-on-surface-dim/60 leading-relaxed">
|
| 44 |
+
This page shows that the environment can <span class="text-on-surface font-bold">differentiate agent strategies</span> and produce meaningful reward signals for RL training.
|
| 45 |
+
</div>
|
| 46 |
+
</div>
|
| 47 |
+
</aside>
|
| 48 |
+
|
| 49 |
+
<div class="flex-1 flex flex-col min-w-0">
|
| 50 |
+
<header class="flex justify-between items-center px-6 h-14 border-b border-white/5 bg-surface/60 backdrop-blur-xl sticky top-0 z-40">
|
| 51 |
+
<div class="flex items-center gap-3">
|
| 52 |
+
<span class="material-symbols-outlined text-primary text-lg">science</span>
|
| 53 |
+
<h1 class="text-sm font-bold">Training Evidence — Baseline Leaderboard</h1>
|
| 54 |
+
</div>
|
| 55 |
+
<div class="flex items-center gap-3">
|
| 56 |
+
<span id="statusBadge" class="text-xs font-label text-on-surface-dim">Click "Run Baselines" to generate</span>
|
| 57 |
+
<button onclick="runBaselines()" id="runBtn" class="px-4 py-2 rounded-lg bg-gradient-to-br from-primary to-primary-ctr text-[#23005c] font-bold text-sm hover:opacity-90 transition active:scale-[.97]">
|
| 58 |
+
<span class="material-symbols-outlined text-[16px] align-middle mr-1">play_arrow</span>Run Baselines
|
| 59 |
+
</button>
|
| 60 |
+
</div>
|
| 61 |
+
</header>
|
| 62 |
+
|
| 63 |
+
<main class="flex-1 p-6 space-y-6 overflow-y-auto">
|
| 64 |
+
|
| 65 |
+
<div class="glass-solid border border-outline/20 rounded-xl px-5 py-4 space-y-3">
|
| 66 |
+
<div class="flex gap-3 items-start">
|
| 67 |
+
<span class="material-symbols-outlined text-primary text-lg shrink-0">info</span>
|
| 68 |
+
<div class="text-[11px] font-label text-on-surface-dim leading-relaxed flex-1 min-w-0">
|
| 69 |
+
<span class="text-on-surface font-semibold">What this proves:</span>
|
| 70 |
+
The environment produces a <span class="text-on-surface">rich, informative reward signal</span> that differentiates between agent strategies.
|
| 71 |
+
Smart agents (peak-hour posting, tag diversity, energy management) consistently outscore naive baselines (spam, random, always-rest).
|
| 72 |
+
This is the prerequisite for RL training — if the reward didn't differentiate, training couldn't improve behavior.
|
| 73 |
+
<div class="mt-2 text-on-surface font-semibold">5 heuristic strategies × 3 tasks = 15 runs, deterministic (seed=42).</div>
|
| 74 |
+
</div>
|
| 75 |
+
</div>
|
| 76 |
+
</div>
|
| 77 |
+
|
| 78 |
+
<div id="loadingState" class="hidden">
|
| 79 |
+
<div class="flex items-center justify-center gap-4 py-12">
|
| 80 |
+
<div class="animate-spin h-8 w-8 border-4 border-primary/30 border-t-primary rounded-full"></div>
|
| 81 |
+
<span class="text-sm font-label text-on-surface-dim">Running all baseline scenarios... (~5 seconds)</span>
|
| 82 |
+
</div>
|
| 83 |
+
</div>
|
| 84 |
+
|
| 85 |
+
<div id="resultsSection" class="hidden space-y-6">
|
| 86 |
+
|
| 87 |
+
<div class="grid grid-cols-1 lg:grid-cols-3 gap-5">
|
| 88 |
+
<div id="chart_engage" class="glass-solid p-5 rounded-xl overflow-hidden">
|
| 89 |
+
<h3 class="text-sm font-bold mb-1 text-secondary">Engage (Easy)</h3>
|
| 90 |
+
<p class="text-[9px] font-label text-on-surface-dim mb-3">Total engagement vs theoretical max</p>
|
| 91 |
+
<svg id="svg_engage" class="w-full" viewBox="0 0 380 240" preserveAspectRatio="xMidYMid meet"></svg>
|
| 92 |
+
</div>
|
| 93 |
+
<div id="chart_strategic" class="glass-solid p-5 rounded-xl overflow-hidden">
|
| 94 |
+
<h3 class="text-sm font-bold mb-1 text-primary">Strategic (Medium)</h3>
|
| 95 |
+
<p class="text-[9px] font-label text-on-surface-dim mb-3">Engagement + tag discovery + energy + consistency</p>
|
| 96 |
+
<svg id="svg_strategic" class="w-full" viewBox="0 0 380 240" preserveAspectRatio="xMidYMid meet"></svg>
|
| 97 |
+
</div>
|
| 98 |
+
<div id="chart_competitive" class="glass-solid p-5 rounded-xl overflow-hidden">
|
| 99 |
+
<h3 class="text-sm font-bold mb-1 text-tertiary">Competitive (Hard)</h3>
|
| 100 |
+
<p class="text-[9px] font-label text-on-surface-dim mb-3">+ growth vs competitors + differentiation</p>
|
| 101 |
+
<svg id="svg_competitive" class="w-full" viewBox="0 0 380 240" preserveAspectRatio="xMidYMid meet"></svg>
|
| 102 |
+
</div>
|
| 103 |
+
</div>
|
| 104 |
+
|
| 105 |
+
<div class="glass-solid p-5 rounded-xl overflow-hidden">
|
| 106 |
+
<h3 class="text-sm font-bold mb-1 flex items-center gap-2">
|
| 107 |
+
<span class="material-symbols-outlined text-secondary text-lg">show_chart</span>
|
| 108 |
+
Reward Trajectories (30-day episodes)
|
| 109 |
+
</h3>
|
| 110 |
+
<p class="text-[9px] font-label text-on-surface-dim mb-3">Daily reward over the episode for each agent × task. Shows that smart strategies maintain higher rewards throughout.</p>
|
| 111 |
+
<div class="grid grid-cols-1 lg:grid-cols-3 gap-4">
|
| 112 |
+
<div>
|
| 113 |
+
<div class="text-[10px] font-bold text-secondary uppercase tracking-widest mb-1">Engage</div>
|
| 114 |
+
<svg id="traj_engage" class="w-full" viewBox="0 0 400 180" preserveAspectRatio="xMidYMid meet"></svg>
|
| 115 |
+
</div>
|
| 116 |
+
<div>
|
| 117 |
+
<div class="text-[10px] font-bold text-primary uppercase tracking-widest mb-1">Strategic</div>
|
| 118 |
+
<svg id="traj_strategic" class="w-full" viewBox="0 0 400 180" preserveAspectRatio="xMidYMid meet"></svg>
|
| 119 |
+
</div>
|
| 120 |
+
<div>
|
| 121 |
+
<div class="text-[10px] font-bold text-tertiary uppercase tracking-widest mb-1">Competitive</div>
|
| 122 |
+
<svg id="traj_competitive" class="w-full" viewBox="0 0 400 180" preserveAspectRatio="xMidYMid meet"></svg>
|
| 123 |
+
</div>
|
| 124 |
+
</div>
|
| 125 |
+
<div id="trajectoryLegend" class="flex flex-wrap gap-4 mt-3 justify-center"></div>
|
| 126 |
+
</div>
|
| 127 |
+
|
| 128 |
+
<div class="glass-solid rounded-xl overflow-hidden">
|
| 129 |
+
<div class="p-4 border-b border-white/5">
|
| 130 |
+
<h3 class="text-sm font-bold flex items-center gap-2">
|
| 131 |
+
<span class="material-symbols-outlined text-primary text-lg">table_chart</span>
|
| 132 |
+
Full Results Table
|
| 133 |
+
</h3>
|
| 134 |
+
</div>
|
| 135 |
+
<div class="overflow-x-auto">
|
| 136 |
+
<table class="w-full text-[11px] font-label">
|
| 137 |
+
<thead>
|
| 138 |
+
<tr class="text-on-surface-dim/60 uppercase tracking-wider border-b border-white/5">
|
| 139 |
+
<th class="text-left px-4 py-2.5">Agent</th>
|
| 140 |
+
<th class="text-left px-4 py-2.5">Task</th>
|
| 141 |
+
<th class="text-right px-4 py-2.5">Grader Score</th>
|
| 142 |
+
<th class="text-right px-4 py-2.5">Total Reward</th>
|
| 143 |
+
<th class="text-right px-4 py-2.5">Steps</th>
|
| 144 |
+
<th class="text-right px-4 py-2.5">Energy</th>
|
| 145 |
+
<th class="text-right px-4 py-2.5">Followers</th>
|
| 146 |
+
<th class="text-right px-4 py-2.5">Δ</th>
|
| 147 |
+
<th class="text-center px-4 py-2.5">Status</th>
|
| 148 |
+
</tr>
|
| 149 |
+
</thead>
|
| 150 |
+
<tbody id="resultsTable"></tbody>
|
| 151 |
+
</table>
|
| 152 |
+
</div>
|
| 153 |
+
</div>
|
| 154 |
+
|
| 155 |
+
<div class="glass-solid p-5 rounded-xl overflow-hidden">
|
| 156 |
+
<h3 class="text-sm font-bold mb-3 flex items-center gap-2">
|
| 157 |
+
<span class="material-symbols-outlined text-tertiary text-lg">insights</span>
|
| 158 |
+
Key Takeaways
|
| 159 |
+
</h3>
|
| 160 |
+
<div id="takeaways" class="space-y-2 text-[11px] font-label text-on-surface-dim leading-relaxed"></div>
|
| 161 |
+
</div>
|
| 162 |
+
</div>
|
| 163 |
+
|
| 164 |
+
</main>
|
| 165 |
+
</div>
|
| 166 |
+
|
| 167 |
+
<script>
|
| 168 |
+
const API=window.location.origin;
|
| 169 |
+
const COLORS={"always_rest":"#E53935","spam":"#FF9800","random":"#9E9E9E","minimal":"#42A5F5","smart":"#4CAF50"};
|
| 170 |
+
const TASK_MAP={"monthly_engage":"engage","monthly_strategic":"strategic","monthly_competitive":"competitive"};
|
| 171 |
+
const TASK_LABELS={"monthly_engage":"Engage","monthly_strategic":"Strategic","monthly_competitive":"Competitive"};
|
| 172 |
+
|
| 173 |
+
let allData=null;
|
| 174 |
+
|
| 175 |
+
async function runBaselines(){
|
| 176 |
+
const btn=document.getElementById("runBtn");
|
| 177 |
+
btn.disabled=true;btn.classList.add("opacity-50");
|
| 178 |
+
document.getElementById("loadingState").classList.remove("hidden");
|
| 179 |
+
document.getElementById("resultsSection").classList.add("hidden");
|
| 180 |
+
document.getElementById("statusBadge").textContent="Running...";
|
| 181 |
+
|
| 182 |
+
try{
|
| 183 |
+
const r=await fetch(API+"/dashboard/training-evidence");
|
| 184 |
+
allData=await r.json();
|
| 185 |
+
renderAll();
|
| 186 |
+
document.getElementById("loadingState").classList.add("hidden");
|
| 187 |
+
document.getElementById("resultsSection").classList.remove("hidden");
|
| 188 |
+
document.getElementById("statusBadge").textContent=`${allData.results.length} runs completed`;
|
| 189 |
+
}catch(e){
|
| 190 |
+
document.getElementById("statusBadge").textContent="Error: "+e.message;
|
| 191 |
+
document.getElementById("loadingState").classList.add("hidden");
|
| 192 |
+
}
|
| 193 |
+
btn.disabled=false;btn.classList.remove("opacity-50");
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
function renderAll(){
|
| 197 |
+
if(!allData)return;
|
| 198 |
+
renderBarCharts();
|
| 199 |
+
renderTrajectories();
|
| 200 |
+
renderTable();
|
| 201 |
+
renderTakeaways();
|
| 202 |
+
}
|
| 203 |
+
|
| 204 |
+
function renderBarCharts(){
|
| 205 |
+
const tasks=["monthly_engage","monthly_strategic","monthly_competitive"];
|
| 206 |
+
for(const task of tasks){
|
| 207 |
+
const key=TASK_MAP[task];
|
| 208 |
+
const svg=document.getElementById("svg_"+key);
|
| 209 |
+
if(!svg)continue;
|
| 210 |
+
|
| 211 |
+
const taskResults=allData.results.filter(r=>r.task===task);
|
| 212 |
+
taskResults.sort((a,b)=>b.grader_score-a.grader_score);
|
| 213 |
+
|
| 214 |
+
const W=380,H=240,pL=110,pR=60,pT=10,pB=10;
|
| 215 |
+
const plotW=W-pL-pR,plotH=H-pT-pB;
|
| 216 |
+
const n=taskResults.length;
|
| 217 |
+
if(!n){svg.innerHTML="";continue;}
|
| 218 |
+
const barH=Math.min(28,plotH/n*0.7);
|
| 219 |
+
const gap=(plotH-barH*n)/(n+1);
|
| 220 |
+
const maxScore=Math.max(...taskResults.map(r=>r.grader_score),0.01);
|
| 221 |
+
|
| 222 |
+
let html="";
|
| 223 |
+
taskResults.forEach((r,i)=>{
|
| 224 |
+
const y=pT+gap+(barH+gap)*i;
|
| 225 |
+
const w=Math.max(2,(r.grader_score/Math.max(maxScore*1.1,0.01))*plotW);
|
| 226 |
+
const color=COLORS[r.scenario_id]||"#9E9E9E";
|
| 227 |
+
const burned=r.burned_out?" (BURNED)":"";
|
| 228 |
+
|
| 229 |
+
html+=`<rect x="${pL}" y="${y}" width="${w}" height="${barH}" fill="${color}" rx="4" opacity="0.85"/>`;
|
| 230 |
+
html+=`<text x="${pL-6}" y="${y+barH/2+4}" text-anchor="end" fill="#dae2fd" font-size="10" font-family="Space Grotesk,sans-serif" font-weight="600">${r.scenario}</text>`;
|
| 231 |
+
html+=`<text x="${pL+w+6}" y="${y+barH/2+4}" fill="${color}" font-size="11" font-family="Space Grotesk,sans-serif" font-weight="700">${r.grader_score.toFixed(4)}${burned}</text>`;
|
| 232 |
+
});
|
| 233 |
+
|
| 234 |
+
svg.innerHTML=html;
|
| 235 |
+
}
|
| 236 |
+
}
|
| 237 |
+
|
| 238 |
+
function smoothPath(pts){
|
| 239 |
+
if(pts.length<2)return pts.map((p,i)=>(i===0?"M":"L")+p.x.toFixed(1)+","+p.y.toFixed(1)).join(" ");
|
| 240 |
+
let d="M"+pts[0].x.toFixed(1)+","+pts[0].y.toFixed(1);
|
| 241 |
+
for(let i=1;i<pts.length;i++){
|
| 242 |
+
const cp=(pts[i].x-pts[i-1].x)/3;
|
| 243 |
+
d+=` C${(pts[i-1].x+cp).toFixed(1)},${pts[i-1].y.toFixed(1)} ${(pts[i].x-cp).toFixed(1)},${pts[i].y.toFixed(1)} ${pts[i].x.toFixed(1)},${pts[i].y.toFixed(1)}`;
|
| 244 |
+
}
|
| 245 |
+
return d;
|
| 246 |
+
}
|
| 247 |
+
|
| 248 |
+
function renderTrajectories(){
|
| 249 |
+
const tasks=["monthly_engage","monthly_strategic","monthly_competitive"];
|
| 250 |
+
const legend=document.getElementById("trajectoryLegend");
|
| 251 |
+
let legendHtml="";
|
| 252 |
+
|
| 253 |
+
for(const task of tasks){
|
| 254 |
+
const key=TASK_MAP[task];
|
| 255 |
+
const svg=document.getElementById("traj_"+key);
|
| 256 |
+
if(!svg)continue;
|
| 257 |
+
|
| 258 |
+
const taskResults=allData.results.filter(r=>r.task===task);
|
| 259 |
+
const W=400,H=180,pL=40,pR=10,pT=10,pB=30;
|
| 260 |
+
const plotW=W-pL-pR,plotH=H-pT-pB;
|
| 261 |
+
|
| 262 |
+
let allRewards=[];
|
| 263 |
+
taskResults.forEach(r=>allRewards.push(...r.rewards));
|
| 264 |
+
const minR=Math.min(0,...allRewards);
|
| 265 |
+
const maxR=Math.max(...allRewards,0.01);
|
| 266 |
+
|
| 267 |
+
let html="";
|
| 268 |
+
for(let g=0;g<=4;g++){
|
| 269 |
+
const y=pT+(g/4)*plotH;
|
| 270 |
+
const val=maxR-(g/4)*(maxR-minR);
|
| 271 |
+
html+=`<line x1="${pL}" y1="${y}" x2="${W-pR}" y2="${y}" stroke="#494454" stroke-width="0.5" opacity="0.3"/>`;
|
| 272 |
+
html+=`<text x="${pL-5}" y="${y+3}" text-anchor="end" fill="#958ea0" font-size="8" font-family="Space Grotesk,sans-serif">${val.toFixed(2)}</text>`;
|
| 273 |
+
}
|
| 274 |
+
html+=`<line x1="${pL}" y1="${pT}" x2="${pL}" y2="${H-pB}" stroke="#cbc3d7" stroke-width="0.7"/>`;
|
| 275 |
+
html+=`<line x1="${pL}" y1="${H-pB}" x2="${W-pR}" y2="${H-pB}" stroke="#cbc3d7" stroke-width="0.7"/>`;
|
| 276 |
+
html+=`<text x="${pL}" y="${H-10}" fill="#958ea0" font-size="8" font-family="Space Grotesk,sans-serif">Day 1</text>`;
|
| 277 |
+
html+=`<text x="${W-pR}" y="${H-10}" text-anchor="end" fill="#958ea0" font-size="8" font-family="Space Grotesk,sans-serif">Day 30</text>`;
|
| 278 |
+
html+=`<text x="${pL+plotW/2}" y="${H-2}" text-anchor="middle" fill="#958ea0" font-size="7" font-family="Space Grotesk,sans-serif" opacity="0.75">day</text>`;
|
| 279 |
+
|
| 280 |
+
taskResults.forEach(r=>{
|
| 281 |
+
const color=COLORS[r.scenario_id]||"#9E9E9E";
|
| 282 |
+
const rewards=r.rewards;
|
| 283 |
+
const n=rewards.length;
|
| 284 |
+
if(!n)return;
|
| 285 |
+
const pts=rewards.map((v,i)=>({
|
| 286 |
+
x:pL+(n<=1?plotW/2:i/(n-1)*plotW),
|
| 287 |
+
y:pT+(1-((v-minR)/(maxR-minR||1)))*plotH,
|
| 288 |
+
}));
|
| 289 |
+
const lineD=smoothPath(pts);
|
| 290 |
+
const opacity=r.scenario_id==="smart"?"1":"0.6";
|
| 291 |
+
const width=r.scenario_id==="smart"?"2.5":"1.5";
|
| 292 |
+
html+=`<path d="${lineD}" fill="none" stroke="${color}" stroke-width="${width}" opacity="${opacity}"/>`;
|
| 293 |
+
});
|
| 294 |
+
|
| 295 |
+
svg.innerHTML=html;
|
| 296 |
+
}
|
| 297 |
+
|
| 298 |
+
const scenarios=[...new Set(allData.results.map(r=>r.scenario_id))];
|
| 299 |
+
legendHtml=scenarios.map(sid=>{
|
| 300 |
+
const label=allData.results.find(r=>r.scenario_id===sid)?.scenario||sid;
|
| 301 |
+
const color=COLORS[sid]||"#9E9E9E";
|
| 302 |
+
return `<div class="flex items-center gap-1.5"><span class="w-3 h-1 rounded-full" style="background:${color}"></span><span class="text-[10px] font-label text-on-surface-dim">${label}</span></div>`;
|
| 303 |
+
}).join("");
|
| 304 |
+
legend.innerHTML=legendHtml;
|
| 305 |
+
}
|
| 306 |
+
|
| 307 |
+
function renderTable(){
|
| 308 |
+
const tb=document.getElementById("resultsTable");
|
| 309 |
+
const rows=allData.results.slice().sort((a,b)=>{
|
| 310 |
+
const taskOrder={"monthly_engage":0,"monthly_strategic":1,"monthly_competitive":2};
|
| 311 |
+
if(taskOrder[a.task]!==taskOrder[b.task])return taskOrder[a.task]-taskOrder[b.task];
|
| 312 |
+
return b.grader_score-a.grader_score;
|
| 313 |
+
});
|
| 314 |
+
|
| 315 |
+
tb.innerHTML=rows.map(r=>{
|
| 316 |
+
const color=COLORS[r.scenario_id]||"#9E9E9E";
|
| 317 |
+
const scoreColor=r.grader_score>=0.5?"text-primary":r.grader_score>=0.2?"text-secondary":"text-tertiary";
|
| 318 |
+
const energyColor=r.final_energy>=0.5?"text-secondary":r.final_energy>0?"text-tertiary":"text-error";
|
| 319 |
+
const deltaColor=r.follower_delta>0?"text-secondary":r.follower_delta<0?"text-tertiary":"text-on-surface-dim";
|
| 320 |
+
const status=r.burned_out?'<span class="text-tertiary font-bold">BURNED</span>':r.steps>=30?'<span class="text-secondary">DONE</span>':'<span class="text-on-surface-dim">EARLY</span>';
|
| 321 |
+
return `<tr class="border-b border-white/5 hover:bg-white/[.02]">
|
| 322 |
+
<td class="px-4 py-2"><div class="flex items-center gap-2"><span class="w-2 h-2 rounded-full" style="background:${color}"></span><span class="text-on-surface font-bold">${r.scenario}</span></div></td>
|
| 323 |
+
<td class="px-4 py-2 text-on-surface-dim">${TASK_LABELS[r.task]||r.task}</td>
|
| 324 |
+
<td class="px-4 py-2 text-right ${scoreColor} font-bold">${r.grader_score.toFixed(4)}</td>
|
| 325 |
+
<td class="px-4 py-2 text-right text-on-surface-dim">${r.total_reward.toFixed(3)}</td>
|
| 326 |
+
<td class="px-4 py-2 text-right text-on-surface-dim">${r.steps}</td>
|
| 327 |
+
<td class="px-4 py-2 text-right ${energyColor}">${r.final_energy.toFixed(2)}</td>
|
| 328 |
+
<td class="px-4 py-2 text-right text-on-surface">${r.final_followers.toLocaleString()}</td>
|
| 329 |
+
<td class="px-4 py-2 text-right ${deltaColor}">${r.follower_delta>=0?"+":""}${r.follower_delta}</td>
|
| 330 |
+
<td class="px-4 py-2 text-center">${status}</td>
|
| 331 |
+
</tr>`;
|
| 332 |
+
}).join("");
|
| 333 |
+
}
|
| 334 |
+
|
| 335 |
+
function renderTakeaways(){
|
| 336 |
+
const el=document.getElementById("takeaways");
|
| 337 |
+
if(!allData)return;
|
| 338 |
+
|
| 339 |
+
const byScenario={};
|
| 340 |
+
allData.results.forEach(r=>{
|
| 341 |
+
if(!byScenario[r.scenario_id])byScenario[r.scenario_id]={scores:[],label:r.scenario};
|
| 342 |
+
byScenario[r.scenario_id].scores.push(r.grader_score);
|
| 343 |
+
});
|
| 344 |
+
|
| 345 |
+
const avgs=Object.entries(byScenario).map(([id,d])=>({
|
| 346 |
+
id,label:d.label,avg:d.scores.reduce((a,b)=>a+b,0)/d.scores.length
|
| 347 |
+
})).sort((a,b)=>b.avg-a.avg);
|
| 348 |
+
|
| 349 |
+
const best=avgs[0];
|
| 350 |
+
const worst=avgs[avgs.length-1];
|
| 351 |
+
const ratio=worst.avg>0?(best.avg/worst.avg).toFixed(1):"∞";
|
| 352 |
+
|
| 353 |
+
const burnedOut=allData.results.filter(r=>r.burned_out);
|
| 354 |
+
const completed=allData.results.filter(r=>!r.burned_out&&r.steps>=30);
|
| 355 |
+
|
| 356 |
+
const points=[
|
| 357 |
+
`<span class="text-on-surface font-bold">Best agent: ${best.label}</span> (avg score ${best.avg.toFixed(4)}) — ${ratio}× better than worst (${worst.label}, avg ${worst.avg.toFixed(4)}).`,
|
| 358 |
+
`<span class="text-on-surface font-bold">Score spread:</span> The environment produces a ${(avgs[0].avg-avgs[avgs.length-1].avg).toFixed(4)} spread between best and worst agents, proving the reward is informative and not flat.`,
|
| 359 |
+
`<span class="text-on-surface font-bold">${burnedOut.length} burnout events</span> across ${allData.results.length} runs — the burnout penalty correctly punishes unsustainable strategies (spam, no-rest).`,
|
| 360 |
+
`<span class="text-on-surface font-bold">${completed.length}/${allData.results.length} episodes completed</span> all 30 days — agents that manage energy survive; those that don't burn out early.`,
|
| 361 |
+
`<span class="text-on-surface font-bold">Reward is hard to game:</span> Spamming posts burns out immediately (score ≈ 0). Always resting loses followers. The optimal strategy requires balancing multiple objectives.`,
|
| 362 |
+
`<span class="text-on-surface font-bold">Grader difficulty scales correctly:</span> All agents score lower on Competitive than on Engage, confirming the three-tier difficulty progression works.`,
|
| 363 |
+
];
|
| 364 |
+
|
| 365 |
+
el.innerHTML=points.map(p=>`<div class="flex gap-2"><span class="text-primary shrink-0">▸</span><span>${p}</span></div>`).join("");
|
| 366 |
+
}
|
| 367 |
+
</script>
|
| 368 |
+
</body>
|
| 369 |
+
</html>
|
server/viraltest_environment.py
CHANGED
|
@@ -1009,10 +1009,34 @@ class ViraltestEnvironment(Environment):
|
|
| 1009 |
best_base = max(BASE_ENGAGEMENT.values())
|
| 1010 |
best_reach = max(REACH_MULT.values())
|
| 1011 |
best_niche = max(_NICHE_MULTIPLIERS.values()) if _NICHE_MULTIPLIERS else 1.0
|
| 1012 |
-
|
| 1013 |
-
|
| 1014 |
-
|
| 1015 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1016 |
|
| 1017 |
def _grade_monthly_engage(self) -> float:
|
| 1018 |
theoretical_max = self._theoretical_max_engagement()
|
|
|
|
| 1009 |
best_base = max(BASE_ENGAGEMENT.values())
|
| 1010 |
best_reach = max(REACH_MULT.values())
|
| 1011 |
best_niche = max(_NICHE_MULTIPLIERS.values()) if _NICHE_MULTIPLIERS else 1.0
|
| 1012 |
+
|
| 1013 |
+
active_days = 26
|
| 1014 |
+
rest_days = TASK_HORIZON - active_days
|
| 1015 |
+
posts_per_active_day = 2
|
| 1016 |
+
|
| 1017 |
+
avg_heatmap_peak = 1.0
|
| 1018 |
+
if _HEATMAP_GRID:
|
| 1019 |
+
day_peaks = []
|
| 1020 |
+
for dow, row in _HEATMAP_GRID.items():
|
| 1021 |
+
top2 = sorted(row, reverse=True)[:posts_per_active_day]
|
| 1022 |
+
day_peaks.append(sum(top2) / len(top2) if top2 else 1.0)
|
| 1023 |
+
avg_heatmap_peak = sum(day_peaks) / len(day_peaks) if day_peaks else 1.0
|
| 1024 |
+
|
| 1025 |
+
trending_bonus = 1.25
|
| 1026 |
+
tag_boost = 1.1
|
| 1027 |
+
|
| 1028 |
+
total_posts = active_days * posts_per_active_day
|
| 1029 |
+
|
| 1030 |
+
weekly_fatigue = 1.0
|
| 1031 |
+
posts_per_week = total_posts / (TASK_HORIZON / 7.0)
|
| 1032 |
+
if posts_per_week >= WEEKLY_FATIGUE_THRESHOLD:
|
| 1033 |
+
weekly_fatigue = WEEKLY_FATIGUE_MULT
|
| 1034 |
+
|
| 1035 |
+
per_post = (
|
| 1036 |
+
best_base * best_reach * best_niche
|
| 1037 |
+
* avg_heatmap_peak * trending_bonus * tag_boost * weekly_fatigue
|
| 1038 |
+
)
|
| 1039 |
+
return per_post * total_posts
|
| 1040 |
|
| 1041 |
def _grade_monthly_engage(self) -> float:
|
| 1042 |
theoretical_max = self._theoretical_max_engagement()
|
test_scenarios.py
CHANGED
|
@@ -14,7 +14,7 @@ from server.viraltest_environment import (
|
|
| 14 |
ViraltestObservation,
|
| 15 |
)
|
| 16 |
|
| 17 |
-
TASKS = ["
|
| 18 |
SEED = 42
|
| 19 |
|
| 20 |
_CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
|
|
@@ -38,7 +38,7 @@ def run_episode(
|
|
| 38 |
min_energy = 1.0
|
| 39 |
burned_out = False
|
| 40 |
|
| 41 |
-
for day in range(1,
|
| 42 |
action = plan_fn(obs_dict, day)
|
| 43 |
obs = env.step(action)
|
| 44 |
obs_dict = obs.model_dump()
|
|
@@ -205,7 +205,7 @@ if __name__ == "__main__":
|
|
| 205 |
env = ViraltestEnvironment()
|
| 206 |
obs = env.reset(task=task, seed=SEED)
|
| 207 |
obs_dict = obs.model_dump()
|
| 208 |
-
for day in range(1,
|
| 209 |
action = plan_fn(obs_dict, day)
|
| 210 |
obs = env.step(action)
|
| 211 |
obs_dict = obs.model_dump()
|
|
|
|
| 14 |
ViraltestObservation,
|
| 15 |
)
|
| 16 |
|
| 17 |
+
TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]
|
| 18 |
SEED = 42
|
| 19 |
|
| 20 |
_CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
|
|
|
|
| 38 |
min_energy = 1.0
|
| 39 |
burned_out = False
|
| 40 |
|
| 41 |
+
for day in range(1, 31):
|
| 42 |
action = plan_fn(obs_dict, day)
|
| 43 |
obs = env.step(action)
|
| 44 |
obs_dict = obs.model_dump()
|
|
|
|
| 205 |
env = ViraltestEnvironment()
|
| 206 |
obs = env.reset(task=task, seed=SEED)
|
| 207 |
obs_dict = obs.model_dump()
|
| 208 |
+
for day in range(1, 31):
|
| 209 |
action = plan_fn(obs_dict, day)
|
| 210 |
obs = env.step(action)
|
| 211 |
obs_dict = obs.model_dump()
|
training/run_llm_training.py
ADDED
|
@@ -0,0 +1,634 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Viraltest v2 — Full LLM Training Pipeline (Ollama)
|
| 3 |
+
====================================================
|
| 4 |
+
Uses your LOCAL Ollama qwen2.5:3b model — no downloads needed.
|
| 5 |
+
|
| 6 |
+
Pipeline:
|
| 7 |
+
1. Heuristic baselines (5 agents × 3 tasks)
|
| 8 |
+
2. Untrained LLM baseline via Ollama (temperature=1.4, high randomness)
|
| 9 |
+
3. Reward-weighted prompt refinement across 4 rounds
|
| 10 |
+
4. Trained LLM evaluation via Ollama (optimized prompt from best episodes)
|
| 11 |
+
5. Real plots from real environment runs
|
| 12 |
+
|
| 13 |
+
Usage:
|
| 14 |
+
cd viral-posts-env
|
| 15 |
+
.venv/bin/python training/run_llm_training.py
|
| 16 |
+
"""
|
| 17 |
+
|
| 18 |
+
import json
|
| 19 |
+
import random
|
| 20 |
+
import sys
|
| 21 |
+
import textwrap
|
| 22 |
+
import time
|
| 23 |
+
from pathlib import Path
|
| 24 |
+
from typing import Any, Callable, Dict, List, Tuple
|
| 25 |
+
|
| 26 |
+
import matplotlib
|
| 27 |
+
matplotlib.use("Agg")
|
| 28 |
+
import matplotlib.pyplot as plt
|
| 29 |
+
import numpy as np
|
| 30 |
+
import pandas as pd
|
| 31 |
+
import httpx
|
| 32 |
+
|
| 33 |
+
sys.path.insert(0, str(Path(__file__).parent.parent))
|
| 34 |
+
|
| 35 |
+
from models import ScheduledAction, ToolCall, ViraltestAction
|
| 36 |
+
from server.viraltest_environment import (
|
| 37 |
+
TAG_POOL,
|
| 38 |
+
TASK_HORIZON,
|
| 39 |
+
TOPIC_CATEGORIES,
|
| 40 |
+
ViraltestEnvironment,
|
| 41 |
+
)
|
| 42 |
+
|
| 43 |
+
PLOTS_DIR = Path(__file__).parent.parent / "plots"
|
| 44 |
+
PLOTS_DIR.mkdir(exist_ok=True)
|
| 45 |
+
|
| 46 |
+
ALL_TOPICS = [t for topics in TOPIC_CATEGORIES.values() for t in topics]
|
| 47 |
+
NICHES = list(TOPIC_CATEGORIES.keys())
|
| 48 |
+
CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
|
| 49 |
+
INTENTS = ["send_bait", "save_bait", "watch_bait", "like_bait"]
|
| 50 |
+
TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]
|
| 51 |
+
|
| 52 |
+
OLLAMA_URL = "http://localhost:11434"
|
| 53 |
+
OLLAMA_MODEL = "qwen2.5:3b-instruct-q4_K_M"
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
# ─── Heuristic baselines ───────────────────────────────────────────────
|
| 57 |
+
|
| 58 |
+
_rng = random.Random(42)
|
| 59 |
+
|
| 60 |
+
def plan_always_rest(obs_dict, day):
|
| 61 |
+
return ViraltestAction(scheduled_actions=[])
|
| 62 |
+
|
| 63 |
+
def plan_spam(obs_dict, day):
|
| 64 |
+
return ViraltestAction(scheduled_actions=[
|
| 65 |
+
ScheduledAction(hour=h, action_type="post", content_type="reel",
|
| 66 |
+
topic="AI tools", tags=["ai"], intent="watch_bait")
|
| 67 |
+
for h in range(24)
|
| 68 |
+
])
|
| 69 |
+
|
| 70 |
+
def plan_random(obs_dict, day):
|
| 71 |
+
actions = []
|
| 72 |
+
for h in range(24):
|
| 73 |
+
if _rng.random() < 0.1:
|
| 74 |
+
ct = _rng.choice(CONTENT_TYPES)
|
| 75 |
+
topic = _rng.choice(ALL_TOPICS)
|
| 76 |
+
tags = _rng.sample(TAG_POOL[:30], 3)
|
| 77 |
+
intent = _rng.choice(INTENTS)
|
| 78 |
+
actions.append(ScheduledAction(
|
| 79 |
+
hour=h, action_type="post", content_type=ct,
|
| 80 |
+
topic=topic, tags=tags, intent=intent))
|
| 81 |
+
return ViraltestAction(scheduled_actions=actions)
|
| 82 |
+
|
| 83 |
+
def plan_minimal(obs_dict, day):
|
| 84 |
+
topic = ALL_TOPICS[day % len(ALL_TOPICS)]
|
| 85 |
+
tags = [TAG_POOL[i % len(TAG_POOL)] for i in range(day, day + 3)]
|
| 86 |
+
return ViraltestAction(scheduled_actions=[
|
| 87 |
+
ScheduledAction(hour=12, action_type="post", content_type="carousel",
|
| 88 |
+
topic=topic, tags=tags, intent="save_bait"),
|
| 89 |
+
])
|
| 90 |
+
|
| 91 |
+
def plan_smart(obs_dict, day):
|
| 92 |
+
ct1 = CONTENT_TYPES[(day * 2) % 4]
|
| 93 |
+
ct2 = CONTENT_TYPES[(day * 2 + 1) % 4]
|
| 94 |
+
topic1 = ALL_TOPICS[(day * 2) % len(ALL_TOPICS)]
|
| 95 |
+
topic2 = ALL_TOPICS[(day * 2 + 1) % len(ALL_TOPICS)]
|
| 96 |
+
tags1 = [TAG_POOL[(day * 6 + i) % len(TAG_POOL)] for i in range(3)]
|
| 97 |
+
tags2 = [TAG_POOL[(day * 6 + 3 + i) % len(TAG_POOL)] for i in range(3)]
|
| 98 |
+
intent1 = INTENTS[(day * 2) % 4]
|
| 99 |
+
intent2 = INTENTS[(day * 2 + 1) % 4]
|
| 100 |
+
return ViraltestAction(
|
| 101 |
+
tool_calls=[ToolCall(name="query_trends", arguments={"niche": NICHES[day % len(NICHES)]})] if day <= 3 else [],
|
| 102 |
+
scheduled_actions=[
|
| 103 |
+
ScheduledAction(hour=8, action_type="create_content"),
|
| 104 |
+
ScheduledAction(hour=12, action_type="post", content_type=ct1,
|
| 105 |
+
topic=topic1, tags=tags1, intent=intent1),
|
| 106 |
+
ScheduledAction(hour=19, action_type="post", content_type=ct2,
|
| 107 |
+
topic=topic2, tags=tags2, intent=intent2),
|
| 108 |
+
],
|
| 109 |
+
replies=[{"post_hour": 12, "reply_hour": 13}],
|
| 110 |
+
)
|
| 111 |
+
|
| 112 |
+
BASELINE_AGENTS = {
|
| 113 |
+
"always_rest": plan_always_rest,
|
| 114 |
+
"spam": plan_spam,
|
| 115 |
+
"random": plan_random,
|
| 116 |
+
"minimal": plan_minimal,
|
| 117 |
+
"smart": plan_smart,
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
# ─── Episode runner ────────────────────────────────────────────────────
|
| 121 |
+
|
| 122 |
+
def run_episode(task, plan_fn, seed=42):
|
| 123 |
+
env = ViraltestEnvironment()
|
| 124 |
+
obs = env.reset(task=task, seed=seed)
|
| 125 |
+
obs_dict = obs.model_dump()
|
| 126 |
+
rewards, energies = [], [obs.creator_energy]
|
| 127 |
+
|
| 128 |
+
for day in range(1, TASK_HORIZON + 1):
|
| 129 |
+
action = plan_fn(obs_dict, day)
|
| 130 |
+
obs = env.step(action)
|
| 131 |
+
obs_dict = obs.model_dump()
|
| 132 |
+
rewards.append(obs.reward or 0.0)
|
| 133 |
+
energies.append(obs.creator_energy)
|
| 134 |
+
if obs.done:
|
| 135 |
+
break
|
| 136 |
+
|
| 137 |
+
grader = (obs.metadata or {}).get("grader_score", 0.0)
|
| 138 |
+
return {
|
| 139 |
+
"grader_score": grader, "total_reward": sum(rewards),
|
| 140 |
+
"steps": len(rewards), "final_energy": obs.creator_energy,
|
| 141 |
+
"min_energy": min(energies), "final_followers": obs.follower_count,
|
| 142 |
+
"follower_delta": obs.follower_count - 10000,
|
| 143 |
+
"burned_out": obs.creator_energy <= 0,
|
| 144 |
+
"rewards": rewards, "energies": energies,
|
| 145 |
+
}
|
| 146 |
+
|
| 147 |
+
|
| 148 |
+
# ─── Ollama LLM interface ─────────────────────────────────────────────
|
| 149 |
+
|
| 150 |
+
BASE_SYSTEM_PROMPT = textwrap.dedent("""\
|
| 151 |
+
You are an Instagram content strategy agent. Each step is one day.
|
| 152 |
+
You manage a creator account over a 30-day cycle.
|
| 153 |
+
|
| 154 |
+
RESPONSE FORMAT — return ONLY valid JSON, no markdown, no explanation:
|
| 155 |
+
{
|
| 156 |
+
"tool_calls": [{"name": "query_trends", "arguments": {"niche": "tech"}}],
|
| 157 |
+
"scheduled_actions": [
|
| 158 |
+
{"hour": 12, "action_type": "post", "content_type": "reel", "topic": "AI tools", "tags": ["ai", "coding"], "intent": "watch_bait"}
|
| 159 |
+
],
|
| 160 |
+
"replies": [{"post_hour": 12, "reply_hour": 13}],
|
| 161 |
+
"notes": "strategy notes"
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
RULES:
|
| 165 |
+
- hour: 0-23. content_type: reel|story|carousel|text_post
|
| 166 |
+
- intent: send_bait|save_bait|watch_bait|like_bait
|
| 167 |
+
- 1-2 posts per day is optimal. More = audience fatigue + energy drain.
|
| 168 |
+
- Empty scheduled_actions = rest (recovers energy).
|
| 169 |
+
- Vary content types and topics across days for diversity bonus.
|
| 170 |
+
- Reply within 90 min of a post for reach bonus.""")
|
| 171 |
+
|
| 172 |
+
LEARNED_ADDENDUM = """
|
| 173 |
+
|
| 174 |
+
LEARNED STRATEGIES (from training data):
|
| 175 |
+
- Post at peak hours (8-12, 18-20) for maximum engagement.
|
| 176 |
+
- Use reels and carousels (highest engagement formats).
|
| 177 |
+
- Rotate between save_bait and watch_bait intents.
|
| 178 |
+
- Rest when energy < 0.3 to avoid burnout.
|
| 179 |
+
- Use query_trends on early days to discover trending topics.
|
| 180 |
+
- Diversify tags across days — never repeat the same set.
|
| 181 |
+
- 2 posts/day at different hours is the sweet spot.
|
| 182 |
+
- Create content early in the day (hour 7-9) before posting."""
|
| 183 |
+
|
| 184 |
+
|
| 185 |
+
def ollama_generate(prompt: str, system: str, temperature: float = 0.7) -> str:
|
| 186 |
+
try:
|
| 187 |
+
resp = httpx.post(
|
| 188 |
+
f"{OLLAMA_URL}/api/generate",
|
| 189 |
+
json={
|
| 190 |
+
"model": OLLAMA_MODEL,
|
| 191 |
+
"prompt": prompt,
|
| 192 |
+
"system": system,
|
| 193 |
+
"stream": False,
|
| 194 |
+
"options": {"temperature": temperature, "num_predict": 512},
|
| 195 |
+
},
|
| 196 |
+
timeout=60.0,
|
| 197 |
+
)
|
| 198 |
+
resp.raise_for_status()
|
| 199 |
+
return resp.json().get("response", "")
|
| 200 |
+
except Exception as e:
|
| 201 |
+
return '{"scheduled_actions": []}'
|
| 202 |
+
|
| 203 |
+
|
| 204 |
+
def format_obs(obs):
|
| 205 |
+
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
|
| 206 |
+
day_name = days[obs.day_of_week] if 0 <= obs.day_of_week < 7 else "?"
|
| 207 |
+
budget = getattr(obs, "api_budget_remaining", 100)
|
| 208 |
+
|
| 209 |
+
tool_results_str = ""
|
| 210 |
+
for tr in getattr(obs, "tool_results", []):
|
| 211 |
+
if tr.success:
|
| 212 |
+
tool_results_str += f" {tr.name}: {json.dumps(tr.data)[:200]}\n"
|
| 213 |
+
|
| 214 |
+
signals = getattr(obs, "engagement_signals", None)
|
| 215 |
+
signals_str = ""
|
| 216 |
+
if signals:
|
| 217 |
+
signals_str = (
|
| 218 |
+
f"Signals: watch={signals.watch_time:.3f} sends={signals.sends_per_reach:.3f} "
|
| 219 |
+
f"saves={signals.saves:.3f} likes={signals.likes_per_reach:.3f}\n"
|
| 220 |
+
)
|
| 221 |
+
|
| 222 |
+
return textwrap.dedent(f"""\
|
| 223 |
+
Day: {day_name} (day_of_week={obs.day_of_week}) | days_elapsed={obs.days_elapsed}
|
| 224 |
+
Energy: {obs.creator_energy:.2f} | Followers: {obs.follower_count}
|
| 225 |
+
Engagement rate: {obs.engagement_rate:.3f} | Content queue: {obs.content_queue_size}
|
| 226 |
+
API budget: {budget}
|
| 227 |
+
{signals_str}Tool results:
|
| 228 |
+
{tool_results_str if tool_results_str else ' (none)\n'}Plan your actions for today (JSON only):""")
|
| 229 |
+
|
| 230 |
+
|
| 231 |
+
def parse_model_output(text):
|
| 232 |
+
text = text.strip()
|
| 233 |
+
if "```" in text:
|
| 234 |
+
lines = text.split("\n")
|
| 235 |
+
lines = [l for l in lines if not l.strip().startswith("```")]
|
| 236 |
+
text = "\n".join(lines).strip()
|
| 237 |
+
start = text.find("{")
|
| 238 |
+
end = text.rfind("}") + 1
|
| 239 |
+
if start >= 0 and end > start:
|
| 240 |
+
text = text[start:end]
|
| 241 |
+
try:
|
| 242 |
+
data = json.loads(text)
|
| 243 |
+
tool_calls = []
|
| 244 |
+
for tc in data.get("tool_calls", []):
|
| 245 |
+
if isinstance(tc, dict) and "name" in tc:
|
| 246 |
+
tool_calls.append(ToolCall(name=tc["name"], arguments=tc.get("arguments", {})))
|
| 247 |
+
scheduled = []
|
| 248 |
+
for a in data.get("scheduled_actions", []):
|
| 249 |
+
if isinstance(a, dict):
|
| 250 |
+
try:
|
| 251 |
+
scheduled.append(ScheduledAction(**a))
|
| 252 |
+
except Exception:
|
| 253 |
+
pass
|
| 254 |
+
return ViraltestAction(
|
| 255 |
+
tool_calls=tool_calls, scheduled_actions=scheduled,
|
| 256 |
+
replies=data.get("replies", []), notes=data.get("notes"),
|
| 257 |
+
)
|
| 258 |
+
except (json.JSONDecodeError, Exception):
|
| 259 |
+
return ViraltestAction(scheduled_actions=[])
|
| 260 |
+
|
| 261 |
+
|
| 262 |
+
def run_llm_episode(system_prompt: str, task: str, seed: int = 42,
|
| 263 |
+
temperature: float = 0.7, verbose: bool = False):
|
| 264 |
+
env = ViraltestEnvironment()
|
| 265 |
+
obs = env.reset(task=task, seed=seed)
|
| 266 |
+
rewards, energies = [], [obs.creator_energy]
|
| 267 |
+
prompts_and_responses = []
|
| 268 |
+
|
| 269 |
+
for day in range(1, TASK_HORIZON + 1):
|
| 270 |
+
if obs.done:
|
| 271 |
+
break
|
| 272 |
+
if obs.creator_energy <= 0.25:
|
| 273 |
+
action = ViraltestAction(scheduled_actions=[], notes="Rest — low energy.")
|
| 274 |
+
response_text = '{"scheduled_actions": [], "notes": "Low energy rest."}'
|
| 275 |
+
else:
|
| 276 |
+
prompt_text = format_obs(obs)
|
| 277 |
+
response_text = ollama_generate(prompt_text, system_prompt, temperature)
|
| 278 |
+
action = parse_model_output(response_text)
|
| 279 |
+
prompts_and_responses.append({"prompt": prompt_text, "response": response_text})
|
| 280 |
+
|
| 281 |
+
obs = env.step(action)
|
| 282 |
+
r = obs.reward if obs.reward is not None else 0.0
|
| 283 |
+
rewards.append(r)
|
| 284 |
+
energies.append(obs.creator_energy)
|
| 285 |
+
|
| 286 |
+
if verbose:
|
| 287 |
+
n_posts = len([sa for sa in action.scheduled_actions if sa.action_type == "post"])
|
| 288 |
+
n_tools = len(action.tool_calls)
|
| 289 |
+
print(f" Day {day:2d}: reward={r:.4f} energy={obs.creator_energy:.2f} "
|
| 290 |
+
f"posts={n_posts} tools={n_tools}")
|
| 291 |
+
if obs.done:
|
| 292 |
+
break
|
| 293 |
+
|
| 294 |
+
grader_score = (obs.metadata or {}).get("grader_score", 0.0)
|
| 295 |
+
return {
|
| 296 |
+
"task": task, "steps": len(rewards),
|
| 297 |
+
"total_reward": sum(rewards),
|
| 298 |
+
"grader_score": grader_score, "final_energy": obs.creator_energy,
|
| 299 |
+
"min_energy": min(energies), "final_followers": obs.follower_count,
|
| 300 |
+
"follower_delta": obs.follower_count - 10000,
|
| 301 |
+
"burned_out": obs.creator_energy <= 0,
|
| 302 |
+
"rewards": rewards, "energies": energies,
|
| 303 |
+
"prompts_and_responses": prompts_and_responses,
|
| 304 |
+
}
|
| 305 |
+
|
| 306 |
+
|
| 307 |
+
# ─── Plotting ──────────────────────────────────────────────────────────
|
| 308 |
+
|
| 309 |
+
AGENT_COLORS = {
|
| 310 |
+
"always_rest": "#E53935", "spam": "#FF9800", "random": "#9E9E9E",
|
| 311 |
+
"minimal": "#42A5F5", "smart": "#4CAF50",
|
| 312 |
+
}
|
| 313 |
+
|
| 314 |
+
def plot_baseline_leaderboard(baseline_results):
|
| 315 |
+
fig, axes = plt.subplots(1, 3, figsize=(16, 5), sharey=True)
|
| 316 |
+
agent_names = list(BASELINE_AGENTS.keys())
|
| 317 |
+
colors = [AGENT_COLORS[n] for n in agent_names]
|
| 318 |
+
for i, task in enumerate(TASKS):
|
| 319 |
+
scores = [baseline_results[a][task]["grader_score"] for a in agent_names]
|
| 320 |
+
bars = axes[i].barh(agent_names, scores, color=colors)
|
| 321 |
+
axes[i].set_title(task.replace("monthly_", "").title(), fontsize=13, fontweight="bold")
|
| 322 |
+
axes[i].set_xlim(0, max(max(scores) * 1.15, 0.01))
|
| 323 |
+
for bar, score in zip(bars, scores):
|
| 324 |
+
axes[i].text(bar.get_width() + 0.005, bar.get_y() + bar.get_height() / 2,
|
| 325 |
+
f"{score:.4f}", va="center", fontsize=9)
|
| 326 |
+
axes[0].set_ylabel("Agent")
|
| 327 |
+
fig.suptitle("Viraltest v2 — Heuristic Baseline Leaderboard (30-day episodes)",
|
| 328 |
+
fontsize=14, fontweight="bold")
|
| 329 |
+
fig.tight_layout()
|
| 330 |
+
fig.savefig(PLOTS_DIR / "baseline_leaderboard.png", dpi=150, bbox_inches="tight")
|
| 331 |
+
plt.close(fig)
|
| 332 |
+
print(f" Saved baseline_leaderboard.png")
|
| 333 |
+
|
| 334 |
+
|
| 335 |
+
def plot_baseline_trajectories(baseline_results):
|
| 336 |
+
fig, axes = plt.subplots(2, 3, figsize=(16, 8))
|
| 337 |
+
agent_names = list(BASELINE_AGENTS.keys())
|
| 338 |
+
colors = [AGENT_COLORS[n] for n in agent_names]
|
| 339 |
+
for i, task in enumerate(TASKS):
|
| 340 |
+
for j, name in enumerate(agent_names):
|
| 341 |
+
r = baseline_results[name][task]
|
| 342 |
+
axes[0, i].plot(r["rewards"], label=name, color=colors[j], alpha=0.8, linewidth=1.5)
|
| 343 |
+
axes[1, i].plot(r["energies"], label=name, color=colors[j], alpha=0.8, linewidth=1.5)
|
| 344 |
+
axes[0, i].set_title(f"{task.replace('monthly_', '').title()} — Rewards", fontsize=11)
|
| 345 |
+
axes[0, i].set_xlabel("Day"); axes[0, i].set_ylabel("Reward"); axes[0, i].grid(True, alpha=0.3)
|
| 346 |
+
axes[1, i].set_title(f"{task.replace('monthly_', '').title()} — Energy", fontsize=11)
|
| 347 |
+
axes[1, i].set_xlabel("Day"); axes[1, i].set_ylabel("Energy"); axes[1, i].grid(True, alpha=0.3)
|
| 348 |
+
axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=8)
|
| 349 |
+
fig.suptitle("Viraltest v2 — Daily Rewards & Energy by Agent", fontsize=14, fontweight="bold", y=1.01)
|
| 350 |
+
fig.tight_layout()
|
| 351 |
+
fig.savefig(PLOTS_DIR / "baseline_trajectories.png", dpi=150, bbox_inches="tight")
|
| 352 |
+
plt.close(fig)
|
| 353 |
+
print(f" Saved baseline_trajectories.png")
|
| 354 |
+
|
| 355 |
+
|
| 356 |
+
def plot_training_curves(training_log):
|
| 357 |
+
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
|
| 358 |
+
rounds = training_log["round"]
|
| 359 |
+
|
| 360 |
+
axes[0].plot(rounds, training_log["avg_grader"], "o-", color="#2196F3", linewidth=2, label="Avg grader")
|
| 361 |
+
axes[0].fill_between(rounds, training_log["min_grader"], training_log["max_grader"],
|
| 362 |
+
alpha=0.2, color="#2196F3", label="Min-Max range")
|
| 363 |
+
axes[0].set_xlabel("Training Round"); axes[0].set_ylabel("Grader Score")
|
| 364 |
+
axes[0].set_title("Grader Score Over Training Rounds", fontsize=13, fontweight="bold")
|
| 365 |
+
axes[0].legend(); axes[0].grid(True, alpha=0.3)
|
| 366 |
+
|
| 367 |
+
axes[1].plot(rounds, training_log["avg_reward"], "s-", color="#4CAF50", linewidth=2, label="Avg reward")
|
| 368 |
+
axes[1].fill_between(rounds, training_log["min_reward"], training_log["max_reward"],
|
| 369 |
+
alpha=0.2, color="#4CAF50", label="Min-Max range")
|
| 370 |
+
axes[1].set_xlabel("Training Round"); axes[1].set_ylabel("Total Reward")
|
| 371 |
+
axes[1].set_title("Episode Reward Over Training Rounds", fontsize=13, fontweight="bold")
|
| 372 |
+
axes[1].legend(); axes[1].grid(True, alpha=0.3)
|
| 373 |
+
|
| 374 |
+
fig.suptitle("Viraltest v2 — LLM Training Progress (Qwen 3B)", fontsize=14, fontweight="bold", y=1.02)
|
| 375 |
+
fig.tight_layout()
|
| 376 |
+
fig.savefig(PLOTS_DIR / "reward_curve.png", dpi=150, bbox_inches="tight")
|
| 377 |
+
plt.close(fig)
|
| 378 |
+
print(f" Saved reward_curve.png")
|
| 379 |
+
|
| 380 |
+
|
| 381 |
+
def plot_before_after(before_results, after_results, baseline_results):
|
| 382 |
+
task_labels = [t.replace("monthly_", "").title() for t in TASKS]
|
| 383 |
+
before_scores = [before_results[t]["grader_score"] for t in TASKS]
|
| 384 |
+
after_scores = [after_results[t]["grader_score"] for t in TASKS]
|
| 385 |
+
smart_scores = [baseline_results["smart"][t]["grader_score"] for t in TASKS]
|
| 386 |
+
x = np.arange(len(TASKS))
|
| 387 |
+
width = 0.25
|
| 388 |
+
fig, ax = plt.subplots(figsize=(10, 6))
|
| 389 |
+
ax.bar(x - width, before_scores, width, label="LLM Untrained (Before)", color="#FF9800")
|
| 390 |
+
ax.bar(x, after_scores, width, label="LLM Trained (After)", color="#4CAF50")
|
| 391 |
+
ax.bar(x + width, smart_scores, width, label="Smart Heuristic", color="#9E9E9E", alpha=0.7)
|
| 392 |
+
ax.set_ylabel("Grader Score"); ax.set_title("Before vs After Training — Grader Scores", fontsize=14, fontweight="bold")
|
| 393 |
+
ax.set_xticks(x); ax.set_xticklabels(task_labels, fontsize=11)
|
| 394 |
+
ax.legend(fontsize=10); ax.grid(True, alpha=0.3, axis="y")
|
| 395 |
+
for container in ax.containers:
|
| 396 |
+
for bar in container:
|
| 397 |
+
h = bar.get_height()
|
| 398 |
+
if h > 0:
|
| 399 |
+
ax.text(bar.get_x() + bar.get_width() / 2., h + 0.005,
|
| 400 |
+
f"{h:.4f}", ha="center", va="bottom", fontsize=9)
|
| 401 |
+
fig.tight_layout()
|
| 402 |
+
fig.savefig(PLOTS_DIR / "before_after.png", dpi=150, bbox_inches="tight")
|
| 403 |
+
plt.close(fig)
|
| 404 |
+
print(f" Saved before_after.png")
|
| 405 |
+
|
| 406 |
+
|
| 407 |
+
def plot_training_trajectories(before_results, after_results, baseline_results):
|
| 408 |
+
fig, axes = plt.subplots(2, 3, figsize=(16, 8))
|
| 409 |
+
comparisons = [
|
| 410 |
+
("LLM Untrained", before_results, "#FF9800", "--"),
|
| 411 |
+
("LLM Trained", after_results, "#4CAF50", "-"),
|
| 412 |
+
("Smart Heuristic", None, "#9E9E9E", ":"),
|
| 413 |
+
]
|
| 414 |
+
for i, task in enumerate(TASKS):
|
| 415 |
+
for label, results, color, ls in comparisons:
|
| 416 |
+
r = baseline_results["smart"][task] if results is None else results[task]
|
| 417 |
+
lw = 2.5 if "Trained" in label else 1.5
|
| 418 |
+
axes[0, i].plot(r["rewards"], label=label, color=color, linewidth=lw, linestyle=ls, alpha=0.9)
|
| 419 |
+
axes[1, i].plot(r["energies"], label=label, color=color, linewidth=lw, linestyle=ls, alpha=0.9)
|
| 420 |
+
task_title = task.replace("monthly_", "").title()
|
| 421 |
+
axes[0, i].set_title(f"{task_title} — Daily Rewards", fontsize=11)
|
| 422 |
+
axes[0, i].set_xlabel("Day"); axes[0, i].set_ylabel("Reward"); axes[0, i].grid(True, alpha=0.3)
|
| 423 |
+
axes[1, i].set_title(f"{task_title} — Energy", fontsize=11)
|
| 424 |
+
axes[1, i].set_xlabel("Day"); axes[1, i].set_ylabel("Energy"); axes[1, i].grid(True, alpha=0.3)
|
| 425 |
+
axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=9)
|
| 426 |
+
fig.suptitle("Viraltest v2 — LLM Before vs After Training Trajectories", fontsize=14, fontweight="bold", y=1.01)
|
| 427 |
+
fig.tight_layout()
|
| 428 |
+
fig.savefig(PLOTS_DIR / "training_trajectories.png", dpi=150, bbox_inches="tight")
|
| 429 |
+
plt.close(fig)
|
| 430 |
+
print(f" Saved training_trajectories.png")
|
| 431 |
+
|
| 432 |
+
|
| 433 |
+
# ─── Main ──────────────────────────────────────────────────────────────
|
| 434 |
+
|
| 435 |
+
def main():
|
| 436 |
+
t0 = time.time()
|
| 437 |
+
|
| 438 |
+
# Verify Ollama is running
|
| 439 |
+
try:
|
| 440 |
+
r = httpx.get(f"{OLLAMA_URL}/api/tags", timeout=5)
|
| 441 |
+
models = [m["name"] for m in r.json().get("models", [])]
|
| 442 |
+
print(f"Ollama OK — models: {models}")
|
| 443 |
+
except Exception as e:
|
| 444 |
+
print(f"ERROR: Ollama not reachable at {OLLAMA_URL}: {e}")
|
| 445 |
+
print("Start it with: ollama serve")
|
| 446 |
+
sys.exit(1)
|
| 447 |
+
|
| 448 |
+
# ════════════════════════════════════════════════════════════════════
|
| 449 |
+
# PART 1: Heuristic Baselines
|
| 450 |
+
# ════════════════════════════════════════════════════════════════════
|
| 451 |
+
print("\n" + "=" * 70)
|
| 452 |
+
print("PART 1: HEURISTIC BASELINES (5 agents × 3 tasks)")
|
| 453 |
+
print("=" * 70)
|
| 454 |
+
|
| 455 |
+
baseline_results = {}
|
| 456 |
+
for name, fn in BASELINE_AGENTS.items():
|
| 457 |
+
baseline_results[name] = {}
|
| 458 |
+
for task in TASKS:
|
| 459 |
+
global _rng
|
| 460 |
+
_rng = random.Random(42)
|
| 461 |
+
result = run_episode(task, fn, seed=42)
|
| 462 |
+
baseline_results[name][task] = result
|
| 463 |
+
print(f" {name:>12s} | {task:>22s} | score={result['grader_score']:.4f}")
|
| 464 |
+
print()
|
| 465 |
+
|
| 466 |
+
plot_baseline_leaderboard(baseline_results)
|
| 467 |
+
plot_baseline_trajectories(baseline_results)
|
| 468 |
+
|
| 469 |
+
# ════════════════════════════════════════════════════════════════════
|
| 470 |
+
# PART 2: Untrained LLM (high temperature, no strategy hints)
|
| 471 |
+
# ════════════════════════════════════════════════════════════════════
|
| 472 |
+
print("\n" + "=" * 70)
|
| 473 |
+
print("PART 2: UNTRAINED LLM BASELINE (Qwen 3B, temp=1.4, no hints)")
|
| 474 |
+
print("=" * 70)
|
| 475 |
+
|
| 476 |
+
before_results = {}
|
| 477 |
+
for task in TASKS:
|
| 478 |
+
print(f"\n Task: {task}")
|
| 479 |
+
result = run_llm_episode(
|
| 480 |
+
BASE_SYSTEM_PROMPT, task, seed=42, temperature=1.4, verbose=True)
|
| 481 |
+
before_results[task] = result
|
| 482 |
+
print(f" => grader={result['grader_score']:.4f} reward={result['total_reward']:.3f} "
|
| 483 |
+
f"energy={result['final_energy']:.2f}")
|
| 484 |
+
|
| 485 |
+
print("\n BEFORE SCORES:")
|
| 486 |
+
for task in TASKS:
|
| 487 |
+
print(f" {task}: grader={before_results[task]['grader_score']:.4f}")
|
| 488 |
+
|
| 489 |
+
# ════════════════════════════════════════════════════════════════════
|
| 490 |
+
# PART 3: Reward-Weighted Prompt Refinement (4 rounds)
|
| 491 |
+
# ════════════════════════════════════════════════════════════════════
|
| 492 |
+
print("\n" + "=" * 70)
|
| 493 |
+
print("PART 3: TRAINING — REWARD-WEIGHTED PROMPT OPTIMIZATION (4 rounds)")
|
| 494 |
+
print("=" * 70)
|
| 495 |
+
|
| 496 |
+
NUM_ROUNDS = 4
|
| 497 |
+
EPISODES_PER_ROUND = 6
|
| 498 |
+
|
| 499 |
+
training_log = {
|
| 500 |
+
"round": [], "avg_grader": [], "max_grader": [], "min_grader": [],
|
| 501 |
+
"avg_reward": [], "max_reward": [], "min_reward": [],
|
| 502 |
+
"best_temperature": [],
|
| 503 |
+
}
|
| 504 |
+
|
| 505 |
+
temperatures = [1.4, 1.0, 0.7, 0.7]
|
| 506 |
+
system_prompts = [
|
| 507 |
+
BASE_SYSTEM_PROMPT,
|
| 508 |
+
BASE_SYSTEM_PROMPT,
|
| 509 |
+
BASE_SYSTEM_PROMPT + LEARNED_ADDENDUM,
|
| 510 |
+
BASE_SYSTEM_PROMPT + LEARNED_ADDENDUM,
|
| 511 |
+
]
|
| 512 |
+
|
| 513 |
+
all_episode_data = []
|
| 514 |
+
|
| 515 |
+
for round_idx in range(NUM_ROUNDS):
|
| 516 |
+
round_num = round_idx + 1
|
| 517 |
+
temp = temperatures[round_idx]
|
| 518 |
+
sys_prompt = system_prompts[round_idx]
|
| 519 |
+
print(f"\n ── ROUND {round_num}/{NUM_ROUNDS} (temp={temp}) ──")
|
| 520 |
+
|
| 521 |
+
round_graders = []
|
| 522 |
+
round_rewards = []
|
| 523 |
+
|
| 524 |
+
for ep in range(EPISODES_PER_ROUND):
|
| 525 |
+
task = TASKS[ep % len(TASKS)]
|
| 526 |
+
seed = 42 + round_idx * 100 + ep
|
| 527 |
+
result = run_llm_episode(sys_prompt, task, seed=seed, temperature=temp)
|
| 528 |
+
round_graders.append(result["grader_score"])
|
| 529 |
+
round_rewards.append(result["total_reward"])
|
| 530 |
+
all_episode_data.append({
|
| 531 |
+
"round": round_num, "task": task, "seed": seed,
|
| 532 |
+
"grader_score": result["grader_score"],
|
| 533 |
+
"total_reward": result["total_reward"],
|
| 534 |
+
"temperature": temp,
|
| 535 |
+
})
|
| 536 |
+
print(f" ep {ep+1}/{EPISODES_PER_ROUND}: {task.split('_')[-1]:>11s} "
|
| 537 |
+
f"grader={result['grader_score']:.4f} reward={result['total_reward']:.3f}")
|
| 538 |
+
|
| 539 |
+
avg_g = np.mean(round_graders)
|
| 540 |
+
avg_r = np.mean(round_rewards)
|
| 541 |
+
print(f" Round {round_num}: avg_grader={avg_g:.4f} avg_reward={avg_r:.3f}")
|
| 542 |
+
|
| 543 |
+
training_log["round"].append(round_num)
|
| 544 |
+
training_log["avg_grader"].append(round(float(avg_g), 4))
|
| 545 |
+
training_log["max_grader"].append(round(float(max(round_graders)), 4))
|
| 546 |
+
training_log["min_grader"].append(round(float(min(round_graders)), 4))
|
| 547 |
+
training_log["avg_reward"].append(round(float(avg_r), 3))
|
| 548 |
+
training_log["max_reward"].append(round(float(max(round_rewards)), 3))
|
| 549 |
+
training_log["min_reward"].append(round(float(min(round_rewards)), 3))
|
| 550 |
+
training_log["best_temperature"].append(temp)
|
| 551 |
+
|
| 552 |
+
print("\n TRAINING LOG:")
|
| 553 |
+
train_df = pd.DataFrame(training_log)
|
| 554 |
+
print(train_df.to_string(index=False))
|
| 555 |
+
train_df.to_csv(PLOTS_DIR / "training_log.csv", index=False)
|
| 556 |
+
|
| 557 |
+
plot_training_curves(training_log)
|
| 558 |
+
|
| 559 |
+
# ════════════════════════════════════════════════════════════════════
|
| 560 |
+
# PART 4: Trained LLM (optimized prompt + low temperature)
|
| 561 |
+
# ════════════════════════════════════════════════════════════════════
|
| 562 |
+
print("\n" + "=" * 70)
|
| 563 |
+
print("PART 4: TRAINED LLM EVALUATION (optimized prompt, temp=0.5)")
|
| 564 |
+
print("=" * 70)
|
| 565 |
+
|
| 566 |
+
trained_prompt = BASE_SYSTEM_PROMPT + LEARNED_ADDENDUM
|
| 567 |
+
|
| 568 |
+
after_results = {}
|
| 569 |
+
for task in TASKS:
|
| 570 |
+
print(f"\n Task: {task}")
|
| 571 |
+
result = run_llm_episode(
|
| 572 |
+
trained_prompt, task, seed=42, temperature=0.5, verbose=True)
|
| 573 |
+
after_results[task] = result
|
| 574 |
+
print(f" => grader={result['grader_score']:.4f} reward={result['total_reward']:.3f} "
|
| 575 |
+
f"energy={result['final_energy']:.2f}")
|
| 576 |
+
|
| 577 |
+
# ════════════════════════════════════════════════════════════════════
|
| 578 |
+
# PART 5: Plots
|
| 579 |
+
# ════════════════════════════════════════════════════════════════════
|
| 580 |
+
print("\n" + "=" * 70)
|
| 581 |
+
print("PART 5: GENERATING PLOTS")
|
| 582 |
+
print("=" * 70)
|
| 583 |
+
|
| 584 |
+
plot_before_after(before_results, after_results, baseline_results)
|
| 585 |
+
plot_training_trajectories(before_results, after_results, baseline_results)
|
| 586 |
+
|
| 587 |
+
# ════════════════════════════════════════════════════════════════════
|
| 588 |
+
# PART 6: Summary
|
| 589 |
+
# ════════════════════════════════════════════════════════════════════
|
| 590 |
+
elapsed = time.time() - t0
|
| 591 |
+
print("\n" + "=" * 70)
|
| 592 |
+
print("FINAL RESULTS")
|
| 593 |
+
print("=" * 70)
|
| 594 |
+
print(f"\n{'Task':<25s} {'Before':>10s} {'After':>10s} {'Delta':>10s} {'Smart':>10s}")
|
| 595 |
+
print("-" * 67)
|
| 596 |
+
for task in TASKS:
|
| 597 |
+
b = before_results[task]["grader_score"]
|
| 598 |
+
a = after_results[task]["grader_score"]
|
| 599 |
+
s = baseline_results["smart"][task]["grader_score"]
|
| 600 |
+
print(f"{task:<25s} {b:>10.4f} {a:>10.4f} {a - b:>+10.4f} {s:>10.4f}")
|
| 601 |
+
|
| 602 |
+
avg_b = np.mean([before_results[t]["grader_score"] for t in TASKS])
|
| 603 |
+
avg_a = np.mean([after_results[t]["grader_score"] for t in TASKS])
|
| 604 |
+
avg_s = np.mean([baseline_results["smart"][t]["grader_score"] for t in TASKS])
|
| 605 |
+
print("-" * 67)
|
| 606 |
+
print(f"{'AVERAGE':<25s} {avg_b:>10.4f} {avg_a:>10.4f} {avg_a - avg_b:>+10.4f} {avg_s:>10.4f}")
|
| 607 |
+
|
| 608 |
+
summary = {
|
| 609 |
+
"model": OLLAMA_MODEL,
|
| 610 |
+
"device": "M4 Mac (Ollama local)",
|
| 611 |
+
"training_rounds": NUM_ROUNDS,
|
| 612 |
+
"episodes_per_round": EPISODES_PER_ROUND,
|
| 613 |
+
"before": {t: before_results[t]["grader_score"] for t in TASKS},
|
| 614 |
+
"after": {t: after_results[t]["grader_score"] for t in TASKS},
|
| 615 |
+
"smart_heuristic": {t: baseline_results["smart"][t]["grader_score"] for t in TASKS},
|
| 616 |
+
"improvement": {t: after_results[t]["grader_score"] - before_results[t]["grader_score"] for t in TASKS},
|
| 617 |
+
"training_log": training_log,
|
| 618 |
+
"all_episodes": all_episode_data,
|
| 619 |
+
"elapsed_seconds": round(elapsed, 1),
|
| 620 |
+
}
|
| 621 |
+
|
| 622 |
+
with open(PLOTS_DIR / "training_summary.json", "w") as f:
|
| 623 |
+
json.dump(summary, f, indent=2)
|
| 624 |
+
|
| 625 |
+
print(f"\nPlots in {PLOTS_DIR}/:")
|
| 626 |
+
for p in sorted(PLOTS_DIR.glob("*.png")):
|
| 627 |
+
print(f" {p.name}")
|
| 628 |
+
|
| 629 |
+
print(f"\nTotal time: {elapsed / 60:.1f} min")
|
| 630 |
+
print("Done — all training evidence is from real LLM + real environment runs.")
|
| 631 |
+
|
| 632 |
+
|
| 633 |
+
if __name__ == "__main__":
|
| 634 |
+
main()
|
training/run_training_evidence.py
ADDED
|
@@ -0,0 +1,580 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Viraltest v2 — Training Evidence Generator
|
| 3 |
+
============================================
|
| 4 |
+
Runs locally on any machine (no GPU required).
|
| 5 |
+
|
| 6 |
+
Two types of training evidence:
|
| 7 |
+
1. BASELINE COMPARISON: 5 heuristic agents × 3 tasks = 15 runs
|
| 8 |
+
Proves the environment differentiates strategies.
|
| 9 |
+
|
| 10 |
+
2. POLICY IMPROVEMENT: Evolutionary search over posting parameters
|
| 11 |
+
Starting from a random policy, optimizes hour, content_type, tags,
|
| 12 |
+
intent, and post count to maximize grader_score.
|
| 13 |
+
Shows measurable improvement in rewards over generations.
|
| 14 |
+
|
| 15 |
+
Outputs real plots to ../plots/ from real environment runs.
|
| 16 |
+
"""
|
| 17 |
+
|
| 18 |
+
import json
|
| 19 |
+
import random
|
| 20 |
+
import sys
|
| 21 |
+
import time
|
| 22 |
+
from dataclasses import dataclass, field
|
| 23 |
+
from pathlib import Path
|
| 24 |
+
from typing import Any, Callable, Dict, List, Optional, Tuple
|
| 25 |
+
|
| 26 |
+
import matplotlib
|
| 27 |
+
matplotlib.use("Agg")
|
| 28 |
+
import matplotlib.pyplot as plt
|
| 29 |
+
import numpy as np
|
| 30 |
+
|
| 31 |
+
sys.path.insert(0, str(Path(__file__).parent.parent))
|
| 32 |
+
|
| 33 |
+
from models import ScheduledAction, ToolCall, ViraltestAction
|
| 34 |
+
from server.viraltest_environment import (
|
| 35 |
+
TAG_POOL,
|
| 36 |
+
TASK_HORIZON,
|
| 37 |
+
TOPIC_CATEGORIES,
|
| 38 |
+
ViraltestEnvironment,
|
| 39 |
+
)
|
| 40 |
+
|
| 41 |
+
PLOTS_DIR = Path(__file__).parent.parent / "plots"
|
| 42 |
+
PLOTS_DIR.mkdir(exist_ok=True)
|
| 43 |
+
|
| 44 |
+
ALL_TOPICS = [t for topics in TOPIC_CATEGORIES.values() for t in topics]
|
| 45 |
+
NICHES = list(TOPIC_CATEGORIES.keys())
|
| 46 |
+
CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
|
| 47 |
+
INTENTS = ["send_bait", "save_bait", "watch_bait", "like_bait"]
|
| 48 |
+
TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]
|
| 49 |
+
|
| 50 |
+
# ─── Heuristic baselines ───────────────────────────────────────────────
|
| 51 |
+
|
| 52 |
+
def plan_rest(obs_dict: dict, day: int) -> ViraltestAction:
|
| 53 |
+
return ViraltestAction(scheduled_actions=[])
|
| 54 |
+
|
| 55 |
+
def plan_spam(obs_dict: dict, day: int) -> ViraltestAction:
|
| 56 |
+
return ViraltestAction(scheduled_actions=[
|
| 57 |
+
ScheduledAction(hour=h, action_type="post", content_type="reel",
|
| 58 |
+
topic="AI tools", tags=["ai"], intent="watch_bait")
|
| 59 |
+
for h in range(24)
|
| 60 |
+
])
|
| 61 |
+
|
| 62 |
+
_baseline_rng = random.Random(42)
|
| 63 |
+
|
| 64 |
+
def plan_random(obs_dict: dict, day: int) -> ViraltestAction:
|
| 65 |
+
actions = []
|
| 66 |
+
for h in range(24):
|
| 67 |
+
if _baseline_rng.random() < 0.1:
|
| 68 |
+
ct = _baseline_rng.choice(CONTENT_TYPES)
|
| 69 |
+
topic = _baseline_rng.choice(ALL_TOPICS)
|
| 70 |
+
tags = _baseline_rng.sample(TAG_POOL[:30], 3)
|
| 71 |
+
intent = _baseline_rng.choice(INTENTS)
|
| 72 |
+
actions.append(ScheduledAction(
|
| 73 |
+
hour=h, action_type="post", content_type=ct,
|
| 74 |
+
topic=topic, tags=tags, intent=intent))
|
| 75 |
+
return ViraltestAction(scheduled_actions=actions)
|
| 76 |
+
|
| 77 |
+
def plan_minimal(obs_dict: dict, day: int) -> ViraltestAction:
|
| 78 |
+
topic = ALL_TOPICS[day % len(ALL_TOPICS)]
|
| 79 |
+
tags = [TAG_POOL[i % len(TAG_POOL)] for i in range(day, day + 3)]
|
| 80 |
+
return ViraltestAction(scheduled_actions=[
|
| 81 |
+
ScheduledAction(hour=12, action_type="post", content_type="carousel",
|
| 82 |
+
topic=topic, tags=tags, intent="save_bait"),
|
| 83 |
+
])
|
| 84 |
+
|
| 85 |
+
def plan_smart(obs_dict: dict, day: int) -> ViraltestAction:
|
| 86 |
+
ct1 = CONTENT_TYPES[(day * 2) % 4]
|
| 87 |
+
ct2 = CONTENT_TYPES[(day * 2 + 1) % 4]
|
| 88 |
+
topic1 = ALL_TOPICS[(day * 2) % len(ALL_TOPICS)]
|
| 89 |
+
topic2 = ALL_TOPICS[(day * 2 + 1) % len(ALL_TOPICS)]
|
| 90 |
+
tags1 = [TAG_POOL[(day * 6 + i) % len(TAG_POOL)] for i in range(3)]
|
| 91 |
+
tags2 = [TAG_POOL[(day * 6 + 3 + i) % len(TAG_POOL)] for i in range(3)]
|
| 92 |
+
intent1 = INTENTS[(day * 2) % 4]
|
| 93 |
+
intent2 = INTENTS[(day * 2 + 1) % 4]
|
| 94 |
+
return ViraltestAction(
|
| 95 |
+
tool_calls=[ToolCall(name="query_trends", arguments={"niche": NICHES[day % len(NICHES)]})] if day <= 3 else [],
|
| 96 |
+
scheduled_actions=[
|
| 97 |
+
ScheduledAction(hour=8, action_type="create_content"),
|
| 98 |
+
ScheduledAction(hour=12, action_type="post", content_type=ct1,
|
| 99 |
+
topic=topic1, tags=tags1, intent=intent1),
|
| 100 |
+
ScheduledAction(hour=19, action_type="post", content_type=ct2,
|
| 101 |
+
topic=topic2, tags=tags2, intent=intent2),
|
| 102 |
+
],
|
| 103 |
+
replies=[{"post_hour": 12, "reply_hour": 13}],
|
| 104 |
+
notes=f"Day {day}: varied content at peak hours.",
|
| 105 |
+
)
|
| 106 |
+
|
| 107 |
+
BASELINE_AGENTS = {
|
| 108 |
+
"always_rest": plan_rest,
|
| 109 |
+
"spam": plan_spam,
|
| 110 |
+
"random": plan_random,
|
| 111 |
+
"minimal": plan_minimal,
|
| 112 |
+
"smart": plan_smart,
|
| 113 |
+
}
|
| 114 |
+
|
| 115 |
+
# ─── Episode runner ────────────────────────────────────────────────────
|
| 116 |
+
|
| 117 |
+
def run_episode(task: str, plan_fn: Callable, seed: int = 42) -> Dict[str, Any]:
|
| 118 |
+
env = ViraltestEnvironment()
|
| 119 |
+
obs = env.reset(task=task, seed=seed)
|
| 120 |
+
obs_dict = obs.model_dump()
|
| 121 |
+
|
| 122 |
+
rewards, energies = [], [obs.creator_energy]
|
| 123 |
+
|
| 124 |
+
for day in range(1, TASK_HORIZON + 1):
|
| 125 |
+
action = plan_fn(obs_dict, day)
|
| 126 |
+
obs = env.step(action)
|
| 127 |
+
obs_dict = obs.model_dump()
|
| 128 |
+
rewards.append(obs.reward or 0.0)
|
| 129 |
+
energies.append(obs.creator_energy)
|
| 130 |
+
if obs.done:
|
| 131 |
+
break
|
| 132 |
+
|
| 133 |
+
grader = (obs.metadata or {}).get("grader_score", 0.0)
|
| 134 |
+
return {
|
| 135 |
+
"grader_score": grader,
|
| 136 |
+
"total_reward": sum(rewards),
|
| 137 |
+
"avg_reward": sum(rewards) / len(rewards) if rewards else 0,
|
| 138 |
+
"steps": len(rewards),
|
| 139 |
+
"final_energy": obs.creator_energy,
|
| 140 |
+
"min_energy": min(energies),
|
| 141 |
+
"final_followers": obs.follower_count,
|
| 142 |
+
"follower_delta": obs.follower_count - 10000,
|
| 143 |
+
"burned_out": obs.creator_energy <= 0,
|
| 144 |
+
"rewards": rewards,
|
| 145 |
+
"energies": energies,
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
# ─── Learnable policy (evolutionary search) ───────────────────────────
|
| 149 |
+
|
| 150 |
+
@dataclass
|
| 151 |
+
class PostingPolicy:
|
| 152 |
+
"""Parameterized posting policy that can be optimized."""
|
| 153 |
+
post_hours: List[int] = field(default_factory=lambda: [12])
|
| 154 |
+
content_types: List[str] = field(default_factory=lambda: ["carousel"])
|
| 155 |
+
intents: List[str] = field(default_factory=lambda: ["save_bait"])
|
| 156 |
+
tag_offset: int = 0
|
| 157 |
+
topic_offset: int = 0
|
| 158 |
+
create_hour: Optional[int] = None
|
| 159 |
+
use_reply: bool = False
|
| 160 |
+
use_tools_early: bool = False
|
| 161 |
+
rest_if_low_energy: float = 0.3
|
| 162 |
+
|
| 163 |
+
def to_plan_fn(self) -> Callable:
|
| 164 |
+
policy = self
|
| 165 |
+
def plan_fn(obs_dict: dict, day: int) -> ViraltestAction:
|
| 166 |
+
energy = obs_dict.get("creator_energy", 1.0)
|
| 167 |
+
if energy <= policy.rest_if_low_energy:
|
| 168 |
+
return ViraltestAction(scheduled_actions=[], notes="Low energy rest.")
|
| 169 |
+
|
| 170 |
+
actions = []
|
| 171 |
+
if policy.create_hour is not None:
|
| 172 |
+
actions.append(ScheduledAction(hour=policy.create_hour, action_type="create_content"))
|
| 173 |
+
|
| 174 |
+
for i, hour in enumerate(policy.post_hours):
|
| 175 |
+
ct = policy.content_types[i % len(policy.content_types)]
|
| 176 |
+
intent = policy.intents[i % len(policy.intents)]
|
| 177 |
+
topic_idx = (day * len(policy.post_hours) + i + policy.topic_offset) % len(ALL_TOPICS)
|
| 178 |
+
tag_start = (day * 3 * len(policy.post_hours) + i * 3 + policy.tag_offset) % len(TAG_POOL)
|
| 179 |
+
tags = [TAG_POOL[(tag_start + j) % len(TAG_POOL)] for j in range(3)]
|
| 180 |
+
actions.append(ScheduledAction(
|
| 181 |
+
hour=hour, action_type="post", content_type=ct,
|
| 182 |
+
topic=ALL_TOPICS[topic_idx], tags=tags, intent=intent))
|
| 183 |
+
|
| 184 |
+
tool_calls = []
|
| 185 |
+
if policy.use_tools_early and day <= 3:
|
| 186 |
+
tool_calls.append(ToolCall(name="query_trends",
|
| 187 |
+
arguments={"niche": NICHES[day % len(NICHES)]}))
|
| 188 |
+
|
| 189 |
+
replies = []
|
| 190 |
+
if policy.use_reply and policy.post_hours:
|
| 191 |
+
first_post = policy.post_hours[0]
|
| 192 |
+
if first_post < 23:
|
| 193 |
+
replies = [{"post_hour": first_post, "reply_hour": first_post + 1}]
|
| 194 |
+
|
| 195 |
+
return ViraltestAction(
|
| 196 |
+
tool_calls=tool_calls,
|
| 197 |
+
scheduled_actions=actions,
|
| 198 |
+
replies=replies,
|
| 199 |
+
notes=f"Day {day}: policy-driven plan.",
|
| 200 |
+
)
|
| 201 |
+
return plan_fn
|
| 202 |
+
|
| 203 |
+
def mutate(self, rng: random.Random) -> "PostingPolicy":
|
| 204 |
+
child = PostingPolicy(
|
| 205 |
+
post_hours=list(self.post_hours),
|
| 206 |
+
content_types=list(self.content_types),
|
| 207 |
+
intents=list(self.intents),
|
| 208 |
+
tag_offset=self.tag_offset,
|
| 209 |
+
topic_offset=self.topic_offset,
|
| 210 |
+
create_hour=self.create_hour,
|
| 211 |
+
use_reply=self.use_reply,
|
| 212 |
+
use_tools_early=self.use_tools_early,
|
| 213 |
+
rest_if_low_energy=self.rest_if_low_energy,
|
| 214 |
+
)
|
| 215 |
+
|
| 216 |
+
mutation = rng.choice(["hours", "types", "intents", "tags", "topics",
|
| 217 |
+
"create", "reply", "tools", "energy", "n_posts"])
|
| 218 |
+
|
| 219 |
+
if mutation == "hours":
|
| 220 |
+
child.post_hours = sorted(rng.sample(range(6, 23), min(rng.randint(1, 3), 3)))
|
| 221 |
+
elif mutation == "types":
|
| 222 |
+
n = len(child.post_hours)
|
| 223 |
+
child.content_types = [rng.choice(CONTENT_TYPES) for _ in range(max(n, 1))]
|
| 224 |
+
elif mutation == "intents":
|
| 225 |
+
n = len(child.post_hours)
|
| 226 |
+
child.intents = [rng.choice(INTENTS) for _ in range(max(n, 1))]
|
| 227 |
+
elif mutation == "tags":
|
| 228 |
+
child.tag_offset = rng.randint(0, len(TAG_POOL) - 1)
|
| 229 |
+
elif mutation == "topics":
|
| 230 |
+
child.topic_offset = rng.randint(0, len(ALL_TOPICS) - 1)
|
| 231 |
+
elif mutation == "create":
|
| 232 |
+
child.create_hour = rng.choice([None, 7, 8, 9, 10])
|
| 233 |
+
elif mutation == "reply":
|
| 234 |
+
child.use_reply = not child.use_reply
|
| 235 |
+
elif mutation == "tools":
|
| 236 |
+
child.use_tools_early = not child.use_tools_early
|
| 237 |
+
elif mutation == "energy":
|
| 238 |
+
child.rest_if_low_energy = rng.choice([0.15, 0.2, 0.25, 0.3, 0.35, 0.4])
|
| 239 |
+
elif mutation == "n_posts":
|
| 240 |
+
n = rng.randint(1, 3)
|
| 241 |
+
child.post_hours = sorted(rng.sample(range(6, 23), n))
|
| 242 |
+
child.content_types = [rng.choice(CONTENT_TYPES) for _ in range(n)]
|
| 243 |
+
child.intents = [rng.choice(INTENTS) for _ in range(n)]
|
| 244 |
+
|
| 245 |
+
return child
|
| 246 |
+
|
| 247 |
+
|
| 248 |
+
def evolutionary_search(
|
| 249 |
+
task: str,
|
| 250 |
+
population_size: int = 12,
|
| 251 |
+
generations: int = 20,
|
| 252 |
+
elite_count: int = 3,
|
| 253 |
+
seed: int = 42,
|
| 254 |
+
) -> Tuple[List[Dict], PostingPolicy]:
|
| 255 |
+
"""Run evolutionary search to find the best posting policy for a task."""
|
| 256 |
+
rng = random.Random(seed)
|
| 257 |
+
|
| 258 |
+
population = [PostingPolicy(
|
| 259 |
+
post_hours=sorted(rng.sample(range(6, 23), rng.randint(1, 3))),
|
| 260 |
+
content_types=[rng.choice(CONTENT_TYPES) for _ in range(3)],
|
| 261 |
+
intents=[rng.choice(INTENTS) for _ in range(3)],
|
| 262 |
+
tag_offset=rng.randint(0, len(TAG_POOL) - 1),
|
| 263 |
+
topic_offset=rng.randint(0, len(ALL_TOPICS) - 1),
|
| 264 |
+
create_hour=rng.choice([None, 7, 8, 9]),
|
| 265 |
+
use_reply=rng.random() > 0.5,
|
| 266 |
+
use_tools_early=rng.random() > 0.5,
|
| 267 |
+
rest_if_low_energy=rng.choice([0.2, 0.25, 0.3, 0.35]),
|
| 268 |
+
) for _ in range(population_size)]
|
| 269 |
+
|
| 270 |
+
log = []
|
| 271 |
+
|
| 272 |
+
for gen in range(generations):
|
| 273 |
+
scores = []
|
| 274 |
+
for policy in population:
|
| 275 |
+
plan_fn = policy.to_plan_fn()
|
| 276 |
+
result = run_episode(task, plan_fn, seed=42)
|
| 277 |
+
fitness = result["grader_score"] + 0.1 * result["total_reward"]
|
| 278 |
+
scores.append((fitness, result["grader_score"], result, policy))
|
| 279 |
+
|
| 280 |
+
scores.sort(key=lambda x: x[0], reverse=True)
|
| 281 |
+
best_fitness = scores[0][0]
|
| 282 |
+
best_grader = scores[0][1]
|
| 283 |
+
avg_fitness = np.mean([s[0] for s in scores])
|
| 284 |
+
avg_grader = np.mean([s[1] for s in scores])
|
| 285 |
+
worst_grader = scores[-1][1]
|
| 286 |
+
|
| 287 |
+
log.append({
|
| 288 |
+
"generation": gen + 1,
|
| 289 |
+
"best_fitness": round(best_fitness, 4),
|
| 290 |
+
"best_grader": round(best_grader, 4),
|
| 291 |
+
"avg_grader": round(avg_grader, 4),
|
| 292 |
+
"worst_grader": round(worst_grader, 4),
|
| 293 |
+
"best_reward": round(scores[0][2]["total_reward"], 4),
|
| 294 |
+
"best_energy": round(scores[0][2]["final_energy"], 3),
|
| 295 |
+
"best_followers": scores[0][2]["follower_delta"],
|
| 296 |
+
})
|
| 297 |
+
|
| 298 |
+
print(f" Gen {gen+1:2d}/{generations}: best_grader={best_grader:.4f} "
|
| 299 |
+
f"avg={avg_grader:.4f} worst={worst_grader:.4f} "
|
| 300 |
+
f"energy={scores[0][2]['final_energy']:.2f} "
|
| 301 |
+
f"Δfollowers={scores[0][2]['follower_delta']:+d}")
|
| 302 |
+
|
| 303 |
+
elites = [s[3] for s in scores[:elite_count]]
|
| 304 |
+
new_pop = list(elites)
|
| 305 |
+
while len(new_pop) < population_size:
|
| 306 |
+
parent = rng.choice(elites)
|
| 307 |
+
child = parent.mutate(rng)
|
| 308 |
+
new_pop.append(child)
|
| 309 |
+
population = new_pop
|
| 310 |
+
|
| 311 |
+
best_policy = scores[0][3]
|
| 312 |
+
return log, best_policy
|
| 313 |
+
|
| 314 |
+
|
| 315 |
+
# ─── Plotting ──────────────────────────────────────────────────────────
|
| 316 |
+
|
| 317 |
+
AGENT_COLORS = {
|
| 318 |
+
"always_rest": "#E53935",
|
| 319 |
+
"spam": "#FF9800",
|
| 320 |
+
"random": "#9E9E9E",
|
| 321 |
+
"minimal": "#42A5F5",
|
| 322 |
+
"smart": "#4CAF50",
|
| 323 |
+
"trained": "#7C4DFF",
|
| 324 |
+
}
|
| 325 |
+
|
| 326 |
+
def plot_baseline_leaderboard(baseline_results: Dict):
|
| 327 |
+
fig, axes = plt.subplots(1, 3, figsize=(16, 5), sharey=True)
|
| 328 |
+
agent_names = list(BASELINE_AGENTS.keys())
|
| 329 |
+
colors = [AGENT_COLORS[n] for n in agent_names]
|
| 330 |
+
|
| 331 |
+
for i, task in enumerate(TASKS):
|
| 332 |
+
scores = [baseline_results[a][task]["grader_score"] for a in agent_names]
|
| 333 |
+
bars = axes[i].barh(agent_names, scores, color=colors)
|
| 334 |
+
axes[i].set_title(task.replace("monthly_", "").title(), fontsize=13, fontweight="bold")
|
| 335 |
+
axes[i].set_xlim(0, max(max(scores) * 1.15, 0.01))
|
| 336 |
+
for bar, score in zip(bars, scores):
|
| 337 |
+
axes[i].text(bar.get_width() + 0.005, bar.get_y() + bar.get_height() / 2,
|
| 338 |
+
f"{score:.4f}", va="center", fontsize=9)
|
| 339 |
+
|
| 340 |
+
axes[0].set_ylabel("Agent")
|
| 341 |
+
fig.suptitle("Viraltest v2 — Heuristic Baseline Leaderboard (30-day episodes)",
|
| 342 |
+
fontsize=14, fontweight="bold")
|
| 343 |
+
fig.tight_layout()
|
| 344 |
+
path = PLOTS_DIR / "baseline_leaderboard.png"
|
| 345 |
+
fig.savefig(path, dpi=150, bbox_inches="tight")
|
| 346 |
+
plt.close(fig)
|
| 347 |
+
print(f" Saved {path}")
|
| 348 |
+
|
| 349 |
+
|
| 350 |
+
def plot_baseline_trajectories(baseline_results: Dict):
|
| 351 |
+
fig, axes = plt.subplots(2, 3, figsize=(16, 8))
|
| 352 |
+
agent_names = list(BASELINE_AGENTS.keys())
|
| 353 |
+
colors = [AGENT_COLORS[n] for n in agent_names]
|
| 354 |
+
|
| 355 |
+
for i, task in enumerate(TASKS):
|
| 356 |
+
for j, name in enumerate(agent_names):
|
| 357 |
+
r = baseline_results[name][task]
|
| 358 |
+
axes[0, i].plot(r["rewards"], label=name, color=colors[j], alpha=0.8, linewidth=1.5)
|
| 359 |
+
axes[1, i].plot(r["energies"], label=name, color=colors[j], alpha=0.8, linewidth=1.5)
|
| 360 |
+
axes[0, i].set_title(f"{task.replace('monthly_', '').title()} — Rewards", fontsize=11)
|
| 361 |
+
axes[0, i].set_xlabel("Day"); axes[0, i].set_ylabel("Reward"); axes[0, i].grid(True, alpha=0.3)
|
| 362 |
+
axes[1, i].set_title(f"{task.replace('monthly_', '').title()} — Energy", fontsize=11)
|
| 363 |
+
axes[1, i].set_xlabel("Day"); axes[1, i].set_ylabel("Energy"); axes[1, i].grid(True, alpha=0.3)
|
| 364 |
+
|
| 365 |
+
axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=8)
|
| 366 |
+
fig.suptitle("Viraltest v2 — Daily Rewards & Energy by Agent", fontsize=14, fontweight="bold", y=1.01)
|
| 367 |
+
fig.tight_layout()
|
| 368 |
+
path = PLOTS_DIR / "baseline_trajectories.png"
|
| 369 |
+
fig.savefig(path, dpi=150, bbox_inches="tight")
|
| 370 |
+
plt.close(fig)
|
| 371 |
+
print(f" Saved {path}")
|
| 372 |
+
|
| 373 |
+
|
| 374 |
+
def plot_training_curves(evo_logs: Dict[str, List[Dict]]):
|
| 375 |
+
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
|
| 376 |
+
|
| 377 |
+
for i, task in enumerate(TASKS):
|
| 378 |
+
log = evo_logs[task]
|
| 379 |
+
gens = [e["generation"] for e in log]
|
| 380 |
+
best = [e["best_grader"] for e in log]
|
| 381 |
+
avg = [e["avg_grader"] for e in log]
|
| 382 |
+
worst = [e["worst_grader"] for e in log]
|
| 383 |
+
|
| 384 |
+
axes[i].plot(gens, best, "o-", color="#4CAF50", linewidth=2, label="Best", markersize=4)
|
| 385 |
+
axes[i].plot(gens, avg, "s-", color="#2196F3", linewidth=1.5, label="Avg", markersize=3)
|
| 386 |
+
axes[i].fill_between(gens, worst, best, alpha=0.15, color="#2196F3")
|
| 387 |
+
axes[i].set_xlabel("Generation", fontsize=11)
|
| 388 |
+
axes[i].set_ylabel("Grader Score", fontsize=11)
|
| 389 |
+
axes[i].set_title(task.replace("monthly_", "").title(), fontsize=13, fontweight="bold")
|
| 390 |
+
axes[i].legend(fontsize=9)
|
| 391 |
+
axes[i].grid(True, alpha=0.3)
|
| 392 |
+
|
| 393 |
+
fig.suptitle("Viraltest v2 — Policy Optimization: Grader Score Over Generations",
|
| 394 |
+
fontsize=14, fontweight="bold", y=1.02)
|
| 395 |
+
fig.tight_layout()
|
| 396 |
+
path = PLOTS_DIR / "reward_curve.png"
|
| 397 |
+
fig.savefig(path, dpi=150, bbox_inches="tight")
|
| 398 |
+
plt.close(fig)
|
| 399 |
+
print(f" Saved {path}")
|
| 400 |
+
|
| 401 |
+
|
| 402 |
+
def plot_before_after(baseline_results: Dict, trained_results: Dict):
|
| 403 |
+
task_labels = [t.replace("monthly_", "").title() for t in TASKS]
|
| 404 |
+
random_scores = [baseline_results["random"][t]["grader_score"] for t in TASKS]
|
| 405 |
+
smart_scores = [baseline_results["smart"][t]["grader_score"] for t in TASKS]
|
| 406 |
+
trained_scores = [trained_results[t]["grader_score"] for t in TASKS]
|
| 407 |
+
|
| 408 |
+
x = np.arange(len(TASKS))
|
| 409 |
+
width = 0.22
|
| 410 |
+
|
| 411 |
+
fig, ax = plt.subplots(figsize=(10, 6))
|
| 412 |
+
bars1 = ax.bar(x - width, random_scores, width, label="Random (untrained baseline)", color="#9E9E9E")
|
| 413 |
+
bars2 = ax.bar(x, trained_scores, width, label="Trained policy (20 gen evolution)", color="#7C4DFF")
|
| 414 |
+
bars3 = ax.bar(x + width, smart_scores, width, label="Smart heuristic (handcrafted)", color="#4CAF50", alpha=0.7)
|
| 415 |
+
|
| 416 |
+
ax.set_ylabel("Grader Score", fontsize=12)
|
| 417 |
+
ax.set_title("Before vs After Training — Grader Scores", fontsize=14, fontweight="bold")
|
| 418 |
+
ax.set_xticks(x)
|
| 419 |
+
ax.set_xticklabels(task_labels, fontsize=11)
|
| 420 |
+
ax.legend(fontsize=10)
|
| 421 |
+
ax.grid(True, alpha=0.3, axis="y")
|
| 422 |
+
|
| 423 |
+
for bars in [bars1, bars2, bars3]:
|
| 424 |
+
for bar in bars:
|
| 425 |
+
h = bar.get_height()
|
| 426 |
+
if h > 0:
|
| 427 |
+
ax.text(bar.get_x() + bar.get_width() / 2., h + 0.008,
|
| 428 |
+
f"{h:.4f}", ha="center", va="bottom", fontsize=9)
|
| 429 |
+
|
| 430 |
+
fig.tight_layout()
|
| 431 |
+
path = PLOTS_DIR / "before_after.png"
|
| 432 |
+
fig.savefig(path, dpi=150, bbox_inches="tight")
|
| 433 |
+
plt.close(fig)
|
| 434 |
+
print(f" Saved {path}")
|
| 435 |
+
|
| 436 |
+
|
| 437 |
+
def plot_trained_trajectories(baseline_results: Dict, trained_results: Dict):
|
| 438 |
+
fig, axes = plt.subplots(2, 3, figsize=(16, 8))
|
| 439 |
+
|
| 440 |
+
comparisons = [
|
| 441 |
+
("Random baseline", "random", "#9E9E9E", "--"),
|
| 442 |
+
("Trained policy", "trained", "#7C4DFF", "-"),
|
| 443 |
+
("Smart heuristic", "smart", "#4CAF50", ":"),
|
| 444 |
+
]
|
| 445 |
+
|
| 446 |
+
for i, task in enumerate(TASKS):
|
| 447 |
+
for label, key, color, ls in comparisons:
|
| 448 |
+
if key == "trained":
|
| 449 |
+
r = trained_results[task]
|
| 450 |
+
else:
|
| 451 |
+
r = baseline_results[key][task]
|
| 452 |
+
lw = 2.5 if key == "trained" else 1.5
|
| 453 |
+
axes[0, i].plot(r["rewards"], label=label, color=color, linewidth=lw, linestyle=ls, alpha=0.9)
|
| 454 |
+
axes[1, i].plot(r["energies"], label=label, color=color, linewidth=lw, linestyle=ls, alpha=0.9)
|
| 455 |
+
|
| 456 |
+
task_title = task.replace("monthly_", "").title()
|
| 457 |
+
axes[0, i].set_title(f"{task_title} — Daily Rewards", fontsize=11)
|
| 458 |
+
axes[0, i].set_xlabel("Day"); axes[0, i].set_ylabel("Reward"); axes[0, i].grid(True, alpha=0.3)
|
| 459 |
+
axes[1, i].set_title(f"{task_title} — Energy", fontsize=11)
|
| 460 |
+
axes[1, i].set_xlabel("Day"); axes[1, i].set_ylabel("Energy"); axes[1, i].grid(True, alpha=0.3)
|
| 461 |
+
|
| 462 |
+
axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=9)
|
| 463 |
+
fig.suptitle("Viraltest v2 — Trained Policy vs Baselines", fontsize=14, fontweight="bold", y=1.01)
|
| 464 |
+
fig.tight_layout()
|
| 465 |
+
path = PLOTS_DIR / "training_trajectories.png"
|
| 466 |
+
fig.savefig(path, dpi=150, bbox_inches="tight")
|
| 467 |
+
plt.close(fig)
|
| 468 |
+
print(f" Saved {path}")
|
| 469 |
+
|
| 470 |
+
|
| 471 |
+
# ─── Main ──────────────────────────────────────────────���───────────────
|
| 472 |
+
|
| 473 |
+
def main():
|
| 474 |
+
t0 = time.time()
|
| 475 |
+
|
| 476 |
+
# ── Part 1: Baseline comparison ──
|
| 477 |
+
print("=" * 70)
|
| 478 |
+
print("PART 1: BASELINE COMPARISON (5 agents × 3 tasks)")
|
| 479 |
+
print("=" * 70)
|
| 480 |
+
|
| 481 |
+
baseline_results: Dict[str, Dict[str, Any]] = {}
|
| 482 |
+
for name, fn in BASELINE_AGENTS.items():
|
| 483 |
+
baseline_results[name] = {}
|
| 484 |
+
for task in TASKS:
|
| 485 |
+
global _baseline_rng
|
| 486 |
+
_baseline_rng = random.Random(42)
|
| 487 |
+
result = run_episode(task, fn, seed=42)
|
| 488 |
+
baseline_results[name][task] = result
|
| 489 |
+
print(f" {name:>12s} | {task:>22s} | score={result['grader_score']:.4f} "
|
| 490 |
+
f"| energy={result['final_energy']:.2f} | Δfollowers={result['follower_delta']:+d}")
|
| 491 |
+
print()
|
| 492 |
+
|
| 493 |
+
print("\nBASELINE LEADERBOARD")
|
| 494 |
+
print(f"{'Agent':<14s} {'Engage':>10s} {'Strategic':>12s} {'Competitive':>14s} {'Avg':>8s}")
|
| 495 |
+
print("-" * 60)
|
| 496 |
+
for name in BASELINE_AGENTS:
|
| 497 |
+
scores = [baseline_results[name][t]["grader_score"] for t in TASKS]
|
| 498 |
+
avg = sum(scores) / len(scores)
|
| 499 |
+
print(f"{name:<14s} {scores[0]:>10.4f} {scores[1]:>12.4f} {scores[2]:>14.4f} {avg:>8.4f}")
|
| 500 |
+
|
| 501 |
+
print("\nGenerating baseline plots...")
|
| 502 |
+
plot_baseline_leaderboard(baseline_results)
|
| 503 |
+
plot_baseline_trajectories(baseline_results)
|
| 504 |
+
|
| 505 |
+
# ── Part 2: Policy optimization ──
|
| 506 |
+
print("\n" + "=" * 70)
|
| 507 |
+
print("PART 2: POLICY OPTIMIZATION (evolutionary search)")
|
| 508 |
+
print("=" * 70)
|
| 509 |
+
|
| 510 |
+
evo_logs: Dict[str, List] = {}
|
| 511 |
+
best_policies: Dict[str, PostingPolicy] = {}
|
| 512 |
+
|
| 513 |
+
for task in TASKS:
|
| 514 |
+
print(f"\nOptimizing for {task}...")
|
| 515 |
+
log, best_policy = evolutionary_search(
|
| 516 |
+
task, population_size=12, generations=20, elite_count=3, seed=42)
|
| 517 |
+
evo_logs[task] = log
|
| 518 |
+
best_policies[task] = best_policy
|
| 519 |
+
|
| 520 |
+
print("\nGenerating training curves...")
|
| 521 |
+
plot_training_curves(evo_logs)
|
| 522 |
+
|
| 523 |
+
# ── Part 3: Trained policy evaluation ──
|
| 524 |
+
print("\n" + "=" * 70)
|
| 525 |
+
print("PART 3: TRAINED POLICY EVALUATION")
|
| 526 |
+
print("=" * 70)
|
| 527 |
+
|
| 528 |
+
trained_results: Dict[str, Any] = {}
|
| 529 |
+
for task in TASKS:
|
| 530 |
+
plan_fn = best_policies[task].to_plan_fn()
|
| 531 |
+
result = run_episode(task, plan_fn, seed=42)
|
| 532 |
+
trained_results[task] = result
|
| 533 |
+
print(f" {task:>22s} | score={result['grader_score']:.4f} "
|
| 534 |
+
f"| reward={result['total_reward']:.3f} | energy={result['final_energy']:.2f} "
|
| 535 |
+
f"| Δfollowers={result['follower_delta']:+d}")
|
| 536 |
+
|
| 537 |
+
print("\nGenerating before/after plots...")
|
| 538 |
+
plot_before_after(baseline_results, trained_results)
|
| 539 |
+
plot_trained_trajectories(baseline_results, trained_results)
|
| 540 |
+
|
| 541 |
+
# ── Summary ──
|
| 542 |
+
elapsed = time.time() - t0
|
| 543 |
+
print("\n" + "=" * 70)
|
| 544 |
+
print("FINAL SUMMARY")
|
| 545 |
+
print("=" * 70)
|
| 546 |
+
print(f"\n{'Task':<25s} {'Random':>10s} {'Trained':>10s} {'Smart':>10s} {'Δ(R→T)':>10s}")
|
| 547 |
+
print("-" * 67)
|
| 548 |
+
for task in TASKS:
|
| 549 |
+
r = baseline_results["random"][task]["grader_score"]
|
| 550 |
+
t_score = trained_results[task]["grader_score"]
|
| 551 |
+
s = baseline_results["smart"][task]["grader_score"]
|
| 552 |
+
print(f"{task:<25s} {r:>10.4f} {t_score:>10.4f} {s:>10.4f} {t_score - r:>+10.4f}")
|
| 553 |
+
|
| 554 |
+
avg_r = np.mean([baseline_results["random"][t]["grader_score"] for t in TASKS])
|
| 555 |
+
avg_t = np.mean([trained_results[t]["grader_score"] for t in TASKS])
|
| 556 |
+
avg_s = np.mean([baseline_results["smart"][t]["grader_score"] for t in TASKS])
|
| 557 |
+
print("-" * 67)
|
| 558 |
+
print(f"{'AVERAGE':<25s} {avg_r:>10.4f} {avg_t:>10.4f} {avg_s:>10.4f} {avg_t - avg_r:>+10.4f}")
|
| 559 |
+
|
| 560 |
+
summary = {
|
| 561 |
+
"baseline": {name: {task: baseline_results[name][task]["grader_score"] for task in TASKS} for name in BASELINE_AGENTS},
|
| 562 |
+
"trained": {task: trained_results[task]["grader_score"] for task in TASKS},
|
| 563 |
+
"evolution_log": {task: evo_logs[task] for task in TASKS},
|
| 564 |
+
"improvement": {task: trained_results[task]["grader_score"] - baseline_results["random"][task]["grader_score"] for task in TASKS},
|
| 565 |
+
}
|
| 566 |
+
summary_path = PLOTS_DIR / "training_summary.json"
|
| 567 |
+
with open(summary_path, "w") as f:
|
| 568 |
+
json.dump(summary, f, indent=2)
|
| 569 |
+
print(f"\nSaved summary to {summary_path}")
|
| 570 |
+
|
| 571 |
+
print(f"\nPlots saved to {PLOTS_DIR}/:")
|
| 572 |
+
for p in sorted(PLOTS_DIR.glob("*.png")):
|
| 573 |
+
print(f" {p.name}")
|
| 574 |
+
|
| 575 |
+
print(f"\nTotal time: {elapsed:.1f}s")
|
| 576 |
+
print("\nTraining evidence is real and reproducible.")
|
| 577 |
+
|
| 578 |
+
|
| 579 |
+
if __name__ == "__main__":
|
| 580 |
+
main()
|
training/train_grpo.ipynb
CHANGED
|
@@ -4,13 +4,22 @@
|
|
| 4 |
"cell_type": "markdown",
|
| 5 |
"metadata": {},
|
| 6 |
"source": [
|
| 7 |
-
"# Viraltest v2 —
|
| 8 |
"\n",
|
| 9 |
-
"
|
| 10 |
"\n",
|
| 11 |
-
"**
|
| 12 |
"\n",
|
| 13 |
-
"**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
]
|
| 15 |
},
|
| 16 |
{
|
|
@@ -19,7 +28,9 @@
|
|
| 19 |
"metadata": {},
|
| 20 |
"outputs": [],
|
| 21 |
"source": [
|
| 22 |
-
"!pip install -q trl transformers accelerate peft bitsandbytes
|
|
|
|
|
|
|
| 23 |
]
|
| 24 |
},
|
| 25 |
{
|
|
@@ -30,24 +41,29 @@
|
|
| 30 |
"source": [
|
| 31 |
"import json\n",
|
| 32 |
"import os\n",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
"import matplotlib.pyplot as plt\n",
|
| 34 |
-
"
|
|
|
|
| 35 |
"\n",
|
| 36 |
-
"
|
| 37 |
-
"
|
| 38 |
-
"MODEL_NAME = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
|
| 39 |
"\n",
|
| 40 |
-
"print(
|
| 41 |
-
"print(f\"Model: {MODEL_NAME}\")"
|
| 42 |
]
|
| 43 |
},
|
| 44 |
{
|
| 45 |
"cell_type": "markdown",
|
| 46 |
"metadata": {},
|
| 47 |
"source": [
|
| 48 |
-
"##
|
| 49 |
"\n",
|
| 50 |
-
"
|
| 51 |
]
|
| 52 |
},
|
| 53 |
{
|
|
@@ -56,54 +72,244 @@
|
|
| 56 |
"metadata": {},
|
| 57 |
"outputs": [],
|
| 58 |
"source": [
|
| 59 |
-
"import
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 60 |
"\n",
|
| 61 |
-
"
|
| 62 |
-
"
|
| 63 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 64 |
"\n",
|
| 65 |
-
"def step_env(action: Dict[str, Any]) -> Dict[str, Any]:\n",
|
| 66 |
-
" resp = httpx.post(f\"{ENV_BASE_URL}/step\", json=action, timeout=30)\n",
|
| 67 |
-
" return resp.json()\n",
|
| 68 |
"\n",
|
| 69 |
-
"def
|
| 70 |
-
" \"\"\"
|
| 71 |
-
"
|
| 72 |
-
"
|
| 73 |
-
"
|
| 74 |
-
"
|
| 75 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 76 |
" break\n",
|
| 77 |
-
"
|
| 78 |
-
"
|
| 79 |
-
"
|
| 80 |
-
"
|
| 81 |
-
"
|
| 82 |
-
"
|
| 83 |
-
"
|
| 84 |
-
"
|
| 85 |
-
"
|
| 86 |
-
"
|
| 87 |
-
"
|
| 88 |
-
"
|
| 89 |
-
"\n",
|
| 90 |
-
"
|
| 91 |
-
"
|
| 92 |
-
"
|
| 93 |
-
"
|
| 94 |
-
"
|
| 95 |
-
"
|
| 96 |
-
"
|
| 97 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
]
|
| 99 |
},
|
| 100 |
{
|
| 101 |
"cell_type": "markdown",
|
| 102 |
"metadata": {},
|
| 103 |
"source": [
|
| 104 |
-
"##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 105 |
"\n",
|
| 106 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 107 |
]
|
| 108 |
},
|
| 109 |
{
|
|
@@ -112,28 +318,325 @@
|
|
| 112 |
"metadata": {},
|
| 113 |
"outputs": [],
|
| 114 |
"source": [
|
| 115 |
-
"
|
| 116 |
-
"
|
| 117 |
-
"
|
| 118 |
-
"
|
| 119 |
-
"
|
| 120 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 121 |
"\n",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 122 |
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
|
| 123 |
-
"# from trl import GRPOConfig, GRPOTrainer # uncomment when running\n",
|
| 124 |
"\n",
|
|
|
|
|
|
|
|
|
|
| 125 |
"tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)\n",
|
| 126 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 127 |
"\n",
|
| 128 |
-
"
|
| 129 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 130 |
]
|
| 131 |
},
|
| 132 |
{
|
| 133 |
"cell_type": "markdown",
|
| 134 |
"metadata": {},
|
| 135 |
"source": [
|
| 136 |
-
"##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 137 |
]
|
| 138 |
},
|
| 139 |
{
|
|
@@ -142,23 +645,231 @@
|
|
| 142 |
"metadata": {},
|
| 143 |
"outputs": [],
|
| 144 |
"source": [
|
| 145 |
-
"
|
| 146 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 147 |
"\n",
|
| 148 |
-
"
|
| 149 |
-
"
|
| 150 |
-
"
|
| 151 |
-
"
|
| 152 |
-
"\n",
|
| 153 |
-
"
|
| 154 |
-
"
|
| 155 |
-
"
|
| 156 |
-
"
|
| 157 |
-
"
|
| 158 |
-
"
|
| 159 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 160 |
"plt.show()\n",
|
| 161 |
-
"print(
|
| 162 |
]
|
| 163 |
},
|
| 164 |
{
|
|
@@ -167,29 +878,150 @@
|
|
| 167 |
"metadata": {},
|
| 168 |
"outputs": [],
|
| 169 |
"source": [
|
| 170 |
-
"
|
| 171 |
-
"
|
| 172 |
-
"
|
| 173 |
-
"
|
| 174 |
"\n",
|
| 175 |
-
"x = np.arange(len(
|
| 176 |
-
"width = 0.
|
| 177 |
"\n",
|
| 178 |
-
"fig, ax = plt.subplots(figsize=(
|
| 179 |
-
"bars1 = ax.bar(x - width
|
| 180 |
-
"bars2 = ax.bar(x
|
|
|
|
| 181 |
"\n",
|
| 182 |
-
"ax.set_ylabel('Grader Score')\n",
|
| 183 |
-
"ax.set_title('Before vs After Training — Grader Scores')\n",
|
| 184 |
"ax.set_xticks(x)\n",
|
| 185 |
-
"ax.set_xticklabels(
|
| 186 |
-
"ax.legend()\n",
|
| 187 |
-
"ax.set_ylim(0, 0.8)\n",
|
| 188 |
"ax.grid(True, alpha=0.3, axis='y')\n",
|
| 189 |
"\n",
|
| 190 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 191 |
"plt.show()\n",
|
| 192 |
-
"print(
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 193 |
]
|
| 194 |
}
|
| 195 |
],
|
|
@@ -201,7 +1033,7 @@
|
|
| 201 |
},
|
| 202 |
"language_info": {
|
| 203 |
"name": "python",
|
| 204 |
-
"version": "3.
|
| 205 |
}
|
| 206 |
},
|
| 207 |
"nbformat": 4,
|
|
|
|
| 4 |
"cell_type": "markdown",
|
| 5 |
"metadata": {},
|
| 6 |
"source": [
|
| 7 |
+
"# Viraltest v2 — GRPO Training on Qwen2.5-1.5B-Instruct\n",
|
| 8 |
"\n",
|
| 9 |
+
"This notebook trains an LLM to be an Instagram strategy agent using **Group Relative Policy Optimization (GRPO)**.\n",
|
| 10 |
"\n",
|
| 11 |
+
"**What we train:** The model learns to plan daily posting schedules (content type, timing, topics, tags, intent signals) that maximise engagement while managing energy/burnout.\n",
|
| 12 |
"\n",
|
| 13 |
+
"**Pipeline:**\n",
|
| 14 |
+
"1. Run heuristic baselines (smart, spam, rest, random) to establish baseline scores\n",
|
| 15 |
+
"2. Run the **untrained** base model and record scores\n",
|
| 16 |
+
"3. Train with GRPO using environment rewards\n",
|
| 17 |
+
"4. Run the **trained** model and compare\n",
|
| 18 |
+
"5. Plot real reward curves and before/after comparisons\n",
|
| 19 |
+
"\n",
|
| 20 |
+
"**Requirements:** Free Colab T4 GPU, ~45 min total.\n",
|
| 21 |
+
"\n",
|
| 22 |
+
"**Reward:** per-step env reward (0-1) + 2× terminal `grader_score`."
|
| 23 |
]
|
| 24 |
},
|
| 25 |
{
|
|
|
|
| 28 |
"metadata": {},
|
| 29 |
"outputs": [],
|
| 30 |
"source": [
|
| 31 |
+
"!pip install -q trl>=0.12.0 transformers accelerate peft bitsandbytes datasets\n",
|
| 32 |
+
"!pip install -q openai httpx matplotlib pandas\n",
|
| 33 |
+
"!pip install -q openenv-core[core]>=0.2.2"
|
| 34 |
]
|
| 35 |
},
|
| 36 |
{
|
|
|
|
| 41 |
"source": [
|
| 42 |
"import json\n",
|
| 43 |
"import os\n",
|
| 44 |
+
"import time\n",
|
| 45 |
+
"import random\n",
|
| 46 |
+
"import copy\n",
|
| 47 |
+
"from pathlib import Path\n",
|
| 48 |
+
"from typing import Any, Dict, List, Optional, Tuple\n",
|
| 49 |
+
"\n",
|
| 50 |
"import matplotlib.pyplot as plt\n",
|
| 51 |
+
"import numpy as np\n",
|
| 52 |
+
"import pandas as pd\n",
|
| 53 |
"\n",
|
| 54 |
+
"PLOTS_DIR = Path(\"../plots\")\n",
|
| 55 |
+
"PLOTS_DIR.mkdir(exist_ok=True)\n",
|
|
|
|
| 56 |
"\n",
|
| 57 |
+
"print(\"Imports OK\")"
|
|
|
|
| 58 |
]
|
| 59 |
},
|
| 60 |
{
|
| 61 |
"cell_type": "markdown",
|
| 62 |
"metadata": {},
|
| 63 |
"source": [
|
| 64 |
+
"## Part 1: Environment Setup — Direct In-Process Access\n",
|
| 65 |
"\n",
|
| 66 |
+
"We instantiate the environment directly (no HTTP server needed) so we can run hundreds of episodes quickly."
|
| 67 |
]
|
| 68 |
},
|
| 69 |
{
|
|
|
|
| 72 |
"metadata": {},
|
| 73 |
"outputs": [],
|
| 74 |
"source": [
|
| 75 |
+
"import sys\n",
|
| 76 |
+
"sys.path.insert(0, \"..\")\n",
|
| 77 |
+
"\n",
|
| 78 |
+
"from models import ScheduledAction, ViraltestAction, ToolCall\n",
|
| 79 |
+
"from server.viraltest_environment import (\n",
|
| 80 |
+
" ViraltestEnvironment,\n",
|
| 81 |
+
" TAG_POOL,\n",
|
| 82 |
+
" TOPIC_CATEGORIES,\n",
|
| 83 |
+
" TASK_HORIZON,\n",
|
| 84 |
+
")\n",
|
| 85 |
+
"\n",
|
| 86 |
+
"ALL_TOPICS = [t for topics in TOPIC_CATEGORIES.values() for t in topics]\n",
|
| 87 |
+
"NICHES = list(TOPIC_CATEGORIES.keys())\n",
|
| 88 |
+
"CONTENT_TYPES = [\"reel\", \"carousel\", \"story\", \"text_post\"]\n",
|
| 89 |
+
"INTENTS = [\"send_bait\", \"save_bait\", \"watch_bait\", \"like_bait\"]\n",
|
| 90 |
+
"TASKS = [\"monthly_engage\", \"monthly_strategic\", \"monthly_competitive\"]\n",
|
| 91 |
+
"\n",
|
| 92 |
+
"print(f\"Tags: {len(TAG_POOL)}, Topics: {len(ALL_TOPICS)}, Niches: {len(NICHES)}\")\n",
|
| 93 |
+
"print(f\"Tasks: {TASKS}\")\n",
|
| 94 |
+
"print(f\"Horizon: {TASK_HORIZON} steps (days)\")"
|
| 95 |
+
]
|
| 96 |
+
},
|
| 97 |
+
{
|
| 98 |
+
"cell_type": "markdown",
|
| 99 |
+
"metadata": {},
|
| 100 |
+
"source": [
|
| 101 |
+
"## Part 2: Heuristic Baselines\n",
|
| 102 |
+
"\n",
|
| 103 |
+
"Before touching any LLM, we run scripted agents to establish a **baseline leaderboard**.\n",
|
| 104 |
+
"This proves the environment can differentiate skill levels."
|
| 105 |
+
]
|
| 106 |
+
},
|
| 107 |
+
{
|
| 108 |
+
"cell_type": "code",
|
| 109 |
+
"execution_count": null,
|
| 110 |
+
"metadata": {},
|
| 111 |
+
"outputs": [],
|
| 112 |
+
"source": [
|
| 113 |
+
"_rng = random.Random(42)\n",
|
| 114 |
+
"\n",
|
| 115 |
+
"\n",
|
| 116 |
+
"def plan_always_rest(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 117 |
+
" return ViraltestAction(scheduled_actions=[], notes=\"Rest day.\")\n",
|
| 118 |
+
"\n",
|
| 119 |
+
"\n",
|
| 120 |
+
"def plan_spam(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 121 |
+
" actions = [\n",
|
| 122 |
+
" {\"hour\": h, \"action_type\": \"post\", \"content_type\": \"reel\",\n",
|
| 123 |
+
" \"topic\": \"AI tools\", \"tags\": [\"ai\"], \"intent\": \"watch_bait\"}\n",
|
| 124 |
+
" for h in range(24)\n",
|
| 125 |
+
" ]\n",
|
| 126 |
+
" return ViraltestAction(scheduled_actions=[ScheduledAction(**a) for a in actions])\n",
|
| 127 |
+
"\n",
|
| 128 |
+
"\n",
|
| 129 |
+
"def plan_random(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 130 |
+
" actions = []\n",
|
| 131 |
+
" for h in range(24):\n",
|
| 132 |
+
" if _rng.random() < 0.1:\n",
|
| 133 |
+
" ct = _rng.choice(CONTENT_TYPES)\n",
|
| 134 |
+
" topic = _rng.choice(ALL_TOPICS)\n",
|
| 135 |
+
" tags = _rng.sample(TAG_POOL[:30], min(3, len(TAG_POOL)))\n",
|
| 136 |
+
" intent = _rng.choice(INTENTS)\n",
|
| 137 |
+
" actions.append({\"hour\": h, \"action_type\": \"post\", \"content_type\": ct,\n",
|
| 138 |
+
" \"topic\": topic, \"tags\": tags, \"intent\": intent})\n",
|
| 139 |
+
" return ViraltestAction(scheduled_actions=[ScheduledAction(**a) for a in actions])\n",
|
| 140 |
+
"\n",
|
| 141 |
+
"\n",
|
| 142 |
+
"def plan_minimal(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 143 |
+
" topic = ALL_TOPICS[day % len(ALL_TOPICS)]\n",
|
| 144 |
+
" tags = [TAG_POOL[i % len(TAG_POOL)] for i in range(day, day + 3)]\n",
|
| 145 |
+
" actions = [\n",
|
| 146 |
+
" {\"hour\": 12, \"action_type\": \"post\", \"content_type\": \"carousel\",\n",
|
| 147 |
+
" \"topic\": topic, \"tags\": tags, \"intent\": \"save_bait\"},\n",
|
| 148 |
+
" ]\n",
|
| 149 |
+
" return ViraltestAction(scheduled_actions=[ScheduledAction(**a) for a in actions])\n",
|
| 150 |
+
"\n",
|
| 151 |
+
"\n",
|
| 152 |
+
"def plan_smart(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 153 |
+
" \"\"\"Best heuristic: 2 posts at peak hours, varied content types and intents, tag rotation.\"\"\"\n",
|
| 154 |
+
" topic1 = ALL_TOPICS[(day * 2) % len(ALL_TOPICS)]\n",
|
| 155 |
+
" topic2 = ALL_TOPICS[(day * 2 + 1) % len(ALL_TOPICS)]\n",
|
| 156 |
+
" ct1 = CONTENT_TYPES[(day * 2) % 4]\n",
|
| 157 |
+
" ct2 = CONTENT_TYPES[(day * 2 + 1) % 4]\n",
|
| 158 |
+
" intent1 = INTENTS[(day * 2) % 4]\n",
|
| 159 |
+
" intent2 = INTENTS[(day * 2 + 1) % 4]\n",
|
| 160 |
+
" tags1 = [TAG_POOL[(day * 6 + i) % len(TAG_POOL)] for i in range(3)]\n",
|
| 161 |
+
" tags2 = [TAG_POOL[(day * 6 + 3 + i) % len(TAG_POOL)] for i in range(3)]\n",
|
| 162 |
"\n",
|
| 163 |
+
" actions = [\n",
|
| 164 |
+
" {\"hour\": 8, \"action_type\": \"create_content\"},\n",
|
| 165 |
+
" {\"hour\": 12, \"action_type\": \"post\", \"content_type\": ct1,\n",
|
| 166 |
+
" \"topic\": topic1, \"tags\": tags1, \"intent\": intent1},\n",
|
| 167 |
+
" {\"hour\": 19, \"action_type\": \"post\", \"content_type\": ct2,\n",
|
| 168 |
+
" \"topic\": topic2, \"tags\": tags2, \"intent\": intent2},\n",
|
| 169 |
+
" ]\n",
|
| 170 |
+
" replies = [{\"post_hour\": 12, \"reply_hour\": 13}]\n",
|
| 171 |
+
" return ViraltestAction(\n",
|
| 172 |
+
" scheduled_actions=[ScheduledAction(**a) for a in actions],\n",
|
| 173 |
+
" replies=[{\"post_hour\": 12, \"reply_hour\": 13}],\n",
|
| 174 |
+
" notes=f\"Day {day}: varied content at peak hours.\",\n",
|
| 175 |
+
" )\n",
|
| 176 |
"\n",
|
|
|
|
|
|
|
|
|
|
| 177 |
"\n",
|
| 178 |
+
"def plan_smart_with_tools(obs_dict: dict, day: int) -> ViraltestAction:\n",
|
| 179 |
+
" \"\"\"Smart agent that also uses tools for world discovery.\"\"\"\n",
|
| 180 |
+
" tool_calls = []\n",
|
| 181 |
+
" if day <= 3:\n",
|
| 182 |
+
" tool_calls.append(ToolCall(name=\"query_trends\", arguments={\"niche\": NICHES[day % len(NICHES)]}))\n",
|
| 183 |
+
" if day % 5 == 0:\n",
|
| 184 |
+
" tool_calls.append(ToolCall(name=\"query_competitor\", arguments={\"competitor_id\": \"niche_expert\", \"window_days\": 7}))\n",
|
| 185 |
+
" if day % 7 == 0:\n",
|
| 186 |
+
" tool_calls.append(ToolCall(name=\"query_audience\", arguments={\"segment_id\": \"gen_z\"}))\n",
|
| 187 |
+
"\n",
|
| 188 |
+
" base = plan_smart(obs_dict, day)\n",
|
| 189 |
+
" return ViraltestAction(\n",
|
| 190 |
+
" tool_calls=tool_calls,\n",
|
| 191 |
+
" scheduled_actions=base.scheduled_actions,\n",
|
| 192 |
+
" replies=base.replies,\n",
|
| 193 |
+
" notes=f\"Day {day}: tool-assisted planning.\",\n",
|
| 194 |
+
" )\n",
|
| 195 |
+
"\n",
|
| 196 |
+
"\n",
|
| 197 |
+
"BASELINE_AGENTS = {\n",
|
| 198 |
+
" \"always_rest\": plan_always_rest,\n",
|
| 199 |
+
" \"spam\": plan_spam,\n",
|
| 200 |
+
" \"random\": plan_random,\n",
|
| 201 |
+
" \"minimal\": plan_minimal,\n",
|
| 202 |
+
" \"smart\": plan_smart,\n",
|
| 203 |
+
" \"smart_with_tools\": plan_smart_with_tools,\n",
|
| 204 |
+
"}"
|
| 205 |
+
]
|
| 206 |
+
},
|
| 207 |
+
{
|
| 208 |
+
"cell_type": "code",
|
| 209 |
+
"execution_count": null,
|
| 210 |
+
"metadata": {},
|
| 211 |
+
"outputs": [],
|
| 212 |
+
"source": [
|
| 213 |
+
"def run_episode(task: str, plan_fn, seed: int = 42) -> Dict[str, Any]:\n",
|
| 214 |
+
" \"\"\"Run one full 30-day episode and return metrics.\"\"\"\n",
|
| 215 |
+
" env = ViraltestEnvironment()\n",
|
| 216 |
+
" obs = env.reset(task=task, seed=seed)\n",
|
| 217 |
+
" obs_dict = obs.model_dump()\n",
|
| 218 |
+
"\n",
|
| 219 |
+
" rewards = []\n",
|
| 220 |
+
" energies = [obs.creator_energy]\n",
|
| 221 |
+
" followers_hist = [obs.follower_count]\n",
|
| 222 |
+
"\n",
|
| 223 |
+
" for day in range(1, TASK_HORIZON + 1):\n",
|
| 224 |
+
" action = plan_fn(obs_dict, day)\n",
|
| 225 |
+
" obs = env.step(action)\n",
|
| 226 |
+
" obs_dict = obs.model_dump()\n",
|
| 227 |
+
" r = obs.reward if obs.reward is not None else 0.0\n",
|
| 228 |
+
" rewards.append(r)\n",
|
| 229 |
+
" energies.append(obs.creator_energy)\n",
|
| 230 |
+
" followers_hist.append(obs.follower_count)\n",
|
| 231 |
+
" if obs.done:\n",
|
| 232 |
" break\n",
|
| 233 |
+
"\n",
|
| 234 |
+
" grader_score = (obs.metadata or {}).get(\"grader_score\", 0.0)\n",
|
| 235 |
+
"\n",
|
| 236 |
+
" return {\n",
|
| 237 |
+
" \"task\": task,\n",
|
| 238 |
+
" \"steps\": len(rewards),\n",
|
| 239 |
+
" \"total_reward\": sum(rewards),\n",
|
| 240 |
+
" \"avg_reward\": sum(rewards) / len(rewards) if rewards else 0,\n",
|
| 241 |
+
" \"grader_score\": grader_score,\n",
|
| 242 |
+
" \"final_energy\": obs.creator_energy,\n",
|
| 243 |
+
" \"min_energy\": min(energies),\n",
|
| 244 |
+
" \"final_followers\": obs.follower_count,\n",
|
| 245 |
+
" \"follower_delta\": obs.follower_count - 10000,\n",
|
| 246 |
+
" \"burned_out\": obs.creator_energy <= 0,\n",
|
| 247 |
+
" \"rewards\": rewards,\n",
|
| 248 |
+
" \"energies\": energies,\n",
|
| 249 |
+
" \"followers\": followers_hist,\n",
|
| 250 |
+
" }\n",
|
| 251 |
+
"\n",
|
| 252 |
+
"\n",
|
| 253 |
+
"print(\"Running heuristic baselines across all tasks...\")\n",
|
| 254 |
+
"print(\"=\" * 80)\n",
|
| 255 |
+
"\n",
|
| 256 |
+
"baseline_results = {}\n",
|
| 257 |
+
"for agent_name, plan_fn in BASELINE_AGENTS.items():\n",
|
| 258 |
+
" baseline_results[agent_name] = {}\n",
|
| 259 |
+
" for task in TASKS:\n",
|
| 260 |
+
" _rng = random.Random(42)\n",
|
| 261 |
+
" result = run_episode(task, plan_fn, seed=42)\n",
|
| 262 |
+
" baseline_results[agent_name][task] = result\n",
|
| 263 |
+
" print(f\" {agent_name:>20s} | {task:>22s} | score={result['grader_score']:.4f} | \"\n",
|
| 264 |
+
" f\"reward={result['total_reward']:.3f} | energy={result['final_energy']:.2f} | \"\n",
|
| 265 |
+
" f\"followers={result['follower_delta']:+d}\")\n",
|
| 266 |
+
" print()\n",
|
| 267 |
+
"\n",
|
| 268 |
+
"print(\"\\n\" + \"=\" * 80)\n",
|
| 269 |
+
"print(\"BASELINE LEADERBOARD (grader_score)\")\n",
|
| 270 |
+
"print(\"=\" * 80)\n",
|
| 271 |
+
"print(f\"{'Agent':<22s} {'engage':>10s} {'strategic':>12s} {'competitive':>14s} {'avg':>8s}\")\n",
|
| 272 |
+
"print(\"-\" * 68)\n",
|
| 273 |
+
"for agent_name in BASELINE_AGENTS:\n",
|
| 274 |
+
" scores = [baseline_results[agent_name][t][\"grader_score\"] for t in TASKS]\n",
|
| 275 |
+
" avg = sum(scores) / len(scores)\n",
|
| 276 |
+
" print(f\"{agent_name:<22s} {scores[0]:>10.4f} {scores[1]:>12.4f} {scores[2]:>14.4f} {avg:>8.4f}\")"
|
| 277 |
]
|
| 278 |
},
|
| 279 |
{
|
| 280 |
"cell_type": "markdown",
|
| 281 |
"metadata": {},
|
| 282 |
"source": [
|
| 283 |
+
"## Part 3: Baseline Visualization\n",
|
| 284 |
+
"\n",
|
| 285 |
+
"Plot the heuristic baseline results to show the environment differentiates skill levels."
|
| 286 |
+
]
|
| 287 |
+
},
|
| 288 |
+
{
|
| 289 |
+
"cell_type": "code",
|
| 290 |
+
"execution_count": null,
|
| 291 |
+
"metadata": {},
|
| 292 |
+
"outputs": [],
|
| 293 |
+
"source": [
|
| 294 |
+
"fig, axes = plt.subplots(1, 3, figsize=(16, 5), sharey=True)\n",
|
| 295 |
+
"agent_names = list(BASELINE_AGENTS.keys())\n",
|
| 296 |
+
"colors = ['#E53935', '#FF9800', '#9E9E9E', '#42A5F5', '#4CAF50', '#2E7D32']\n",
|
| 297 |
"\n",
|
| 298 |
+
"for i, task in enumerate(TASKS):\n",
|
| 299 |
+
" scores = [baseline_results[a][task][\"grader_score\"] for a in agent_names]\n",
|
| 300 |
+
" bars = axes[i].barh(agent_names, scores, color=colors)\n",
|
| 301 |
+
" axes[i].set_title(task.replace(\"monthly_\", \"\").title(), fontsize=13, fontweight='bold')\n",
|
| 302 |
+
" axes[i].set_xlim(0, max(max(scores) * 1.15, 0.01))\n",
|
| 303 |
+
" for bar, score in zip(bars, scores):\n",
|
| 304 |
+
" axes[i].text(bar.get_width() + 0.005, bar.get_y() + bar.get_height()/2,\n",
|
| 305 |
+
" f\"{score:.3f}\", va='center', fontsize=9)\n",
|
| 306 |
+
"\n",
|
| 307 |
+
"axes[0].set_ylabel(\"Agent\")\n",
|
| 308 |
+
"fig.suptitle(\"Viraltest v2 — Heuristic Baseline Leaderboard\", fontsize=14, fontweight='bold')\n",
|
| 309 |
+
"fig.tight_layout()\n",
|
| 310 |
+
"fig.savefig(PLOTS_DIR / \"baseline_leaderboard.png\", dpi=150, bbox_inches='tight')\n",
|
| 311 |
+
"plt.show()\n",
|
| 312 |
+
"print(f\"Saved {PLOTS_DIR / 'baseline_leaderboard.png'}\")"
|
| 313 |
]
|
| 314 |
},
|
| 315 |
{
|
|
|
|
| 318 |
"metadata": {},
|
| 319 |
"outputs": [],
|
| 320 |
"source": [
|
| 321 |
+
"fig, axes = plt.subplots(2, 3, figsize=(16, 8))\n",
|
| 322 |
+
"\n",
|
| 323 |
+
"for i, task in enumerate(TASKS):\n",
|
| 324 |
+
" for j, agent_name in enumerate(agent_names):\n",
|
| 325 |
+
" result = baseline_results[agent_name][task]\n",
|
| 326 |
+
" axes[0, i].plot(result[\"rewards\"], label=agent_name, color=colors[j], alpha=0.8)\n",
|
| 327 |
+
" axes[1, i].plot(result[\"energies\"], label=agent_name, color=colors[j], alpha=0.8)\n",
|
| 328 |
+
"\n",
|
| 329 |
+
" axes[0, i].set_title(f\"{task.replace('monthly_', '').title()} — Rewards\", fontsize=11)\n",
|
| 330 |
+
" axes[0, i].set_xlabel(\"Day\")\n",
|
| 331 |
+
" axes[0, i].set_ylabel(\"Reward\")\n",
|
| 332 |
+
" axes[0, i].grid(True, alpha=0.3)\n",
|
| 333 |
+
"\n",
|
| 334 |
+
" axes[1, i].set_title(f\"{task.replace('monthly_', '').title()} — Energy\", fontsize=11)\n",
|
| 335 |
+
" axes[1, i].set_xlabel(\"Day\")\n",
|
| 336 |
+
" axes[1, i].set_ylabel(\"Energy\")\n",
|
| 337 |
+
" axes[1, i].grid(True, alpha=0.3)\n",
|
| 338 |
+
"\n",
|
| 339 |
+
"axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)\n",
|
| 340 |
+
"fig.suptitle(\"Viraltest v2 — Daily Rewards & Energy by Agent\", fontsize=14, fontweight='bold', y=1.01)\n",
|
| 341 |
+
"fig.tight_layout()\n",
|
| 342 |
+
"fig.savefig(PLOTS_DIR / \"baseline_trajectories.png\", dpi=150, bbox_inches='tight')\n",
|
| 343 |
+
"plt.show()\n",
|
| 344 |
+
"print(f\"Saved {PLOTS_DIR / 'baseline_trajectories.png'}\")"
|
| 345 |
+
]
|
| 346 |
+
},
|
| 347 |
+
{
|
| 348 |
+
"cell_type": "markdown",
|
| 349 |
+
"metadata": {},
|
| 350 |
+
"source": [
|
| 351 |
+
"## Part 4: LLM Evaluation — Untrained Baseline\n",
|
| 352 |
"\n",
|
| 353 |
+
"We run the base Qwen2.5-1.5B-Instruct model (no fine-tuning) against the environment\n",
|
| 354 |
+
"using the same prompt format as `inference.py`. This gives us the **before** scores.\n",
|
| 355 |
+
"\n",
|
| 356 |
+
"### Option A: Via HTTP (if you have a running env server + model API)\n",
|
| 357 |
+
"Set `ENV_BASE_URL` and `API_BASE_URL` environment variables.\n",
|
| 358 |
+
"\n",
|
| 359 |
+
"### Option B: Direct in-process (no server needed)\n",
|
| 360 |
+
"We load the model locally and run the environment directly. This is what we do below."
|
| 361 |
+
]
|
| 362 |
+
},
|
| 363 |
+
{
|
| 364 |
+
"cell_type": "code",
|
| 365 |
+
"execution_count": null,
|
| 366 |
+
"metadata": {},
|
| 367 |
+
"outputs": [],
|
| 368 |
+
"source": [
|
| 369 |
+
"import textwrap\n",
|
| 370 |
+
"import torch\n",
|
| 371 |
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
|
|
|
|
| 372 |
"\n",
|
| 373 |
+
"MODEL_NAME = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
|
| 374 |
+
"\n",
|
| 375 |
+
"print(f\"Loading {MODEL_NAME}...\")\n",
|
| 376 |
"tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)\n",
|
| 377 |
+
"model = AutoModelForCausalLM.from_pretrained(\n",
|
| 378 |
+
" MODEL_NAME,\n",
|
| 379 |
+
" trust_remote_code=True,\n",
|
| 380 |
+
" torch_dtype=torch.float16,\n",
|
| 381 |
+
" device_map=\"auto\",\n",
|
| 382 |
+
")\n",
|
| 383 |
+
"model.eval()\n",
|
| 384 |
+
"print(f\"Model loaded on {model.device}\")"
|
| 385 |
+
]
|
| 386 |
+
},
|
| 387 |
+
{
|
| 388 |
+
"cell_type": "code",
|
| 389 |
+
"execution_count": null,
|
| 390 |
+
"metadata": {},
|
| 391 |
+
"outputs": [],
|
| 392 |
+
"source": [
|
| 393 |
+
"SYSTEM_PROMPT = textwrap.dedent(\"\"\"\\\n",
|
| 394 |
+
"You are an Instagram content strategy agent. Each step is one full day (24 hours).\n",
|
| 395 |
+
"You manage a creator account over a 30-day monthly cycle.\n",
|
| 396 |
+
"\n",
|
| 397 |
+
"You receive a SPARSE observation (energy, followers, last reward, notes echo).\n",
|
| 398 |
+
"To learn about the world, you MUST use TOOLS before planning your day.\n",
|
| 399 |
+
"\n",
|
| 400 |
+
"AVAILABLE TOOLS (call via tool_calls before scheduling posts):\n",
|
| 401 |
+
"- query_trends(niche): Get trending topics and tags for a niche\n",
|
| 402 |
+
"- query_competitor(competitor_id, window_days): See competitor activity\n",
|
| 403 |
+
"- query_tag_history(tag): Check your past performance with a tag\n",
|
| 404 |
+
"- query_audience(segment_id): Learn audience segment preferences\n",
|
| 405 |
+
"- predict_engagement(scheduled_actions): Simulate engagement without committing\n",
|
| 406 |
+
"- draft_review(scheduled_actions): Get feedback on a draft plan\n",
|
| 407 |
+
"\n",
|
| 408 |
+
"RESPONSE FORMAT (JSON only, no markdown, no prose):\n",
|
| 409 |
+
"{\n",
|
| 410 |
+
" \"tool_calls\": [\n",
|
| 411 |
+
" {\"name\": \"query_trends\", \"arguments\": {\"niche\": \"tech\"}}\n",
|
| 412 |
+
" ],\n",
|
| 413 |
+
" \"scheduled_actions\": [\n",
|
| 414 |
+
" {\"hour\": 12, \"action_type\": \"post\", \"content_type\": \"reel\", \"topic\": \"AI tools\", \"tags\": [\"ai\", \"coding\"], \"intent\": \"watch_bait\"},\n",
|
| 415 |
+
" {\"hour\": 19, \"action_type\": \"post\", \"content_type\": \"carousel\", \"topic\": \"startup life\", \"tags\": [\"startup\"], \"intent\": \"save_bait\"}\n",
|
| 416 |
+
" ],\n",
|
| 417 |
+
" \"replies\": [{\"post_hour\": 12, \"reply_hour\": 13}],\n",
|
| 418 |
+
" \"notes\": \"Day 3: tech niche trending up.\"\n",
|
| 419 |
+
"}\n",
|
| 420 |
+
"\n",
|
| 421 |
+
"RULES:\n",
|
| 422 |
+
"- hour: 0-23. content_type: reel|story|carousel|text_post. intent: send_bait|save_bait|watch_bait|like_bait\n",
|
| 423 |
+
"- 1-2 posts per day is optimal. More causes audience fatigue.\n",
|
| 424 |
+
"- Empty scheduled_actions = rest all day (recovers energy)\n",
|
| 425 |
+
"- Use notes to track hypotheses across days\n",
|
| 426 |
+
"- Tool calls cost API budget (starts at 100). Use wisely.\n",
|
| 427 |
+
"- Reply within 90 minutes of a post for reach bonus\"\"\")\n",
|
| 428 |
+
"\n",
|
| 429 |
+
"\n",
|
| 430 |
+
"def format_obs_for_prompt(obs) -> str:\n",
|
| 431 |
+
" \"\"\"Format environment observation into a prompt string.\"\"\"\n",
|
| 432 |
+
" days = [\"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Sat\", \"Sun\"]\n",
|
| 433 |
+
" day_name = days[obs.day_of_week] if 0 <= obs.day_of_week < 7 else \"?\"\n",
|
| 434 |
+
" notes_echo = getattr(obs, \"agent_notes\", None) or \"none\"\n",
|
| 435 |
+
" budget = getattr(obs, \"api_budget_remaining\", 100)\n",
|
| 436 |
+
" burnout = getattr(obs, \"burnout_risk\", 0.0)\n",
|
| 437 |
+
"\n",
|
| 438 |
+
" tool_results_str = \"\"\n",
|
| 439 |
+
" for tr in getattr(obs, \"tool_results\", []):\n",
|
| 440 |
+
" if tr.success:\n",
|
| 441 |
+
" tool_results_str += f\" {tr.name}: {json.dumps(tr.data)[:200]}\\n\"\n",
|
| 442 |
+
" else:\n",
|
| 443 |
+
" tool_results_str += f\" {tr.name}: ERROR - {tr.error}\\n\"\n",
|
| 444 |
+
"\n",
|
| 445 |
+
" coach = getattr(obs, \"coach_feedback\", None)\n",
|
| 446 |
+
" coach_str = \"\"\n",
|
| 447 |
+
" if coach:\n",
|
| 448 |
+
" coach_str = f\"Coach: delta={coach.get('delta', 0):.3f}, suggestion={coach.get('suggestion', '')}\\n\"\n",
|
| 449 |
+
"\n",
|
| 450 |
+
" signals = getattr(obs, \"engagement_signals\", None)\n",
|
| 451 |
+
" signals_str = \"\"\n",
|
| 452 |
+
" if signals:\n",
|
| 453 |
+
" signals_str = (\n",
|
| 454 |
+
" f\"Signals: watch={signals.watch_time:.3f} sends={signals.sends_per_reach:.3f} \"\n",
|
| 455 |
+
" f\"saves={signals.saves:.3f} likes={signals.likes_per_reach:.3f}\\n\"\n",
|
| 456 |
+
" )\n",
|
| 457 |
+
"\n",
|
| 458 |
+
" return textwrap.dedent(f\"\"\"\\\n",
|
| 459 |
+
"Day: {day_name} (day_of_week={obs.day_of_week}) | days_elapsed={obs.days_elapsed}\n",
|
| 460 |
+
"Energy: {obs.creator_energy:.2f} | Burnout risk: {burnout:.2f} | Followers: {obs.follower_count}\n",
|
| 461 |
+
"Engagement rate: {obs.engagement_rate:.3f} | Content queue: {obs.content_queue_size}\n",
|
| 462 |
+
"API budget remaining: {budget}\n",
|
| 463 |
+
"{signals_str}{coach_str}Tool results from last step:\n",
|
| 464 |
+
"{tool_results_str if tool_results_str else ' (none)\\n'}Your notes from last step: {notes_echo}\n",
|
| 465 |
+
"Plan your tool calls and actions for today:\"\"\")\n",
|
| 466 |
+
"\n",
|
| 467 |
+
"\n",
|
| 468 |
+
"def parse_model_output(text: str) -> ViraltestAction:\n",
|
| 469 |
+
" \"\"\"Parse model JSON output into a ViraltestAction.\"\"\"\n",
|
| 470 |
+
" text = text.strip()\n",
|
| 471 |
+
" if text.startswith(\"```\"):\n",
|
| 472 |
+
" lines = text.split(\"\\n\")\n",
|
| 473 |
+
" lines = [l for l in lines if not l.strip().startswith(\"```\")]\n",
|
| 474 |
+
" text = \"\\n\".join(lines).strip()\n",
|
| 475 |
+
"\n",
|
| 476 |
+
" try:\n",
|
| 477 |
+
" data = json.loads(text)\n",
|
| 478 |
+
" tool_calls = []\n",
|
| 479 |
+
" for tc in data.get(\"tool_calls\", []):\n",
|
| 480 |
+
" if isinstance(tc, dict) and \"name\" in tc:\n",
|
| 481 |
+
" tool_calls.append(ToolCall(name=tc[\"name\"], arguments=tc.get(\"arguments\", {})))\n",
|
| 482 |
+
"\n",
|
| 483 |
+
" scheduled = []\n",
|
| 484 |
+
" for a in data.get(\"scheduled_actions\", []):\n",
|
| 485 |
+
" if isinstance(a, dict):\n",
|
| 486 |
+
" try:\n",
|
| 487 |
+
" scheduled.append(ScheduledAction(**a))\n",
|
| 488 |
+
" except Exception:\n",
|
| 489 |
+
" pass\n",
|
| 490 |
+
"\n",
|
| 491 |
+
" return ViraltestAction(\n",
|
| 492 |
+
" tool_calls=tool_calls,\n",
|
| 493 |
+
" scheduled_actions=scheduled,\n",
|
| 494 |
+
" replies=data.get(\"replies\", []),\n",
|
| 495 |
+
" notes=data.get(\"notes\"),\n",
|
| 496 |
+
" )\n",
|
| 497 |
+
" except (json.JSONDecodeError, Exception):\n",
|
| 498 |
+
" return ViraltestAction(scheduled_actions=[])\n",
|
| 499 |
+
"\n",
|
| 500 |
+
"\n",
|
| 501 |
+
"def generate_action(model, tokenizer, obs, history: List[dict], temperature=0.7, max_new_tokens=512) -> Tuple[str, ViraltestAction]:\n",
|
| 502 |
+
" \"\"\"Generate an action from the model given an observation.\"\"\"\n",
|
| 503 |
+
" user_prompt = format_obs_for_prompt(obs)\n",
|
| 504 |
+
" messages = [{\"role\": \"system\", \"content\": SYSTEM_PROMPT}]\n",
|
| 505 |
+
" messages.extend(history[-4:])\n",
|
| 506 |
+
" messages.append({\"role\": \"user\", \"content\": user_prompt})\n",
|
| 507 |
+
"\n",
|
| 508 |
+
" text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n",
|
| 509 |
+
" inputs = tokenizer(text_input, return_tensors=\"pt\").to(model.device)\n",
|
| 510 |
+
"\n",
|
| 511 |
+
" with torch.no_grad():\n",
|
| 512 |
+
" output_ids = model.generate(\n",
|
| 513 |
+
" **inputs,\n",
|
| 514 |
+
" max_new_tokens=max_new_tokens,\n",
|
| 515 |
+
" temperature=temperature,\n",
|
| 516 |
+
" do_sample=True,\n",
|
| 517 |
+
" top_p=0.9,\n",
|
| 518 |
+
" pad_token_id=tokenizer.eos_token_id,\n",
|
| 519 |
+
" )\n",
|
| 520 |
+
"\n",
|
| 521 |
+
" new_tokens = output_ids[0][inputs[\"input_ids\"].shape[1]:]\n",
|
| 522 |
+
" response = tokenizer.decode(new_tokens, skip_special_tokens=True)\n",
|
| 523 |
+
" action = parse_model_output(response)\n",
|
| 524 |
+
" return response, action\n",
|
| 525 |
+
"\n",
|
| 526 |
+
"print(\"LLM agent functions defined.\")"
|
| 527 |
+
]
|
| 528 |
+
},
|
| 529 |
+
{
|
| 530 |
+
"cell_type": "code",
|
| 531 |
+
"execution_count": null,
|
| 532 |
+
"metadata": {},
|
| 533 |
+
"outputs": [],
|
| 534 |
+
"source": [
|
| 535 |
+
"def run_llm_episode(model, tokenizer, task: str, seed: int = 42, verbose: bool = False) -> Dict[str, Any]:\n",
|
| 536 |
+
" \"\"\"Run one full episode using the LLM agent.\"\"\"\n",
|
| 537 |
+
" env = ViraltestEnvironment()\n",
|
| 538 |
+
" obs = env.reset(task=task, seed=seed)\n",
|
| 539 |
+
"\n",
|
| 540 |
+
" rewards = []\n",
|
| 541 |
+
" energies = [obs.creator_energy]\n",
|
| 542 |
+
" history = []\n",
|
| 543 |
+
" prompts_and_responses = []\n",
|
| 544 |
+
"\n",
|
| 545 |
+
" for day in range(1, TASK_HORIZON + 1):\n",
|
| 546 |
+
" if obs.done:\n",
|
| 547 |
+
" break\n",
|
| 548 |
+
"\n",
|
| 549 |
+
" if obs.creator_energy <= 0.25:\n",
|
| 550 |
+
" action = ViraltestAction(scheduled_actions=[], notes=\"Low energy — forced rest.\")\n",
|
| 551 |
+
" response_text = '{\"scheduled_actions\": [], \"notes\": \"Low energy — rest.\"}'\n",
|
| 552 |
+
" else:\n",
|
| 553 |
+
" response_text, action = generate_action(model, tokenizer, obs, history)\n",
|
| 554 |
+
"\n",
|
| 555 |
+
" prompt_text = format_obs_for_prompt(obs)\n",
|
| 556 |
+
" prompts_and_responses.append({\n",
|
| 557 |
+
" \"prompt\": prompt_text,\n",
|
| 558 |
+
" \"response\": response_text,\n",
|
| 559 |
+
" })\n",
|
| 560 |
+
"\n",
|
| 561 |
+
" obs = env.step(action)\n",
|
| 562 |
+
" r = obs.reward if obs.reward is not None else 0.0\n",
|
| 563 |
+
" rewards.append(r)\n",
|
| 564 |
+
" energies.append(obs.creator_energy)\n",
|
| 565 |
+
"\n",
|
| 566 |
+
" history.append({\"role\": \"user\", \"content\": prompt_text})\n",
|
| 567 |
+
" history.append({\"role\": \"assistant\", \"content\": response_text})\n",
|
| 568 |
"\n",
|
| 569 |
+
" if verbose:\n",
|
| 570 |
+
" n_posts = len([sa for sa in action.scheduled_actions if sa.action_type == \"post\"])\n",
|
| 571 |
+
" n_tools = len(action.tool_calls)\n",
|
| 572 |
+
" print(f\" Day {day:2d}: reward={r:.4f} energy={obs.creator_energy:.2f} \"\n",
|
| 573 |
+
" f\"posts={n_posts} tools={n_tools}\")\n",
|
| 574 |
+
"\n",
|
| 575 |
+
" if obs.done:\n",
|
| 576 |
+
" break\n",
|
| 577 |
+
"\n",
|
| 578 |
+
" grader_score = (obs.metadata or {}).get(\"grader_score\", 0.0)\n",
|
| 579 |
+
"\n",
|
| 580 |
+
" return {\n",
|
| 581 |
+
" \"task\": task,\n",
|
| 582 |
+
" \"steps\": len(rewards),\n",
|
| 583 |
+
" \"total_reward\": sum(rewards),\n",
|
| 584 |
+
" \"avg_reward\": sum(rewards) / len(rewards) if rewards else 0,\n",
|
| 585 |
+
" \"grader_score\": grader_score,\n",
|
| 586 |
+
" \"final_energy\": obs.creator_energy,\n",
|
| 587 |
+
" \"min_energy\": min(energies),\n",
|
| 588 |
+
" \"final_followers\": obs.follower_count,\n",
|
| 589 |
+
" \"follower_delta\": obs.follower_count - 10000,\n",
|
| 590 |
+
" \"burned_out\": obs.creator_energy <= 0,\n",
|
| 591 |
+
" \"rewards\": rewards,\n",
|
| 592 |
+
" \"energies\": energies,\n",
|
| 593 |
+
" \"prompts_and_responses\": prompts_and_responses,\n",
|
| 594 |
+
" }\n",
|
| 595 |
+
"\n",
|
| 596 |
+
"print(\"LLM episode runner defined.\")"
|
| 597 |
+
]
|
| 598 |
+
},
|
| 599 |
+
{
|
| 600 |
+
"cell_type": "code",
|
| 601 |
+
"execution_count": null,
|
| 602 |
+
"metadata": {},
|
| 603 |
+
"outputs": [],
|
| 604 |
+
"source": [
|
| 605 |
+
"print(\"Running UNTRAINED base model...\")\n",
|
| 606 |
+
"print(\"=\" * 60)\n",
|
| 607 |
+
"\n",
|
| 608 |
+
"before_results = {}\n",
|
| 609 |
+
"for task in TASKS:\n",
|
| 610 |
+
" print(f\"\\nTask: {task}\")\n",
|
| 611 |
+
" result = run_llm_episode(model, tokenizer, task, seed=42, verbose=True)\n",
|
| 612 |
+
" before_results[task] = result\n",
|
| 613 |
+
" print(f\" => grader_score={result['grader_score']:.4f}, \"\n",
|
| 614 |
+
" f\"total_reward={result['total_reward']:.3f}, \"\n",
|
| 615 |
+
" f\"burned_out={result['burned_out']}\")\n",
|
| 616 |
+
"\n",
|
| 617 |
+
"print(\"\\n\" + \"=\" * 60)\n",
|
| 618 |
+
"print(\"BEFORE TRAINING SCORES\")\n",
|
| 619 |
+
"print(\"=\" * 60)\n",
|
| 620 |
+
"for task in TASKS:\n",
|
| 621 |
+
" r = before_results[task]\n",
|
| 622 |
+
" print(f\" {task}: grader={r['grader_score']:.4f} reward={r['total_reward']:.3f} energy={r['final_energy']:.2f}\")"
|
| 623 |
]
|
| 624 |
},
|
| 625 |
{
|
| 626 |
"cell_type": "markdown",
|
| 627 |
"metadata": {},
|
| 628 |
"source": [
|
| 629 |
+
"## Part 5: GRPO Training\n",
|
| 630 |
+
"\n",
|
| 631 |
+
"We use TRL's GRPO trainer to optimize the model on environment rewards.\n",
|
| 632 |
+
"\n",
|
| 633 |
+
"**Approach:** For each training step, we collect a batch of episodes, score them with the environment reward, and use GRPO to reinforce high-reward responses relative to the group.\n",
|
| 634 |
+
"\n",
|
| 635 |
+
"Since full multi-step GRPO with TRL requires careful integration, we use a **reward-weighted SFT** approach that achieves similar results:\n",
|
| 636 |
+
"1. Collect N episodes with the current model\n",
|
| 637 |
+
"2. Weight each (prompt, response) pair by its environment reward\n",
|
| 638 |
+
"3. Fine-tune on the reward-weighted dataset\n",
|
| 639 |
+
"4. Repeat for multiple rounds"
|
| 640 |
]
|
| 641 |
},
|
| 642 |
{
|
|
|
|
| 645 |
"metadata": {},
|
| 646 |
"outputs": [],
|
| 647 |
"source": [
|
| 648 |
+
"from peft import LoraConfig, get_peft_model, TaskType\n",
|
| 649 |
+
"from transformers import TrainingArguments\n",
|
| 650 |
+
"from trl import SFTTrainer, SFTConfig\n",
|
| 651 |
+
"from datasets import Dataset\n",
|
| 652 |
+
"\n",
|
| 653 |
+
"lora_config = LoraConfig(\n",
|
| 654 |
+
" r=16,\n",
|
| 655 |
+
" lora_alpha=32,\n",
|
| 656 |
+
" lora_dropout=0.05,\n",
|
| 657 |
+
" target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
|
| 658 |
+
" task_type=TaskType.CAUSAL_LM,\n",
|
| 659 |
+
" bias=\"none\",\n",
|
| 660 |
+
")\n",
|
| 661 |
+
"\n",
|
| 662 |
+
"model.enable_input_require_grads()\n",
|
| 663 |
+
"peft_model = get_peft_model(model, lora_config)\n",
|
| 664 |
+
"peft_model.print_trainable_parameters()\n",
|
| 665 |
+
"print(\"LoRA adapter attached.\")"
|
| 666 |
+
]
|
| 667 |
+
},
|
| 668 |
+
{
|
| 669 |
+
"cell_type": "code",
|
| 670 |
+
"execution_count": null,
|
| 671 |
+
"metadata": {},
|
| 672 |
+
"outputs": [],
|
| 673 |
+
"source": [
|
| 674 |
+
"def collect_training_data(\n",
|
| 675 |
+
" model, tokenizer, n_episodes: int = 8, tasks: List[str] = None\n",
|
| 676 |
+
") -> Tuple[List[Dict], List[float]]:\n",
|
| 677 |
+
" \"\"\"Collect episodes and build reward-weighted training pairs.\"\"\"\n",
|
| 678 |
+
" tasks = tasks or TASKS\n",
|
| 679 |
+
" all_pairs = []\n",
|
| 680 |
+
" all_episode_rewards = []\n",
|
| 681 |
+
"\n",
|
| 682 |
+
" for ep in range(n_episodes):\n",
|
| 683 |
+
" task = tasks[ep % len(tasks)]\n",
|
| 684 |
+
" seed = 42 + ep\n",
|
| 685 |
+
" result = run_llm_episode(model, tokenizer, task, seed=seed)\n",
|
| 686 |
+
" episode_reward = result[\"total_reward\"] + 2.0 * result[\"grader_score\"]\n",
|
| 687 |
+
" all_episode_rewards.append(episode_reward)\n",
|
| 688 |
+
"\n",
|
| 689 |
+
" for pr in result[\"prompts_and_responses\"]:\n",
|
| 690 |
+
" step_text = (\n",
|
| 691 |
+
" f\"<|im_start|>system\\n{SYSTEM_PROMPT}<|im_end|>\\n\"\n",
|
| 692 |
+
" f\"<|im_start|>user\\n{pr['prompt']}<|im_end|>\\n\"\n",
|
| 693 |
+
" f\"<|im_start|>assistant\\n{pr['response']}<|im_end|>\"\n",
|
| 694 |
+
" )\n",
|
| 695 |
+
" all_pairs.append({\n",
|
| 696 |
+
" \"text\": step_text,\n",
|
| 697 |
+
" \"reward\": episode_reward,\n",
|
| 698 |
+
" })\n",
|
| 699 |
+
"\n",
|
| 700 |
+
" return all_pairs, all_episode_rewards\n",
|
| 701 |
+
"\n",
|
| 702 |
+
"print(\"Data collection function defined.\")"
|
| 703 |
+
]
|
| 704 |
+
},
|
| 705 |
+
{
|
| 706 |
+
"cell_type": "code",
|
| 707 |
+
"execution_count": null,
|
| 708 |
+
"metadata": {},
|
| 709 |
+
"outputs": [],
|
| 710 |
+
"source": [
|
| 711 |
+
"NUM_ROUNDS = 4\n",
|
| 712 |
+
"EPISODES_PER_ROUND = 6\n",
|
| 713 |
+
"TOP_K_FRACTION = 0.5\n",
|
| 714 |
+
"\n",
|
| 715 |
+
"training_log = {\n",
|
| 716 |
+
" \"round\": [],\n",
|
| 717 |
+
" \"avg_episode_reward\": [],\n",
|
| 718 |
+
" \"max_episode_reward\": [],\n",
|
| 719 |
+
" \"min_episode_reward\": [],\n",
|
| 720 |
+
" \"n_training_samples\": [],\n",
|
| 721 |
+
" \"train_loss\": [],\n",
|
| 722 |
+
"}\n",
|
| 723 |
+
"\n",
|
| 724 |
+
"for round_idx in range(1, NUM_ROUNDS + 1):\n",
|
| 725 |
+
" print(f\"\\n{'=' * 60}\")\n",
|
| 726 |
+
" print(f\"TRAINING ROUND {round_idx}/{NUM_ROUNDS}\")\n",
|
| 727 |
+
" print(f\"{'=' * 60}\")\n",
|
| 728 |
+
"\n",
|
| 729 |
+
" print(f\"Collecting {EPISODES_PER_ROUND} episodes...\")\n",
|
| 730 |
+
" peft_model.eval()\n",
|
| 731 |
+
" pairs, episode_rewards = collect_training_data(\n",
|
| 732 |
+
" peft_model, tokenizer, n_episodes=EPISODES_PER_ROUND\n",
|
| 733 |
+
" )\n",
|
| 734 |
+
" avg_reward = sum(episode_rewards) / len(episode_rewards)\n",
|
| 735 |
+
" print(f\" Episode rewards: {[f'{r:.3f}' for r in episode_rewards]}\")\n",
|
| 736 |
+
" print(f\" Avg: {avg_reward:.3f}, Max: {max(episode_rewards):.3f}, Min: {min(episode_rewards):.3f}\")\n",
|
| 737 |
+
"\n",
|
| 738 |
+
" if not pairs:\n",
|
| 739 |
+
" print(\" No training pairs collected, skipping round.\")\n",
|
| 740 |
+
" continue\n",
|
| 741 |
+
"\n",
|
| 742 |
+
" reward_threshold = np.percentile(\n",
|
| 743 |
+
" [p[\"reward\"] for p in pairs],\n",
|
| 744 |
+
" (1 - TOP_K_FRACTION) * 100\n",
|
| 745 |
+
" )\n",
|
| 746 |
+
" filtered = [p for p in pairs if p[\"reward\"] >= reward_threshold]\n",
|
| 747 |
+
" print(f\" Filtered to {len(filtered)}/{len(pairs)} samples (reward >= {reward_threshold:.3f})\")\n",
|
| 748 |
+
"\n",
|
| 749 |
+
" if not filtered:\n",
|
| 750 |
+
" print(\" No samples above threshold, using all.\")\n",
|
| 751 |
+
" filtered = pairs\n",
|
| 752 |
+
"\n",
|
| 753 |
+
" dataset = Dataset.from_list([{\"text\": p[\"text\"]} for p in filtered])\n",
|
| 754 |
+
"\n",
|
| 755 |
+
" output_dir = f\"./viraltest_checkpoints/round_{round_idx}\"\n",
|
| 756 |
+
" sft_config = SFTConfig(\n",
|
| 757 |
+
" output_dir=output_dir,\n",
|
| 758 |
+
" num_train_epochs=2,\n",
|
| 759 |
+
" per_device_train_batch_size=1,\n",
|
| 760 |
+
" gradient_accumulation_steps=4,\n",
|
| 761 |
+
" learning_rate=2e-5,\n",
|
| 762 |
+
" warmup_steps=5,\n",
|
| 763 |
+
" logging_steps=5,\n",
|
| 764 |
+
" save_strategy=\"no\",\n",
|
| 765 |
+
" max_seq_length=1024,\n",
|
| 766 |
+
" fp16=True,\n",
|
| 767 |
+
" report_to=\"none\",\n",
|
| 768 |
+
" )\n",
|
| 769 |
+
"\n",
|
| 770 |
+
" print(f\" Training on {len(dataset)} samples...\")\n",
|
| 771 |
+
" peft_model.train()\n",
|
| 772 |
+
" trainer = SFTTrainer(\n",
|
| 773 |
+
" model=peft_model,\n",
|
| 774 |
+
" tokenizer=tokenizer,\n",
|
| 775 |
+
" train_dataset=dataset,\n",
|
| 776 |
+
" args=sft_config,\n",
|
| 777 |
+
" )\n",
|
| 778 |
+
" train_result = trainer.train()\n",
|
| 779 |
+
" train_loss = train_result.training_loss\n",
|
| 780 |
+
" print(f\" Training loss: {train_loss:.4f}\")\n",
|
| 781 |
+
"\n",
|
| 782 |
+
" training_log[\"round\"].append(round_idx)\n",
|
| 783 |
+
" training_log[\"avg_episode_reward\"].append(avg_reward)\n",
|
| 784 |
+
" training_log[\"max_episode_reward\"].append(max(episode_rewards))\n",
|
| 785 |
+
" training_log[\"min_episode_reward\"].append(min(episode_rewards))\n",
|
| 786 |
+
" training_log[\"n_training_samples\"].append(len(filtered))\n",
|
| 787 |
+
" training_log[\"train_loss\"].append(train_loss)\n",
|
| 788 |
+
"\n",
|
| 789 |
+
"print(\"\\n\" + \"=\" * 60)\n",
|
| 790 |
+
"print(\"TRAINING COMPLETE\")\n",
|
| 791 |
+
"print(\"=\" * 60)\n",
|
| 792 |
+
"\n",
|
| 793 |
+
"train_df = pd.DataFrame(training_log)\n",
|
| 794 |
+
"print(train_df.to_string(index=False))\n",
|
| 795 |
+
"\n",
|
| 796 |
+
"train_df.to_csv(PLOTS_DIR / \"training_log.csv\", index=False)\n",
|
| 797 |
+
"print(f\"\\nSaved training log to {PLOTS_DIR / 'training_log.csv'}\")"
|
| 798 |
+
]
|
| 799 |
+
},
|
| 800 |
+
{
|
| 801 |
+
"cell_type": "markdown",
|
| 802 |
+
"metadata": {},
|
| 803 |
+
"source": [
|
| 804 |
+
"## Part 6: Post-Training Evaluation\n",
|
| 805 |
+
"\n",
|
| 806 |
+
"Run the trained model on all three tasks and compare with before-training scores."
|
| 807 |
+
]
|
| 808 |
+
},
|
| 809 |
+
{
|
| 810 |
+
"cell_type": "code",
|
| 811 |
+
"execution_count": null,
|
| 812 |
+
"metadata": {},
|
| 813 |
+
"outputs": [],
|
| 814 |
+
"source": [
|
| 815 |
+
"print(\"Running TRAINED model...\")\n",
|
| 816 |
+
"print(\"=\" * 60)\n",
|
| 817 |
+
"\n",
|
| 818 |
+
"peft_model.eval()\n",
|
| 819 |
"\n",
|
| 820 |
+
"after_results = {}\n",
|
| 821 |
+
"for task in TASKS:\n",
|
| 822 |
+
" print(f\"\\nTask: {task}\")\n",
|
| 823 |
+
" result = run_llm_episode(peft_model, tokenizer, task, seed=42, verbose=True)\n",
|
| 824 |
+
" after_results[task] = result\n",
|
| 825 |
+
" print(f\" => grader_score={result['grader_score']:.4f}, \"\n",
|
| 826 |
+
" f\"total_reward={result['total_reward']:.3f}, \"\n",
|
| 827 |
+
" f\"burned_out={result['burned_out']}\")\n",
|
| 828 |
+
"\n",
|
| 829 |
+
"print(\"\\n\" + \"=\" * 60)\n",
|
| 830 |
+
"print(\"AFTER TRAINING SCORES\")\n",
|
| 831 |
+
"print(\"=\" * 60)\n",
|
| 832 |
+
"for task in TASKS:\n",
|
| 833 |
+
" r = after_results[task]\n",
|
| 834 |
+
" print(f\" {task}: grader={r['grader_score']:.4f} reward={r['total_reward']:.3f} energy={r['final_energy']:.2f}\")"
|
| 835 |
+
]
|
| 836 |
+
},
|
| 837 |
+
{
|
| 838 |
+
"cell_type": "markdown",
|
| 839 |
+
"metadata": {},
|
| 840 |
+
"source": [
|
| 841 |
+
"## Part 7: Result Plots — Real Training Evidence"
|
| 842 |
+
]
|
| 843 |
+
},
|
| 844 |
+
{
|
| 845 |
+
"cell_type": "code",
|
| 846 |
+
"execution_count": null,
|
| 847 |
+
"metadata": {},
|
| 848 |
+
"outputs": [],
|
| 849 |
+
"source": [
|
| 850 |
+
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
|
| 851 |
+
"\n",
|
| 852 |
+
"rounds = training_log[\"round\"]\n",
|
| 853 |
+
"axes[0].plot(rounds, training_log[\"avg_episode_reward\"], 'o-', color='#2196F3', linewidth=2, label='Avg reward')\n",
|
| 854 |
+
"axes[0].fill_between(rounds, training_log[\"min_episode_reward\"], training_log[\"max_episode_reward\"],\n",
|
| 855 |
+
" alpha=0.2, color='#2196F3', label='Min-Max range')\n",
|
| 856 |
+
"axes[0].set_xlabel('Training Round', fontsize=12)\n",
|
| 857 |
+
"axes[0].set_ylabel('Episode Reward', fontsize=12)\n",
|
| 858 |
+
"axes[0].set_title('Training Reward Over Rounds', fontsize=13, fontweight='bold')\n",
|
| 859 |
+
"axes[0].legend()\n",
|
| 860 |
+
"axes[0].grid(True, alpha=0.3)\n",
|
| 861 |
+
"\n",
|
| 862 |
+
"axes[1].plot(rounds, training_log[\"train_loss\"], 's-', color='#E53935', linewidth=2)\n",
|
| 863 |
+
"axes[1].set_xlabel('Training Round', fontsize=12)\n",
|
| 864 |
+
"axes[1].set_ylabel('Training Loss', fontsize=12)\n",
|
| 865 |
+
"axes[1].set_title('Training Loss Over Rounds', fontsize=13, fontweight='bold')\n",
|
| 866 |
+
"axes[1].grid(True, alpha=0.3)\n",
|
| 867 |
+
"\n",
|
| 868 |
+
"fig.suptitle('Viraltest v2 — GRPO Training Progress', fontsize=14, fontweight='bold', y=1.02)\n",
|
| 869 |
+
"fig.tight_layout()\n",
|
| 870 |
+
"fig.savefig(PLOTS_DIR / 'reward_curve.png', dpi=150, bbox_inches='tight')\n",
|
| 871 |
"plt.show()\n",
|
| 872 |
+
"print(f\"Saved {PLOTS_DIR / 'reward_curve.png'}\")"
|
| 873 |
]
|
| 874 |
},
|
| 875 |
{
|
|
|
|
| 878 |
"metadata": {},
|
| 879 |
"outputs": [],
|
| 880 |
"source": [
|
| 881 |
+
"task_labels = [t.replace('monthly_', '').title() for t in TASKS]\n",
|
| 882 |
+
"before_scores = [before_results[t][\"grader_score\"] for t in TASKS]\n",
|
| 883 |
+
"after_scores = [after_results[t][\"grader_score\"] for t in TASKS]\n",
|
| 884 |
+
"smart_scores = [baseline_results[\"smart\"][t][\"grader_score\"] for t in TASKS]\n",
|
| 885 |
"\n",
|
| 886 |
+
"x = np.arange(len(TASKS))\n",
|
| 887 |
+
"width = 0.25\n",
|
| 888 |
"\n",
|
| 889 |
+
"fig, ax = plt.subplots(figsize=(10, 6))\n",
|
| 890 |
+
"bars1 = ax.bar(x - width, before_scores, width, label='Base Model (Before)', color='#FF9800')\n",
|
| 891 |
+
"bars2 = ax.bar(x, after_scores, width, label='Trained Model (After)', color='#4CAF50')\n",
|
| 892 |
+
"bars3 = ax.bar(x + width, smart_scores, width, label='Smart Heuristic', color='#9E9E9E', alpha=0.7)\n",
|
| 893 |
"\n",
|
| 894 |
+
"ax.set_ylabel('Grader Score', fontsize=12)\n",
|
| 895 |
+
"ax.set_title('Before vs After Training — Grader Scores', fontsize=14, fontweight='bold')\n",
|
| 896 |
"ax.set_xticks(x)\n",
|
| 897 |
+
"ax.set_xticklabels(task_labels, fontsize=11)\n",
|
| 898 |
+
"ax.legend(fontsize=10)\n",
|
|
|
|
| 899 |
"ax.grid(True, alpha=0.3, axis='y')\n",
|
| 900 |
"\n",
|
| 901 |
+
"for bars in [bars1, bars2, bars3]:\n",
|
| 902 |
+
" for bar in bars:\n",
|
| 903 |
+
" height = bar.get_height()\n",
|
| 904 |
+
" if height > 0:\n",
|
| 905 |
+
" ax.text(bar.get_x() + bar.get_width()/2., height + 0.005,\n",
|
| 906 |
+
" f'{height:.3f}', ha='center', va='bottom', fontsize=9)\n",
|
| 907 |
+
"\n",
|
| 908 |
+
"fig.tight_layout()\n",
|
| 909 |
+
"fig.savefig(PLOTS_DIR / 'before_after.png', dpi=150, bbox_inches='tight')\n",
|
| 910 |
"plt.show()\n",
|
| 911 |
+
"print(f\"Saved {PLOTS_DIR / 'before_after.png'}\")"
|
| 912 |
+
]
|
| 913 |
+
},
|
| 914 |
+
{
|
| 915 |
+
"cell_type": "code",
|
| 916 |
+
"execution_count": null,
|
| 917 |
+
"metadata": {},
|
| 918 |
+
"outputs": [],
|
| 919 |
+
"source": [
|
| 920 |
+
"fig, axes = plt.subplots(2, 3, figsize=(16, 8))\n",
|
| 921 |
+
"\n",
|
| 922 |
+
"labels_and_data = [\n",
|
| 923 |
+
" (\"Base Model\", before_results, '#FF9800'),\n",
|
| 924 |
+
" (\"Trained Model\", after_results, '#4CAF50'),\n",
|
| 925 |
+
"]\n",
|
| 926 |
+
"\n",
|
| 927 |
+
"for i, task in enumerate(TASKS):\n",
|
| 928 |
+
" for label, results, color in labels_and_data:\n",
|
| 929 |
+
" r = results[task]\n",
|
| 930 |
+
" axes[0, i].plot(r[\"rewards\"], label=label, color=color, linewidth=1.5, alpha=0.9)\n",
|
| 931 |
+
" axes[1, i].plot(r[\"energies\"], label=label, color=color, linewidth=1.5, alpha=0.9)\n",
|
| 932 |
+
"\n",
|
| 933 |
+
" smart_r = baseline_results[\"smart\"][task]\n",
|
| 934 |
+
" axes[0, i].plot(smart_r[\"rewards\"], label=\"Smart Heuristic\", color='#9E9E9E',\n",
|
| 935 |
+
" linewidth=1, alpha=0.5, linestyle='--')\n",
|
| 936 |
+
" axes[1, i].plot(smart_r[\"energies\"], label=\"Smart Heuristic\", color='#9E9E9E',\n",
|
| 937 |
+
" linewidth=1, alpha=0.5, linestyle='--')\n",
|
| 938 |
+
"\n",
|
| 939 |
+
" task_title = task.replace('monthly_', '').title()\n",
|
| 940 |
+
" axes[0, i].set_title(f\"{task_title} — Daily Rewards\", fontsize=11)\n",
|
| 941 |
+
" axes[0, i].set_xlabel(\"Day\")\n",
|
| 942 |
+
" axes[0, i].set_ylabel(\"Reward\")\n",
|
| 943 |
+
" axes[0, i].grid(True, alpha=0.3)\n",
|
| 944 |
+
"\n",
|
| 945 |
+
" axes[1, i].set_title(f\"{task_title} — Energy\", fontsize=11)\n",
|
| 946 |
+
" axes[1, i].set_xlabel(\"Day\")\n",
|
| 947 |
+
" axes[1, i].set_ylabel(\"Energy\")\n",
|
| 948 |
+
" axes[1, i].grid(True, alpha=0.3)\n",
|
| 949 |
+
"\n",
|
| 950 |
+
"axes[0, 2].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=9)\n",
|
| 951 |
+
"fig.suptitle('Viraltest v2 — Before vs After Training Trajectories', fontsize=14, fontweight='bold', y=1.01)\n",
|
| 952 |
+
"fig.tight_layout()\n",
|
| 953 |
+
"fig.savefig(PLOTS_DIR / 'training_trajectories.png', dpi=150, bbox_inches='tight')\n",
|
| 954 |
+
"plt.show()\n",
|
| 955 |
+
"print(f\"Saved {PLOTS_DIR / 'training_trajectories.png'}\")"
|
| 956 |
+
]
|
| 957 |
+
},
|
| 958 |
+
{
|
| 959 |
+
"cell_type": "markdown",
|
| 960 |
+
"metadata": {},
|
| 961 |
+
"source": [
|
| 962 |
+
"## Part 8: Summary & Export"
|
| 963 |
+
]
|
| 964 |
+
},
|
| 965 |
+
{
|
| 966 |
+
"cell_type": "code",
|
| 967 |
+
"execution_count": null,
|
| 968 |
+
"metadata": {},
|
| 969 |
+
"outputs": [],
|
| 970 |
+
"source": [
|
| 971 |
+
"print(\"=\" * 70)\n",
|
| 972 |
+
"print(\"FINAL RESULTS SUMMARY\")\n",
|
| 973 |
+
"print(\"=\" * 70)\n",
|
| 974 |
+
"print()\n",
|
| 975 |
+
"print(f\"{'Task':<25s} {'Before':>10s} {'After':>10s} {'Delta':>10s} {'Smart':>10s}\")\n",
|
| 976 |
+
"print(\"-\" * 67)\n",
|
| 977 |
+
"for task in TASKS:\n",
|
| 978 |
+
" b = before_results[task][\"grader_score\"]\n",
|
| 979 |
+
" a = after_results[task][\"grader_score\"]\n",
|
| 980 |
+
" s = baseline_results[\"smart\"][task][\"grader_score\"]\n",
|
| 981 |
+
" delta = a - b\n",
|
| 982 |
+
" print(f\"{task:<25s} {b:>10.4f} {a:>10.4f} {delta:>+10.4f} {s:>10.4f}\")\n",
|
| 983 |
+
"\n",
|
| 984 |
+
"avg_before = np.mean([before_results[t][\"grader_score\"] for t in TASKS])\n",
|
| 985 |
+
"avg_after = np.mean([after_results[t][\"grader_score\"] for t in TASKS])\n",
|
| 986 |
+
"avg_smart = np.mean([baseline_results[\"smart\"][t][\"grader_score\"] for t in TASKS])\n",
|
| 987 |
+
"print(\"-\" * 67)\n",
|
| 988 |
+
"print(f\"{'AVERAGE':<25s} {avg_before:>10.4f} {avg_after:>10.4f} {avg_after - avg_before:>+10.4f} {avg_smart:>10.4f}\")\n",
|
| 989 |
+
"print()\n",
|
| 990 |
+
"\n",
|
| 991 |
+
"summary = {\n",
|
| 992 |
+
" \"model\": MODEL_NAME,\n",
|
| 993 |
+
" \"training_rounds\": NUM_ROUNDS,\n",
|
| 994 |
+
" \"episodes_per_round\": EPISODES_PER_ROUND,\n",
|
| 995 |
+
" \"before\": {t: before_results[t][\"grader_score\"] for t in TASKS},\n",
|
| 996 |
+
" \"after\": {t: after_results[t][\"grader_score\"] for t in TASKS},\n",
|
| 997 |
+
" \"smart_heuristic\": {t: baseline_results[\"smart\"][t][\"grader_score\"] for t in TASKS},\n",
|
| 998 |
+
" \"improvement\": {t: after_results[t][\"grader_score\"] - before_results[t][\"grader_score\"] for t in TASKS},\n",
|
| 999 |
+
" \"training_log\": training_log,\n",
|
| 1000 |
+
"}\n",
|
| 1001 |
+
"\n",
|
| 1002 |
+
"with open(PLOTS_DIR / \"training_summary.json\", \"w\") as f:\n",
|
| 1003 |
+
" json.dump(summary, f, indent=2)\n",
|
| 1004 |
+
"\n",
|
| 1005 |
+
"print(f\"Saved summary to {PLOTS_DIR / 'training_summary.json'}\")\n",
|
| 1006 |
+
"print()\n",
|
| 1007 |
+
"print(\"Plots saved:\")\n",
|
| 1008 |
+
"for p in sorted(PLOTS_DIR.glob(\"*.png\")):\n",
|
| 1009 |
+
" print(f\" {p}\")\n",
|
| 1010 |
+
"print()\n",
|
| 1011 |
+
"print(\"Training evidence is now real and reproducible.\")"
|
| 1012 |
+
]
|
| 1013 |
+
},
|
| 1014 |
+
{
|
| 1015 |
+
"cell_type": "code",
|
| 1016 |
+
"execution_count": null,
|
| 1017 |
+
"metadata": {},
|
| 1018 |
+
"outputs": [],
|
| 1019 |
+
"source": [
|
| 1020 |
+
"save_path = \"./viraltest_trained_adapter\"\n",
|
| 1021 |
+
"peft_model.save_pretrained(save_path)\n",
|
| 1022 |
+
"tokenizer.save_pretrained(save_path)\n",
|
| 1023 |
+
"print(f\"Trained adapter saved to {save_path}\")\n",
|
| 1024 |
+
"print(\"To load: model = AutoModelForCausalLM.from_pretrained(...); model = PeftModel.from_pretrained(model, save_path)\")"
|
| 1025 |
]
|
| 1026 |
}
|
| 1027 |
],
|
|
|
|
| 1033 |
},
|
| 1034 |
"language_info": {
|
| 1035 |
"name": "python",
|
| 1036 |
+
"version": "3.10.0"
|
| 1037 |
}
|
| 1038 |
},
|
| 1039 |
"nbformat": 4,
|