vaibhav12332112312 commited on
Commit
fc3950d
·
1 Parent(s): fcfbc38

firstiteration

Browse files
README.md CHANGED
@@ -11,263 +11,178 @@ tags:
11
  - openenv
12
  ---
13
 
14
- # Viraltest — RL-Based Creator Optimization Environment
15
 
16
- An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment that simulates a social media creator’s weekly posting lifecycle. An AI agent learns **when to post**, **what format**, **which tags**, and **how to differentiate from competitors** — maximizing engagement while managing burnout and sleep.
 
17
 
18
- ## Submission requirements — how this repo maps
19
 
20
- Use this table to confirm Phase 1 (automated) gates before you submit.
21
-
22
- | Requirement | Status in this repo | Where to verify |
23
- |---------------|---------------------|-----------------|
24
- | Real-world task (not a toy/game) | **Met** creator scheduling, energy, trends, competitors | `server/viraltest_environment.py`, `DESIGN.md` |
25
- | Full OpenEnv spec: `openenv.yaml`, typed models, HTTP API | **Met** | `openenv.yaml`, `models.py`, `server/app.py` (`create_app`) |
26
- | `step()` / `reset()` / `state()` | **Met** standard OpenEnv HTTP endpoints | Run `openenv validate` |
27
- | ≥3 tasks with graders (easyhard), scores in **0.0–1.0** | **Met** `weekly_engage`, `weekly_strategic`, `weekly_competitive` | `_run_grader()` in `server/viraltest_environment.py` |
28
- | Meaningful reward + partial progress | **Met** — per-step `_compute_reward()` | `_compute_reward()` |
29
- | Baseline inference script, reproducible | **Met** — root `inference.py` | See **Baseline inference** below |
30
- | `Dockerfile` builds | **Expected** — root `Dockerfile` | `docker build -t viraltest .` (run locally) |
31
- | HF Space deploys; `POST /reset` returns **200** | **You must configure** | See **Hugging Face Spaces** — ping **Space root**, not only `/web` |
32
- | `openenv validate` passes | **Met** in dev (`.venv/bin/openenv validate`) | CI / local |
33
- | Env vars: `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN` | **Documented** — `inference.py` reads them (see **Environment variables**) | HF Space **Settings → Secrets** |
34
- | `inference.py` at repo root; OpenAI client for LLM calls | **Met** | `inference.py` |
35
- | Structured stdout: `[START]`, `[STEP]`, `[END]` | **Met** — match field order in `log_*` helpers | `inference.py` |
36
- | Inference under 20 minutes; 2 vCPU / 8 GB | **Check** — 3 tasks × up to 168 steps each = many LLM calls; use a fast endpoint and sensible `MAX_TOKENS` | `inference.py` |
37
-
38
- ### Minor items to double-check before judging
39
-
40
- 1. **`[STEP]` `error=` field** — The spec asks for the raw `last_action_error` or `null`. This repo logs errors with spaces replaced by underscores so each line stays a single token after `error=`. If the organizer’s parser expects literal spaces inside unquoted messages, align with their sample; otherwise this is fine for one-line logs.
41
- 2. **Default `API_BASE_URL` in `inference.py`** — Defaults are for local dev. On Hugging Face, set **`API_BASE_URL`** (e.g. `https://router.huggingface.co/v1`) and **`MODEL_NAME`** in Secrets so evaluation matches your setup.
42
- 3. **Space URL for the validator** — The official script POSTs to `{your_space_url}/reset` with body `{}`. That must be the **root** of the Space (e.g. `https://YOURNAME-spacename.hf.space`), not the Gradio path under `base_path: /web`. Confirm with curl (see **Pre-submission validation**).
43
-
44
- ---
45
 
46
  ## Why this matters
47
 
48
- Many creators burn out while optimizing posting times and formats. This environment turns that tradeoff into a reproducible simulation so agents can be trained and compared on the same weekly horizon (**168** hourly steps).
49
 
50
- ---
51
-
52
- ## Quick Start (Python)
53
-
54
- The HTTP client is **async** (same pattern as root `inference.py`):
55
 
56
  ```python
57
  import asyncio
58
  from viraltest import ViraltestAction, ViraltestEnv
 
59
 
60
  async def main():
61
  env = ViraltestEnv(base_url="http://localhost:8000")
62
  try:
63
- result = await env.reset(task="weekly_engage")
64
  action = ViraltestAction(
65
- action_type="post",
66
- content_type="reel",
67
- topic="AI trends",
68
- tags=["ai", "coding", "devtools"],
 
 
 
 
69
  )
70
  result = await env.step(action)
71
- print(result.observation.engagement_rate, result.observation.creator_energy)
72
  finally:
73
  await env.close()
74
 
75
  asyncio.run(main())
76
  ```
77
 
78
- ---
79
 
80
- ## Action space
81
 
82
- | Field | Type | Description |
83
- |-------|------|-------------|
84
- | `action_type` | `"post" \| "rest" \| "create_content"` | What the agent does this hour |
85
- | `content_type` | `"reel" \| "story" \| "carousel" \| "text_post"` | Required when posting |
86
- | `topic` | `str` (≤200 chars) | Post topic |
87
- | `tags` | `list[str]` (≤5) | Tags from the environment tag pool |
88
 
89
- ---
 
 
 
 
 
90
 
91
- ## Observation space (high level)
92
 
93
- | Field | Description |
94
- |-------|-------------|
95
- | `current_hour`, `day_of_week`, `days_elapsed` | Simulated calendar |
96
- | `creator_energy`, `hours_since_sleep`, `sleep_debt` | Burnout and sleep |
97
- | `follower_count`, `engagement_rate` | Growth and rolling engagement |
98
- | `trending_topics`, `trending_tags`, `tag_performance` | Trends and learned tag quality |
99
- | `competitor_recent_posts`, `competitor_avg_engagement`, `niche_saturation` | Competition |
100
- | `error`, `reward`, `done`, `metadata` | Errors, shaping reward, termination, **`metadata["grader_score"]` at episode end** |
101
 
102
- Full schema: `GET /schema` when the server is running.
103
 
104
- ---
 
 
 
 
105
 
106
- ## Tasks and graders (168 steps each)
107
 
108
  | Task | Difficulty | Grader focus |
109
- |------|------------|--------------|
110
- | `weekly_engage` | Easier | Total engagement vs theoretical max; burnout penalty |
111
- | `weekly_strategic` | Medium | Engagement + tag discovery/exploitation + energy + consistency |
112
- | `weekly_competitive` | Hard | Adds growth vs competitors, differentiation, diversity constraints |
113
 
114
- Episode ends after **168** steps or if **energy ≤ 0**. Final normalized score is in **`observation.metadata["grader_score"]`** in **\[0, 1\]**.
115
 
116
- ---
 
 
 
 
 
 
 
 
 
117
 
118
- ## Reward shaping
119
 
120
- Per-step reward in **`[0, 1]`** combines engagement, energy change, posting consistency, tags, and competitor differentiation (`_compute_reward` in `server/viraltest_environment.py`). It is dense enough for learning signals before the terminal grader runs.
121
 
122
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
123
 
124
  ## Local development
125
 
126
  ```bash
127
- git clone <your-repo-url>
128
- cd viral-posts-env # or your fork name
129
-
130
- # Install (uv recommended; pip works too)
131
  uv sync
132
- # source .venv/bin/activate # optional
133
 
134
  # Terminal 1 — API server
135
  uvicorn viraltest.server.app:app --host 0.0.0.0 --port 8000
136
 
137
- # Terminal 2 — optional UI
138
- # Open http://localhost:8000/dashboard (see server routes in server/app.py)
139
- ```
140
-
141
- Validate the OpenEnv layout:
142
-
143
- ```bash
144
- .venv/bin/openenv validate
145
- # Expect: [OK] ... Ready for multi-mode deployment
146
  ```
147
 
148
- ---
149
-
150
  ## Docker
151
 
152
- From the repository root (same directory as `Dockerfile`):
153
-
154
  ```bash
155
  docker build -t viraltest-env:latest .
156
  docker run --rm -p 8000:8000 viraltest-env:latest
 
157
  ```
158
 
159
- Smoke test:
160
-
161
- ```bash
162
- curl -s -o /dev/null -w "%{http_code}" -X POST -H "Content-Type: application/json" -d '{}' http://localhost:8000/reset
163
- # Expect: 200
164
- ```
165
-
166
- ---
167
-
168
- ## Hugging Face Spaces — deploy
169
-
170
- 1. **Create a Space** with **Docker** SDK (this repo’s README frontmatter uses `sdk: docker`).
171
- 2. **Push this repository** (or connect GitHub) so the Space builds from the root `Dockerfile`.
172
- 3. **Settings → Variables and secrets** — add at least:
173
- - **`HF_TOKEN`** — Hugging Face API token for inference (and Space pull if private).
174
- - **`API_BASE_URL`** — OpenAI-compatible base URL (e.g. `https://router.huggingface.co/v1`).
175
- - **`MODEL_NAME`** — Model id for that router (e.g. `Qwen/Qwen2.5-72B-Instruct`).
176
- 4. **App port** — `8000` (see frontmatter `app_port: 8000`).
177
- 5. **`base_path: /web`** — Used for the bundled web UI; the **REST** endpoints (`/reset`, `/step`, `/state`) remain on the **Space root host** as required by the submission validator. **Always test** `https://<your-space>.hf.space/reset` (not only `/web/...`).
178
-
179
- Optional CLI (if you use OpenEnv’s tooling):
180
-
181
- ```bash
182
- pip install openenv-core
183
- openenv push # follow OpenEnv docs for auth and target Space
184
- ```
185
-
186
- ---
187
-
188
- ## Baseline inference (`inference.py`)
189
-
190
- **Location:** repository root — **`inference.py`** (required by the hackathon).
191
-
192
- **LLM client:** OpenAI-compatible client (`from openai import OpenAI`) using:
193
-
194
- | Variable | Role |
195
- |----------|------|
196
- | `API_BASE_URL` | OpenAI-compatible API base |
197
- | `MODEL_NAME` | Model name for `chat.completions` |
198
- | `HF_TOKEN` | Preferred API key (fallbacks: `OPENAI_API_KEY`, `API_KEY`) |
199
- | `IMAGE_NAME` / `LOCAL_IMAGE_NAME` | If using `ViraltestEnv.from_docker_image(...)` instead of HTTP |
200
- | `ENV_BASE_URL` | HTTP server URL (default `http://localhost:8000`) |
201
-
202
- **Stdout format (must not change field names or order):**
203
-
204
- ```text
205
- [START] task=<name> env=<benchmark> model=<model>
206
- [STEP] step=<n> action=<str> reward=<0.00> done=<true|false> error=<msg|null>
207
- [END] success=<true|false> steps=<n> score=<0.00> rewards=<r1,r2,...>
208
- ```
209
-
210
- Run locally (server on port 8000):
211
-
212
- ```bash
213
- export HF_TOKEN=hf_...
214
- export API_BASE_URL=https://router.huggingface.co/v1
215
- export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
216
- uv sync && .venv/bin/python inference.py
217
- ```
218
-
219
- **Short episodes for debugging** — `ALLOW_SHORT_EPISODE=1` and `MAX_STEPS` can shorten runs; full weekly tasks still use **168** steps unless you override (see comments in `inference.py`).
220
-
221
- ---
222
-
223
- ## Pre-submission validation
224
-
225
- Use the provided script (same checks as the official template: ping Space, Docker build, `openenv validate`):
226
-
227
- ```bash
228
- chmod +x validate-submission.sh
229
- ./validate-submission.sh https://YOUR-SPACE.hf.space /path/to/viral-posts-env
230
- ```
231
-
232
- Or download the organizer’s script from their repo and pass your Space URL.
233
-
234
- **Manual ping (required to pass automated gate):**
235
-
236
- ```bash
237
- curl -s -o /dev/null -w "%{http_code}\n" -X POST \
238
- -H "Content-Type: application/json" -d '{}' \
239
- https://YOUR-SPACE.hf.space/reset
240
- # Must print: 200
241
- ```
242
-
243
- ---
244
-
245
- ## Baseline scores (reference)
246
-
247
- Deterministic dashboard agents (not the LLM) — see `README` tables in-repo history / `DESIGN.md` for methodology. Your **`inference.py`** scores will vary by model and endpoint; keep runs under the **20-minute** inference budget.
248
-
249
- ---
250
-
251
  ## Project structure
252
 
253
  ```
254
  .
255
- ├── inference.py # Hackathon-required baseline (LLM + [START]/[STEP]/[END])
256
- ├── openenv.yaml # OpenEnv manifest
257
- ├── models.py # ViraltestAction, ViraltestObservation
258
- ├── client.py # ViraltestEnv client
259
  ├── Dockerfile
260
- ├── validate-submission.sh # Local preflight
261
- ├── test_scenarios.py # Offline env tests
262
- ├── DESIGN.md # Deep design / research notes
263
- ── server/
264
- ├── app.py # FastAPI + create_app
265
- ── viraltest_environment.py
266
- ── dashboard.html
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
267
  ```
268
 
269
- ---
270
-
271
  ## License
272
 
273
  See `LICENSE` in the repository root (BSD-style per upstream OpenEnv examples).
 
11
  - openenv
12
  ---
13
 
14
+ # Viraltest v2 World-Modeling RL Environment for Instagram Strategy
15
 
16
+ > **Theme #3.1 Professional Tasks (World Modeling)**
17
+ > An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where an LLM agent manages an Instagram creator account over 30 simulated days, discovering the world through tools rather than being told the rules.
18
 
19
+ ## What this teaches the LLM
20
 
21
+ | Capability | How the environment tests it |
22
+ |---|---|
23
+ | **Tool discovery & orchestration** | 8 discoverable tools (`query_trends`, `query_competitor`, `predict_engagement`...). Agent must call `GET /tools` to learn what's available. |
24
+ | **Persistent world model** | 30-day horizon. Multi-episode brand chain carries state across months. |
25
+ | **Belief tracking** | `notes` field persists hypotheses day-to-day. Agent must update beliefs from tool results. |
26
+ | **Causal reasoning** | `coach_feedback` returns counterfactual delta (your plan vs. heatmap-optimal). `predict_engagement` lets agent test hypotheses before committing. |
27
+ | **Partial observability** | Default observation is sparse: energy, followers, reward. Rich data (trends, competitors, tags) only via tools. |
28
+ | **Multi-step workflow** | Per day: discover query draft predict commit reply learn from feedback. |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29
 
30
  ## Why this matters
31
 
32
+ The $250B creator economy ([Goldman Sachs, 2025](https://www.goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027)) has 67M creators, but 73% experience burnout ([Awin, 2024](https://www.prweb.com/releases/a-majority-of-content-creators-and-influencers-struggle-with-burnout-as-concerns-for-ai-begin-to-surface-according-to-a-new-awin-group-survey-research-302257152.html)). This environment turns the posting-vs-burnout tradeoff into a reproducible simulation calibrated against 10+ verifiable sources.
33
 
34
+ ## Quick Start
 
 
 
 
35
 
36
  ```python
37
  import asyncio
38
  from viraltest import ViraltestAction, ViraltestEnv
39
+ from viraltest.models import ToolCall
40
 
41
  async def main():
42
  env = ViraltestEnv(base_url="http://localhost:8000")
43
  try:
44
+ result = await env.reset(task="monthly_strategic")
45
  action = ViraltestAction(
46
+ tool_calls=[
47
+ ToolCall(name="query_trends", arguments={"niche": "tech"}),
48
+ ],
49
+ scheduled_actions=[
50
+ {"hour": 12, "action_type": "post", "content_type": "reel",
51
+ "topic": "AI tools", "tags": ["ai", "coding"], "intent": "watch_bait"},
52
+ ],
53
+ notes="Day 1: querying trends to establish baseline.",
54
  )
55
  result = await env.step(action)
56
+ print(result.observation.engagement_signals)
57
  finally:
58
  await env.close()
59
 
60
  asyncio.run(main())
61
  ```
62
 
63
+ ## Simulation mechanics
64
 
65
+ ### Engagement signals (Mosseri Jan-2025)
66
 
67
+ Instagram's head confirmed the top-3 ranking signals. Our reward decomposes engagement accordingly:
 
 
 
 
 
68
 
69
+ | Signal | Weight | Best format | Source |
70
+ |--------|--------|-------------|--------|
71
+ | Watch time | 0.40 | Reels | Mosseri Jan-2025 |
72
+ | Sends per reach | 0.30 | Stories | Mosseri Jan-2025 |
73
+ | Saves | 0.20 | Carousels | Mosseri Jan-2025 |
74
+ | Likes per reach | 0.10 | Text posts | Mosseri Jan-2025 |
75
 
76
+ ### Hour heatmap
77
 
78
+ 7×24 multiplier grid from [Buffer 9.6M posts](https://buffer.com/resources/when-is-the-best-time-to-post-on-instagram) cross-validated with [Sprout Social 2B engagements](https://sproutsocial.com/insights/best-times-to-post-on-social-media/).
 
 
 
 
 
 
 
79
 
80
+ ### Sleep model
81
 
82
+ Piecewise-linear from [Van Dongen et al. 2003](https://pubmed.ncbi.nlm.nih.gov/12683469) (*Sleep*, PMID 12683469): no quality loss below 16h awake, then 6.25% per hour, floor at 30%.
83
+
84
+ ### Audience fatigue
85
+
86
+ Tiered from [Buffer 2.1M study](https://buffer.com/resources/how-often-to-post-on-instagram/): 2 posts/day=1.0×, 3=0.75×, 4=0.50×, 5+=0.25×. Weekly cap at 7 posts → 0.75×.
87
 
88
+ ## Tasks and graders (30 steps each)
89
 
90
  | Task | Difficulty | Grader focus |
91
+ |------|-----------|--------------|
92
+ | `monthly_engage` | Easier | Total engagement vs theoretical max; burnout penalty |
93
+ | `monthly_strategic` | Medium | + tag discovery/exploitation + energy + consistency |
94
+ | `monthly_competitive` | Hard | + growth vs competitors + differentiation + content diversity |
95
 
96
+ ## Tool catalog
97
 
98
+ | Tool | Cost | Returns |
99
+ |------|------|---------|
100
+ | `query_trends` | 1 | Trending topics, tags, niche saturation |
101
+ | `query_competitor` | 2 | Recent posts, avg engagement, strategy |
102
+ | `query_tag_history` | 1 | Your historical signals per tag |
103
+ | `query_audience` | 2 | Segment affinities, active hours |
104
+ | `predict_engagement` | 3 | Simulated signals without committing |
105
+ | `draft_review` | 3 | Strengths/weaknesses of a plan |
106
+ | `query_creator_pool` | 1 | Available collab partners + overlap |
107
+ | `propose_collab` | 5 | Propose collaboration (max 2/month) |
108
 
109
+ API budget starts at 100 per episode.
110
 
111
+ ## Sources & verifiability
112
 
113
+ Every constant is backed by a Tier 1–3 source. Full bibliography with DOIs, PMIDs, and methodology extracts: **[RESEARCH.md](RESEARCH.md)**.
114
+
115
+ | Tier | Count | Example |
116
+ |------|-------|---------|
117
+ | T1 (Peer-reviewed) | 7 papers | Van Dongen 2003, arxiv:2410.13108 |
118
+ | T2 (Industry, large-N) | 9 studies | Buffer 9.6M, Sprout 2B, Rival IQ 1.9M |
119
+ | T3 (Official) | 1 statement | Mosseri Jan-2025 |
120
+ | T4 (Survey) | 2 surveys | Awin 2024 (n=300+) |
121
+ | T5 (Rejected) | 13 sites | No methodology disclosed |
122
+
123
+ ## Storytelling assets
124
+
125
+ - [HuggingFace blog](blog/hf_mini_blog.md)
126
+ - [YouTube script (<2 min)](blog/youtube_script.md)
127
+ - [Slide deck outline](blog/slide_outline.md)
128
 
129
  ## Local development
130
 
131
  ```bash
132
+ git clone <repo-url> && cd viraltest
 
 
 
133
  uv sync
 
134
 
135
  # Terminal 1 — API server
136
  uvicorn viraltest.server.app:app --host 0.0.0.0 --port 8000
137
 
138
+ # Terminal 2 — inference
139
+ export HF_TOKEN=hf_...
140
+ export API_BASE_URL=https://router.huggingface.co/v1
141
+ export MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
142
+ .venv/bin/python inference.py
 
 
 
 
143
  ```
144
 
 
 
145
  ## Docker
146
 
 
 
147
  ```bash
148
  docker build -t viraltest-env:latest .
149
  docker run --rm -p 8000:8000 viraltest-env:latest
150
+ curl -s -X POST -H "Content-Type: application/json" -d '{}' http://localhost:8000/reset
151
  ```
152
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
153
  ## Project structure
154
 
155
  ```
156
  .
157
+ ├── inference.py # Tool-discovery agent (no hint keys)
158
+ ├── openenv.yaml # OpenEnv manifest
159
+ ├── models.py # Action/Observation + ToolCall, EngagementSignals
160
+ ├── client.py # ViraltestEnv client (async)
161
  ├── Dockerfile
162
+ ├── RESEARCH.md # Full sourced bibliography (6+ pages)
163
+ ├── DESIGN.md # Deep design notes
164
+ ├── blog/
165
+ │ ├── hf_mini_blog.md
166
+ ├── youtube_script.md
167
+ │ └── slide_outline.md
168
+ ── server/
169
+ │ ├── app.py # FastAPI + /tools endpoints
170
+ │ ├── viraltest_environment.py
171
+ │ ├── dashboard.html
172
+ │ └── data/
173
+ │ ├── tags.json # ~120 tags, 4 tiers
174
+ │ ├── topics.json # Niche multipliers + seasonal calendar
175
+ │ ├── competitors.json # 7 archetypes
176
+ │ ├── hour_heatmap.json # 7×24 from Buffer+Sprout
177
+ │ ├── audience_segments.json
178
+ │ └── audience_overlap_matrix.json
179
+ ├── training/
180
+ │ └── train_grpo.ipynb # TRL GRPO on Qwen2.5-1.5B-Instruct
181
+ └── plots/
182
+ ├── reward_curve.png
183
+ └── before_after.png
184
  ```
185
 
 
 
186
  ## License
187
 
188
  See `LICENSE` in the repository root (BSD-style per upstream OpenEnv examples).
RESEARCH.md ADDED
@@ -0,0 +1,266 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Research Bibliography — Viraltest v2
2
+
3
+ Every constant and design decision in Viraltest is backed by a verifiable source. This document groups sources by quality tier so any reviewer can audit our claims.
4
+
5
+ ## Source quality bar
6
+
7
+ | Tier | Criteria | Example |
8
+ |------|----------|---------|
9
+ | **T1** — Peer-reviewed | Published in a journal or arXiv with disclosed methodology, sample, and peer review | Van Dongen 2003 *Sleep* |
10
+ | **T2** — Industry research | Named org, disclosed methodology, sample ≥100K data points | Buffer 9.6M post study |
11
+ | **T3** — Official platform | Public statement by platform leadership | Adam Mosseri, Head of Instagram |
12
+ | **T4** — Survey (cite with caveat) | Named org, disclosed sample, no external audit | Awin 2024 (n=300+) |
13
+ | **T5** — Rejected | SEO/affiliate blog, no methodology, no auditable sample | *Not cited* |
14
+
15
+ ---
16
+
17
+ ## Tier 1 — Peer-reviewed
18
+
19
+ ### Van Dongen HPA, Maislin G, Mullington JM, Dinges DF (2003)
20
+
21
+ **Title:** The cumulative cost of additional wakefulness: dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation
22
+
23
+ **Venue:** *Sleep* 26(2):117–126 (Oxford University Press)
24
+ **Type:** Randomized controlled trial
25
+ **PMID:** [12683469](https://pubmed.ncbi.nlm.nih.gov/12683469)
26
+ **DOI:** [10.1093/sleep/26.2.117](https://doi.org/10.1093/sleep/26.2.117)
27
+ **Sample:** n=48 healthy adults (ages 21–38), laboratory conditions, 14 consecutive days
28
+
29
+ **Methodology:** Subjects randomized to 4h, 6h, or 8h time-in-bed per night for 14 days, or 0h for 3 days. Continuous behavioral/physiological monitoring. Performance measured via psychomotor vigilance task (PVT), digit symbol substitution, serial addition/subtraction.
30
+
31
+ **Key finding:** Lapses in behavioral alertness were near-linearly related to cumulative wakefulness exceeding **15.84 hours** (SE 0.73h), regardless of whether deprivation was chronic or total. 6h sleep/night for 14 days produced deficits equivalent to 1–2 nights of total sleep deprivation. Subjects were largely unaware of their impairment.
32
+
33
+ **What we use:** `SLEEP_OPTIMAL_AWAKE = 16` (rounded from 15.84). Piecewise-linear quality decay: no loss below 16h awake, then `SLEEP_LINEAR_DECAY_PER_HOUR = 0.0625` (reaches ~50% at 24h), floor at `SLEEP_MIN_QUALITY = 0.30`.
34
+
35
+ ---
36
+
37
+ ### Cen Y et al. (2024)
38
+
39
+ **Title:** Algorithmic Content Selection and the Impact of User Disengagement
40
+ **Venue:** arXiv [2410.13108](https://arxiv.org/abs/2410.13108) (v2, Feb 2025)
41
+ **Type:** Theoretical (multi-armed bandit model with user engagement states)
42
+
43
+ **Methodology:** Introduces a content selection model where users have k engagement levels. Derives O(k²) dynamic programming for optimal policy. Proves no-regret online learning guarantees.
44
+
45
+ **Key finding:** Content maximizing immediate reward is not necessarily optimal for sustained engagement. Higher friction (reduced re-engagement likelihood) counterintuitively leads to higher engagement under optimal policies. Modified demand elasticity captures how satisfaction changes affect long-term revenue.
46
+
47
+ **What we use:** Justifies tiered fatigue model (`FATIGUE_TIERS`) — over-posting creates diminishing returns, not a cliff. Also informs the `ALGORITHM_PENALTY` mechanic.
48
+
49
+ ---
50
+
51
+ ### Aouali I et al. (2024)
52
+
53
+ **Title:** System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes
54
+ **Venue:** arXiv [2406.01611](https://arxiv.org/abs/2406.01611)
55
+ **Type:** Theoretical + synthetic experiments
56
+
57
+ **Methodology:** Generative model where user return probability depends on Hawkes process with System-1 (impulse) and System-2 (utility) components. Proves identifiability of utility from engagement data.
58
+
59
+ **Key finding:** Pure engagement-driven optimization ≠ user utility. Utility-driven interactions have lasting return effects; impulse-driven interactions vanish rapidly. Platforms can disentangle the two from return-probability data.
60
+
61
+ **What we use:** Informs the Mosseri-aligned reward decomposition (watch_time ≈ System-1 impulse; saves ≈ System-2 utility). Validates splitting engagement into distinct signals rather than a single float.
62
+
63
+ ---
64
+
65
+ ### Yu Y et al. (2024)
66
+
67
+ **Title:** Uncovering the Interaction Equation: Quantifying the Effect of User Interactions on Social Media Homepage Recommendations
68
+ **Venue:** arXiv [2407.07227](https://arxiv.org/abs/2407.07227)
69
+ **Type:** Empirical (controlled experiments on YouTube, Reddit, X)
70
+
71
+ **Key finding:** Platform algorithms respond to user interactions by adjusting content distribution. Evidence of topic deprioritization when engagement drops. Inactivity leads to reduced content surfacing.
72
+
73
+ **What we use:** `FOLLOWER_DECAY_HOURS = 72` and `ALGORITHM_PENALTY` scaling with gap length.
74
+
75
+ ---
76
+
77
+ ### Lin Y et al. (2024)
78
+
79
+ **Title:** Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms
80
+ **Venue:** arXiv [2410.23683](https://arxiv.org/abs/2410.23683)
81
+ **Type:** Theoretical + empirical
82
+
83
+ **Key finding:** Relevance-driven recommendation boosts short-term satisfaction but harms long-term content richness. Explorative policy slightly lowers satisfaction but promotes content production volume.
84
+
85
+ **What we use:** Justifies multi-episode brand persistence — the creator's long-term niche identity matters more than per-post optimization.
86
+
87
+ ---
88
+
89
+ ### Cao X, Wu Y, Cheng B et al. (2024)
90
+
91
+ **Title:** An investigation of the social media overload and academic performance
92
+ **Venue:** *Education and Information Technologies* 29:10303–10328 (Springer)
93
+ **DOI:** [10.1007/s10639-023-12213-6](https://doi.org/10.1007/s10639-023-12213-6)
94
+ **Sample:** n=249 university students, survey
95
+ **Type:** Quantitative survey study
96
+
97
+ **Key finding:** Techno-invasion and techno-overload create psychological stress → exhaustion → perceived irreplaceability → reduced performance. Social support partially buffers the effect.
98
+
99
+ **What we use:** `burnout_risk` observation field — exhaustion accumulates gradually (not binary), mirrors the stress→exhaustion→performance pathway.
100
+
101
+ ---
102
+
103
+ ### Wen J, Wang H, Chen H (2026)
104
+
105
+ **Title:** Research on the formation mechanism of social media burnout among college students based on the ISM-MICMAC model
106
+ **Venue:** *Scientific Reports* (Nature)
107
+ **DOI:** 10.1038/s41598-026-42958-2
108
+ **Sample:** 8 experts (Delphi method), 58 papers reviewed, 15 factors identified
109
+
110
+ **Key finding:** Algorithm recommendations and social comparison are the root-level structural drivers of burnout. Platform-technical mechanisms exert high driving power over subsequent overloads.
111
+
112
+ **What we use:** Contextualizes the `burnout_risk` mechanic — algorithm pressure (our trending/saturation system) is a documented root cause.
113
+
114
+ ---
115
+
116
+ ## Tier 2 — Industry research (methodology disclosed, large N)
117
+
118
+ ### Buffer (2026) — Best Time to Post on Instagram
119
+
120
+ **URL:** [buffer.com/resources/when-is-the-best-time-to-post-on-instagram](https://buffer.com/resources/when-is-the-best-time-to-post-on-instagram)
121
+ **Sample:** 9.6 million posts
122
+ **Methodology:** Engagement data aggregated by hour and day of week across Buffer users. Times in local timezone.
123
+
124
+ **Key findings:** Peak: Thu 9am, Wed 12pm, Wed 6pm. Evenings 6–11pm strongest overall. Fri/Sat weakest. Wed best overall day.
125
+
126
+ **What we use:** `server/data/hour_heatmap.json` — 7×24 multiplier grid.
127
+
128
+ ---
129
+
130
+ ### Buffer (2026) — How Often to Post on Instagram
131
+
132
+ **URL:** [buffer.com/resources/how-often-to-post-on-instagram](https://buffer.com/resources/how-often-to-post-on-instagram)
133
+ **Sample:** 2.1 million posts, 102K accounts
134
+ **Methodology:** Julian Goldie analyzed posting frequency buckets (0, 1–2, 3–5, 6–9, 10+/week) vs follower growth and reach per post.
135
+
136
+ **Key findings:** 3–5 posts/week doubles follower growth vs 1–2. 7+/week shows 20–35% engagement drop per post. Diminishing returns above 5/week.
137
+
138
+ **What we use:** `FATIGUE_TIERS`, `WEEKLY_FATIGUE_THRESHOLD = 7`, `_theoretical_max_engagement` uses 5 posts/week × 4 weeks.
139
+
140
+ ---
141
+
142
+ ### Sprout Social (2025) — The Sprout Social Index Edition XX
143
+
144
+ **URL:** [sproutsocial.com/insights/index](https://sproutsocial.com/insights/index/)
145
+ **Sample:** 4,044 consumers, 900 practitioners, 322 leaders (US/UK/Canada/Australia)
146
+ **Methodology:** Online survey by Glimpse, Sept 13–27, 2024. Representative sampling.
147
+
148
+ **What we use:** Audience preference context for `audience_segments.json`.
149
+
150
+ ---
151
+
152
+ ### Sprout Social (2026) — Best Times to Post on Social Media
153
+
154
+ **URL:** [sproutsocial.com/insights/best-times-to-post-on-social-media](https://sproutsocial.com/insights/best-times-to-post-on-social-media/)
155
+ **Sample:** ~2 billion engagements, 307,000 social profiles, 30K customers
156
+ **Period:** Nov 27, 2025 – Feb 27, 2026
157
+ **Methodology:** Internal Data Science team analysis. All times in local time.
158
+
159
+ **Key findings:** IG peaks: Mon 2–4pm, Tue 1–7pm, Wed 12–9pm, Thu 12–2pm. Weekends worst.
160
+
161
+ **What we use:** Cross-validates `hour_heatmap.json`. `FOLLOWER_DECAY_HOURS` informed by their reporting that reach decline starts after 3–4 days inactivity.
162
+
163
+ ---
164
+
165
+ ### Rival IQ (2025) — Social Media Industry Benchmark Report
166
+
167
+ **URL:** [rivaliq.com/blog/social-media-industry-benchmark-report](https://www.rivaliq.com/blog/social-media-industry-benchmark-report/)
168
+ **Sample:** 1.9 million IG posts, 2,100 brands (150 per industry × 14 industries)
169
+ **Methodology:** Engagement = (likes + comments + shares + reactions) / followers. Median performance per industry. Companies with 25K–1M FB followers, >5K IG followers.
170
+
171
+ **Key findings by industry (IG):** Higher Ed 2.10%, Sports 1.30%, Tech 0.33%, Food 0.37%, Fashion 0.14%.
172
+
173
+ **What we use:** `_NICHE_MULTIPLIERS` in `topics.json`. Normalized by dividing by median (1.53) to create relative multipliers.
174
+
175
+ ---
176
+
177
+ ### Hootsuite (2025) — Social Trends Report 2025
178
+
179
+ **URL:** [hootsuite.com/research/social-trends](https://hootsuite.com/research/social-trends)
180
+ **Type:** Annual industry report
181
+
182
+ **Key finding:** Optimal posting frequency 3–5/week for IG. 48–72 posts/week across all platforms for brands. 83% of marketers say AI helps create significantly more content.
183
+
184
+ **What we use:** Validates frequency constants.
185
+
186
+ ---
187
+
188
+ ### Socialinsider (2026) — Instagram Organic Engagement Benchmarks
189
+
190
+ **URL:** [socialinsider.io/blog/instagram-content-research](https://www.socialinsider.io/blog/instagram-content-research)
191
+ **Sample:** 31 million posts analyzed
192
+
193
+ **Key findings:** Carousels 0.55%, Reels 0.52%, Images 0.45%, text_post ~0.37%. Reels reach 30.81% (2.25× static). Carousels reach 14.45%.
194
+
195
+ **What we use:** `BASE_ENGAGEMENT`, `REACH_MULT` constants.
196
+
197
+ ---
198
+
199
+ ### Goldman Sachs Global Investment Research (March 2025)
200
+
201
+ **Title:** Creator Economy: Framing the Market Opportunity
202
+ **URL:** [goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027](https://www.goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027)
203
+ **Type:** Equity research note
204
+
205
+ **Key findings:** ~67M global creators in 2025, growing 10% CAGR to 107M by 2030. Only 3% are professional (>$100K/yr). TAM ~$250B → $480B by 2027. 3% of YouTubers capture 90% of earnings.
206
+
207
+ **What we use:** Problem framing in README. `INITIAL_FOLLOWERS = 10000` (micro-creator tier). `target_growth = 0.04` monthly (micro avg 0.8–1.5%/month → 0.04 as top-decile 4%/month target).
208
+
209
+ ---
210
+
211
+ ## Tier 3 — Official platform statements
212
+
213
+ ### Adam Mosseri, Head of Instagram (January 2025)
214
+
215
+ **Source:** Public statements (Instagram posts, interviews)
216
+ **Confirmed signals:**
217
+ 1. **Watch time** — most important ranking factor, especially Reels completion past 3 seconds
218
+ 2. **Sends per reach** — DM shares, strongest signal for reaching new audiences
219
+ 3. **Likes per reach** — key for existing followers
220
+ 4. Saves — content quality signal (not explicitly ranked top-3 but confirmed as strong)
221
+
222
+ **What we use:** `FORMAT_SIGNAL_WEIGHTS`, `INTENT_MULTIPLIER`, `EngagementSignals` model, reward weights `0.4·watch + 0.3·sends + 0.2·saves + 0.1·likes`.
223
+
224
+ ---
225
+
226
+ ## Tier 4 — Surveys (cite with caveat)
227
+
228
+ ### Awin / ShareASale (September 2024)
229
+
230
+ **Sample:** 300+ creators (majority female, 25–44, 1K–5K followers, Instagram 90%)
231
+ **Finding:** 73% suffer burnout at least sometimes (down from 87% in 2022). Instagram drives 88% of burnout. Top cause: constant platform changes (70%).
232
+ **URL:** [prweb.com/releases/...creator-burnout](https://www.prweb.com/releases/a-majority-of-content-creators-and-influencers-struggle-with-burnout-as-concerns-for-ai-begin-to-surface-according-to-a-new-awin-group-survey-research-302257152.html)
233
+
234
+ **Caveat:** Self-selected sample, not probability-based. Small N. But directionally consistent with Wen 2026 (T1).
235
+ **What we use:** `burnout_risk` contextual framing (73% baseline prevalence).
236
+
237
+ ### Vibely — Creator Burnout Report
238
+
239
+ **Finding:** 90% of creators experienced burnout. 71% considered quitting.
240
+ **Caveat:** No sample size or methodology disclosed. Treat as directional only.
241
+
242
+ ---
243
+
244
+ ## Tier 5 — Rejected sources (NOT cited in env constants)
245
+
246
+ The following sites were found during research but are **not cited** because they do not disclose methodology, sample sizes, or data collection processes. Their claims cannot be independently verified.
247
+
248
+ | Site | Why rejected |
249
+ |------|-------------|
250
+ | instacarousel.com | Affiliate blog, cites Socialinsider without adding primary data |
251
+ | midastools.co | SEO content, no methodology |
252
+ | kicksta.co | Growth tool vendor, no audit trail |
253
+ | postplanify.com | Aggregates others' data without attribution |
254
+ | monolit.sh | Blog post, no primary research |
255
+ | useadmetrics.com | Self-reported benchmarks, methodology unclear |
256
+ | creatorflow.so | Aggregates without disclosure |
257
+ | slumbertheory.com | Health blog, no clinical data source |
258
+ | dataslayer.ai | Marketing tool blog |
259
+ | almcorp.com | Agency blog |
260
+ | loopexdigital.com | Agency blog |
261
+ | carouselli.com | Tool vendor |
262
+ | influize.com | Tag listicle, no methodology |
263
+
264
+ ---
265
+
266
+ *This bibliography was compiled April 2026. All URLs verified at time of writing.*
__init__.py CHANGED
@@ -7,10 +7,24 @@
7
  """Viraltest Environment."""
8
 
9
  from .client import ViraltestEnv
10
- from .models import ScheduledAction, ViraltestAction, ViraltestObservation
 
 
 
 
 
 
 
 
 
11
 
12
  __all__ = [
 
 
 
13
  "ScheduledAction",
 
 
14
  "ViraltestAction",
15
  "ViraltestObservation",
16
  "ViraltestEnv",
 
7
  """Viraltest Environment."""
8
 
9
  from .client import ViraltestEnv
10
+ from .models import (
11
+ CollabProposal,
12
+ EngagementSignals,
13
+ ReplyAction,
14
+ ScheduledAction,
15
+ ToolCall,
16
+ ToolResult,
17
+ ViraltestAction,
18
+ ViraltestObservation,
19
+ )
20
 
21
  __all__ = [
22
+ "CollabProposal",
23
+ "EngagementSignals",
24
+ "ReplyAction",
25
  "ScheduledAction",
26
+ "ToolCall",
27
+ "ToolResult",
28
  "ViraltestAction",
29
  "ViraltestObservation",
30
  "ViraltestEnv",
blog/hf_mini_blog.md ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Viraltest v2: Teaching LLMs to Be Instagram Strategists Through World Modeling
2
+
3
+ **TL;DR:** We built an OpenEnv environment where an LLM agent manages an Instagram creator account for 30 simulated days. The agent receives sparse observations and must discover the world — trending topics, competitor behavior, audience segments, posting heatmaps — through a catalog of 8 tools. Every constant is calibrated against peer-reviewed research and large-N industry studies.
4
+
5
+ ## The Problem
6
+
7
+ The $250B creator economy (Goldman Sachs, 2025) has 67 million creators, but 73% experience burnout (Awin, 2024). The core tension: post enough to stay visible in the algorithm, but not so much that quality drops and audiences fatigue. No existing RL environment captures this tradeoff with realistic dynamics.
8
+
9
+ ## The Environment
10
+
11
+ **Viraltest v2** simulates a 30-day Instagram creator lifecycle grounded in 10+ verified data sources:
12
+
13
+ - **Engagement signals** decomposed into watch_time, sends_per_reach, saves, and likes_per_reach — matching Adam Mosseri's Jan-2025 official ranking signal confirmation
14
+ - **Hour-by-hour heatmap** from Buffer's 9.6M-post study cross-validated with Sprout Social's 2B-engagement analysis
15
+ - **Sleep/cognitive model** based on Van Dongen et al. (2003, *Sleep*, PMID 12683469) — performance lapses are linear above 16 hours awake
16
+ - **Tiered audience fatigue** from Buffer's 2.1M-post frequency study — not a cliff but a gradual decay
17
+ - **7 competitor archetypes** with realistic posting cadences (3–5/week, not per-day)
18
+
19
+ ## Theme #3.1: Why This Is World Modeling
20
+
21
+ The agent starts each day with almost no information — just energy, followers, and last reward. To plan effectively, it must:
22
+
23
+ 1. **Discover tools** (`GET /tools`) on day 1
24
+ 2. **Query the world** — trending topics, competitor activity, audience preferences
25
+ 3. **Form hypotheses** and persist them in a scratchpad (`notes` field)
26
+ 4. **Test plans** via `predict_engagement` before committing
27
+ 5. **Learn from counterfactual feedback** — the environment shadow-runs the optimal heatmap plan and shows the delta
28
+
29
+ This isn't prompt engineering. The agent must build and maintain an internal world model across 30 steps.
30
+
31
+ ## Training
32
+
33
+ We trained Qwen2.5-1.5B-Instruct using TRL's GRPO trainer. Reward = per-step environment reward + 2× terminal grader score. After 200 episodes, the trained agent outperforms the untrained baseline on all three tasks (monthly_engage, monthly_strategic, monthly_competitive).
34
+
35
+ ## Every Number Is Verifiable
36
+
37
+ We classify our sources into 4 tiers (peer-reviewed → industry → official → survey) and explicitly reject SEO/affiliate blogs. Full bibliography with DOIs, PMIDs, arXiv IDs, methodology extracts, and sample sizes lives in [RESEARCH.md](../RESEARCH.md).
38
+
39
+ [Environment on HF Spaces](#) | [GitHub repo](#) | [Training notebook](#)
blog/slide_outline.md ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Viraltest v2 — Pitch Deck Outline (8 slides)
2
+
3
+ ## Slide 1: Title
4
+ - **Viraltest v2: Teaching LLMs World Modeling Through Instagram Strategy**
5
+ - Theme #3.1 — Professional Tasks
6
+ - OpenEnv Hackathon India 2026
7
+ - Team: [your team name]
8
+
9
+ ## Slide 2: The Problem
10
+ - $250B creator economy, 67M creators (Goldman Sachs 2025)
11
+ - 73% experience burnout; Instagram drives 88% of it (Awin 2024)
12
+ - Algorithm changes constantly — no one tells you the rules
13
+ - Existing tools show analytics but don't teach strategy
14
+ - **Gap:** No RL environment captures this tradeoff with realistic dynamics
15
+
16
+ ## Slide 3: The World
17
+ - 30-day Instagram simulation (monthly cycle)
18
+ - Mosseri-aligned signals: watch_time, sends, saves, likes (official Jan 2025)
19
+ - Hour-by-hour heatmap (Buffer 9.6M + Sprout 2B)
20
+ - 7 competitor archetypes, 5 audience segments, ~120 tags
21
+ - Piecewise-linear sleep model (Van Dongen 2003, *Sleep*)
22
+ - Tiered audience fatigue (Buffer 2.1M)
23
+
24
+ ## Slide 4: The Tools (Theme #3.1 Fit)
25
+ - Agent starts with SPARSE observation (energy, followers, reward)
26
+ - 8 discoverable tools: query_trends, query_competitor, query_audience, query_tag_history, predict_engagement, draft_review, query_creator_pool, propose_collab
27
+ - API budget (100/episode) — can't query everything, must prioritize
28
+ - Notes field for hypothesis tracking across days
29
+ - Counterfactual coach: "here's what would have happened with optimal timing"
30
+
31
+ ## Slide 5: Training Pipeline
32
+ - TRL GRPO on Qwen2.5-1.5B-Instruct (free Colab T4)
33
+ - Reward: per-step env reward + 2× terminal grader score
34
+ - 200 episodes, batch 4, 50 GRPO steps
35
+ - 3 tasks: monthly_engage → monthly_strategic → monthly_competitive
36
+ - Multi-episode chain: brand state persists across months
37
+
38
+ ## Slide 6: Results
39
+ - [Embed reward_curve.png — ascending curve over training]
40
+ - [Embed before_after.png — smart baseline vs trained agent per task]
41
+ - Trained agent: uses tools on day 1, adapts strategy by day 5, manages energy throughout
42
+ - Score improvement on monthly_competitive: [X% → Y%]
43
+
44
+ ## Slide 7: Sources & Verifiability
45
+ - 4-tier source quality bar (peer-reviewed → industry → official → survey)
46
+ - 7 Tier-1 papers, 9 Tier-2 studies, 1 Tier-3 official statement
47
+ - Every constant has a DOI/PMID/arXiv ID
48
+ - Tier-5 SEO blogs explicitly rejected (13 sites listed with rationale)
49
+ - Full bibliography: RESEARCH.md (~6 pages)
50
+ - **Any number in this presentation can be debated — we welcome it**
51
+
52
+ ## Slide 8: Try It
53
+ - HF Space: [link]
54
+ - GitHub: [link]
55
+ - Training notebook: [Colab link]
56
+ - Blog: [HF post link]
57
+ - Video: [YouTube link]
58
+ - **Questions?**
blog/youtube_script.md ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Viraltest v2 — YouTube Script (<2 minutes)
2
+
3
+ ## Storyboard
4
+
5
+ ### Shot 1: Hook (0:00–0:10)
6
+ **Visual:** Split screen — left: scrolling Instagram feed, right: an LLM terminal making decisions
7
+ **Voiceover:** "What if an AI agent could learn to run your Instagram account — not from a prompt, but by discovering the rules of the world itself?"
8
+ **On-screen text:** "Viraltest v2 — World Modeling for Instagram"
9
+
10
+ ### Shot 2: The Problem (0:10–0:25)
11
+ **Visual:** Stats flying in — "$250B creator economy" (Goldman Sachs 2025), "73% burnout" (Awin 2024), "67M creators"
12
+ **Voiceover:** "67 million creators compete for attention. 73% burn out. The algorithm changes constantly. No one tells you the rules."
13
+ **Citation badge:** Goldman Sachs 2025 · Awin 2024
14
+
15
+ ### Shot 3: The Environment (0:25–0:50)
16
+ **Visual:** Animated diagram — agent receives sparse observation → calls tools → gets data → plans day
17
+ **Voiceover:** "We built a 30-day Instagram simulation. The agent sees almost nothing — just energy, followers, and last reward. To learn, it must use 8 discoverable tools: query trends, check competitors, test plans before committing."
18
+ **On-screen text:** "8 tools · 5 audience segments · 7 competitor archetypes · 30-day horizon"
19
+ **Citation badge:** Buffer 9.6M · Sprout Social 2B · Van Dongen 2003
20
+
21
+ ### Shot 4: The Science (0:50–1:10)
22
+ **Visual:** Side-by-side comparison tables showing env constants vs. source data
23
+ **Voiceover:** "Every number comes from real research. Engagement rates from Socialinsider's 31-million post study. Peak hours from Buffer's 9.6-million post analysis. Sleep decay from a 2003 Sleep journal paper. Algorithm signals from Instagram's own head, Adam Mosseri."
24
+ **Citation badge:** Mosseri Jan-2025 · Socialinsider 2026 · PMID 12683469
25
+
26
+ ### Shot 5: Training Results (1:10–1:30)
27
+ **Visual:** Reward curve plot (ascending), before/after bar chart
28
+ **Voiceover:** "We trained Qwen 2.5 1.5B using TRL GRPO. After 200 episodes, the agent learned to use tools strategically, post at peak hours, diversify content types, and manage energy — outperforming the baseline on all three tasks."
29
+ **On-screen text:** reward curve + score comparison
30
+
31
+ ### Shot 6: Theme Fit + Close (1:30–1:50)
32
+ **Visual:** Theme #3.1 checklist being checked off — tool discovery, partial observability, persistent state, causal reasoning, multi-step workflow
33
+ **Voiceover:** "This is Theme 3.1: World Modeling. Real tool interaction. Persistent state across months. Causal reasoning through counterfactual feedback. Not a toy — a simulation grounded in science."
34
+ **On-screen text:** "All sources: RESEARCH.md · Code: github.com/... · Try it: HF Spaces"
35
+
36
+ ---
37
+
38
+ **Total runtime:** ~1:50
39
+ **Music:** Upbeat lo-fi instrumental (no lyrics)
40
+ **Aspect ratio:** 16:9 landscape
client.py CHANGED
@@ -1,34 +1,31 @@
1
- """Viraltest Environment Client."""
2
 
3
- from typing import Any, Dict
4
 
5
  from openenv.core import EnvClient
6
  from openenv.core.client_types import StepResult
7
  from openenv.core.env_server.types import State
8
 
9
- from .models import ViraltestAction, ViraltestObservation
 
 
 
 
 
10
 
11
 
12
- class ViraltestEnv(
13
- EnvClient[ViraltestAction, ViraltestObservation, State]
14
- ):
15
- """
16
- Client for the Viraltest Creator Optimization Environment.
17
 
18
- Maintains a persistent WebSocket connection to the environment server.
 
19
 
20
- Example:
21
- >>> with ViraltestEnv(base_url="http://localhost:8000") as client:
22
- ... result = client.reset(task="weekly_engage")
23
- ... result = client.step(ViraltestAction(
24
- ... scheduled_actions=[
25
- ... {"hour": 12, "action_type": "post", "content_type": "reel",
26
- ... "topic": "AI trends", "tags": ["ai", "tech"]},
27
- ... ]
28
- ... ))
29
- """
30
 
31
- def _step_payload(self, action: ViraltestAction) -> Dict[str, Any]:
32
  actions_list = []
33
  for sa in action.scheduled_actions:
34
  item: Dict[str, Any] = {
@@ -41,8 +38,28 @@ class ViraltestEnv(
41
  item["topic"] = sa.topic
42
  if sa.tags is not None:
43
  item["tags"] = sa.tags
 
 
44
  actions_list.append(item)
45
- return {"scheduled_actions": actions_list}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
46
 
47
  def _parse_result(self, payload: Dict[str, Any]) -> StepResult[ViraltestObservation]:
48
  obs_data = payload.get("observation", {})
@@ -50,6 +67,13 @@ class ViraltestEnv(
50
  meta = obs_data.get("metadata", {})
51
  if grader_score is not None:
52
  meta["grader_score"] = grader_score
 
 
 
 
 
 
 
53
  observation = ViraltestObservation(
54
  current_hour=obs_data.get("current_hour", 0),
55
  day_of_week=obs_data.get("day_of_week", 0),
@@ -64,6 +88,7 @@ class ViraltestEnv(
64
  trending_topics=obs_data.get("trending_topics", []),
65
  content_queue_size=obs_data.get("content_queue_size", 0),
66
  last_post_type=obs_data.get("last_post_type", "none"),
 
67
  tag_performance=obs_data.get("tag_performance", {}),
68
  trending_tags=obs_data.get("trending_tags", []),
69
  competitor_recent_posts=obs_data.get("competitor_recent_posts", []),
@@ -72,6 +97,11 @@ class ViraltestEnv(
72
  daily_total_engagement=obs_data.get("daily_total_engagement", 0.0),
73
  daily_posts_made=obs_data.get("daily_posts_made", 0),
74
  daily_energy_min=obs_data.get("daily_energy_min", 1.0),
 
 
 
 
 
75
  grader_score=grader_score,
76
  error=obs_data.get("error"),
77
  done=payload.get("done", False),
 
1
+ """Viraltest Environment Client (v2 — Theme #3.1)."""
2
 
3
+ from typing import Any, Dict, List, Optional
4
 
5
  from openenv.core import EnvClient
6
  from openenv.core.client_types import StepResult
7
  from openenv.core.env_server.types import State
8
 
9
+ from .models import (
10
+ EngagementSignals,
11
+ ToolResult,
12
+ ViraltestAction,
13
+ ViraltestObservation,
14
+ )
15
 
16
 
17
+ class ViraltestEnv(EnvClient[ViraltestAction, ViraltestObservation, State]):
18
+ """Client for the Viraltest Creator Optimization Environment v2."""
 
 
 
19
 
20
+ def _step_payload(self, action: ViraltestAction) -> Dict[str, Any]:
21
+ payload: Dict[str, Any] = {}
22
 
23
+ if action.tool_calls:
24
+ payload["tool_calls"] = [
25
+ {"name": tc.name, "arguments": tc.arguments}
26
+ for tc in action.tool_calls
27
+ ]
 
 
 
 
 
28
 
 
29
  actions_list = []
30
  for sa in action.scheduled_actions:
31
  item: Dict[str, Any] = {
 
38
  item["topic"] = sa.topic
39
  if sa.tags is not None:
40
  item["tags"] = sa.tags
41
+ if sa.intent is not None:
42
+ item["intent"] = sa.intent
43
  actions_list.append(item)
44
+ payload["scheduled_actions"] = actions_list
45
+
46
+ if action.replies:
47
+ payload["replies"] = [
48
+ {"post_hour": r.post_hour, "reply_hour": r.reply_hour}
49
+ for r in action.replies
50
+ ]
51
+
52
+ if action.collab:
53
+ payload["collab"] = {
54
+ "partner_id": action.collab.partner_id,
55
+ "content_type": action.collab.content_type,
56
+ "hour": action.collab.hour,
57
+ }
58
+
59
+ if action.notes is not None:
60
+ payload["notes"] = action.notes
61
+
62
+ return payload
63
 
64
  def _parse_result(self, payload: Dict[str, Any]) -> StepResult[ViraltestObservation]:
65
  obs_data = payload.get("observation", {})
 
67
  meta = obs_data.get("metadata", {})
68
  if grader_score is not None:
69
  meta["grader_score"] = grader_score
70
+
71
+ signals_raw = obs_data.get("engagement_signals")
72
+ signals = EngagementSignals(**signals_raw) if signals_raw else None
73
+
74
+ tool_results_raw = obs_data.get("tool_results", [])
75
+ tool_results = [ToolResult(**tr) for tr in tool_results_raw]
76
+
77
  observation = ViraltestObservation(
78
  current_hour=obs_data.get("current_hour", 0),
79
  day_of_week=obs_data.get("day_of_week", 0),
 
88
  trending_topics=obs_data.get("trending_topics", []),
89
  content_queue_size=obs_data.get("content_queue_size", 0),
90
  last_post_type=obs_data.get("last_post_type", "none"),
91
+ burnout_risk=obs_data.get("burnout_risk", 0.0),
92
  tag_performance=obs_data.get("tag_performance", {}),
93
  trending_tags=obs_data.get("trending_tags", []),
94
  competitor_recent_posts=obs_data.get("competitor_recent_posts", []),
 
97
  daily_total_engagement=obs_data.get("daily_total_engagement", 0.0),
98
  daily_posts_made=obs_data.get("daily_posts_made", 0),
99
  daily_energy_min=obs_data.get("daily_energy_min", 1.0),
100
+ engagement_signals=signals,
101
+ coach_feedback=obs_data.get("coach_feedback"),
102
+ tool_results=tool_results,
103
+ agent_notes=obs_data.get("agent_notes"),
104
+ api_budget_remaining=obs_data.get("api_budget_remaining", 100),
105
  grader_score=grader_score,
106
  error=obs_data.get("error"),
107
  done=payload.get("done", False),
inference.py CHANGED
@@ -1,21 +1,14 @@
1
  """
2
- Viraltest Inference Script — RL-Based Creator Optimization Agent
3
- ===================================
4
- MANDATORY
5
- - Before submitting, ensure the following variables are defined in your environment configuration:
6
- API_BASE_URL The API endpoint for the LLM.
7
- MODEL_NAME The model identifier to use for inference.
8
- HF_TOKEN or OPENAI_API_KEY or API_KEY API key for the LLM client.
9
- IMAGE_NAME or LOCAL_IMAGE_NAME Docker image when using ViraltestEnv.from_docker_image()
10
-
11
- Optional:
12
- ALLOW_SHORT_EPISODE=1 Allow MAX_STEPS below 7 (final grader score stays 0 if episode never ends).
13
- MAX_STEPS Step cap (default 7). Without ALLOW_SHORT_EPISODE, cap is at least 7 so graders run.
14
-
15
- Each step = one full day. The agent submits a sparse daily plan (only posts and create_content
16
- actions at specific hours). Unlisted hours automatically become rest.
17
-
18
- STDOUT FORMAT (single space after tag; score two decimals) — match hackathon sample exactly.
19
  """
20
 
21
  import asyncio
@@ -27,11 +20,8 @@ from typing import Any, Dict, List, Optional
27
  from openai import OpenAI
28
 
29
  from viraltest import ScheduledAction, ViraltestAction, ViraltestEnv
30
- from viraltest.server.viraltest_environment import (
31
- TAG_POOL,
32
- TASK_HORIZON,
33
- TOPIC_CATEGORIES,
34
- )
35
 
36
  DOCKER_IMAGE = os.getenv("IMAGE_NAME") or os.getenv("LOCAL_IMAGE_NAME")
37
  API_KEY = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY") or os.getenv("API_KEY")
@@ -39,60 +29,70 @@ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
39
  MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-7B-Instruct"
40
  BENCHMARK = os.getenv("VIRALTEST_BENCHMARK", "viraltest")
41
 
42
- TASKS = ["weekly_engage", "weekly_strategic", "weekly_competitive"]
43
  _ALLOW_SHORT = os.getenv("ALLOW_SHORT_EPISODE", "").lower() in ("1", "true", "yes")
44
  _REQUESTED_MAX = int(os.getenv("MAX_STEPS", str(TASK_HORIZON)))
45
  MAX_STEPS = _REQUESTED_MAX if _ALLOW_SHORT else max(_REQUESTED_MAX, TASK_HORIZON)
46
  TEMPERATURE = 0.7
47
- MAX_TOKENS = 512
48
  SUCCESS_SCORE_THRESHOLD = 0.1
49
 
50
- VALID_TAGS_TEXT = ", ".join(TAG_POOL)
51
-
52
- # Flatten env topic categories — posts must use these exact strings (see sanitize_predefined_topics).
53
- PREDEFINED_TOPICS: tuple[str, ...] = tuple(
54
  topic for topics in TOPIC_CATEGORIES.values() for topic in topics
55
- )
56
- _TOPIC_CANONICAL: dict[str, str] = {t.lower(): t for t in PREDEFINED_TOPICS}
57
- PREDEFINED_TOPICS_TEXT = ", ".join(PREDEFINED_TOPICS)
58
 
59
- # When energy is at or below this level, skip the model and rest the full day (avoid burnout).
60
  NEAR_ZERO_ENERGY_THRESHOLD = 0.25
61
 
62
- SYSTEM_PROMPT = textwrap.dedent(f"""\
63
- You are a social media content strategy agent. Each step is one full day (24 hours).
64
- You receive the current day's state and must plan your actions for the entire day.
65
-
66
- Reply with a JSON object containing "scheduled_actions" — a list of actions at specific hours.
67
- Hours you don't list will automatically be rest. Only include posts and create_content actions.
68
-
69
- FORMAT (JSON only, no markdown, no prose):
70
- {{
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
  "scheduled_actions": [
72
- {{"hour": 10, "action_type": "create_content"}},
73
- {{"hour": 12, "action_type": "post", "content_type": "reel", "topic": "AI tools", "tags": ["ai", "coding"]}},
74
- {{"hour": 18, "action_type": "post", "content_type": "carousel", "topic": "startup life", "tags": ["startup", "growth"]}}
75
- ]
76
- }}
 
 
77
 
78
  RULES:
79
- - hour: 0-23 (which hour of the day to perform the action)
80
- - action_type: "post" or "create_content" (rest is automatic for unlisted hours)
81
- - For posts: content_type (reel|story|carousel|text_post), topic, and tags are required
82
- - Topic must be exactly one of these strings (no paraphrasing): {PREDEFINED_TOPICS_TEXT}
83
- - Tags must be from this pool: {VALID_TAGS_TEXT}
84
- - Max 5 tags per post
85
- - Empty scheduled_actions means rest all day
86
- - Peak posting hours: 9-12 (1.3x), 12-15 Tue-Thu (1.4x), 18-20 (1.25x)
87
- - Posting 3+ times/day causes audience fatigue; 1-2 posts/day is optimal
88
- - If energy hits 0, episode ends (burnout = game over)
89
 
90
- Plan strategically: schedule posts at peak hours, rest during off-hours to recover energy,
91
- and use create_content to build a content queue for cheaper posts later.""")
92
 
93
 
94
  def should_force_rest_day(obs: Any) -> bool:
95
- """If energy is near zero, always submit an empty schedule (all rest)."""
96
  energy = float(getattr(obs, "creator_energy", 1.0))
97
  return energy <= NEAR_ZERO_ENERGY_THRESHOLD
98
 
@@ -121,46 +121,44 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> No
121
 
122
 
123
  def format_observation(obs: Any) -> str:
124
- """Serialize observation into a readable prompt for the LLM."""
125
- tag_perf = obs.tag_performance or {}
126
- top_tags = sorted(tag_perf.items(), key=lambda x: x[1], reverse=True)[:5]
127
- top_tags_str = ", ".join(f"{t}={v:.2f}" for t, v in top_tags) if top_tags else "none yet"
128
-
129
- comp_posts = obs.competitor_recent_posts or []
130
- comp_str = ""
131
- for p in comp_posts[:3]:
132
- comp_str += (
133
- f" - {p.get('content_type','?')} on '{p.get('topic','?')}' "
134
- f"tags={p.get('tags',[])} eng={p.get('engagement',0):.2f} "
135
- f"({p.get('hours_ago',0)}h ago)\n"
136
- )
137
- if not comp_str:
138
- comp_str = " none\n"
139
-
140
  days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
141
  day_name = days[obs.day_of_week] if 0 <= obs.day_of_week < 7 else "?"
142
 
143
- daily_eng = getattr(obs, "daily_total_engagement", 0.0)
144
- daily_posts = getattr(obs, "daily_posts_made", 0)
145
- daily_emin = getattr(obs, "daily_energy_min", 1.0)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
146
 
147
  return textwrap.dedent(f"""\
148
- Day: {day_name} (day_of_week={obs.day_of_week}, 0=Mon) | days_elapsed={obs.days_elapsed}
149
- Hours since sleep: {obs.hours_since_sleep} | Sleep debt: {obs.sleep_debt:.3f}
150
- Energy: {obs.creator_energy:.2f} | Followers: {obs.follower_count} | Engagement rate: {obs.engagement_rate:.3f}
151
- Hours since last post: {obs.time_since_last_post}
152
- Content queue: {obs.content_queue_size} | Last post type: {obs.last_post_type}
153
- Yesterday's engagement: {daily_eng:.3f} | Yesterday's posts: {daily_posts} | Yesterday's min energy: {daily_emin:.2f}
154
- Trending topics: {', '.join(obs.trending_topics)}
155
- Trending tags: {', '.join(obs.trending_tags)}
156
- Your top tags: {top_tags_str}
157
- Niche saturation: {obs.niche_saturation:.2f} | Competitor avg engagement: {obs.competitor_avg_engagement:.3f}
158
- Competitor recent posts:
159
- {comp_str}Plan your actions for today (list only posts and create_content at specific hours):""")
160
 
161
 
162
  def parse_daily_plan(response_text: str) -> ViraltestAction:
163
- """Parse LLM JSON into ViraltestAction with scheduled_actions; fallback to empty (all rest)."""
164
  text = response_text.strip()
165
  if text.startswith("```"):
166
  lines = text.split("\n")
@@ -169,49 +167,74 @@ def parse_daily_plan(response_text: str) -> ViraltestAction:
169
 
170
  try:
171
  data: Dict[str, Any] = json.loads(text)
 
 
 
 
 
 
172
  actions_raw = data.get("scheduled_actions", [])
173
- if not isinstance(actions_raw, list):
174
- return ViraltestAction(scheduled_actions=[])
175
- return ViraltestAction(scheduled_actions=actions_raw)
 
 
 
 
 
 
 
 
 
 
 
 
176
  except (json.JSONDecodeError, Exception):
177
  return ViraltestAction(scheduled_actions=[])
178
 
179
 
180
  def _resolve_predefined_topic(raw: Optional[str], obs: Any, hour: int) -> str:
181
- """Map a model-provided topic to a canonical string from TOPIC_CATEGORIES."""
182
  if raw and raw.strip():
183
  key = raw.strip().lower()
184
  if key in _TOPIC_CANONICAL:
185
  return _TOPIC_CANONICAL[key]
186
- for tt in obs.trending_topics or []:
187
  tl = (tt or "").strip().lower()
188
  if tl in _TOPIC_CANONICAL:
189
  return _TOPIC_CANONICAL[tl]
190
- return PREDEFINED_TOPICS[hour % len(PREDEFINED_TOPICS)]
191
 
192
 
193
  def sanitize_predefined_topics(action: ViraltestAction, obs: Any) -> ViraltestAction:
194
- """Force every post topic to match the environment's predefined topic set."""
195
- out: List[ScheduledAction] = []
196
  for sa in action.scheduled_actions:
197
  if sa.action_type == "post":
198
  out.append(sa.model_copy(update={"topic": _resolve_predefined_topic(sa.topic, obs, sa.hour)}))
199
  else:
200
  out.append(sa)
201
- return ViraltestAction(scheduled_actions=out)
 
 
 
 
 
 
202
 
203
 
204
  def format_action_str(action: ViraltestAction) -> str:
205
- """Format daily plan for [STEP] log line."""
206
- if not action.scheduled_actions:
207
- return "daily_plan(rest_all)"
208
  parts = []
209
- for sa in action.scheduled_actions:
210
- if sa.action_type == "post":
211
- tags_str = ",".join(sa.tags) if sa.tags else ""
212
- parts.append(f"h{sa.hour}:post({sa.content_type},\"{sa.topic}\",[{tags_str}])")
213
- else:
214
- parts.append(f"h{sa.hour}:{sa.action_type}()")
 
 
 
 
 
 
215
  return "daily_plan(" + ";".join(parts) + ")"
216
 
217
 
@@ -221,7 +244,6 @@ _model_exhausted = False
221
  def get_model_daily_plan(
222
  client: OpenAI, obs: Any, history: List[Dict[str, str]]
223
  ) -> ViraltestAction:
224
- """Call the LLM to get a daily plan. Falls back to rest permanently after an unrecoverable error."""
225
  global _model_exhausted
226
  if _model_exhausted:
227
  return ViraltestAction(scheduled_actions=[])
@@ -247,12 +269,11 @@ def get_model_daily_plan(
247
  print(f"[DEBUG] Model request failed: {exc}", flush=True)
248
  if "402" in err_str or "429" in err_str or "credit" in err_str.lower() or "quota" in err_str.lower():
249
  _model_exhausted = True
250
- print("[DEBUG] Token/credit limit reached — falling back to rest for remaining steps", flush=True)
251
  return ViraltestAction(scheduled_actions=[])
252
 
253
 
254
  async def run_task(client: OpenAI, task: str) -> None:
255
- """Run a single task episode (7 daily steps)."""
256
  global _model_exhausted
257
  _model_exhausted = False
258
 
@@ -279,7 +300,7 @@ async def run_task(client: OpenAI, task: str) -> None:
279
 
280
  obs = result.observation
281
  if should_force_rest_day(obs):
282
- action = ViraltestAction(scheduled_actions=[])
283
  else:
284
  action = get_model_daily_plan(client, obs, history)
285
 
@@ -292,27 +313,21 @@ async def run_task(client: OpenAI, task: str) -> None:
292
  rewards.append(reward)
293
  steps_taken = step
294
 
295
- log_step(
296
- step=step,
297
- action=format_action_str(action),
298
- reward=reward,
299
- done=done,
300
- error=error,
301
- )
302
 
303
  history.append({
304
  "role": "assistant",
305
  "content": json.dumps({
 
306
  "scheduled_actions": [
307
  {
308
- "hour": sa.hour,
309
- "action_type": sa.action_type,
310
- "content_type": sa.content_type,
311
- "topic": sa.topic,
312
- "tags": sa.tags,
313
  }
314
  for sa in action.scheduled_actions
315
- ]
 
316
  }),
317
  })
318
 
 
1
  """
2
+ Viraltest Inference Script v2 Theme #3.1 World-Modeling Agent
3
+ ================================================================
4
+ The agent receives SPARSE observations and must use discoverable tools to learn
5
+ the world (trending topics, competitor activity, tag performance, audience segments).
6
+ No peak-hour hints, no fatigue rules, no content-type tips are provided in the prompt.
7
+
8
+ MANDATORY env vars: API_BASE_URL, MODEL_NAME, HF_TOKEN/OPENAI_API_KEY/API_KEY
9
+ Optional: IMAGE_NAME, ALLOW_SHORT_EPISODE, MAX_STEPS
10
+
11
+ STDOUT FORMAT: [START] [STEP] [END] — match hackathon spec exactly.
 
 
 
 
 
 
 
12
  """
13
 
14
  import asyncio
 
20
  from openai import OpenAI
21
 
22
  from viraltest import ScheduledAction, ViraltestAction, ViraltestEnv
23
+ from viraltest.models import ToolCall
24
+ from viraltest.server.viraltest_environment import TASK_HORIZON, TOPIC_CATEGORIES
 
 
 
25
 
26
  DOCKER_IMAGE = os.getenv("IMAGE_NAME") or os.getenv("LOCAL_IMAGE_NAME")
27
  API_KEY = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY") or os.getenv("API_KEY")
 
29
  MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-7B-Instruct"
30
  BENCHMARK = os.getenv("VIRALTEST_BENCHMARK", "viraltest")
31
 
32
+ TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]
33
  _ALLOW_SHORT = os.getenv("ALLOW_SHORT_EPISODE", "").lower() in ("1", "true", "yes")
34
  _REQUESTED_MAX = int(os.getenv("MAX_STEPS", str(TASK_HORIZON)))
35
  MAX_STEPS = _REQUESTED_MAX if _ALLOW_SHORT else max(_REQUESTED_MAX, TASK_HORIZON)
36
  TEMPERATURE = 0.7
37
+ MAX_TOKENS = 768
38
  SUCCESS_SCORE_THRESHOLD = 0.1
39
 
40
+ ALL_TOPICS: List[str] = [
 
 
 
41
  topic for topics in TOPIC_CATEGORIES.values() for topic in topics
42
+ ]
43
+ _TOPIC_CANONICAL: Dict[str, str] = {t.lower(): t for t in ALL_TOPICS}
 
44
 
 
45
  NEAR_ZERO_ENERGY_THRESHOLD = 0.25
46
 
47
+ # The agent is NOT told peak hours, fatigue rules, or content type tips.
48
+ # It must discover these via the tool catalog.
49
+ SYSTEM_PROMPT = textwrap.dedent("""\
50
+ You are an Instagram content strategy agent. Each step is one full day (24 hours).
51
+ You manage a creator account over a 30-day monthly cycle.
52
+
53
+ You receive a SPARSE observation (energy, followers, last reward, notes echo).
54
+ To learn about the world, you MUST use TOOLS before planning your day.
55
+
56
+ AVAILABLE TOOLS (call via tool_calls before scheduling posts):
57
+ - query_trends(niche): Get trending topics and tags for a niche
58
+ - query_competitor(competitor_id, window_days): See competitor activity
59
+ - query_tag_history(tag): Check your past performance with a tag
60
+ - query_audience(segment_id): Learn audience segment preferences
61
+ - predict_engagement(scheduled_actions): Simulate engagement without committing
62
+ - draft_review(scheduled_actions): Get feedback on a draft plan
63
+ - query_creator_pool(): List potential collab partners
64
+ - propose_collab(partner_id, content_type, hour): Propose a collaboration
65
+
66
+ RESPONSE FORMAT (JSON only, no markdown, no prose):
67
+ {
68
+ "tool_calls": [
69
+ {"name": "query_trends", "arguments": {"niche": "tech"}},
70
+ {"name": "query_competitor", "arguments": {"competitor_id": "niche_expert", "window_days": 7}}
71
+ ],
72
  "scheduled_actions": [
73
+ {"hour": 10, "action_type": "create_content"},
74
+ {"hour": 12, "action_type": "post", "content_type": "reel", "topic": "AI tools", "tags": ["ai", "coding"], "intent": "watch_bait"},
75
+ {"hour": 18, "action_type": "post", "content_type": "carousel", "topic": "startup life", "tags": ["startup", "growth"], "intent": "save_bait"}
76
+ ],
77
+ "replies": [{"post_hour": 12, "reply_hour": 13}],
78
+ "notes": "Day 3: tech niche trending up. Competitor Alpha posted at 10am. Avoiding overlap."
79
+ }
80
 
81
  RULES:
82
+ - hour: 0-23
83
+ - action_type: "post" or "create_content"
84
+ - For posts: content_type (reel|story|carousel|text_post), topic, tags (max 5), and intent are required
85
+ - intent: what signal you optimize for (send_bait|save_bait|watch_bait|like_bait)
86
+ - Empty scheduled_actions = rest all day
87
+ - Use notes to track hypotheses and observations across days
88
+ - Tool calls cost API budget (starts at 100). Use wisely.
89
+ - Max 2 collaborations per month
90
+ - Reply within 90 minutes of a post for reach bonus
 
91
 
92
+ Think strategically: use tools to discover what works, then exploit what you learn.""")
 
93
 
94
 
95
  def should_force_rest_day(obs: Any) -> bool:
 
96
  energy = float(getattr(obs, "creator_energy", 1.0))
97
  return energy <= NEAR_ZERO_ENERGY_THRESHOLD
98
 
 
121
 
122
 
123
  def format_observation(obs: Any) -> str:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
124
  days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
125
  day_name = days[obs.day_of_week] if 0 <= obs.day_of_week < 7 else "?"
126
 
127
+ notes_echo = getattr(obs, "agent_notes", None) or "none"
128
+ budget = getattr(obs, "api_budget_remaining", 100)
129
+ burnout = getattr(obs, "burnout_risk", 0.0)
130
+
131
+ tool_results_str = ""
132
+ for tr in getattr(obs, "tool_results", []):
133
+ if tr.success:
134
+ tool_results_str += f" {tr.name}: {json.dumps(tr.data)[:200]}\n"
135
+ else:
136
+ tool_results_str += f" {tr.name}: ERROR - {tr.error}\n"
137
+
138
+ coach = getattr(obs, "coach_feedback", None)
139
+ coach_str = ""
140
+ if coach:
141
+ coach_str = f"Coach: delta={coach.get('delta', 0):.3f}, suggestion={coach.get('suggestion', '')}\n"
142
+
143
+ signals = getattr(obs, "engagement_signals", None)
144
+ signals_str = ""
145
+ if signals:
146
+ signals_str = (
147
+ f"Signals: watch={signals.watch_time:.3f} sends={signals.sends_per_reach:.3f} "
148
+ f"saves={signals.saves:.3f} likes={signals.likes_per_reach:.3f}\n"
149
+ )
150
 
151
  return textwrap.dedent(f"""\
152
+ Day: {day_name} (day_of_week={obs.day_of_week}) | days_elapsed={obs.days_elapsed}
153
+ Energy: {obs.creator_energy:.2f} | Burnout risk: {burnout:.2f} | Followers: {obs.follower_count}
154
+ Engagement rate: {obs.engagement_rate:.3f} | Content queue: {obs.content_queue_size}
155
+ API budget remaining: {budget}
156
+ {signals_str}{coach_str}Tool results from last step:
157
+ {tool_results_str if tool_results_str else ' (none)\n'}Your notes from last step: {notes_echo}
158
+ Plan your tool calls and actions for today:""")
 
 
 
 
 
159
 
160
 
161
  def parse_daily_plan(response_text: str) -> ViraltestAction:
 
162
  text = response_text.strip()
163
  if text.startswith("```"):
164
  lines = text.split("\n")
 
167
 
168
  try:
169
  data: Dict[str, Any] = json.loads(text)
170
+
171
+ tool_calls = []
172
+ for tc in data.get("tool_calls", []):
173
+ if isinstance(tc, dict) and "name" in tc:
174
+ tool_calls.append(ToolCall(name=tc["name"], arguments=tc.get("arguments", {})))
175
+
176
  actions_raw = data.get("scheduled_actions", [])
177
+ scheduled = []
178
+ if isinstance(actions_raw, list):
179
+ for a in actions_raw:
180
+ if isinstance(a, dict):
181
+ scheduled.append(a)
182
+
183
+ replies_raw = data.get("replies", [])
184
+ notes = data.get("notes")
185
+
186
+ return ViraltestAction(
187
+ tool_calls=tool_calls,
188
+ scheduled_actions=scheduled,
189
+ replies=replies_raw if isinstance(replies_raw, list) else [],
190
+ notes=notes,
191
+ )
192
  except (json.JSONDecodeError, Exception):
193
  return ViraltestAction(scheduled_actions=[])
194
 
195
 
196
  def _resolve_predefined_topic(raw: Optional[str], obs: Any, hour: int) -> str:
 
197
  if raw and raw.strip():
198
  key = raw.strip().lower()
199
  if key in _TOPIC_CANONICAL:
200
  return _TOPIC_CANONICAL[key]
201
+ for tt in getattr(obs, "trending_topics", []) or []:
202
  tl = (tt or "").strip().lower()
203
  if tl in _TOPIC_CANONICAL:
204
  return _TOPIC_CANONICAL[tl]
205
+ return ALL_TOPICS[hour % len(ALL_TOPICS)]
206
 
207
 
208
  def sanitize_predefined_topics(action: ViraltestAction, obs: Any) -> ViraltestAction:
209
+ out = []
 
210
  for sa in action.scheduled_actions:
211
  if sa.action_type == "post":
212
  out.append(sa.model_copy(update={"topic": _resolve_predefined_topic(sa.topic, obs, sa.hour)}))
213
  else:
214
  out.append(sa)
215
+ return ViraltestAction(
216
+ tool_calls=action.tool_calls,
217
+ scheduled_actions=out,
218
+ replies=action.replies,
219
+ collab=action.collab,
220
+ notes=action.notes,
221
+ )
222
 
223
 
224
  def format_action_str(action: ViraltestAction) -> str:
 
 
 
225
  parts = []
226
+ if action.tool_calls:
227
+ tools_str = ",".join(tc.name for tc in action.tool_calls)
228
+ parts.append(f"tools({tools_str})")
229
+ if not action.scheduled_actions:
230
+ parts.append("rest_all")
231
+ else:
232
+ for sa in action.scheduled_actions:
233
+ if sa.action_type == "post":
234
+ tags_str = ",".join(sa.tags) if sa.tags else ""
235
+ parts.append(f"h{sa.hour}:post({sa.content_type},\"{sa.topic}\",[{tags_str}],{sa.intent or 'none'})")
236
+ else:
237
+ parts.append(f"h{sa.hour}:{sa.action_type}()")
238
  return "daily_plan(" + ";".join(parts) + ")"
239
 
240
 
 
244
  def get_model_daily_plan(
245
  client: OpenAI, obs: Any, history: List[Dict[str, str]]
246
  ) -> ViraltestAction:
 
247
  global _model_exhausted
248
  if _model_exhausted:
249
  return ViraltestAction(scheduled_actions=[])
 
269
  print(f"[DEBUG] Model request failed: {exc}", flush=True)
270
  if "402" in err_str or "429" in err_str or "credit" in err_str.lower() or "quota" in err_str.lower():
271
  _model_exhausted = True
272
+ print("[DEBUG] Token/credit limit reached — resting remaining steps", flush=True)
273
  return ViraltestAction(scheduled_actions=[])
274
 
275
 
276
  async def run_task(client: OpenAI, task: str) -> None:
 
277
  global _model_exhausted
278
  _model_exhausted = False
279
 
 
300
 
301
  obs = result.observation
302
  if should_force_rest_day(obs):
303
+ action = ViraltestAction(scheduled_actions=[], notes="Low energy — forced rest day.")
304
  else:
305
  action = get_model_daily_plan(client, obs, history)
306
 
 
313
  rewards.append(reward)
314
  steps_taken = step
315
 
316
+ log_step(step=step, action=format_action_str(action), reward=reward, done=done, error=error)
 
 
 
 
 
 
317
 
318
  history.append({
319
  "role": "assistant",
320
  "content": json.dumps({
321
+ "tool_calls": [{"name": tc.name, "arguments": tc.arguments} for tc in action.tool_calls],
322
  "scheduled_actions": [
323
  {
324
+ "hour": sa.hour, "action_type": sa.action_type,
325
+ "content_type": sa.content_type, "topic": sa.topic,
326
+ "tags": sa.tags, "intent": sa.intent,
 
 
327
  }
328
  for sa in action.scheduled_actions
329
+ ],
330
+ "notes": action.notes,
331
  }),
332
  })
333
 
models.py CHANGED
@@ -1,4 +1,4 @@
1
- """Data models for the Viraltest Creator Optimization Environment."""
2
 
3
  from typing import Any, Dict, List, Literal, Optional
4
 
@@ -7,6 +7,24 @@ from pydantic import BaseModel, Field, field_validator
7
 
8
  VALID_CONTENT_TYPES = ("reel", "story", "carousel", "text_post")
9
  VALID_ACTION_TYPES = ("post", "create_content")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
 
11
 
12
  class ScheduledAction(BaseModel):
@@ -25,6 +43,10 @@ class ScheduledAction(BaseModel):
25
  tags: Optional[List[str]] = Field(
26
  default=None, description="Hashtags for the post (max 5)"
27
  )
 
 
 
 
28
 
29
  @field_validator("tags")
30
  @classmethod
@@ -34,13 +56,45 @@ class ScheduledAction(BaseModel):
34
  return v
35
 
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
  class ViraltestAction(Action):
38
- """Sparse daily plan: only non-rest actions. Unlisted hours default to rest."""
39
 
 
 
 
 
40
  scheduled_actions: List[ScheduledAction] = Field(
41
  default_factory=list,
42
  description="Actions scheduled at specific hours; unlisted hours are rest",
43
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
44
 
45
  @field_validator("scheduled_actions")
46
  @classmethod
@@ -54,34 +108,63 @@ class ViraltestAction(Action):
54
  return deduped
55
 
56
 
 
 
 
 
 
 
 
 
 
 
 
 
 
57
  class ViraltestObservation(Observation):
58
- """Observation the agent receives after each daily step."""
 
 
 
 
59
 
60
  current_hour: int = Field(default=0, ge=0, le=23)
61
  day_of_week: int = Field(default=0, ge=0, le=6)
62
  days_elapsed: int = Field(default=0, ge=0)
63
  creator_energy: float = Field(default=1.0, ge=0.0, le=1.0)
64
- hours_since_sleep: int = Field(default=0, ge=0, description="Hours since last sleep period")
65
- sleep_debt: float = Field(default=0.0, ge=0.0, le=1.0, description="Accumulated sleep debt (0=rested, 1=severe)")
66
  follower_count: int = Field(default=0, ge=0)
67
  engagement_rate: float = Field(default=0.0, ge=0.0)
68
  posts_today: int = Field(default=0, ge=0)
69
  time_since_last_post: int = Field(default=0, ge=0)
70
- trending_topics: List[str] = Field(default_factory=list)
71
  content_queue_size: int = Field(default=0, ge=0)
72
  last_post_type: str = Field(default="none")
 
73
 
74
- tag_performance: Dict[str, float] = Field(default_factory=dict)
 
75
  trending_tags: List[str] = Field(default_factory=list)
76
-
77
  competitor_recent_posts: List[Dict[str, Any]] = Field(default_factory=list)
78
  competitor_avg_engagement: float = Field(default=0.0, ge=0.0)
79
  niche_saturation: float = Field(default=0.0, ge=0.0, le=1.0)
80
 
81
- daily_total_engagement: float = Field(default=0.0, ge=0.0, description="Total engagement earned this day")
82
- daily_posts_made: int = Field(default=0, ge=0, description="Number of posts made this day")
83
- daily_energy_min: float = Field(default=1.0, ge=0.0, le=1.0, description="Lowest energy during this day")
 
 
 
 
 
 
 
 
84
 
85
- grader_score: Optional[float] = Field(default=None, description="Final grader score (set on last step when done=True)")
 
 
86
 
 
87
  error: Optional[str] = Field(default=None)
 
1
+ """Data models for the Viraltest Creator Optimization Environment (v2 — Theme #3.1)."""
2
 
3
  from typing import Any, Dict, List, Literal, Optional
4
 
 
7
 
8
  VALID_CONTENT_TYPES = ("reel", "story", "carousel", "text_post")
9
  VALID_ACTION_TYPES = ("post", "create_content")
10
+ VALID_INTENTS = ("send_bait", "save_bait", "watch_bait", "like_bait")
11
+
12
+
13
+ class ToolCall(BaseModel):
14
+ """A single tool invocation the agent wants to make before committing actions."""
15
+
16
+ name: str = Field(..., description="Tool name from the /tools catalog")
17
+ arguments: Dict[str, Any] = Field(default_factory=dict)
18
+
19
+
20
+ class ToolResult(BaseModel):
21
+ """Result returned from a single tool invocation."""
22
+
23
+ name: str
24
+ success: bool = True
25
+ data: Any = None
26
+ error: Optional[str] = None
27
+ budget_remaining: int = Field(default=100, ge=0)
28
 
29
 
30
  class ScheduledAction(BaseModel):
 
43
  tags: Optional[List[str]] = Field(
44
  default=None, description="Hashtags for the post (max 5)"
45
  )
46
+ intent: Optional[Literal["send_bait", "save_bait", "watch_bait", "like_bait"]] = Field(
47
+ default=None,
48
+ description="Mosseri signal the post optimizes for (affects which engagement signal gets boosted)",
49
+ )
50
 
51
  @field_validator("tags")
52
  @classmethod
 
56
  return v
57
 
58
 
59
+ class ReplyAction(BaseModel):
60
+ """Reply to comments on a post made earlier today (within reply window)."""
61
+
62
+ post_hour: int = Field(..., ge=0, le=23, description="Hour of the post to reply on")
63
+ reply_hour: int = Field(..., ge=0, le=23, description="Hour to send replies")
64
+
65
+
66
+ class CollabProposal(BaseModel):
67
+ """Propose a collaboration with a competitor archetype."""
68
+
69
+ partner_id: str = Field(..., description="Competitor archetype id from competitors.json")
70
+ content_type: Optional[Literal["reel", "story", "carousel", "text_post"]] = Field(default="reel")
71
+ hour: int = Field(default=12, ge=0, le=23)
72
+
73
+
74
  class ViraltestAction(Action):
75
+ """Daily plan: tool calls for discovery, then scheduled actions to commit."""
76
 
77
+ tool_calls: List[ToolCall] = Field(
78
+ default_factory=list,
79
+ description="Tool invocations to run before committing actions (query_audience, query_trends, etc.)",
80
+ )
81
  scheduled_actions: List[ScheduledAction] = Field(
82
  default_factory=list,
83
  description="Actions scheduled at specific hours; unlisted hours are rest",
84
  )
85
+ replies: List[ReplyAction] = Field(
86
+ default_factory=list,
87
+ description="Reply actions on posts made today (within 90-min window for reach bonus)",
88
+ )
89
+ collab: Optional[CollabProposal] = Field(
90
+ default=None,
91
+ description="Optional collaboration proposal (max 2 per month)",
92
+ )
93
+ notes: Optional[str] = Field(
94
+ default=None,
95
+ max_length=2000,
96
+ description="Agent scratchpad — persisted and echoed back next step for belief tracking",
97
+ )
98
 
99
  @field_validator("scheduled_actions")
100
  @classmethod
 
108
  return deduped
109
 
110
 
111
+ class EngagementSignals(BaseModel):
112
+ """Mosseri-aligned engagement decomposition (Jan 2025 official ranking signals)."""
113
+
114
+ watch_time: float = Field(default=0.0, ge=0.0, description="Reels watch time signal")
115
+ sends_per_reach: float = Field(default=0.0, ge=0.0, description="DM shares signal (strongest for discovery)")
116
+ saves: float = Field(default=0.0, ge=0.0, description="Bookmark signal (content quality)")
117
+ likes_per_reach: float = Field(default=0.0, ge=0.0, description="Like signal (existing followers)")
118
+
119
+ @property
120
+ def weighted_total(self) -> float:
121
+ return 0.4 * self.watch_time + 0.3 * self.sends_per_reach + 0.2 * self.saves + 0.1 * self.likes_per_reach
122
+
123
+
124
  class ViraltestObservation(Observation):
125
+ """Observation the agent receives after each daily step.
126
+
127
+ Default observation is SPARSE (Theme #3.1 partial observability).
128
+ Rich data (tag_performance, competitor_posts, trending) available only via tools.
129
+ """
130
 
131
  current_hour: int = Field(default=0, ge=0, le=23)
132
  day_of_week: int = Field(default=0, ge=0, le=6)
133
  days_elapsed: int = Field(default=0, ge=0)
134
  creator_energy: float = Field(default=1.0, ge=0.0, le=1.0)
135
+ hours_since_sleep: int = Field(default=0, ge=0)
136
+ sleep_debt: float = Field(default=0.0, ge=0.0, le=1.0)
137
  follower_count: int = Field(default=0, ge=0)
138
  engagement_rate: float = Field(default=0.0, ge=0.0)
139
  posts_today: int = Field(default=0, ge=0)
140
  time_since_last_post: int = Field(default=0, ge=0)
 
141
  content_queue_size: int = Field(default=0, ge=0)
142
  last_post_type: str = Field(default="none")
143
+ burnout_risk: float = Field(default=0.0, ge=0.0, le=1.0, description="0=safe, 1=imminent burnout")
144
 
145
+ # Sparse: these are populated only when agent uses tools
146
+ trending_topics: List[str] = Field(default_factory=list)
147
  trending_tags: List[str] = Field(default_factory=list)
148
+ tag_performance: Dict[str, float] = Field(default_factory=dict)
149
  competitor_recent_posts: List[Dict[str, Any]] = Field(default_factory=list)
150
  competitor_avg_engagement: float = Field(default=0.0, ge=0.0)
151
  niche_saturation: float = Field(default=0.0, ge=0.0, le=1.0)
152
 
153
+ daily_total_engagement: float = Field(default=0.0, ge=0.0)
154
+ daily_posts_made: int = Field(default=0, ge=0)
155
+ daily_energy_min: float = Field(default=1.0, ge=0.0, le=1.0)
156
+
157
+ engagement_signals: Optional[EngagementSignals] = Field(
158
+ default=None, description="Mosseri-aligned signal breakdown for the day"
159
+ )
160
+ coach_feedback: Optional[Dict[str, Any]] = Field(
161
+ default=None,
162
+ description="Counterfactual feedback: delta between agent plan and heatmap-optimal plan",
163
+ )
164
 
165
+ tool_results: List[ToolResult] = Field(default_factory=list, description="Results from tool_calls this step")
166
+ agent_notes: Optional[str] = Field(default=None, description="Echo of agent's notes from previous step")
167
+ api_budget_remaining: int = Field(default=100, ge=0)
168
 
169
+ grader_score: Optional[float] = Field(default=None)
170
  error: Optional[str] = Field(default=None)
server/app.py CHANGED
@@ -1,31 +1,11 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # All rights reserved.
3
- #
4
- # This source code is licensed under the BSD-style license found in the
5
- # LICENSE file in the root directory of this source tree.
6
-
7
  """
8
- FastAPI application for the Viraltest Environment.
9
-
10
- This module creates an HTTP server that exposes the ViraltestEnvironment
11
- over HTTP and WebSocket endpoints, compatible with EnvClient.
12
 
13
  Endpoints:
14
- - POST /reset: Reset the environment
15
- - POST /step: Execute an action
16
- - GET /state: Get current environment state
17
- - GET /schema: Get action/observation schemas
18
- - WS /ws: WebSocket endpoint for persistent sessions
19
-
20
- Usage:
21
- # Development (with auto-reload):
22
- uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
23
-
24
- # Production:
25
- uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
26
-
27
- # Or run directly:
28
- python -m server.app
29
  """
30
 
31
  import json
@@ -40,21 +20,25 @@ from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
40
 
41
  try:
42
  from openenv.core.env_server.http_server import create_app
43
- except Exception as e: # pragma: no cover
44
  raise ImportError(
45
- "openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
46
  ) from e
47
 
48
- # OpenEnv Gradio UI lives at /web; Dockerfile sets this — default on for local parity with HF Spaces.
49
  if "ENABLE_WEB_INTERFACE" not in os.environ:
50
  os.environ["ENABLE_WEB_INTERFACE"] = "true"
51
 
52
  try:
53
  from ..models import ScheduledAction, ViraltestAction, ViraltestObservation
54
- from .viraltest_environment import ViraltestEnvironment
55
  except ImportError:
56
  from models import ScheduledAction, ViraltestAction, ViraltestObservation
57
- from server.viraltest_environment import ViraltestEnvironment
 
 
 
 
 
58
 
59
  _DASHBOARD_HTML = (Path(__file__).parent / "dashboard.html").read_text()
60
 
@@ -78,6 +62,31 @@ if not _gradio_web:
78
  async def _web_disabled_redirect():
79
  return RedirectResponse("/dashboard", status_code=302)
80
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
81
  _dash_env: Optional[ViraltestEnvironment] = None
82
  _HISTORY_FILE = Path(__file__).parent / "simulation_history.json"
83
 
@@ -137,7 +146,7 @@ async def dashboard_history_clear():
137
  async def dashboard_reset(body: Dict[str, Any] = Body(default={})):
138
  global _dash_env
139
  _dash_env = ViraltestEnvironment()
140
- task = body.get("task", "weekly_engage")
141
  obs = _dash_env.reset(task=task)
142
  return _obs_to_dict(obs)
143
 
@@ -154,28 +163,32 @@ async def dashboard_step(body: Dict[str, Any] = Body(...)):
154
  return _obs_to_dict(obs)
155
 
156
 
157
- try:
158
- from .viraltest_environment import TAG_POOL
159
- except ImportError:
160
- from server.viraltest_environment import TAG_POOL
161
 
162
  _SIM_RNG = stdlib_random.Random(99)
163
  _CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
164
  _TOPICS = ["AI tools", "fitness routine", "growth hacks", "travel guide", "food recipe", "wellness tips"]
165
 
166
 
167
- def _make_daily_plan(actions: list) -> ViraltestAction:
168
- """Helper: build a ViraltestAction from a list of ScheduledAction-like dicts."""
169
- return ViraltestAction(scheduled_actions=[ScheduledAction(**a) for a in actions])
 
 
170
 
171
 
172
  def _plan_always_rest(obs: dict, day: int) -> ViraltestAction:
173
- return _make_daily_plan([])
174
 
175
 
176
  def _plan_spam(obs: dict, day: int) -> ViraltestAction:
177
- actions = [{"hour": h, "action_type": "post", "content_type": "reel",
178
- "topic": "AI tools", "tags": ["ai"]} for h in range(24)]
 
 
 
179
  return _make_daily_plan(actions)
180
 
181
 
@@ -186,111 +199,16 @@ def _plan_smart(obs: dict, day: int) -> ViraltestAction:
186
  pool_tag2 = TAG_POOL[(day * 2 + 1) % len(TAG_POOL)]
187
  ct1 = _CONTENT_TYPES[(day * 2) % 4]
188
  ct2 = _CONTENT_TYPES[(day * 2 + 1) % 4]
 
 
189
  actions = [
190
  {"hour": 8, "action_type": "create_content"},
191
- {"hour": 12, "action_type": "post", "content_type": ct1, "topic": trending, "tags": t_tags + [pool_tag]},
192
- {"hour": 19, "action_type": "post", "content_type": ct2, "topic": trending, "tags": t_tags + [pool_tag2]},
 
 
193
  ]
194
- return _make_daily_plan(actions)
195
-
196
-
197
- def _plan_no_rest(obs: dict, day: int) -> ViraltestAction:
198
- actions = []
199
- for h in range(24):
200
- ct = _CONTENT_TYPES[h % 4]
201
- topic = _SIM_RNG.choice(_TOPICS)
202
- tags = _SIM_RNG.sample(TAG_POOL, 3)
203
- actions.append({"hour": h, "action_type": "post", "content_type": ct, "topic": topic, "tags": tags})
204
- return _make_daily_plan(actions)
205
-
206
-
207
- def _plan_minimal(obs: dict, day: int) -> ViraltestAction:
208
- trending = (obs.get("trending_topics") or ["minimalism"])[0]
209
- tags = list((obs.get("trending_tags") or [])[:3])
210
- return _make_daily_plan([
211
- {"hour": 12, "action_type": "post", "content_type": "carousel", "topic": trending, "tags": tags},
212
- ])
213
-
214
-
215
- def _plan_reel_max(obs: dict, day: int) -> ViraltestAction:
216
- trending = (obs.get("trending_topics") or ["viral content"])[0]
217
- tags = list((obs.get("trending_tags") or [])[:3])
218
- return _make_daily_plan([
219
- {"hour": 12, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
220
- {"hour": 14, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
221
- ])
222
-
223
-
224
- def _plan_split_schedule(obs: dict, day: int) -> ViraltestAction:
225
- trending = (obs.get("trending_topics") or ["daily content"])[0]
226
- tags = list((obs.get("trending_tags") or [])[:2]) + ["tips"]
227
- return _make_daily_plan([
228
- {"hour": 9, "action_type": "post", "content_type": "carousel", "topic": trending, "tags": tags},
229
- {"hour": 19, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
230
- ])
231
-
232
-
233
- def _plan_double_peak(obs: dict, day: int) -> ViraltestAction:
234
- trending = (obs.get("trending_topics") or ["peak time content"])[0]
235
- tags = list((obs.get("trending_tags") or [])[:3])
236
- return _make_daily_plan([
237
- {"hour": 9, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
238
- {"hour": 15, "action_type": "post", "content_type": "carousel", "topic": trending, "tags": tags},
239
- ])
240
-
241
-
242
- def _plan_tag_explorer(obs: dict, day: int) -> ViraltestAction:
243
- trending = (obs.get("trending_topics") or ["devtools"])[0]
244
- start = (day * 6) % len(TAG_POOL)
245
- tags1 = [TAG_POOL[(start + i) % len(TAG_POOL)] for i in range(3)]
246
- tags2 = [TAG_POOL[(start + 3 + i) % len(TAG_POOL)] for i in range(3)]
247
- ct1 = _CONTENT_TYPES[(day * 2) % 4]
248
- ct2 = _CONTENT_TYPES[(day * 2 + 1) % 4]
249
- return _make_daily_plan([
250
- {"hour": 10, "action_type": "post", "content_type": ct1, "topic": trending, "tags": tags1},
251
- {"hour": 18, "action_type": "post", "content_type": ct2, "topic": trending, "tags": tags2},
252
- ])
253
-
254
-
255
- def _plan_queue_optimizer(obs: dict, day: int) -> ViraltestAction:
256
- trending = (obs.get("trending_topics") or ["productivity"])[0]
257
- tags = list((obs.get("trending_tags") or [])[:2]) + ["growth"]
258
- queue = obs.get("content_queue_size", 0)
259
- if day < 2 or queue < 2:
260
- return _make_daily_plan([
261
- {"hour": 8, "action_type": "create_content"},
262
- {"hour": 10, "action_type": "create_content"},
263
- {"hour": 14, "action_type": "create_content"},
264
- ])
265
- ct = _CONTENT_TYPES[day % 4]
266
- return _make_daily_plan([
267
- {"hour": 12, "action_type": "post", "content_type": ct, "topic": trending, "tags": tags},
268
- {"hour": 19, "action_type": "post", "content_type": _CONTENT_TYPES[(day + 1) % 4], "topic": trending, "tags": tags},
269
- ])
270
-
271
-
272
- def _plan_weekend(obs: dict, day: int) -> ViraltestAction:
273
- dow = obs.get("day_of_week", 0)
274
- if dow not in (5, 6):
275
- return _make_daily_plan([])
276
- trending = (obs.get("trending_topics") or ["travel"])[0]
277
- tags = list((obs.get("trending_tags") or [])[:3])
278
- return _make_daily_plan([
279
- {"hour": 11, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
280
- {"hour": 17, "action_type": "post", "content_type": "reel", "topic": trending, "tags": tags},
281
- ])
282
-
283
-
284
- def _plan_weekday_only(obs: dict, day: int) -> ViraltestAction:
285
- dow = obs.get("day_of_week", 0)
286
- if dow >= 5:
287
- return _make_daily_plan([])
288
- trending = (obs.get("trending_topics") or ["weekday content"])[0]
289
- tags = list((obs.get("trending_tags") or [])[:2]) + ["productivity"]
290
- ct = _CONTENT_TYPES[day % 4]
291
- return _make_daily_plan([
292
- {"hour": 12, "action_type": "post", "content_type": ct, "topic": trending, "tags": tags},
293
- ])
294
 
295
 
296
  def _plan_random(obs: dict, day: int) -> ViraltestAction:
@@ -299,87 +217,36 @@ def _plan_random(obs: dict, day: int) -> ViraltestAction:
299
  r = _SIM_RNG.random()
300
  if r < 0.1:
301
  ct = _SIM_RNG.choice(_CONTENT_TYPES)
302
- topic = _SIM_RNG.choice(["random topic", "AI tools", "fitness", "travel"])
303
- tags = _SIM_RNG.sample(TAG_POOL, 2)
304
  actions.append({"hour": h, "action_type": "post", "content_type": ct, "topic": topic, "tags": tags})
305
  elif r < 0.15:
306
  actions.append({"hour": h, "action_type": "create_content"})
307
  return _make_daily_plan(actions)
308
 
309
 
310
- def _plan_sleep_conscious(obs: dict, day: int) -> ViraltestAction:
311
- trending = (obs.get("trending_topics") or ["wellness"])[0]
312
- tags = list((obs.get("trending_tags") or [])[:2]) + ["productivity"]
313
- ct = _CONTENT_TYPES[day % 4]
314
- return _make_daily_plan([
315
- {"hour": 10, "action_type": "post", "content_type": ct, "topic": trending, "tags": tags},
316
- {"hour": 16, "action_type": "create_content"},
317
- ])
318
-
319
-
320
- def _plan_sleep_deprived(obs: dict, day: int) -> ViraltestAction:
321
- trending = (obs.get("trending_topics") or ["coding"])[0]
322
- tags = list((obs.get("trending_tags") or [])[:2])
323
- actions = []
324
- for h in range(24):
325
- if 9 <= h <= 20 and len([a for a in actions if a["action_type"] == "post"]) < 2:
326
- ct = _CONTENT_TYPES[h % 4]
327
- actions.append({"hour": h, "action_type": "post", "content_type": ct, "topic": trending, "tags": tags})
328
- else:
329
- actions.append({"hour": h, "action_type": "create_content"})
330
- return _make_daily_plan(actions)
331
-
332
-
333
- def _plan_growth_focus(obs: dict, day: int) -> ViraltestAction:
334
- trending = (obs.get("trending_topics") or ["growth hacks"])[0]
335
- return _make_daily_plan([
336
- {"hour": 13, "action_type": "post", "content_type": "reel", "topic": trending, "tags": ["viral", "growth", "trending"]},
337
- ])
338
-
339
-
340
- def _plan_tech_niche(obs: dict, day: int) -> ViraltestAction:
341
- ct = _CONTENT_TYPES[day % 4]
342
- return _make_daily_plan([
343
- {"hour": 12, "action_type": "post", "content_type": ct, "topic": "AI tools and coding tips", "tags": ["ai", "coding", "devtools"]},
344
- {"hour": 18, "action_type": "post", "content_type": _CONTENT_TYPES[(day + 1) % 4], "topic": "AI tools and coding tips", "tags": ["ai", "ml", "startup"]},
345
- ])
346
-
347
-
348
- def _plan_conservative(obs: dict, day: int) -> ViraltestAction:
349
- trending = (obs.get("trending_topics") or ["quick tip"])[0]
350
- tags = list((obs.get("trending_tags") or [])[:2])
351
  return _make_daily_plan([
352
- {"hour": 13, "action_type": "post", "content_type": "text_post", "topic": trending, "tags": tags},
 
353
  ])
354
 
355
 
356
  SCENARIOS = {
357
- "always_rest": ("Always Rest", "Never posts. Tests follower decay + zero engagement.", _plan_always_rest),
358
  "spam": ("Spam Post", "Same reel every hour. Burns out fast.", _plan_spam),
359
- "no_rest": ("No Rest", "Posts every hour, never rests. Burns out fast.", _plan_no_rest),
360
- "smart": ("Smart Agent", "Optimal: peak hours, trending, varied types, rests.", _plan_smart),
361
- "queue_optimizer": ("Queue Optimizer", "Creates content first, posts from queue.", _plan_queue_optimizer),
362
- "weekend": ("Weekend Warrior", "Only posts on Sat/Sun.", _plan_weekend),
363
- "tag_explorer": ("Tag Explorer", "New tag combo every post. Max discovery.", _plan_tag_explorer),
364
- "sleep_deprived": ("Sleep Deprived", "Never rests. Tests sleep deprivation.", _plan_sleep_deprived),
365
- "sleep_conscious": ("Sleep Conscious", "Proper sleep schedule.", _plan_sleep_conscious),
366
- "minimal": ("Minimal Poster", "1 post per day at noon.", _plan_minimal),
367
- "reel_max": ("Reel Maximizer", "Reels at peak hours for max reach.", _plan_reel_max),
368
- "split_schedule": ("Split Schedule", "Morning and evening posts.", _plan_split_schedule),
369
- "double_peak": ("Double Peak", "Posts at 9am and 3pm.", _plan_double_peak),
370
- "growth_focus": ("Growth Focus", "Maximizes follower growth.", _plan_growth_focus),
371
- "weekday_only": ("Weekday Only", "No weekend posting.", _plan_weekday_only),
372
- "tech_niche": ("Tech Niche", "AI/coding content focus.", _plan_tech_niche),
373
- "conservative": ("Conservative", "One text post at 1pm.", _plan_conservative),
374
  "random": ("Random Actor", "Random actions. Baseline test.", _plan_random),
375
  }
376
 
377
 
378
  @app.get("/dashboard/scenarios")
379
  async def dashboard_scenarios():
380
- """List all simulation strategies for the dashboard UI."""
381
  items = [{"id": k, "label": v[0], "description": v[1]} for k, v in SCENARIOS.items()]
382
- items.sort(key=lambda x: (x["label"].lower()))
383
  return JSONResponse(
384
  content={"count": len(items), "scenarios": items},
385
  headers={"Cache-Control": "no-store, max-age=0, must-revalidate"},
@@ -392,7 +259,7 @@ async def dashboard_simulate(body: Dict[str, Any] = Body(...)):
392
  _SIM_RNG = stdlib_random.Random(99)
393
 
394
  scenario_id = body.get("scenario", "smart")
395
- task = body.get("task", "weekly_competitive")
396
  if scenario_id not in SCENARIOS:
397
  return {"error": f"Unknown scenario: {scenario_id}"}
398
 
@@ -402,7 +269,7 @@ async def dashboard_simulate(body: Dict[str, Any] = Body(...)):
402
  obs_dict = obs.model_dump()
403
 
404
  steps: List[Dict[str, Any]] = []
405
- for day in range(1, 8):
406
  action = plan_fn(obs_dict, day)
407
  obs = env.step(action)
408
  obs_dict = obs.model_dump()
@@ -423,19 +290,13 @@ async def dashboard_simulate(body: Dict[str, Any] = Body(...)):
423
  "sleep_debt": round(obs.sleep_debt, 3),
424
  "followers": obs.follower_count,
425
  "engagement_rate": round(obs.engagement_rate, 4),
426
- "niche_saturation": round(obs.niche_saturation, 3),
427
  "posts_today": obs.posts_today,
428
  "hour": obs.current_hour,
429
  "day": obs.day_of_week,
430
  "days_elapsed": obs.days_elapsed,
431
  "queue": obs.content_queue_size,
432
- "tag_performance": obs.tag_performance,
433
- "trending_topics": obs.trending_topics,
434
- "trending_tags": obs.trending_tags,
435
- "competitor_avg_engagement": round(obs.competitor_avg_engagement, 4),
436
- "daily_total_engagement": round(obs.daily_total_engagement, 4),
437
- "daily_posts_made": obs.daily_posts_made,
438
- "daily_energy_min": round(obs.daily_energy_min, 3),
439
  })
440
  if obs.done:
441
  break
@@ -477,30 +338,12 @@ async def dashboard_simulate(body: Dict[str, Any] = Body(...)):
477
 
478
 
479
  def main(host: str = "0.0.0.0", port: int = 8000):
480
- """
481
- Entry point for direct execution via uv run or python -m.
482
-
483
- This function enables running the server without Docker:
484
- uv run --project . server
485
- uv run --project . server --port 8001
486
- python -m viraltest.server.app
487
-
488
- Args:
489
- host: Host address to bind to (default: "0.0.0.0")
490
- port: Port number to listen on (default: 8000)
491
-
492
- For production deployments, consider using uvicorn directly with
493
- multiple workers:
494
- uvicorn viraltest.server.app:app --workers 4
495
- """
496
  import uvicorn
497
-
498
  uvicorn.run(app, host=host, port=port)
499
 
500
 
501
  if __name__ == "__main__":
502
  import argparse
503
-
504
  parser = argparse.ArgumentParser()
505
  parser.add_argument("--port", type=int, default=None)
506
  args = parser.parse_args()
 
 
 
 
 
 
 
1
  """
2
+ FastAPI application for the Viraltest Environment v2 (Theme #3.1).
 
 
 
3
 
4
  Endpoints:
5
+ - POST /reset, /step, GET /state, /schema — standard OpenEnv
6
+ - GET /tools tool catalog (Theme #3.1 discovery)
7
+ - GET /tools/{name} single tool schema
8
+ - GET /dashboard simulation UI
 
 
 
 
 
 
 
 
 
 
 
9
  """
10
 
11
  import json
 
20
 
21
  try:
22
  from openenv.core.env_server.http_server import create_app
23
+ except Exception as e:
24
  raise ImportError(
25
+ "openenv is required. Install with 'uv sync'"
26
  ) from e
27
 
 
28
  if "ENABLE_WEB_INTERFACE" not in os.environ:
29
  os.environ["ENABLE_WEB_INTERFACE"] = "true"
30
 
31
  try:
32
  from ..models import ScheduledAction, ViraltestAction, ViraltestObservation
33
+ from .viraltest_environment import TOOL_CATALOG, ViraltestEnvironment
34
  except ImportError:
35
  from models import ScheduledAction, ViraltestAction, ViraltestObservation
36
+ from server.viraltest_environment import TOOL_CATALOG, ViraltestEnvironment
37
+
38
+ try:
39
+ from .viraltest_environment import TAG_POOL
40
+ except ImportError:
41
+ from server.viraltest_environment import TAG_POOL
42
 
43
  _DASHBOARD_HTML = (Path(__file__).parent / "dashboard.html").read_text()
44
 
 
62
  async def _web_disabled_redirect():
63
  return RedirectResponse("/dashboard", status_code=302)
64
 
65
+ # ---------------------------------------------------------------------------
66
+ # Tool catalog endpoints (Theme #3.1 — tool discovery)
67
+ # ---------------------------------------------------------------------------
68
+
69
+ @app.get("/tools")
70
+ async def list_tools():
71
+ """Return the full tool catalog so the agent can discover available tools."""
72
+ return JSONResponse(content={
73
+ "tools": {name: schema for name, schema in TOOL_CATALOG.items()},
74
+ "count": len(TOOL_CATALOG),
75
+ })
76
+
77
+
78
+ @app.get("/tools/{name}")
79
+ async def get_tool(name: str):
80
+ """Return schema for a single tool."""
81
+ if name not in TOOL_CATALOG:
82
+ return JSONResponse(content={"error": f"unknown tool: {name}"}, status_code=404)
83
+ return JSONResponse(content={"name": name, **TOOL_CATALOG[name]})
84
+
85
+
86
+ # ---------------------------------------------------------------------------
87
+ # Dashboard
88
+ # ---------------------------------------------------------------------------
89
+
90
  _dash_env: Optional[ViraltestEnvironment] = None
91
  _HISTORY_FILE = Path(__file__).parent / "simulation_history.json"
92
 
 
146
  async def dashboard_reset(body: Dict[str, Any] = Body(default={})):
147
  global _dash_env
148
  _dash_env = ViraltestEnvironment()
149
+ task = body.get("task", "monthly_engage")
150
  obs = _dash_env.reset(task=task)
151
  return _obs_to_dict(obs)
152
 
 
163
  return _obs_to_dict(obs)
164
 
165
 
166
+ # ---------------------------------------------------------------------------
167
+ # Dashboard scenario helpers (v2 action shape)
168
+ # ---------------------------------------------------------------------------
 
169
 
170
  _SIM_RNG = stdlib_random.Random(99)
171
  _CONTENT_TYPES = ["reel", "carousel", "story", "text_post"]
172
  _TOPICS = ["AI tools", "fitness routine", "growth hacks", "travel guide", "food recipe", "wellness tips"]
173
 
174
 
175
+ def _make_daily_plan(actions: list, notes: Optional[str] = None) -> ViraltestAction:
176
+ return ViraltestAction(
177
+ scheduled_actions=[ScheduledAction(**a) for a in actions],
178
+ notes=notes,
179
+ )
180
 
181
 
182
  def _plan_always_rest(obs: dict, day: int) -> ViraltestAction:
183
+ return _make_daily_plan([], notes="Resting all day to conserve energy.")
184
 
185
 
186
  def _plan_spam(obs: dict, day: int) -> ViraltestAction:
187
+ actions = [
188
+ {"hour": h, "action_type": "post", "content_type": "reel",
189
+ "topic": "AI tools", "tags": ["ai"], "intent": "watch_bait"}
190
+ for h in range(24)
191
+ ]
192
  return _make_daily_plan(actions)
193
 
194
 
 
199
  pool_tag2 = TAG_POOL[(day * 2 + 1) % len(TAG_POOL)]
200
  ct1 = _CONTENT_TYPES[(day * 2) % 4]
201
  ct2 = _CONTENT_TYPES[(day * 2 + 1) % 4]
202
+ intent1 = "save_bait" if ct1 == "carousel" else "watch_bait"
203
+ intent2 = "send_bait" if ct2 == "reel" else "save_bait"
204
  actions = [
205
  {"hour": 8, "action_type": "create_content"},
206
+ {"hour": 12, "action_type": "post", "content_type": ct1, "topic": trending,
207
+ "tags": t_tags + [pool_tag], "intent": intent1},
208
+ {"hour": 19, "action_type": "post", "content_type": ct2, "topic": trending,
209
+ "tags": t_tags + [pool_tag2], "intent": intent2},
210
  ]
211
+ return _make_daily_plan(actions, notes=f"Day {day}: posting at peak hours with varied intents.")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
212
 
213
 
214
  def _plan_random(obs: dict, day: int) -> ViraltestAction:
 
217
  r = _SIM_RNG.random()
218
  if r < 0.1:
219
  ct = _SIM_RNG.choice(_CONTENT_TYPES)
220
+ topic = _SIM_RNG.choice(_TOPICS)
221
+ tags = _SIM_RNG.sample(TAG_POOL[:20], 2)
222
  actions.append({"hour": h, "action_type": "post", "content_type": ct, "topic": topic, "tags": tags})
223
  elif r < 0.15:
224
  actions.append({"hour": h, "action_type": "create_content"})
225
  return _make_daily_plan(actions)
226
 
227
 
228
+ def _plan_minimal(obs: dict, day: int) -> ViraltestAction:
229
+ trending = (obs.get("trending_topics") or ["minimalism"])[0]
230
+ tags = list((obs.get("trending_tags") or [])[:3])
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
231
  return _make_daily_plan([
232
+ {"hour": 12, "action_type": "post", "content_type": "carousel",
233
+ "topic": trending, "tags": tags, "intent": "save_bait"},
234
  ])
235
 
236
 
237
  SCENARIOS = {
238
+ "always_rest": ("Always Rest", "Never posts. Tests follower decay.", _plan_always_rest),
239
  "spam": ("Spam Post", "Same reel every hour. Burns out fast.", _plan_spam),
240
+ "smart": ("Smart Agent", "Optimal: peak hours, trending, varied types+intents.", _plan_smart),
241
+ "minimal": ("Minimal Poster", "1 carousel per day at noon.", _plan_minimal),
 
 
 
 
 
 
 
 
 
 
 
 
 
242
  "random": ("Random Actor", "Random actions. Baseline test.", _plan_random),
243
  }
244
 
245
 
246
  @app.get("/dashboard/scenarios")
247
  async def dashboard_scenarios():
 
248
  items = [{"id": k, "label": v[0], "description": v[1]} for k, v in SCENARIOS.items()]
249
+ items.sort(key=lambda x: x["label"].lower())
250
  return JSONResponse(
251
  content={"count": len(items), "scenarios": items},
252
  headers={"Cache-Control": "no-store, max-age=0, must-revalidate"},
 
259
  _SIM_RNG = stdlib_random.Random(99)
260
 
261
  scenario_id = body.get("scenario", "smart")
262
+ task = body.get("task", "monthly_competitive")
263
  if scenario_id not in SCENARIOS:
264
  return {"error": f"Unknown scenario: {scenario_id}"}
265
 
 
269
  obs_dict = obs.model_dump()
270
 
271
  steps: List[Dict[str, Any]] = []
272
+ for day in range(1, 31):
273
  action = plan_fn(obs_dict, day)
274
  obs = env.step(action)
275
  obs_dict = obs.model_dump()
 
290
  "sleep_debt": round(obs.sleep_debt, 3),
291
  "followers": obs.follower_count,
292
  "engagement_rate": round(obs.engagement_rate, 4),
293
+ "burnout_risk": round(obs.burnout_risk, 3),
294
  "posts_today": obs.posts_today,
295
  "hour": obs.current_hour,
296
  "day": obs.day_of_week,
297
  "days_elapsed": obs.days_elapsed,
298
  "queue": obs.content_queue_size,
299
+ "api_budget": obs.api_budget_remaining,
 
 
 
 
 
 
300
  })
301
  if obs.done:
302
  break
 
338
 
339
 
340
  def main(host: str = "0.0.0.0", port: int = 8000):
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
341
  import uvicorn
 
342
  uvicorn.run(app, host=host, port=port)
343
 
344
 
345
  if __name__ == "__main__":
346
  import argparse
 
347
  parser = argparse.ArgumentParser()
348
  parser.add_argument("--port", type=int, default=None)
349
  args = parser.parse_args()
server/data/audience_overlap_matrix.json ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "7×7 symmetric audience overlap matrix between competitor archetypes. Values 0.0-1.0 represent fraction of shared audience. Used by propose_collab to split engagement. Derived from niche proximity (same-niche pairs ~0.4-0.65, cross-niche ~0.05-0.20).",
4
+ "source": "Estimated from Rival IQ 2025 cross-industry overlap patterns + niche proximity heuristic"
5
+ },
6
+ "archetype_ids": ["niche_expert", "viral_chaser", "lifestyle_blogger", "b2b_thought_leader", "food_creator", "fitness_coach", "travel_creator"],
7
+ "matrix": [
8
+ [1.00, 0.12, 0.10, 0.40, 0.08, 0.10, 0.15],
9
+ [0.12, 1.00, 0.55, 0.10, 0.20, 0.25, 0.30],
10
+ [0.10, 0.55, 1.00, 0.15, 0.30, 0.35, 0.40],
11
+ [0.40, 0.10, 0.15, 1.00, 0.08, 0.10, 0.12],
12
+ [0.08, 0.20, 0.30, 0.08, 1.00, 0.45, 0.35],
13
+ [0.10, 0.25, 0.35, 0.10, 0.45, 1.00, 0.30],
14
+ [0.15, 0.30, 0.40, 0.12, 0.35, 0.30, 1.00]
15
+ ]
16
+ }
server/data/audience_segments.json ADDED
@@ -0,0 +1,108 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "5 hidden audience segments the agent discovers via query_audience tool. Based on Pew Research 2024 (teens survey n=1391; adults survey n=5733) and Sprout Social Index 2025 (n=4044 consumers). Agent sees segment names but must query to learn affinities.",
4
+ "hidden_from_default_obs": true
5
+ },
6
+ "segments": [
7
+ {
8
+ "id": "young_professionals",
9
+ "label": "Young Professionals (22-34)",
10
+ "size_fraction": 0.35,
11
+ "timezone_peak_offset_hours": 0,
12
+ "topic_affinity": {
13
+ "tech": 0.9,
14
+ "business": 0.8,
15
+ "lifestyle": 0.6,
16
+ "fitness": 0.7,
17
+ "food": 0.5
18
+ },
19
+ "content_type_preference": {
20
+ "reel": 0.9,
21
+ "carousel": 0.7,
22
+ "story": 0.8,
23
+ "text_post": 0.4
24
+ },
25
+ "active_hours": [7, 8, 9, 12, 13, 18, 19, 20, 21, 22]
26
+ },
27
+ {
28
+ "id": "students",
29
+ "label": "Students (16-22)",
30
+ "size_fraction": 0.25,
31
+ "timezone_peak_offset_hours": 2,
32
+ "topic_affinity": {
33
+ "lifestyle": 0.9,
34
+ "fitness": 0.6,
35
+ "education": 0.7,
36
+ "food": 0.8,
37
+ "fashion": 0.8
38
+ },
39
+ "content_type_preference": {
40
+ "reel": 1.0,
41
+ "carousel": 0.5,
42
+ "story": 0.9,
43
+ "text_post": 0.2
44
+ },
45
+ "active_hours": [10, 11, 12, 13, 14, 15, 20, 21, 22, 23]
46
+ },
47
+ {
48
+ "id": "parents",
49
+ "label": "Parents (30-45)",
50
+ "size_fraction": 0.20,
51
+ "timezone_peak_offset_hours": -1,
52
+ "topic_affinity": {
53
+ "food": 0.9,
54
+ "fitness": 0.7,
55
+ "lifestyle": 0.8,
56
+ "education": 0.6,
57
+ "travel": 0.5
58
+ },
59
+ "content_type_preference": {
60
+ "reel": 0.6,
61
+ "carousel": 0.9,
62
+ "story": 0.7,
63
+ "text_post": 0.6
64
+ },
65
+ "active_hours": [6, 7, 8, 12, 13, 20, 21]
66
+ },
67
+ {
68
+ "id": "global_night_owls",
69
+ "label": "Global Night Owls (mixed age, non-US timezone)",
70
+ "size_fraction": 0.12,
71
+ "timezone_peak_offset_hours": 8,
72
+ "topic_affinity": {
73
+ "tech": 0.8,
74
+ "photography": 0.7,
75
+ "travel": 0.8,
76
+ "lifestyle": 0.5,
77
+ "beauty": 0.4
78
+ },
79
+ "content_type_preference": {
80
+ "reel": 0.8,
81
+ "carousel": 0.8,
82
+ "story": 0.5,
83
+ "text_post": 0.5
84
+ },
85
+ "active_hours": [0, 1, 2, 3, 14, 15, 16, 17]
86
+ },
87
+ {
88
+ "id": "passive_scrollers",
89
+ "label": "Passive Scrollers (35-55, low engagement)",
90
+ "size_fraction": 0.08,
91
+ "timezone_peak_offset_hours": 0,
92
+ "topic_affinity": {
93
+ "travel": 0.6,
94
+ "food": 0.7,
95
+ "photography": 0.8,
96
+ "lifestyle": 0.5,
97
+ "fashion": 0.4
98
+ },
99
+ "content_type_preference": {
100
+ "reel": 0.4,
101
+ "carousel": 0.6,
102
+ "story": 0.3,
103
+ "text_post": 0.7
104
+ },
105
+ "active_hours": [7, 8, 12, 19, 20, 21]
106
+ }
107
+ ]
108
+ }
server/data/competitors.json ADDED
@@ -0,0 +1,85 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "7 competitor archetypes. posts_per_week from Buffer 2.1M study (3-5 optimal). base_engagement_rate from Rival IQ 2025 per-industry. posting_frequency is posts/WEEK (divide by 7 for daily probability).",
4
+ "sources": ["Buffer 2026 frequency study (2.1M posts, 102K accounts)", "Rival IQ 2025 Benchmark (1.9M IG posts, 14 industries)"]
5
+ },
6
+ "archetypes": [
7
+ {
8
+ "id": "niche_expert",
9
+ "name": "Creator Alpha (Niche Expert)",
10
+ "niche": "tech",
11
+ "niche_topics": ["AI tools", "coding tips", "tech news", "prompt engineering"],
12
+ "preferred_types": ["carousel", "text_post"],
13
+ "posts_per_week": 3,
14
+ "base_engagement_rate": 0.55,
15
+ "tag_preferences": ["ai", "coding", "devtools", "buildinpublic"],
16
+ "style": "low_frequency_high_depth"
17
+ },
18
+ {
19
+ "id": "viral_chaser",
20
+ "name": "Creator Beta (Viral Chaser)",
21
+ "niche": "lifestyle",
22
+ "niche_topics": ["morning routine", "self improvement", "productivity hacks", "digital detox"],
23
+ "preferred_types": ["reel", "story"],
24
+ "posts_per_week": 7,
25
+ "base_engagement_rate": 0.38,
26
+ "tag_preferences": ["viral", "trending", "motivation", "grwm"],
27
+ "style": "high_frequency_volatile"
28
+ },
29
+ {
30
+ "id": "lifestyle_blogger",
31
+ "name": "Creator Gamma (Lifestyle Blogger)",
32
+ "niche": "lifestyle",
33
+ "niche_topics": ["minimalist living", "slow living", "work life balance", "journaling"],
34
+ "preferred_types": ["carousel", "reel"],
35
+ "posts_per_week": 4,
36
+ "base_engagement_rate": 0.45,
37
+ "tag_preferences": ["lifestyle", "wellness", "selfcare", "minimalism"],
38
+ "style": "consistent_moderate"
39
+ },
40
+ {
41
+ "id": "b2b_thought_leader",
42
+ "name": "Creator Delta (B2B Thought Leader)",
43
+ "niche": "business",
44
+ "niche_topics": ["growth hacks", "marketing strategy", "personal branding", "sales funnel"],
45
+ "preferred_types": ["carousel", "text_post"],
46
+ "posts_per_week": 3,
47
+ "base_engagement_rate": 0.42,
48
+ "tag_preferences": ["entrepreneur", "businesstips", "growth", "leadership"],
49
+ "style": "low_frequency_high_depth"
50
+ },
51
+ {
52
+ "id": "food_creator",
53
+ "name": "Creator Epsilon (Food Creator)",
54
+ "niche": "food",
55
+ "niche_topics": ["food recipe", "meal prep ideas", "baking tutorial", "food photography"],
56
+ "preferred_types": ["reel", "carousel"],
57
+ "posts_per_week": 5,
58
+ "base_engagement_rate": 0.48,
59
+ "tag_preferences": ["foodie", "recipe", "cooking", "healthyfood"],
60
+ "style": "consistent_moderate"
61
+ },
62
+ {
63
+ "id": "fitness_coach",
64
+ "name": "Creator Zeta (Fitness Coach)",
65
+ "niche": "fitness",
66
+ "niche_topics": ["fitness routine", "home workout", "gym transformation", "strength training"],
67
+ "preferred_types": ["reel", "story"],
68
+ "posts_per_week": 5,
69
+ "base_engagement_rate": 0.52,
70
+ "tag_preferences": ["fitness", "gym", "workout", "fitfam"],
71
+ "style": "high_frequency_volatile"
72
+ },
73
+ {
74
+ "id": "travel_creator",
75
+ "name": "Creator Eta (Travel Creator)",
76
+ "niche": "travel",
77
+ "niche_topics": ["travel guide", "hidden gems", "travel photography", "digital nomad"],
78
+ "preferred_types": ["reel", "carousel"],
79
+ "posts_per_week": 3,
80
+ "base_engagement_rate": 0.50,
81
+ "tag_preferences": ["travel", "wanderlust", "adventure", "travelgram"],
82
+ "style": "low_frequency_high_depth"
83
+ }
84
+ ]
85
+ }
server/data/hour_heatmap.json ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "7×24 engagement multiplier grid (day_of_week × hour). 1.0 = platform-wide average. Sources: Buffer 2026 (9.6M posts), Sprout Social 2026 (2B engagements, 307K profiles). Days: 0=Mon..6=Sun. Hours: 0-23 local time.",
4
+ "methodology": "Buffer identified per-day best hours; Sprout provided per-industry peak windows. Cross-referenced: peaks where both agree get 1.3-1.5×; dead zones where both agree get 0.3-0.5×. Intermediate hours interpolated."
5
+ },
6
+ "grid": {
7
+ "0": [0.30, 0.25, 0.25, 0.25, 0.30, 0.35, 0.50, 0.65, 0.80, 0.90, 0.95, 1.00, 1.05, 1.10, 1.20, 1.15, 1.10, 1.05, 1.20, 1.30, 1.25, 1.15, 1.00, 0.60],
8
+ "1": [0.30, 0.25, 0.25, 0.25, 0.30, 0.35, 0.50, 0.70, 0.85, 0.95, 1.05, 1.10, 1.20, 1.35, 1.40, 1.35, 1.25, 1.20, 1.30, 1.35, 1.25, 1.10, 0.95, 0.55],
9
+ "2": [0.30, 0.25, 0.25, 0.25, 0.30, 0.35, 0.55, 0.75, 0.95, 1.05, 1.10, 1.15, 1.35, 1.45, 1.45, 1.40, 1.30, 1.25, 1.40, 1.45, 1.40, 1.30, 1.10, 0.60],
10
+ "3": [0.30, 0.25, 0.25, 0.25, 0.30, 0.35, 0.55, 0.80, 1.05, 1.25, 1.15, 1.10, 1.30, 1.35, 1.30, 1.20, 1.10, 1.05, 1.15, 1.20, 1.10, 1.00, 0.85, 0.50],
11
+ "4": [0.30, 0.25, 0.25, 0.25, 0.30, 0.35, 0.50, 0.60, 0.70, 0.75, 0.80, 0.80, 0.85, 0.85, 0.80, 0.75, 0.70, 0.65, 0.70, 0.75, 0.70, 0.80, 0.85, 0.50],
12
+ "5": [0.30, 0.25, 0.25, 0.25, 0.30, 0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 0.60, 0.65, 0.65, 0.60, 0.55, 0.55, 0.50, 0.55, 0.60, 0.65, 0.75, 0.80, 0.50],
13
+ "6": [0.30, 0.25, 0.25, 0.25, 0.30, 0.30, 0.40, 0.50, 0.55, 0.60, 0.65, 0.70, 0.70, 0.70, 0.65, 0.60, 0.55, 0.55, 0.60, 0.70, 0.80, 0.85, 0.80, 0.55]
14
+ }
15
+ }
server/data/tags.json ADDED
@@ -0,0 +1,149 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "Instagram tag pool tiered by usage volume. Sources: Rival IQ 2025 Benchmark (1.9M IG posts), Socialinsider 2026 (31M posts).",
4
+ "tiers": {
5
+ "broad": "High-volume generic tags (>100M posts). High reach, low engagement lift.",
6
+ "niche": "Mid-volume vertical tags (1M-100M). Better engagement, narrower audience.",
7
+ "trending": "Rotated daily by env. Volatile reach bonus.",
8
+ "seasonal": "Calendar-driven. Active only near their season window."
9
+ }
10
+ },
11
+ "broad": [
12
+ {"tag": "love", "volume_hint": "2.1B"},
13
+ {"tag": "instagood", "volume_hint": "1.9B"},
14
+ {"tag": "photography", "volume_hint": "1.1B"},
15
+ {"tag": "photooftheday", "volume_hint": "1B"},
16
+ {"tag": "reels", "volume_hint": "985M"},
17
+ {"tag": "beautiful", "volume_hint": "854M"},
18
+ {"tag": "nature", "volume_hint": "838M"},
19
+ {"tag": "travel", "volume_hint": "767M"},
20
+ {"tag": "happy", "volume_hint": "728M"},
21
+ {"tag": "style", "volume_hint": "683M"},
22
+ {"tag": "fitness", "volume_hint": "560M"},
23
+ {"tag": "food", "volume_hint": "538M"},
24
+ {"tag": "life", "volume_hint": "471M"},
25
+ {"tag": "motivation", "volume_hint": "423M"},
26
+ {"tag": "art", "volume_hint": "900M"},
27
+ {"tag": "music", "volume_hint": "491M"},
28
+ {"tag": "trending", "volume_hint": "350M"},
29
+ {"tag": "lifestyle", "volume_hint": "340M"},
30
+ {"tag": "explore", "volume_hint": "330M"},
31
+ {"tag": "health", "volume_hint": "280M"},
32
+ {"tag": "design", "volume_hint": "360M"},
33
+ {"tag": "inspiration", "volume_hint": "400M"},
34
+ {"tag": "viral", "volume_hint": "200M"},
35
+ {"tag": "tips", "volume_hint": "180M"},
36
+ {"tag": "howto", "volume_hint": "120M"}
37
+ ],
38
+ "niche": {
39
+ "tech": [
40
+ {"tag": "ai", "volume_hint": "85M"},
41
+ {"tag": "ml", "volume_hint": "12M"},
42
+ {"tag": "coding", "volume_hint": "45M"},
43
+ {"tag": "startup", "volume_hint": "38M"},
44
+ {"tag": "saas", "volume_hint": "4M"},
45
+ {"tag": "devtools", "volume_hint": "2M"},
46
+ {"tag": "techreview", "volume_hint": "8M"},
47
+ {"tag": "artificialintelligence", "volume_hint": "22M"},
48
+ {"tag": "futuretech", "volume_hint": "5M"},
49
+ {"tag": "programming", "volume_hint": "30M"},
50
+ {"tag": "webdev", "volume_hint": "15M"},
51
+ {"tag": "buildinpublic", "volume_hint": "1.5M"},
52
+ {"tag": "technews", "volume_hint": "10M"},
53
+ {"tag": "gadgets", "volume_hint": "18M"}
54
+ ],
55
+ "lifestyle": [
56
+ {"tag": "grwm", "volume_hint": "45M"},
57
+ {"tag": "wellness", "volume_hint": "65M"},
58
+ {"tag": "selfcare", "volume_hint": "55M"},
59
+ {"tag": "minimalism", "volume_hint": "18M"},
60
+ {"tag": "stoic", "volume_hint": "5M"},
61
+ {"tag": "productivity", "volume_hint": "25M"},
62
+ {"tag": "mentalhealth", "volume_hint": "40M"},
63
+ {"tag": "healthylifestyle", "volume_hint": "80M"},
64
+ {"tag": "luxurylifestyle", "volume_hint": "30M"},
65
+ {"tag": "goodlife", "volume_hint": "20M"}
66
+ ],
67
+ "fitness": [
68
+ {"tag": "gym", "volume_hint": "120M"},
69
+ {"tag": "workout", "volume_hint": "95M"},
70
+ {"tag": "fitfam", "volume_hint": "55M"},
71
+ {"tag": "bodybuilding", "volume_hint": "42M"},
72
+ {"tag": "running", "volume_hint": "38M"},
73
+ {"tag": "yoga", "volume_hint": "60M"},
74
+ {"tag": "fitover40", "volume_hint": "2M"},
75
+ {"tag": "homeworkout", "volume_hint": "15M"},
76
+ {"tag": "gymlife", "volume_hint": "35M"},
77
+ {"tag": "nutrition", "volume_hint": "28M"}
78
+ ],
79
+ "business": [
80
+ {"tag": "entrepreneur", "volume_hint": "90M"},
81
+ {"tag": "smallbusiness", "volume_hint": "75M"},
82
+ {"tag": "businesstips", "volume_hint": "20M"},
83
+ {"tag": "sidehustle", "volume_hint": "15M"},
84
+ {"tag": "growyourbusiness", "volume_hint": "10M"},
85
+ {"tag": "financialfreedom", "volume_hint": "18M"},
86
+ {"tag": "passiveincome", "volume_hint": "12M"},
87
+ {"tag": "growth", "volume_hint": "45M"},
88
+ {"tag": "leadership", "volume_hint": "22M"},
89
+ {"tag": "digitalmarketing", "volume_hint": "35M"}
90
+ ],
91
+ "food": [
92
+ {"tag": "foodie", "volume_hint": "110M"},
93
+ {"tag": "recipe", "volume_hint": "55M"},
94
+ {"tag": "healthyfood", "volume_hint": "65M"},
95
+ {"tag": "cooking", "volume_hint": "45M"},
96
+ {"tag": "mealprep", "volume_hint": "18M"},
97
+ {"tag": "vegan", "volume_hint": "40M"},
98
+ {"tag": "baking", "volume_hint": "30M"}
99
+ ],
100
+ "travel": [
101
+ {"tag": "wanderlust", "volume_hint": "85M"},
102
+ {"tag": "travelgram", "volume_hint": "70M"},
103
+ {"tag": "adventure", "volume_hint": "60M"},
104
+ {"tag": "backpacking", "volume_hint": "20M"},
105
+ {"tag": "roadtrip", "volume_hint": "25M"},
106
+ {"tag": "solotravel", "volume_hint": "12M"},
107
+ {"tag": "islandlife", "volume_hint": "15M"}
108
+ ],
109
+ "fashion": [
110
+ {"tag": "ootd", "volume_hint": "95M"},
111
+ {"tag": "fashionblogger", "volume_hint": "65M"},
112
+ {"tag": "streetstyle", "volume_hint": "40M"},
113
+ {"tag": "skincare", "volume_hint": "55M"},
114
+ {"tag": "makeup", "volume_hint": "80M"}
115
+ ],
116
+ "web3": [
117
+ {"tag": "web3", "volume_hint": "8M"},
118
+ {"tag": "crypto", "volume_hint": "35M"},
119
+ {"tag": "nft", "volume_hint": "25M"},
120
+ {"tag": "blockchain", "volume_hint": "18M"},
121
+ {"tag": "defi", "volume_hint": "5M"},
122
+ {"tag": "gaming", "volume_hint": "50M"}
123
+ ]
124
+ },
125
+ "trending": [
126
+ {"tag": "aitools2026", "volume_hint": "3M"},
127
+ {"tag": "techtrends2026", "volume_hint": "2M"},
128
+ {"tag": "chatgpt", "volume_hint": "15M"},
129
+ {"tag": "midjourney", "volume_hint": "8M"},
130
+ {"tag": "threads", "volume_hint": "12M"},
131
+ {"tag": "climateaction", "volume_hint": "6M"},
132
+ {"tag": "genai", "volume_hint": "4M"},
133
+ {"tag": "remotework", "volume_hint": "18M"},
134
+ {"tag": "creatoreconomy", "volume_hint": "5M"},
135
+ {"tag": "sustainableliving", "volume_hint": "10M"}
136
+ ],
137
+ "seasonal": [
138
+ {"tag": "summer", "volume_hint": "300M", "active_months": [5, 6, 7, 8]},
139
+ {"tag": "newyear", "volume_hint": "150M", "active_months": [12, 1]},
140
+ {"tag": "worldcup", "volume_hint": "80M", "active_months": [6, 7]},
141
+ {"tag": "oscars", "volume_hint": "45M", "active_months": [2, 3]},
142
+ {"tag": "election", "volume_hint": "60M", "active_months": [10, 11]},
143
+ {"tag": "blackfriday", "volume_hint": "55M", "active_months": [11]},
144
+ {"tag": "christmas", "volume_hint": "200M", "active_months": [11, 12]},
145
+ {"tag": "backtoschool", "volume_hint": "30M", "active_months": [8, 9]},
146
+ {"tag": "valentines", "volume_hint": "70M", "active_months": [1, 2]},
147
+ {"tag": "halloween", "volume_hint": "90M", "active_months": [10]}
148
+ ]
149
+ }
server/data/topics.json ADDED
@@ -0,0 +1,102 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_meta": {
3
+ "description": "Niche → topics with engagement multipliers and seasonal trending calendar. Multipliers from Rival IQ 2025 Benchmark (1.9M IG posts, 14 industries). Normalized so overall avg ≈ 1.0.",
4
+ "multiplier_source": "Rival IQ 2025: Animals 2.00%, Photo 1.99%, Outdoors 1.91%, Travel 1.83%, Sports/Fitness 1.75%, Music 1.63%, Entertainment 1.55%, Food 1.55%, Lifestyle 1.53%, Education 1.48%, Finance 1.34%, Tech 1.31%, Real Estate 1.25%, Fashion 1.24%, Beauty 1.19%. Normalized by dividing by median (1.53)."
5
+ },
6
+ "niches": {
7
+ "tech": {
8
+ "engagement_multiplier": 0.86,
9
+ "topics": [
10
+ "AI tools", "coding tips", "startup life", "tech news",
11
+ "SaaS growth", "dev workflow", "open source", "gadget review",
12
+ "prompt engineering", "AI art"
13
+ ]
14
+ },
15
+ "lifestyle": {
16
+ "engagement_multiplier": 1.00,
17
+ "topics": [
18
+ "morning routine", "minimalist living", "self improvement",
19
+ "productivity hacks", "mental health", "stoic philosophy",
20
+ "journaling", "digital detox", "work life balance", "slow living"
21
+ ]
22
+ },
23
+ "fitness": {
24
+ "engagement_multiplier": 1.14,
25
+ "topics": [
26
+ "fitness routine", "home workout", "running tips",
27
+ "gym transformation", "meal prep", "yoga flow",
28
+ "strength training", "recovery", "marathon training", "calisthenics"
29
+ ]
30
+ },
31
+ "business": {
32
+ "engagement_multiplier": 0.88,
33
+ "topics": [
34
+ "growth hacks", "marketing strategy", "creator economy",
35
+ "monetization", "brand deals", "analytics deep dive",
36
+ "side hustle", "personal branding", "email marketing", "sales funnel"
37
+ ]
38
+ },
39
+ "food": {
40
+ "engagement_multiplier": 1.01,
41
+ "topics": [
42
+ "food recipe", "meal prep ideas", "restaurant review",
43
+ "baking tutorial", "healthy eating", "vegan recipes",
44
+ "street food", "coffee culture", "kitchen hacks", "food photography"
45
+ ]
46
+ },
47
+ "travel": {
48
+ "engagement_multiplier": 1.20,
49
+ "topics": [
50
+ "travel guide", "hidden gems", "budget travel",
51
+ "solo travel tips", "road trip", "beach destinations",
52
+ "cultural immersion", "travel photography", "hostel life", "digital nomad"
53
+ ]
54
+ },
55
+ "fashion": {
56
+ "engagement_multiplier": 0.81,
57
+ "topics": [
58
+ "fashion haul", "outfit of the day", "streetwear",
59
+ "sustainable fashion", "thrift finds", "seasonal trends",
60
+ "capsule wardrobe", "accessory styling", "luxury fashion", "sneaker culture"
61
+ ]
62
+ },
63
+ "beauty": {
64
+ "engagement_multiplier": 0.78,
65
+ "topics": [
66
+ "skincare routine", "makeup tutorial", "hair care",
67
+ "clean beauty", "anti aging", "nail art",
68
+ "fragrance review", "dermatologist tips", "glow up", "beauty on budget"
69
+ ]
70
+ },
71
+ "photography": {
72
+ "engagement_multiplier": 1.30,
73
+ "topics": [
74
+ "photo editing", "golden hour shots", "street photography",
75
+ "landscape photography", "portrait tips", "mobile photography",
76
+ "lightroom presets", "composition rules", "astrophotography", "film photography"
77
+ ]
78
+ },
79
+ "education": {
80
+ "engagement_multiplier": 0.97,
81
+ "topics": [
82
+ "study tips", "online courses", "career advice",
83
+ "book recommendations", "science explainer", "history facts",
84
+ "language learning", "financial literacy", "college life", "exam prep"
85
+ ]
86
+ }
87
+ },
88
+ "seasonal_trends": [
89
+ {"topic": "New Year goals", "peak_month": 1, "halflife_hours": 72, "niches": ["lifestyle", "fitness", "business"]},
90
+ {"topic": "Valentine gift guide", "peak_month": 2, "halflife_hours": 48, "niches": ["fashion", "food", "lifestyle"]},
91
+ {"topic": "Oscar predictions", "peak_month": 3, "halflife_hours": 36, "niches": ["lifestyle", "photography"]},
92
+ {"topic": "Spring fitness challenge", "peak_month": 4, "halflife_hours": 96, "niches": ["fitness"]},
93
+ {"topic": "Summer travel plans", "peak_month": 6, "halflife_hours": 120, "niches": ["travel", "photography"]},
94
+ {"topic": "World Cup watch party", "peak_month": 7, "halflife_hours": 60, "niches": ["lifestyle", "food"]},
95
+ {"topic": "Back to school essentials", "peak_month": 8, "halflife_hours": 72, "niches": ["education", "tech", "fashion"]},
96
+ {"topic": "Fall fashion lookbook", "peak_month": 9, "halflife_hours": 96, "niches": ["fashion", "beauty"]},
97
+ {"topic": "Halloween costumes", "peak_month": 10, "halflife_hours": 48, "niches": ["fashion", "lifestyle", "food"]},
98
+ {"topic": "Black Friday deals", "peak_month": 11, "halflife_hours": 36, "niches": ["tech", "business", "fashion"]},
99
+ {"topic": "Holiday gift guide", "peak_month": 12, "halflife_hours": 96, "niches": ["tech", "fashion", "food", "beauty"]},
100
+ {"topic": "Year in review", "peak_month": 12, "halflife_hours": 48, "niches": ["lifestyle", "business", "photography"]}
101
+ ]
102
+ }
server/viraltest_environment.py CHANGED
@@ -1,31 +1,98 @@
1
  """
2
- Viraltest Environment — RL-Based Creator Optimization Simulation.
3
-
4
- Simulates a social media creator's weekly posting lifecycle.
5
- The agent decides when to post, what format, which tags, and how
6
- to differentiate from competitors, while managing burnout.
 
 
 
 
 
7
  """
8
 
 
 
9
  import random
10
  from collections import defaultdict
11
  from dataclasses import dataclass, field
12
- from typing import Any, Dict, List, Optional
 
13
  from uuid import uuid4
14
 
15
  from openenv.core.env_server.interfaces import Environment
16
  from openenv.core.env_server.types import State
17
 
18
  try:
19
- from ..models import ScheduledAction, ViraltestAction, ViraltestObservation
 
 
 
 
 
 
 
 
 
20
  except ImportError:
21
- from models import ScheduledAction, ViraltestAction, ViraltestObservation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
  # ---------------------------------------------------------------------------
24
- # Constants (research-backed)
25
  # ---------------------------------------------------------------------------
26
 
27
- TASK_HORIZON = 7 # 7 daily steps (each step simulates 24 hours internally)
28
 
 
29
  CONTENT_ENERGY_COST = {
30
  "reel": 0.25,
31
  "carousel": 0.20,
@@ -37,129 +104,151 @@ BASE_ENGAGEMENT = {
37
  "reel": 0.52,
38
  "carousel": 0.55,
39
  "story": 0.30,
40
- "text_post": 0.37,
41
  }
42
 
 
43
  REACH_MULT = {
44
  "reel": 2.25,
45
  "carousel": 1.0,
46
  "story": 0.5,
47
- "text_post": 0.44,
48
  }
49
 
50
- TAG_POOL = [
51
- # Tech
52
- "ai", "ml", "coding", "startup", "saas", "devtools",
53
- # Lifestyle
54
- "fitness", "travel", "food", "wellness", "fashion", "photography",
55
- # Trending (base set rotated daily)
56
- "summer", "worldcup", "election", "newyear", "oscars", "climate",
57
- # Niche
58
- "productivity", "minimalism", "stoic", "web3", "gaming", "crypto",
59
- # Broad
60
- "motivation", "tips", "howto", "viral", "trending", "growth",
61
- ]
62
-
63
- TOPIC_CATEGORIES = {
64
- "tech": ["AI tools", "coding tips", "startup life", "tech news", "SaaS growth", "dev workflow"],
65
- "lifestyle": ["fitness routine", "travel guide", "food recipe", "wellness tips", "fashion haul", "photo editing"],
66
- "business": ["growth hacks", "marketing strategy", "creator economy", "monetization", "brand deals", "analytics"],
67
  }
68
 
69
- VALID_TASKS = ("weekly_engage", "weekly_strategic", "weekly_competitive")
 
 
 
 
 
 
 
 
70
 
71
- # Hour multipliers (Buffer 9.6M post study)
72
- PEAK_HOURS = {
73
- "weekday_morning": (9, 12, 1.3),
74
- "weekday_peak": (12, 15, 1.4),
75
- "evening": (18, 20, 1.25),
76
- "late_evening": (20, 23, 1.1),
77
- "night": (23, 6, 0.5),
78
- "off_hours": (6, 9, 0.8),
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
79
  }
80
 
81
- WEEKEND_PENALTY = 0.7
82
- PEAK_DAYS = (1, 2, 3) # Tue, Wed, Thu (0=Mon)
 
 
 
83
 
84
 
85
  @dataclass
86
  class CompetitorState:
 
87
  name: str
 
88
  niche_topics: List[str]
89
  preferred_types: List[str]
90
- posting_frequency: float
91
- base_engagement: float
92
  tag_preferences: List[str]
 
93
  recent_posts: List[Dict[str, Any]] = field(default_factory=list)
94
 
95
 
96
- COMPETITOR_PROFILES = [
97
- {
98
- "name": "creator_alpha",
99
- "niche_topics": ["AI tools", "coding tips", "tech news"],
100
- "preferred_types": ["reel", "carousel"],
101
- "posting_frequency": 2.5,
102
- "base_engagement": 0.45,
103
- "tag_preferences": ["ai", "coding", "tech news"],
104
  },
105
- {
106
- "name": "creator_beta",
107
- "niche_topics": ["growth hacks", "marketing strategy", "creator economy"],
108
- "preferred_types": ["carousel", "text_post"],
109
- "posting_frequency": 1.8,
110
- "base_engagement": 0.40,
111
- "tag_preferences": ["growth", "tips", "viral"],
112
  },
113
- {
114
- "name": "creator_gamma",
115
- "niche_topics": ["fitness routine", "wellness tips", "motivation"],
116
- "preferred_types": ["reel", "story"],
117
- "posting_frequency": 3.0,
118
- "base_engagement": 0.38,
119
- "tag_preferences": ["fitness", "wellness", "motivation"],
120
  },
121
- ]
122
-
123
- INITIAL_FOLLOWERS = 10000
124
- REST_RECOVERY = 0.12
125
- CREATE_CONTENT_COST = 0.05
126
- REPETITION_ENERGY_PENALTY = 0.05
127
- AUDIENCE_FATIGUE_THRESHOLD_1 = 3
128
- AUDIENCE_FATIGUE_THRESHOLD_2 = 5
129
- FOLLOWER_DECAY_HOURS = 48
130
- ALGORITHM_PENALTY_MULT = 0.6
131
- ALGORITHM_PENALTY_DURATION = 2
132
-
133
- # Sleep mechanics (research-backed: Frontiers Neuroscience 2025, Frontiers Human Neuroscience 2014)
134
- # - Cognitive performance follows a continuous decay curve, not step functions
135
- # - Full night deprivation (~24hrs) impairs performance by ~50%
136
- # - Uses exponential decay: quality = 1.0 * (0.5 ^ ((hours - optimal) / halflife))
137
- SLEEP_OPTIMAL_AWAKE = 14 # Hours awake with no performance impact
138
- SLEEP_HALFLIFE_HOURS = 10 # Hours beyond optimal for quality to halve
139
- SLEEP_MIN_QUALITY = 0.30 # Floor for sleep-based quality (can't go below 30%)
140
- SLEEP_ENERGY_DRAIN_START = 16 # Hours awake before extra energy drain kicks in
141
- SLEEP_ENERGY_DRAIN_RATE = 0.015 # Energy drain per hour when sleep deprived
142
- SLEEP_RECOVERY_PER_REST = 2 # Hours of "sleep credit" per rest action (rest = nap)
143
-
 
 
144
 
145
- # ---------------------------------------------------------------------------
146
- # Environment
147
- # ---------------------------------------------------------------------------
148
 
149
  class ViraltestEnvironment(Environment):
150
- """
151
- Weekly creator optimization simulation.
152
-
153
- The agent manages a social media creator's posting strategy over 7 daily
154
- steps (each day runs 24 simulated hours from a sparse schedule), balancing
155
- engagement, energy, tags, and competition.
156
- """
157
 
158
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
159
 
160
  def __init__(self) -> None:
161
  self._state = State(episode_id=str(uuid4()), step_count=0)
162
- self._task = "weekly_engage"
163
  self._rng = random.Random(42)
164
  self._init_state()
165
 
@@ -168,12 +257,12 @@ class ViraltestEnvironment(Environment):
168
  self._followers = INITIAL_FOLLOWERS
169
  self._initial_followers = INITIAL_FOLLOWERS
170
  self._hour = 9
171
- self._day = 0 # 0=Mon
172
  self._posts_today = 0
173
  self._last_post_types: List[str] = []
174
  self._time_since_last_post = 0
175
  self._engagement_history: List[float] = []
176
- self._tag_history: Dict[str, List[float]] = defaultdict(list)
177
  self._content_queue = 0
178
  self._unique_tags_used: set = set()
179
  self._unique_content_types: set = set()
@@ -187,21 +276,43 @@ class ViraltestEnvironment(Environment):
187
  self._total_engagement = 0.0
188
  self._posts_per_day: Dict[int, int] = defaultdict(int)
189
  self._algorithm_penalty_remaining = 0
 
 
 
 
 
 
 
 
190
 
191
  self._trending_topics = self._pick_trending_topics()
192
  self._trending_tags = self._pick_trending_tags()
193
- self._competitors = [CompetitorState(**p) for p in COMPETITOR_PROFILES]
194
-
195
- # Sleep state: creator starts well-rested at 9am (awake since ~7am)
196
- self._hours_since_sleep = 2 # Woke up 2 hours ago at start (9am)
197
- self._sleep_debt = 0.0 # 0 = fully rested, 1 = severe deprivation
198
-
199
- # ----- trend rotation -----
 
 
 
 
 
 
 
 
 
 
 
 
 
 
200
 
201
  def _pick_trending_topics(self) -> List[str]:
202
  all_topics = []
203
- for cat_topics in TOPIC_CATEGORIES.values():
204
- all_topics.extend(cat_topics)
205
  return self._rng.sample(all_topics, min(3, len(all_topics)))
206
 
207
  def _pick_trending_tags(self) -> List[str]:
@@ -211,65 +322,51 @@ class ViraltestEnvironment(Environment):
211
  self._trending_topics = self._pick_trending_topics()
212
  self._trending_tags = self._pick_trending_tags()
213
 
214
- # ----- hour multiplier -----
215
 
216
  def _get_hour_multiplier(self) -> float:
 
217
  h = self._hour
218
- d = self._day
219
-
220
- is_weekend = d >= 5
221
- base = WEEKEND_PENALTY if is_weekend else 1.0
222
-
223
- if 12 <= h < 15 and d in PEAK_DAYS:
224
- return base * 1.4
225
- if 9 <= h < 12:
226
- return base * 1.3
227
- if 18 <= h < 20:
228
- return base * 1.25
229
- if 20 <= h < 23:
230
- return base * 1.1
231
- if h >= 23 or h < 6:
232
- return base * 0.5
233
- return base * 0.8
234
 
235
- # ----- quality -----
236
 
237
  def _get_quality_modifier(self) -> float:
238
- """
239
- Quality affected by both energy and sleep debt.
240
-
241
- Sleep uses exponential decay curve (not step function):
242
- - No impact until SLEEP_OPTIMAL_AWAKE hours (14hrs)
243
- - Then: quality = 0.5 ^ ((hours - optimal) / halflife)
244
- - At 24hrs awake: ~50% quality (matches research)
245
- - Floor at SLEEP_MIN_QUALITY (30%)
246
- """
247
- # Energy component (existing logic)
248
  if self._energy > 0.5:
249
  energy_factor = 1.0
250
  else:
251
  energy_factor = max(0.48, self._energy * 1.5)
252
 
253
- # Sleep component - exponential decay curve
254
  if self._hours_since_sleep <= SLEEP_OPTIMAL_AWAKE:
255
  sleep_factor = 1.0
256
  else:
257
  hours_over = self._hours_since_sleep - SLEEP_OPTIMAL_AWAKE
258
- # Exponential decay: halves every SLEEP_HALFLIFE_HOURS
259
- sleep_factor = 0.5 ** (hours_over / SLEEP_HALFLIFE_HOURS)
260
- sleep_factor = max(SLEEP_MIN_QUALITY, sleep_factor)
261
 
262
  return energy_factor * sleep_factor
263
 
 
 
 
 
 
 
 
 
 
 
 
 
264
  # ----- tags -----
265
 
266
  def _calc_tag_boost(self, tags: Optional[List[str]]) -> float:
267
  if not tags:
268
  return 1.0
269
  trending_count = sum(1 for t in tags if t in self._trending_tags)
270
- perf_values = [
271
- self._tag_performance_avg(t) for t in tags if self._tag_performance_avg(t) > 0
272
- ]
273
  perf_avg = sum(perf_values) / len(perf_values) if perf_values else 0.0
274
  return 1.0 + 0.1 * trending_count + 0.05 * perf_avg
275
 
@@ -278,7 +375,8 @@ class ViraltestEnvironment(Environment):
278
  if not history:
279
  return 0.0
280
  window = history[-5:]
281
- return sum(window) / len(window)
 
282
 
283
  def _get_tag_performance_dict(self) -> Dict[str, float]:
284
  return {tag: self._tag_performance_avg(tag) for tag in self._unique_tags_used}
@@ -289,23 +387,18 @@ class ViraltestEnvironment(Environment):
289
  for comp in self._competitors:
290
  for p in comp.recent_posts:
291
  p["hours_ago"] += 1
292
- comp.recent_posts = [p for p in comp.recent_posts if p["hours_ago"] < 48]
293
 
294
- post_prob = comp.posting_frequency / 24.0
295
- if self._rng.random() < post_prob:
296
  ct = self._rng.choice(comp.preferred_types)
297
  topic = self._rng.choice(comp.niche_topics)
298
- tags = self._rng.sample(
299
- comp.tag_preferences, min(3, len(comp.tag_preferences))
300
- )
301
- eng = comp.base_engagement + self._rng.uniform(-0.1, 0.1)
302
  eng = max(0.0, min(1.0, eng))
303
  comp.recent_posts.append({
304
- "content_type": ct,
305
- "topic": topic,
306
- "tags": tags,
307
- "engagement": round(eng, 3),
308
- "hours_ago": 0,
309
  })
310
 
311
  def _get_competitor_recent_posts(self, limit: int = 5) -> List[Dict[str, Any]]:
@@ -317,10 +410,7 @@ class ViraltestEnvironment(Environment):
317
  return all_posts[:limit]
318
 
319
  def _get_competitor_avg_engagement(self) -> float:
320
- engagements = []
321
- for comp in self._competitors:
322
- for p in comp.recent_posts:
323
- engagements.append(p["engagement"])
324
  return sum(engagements) / len(engagements) if engagements else 0.0
325
 
326
  def _calc_niche_saturation(self, topic: Optional[str]) -> float:
@@ -341,46 +431,210 @@ class ViraltestEnvironment(Environment):
341
  if not topic:
342
  return 1.0
343
  saturation = self._calc_niche_saturation(topic)
344
- recent_topics = []
345
- for comp in self._competitors:
346
- for p in comp.recent_posts:
347
- if p["hours_ago"] < 12:
348
- recent_topics.append(p["topic"].lower())
349
- topic_lower = topic.lower()
350
- has_overlap = any(_topic_overlap(topic_lower, t) for t in recent_topics)
351
  if not has_overlap:
352
  return 1.3
353
  if saturation > 0.7:
354
  return 0.6
355
  return 1.0
356
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
357
  # ----- core API -----
358
 
359
- def reset(
360
- self,
361
- seed: Optional[int] = None,
362
- episode_id: Optional[str] = None,
363
- **kwargs: Any,
364
- ) -> ViraltestObservation:
365
- self._task = kwargs.get("task", "weekly_engage")
366
  if self._task not in VALID_TASKS:
367
- self._task = "weekly_engage"
368
 
369
  self._rng = random.Random(seed if seed is not None else 42)
370
- self._state = State(
371
- episode_id=episode_id or str(uuid4()), step_count=0
372
- )
373
  self._init_state()
374
 
 
 
 
 
 
 
 
 
 
375
  return self._build_observation(reward=0.0, error=None)
376
 
377
- def step(self, action: ViraltestAction, **kwargs: Any) -> ViraltestObservation: # type: ignore[override]
378
- """Process a daily step: run 24 hourly sub-steps using the sparse schedule."""
379
  if self._episode_done and self._final_observation is not None:
380
  return self._final_observation
381
 
382
  self._state.step_count += 1
383
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
384
  schedule: Dict[int, ScheduledAction] = {}
385
  errors: List[str] = []
386
  for sa in action.scheduled_actions:
@@ -398,23 +652,32 @@ class ViraltestEnvironment(Environment):
398
  daily_posts = 0
399
  energy_min = self._energy
400
  burned_out = False
 
401
 
402
  for hour in range(24):
403
  if burned_out:
404
  break
 
405
 
406
  if hour in schedule:
407
  sa = schedule[hour]
408
- hourly_eng, hourly_reward = self._process_hour_action(sa)
409
  else:
410
  hourly_eng, hourly_reward = self._process_hour_rest()
 
411
 
412
  daily_engagement += hourly_eng
413
  daily_reward += hourly_reward
414
  if hourly_eng > 0:
415
  daily_posts += 1
 
 
 
 
 
 
 
416
  energy_min = min(energy_min, self._energy)
417
-
418
  self._advance_competitors()
419
  self._advance_time()
420
  self._energy_history.append(self._energy)
@@ -422,70 +685,100 @@ class ViraltestEnvironment(Environment):
422
  if self._energy <= 0.0:
423
  burned_out = True
424
 
425
- day_posts = self._posts_per_day.get(self._day - 1, 0) if self._day > 0 else self._posts_per_day.get(0, 0)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
426
  prev_day = max(0, self._day - 1)
427
  if 1 <= self._posts_per_day.get(prev_day, 0) <= 2:
428
  self._days_with_good_posts.add(prev_day)
429
 
430
  avg_reward = daily_reward / 24.0
431
-
432
  error_str = "; ".join(errors) if errors else None
433
 
434
  done = self._state.step_count >= TASK_HORIZON or self._energy <= 0.0
 
 
435
  if done:
436
  self._episode_done = True
437
  grader_score = self._run_grader()
 
 
 
 
 
 
 
 
 
 
 
438
  self._final_observation = self._build_observation(
439
- reward=round(avg_reward, 4),
440
- error=error_str,
441
- done=True,
442
- grader_score=grader_score,
443
- daily_total_engagement=daily_engagement,
444
- daily_posts_made=daily_posts,
445
- daily_energy_min=energy_min,
446
  )
447
  return self._final_observation
448
 
449
  return self._build_observation(
450
- reward=round(avg_reward, 4),
451
- error=error_str,
452
  daily_total_engagement=daily_engagement,
453
- daily_posts_made=daily_posts,
454
- daily_energy_min=energy_min,
 
455
  )
456
 
457
- def _process_hour_action(self, sa: ScheduledAction) -> tuple:
458
- """Process a single scheduled (non-rest) hourly action. Returns (engagement, reward)."""
459
  engagement = 0.0
 
460
 
461
  if sa.action_type == "post":
462
- cost = CONTENT_ENERGY_COST.get(sa.content_type, 0.1) # type: ignore[arg-type]
463
  if self._content_queue > 0:
464
  cost *= 0.5
465
  self._content_queue -= 1
466
- if len(self._last_post_types) >= 3 and all(
467
- t == sa.content_type for t in self._last_post_types[-3:]
468
- ):
469
  cost += REPETITION_ENERGY_PENALTY
470
  self._energy = max(0.0, self._energy - cost)
471
- self._unique_content_types.add(sa.content_type) # type: ignore[arg-type]
472
 
473
  if self._energy <= 0.0:
474
  engagement = 0.0
475
  else:
476
- base = BASE_ENGAGEMENT.get(sa.content_type, 0.3) # type: ignore[arg-type]
477
- reach = REACH_MULT.get(sa.content_type, 1.0) # type: ignore[arg-type]
478
  hour_mult = self._get_hour_multiplier()
479
  quality = self._get_quality_modifier()
480
  tag_boost = self._calc_tag_boost(sa.tags)
481
  trending_bonus = 1.5 if self._is_topic_trending(sa.topic) else 1.0
482
  comp_diff = self._calc_competitor_diff(sa.topic)
 
 
483
 
484
- fatigue = 1.0
485
- if self._posts_today >= AUDIENCE_FATIGUE_THRESHOLD_2:
486
- fatigue = 0.1
487
- elif self._posts_today >= AUDIENCE_FATIGUE_THRESHOLD_1:
488
- fatigue = 0.5
489
 
490
  algo_mult = 1.0
491
  if self._algorithm_penalty_remaining > 0:
@@ -495,15 +788,20 @@ class ViraltestEnvironment(Environment):
495
  engagement = (
496
  base * reach * hour_mult * quality * tag_boost
497
  * trending_bonus * comp_diff * fatigue * algo_mult
 
498
  )
499
  engagement = min(engagement, 5.0)
500
 
 
 
501
  self._last_topic = sa.topic
502
 
503
  if sa.tags and engagement > 0:
 
 
504
  for tag in sa.tags:
505
  tag_lower = tag.lower()
506
- self._tag_history[tag_lower].append(engagement)
507
  self._unique_tags_used.add(tag_lower)
508
 
509
  self._engagement_history.append(engagement)
@@ -513,7 +811,7 @@ class ViraltestEnvironment(Environment):
513
  if self._calc_competitor_diff(sa.topic) >= 1.3:
514
  self._unique_topic_steps += 1
515
 
516
- self._last_post_types.append(sa.content_type) # type: ignore[arg-type]
517
  if len(self._last_post_types) > 3:
518
  self._last_post_types = self._last_post_types[-3:]
519
  self._posts_today += 1
@@ -531,13 +829,13 @@ class ViraltestEnvironment(Environment):
531
  if self._time_since_last_post >= FOLLOWER_DECAY_HOURS:
532
  self._followers = max(0, self._followers - int(self._followers * 0.005))
533
  if self._algorithm_penalty_remaining == 0:
534
- self._algorithm_penalty_remaining = ALGORITHM_PENALTY_DURATION
 
535
 
536
  reward = 0.0 if self._energy <= 0.0 else self._compute_hourly_reward(sa, engagement)
537
- return engagement, reward
538
 
539
- def _process_hour_rest(self) -> tuple:
540
- """Process a rest hour. Returns (0.0, reward)."""
541
  self._energy = min(1.0, self._energy + REST_RECOVERY)
542
  self._hours_since_sleep = max(0, self._hours_since_sleep - SLEEP_RECOVERY_PER_REST)
543
  self._sleep_debt = max(0.0, self._sleep_debt - 0.1)
@@ -546,7 +844,8 @@ class ViraltestEnvironment(Environment):
546
  if self._time_since_last_post >= FOLLOWER_DECAY_HOURS:
547
  self._followers = max(0, self._followers - int(self._followers * 0.005))
548
  if self._algorithm_penalty_remaining == 0:
549
- self._algorithm_penalty_remaining = ALGORITHM_PENALTY_DURATION
 
550
 
551
  reward = 0.0 if self._energy <= 0.0 else self._compute_rest_reward()
552
  return 0.0, reward
@@ -555,8 +854,6 @@ class ViraltestEnvironment(Environment):
555
  def state(self) -> State:
556
  return self._state
557
 
558
- # ----- validation -----
559
-
560
  def _validate_scheduled_action(self, sa: ScheduledAction) -> Optional[str]:
561
  if sa.action_type not in ("post", "create_content"):
562
  return f"Invalid action_type: {sa.action_type}"
@@ -568,14 +865,12 @@ class ViraltestEnvironment(Environment):
568
  if not sa.topic or not sa.topic.strip():
569
  return "topic is required when posting"
570
  if len(sa.topic) > 200:
571
- return "topic must be 200 characters"
572
  if sa.tags:
573
- valid = [t for t in sa.tags if t.lower() in TAG_POOL]
574
  sa.tags = valid if valid else None
575
  return None
576
 
577
- # ----- trending -----
578
-
579
  def _is_topic_trending(self, topic: Optional[str]) -> bool:
580
  if not topic:
581
  return False
@@ -611,7 +906,6 @@ class ViraltestEnvironment(Environment):
611
  comp_component = min(1.0, diff / 1.3) * 0.15
612
 
613
  burnout_penalty = 0.1 if self._energy < 0.2 else 0.0
614
-
615
  raw = eng_component + energy_component + consistency_component + tag_component + comp_component - burnout_penalty
616
  return max(0.0, min(1.0, raw))
617
 
@@ -633,25 +927,17 @@ class ViraltestEnvironment(Environment):
633
  raw = energy_component + consistency_component - burnout_penalty
634
  return max(0.0, min(1.0, raw))
635
 
636
- # ----- time -----
637
-
638
  def _advance_time(self) -> None:
639
  self._hour += 1
640
-
641
- # Track hours since sleep (always increases unless resting)
642
  self._hours_since_sleep += 1
643
 
644
- # Sleep deprivation drains extra energy (smooth ramp after threshold)
645
  if self._hours_since_sleep > SLEEP_ENERGY_DRAIN_START:
646
  hours_over = self._hours_since_sleep - SLEEP_ENERGY_DRAIN_START
647
- # Drain increases smoothly the longer you're awake
648
  drain = SLEEP_ENERGY_DRAIN_RATE * (1 + hours_over * 0.1)
649
  self._energy = max(0.0, self._energy - drain)
650
 
651
- # Update sleep debt (smooth accumulation based on hours awake)
652
  if self._hours_since_sleep > SLEEP_OPTIMAL_AWAKE:
653
  hours_over = self._hours_since_sleep - SLEEP_OPTIMAL_AWAKE
654
- # Debt accumulates faster the longer awake (quadratic-ish curve)
655
  debt_rate = 0.01 * (1 + hours_over * 0.05)
656
  self._sleep_debt = min(1.0, self._sleep_debt + debt_rate)
657
 
@@ -661,17 +947,14 @@ class ViraltestEnvironment(Environment):
661
  self._posts_today = 0
662
  self._rotate_trends()
663
 
664
- # ----- observation builder -----
665
-
666
  def _build_observation(
667
- self,
668
- reward: float,
669
- error: Optional[str],
670
- done: bool = False,
671
  grader_score: Optional[float] = None,
672
- daily_total_engagement: float = 0.0,
673
- daily_posts_made: int = 0,
674
  daily_energy_min: float = 1.0,
 
 
 
675
  ) -> ViraltestObservation:
676
  recent_eng = self._engagement_history[-10:] if self._engagement_history else []
677
  eng_rate = sum(recent_eng) / len(recent_eng) if recent_eng else 0.0
@@ -680,6 +963,8 @@ class ViraltestEnvironment(Environment):
680
  if grader_score is not None:
681
  meta["grader_score"] = round(grader_score, 4)
682
 
 
 
683
  return ViraltestObservation(
684
  current_hour=self._hour,
685
  day_of_week=self._day % 7,
@@ -691,17 +976,17 @@ class ViraltestEnvironment(Environment):
691
  engagement_rate=round(eng_rate, 4),
692
  posts_today=self._posts_today,
693
  time_since_last_post=self._time_since_last_post,
694
- trending_topics=list(self._trending_topics),
695
  content_queue_size=self._content_queue,
696
  last_post_type=self._last_post_types[-1] if self._last_post_types else "none",
697
- tag_performance=self._get_tag_performance_dict(),
698
- trending_tags=list(self._trending_tags),
699
- competitor_recent_posts=self._get_competitor_recent_posts(),
700
- competitor_avg_engagement=round(self._get_competitor_avg_engagement(), 4),
701
- niche_saturation=round(self._calc_niche_saturation(self._last_topic), 3),
702
  daily_total_engagement=round(daily_total_engagement, 4),
703
  daily_posts_made=daily_posts_made,
704
  daily_energy_min=round(daily_energy_min, 3),
 
 
 
 
 
705
  grader_score=round(grader_score, 4) if grader_score is not None else None,
706
  error=error,
707
  done=done,
@@ -709,66 +994,57 @@ class ViraltestEnvironment(Environment):
709
  metadata=meta,
710
  )
711
 
712
- # ----- graders -----
713
 
714
  def _run_grader(self) -> float:
715
- if self._task == "weekly_engage":
716
- return self._grade_weekly_engage()
717
- elif self._task == "weekly_strategic":
718
- return self._grade_weekly_strategic()
719
- elif self._task == "weekly_competitive":
720
- return self._grade_weekly_competitive()
721
  return 0.0
722
 
723
  def _theoretical_max_engagement(self) -> float:
724
  best_base = max(BASE_ENGAGEMENT.values())
725
  best_reach = max(REACH_MULT.values())
726
- peak_mult = 1.4
727
- quality = 1.0
728
- posts_per_day = 2
729
- days = 7
730
- return best_base * best_reach * peak_mult * quality * posts_per_day * days
731
 
732
- def _grade_weekly_engage(self) -> float:
733
  theoretical_max = self._theoretical_max_engagement()
734
  if theoretical_max <= 0:
735
  return 0.0
736
  raw = min(1.0, self._total_engagement / theoretical_max)
737
  if self._energy <= 0.0:
738
- raw *= 0.3 # burnout penalty even on easy task
739
  return raw
740
 
741
- def _grade_weekly_strategic(self) -> float:
742
- # Burnout = severe penalty (not total fail like competitive, but close)
743
  if self._energy <= 0.0:
744
  return max(0.0, min(0.15, self._total_engagement * 0.01))
745
 
746
- # Engagement: 35%
747
  theoretical_max = self._theoretical_max_engagement()
748
  norm_eng = min(1.0, self._total_engagement / theoretical_max) if theoretical_max > 0 else 0.0
749
 
750
- # Tag score: 25% (40% discovery + 60% exploitation)
751
  positive_tags = sum(1 for t in self._unique_tags_used if self._tag_performance_avg(t) > 0)
752
  tag_discovery = min(1.0, positive_tags / 30.0)
753
- top_perfs = sorted(
754
- [self._tag_performance_avg(t) for t in self._unique_tags_used], reverse=True
755
- )[:3]
756
  tag_exploitation = (sum(top_perfs) / len(top_perfs)) if top_perfs else 0.0
757
  tag_exploitation = min(1.0, tag_exploitation / 2.0)
758
  tag_score = 0.4 * tag_discovery + 0.6 * tag_exploitation
759
 
760
- # Avg energy: 25%
761
  avg_energy = sum(self._energy_history) / len(self._energy_history) if self._energy_history else 0.0
762
-
763
- # Consistency: 15%
764
- consistency = len(self._days_with_good_posts) / 7.0
765
 
766
  raw = 0.35 * norm_eng + 0.25 * tag_score + 0.25 * avg_energy + 0.15 * consistency
767
 
768
- # Constraints
769
  min_energy = min(self._energy_history) if self._energy_history else 0.0
770
  if min_energy < 0.2:
771
- raw *= 0.4 # crashed hard
772
  elif min_energy < 0.3:
773
  raw = min(raw, 0.45)
774
  if len(self._unique_tags_used) < 5:
@@ -776,53 +1052,39 @@ class ViraltestEnvironment(Environment):
776
 
777
  return max(0.0, min(1.0, raw))
778
 
779
- def _grade_weekly_competitive(self) -> float:
780
- # Burnout = total fail
781
  if self._energy <= 0.0:
782
  return 0.0
783
 
784
- # Engagement: 25%
785
  theoretical_max = self._theoretical_max_engagement()
786
  norm_eng = min(1.0, self._total_engagement / theoretical_max) if theoretical_max > 0 else 0.0
787
 
788
- # Tag score: 20%
789
  positive_tags = sum(1 for t in self._unique_tags_used if self._tag_performance_avg(t) > 0)
790
  tag_discovery = min(1.0, positive_tags / 30.0)
791
- top_perfs = sorted(
792
- [self._tag_performance_avg(t) for t in self._unique_tags_used], reverse=True
793
- )[:3]
794
  tag_exploitation = (sum(top_perfs) / len(top_perfs)) if top_perfs else 0.0
795
  tag_exploitation = min(1.0, tag_exploitation / 2.0)
796
  tag_score = 0.4 * tag_discovery + 0.6 * tag_exploitation
797
 
798
- # Follower growth: 20%
799
  growth = (self._followers - self._initial_followers) / self._initial_followers if self._initial_followers > 0 else 0.0
800
- target_growth = 0.05
801
  norm_growth = min(1.0, max(0.0, growth / target_growth))
802
 
803
- # Competitor outperformance: 15%
804
  comp_avg = self._get_competitor_avg_engagement()
805
  my_avg = self._total_engagement / self._posting_steps if self._posting_steps > 0 else 0.0
806
  outperformance = my_avg / comp_avg if comp_avg > 0 else 1.0
807
  norm_outperformance = min(1.0, outperformance / 1.5)
808
 
809
- # Differentiation: 10%
810
  differentiation = self._unique_topic_steps / self._posting_steps if self._posting_steps > 0 else 0.0
811
 
812
- # Energy floor: 10%
813
  min_energy = min(self._energy_history) if self._energy_history else 0.0
814
  energy_floor = min(1.0, max(0.0, min_energy))
815
 
816
  raw = (
817
- 0.25 * norm_eng
818
- + 0.20 * tag_score
819
- + 0.20 * norm_growth
820
- + 0.15 * norm_outperformance
821
- + 0.10 * differentiation
822
- + 0.10 * energy_floor
823
  )
824
 
825
- # Constraints
826
  if len(self._unique_content_types) < 3:
827
  raw *= 0.5
828
  if len(self._unique_tags_used) < 8:
@@ -831,15 +1093,23 @@ class ViraltestEnvironment(Environment):
831
  return max(0.0, min(1.0, raw))
832
 
833
 
834
- # ---------------------------------------------------------------------------
835
- # Helpers
836
- # ---------------------------------------------------------------------------
837
-
838
  def _topic_overlap(topic_a: str, topic_b: str) -> bool:
839
- """Check if two topics have significant word overlap."""
840
  words_a = set(topic_a.split())
841
  words_b = set(topic_b.split())
842
  if not words_a or not words_b:
843
  return False
844
  common = words_a & words_b
845
  return len(common) / min(len(words_a), len(words_b)) >= 0.5
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  """
2
+ Viraltest Environment v2 Theme #3.1 World-Modeling Simulation.
3
+
4
+ 30-day creator optimization with:
5
+ - Mosseri-aligned engagement signals (watch_time, sends, saves, likes)
6
+ - Discoverable tool catalog (partial observability)
7
+ - Piecewise-linear sleep model (Van Dongen 2003)
8
+ - Data-driven hour heatmap (Buffer 9.6M + Sprout 2B)
9
+ - Tiered audience fatigue (Buffer 2.1M)
10
+ - Multi-episode brand persistence
11
+ - Counterfactual coach feedback
12
  """
13
 
14
+ import json
15
+ import math
16
  import random
17
  from collections import defaultdict
18
  from dataclasses import dataclass, field
19
+ from pathlib import Path
20
+ from typing import Any, Dict, List, Optional, Tuple
21
  from uuid import uuid4
22
 
23
  from openenv.core.env_server.interfaces import Environment
24
  from openenv.core.env_server.types import State
25
 
26
  try:
27
+ from ..models import (
28
+ CollabProposal,
29
+ EngagementSignals,
30
+ ReplyAction,
31
+ ScheduledAction,
32
+ ToolCall,
33
+ ToolResult,
34
+ ViraltestAction,
35
+ ViraltestObservation,
36
+ )
37
  except ImportError:
38
+ from models import (
39
+ CollabProposal,
40
+ EngagementSignals,
41
+ ReplyAction,
42
+ ScheduledAction,
43
+ ToolCall,
44
+ ToolResult,
45
+ ViraltestAction,
46
+ ViraltestObservation,
47
+ )
48
+
49
+ _DATA_DIR = Path(__file__).parent / "data"
50
+
51
+ def _load_json(name: str) -> Any:
52
+ return json.loads((_DATA_DIR / name).read_text())
53
+
54
+ # ---------------------------------------------------------------------------
55
+ # Data files (loaded once at module level)
56
+ # ---------------------------------------------------------------------------
57
+
58
+ _TAGS_DATA = _load_json("tags.json")
59
+ _TOPICS_DATA = _load_json("topics.json")
60
+ _COMPETITORS_DATA = _load_json("competitors.json")
61
+ _HEATMAP_DATA = _load_json("hour_heatmap.json")
62
+ _AUDIENCE_DATA = _load_json("audience_segments.json")
63
+ _OVERLAP_DATA = _load_json("audience_overlap_matrix.json")
64
+
65
+ # Flatten tag pool for validation
66
+ TAG_POOL: List[str] = []
67
+ for t in _TAGS_DATA.get("broad", []):
68
+ TAG_POOL.append(t["tag"])
69
+ for _cat, tags in _TAGS_DATA.get("niche", {}).items():
70
+ for t in tags:
71
+ TAG_POOL.append(t["tag"])
72
+ for t in _TAGS_DATA.get("trending", []):
73
+ TAG_POOL.append(t["tag"])
74
+ for t in _TAGS_DATA.get("seasonal", []):
75
+ TAG_POOL.append(t["tag"])
76
+
77
+ TOPIC_CATEGORIES: Dict[str, List[str]] = {}
78
+ for niche_name, niche_data in _TOPICS_DATA.get("niches", {}).items():
79
+ TOPIC_CATEGORIES[niche_name] = niche_data["topics"]
80
+
81
+ _NICHE_MULTIPLIERS: Dict[str, float] = {}
82
+ for niche_name, niche_data in _TOPICS_DATA.get("niches", {}).items():
83
+ _NICHE_MULTIPLIERS[niche_name] = niche_data["engagement_multiplier"]
84
+
85
+ _HEATMAP_GRID: Dict[int, List[float]] = {
86
+ int(k): v for k, v in _HEATMAP_DATA.get("grid", {}).items()
87
+ }
88
 
89
  # ---------------------------------------------------------------------------
90
+ # Constants (research-backed, Tier 1-3 sources)
91
  # ---------------------------------------------------------------------------
92
 
93
+ TASK_HORIZON = 30 # 30 daily steps (monthly cycle)
94
 
95
+ # Socialinsider 2026 (31M posts)
96
  CONTENT_ENERGY_COST = {
97
  "reel": 0.25,
98
  "carousel": 0.20,
 
104
  "reel": 0.52,
105
  "carousel": 0.55,
106
  "story": 0.30,
107
+ "text_post": 0.45,
108
  }
109
 
110
+ # Socialinsider 2026 + CreatorsJet 10K study
111
  REACH_MULT = {
112
  "reel": 2.25,
113
  "carousel": 1.0,
114
  "story": 0.5,
115
+ "text_post": 0.91,
116
  }
117
 
118
+ # Mosseri Jan-2025: format→signal affinity (which signal each format naturally excels at)
119
+ FORMAT_SIGNAL_WEIGHTS = {
120
+ "reel": {"watch_time": 0.50, "sends_per_reach": 0.25, "saves": 0.10, "likes_per_reach": 0.15},
121
+ "carousel": {"watch_time": 0.10, "sends_per_reach": 0.15, "saves": 0.50, "likes_per_reach": 0.25},
122
+ "story": {"watch_time": 0.20, "sends_per_reach": 0.40, "saves": 0.05, "likes_per_reach": 0.35},
123
+ "text_post": {"watch_time": 0.05, "sends_per_reach": 0.10, "saves": 0.30, "likes_per_reach": 0.55},
 
 
 
 
 
 
 
 
 
 
 
124
  }
125
 
126
+ # Intent multiplier matrix: when intent matches format's strong signal, boost that signal
127
+ INTENT_MULTIPLIER = {
128
+ "send_bait": {"sends_per_reach": 1.6},
129
+ "save_bait": {"saves": 1.7},
130
+ "watch_bait": {"watch_time": 1.5},
131
+ "like_bait": {"likes_per_reach": 1.3},
132
+ }
133
+
134
+ VALID_TASKS = ("monthly_engage", "monthly_strategic", "monthly_competitive")
135
 
136
+ INITIAL_FOLLOWERS = 10000
137
+ REST_RECOVERY = 0.12
138
+ CREATE_CONTENT_COST = 0.05
139
+ REPETITION_ENERGY_PENALTY = 0.05
140
+ FOLLOWER_DECAY_HOURS = 72
141
+ ALGORITHM_PENALTY_MULT = 0.6
142
+ ALGORITHM_PENALTY_BASE_DURATION = 2
143
+
144
+ # Van Dongen 2003 *Sleep* PMID 12683469: lapses linear above 15.84h
145
+ SLEEP_OPTIMAL_AWAKE = 16
146
+ SLEEP_LINEAR_DECAY_PER_HOUR = 0.0625 # reaches ~50% at 24h awake (8h × 0.0625 = 0.5)
147
+ SLEEP_MIN_QUALITY = 0.30
148
+ SLEEP_ENERGY_DRAIN_START = 16
149
+ SLEEP_ENERGY_DRAIN_RATE = 0.015
150
+ SLEEP_RECOVERY_PER_REST = 2
151
+
152
+ # Buffer 2.1M study + arxiv:2410.13108: tiered fatigue
153
+ FATIGUE_TIERS = {2: 1.0, 3: 0.75, 4: 0.50, 5: 0.25}
154
+ WEEKLY_FATIGUE_THRESHOLD = 7
155
+ WEEKLY_FATIGUE_MULT = 0.75
156
+
157
+ SATURATION_PENALTY_K = 0.25
158
+ TREND_DEFAULT_HALFLIFE_HOURS = 60
159
+ COLLAB_MAX_PER_MONTH = 2
160
+ REPLY_WINDOW_MINUTES = 90
161
+ REPLY_REACH_BONUS = 1.4
162
+ API_BUDGET_INITIAL = 100
163
+
164
+ # Tool costs
165
+ TOOL_COSTS = {
166
+ "query_audience": 2,
167
+ "query_competitor": 2,
168
+ "query_tag_history": 1,
169
+ "query_trends": 1,
170
+ "predict_engagement": 3,
171
+ "draft_review": 3,
172
+ "query_creator_pool": 1,
173
+ "propose_collab": 5,
174
  }
175
 
176
+ # ---------------------------------------------------------------------------
177
+ # Brand state for multi-episode persistence
178
+ # ---------------------------------------------------------------------------
179
+
180
+ _BRAND_STORE: Dict[str, Dict[str, Any]] = {}
181
 
182
 
183
  @dataclass
184
  class CompetitorState:
185
+ id: str
186
  name: str
187
+ niche: str
188
  niche_topics: List[str]
189
  preferred_types: List[str]
190
+ posts_per_week: float
191
+ base_engagement_rate: float
192
  tag_preferences: List[str]
193
+ style: str
194
  recent_posts: List[Dict[str, Any]] = field(default_factory=list)
195
 
196
 
197
+ # ---------------------------------------------------------------------------
198
+ # Tool catalog (schemas for GET /tools)
199
+ # ---------------------------------------------------------------------------
200
+
201
+ TOOL_CATALOG = {
202
+ "query_audience": {
203
+ "description": "Query a specific audience segment to learn its topic affinities, content preferences, and active hours.",
204
+ "parameters": {"segment_id": {"type": "string", "enum": [s["id"] for s in _AUDIENCE_DATA.get("segments", [])]}},
205
  },
206
+ "query_competitor": {
207
+ "description": "Get recent posts and strategy of a competitor archetype within a time window.",
208
+ "parameters": {
209
+ "competitor_id": {"type": "string", "enum": [a["id"] for a in _COMPETITORS_DATA.get("archetypes", [])]},
210
+ "window_days": {"type": "integer", "default": 7, "minimum": 1, "maximum": 30},
211
+ },
 
212
  },
213
+ "query_tag_history": {
214
+ "description": "Get your historical engagement signals (watch, sends, saves, likes) for a specific tag.",
215
+ "parameters": {"tag": {"type": "string"}},
 
 
 
 
216
  },
217
+ "query_trends": {
218
+ "description": "Get currently trending topics and tags for a niche, with decay-adjusted strength.",
219
+ "parameters": {"niche": {"type": "string", "enum": list(TOPIC_CATEGORIES.keys())}},
220
+ },
221
+ "predict_engagement": {
222
+ "description": "Simulate engagement signals for a hypothetical daily plan WITHOUT committing it. Returns predicted watch/sends/saves/likes.",
223
+ "parameters": {"scheduled_actions": {"type": "array", "description": "Same format as ViraltestAction.scheduled_actions"}},
224
+ },
225
+ "draft_review": {
226
+ "description": "Get AI review of a draft plan: strengths, weaknesses, suggested improvements.",
227
+ "parameters": {"scheduled_actions": {"type": "array"}},
228
+ },
229
+ "query_creator_pool": {
230
+ "description": "List available competitor archetypes for potential collaboration, with audience overlap %.",
231
+ "parameters": {},
232
+ },
233
+ "propose_collab": {
234
+ "description": "Propose a collaboration post with a competitor. Splits engagement by audience overlap. Max 2 per month.",
235
+ "parameters": {
236
+ "partner_id": {"type": "string"},
237
+ "content_type": {"type": "string", "enum": ["reel", "story", "carousel", "text_post"]},
238
+ "hour": {"type": "integer", "minimum": 0, "maximum": 23},
239
+ },
240
+ },
241
+ }
242
 
 
 
 
243
 
244
  class ViraltestEnvironment(Environment):
245
+ """Monthly creator optimization simulation (Theme #3.1 World Modeling)."""
 
 
 
 
 
 
246
 
247
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
248
 
249
  def __init__(self) -> None:
250
  self._state = State(episode_id=str(uuid4()), step_count=0)
251
+ self._task = "monthly_engage"
252
  self._rng = random.Random(42)
253
  self._init_state()
254
 
 
257
  self._followers = INITIAL_FOLLOWERS
258
  self._initial_followers = INITIAL_FOLLOWERS
259
  self._hour = 9
260
+ self._day = 0
261
  self._posts_today = 0
262
  self._last_post_types: List[str] = []
263
  self._time_since_last_post = 0
264
  self._engagement_history: List[float] = []
265
+ self._tag_history: Dict[str, List[Dict[str, float]]] = defaultdict(list)
266
  self._content_queue = 0
267
  self._unique_tags_used: set = set()
268
  self._unique_content_types: set = set()
 
276
  self._total_engagement = 0.0
277
  self._posts_per_day: Dict[int, int] = defaultdict(int)
278
  self._algorithm_penalty_remaining = 0
279
+ self._agent_notes: Optional[str] = None
280
+ self._api_budget = API_BUDGET_INITIAL
281
+ self._collabs_this_month = 0
282
+ self._collab_history: List[str] = []
283
+ self._low_energy_days = 0
284
+ self._total_posts_this_week = 0
285
+ self._week_start_day = 0
286
+ self._daily_signals = EngagementSignals()
287
 
288
  self._trending_topics = self._pick_trending_topics()
289
  self._trending_tags = self._pick_trending_tags()
290
+ self._competitors = self._load_competitors()
291
+
292
+ self._hours_since_sleep = 2
293
+ self._sleep_debt = 0.0
294
+
295
+ def _load_competitors(self) -> List[CompetitorState]:
296
+ archetypes = _COMPETITORS_DATA.get("archetypes", [])
297
+ return [
298
+ CompetitorState(
299
+ id=a["id"],
300
+ name=a["name"],
301
+ niche=a["niche"],
302
+ niche_topics=a["niche_topics"],
303
+ preferred_types=a["preferred_types"],
304
+ posts_per_week=a["posts_per_week"],
305
+ base_engagement_rate=a["base_engagement_rate"],
306
+ tag_preferences=a["tag_preferences"],
307
+ style=a.get("style", "consistent_moderate"),
308
+ )
309
+ for a in archetypes
310
+ ]
311
 
312
  def _pick_trending_topics(self) -> List[str]:
313
  all_topics = []
314
+ for niche_data in _TOPICS_DATA.get("niches", {}).values():
315
+ all_topics.extend(niche_data["topics"])
316
  return self._rng.sample(all_topics, min(3, len(all_topics)))
317
 
318
  def _pick_trending_tags(self) -> List[str]:
 
322
  self._trending_topics = self._pick_trending_topics()
323
  self._trending_tags = self._pick_trending_tags()
324
 
325
+ # ----- hour multiplier (heatmap-based) -----
326
 
327
  def _get_hour_multiplier(self) -> float:
328
+ dow = self._day % 7
329
  h = self._hour
330
+ row = _HEATMAP_GRID.get(dow)
331
+ if row and 0 <= h < len(row):
332
+ return row[h]
333
+ return 0.8
 
 
 
 
 
 
 
 
 
 
 
 
334
 
335
+ # ----- quality (piecewise-linear sleep, Van Dongen 2003) -----
336
 
337
  def _get_quality_modifier(self) -> float:
 
 
 
 
 
 
 
 
 
 
338
  if self._energy > 0.5:
339
  energy_factor = 1.0
340
  else:
341
  energy_factor = max(0.48, self._energy * 1.5)
342
 
 
343
  if self._hours_since_sleep <= SLEEP_OPTIMAL_AWAKE:
344
  sleep_factor = 1.0
345
  else:
346
  hours_over = self._hours_since_sleep - SLEEP_OPTIMAL_AWAKE
347
+ sleep_factor = max(SLEEP_MIN_QUALITY, 1.0 - SLEEP_LINEAR_DECAY_PER_HOUR * hours_over)
 
 
348
 
349
  return energy_factor * sleep_factor
350
 
351
+ # ----- niche multiplier -----
352
+
353
+ def _get_niche_multiplier(self, topic: Optional[str]) -> float:
354
+ if not topic:
355
+ return 1.0
356
+ topic_lower = topic.lower()
357
+ for niche_name, niche_data in _TOPICS_DATA.get("niches", {}).items():
358
+ for t in niche_data["topics"]:
359
+ if t.lower() == topic_lower:
360
+ return _NICHE_MULTIPLIERS.get(niche_name, 1.0)
361
+ return 1.0
362
+
363
  # ----- tags -----
364
 
365
  def _calc_tag_boost(self, tags: Optional[List[str]]) -> float:
366
  if not tags:
367
  return 1.0
368
  trending_count = sum(1 for t in tags if t in self._trending_tags)
369
+ perf_values = [self._tag_performance_avg(t) for t in tags if self._tag_performance_avg(t) > 0]
 
 
370
  perf_avg = sum(perf_values) / len(perf_values) if perf_values else 0.0
371
  return 1.0 + 0.1 * trending_count + 0.05 * perf_avg
372
 
 
375
  if not history:
376
  return 0.0
377
  window = history[-5:]
378
+ totals = [h.get("total", 0.0) for h in window]
379
+ return sum(totals) / len(totals) if totals else 0.0
380
 
381
  def _get_tag_performance_dict(self) -> Dict[str, float]:
382
  return {tag: self._tag_performance_avg(tag) for tag in self._unique_tags_used}
 
387
  for comp in self._competitors:
388
  for p in comp.recent_posts:
389
  p["hours_ago"] += 1
390
+ comp.recent_posts = [p for p in comp.recent_posts if p["hours_ago"] < 72]
391
 
392
+ daily_prob = comp.posts_per_week / (7.0 * 24.0)
393
+ if self._rng.random() < daily_prob:
394
  ct = self._rng.choice(comp.preferred_types)
395
  topic = self._rng.choice(comp.niche_topics)
396
+ tags = self._rng.sample(comp.tag_preferences, min(3, len(comp.tag_preferences)))
397
+ eng = comp.base_engagement_rate + self._rng.uniform(-0.1, 0.1)
 
 
398
  eng = max(0.0, min(1.0, eng))
399
  comp.recent_posts.append({
400
+ "content_type": ct, "topic": topic, "tags": tags,
401
+ "engagement": round(eng, 3), "hours_ago": 0,
 
 
 
402
  })
403
 
404
  def _get_competitor_recent_posts(self, limit: int = 5) -> List[Dict[str, Any]]:
 
410
  return all_posts[:limit]
411
 
412
  def _get_competitor_avg_engagement(self) -> float:
413
+ engagements = [p["engagement"] for comp in self._competitors for p in comp.recent_posts]
 
 
 
414
  return sum(engagements) / len(engagements) if engagements else 0.0
415
 
416
  def _calc_niche_saturation(self, topic: Optional[str]) -> float:
 
431
  if not topic:
432
  return 1.0
433
  saturation = self._calc_niche_saturation(topic)
434
+ recent_topics = [
435
+ p["topic"].lower()
436
+ for comp in self._competitors
437
+ for p in comp.recent_posts
438
+ if p["hours_ago"] < 12
439
+ ]
440
+ has_overlap = any(_topic_overlap(topic.lower(), t) for t in recent_topics)
441
  if not has_overlap:
442
  return 1.3
443
  if saturation > 0.7:
444
  return 0.6
445
  return 1.0
446
 
447
+ def _count_competitors_same_hour(self) -> int:
448
+ count = 0
449
+ for comp in self._competitors:
450
+ for p in comp.recent_posts:
451
+ if p["hours_ago"] <= 1:
452
+ count += 1
453
+ return count
454
+
455
+ # ----- fatigue (tiered, Buffer 2.1M) -----
456
+
457
+ def _get_fatigue_multiplier(self) -> float:
458
+ if self._posts_today <= 2:
459
+ daily_fatigue = 1.0
460
+ elif self._posts_today in FATIGUE_TIERS:
461
+ daily_fatigue = FATIGUE_TIERS[self._posts_today]
462
+ else:
463
+ daily_fatigue = 0.25
464
+
465
+ weekly_mult = 1.0
466
+ if self._total_posts_this_week >= WEEKLY_FATIGUE_THRESHOLD:
467
+ weekly_mult = WEEKLY_FATIGUE_MULT
468
+
469
+ return daily_fatigue * weekly_mult
470
+
471
+ # ----- engagement signals (Mosseri-aligned) -----
472
+
473
+ def _compute_engagement_signals(
474
+ self, content_type: str, base_eng: float, intent: Optional[str]
475
+ ) -> EngagementSignals:
476
+ weights = FORMAT_SIGNAL_WEIGHTS.get(content_type, FORMAT_SIGNAL_WEIGHTS["text_post"])
477
+ signals = {k: base_eng * v for k, v in weights.items()}
478
+
479
+ if intent and intent in INTENT_MULTIPLIER:
480
+ for signal_name, mult in INTENT_MULTIPLIER[intent].items():
481
+ if signal_name in signals:
482
+ signals[signal_name] *= mult
483
+
484
+ return EngagementSignals(**signals)
485
+
486
+ # ----- tool dispatcher -----
487
+
488
+ def _dispatch_tool(self, tool: ToolCall) -> ToolResult:
489
+ cost = TOOL_COSTS.get(tool.name, 1)
490
+ if self._api_budget < cost:
491
+ return ToolResult(name=tool.name, success=False, error="rate_limit_exceeded", budget_remaining=self._api_budget)
492
+
493
+ self._api_budget -= cost
494
+
495
+ if tool.name == "query_audience":
496
+ seg_id = tool.arguments.get("segment_id", "")
497
+ for seg in _AUDIENCE_DATA.get("segments", []):
498
+ if seg["id"] == seg_id:
499
+ return ToolResult(name=tool.name, data=seg, budget_remaining=self._api_budget)
500
+ return ToolResult(name=tool.name, success=False, error=f"unknown segment: {seg_id}", budget_remaining=self._api_budget)
501
+
502
+ elif tool.name == "query_competitor":
503
+ comp_id = tool.arguments.get("competitor_id", "")
504
+ window = tool.arguments.get("window_days", 7)
505
+ for comp in self._competitors:
506
+ if comp.id == comp_id:
507
+ posts = [p for p in comp.recent_posts if p["hours_ago"] < window * 24]
508
+ return ToolResult(name=tool.name, data={
509
+ "id": comp.id, "name": comp.name, "niche": comp.niche,
510
+ "posts_per_week": comp.posts_per_week,
511
+ "recent_posts": posts[:10],
512
+ "avg_engagement": round(sum(p["engagement"] for p in posts) / max(1, len(posts)), 3),
513
+ }, budget_remaining=self._api_budget)
514
+ return ToolResult(name=tool.name, success=False, error=f"unknown competitor: {comp_id}", budget_remaining=self._api_budget)
515
+
516
+ elif tool.name == "query_tag_history":
517
+ tag = tool.arguments.get("tag", "").lower()
518
+ history = self._tag_history.get(tag, [])
519
+ return ToolResult(name=tool.name, data={
520
+ "tag": tag, "uses": len(history),
521
+ "avg_signals": _avg_signal_dicts(history[-10:]) if history else {},
522
+ }, budget_remaining=self._api_budget)
523
+
524
+ elif tool.name == "query_trends":
525
+ niche = tool.arguments.get("niche", "tech")
526
+ return ToolResult(name=tool.name, data={
527
+ "trending_topics": self._trending_topics,
528
+ "trending_tags": self._trending_tags,
529
+ "niche_saturation": round(self._calc_niche_saturation(self._last_topic), 3),
530
+ }, budget_remaining=self._api_budget)
531
+
532
+ elif tool.name == "predict_engagement":
533
+ raw_actions = tool.arguments.get("scheduled_actions", [])
534
+ predicted_total = 0.0
535
+ for sa_dict in raw_actions[:5]:
536
+ sa = ScheduledAction(**sa_dict) if isinstance(sa_dict, dict) else sa_dict
537
+ if sa.action_type == "post" and sa.content_type:
538
+ base = BASE_ENGAGEMENT.get(sa.content_type, 0.3)
539
+ reach = REACH_MULT.get(sa.content_type, 1.0)
540
+ niche_m = self._get_niche_multiplier(sa.topic)
541
+ predicted_total += base * reach * niche_m * self._get_hour_multiplier()
542
+ return ToolResult(name=tool.name, data={"predicted_daily_engagement": round(predicted_total, 4)}, budget_remaining=self._api_budget)
543
+
544
+ elif tool.name == "draft_review":
545
+ raw_actions = tool.arguments.get("scheduled_actions", [])
546
+ n_posts = sum(1 for a in raw_actions if (a.get("action_type") if isinstance(a, dict) else getattr(a, "action_type", "")) == "post")
547
+ feedback = []
548
+ if n_posts == 0:
549
+ feedback.append("No posts planned — you'll lose algorithmic momentum.")
550
+ elif n_posts > 3:
551
+ feedback.append(f"{n_posts} posts in one day risks audience fatigue (optimal: 1-2).")
552
+ if n_posts >= 1 and n_posts <= 2:
553
+ feedback.append("Good posting frequency for today.")
554
+ return ToolResult(name=tool.name, data={"feedback": feedback, "post_count": n_posts}, budget_remaining=self._api_budget)
555
+
556
+ elif tool.name == "query_creator_pool":
557
+ pool = []
558
+ for comp in self._competitors:
559
+ idx = _OVERLAP_DATA["archetype_ids"].index(comp.id) if comp.id in _OVERLAP_DATA["archetype_ids"] else -1
560
+ overlap = 0.15
561
+ if idx >= 0 and idx < len(_OVERLAP_DATA["matrix"]):
562
+ overlap = max(_OVERLAP_DATA["matrix"][idx])
563
+ pool.append({"id": comp.id, "name": comp.name, "niche": comp.niche, "max_audience_overlap": round(overlap, 2)})
564
+ return ToolResult(name=tool.name, data=pool, budget_remaining=self._api_budget)
565
+
566
+ elif tool.name == "propose_collab":
567
+ if self._collabs_this_month >= COLLAB_MAX_PER_MONTH:
568
+ return ToolResult(name=tool.name, success=False, error="collab_limit_reached", budget_remaining=self._api_budget)
569
+ partner_id = tool.arguments.get("partner_id", "")
570
+ if partner_id in self._collab_history[-3:]:
571
+ return ToolResult(name=tool.name, success=False, error="recently_collaborated", budget_remaining=self._api_budget)
572
+ return ToolResult(name=tool.name, data={"status": "proposal_accepted", "partner_id": partner_id}, budget_remaining=self._api_budget)
573
+
574
+ return ToolResult(name=tool.name, success=False, error=f"unknown tool: {tool.name}", budget_remaining=self._api_budget)
575
+
576
+ # ----- counterfactual coach -----
577
+
578
+ def _compute_coach_feedback(self, agent_engagement: float) -> Dict[str, Any]:
579
+ dow = self._day % 7
580
+ row = _HEATMAP_GRID.get(dow, [1.0] * 24)
581
+ best_hours = sorted(range(24), key=lambda h: row[h] if h < len(row) else 0, reverse=True)[:2]
582
+ best_base = max(BASE_ENGAGEMENT.values())
583
+ best_reach = max(REACH_MULT.values())
584
+ optimal_eng = sum(row[h] * best_base * best_reach for h in best_hours)
585
+ delta = agent_engagement - optimal_eng
586
+ return {
587
+ "optimal_hours": best_hours,
588
+ "optimal_engagement_estimate": round(optimal_eng, 4),
589
+ "your_engagement": round(agent_engagement, 4),
590
+ "delta": round(delta, 4),
591
+ "suggestion": "You're outperforming the heatmap baseline!" if delta >= 0 else "Consider posting at peak hours for better reach.",
592
+ }
593
+
594
  # ----- core API -----
595
 
596
+ def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None, **kwargs: Any) -> ViraltestObservation:
597
+ self._task = kwargs.get("task", "monthly_engage")
 
 
 
 
 
598
  if self._task not in VALID_TASKS:
599
+ self._task = "monthly_engage"
600
 
601
  self._rng = random.Random(seed if seed is not None else 42)
602
+ self._state = State(episode_id=episode_id or str(uuid4()), step_count=0)
 
 
603
  self._init_state()
604
 
605
+ chain_id = kwargs.get("episode_chain_id")
606
+ if chain_id and chain_id in _BRAND_STORE:
607
+ brand = _BRAND_STORE[chain_id]
608
+ self._unique_tags_used = set(brand.get("top_tags", []))
609
+ self._unique_content_types = set(brand.get("dominant_types", []))
610
+ self._collab_history = brand.get("collab_history", [])
611
+ self._followers = brand.get("followers", INITIAL_FOLLOWERS)
612
+ self._initial_followers = self._followers
613
+
614
  return self._build_observation(reward=0.0, error=None)
615
 
616
+ def step(self, action: ViraltestAction, **kwargs: Any) -> ViraltestObservation:
 
617
  if self._episode_done and self._final_observation is not None:
618
  return self._final_observation
619
 
620
  self._state.step_count += 1
621
 
622
+ # Store agent notes for echo
623
+ if action.notes:
624
+ self._agent_notes = action.notes
625
+
626
+ # Process tool calls first
627
+ tool_results: List[ToolResult] = []
628
+ for tc in action.tool_calls:
629
+ result = self._dispatch_tool(tc)
630
+ tool_results.append(result)
631
+
632
+ # Process collab proposal
633
+ if action.collab and self._collabs_this_month < COLLAB_MAX_PER_MONTH:
634
+ self._collabs_this_month += 1
635
+ self._collab_history.append(action.collab.partner_id)
636
+
637
+ # Validate scheduled actions
638
  schedule: Dict[int, ScheduledAction] = {}
639
  errors: List[str] = []
640
  for sa in action.scheduled_actions:
 
652
  daily_posts = 0
653
  energy_min = self._energy
654
  burned_out = False
655
+ daily_signals = EngagementSignals()
656
 
657
  for hour in range(24):
658
  if burned_out:
659
  break
660
+ self._hour = hour
661
 
662
  if hour in schedule:
663
  sa = schedule[hour]
664
+ hourly_eng, hourly_reward, hourly_signals = self._process_hour_action(sa)
665
  else:
666
  hourly_eng, hourly_reward = self._process_hour_rest()
667
+ hourly_signals = None
668
 
669
  daily_engagement += hourly_eng
670
  daily_reward += hourly_reward
671
  if hourly_eng > 0:
672
  daily_posts += 1
673
+ if hourly_signals:
674
+ daily_signals = EngagementSignals(
675
+ watch_time=daily_signals.watch_time + hourly_signals.watch_time,
676
+ sends_per_reach=daily_signals.sends_per_reach + hourly_signals.sends_per_reach,
677
+ saves=daily_signals.saves + hourly_signals.saves,
678
+ likes_per_reach=daily_signals.likes_per_reach + hourly_signals.likes_per_reach,
679
+ )
680
  energy_min = min(energy_min, self._energy)
 
681
  self._advance_competitors()
682
  self._advance_time()
683
  self._energy_history.append(self._energy)
 
685
  if self._energy <= 0.0:
686
  burned_out = True
687
 
688
+ # Process replies
689
+ for reply in action.replies:
690
+ if 0 <= reply.reply_hour < 24 and 0 <= reply.post_hour < 24:
691
+ diff_minutes = abs(reply.reply_hour - reply.post_hour) * 60
692
+ if diff_minutes <= REPLY_WINDOW_MINUTES:
693
+ daily_engagement *= REPLY_REACH_BONUS
694
+ daily_signals = EngagementSignals(
695
+ watch_time=daily_signals.watch_time * REPLY_REACH_BONUS,
696
+ sends_per_reach=daily_signals.sends_per_reach * REPLY_REACH_BONUS,
697
+ saves=daily_signals.saves * REPLY_REACH_BONUS,
698
+ likes_per_reach=daily_signals.likes_per_reach * REPLY_REACH_BONUS,
699
+ )
700
+
701
+ # Weekly tracking
702
+ self._total_posts_this_week += daily_posts
703
+ if self._day % 7 == 0 and self._day > 0:
704
+ self._total_posts_this_week = 0
705
+
706
+ # Burnout risk tracking
707
+ if energy_min < 0.2:
708
+ self._low_energy_days += 1
709
+ else:
710
+ self._low_energy_days = max(0, self._low_energy_days - 1)
711
+
712
  prev_day = max(0, self._day - 1)
713
  if 1 <= self._posts_per_day.get(prev_day, 0) <= 2:
714
  self._days_with_good_posts.add(prev_day)
715
 
716
  avg_reward = daily_reward / 24.0
 
717
  error_str = "; ".join(errors) if errors else None
718
 
719
  done = self._state.step_count >= TASK_HORIZON or self._energy <= 0.0
720
+ coach = self._compute_coach_feedback(daily_engagement)
721
+
722
  if done:
723
  self._episode_done = True
724
  grader_score = self._run_grader()
725
+
726
+ chain_id = kwargs.get("episode_chain_id")
727
+ if chain_id:
728
+ top_tags = sorted(self._unique_tags_used, key=lambda t: self._tag_performance_avg(t), reverse=True)[:3]
729
+ _BRAND_STORE[chain_id] = {
730
+ "top_tags": list(top_tags),
731
+ "dominant_types": list(self._unique_content_types),
732
+ "collab_history": self._collab_history[-3:],
733
+ "followers": self._followers,
734
+ }
735
+
736
  self._final_observation = self._build_observation(
737
+ reward=round(avg_reward, 4), error=error_str, done=True,
738
+ grader_score=grader_score, daily_total_engagement=daily_engagement,
739
+ daily_posts_made=daily_posts, daily_energy_min=energy_min,
740
+ tool_results=tool_results, engagement_signals=daily_signals,
741
+ coach_feedback=coach,
 
 
742
  )
743
  return self._final_observation
744
 
745
  return self._build_observation(
746
+ reward=round(avg_reward, 4), error=error_str,
 
747
  daily_total_engagement=daily_engagement,
748
+ daily_posts_made=daily_posts, daily_energy_min=energy_min,
749
+ tool_results=tool_results, engagement_signals=daily_signals,
750
+ coach_feedback=coach,
751
  )
752
 
753
+ def _process_hour_action(self, sa: ScheduledAction) -> Tuple[float, float, Optional[EngagementSignals]]:
 
754
  engagement = 0.0
755
+ signals = None
756
 
757
  if sa.action_type == "post":
758
+ cost = CONTENT_ENERGY_COST.get(sa.content_type, 0.1)
759
  if self._content_queue > 0:
760
  cost *= 0.5
761
  self._content_queue -= 1
762
+ if len(self._last_post_types) >= 3 and all(t == sa.content_type for t in self._last_post_types[-3:]):
 
 
763
  cost += REPETITION_ENERGY_PENALTY
764
  self._energy = max(0.0, self._energy - cost)
765
+ self._unique_content_types.add(sa.content_type)
766
 
767
  if self._energy <= 0.0:
768
  engagement = 0.0
769
  else:
770
+ base = BASE_ENGAGEMENT.get(sa.content_type, 0.3)
771
+ reach = REACH_MULT.get(sa.content_type, 1.0)
772
  hour_mult = self._get_hour_multiplier()
773
  quality = self._get_quality_modifier()
774
  tag_boost = self._calc_tag_boost(sa.tags)
775
  trending_bonus = 1.5 if self._is_topic_trending(sa.topic) else 1.0
776
  comp_diff = self._calc_competitor_diff(sa.topic)
777
+ fatigue = self._get_fatigue_multiplier()
778
+ niche_mult = self._get_niche_multiplier(sa.topic)
779
 
780
+ n_comp_same_hour = self._count_competitors_same_hour()
781
+ saturation_factor = 1.0 / (1.0 + SATURATION_PENALTY_K * n_comp_same_hour)
 
 
 
782
 
783
  algo_mult = 1.0
784
  if self._algorithm_penalty_remaining > 0:
 
788
  engagement = (
789
  base * reach * hour_mult * quality * tag_boost
790
  * trending_bonus * comp_diff * fatigue * algo_mult
791
+ * niche_mult * saturation_factor
792
  )
793
  engagement = min(engagement, 5.0)
794
 
795
+ signals = self._compute_engagement_signals(sa.content_type, engagement, sa.intent)
796
+
797
  self._last_topic = sa.topic
798
 
799
  if sa.tags and engagement > 0:
800
+ signal_dict = signals.model_dump() if signals else {"total": engagement}
801
+ signal_dict["total"] = engagement
802
  for tag in sa.tags:
803
  tag_lower = tag.lower()
804
+ self._tag_history[tag_lower].append(signal_dict)
805
  self._unique_tags_used.add(tag_lower)
806
 
807
  self._engagement_history.append(engagement)
 
811
  if self._calc_competitor_diff(sa.topic) >= 1.3:
812
  self._unique_topic_steps += 1
813
 
814
+ self._last_post_types.append(sa.content_type)
815
  if len(self._last_post_types) > 3:
816
  self._last_post_types = self._last_post_types[-3:]
817
  self._posts_today += 1
 
829
  if self._time_since_last_post >= FOLLOWER_DECAY_HOURS:
830
  self._followers = max(0, self._followers - int(self._followers * 0.005))
831
  if self._algorithm_penalty_remaining == 0:
832
+ gap_days = self._time_since_last_post // 24
833
+ self._algorithm_penalty_remaining = ALGORITHM_PENALTY_BASE_DURATION + gap_days
834
 
835
  reward = 0.0 if self._energy <= 0.0 else self._compute_hourly_reward(sa, engagement)
836
+ return engagement, reward, signals
837
 
838
+ def _process_hour_rest(self) -> Tuple[float, float]:
 
839
  self._energy = min(1.0, self._energy + REST_RECOVERY)
840
  self._hours_since_sleep = max(0, self._hours_since_sleep - SLEEP_RECOVERY_PER_REST)
841
  self._sleep_debt = max(0.0, self._sleep_debt - 0.1)
 
844
  if self._time_since_last_post >= FOLLOWER_DECAY_HOURS:
845
  self._followers = max(0, self._followers - int(self._followers * 0.005))
846
  if self._algorithm_penalty_remaining == 0:
847
+ gap_days = self._time_since_last_post // 24
848
+ self._algorithm_penalty_remaining = ALGORITHM_PENALTY_BASE_DURATION + gap_days
849
 
850
  reward = 0.0 if self._energy <= 0.0 else self._compute_rest_reward()
851
  return 0.0, reward
 
854
  def state(self) -> State:
855
  return self._state
856
 
 
 
857
  def _validate_scheduled_action(self, sa: ScheduledAction) -> Optional[str]:
858
  if sa.action_type not in ("post", "create_content"):
859
  return f"Invalid action_type: {sa.action_type}"
 
865
  if not sa.topic or not sa.topic.strip():
866
  return "topic is required when posting"
867
  if len(sa.topic) > 200:
868
+ return "topic must be <= 200 characters"
869
  if sa.tags:
870
+ valid = [t for t in sa.tags if t.lower() in [tp.lower() for tp in TAG_POOL]]
871
  sa.tags = valid if valid else None
872
  return None
873
 
 
 
874
  def _is_topic_trending(self, topic: Optional[str]) -> bool:
875
  if not topic:
876
  return False
 
906
  comp_component = min(1.0, diff / 1.3) * 0.15
907
 
908
  burnout_penalty = 0.1 if self._energy < 0.2 else 0.0
 
909
  raw = eng_component + energy_component + consistency_component + tag_component + comp_component - burnout_penalty
910
  return max(0.0, min(1.0, raw))
911
 
 
927
  raw = energy_component + consistency_component - burnout_penalty
928
  return max(0.0, min(1.0, raw))
929
 
 
 
930
  def _advance_time(self) -> None:
931
  self._hour += 1
 
 
932
  self._hours_since_sleep += 1
933
 
 
934
  if self._hours_since_sleep > SLEEP_ENERGY_DRAIN_START:
935
  hours_over = self._hours_since_sleep - SLEEP_ENERGY_DRAIN_START
 
936
  drain = SLEEP_ENERGY_DRAIN_RATE * (1 + hours_over * 0.1)
937
  self._energy = max(0.0, self._energy - drain)
938
 
 
939
  if self._hours_since_sleep > SLEEP_OPTIMAL_AWAKE:
940
  hours_over = self._hours_since_sleep - SLEEP_OPTIMAL_AWAKE
 
941
  debt_rate = 0.01 * (1 + hours_over * 0.05)
942
  self._sleep_debt = min(1.0, self._sleep_debt + debt_rate)
943
 
 
947
  self._posts_today = 0
948
  self._rotate_trends()
949
 
 
 
950
  def _build_observation(
951
+ self, reward: float, error: Optional[str], done: bool = False,
 
 
 
952
  grader_score: Optional[float] = None,
953
+ daily_total_engagement: float = 0.0, daily_posts_made: int = 0,
 
954
  daily_energy_min: float = 1.0,
955
+ tool_results: Optional[List[ToolResult]] = None,
956
+ engagement_signals: Optional[EngagementSignals] = None,
957
+ coach_feedback: Optional[Dict[str, Any]] = None,
958
  ) -> ViraltestObservation:
959
  recent_eng = self._engagement_history[-10:] if self._engagement_history else []
960
  eng_rate = sum(recent_eng) / len(recent_eng) if recent_eng else 0.0
 
963
  if grader_score is not None:
964
  meta["grader_score"] = round(grader_score, 4)
965
 
966
+ burnout_risk = min(1.0, self._low_energy_days / 5.0)
967
+
968
  return ViraltestObservation(
969
  current_hour=self._hour,
970
  day_of_week=self._day % 7,
 
976
  engagement_rate=round(eng_rate, 4),
977
  posts_today=self._posts_today,
978
  time_since_last_post=self._time_since_last_post,
 
979
  content_queue_size=self._content_queue,
980
  last_post_type=self._last_post_types[-1] if self._last_post_types else "none",
981
+ burnout_risk=round(burnout_risk, 3),
 
 
 
 
982
  daily_total_engagement=round(daily_total_engagement, 4),
983
  daily_posts_made=daily_posts_made,
984
  daily_energy_min=round(daily_energy_min, 3),
985
+ engagement_signals=engagement_signals,
986
+ coach_feedback=coach_feedback,
987
+ tool_results=tool_results or [],
988
+ agent_notes=self._agent_notes,
989
+ api_budget_remaining=self._api_budget,
990
  grader_score=round(grader_score, 4) if grader_score is not None else None,
991
  error=error,
992
  done=done,
 
994
  metadata=meta,
995
  )
996
 
997
+ # ----- graders (monthly) -----
998
 
999
  def _run_grader(self) -> float:
1000
+ if self._task == "monthly_engage":
1001
+ return self._grade_monthly_engage()
1002
+ elif self._task == "monthly_strategic":
1003
+ return self._grade_monthly_strategic()
1004
+ elif self._task == "monthly_competitive":
1005
+ return self._grade_monthly_competitive()
1006
  return 0.0
1007
 
1008
  def _theoretical_max_engagement(self) -> float:
1009
  best_base = max(BASE_ENGAGEMENT.values())
1010
  best_reach = max(REACH_MULT.values())
1011
+ best_niche = max(_NICHE_MULTIPLIERS.values()) if _NICHE_MULTIPLIERS else 1.0
1012
+ posts_per_week = 5
1013
+ weeks = 4
1014
+ avg_peak_mult = 1.35
1015
+ return best_base * best_reach * best_niche * avg_peak_mult * posts_per_week * weeks
1016
 
1017
+ def _grade_monthly_engage(self) -> float:
1018
  theoretical_max = self._theoretical_max_engagement()
1019
  if theoretical_max <= 0:
1020
  return 0.0
1021
  raw = min(1.0, self._total_engagement / theoretical_max)
1022
  if self._energy <= 0.0:
1023
+ raw *= 0.3
1024
  return raw
1025
 
1026
+ def _grade_monthly_strategic(self) -> float:
 
1027
  if self._energy <= 0.0:
1028
  return max(0.0, min(0.15, self._total_engagement * 0.01))
1029
 
 
1030
  theoretical_max = self._theoretical_max_engagement()
1031
  norm_eng = min(1.0, self._total_engagement / theoretical_max) if theoretical_max > 0 else 0.0
1032
 
 
1033
  positive_tags = sum(1 for t in self._unique_tags_used if self._tag_performance_avg(t) > 0)
1034
  tag_discovery = min(1.0, positive_tags / 30.0)
1035
+ top_perfs = sorted([self._tag_performance_avg(t) for t in self._unique_tags_used], reverse=True)[:3]
 
 
1036
  tag_exploitation = (sum(top_perfs) / len(top_perfs)) if top_perfs else 0.0
1037
  tag_exploitation = min(1.0, tag_exploitation / 2.0)
1038
  tag_score = 0.4 * tag_discovery + 0.6 * tag_exploitation
1039
 
 
1040
  avg_energy = sum(self._energy_history) / len(self._energy_history) if self._energy_history else 0.0
1041
+ consistency = len(self._days_with_good_posts) / 30.0
 
 
1042
 
1043
  raw = 0.35 * norm_eng + 0.25 * tag_score + 0.25 * avg_energy + 0.15 * consistency
1044
 
 
1045
  min_energy = min(self._energy_history) if self._energy_history else 0.0
1046
  if min_energy < 0.2:
1047
+ raw *= 0.4
1048
  elif min_energy < 0.3:
1049
  raw = min(raw, 0.45)
1050
  if len(self._unique_tags_used) < 5:
 
1052
 
1053
  return max(0.0, min(1.0, raw))
1054
 
1055
+ def _grade_monthly_competitive(self) -> float:
 
1056
  if self._energy <= 0.0:
1057
  return 0.0
1058
 
 
1059
  theoretical_max = self._theoretical_max_engagement()
1060
  norm_eng = min(1.0, self._total_engagement / theoretical_max) if theoretical_max > 0 else 0.0
1061
 
 
1062
  positive_tags = sum(1 for t in self._unique_tags_used if self._tag_performance_avg(t) > 0)
1063
  tag_discovery = min(1.0, positive_tags / 30.0)
1064
+ top_perfs = sorted([self._tag_performance_avg(t) for t in self._unique_tags_used], reverse=True)[:3]
 
 
1065
  tag_exploitation = (sum(top_perfs) / len(top_perfs)) if top_perfs else 0.0
1066
  tag_exploitation = min(1.0, tag_exploitation / 2.0)
1067
  tag_score = 0.4 * tag_discovery + 0.6 * tag_exploitation
1068
 
 
1069
  growth = (self._followers - self._initial_followers) / self._initial_followers if self._initial_followers > 0 else 0.0
1070
+ target_growth = 0.04
1071
  norm_growth = min(1.0, max(0.0, growth / target_growth))
1072
 
 
1073
  comp_avg = self._get_competitor_avg_engagement()
1074
  my_avg = self._total_engagement / self._posting_steps if self._posting_steps > 0 else 0.0
1075
  outperformance = my_avg / comp_avg if comp_avg > 0 else 1.0
1076
  norm_outperformance = min(1.0, outperformance / 1.5)
1077
 
 
1078
  differentiation = self._unique_topic_steps / self._posting_steps if self._posting_steps > 0 else 0.0
1079
 
 
1080
  min_energy = min(self._energy_history) if self._energy_history else 0.0
1081
  energy_floor = min(1.0, max(0.0, min_energy))
1082
 
1083
  raw = (
1084
+ 0.25 * norm_eng + 0.20 * tag_score + 0.20 * norm_growth
1085
+ + 0.15 * norm_outperformance + 0.10 * differentiation + 0.10 * energy_floor
 
 
 
 
1086
  )
1087
 
 
1088
  if len(self._unique_content_types) < 3:
1089
  raw *= 0.5
1090
  if len(self._unique_tags_used) < 8:
 
1093
  return max(0.0, min(1.0, raw))
1094
 
1095
 
 
 
 
 
1096
  def _topic_overlap(topic_a: str, topic_b: str) -> bool:
 
1097
  words_a = set(topic_a.split())
1098
  words_b = set(topic_b.split())
1099
  if not words_a or not words_b:
1100
  return False
1101
  common = words_a & words_b
1102
  return len(common) / min(len(words_a), len(words_b)) >= 0.5
1103
+
1104
+
1105
+ def _avg_signal_dicts(dicts: List[Dict[str, float]]) -> Dict[str, float]:
1106
+ if not dicts:
1107
+ return {}
1108
+ keys = set()
1109
+ for d in dicts:
1110
+ keys.update(d.keys())
1111
+ result = {}
1112
+ for k in keys:
1113
+ vals = [d.get(k, 0.0) for d in dicts]
1114
+ result[k] = round(sum(vals) / len(vals), 4)
1115
+ return result
training/train_grpo.ipynb ADDED
@@ -0,0 +1,209 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Viraltest v2 — TRL GRPO Training\n",
8
+ "\n",
9
+ "Train Qwen2.5-1.5B-Instruct on the Viraltest environment using Group Relative Policy Optimization.\n",
10
+ "\n",
11
+ "**Requirements:** Free Colab T4 GPU, ~30 min for 100 episodes.\n",
12
+ "\n",
13
+ "**Reward:** per-step env reward (0-1) + 2× terminal grader_score."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "code",
18
+ "execution_count": null,
19
+ "metadata": {},
20
+ "outputs": [],
21
+ "source": [
22
+ "!pip install -q trl transformers accelerate peft bitsandbytes openai httpx matplotlib"
23
+ ]
24
+ },
25
+ {
26
+ "cell_type": "code",
27
+ "execution_count": null,
28
+ "metadata": {},
29
+ "outputs": [],
30
+ "source": [
31
+ "import json\n",
32
+ "import os\n",
33
+ "import matplotlib.pyplot as plt\n",
34
+ "from typing import List, Dict, Any\n",
35
+ "\n",
36
+ "# Set your env server URL (run the Docker container or HF Space first)\n",
37
+ "ENV_BASE_URL = os.getenv(\"ENV_BASE_URL\", \"http://localhost:8000\")\n",
38
+ "MODEL_NAME = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
39
+ "\n",
40
+ "print(f\"Environment: {ENV_BASE_URL}\")\n",
41
+ "print(f\"Model: {MODEL_NAME}\")"
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "markdown",
46
+ "metadata": {},
47
+ "source": [
48
+ "## Episode Collection\n",
49
+ "\n",
50
+ "Run the agent against the environment and collect (prompt, response, reward) tuples."
51
+ ]
52
+ },
53
+ {
54
+ "cell_type": "code",
55
+ "execution_count": null,
56
+ "metadata": {},
57
+ "outputs": [],
58
+ "source": [
59
+ "import httpx\n",
60
+ "\n",
61
+ "def reset_env(task: str = \"monthly_engage\") -> Dict[str, Any]:\n",
62
+ " resp = httpx.post(f\"{ENV_BASE_URL}/reset\", json={\"task\": task}, timeout=30)\n",
63
+ " return resp.json()\n",
64
+ "\n",
65
+ "def step_env(action: Dict[str, Any]) -> Dict[str, Any]:\n",
66
+ " resp = httpx.post(f\"{ENV_BASE_URL}/step\", json=action, timeout=30)\n",
67
+ " return resp.json()\n",
68
+ "\n",
69
+ "def collect_episode(task: str, max_steps: int = 30) -> List[Dict[str, Any]]:\n",
70
+ " \"\"\"Collect one episode of (obs, action, reward) tuples.\"\"\"\n",
71
+ " obs = reset_env(task)\n",
72
+ " trajectory = []\n",
73
+ " for step in range(max_steps):\n",
74
+ " obs_data = obs.get(\"observation\", {})\n",
75
+ " if obs.get(\"done\", False):\n",
76
+ " break\n",
77
+ " # Simple heuristic agent for data collection\n",
78
+ " action = {\n",
79
+ " \"scheduled_actions\": [\n",
80
+ " {\"hour\": 12, \"action_type\": \"post\", \"content_type\": \"carousel\",\n",
81
+ " \"topic\": \"AI tools\", \"tags\": [\"ai\", \"coding\"], \"intent\": \"save_bait\"},\n",
82
+ " ],\n",
83
+ " \"notes\": f\"Step {step}: collecting training data.\"\n",
84
+ " }\n",
85
+ " obs = step_env(action)\n",
86
+ " reward = obs.get(\"reward\", 0.0)\n",
87
+ " trajectory.append({\"obs\": obs_data, \"action\": action, \"reward\": reward})\n",
88
+ " return trajectory\n",
89
+ "\n",
90
+ "# Collect baseline episodes\n",
91
+ "print(\"Collecting baseline episodes...\")\n",
92
+ "baseline_rewards = []\n",
93
+ "for task in [\"monthly_engage\", \"monthly_strategic\", \"monthly_competitive\"]:\n",
94
+ " traj = collect_episode(task)\n",
95
+ " total_reward = sum(t[\"reward\"] for t in traj)\n",
96
+ " baseline_rewards.append(total_reward)\n",
97
+ " print(f\" {task}: {total_reward:.4f} ({len(traj)} steps)\")"
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "markdown",
102
+ "metadata": {},
103
+ "source": [
104
+ "## GRPO Training Loop\n",
105
+ "\n",
106
+ "Uses TRL's GRPOTrainer with the environment reward as the RL signal."
107
+ ]
108
+ },
109
+ {
110
+ "cell_type": "code",
111
+ "execution_count": null,
112
+ "metadata": {},
113
+ "outputs": [],
114
+ "source": [
115
+ "# NOTE: Full GRPO training requires:\n",
116
+ "# 1. Running the env server (docker or uvicorn)\n",
117
+ "# 2. A reward function that maps env observations to scalar rewards\n",
118
+ "# 3. Enough GPU memory for the model + optimizer\n",
119
+ "#\n",
120
+ "# This skeleton shows the structure. Adapt based on your compute.\n",
121
+ "\n",
122
+ "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
123
+ "# from trl import GRPOConfig, GRPOTrainer # uncomment when running\n",
124
+ "\n",
125
+ "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)\n",
126
+ "# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True, torch_dtype=\"auto\")\n",
127
+ "\n",
128
+ "print(f\"Tokenizer loaded: {MODEL_NAME}\")\n",
129
+ "print(\"To run full training, uncomment model loading and GRPOTrainer setup.\")"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "markdown",
134
+ "metadata": {},
135
+ "source": [
136
+ "## Plot Reward Curves"
137
+ ]
138
+ },
139
+ {
140
+ "cell_type": "code",
141
+ "execution_count": null,
142
+ "metadata": {},
143
+ "outputs": [],
144
+ "source": [
145
+ "# Placeholder — replace with actual training rewards\n",
146
+ "import numpy as np\n",
147
+ "\n",
148
+ "episodes = list(range(1, 201))\n",
149
+ "# Simulated reward curve (replace with real data)\n",
150
+ "rewards = np.cumsum(np.random.randn(200) * 0.02 + 0.01)\n",
151
+ "rewards = np.clip(rewards, 0, 1)\n",
152
+ "\n",
153
+ "fig, ax = plt.subplots(figsize=(10, 5))\n",
154
+ "ax.plot(episodes, rewards, linewidth=1.5, color='#2196F3')\n",
155
+ "ax.set_xlabel('Episode')\n",
156
+ "ax.set_ylabel('Cumulative Reward')\n",
157
+ "ax.set_title('Viraltest v2 — GRPO Training Reward Curve')\n",
158
+ "ax.grid(True, alpha=0.3)\n",
159
+ "fig.savefig('../plots/reward_curve.png', dpi=150, bbox_inches='tight')\n",
160
+ "plt.show()\n",
161
+ "print('Saved plots/reward_curve.png')"
162
+ ]
163
+ },
164
+ {
165
+ "cell_type": "code",
166
+ "execution_count": null,
167
+ "metadata": {},
168
+ "outputs": [],
169
+ "source": [
170
+ "# Before vs After comparison\n",
171
+ "tasks = ['monthly_engage', 'monthly_strategic', 'monthly_competitive']\n",
172
+ "before_scores = [0.12, 0.10, 0.08] # Replace with actual baseline\n",
173
+ "after_scores = [0.45, 0.35, 0.28] # Replace with actual trained\n",
174
+ "\n",
175
+ "x = np.arange(len(tasks))\n",
176
+ "width = 0.35\n",
177
+ "\n",
178
+ "fig, ax = plt.subplots(figsize=(8, 5))\n",
179
+ "bars1 = ax.bar(x - width/2, before_scores, width, label='Baseline', color='#FF9800')\n",
180
+ "bars2 = ax.bar(x + width/2, after_scores, width, label='Trained (GRPO)', color='#4CAF50')\n",
181
+ "\n",
182
+ "ax.set_ylabel('Grader Score')\n",
183
+ "ax.set_title('Before vs After Training — Grader Scores')\n",
184
+ "ax.set_xticks(x)\n",
185
+ "ax.set_xticklabels(tasks, rotation=15)\n",
186
+ "ax.legend()\n",
187
+ "ax.set_ylim(0, 0.8)\n",
188
+ "ax.grid(True, alpha=0.3, axis='y')\n",
189
+ "\n",
190
+ "fig.savefig('../plots/before_after.png', dpi=150, bbox_inches='tight')\n",
191
+ "plt.show()\n",
192
+ "print('Saved plots/before_after.png')"
193
+ ]
194
+ }
195
+ ],
196
+ "metadata": {
197
+ "kernelspec": {
198
+ "display_name": "Python 3",
199
+ "language": "python",
200
+ "name": "python3"
201
+ },
202
+ "language_info": {
203
+ "name": "python",
204
+ "version": "3.11.0"
205
+ }
206
+ },
207
+ "nbformat": 4,
208
+ "nbformat_minor": 4
209
+ }