---
title: Viraltest — Creator Optimization Agent
emoji: 📊
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Viraltest v2 — World-Modeling RL Environment for Instagram Strategy
> **Theme #3.1 — Professional Tasks (World Modeling)**
> An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where an LLM agent manages an Instagram creator account over 30 simulated days, discovering the world through tools rather than being told the rules.
## What this teaches the LLM
| Capability | How the environment tests it |
|---|---|
| **Tool discovery & orchestration** | 8 discoverable tools (`query_trends`, `query_competitor`, `predict_engagement`...). Agent must call `GET /tools` to learn what's available. |
| **Persistent world model** | 30-day horizon. Multi-episode brand chain carries state across months. |
| **Belief tracking** | `notes` field persists hypotheses day-to-day. Agent must update beliefs from tool results. |
| **Causal reasoning** | `coach_feedback` returns counterfactual delta (your plan vs. heatmap-optimal). `predict_engagement` lets agent test hypotheses before committing. |
| **Partial observability** | Default observation is sparse: energy, followers, reward. Rich data (trends, competitors, tags) only via tools. |
| **Multi-step workflow** | Per day: discover → query → draft → predict → commit → reply → learn from feedback. |
## Why this matters
The $250B creator economy ([Goldman Sachs, 2025](https://www.goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027)) has 67M creators, but 73% experience burnout ([Awin, 2024](https://www.prweb.com/releases/a-majority-of-content-creators-and-influencers-struggle-with-burnout-as-concerns-for-ai-begin-to-surface-according-to-a-new-awin-group-survey-research-302257152.html)). This environment turns the posting-vs-burnout tradeoff into a reproducible simulation calibrated against 10+ verifiable sources.
## Quick Start
```python
import asyncio

from viraltest import ViraltestAction, ViraltestEnv
from viraltest.models import ToolCall


async def main():
    env = ViraltestEnv(base_url="http://localhost:8000")
    try:
        result = await env.reset(task="monthly_strategic")

        # One day's action: tool calls, scheduled posts, and persistent notes.
        action = ViraltestAction(
            tool_calls=[
                ToolCall(name="query_trends", arguments={"niche": "tech"}),
            ],
            scheduled_actions=[
                {"hour": 12, "action_type": "post", "content_type": "reel",
                 "topic": "AI tools", "tags": ["ai", "coding"], "intent": "watch_bait"},
            ],
            notes="Day 1: querying trends to establish baseline.",
        )
        result = await env.step(action)
        print(result.observation.engagement_signals)
    finally:
        await env.close()


asyncio.run(main())
```
## Simulation mechanics
### Engagement signals (Mosseri Jan-2025)
Instagram's head, Adam Mosseri, confirmed the top three ranking signals in January 2025. The reward decomposes engagement into the four weighted signals below:
| Signal | Weight | Best format | Source |
|--------|--------|-------------|--------|
| Watch time | 0.40 | Reels | Mosseri Jan-2025 |
| Sends per reach | 0.30 | Stories | Mosseri Jan-2025 |
| Saves | 0.20 | Carousels | Mosseri Jan-2025 |
| Likes per reach | 0.10 | Text posts | Mosseri Jan-2025 |
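As a minimal sketch of how these weights could combine into one score: the README gives only the weights, so the plain weighted sum, the assumption that each signal is already normalized to [0, 1], and the names `SIGNAL_WEIGHTS` and `weighted_engagement` are all illustrative rather than taken from the environment's code.

```python
# Weights copied from the table above; the linear combination is an assumption.
SIGNAL_WEIGHTS = {
    "watch_time": 0.40,
    "sends_per_reach": 0.30,
    "saves": 0.20,
    "likes_per_reach": 0.10,
}


def weighted_engagement(signals: dict[str, float]) -> float:
    """Combine normalized signals (each in [0, 1]) into one score in [0, 1]."""
    return sum(
        weight * signals.get(name, 0.0)
        for name, weight in SIGNAL_WEIGHTS.items()
    )


# Hypothetical reel: strong watch time, few saves.
reel = {"watch_time": 0.8, "sends_per_reach": 0.2, "saves": 0.1, "likes_per_reach": 0.5}
print(round(weighted_engagement(reel), 3))
```

Because the weights sum to 1.0, a post that maxes out every signal scores 1.0 under this sketch, and watch time alone accounts for 40% of the total.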
### Hour heatmap
7×24 multiplier grid from [Buffer 9.6M posts](https://buffer.com/resources/when-is-the-best-time-to-post-on-instagram) cross-validated with [Sprout Social 2B engagements](https://sproutsocial.com/insights/best-times-to-post-on-social-media/).
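A lookup against such a grid might look like the sketch below. The schema of `server/data/hour_heatmap.json` is not shown in this README, so the list-of-lists layout, the row-0-is-Monday convention, the helper name `hour_multiplier`, and every value in the grid are placeholders, not the calibrated data.

```python
# Placeholder grid: 7 days x 24 hours, all 1.0. The real multipliers live in
# server/data/hour_heatmap.json; these values are NOT the calibrated ones.
heatmap = [[1.0 for _ in range(24)] for _ in range(7)]
heatmap[2][12] = 1.3  # made-up value, for illustration only


def hour_multiplier(grid: list[list[float]], weekday: int, hour: int) -> float:
    """Return the multiplier for a weekday (0-6) and an hour of day (0-23)."""
    if not (0 <= weekday < 7 and 0 <= hour < 24):
        raise ValueError("weekday must be 0-6 and hour must be 0-23")
    return grid[weekday][hour]


# Scale a hypothetical base engagement of 100 by the slot's multiplier.
print(100 * hour_multiplier(heatmap, weekday=2, hour=12))
```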
### Sleep model
Piecewise-linear, from [Van Dongen et al. 2003](https://pubmed.ncbi.nlm.nih.gov/12683469) (*Sleep*, PMID 12683469): no quality loss up to 16 h awake, then quality falls by 6.25% per additional hour, down to a floor of 30%.
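The curve can be sketched as a single function. This assumes the 6.25% is a linear drop measured against full quality (so 20 h awake gives 75%), which is the natural reading of "piecewise-linear"; the name `sleep_quality` is illustrative and not taken from the environment's code.

```python
def sleep_quality(hours_awake: float) -> float:
    """Quality multiplier in [0.30, 1.0] as a function of hours awake."""
    if hours_awake <= 16:
        return 1.0  # no loss within a normal waking day
    # Lose 0.0625 per hour past 16 h, but never drop below the 0.30 floor.
    return max(0.30, 1.0 - 0.0625 * (hours_awake - 16))


for hours in (8, 20, 24, 30):
    print(hours, sleep_quality(hours))
```

Under this reading the floor is reached after 27.2 hours awake, since 0.70 / 0.0625 = 11.2 hours past the 16-hour threshold.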
### Audience fatigue
Tiered from [Buffer 2.1M study](https://buffer.com/resources/how-often-to-post-on-instagram/): 2 posts/day=1.0×, 3=0.75×, 4=0.50×, 5+=0.25×. Weekly cap at 7 posts → 0.75×.
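A rough sketch of the tiers follows. The README leaves several details open, so this assumes that 0 or 1 posts per day also get 1.0×, that the weekly 0.75× applies once a week's post count exceeds 7, and that the daily and weekly factors multiply; the function names are hypothetical.

```python
def daily_fatigue(posts_today: int) -> float:
    """Daily tier: up to 2 posts -> 1.0, 3 -> 0.75, 4 -> 0.50, 5+ -> 0.25."""
    if posts_today <= 2:
        return 1.0  # assumption: 0 and 1 posts are treated like 2
    if posts_today == 3:
        return 0.75
    if posts_today == 4:
        return 0.50
    return 0.25


def fatigue_multiplier(posts_today: int, posts_this_week: int) -> float:
    """Combine the daily tier with the weekly cap (assumed multiplicative)."""
    weekly = 0.75 if posts_this_week > 7 else 1.0  # assumption: cap means "more than 7"
    return daily_fatigue(posts_today) * weekly


print(fatigue_multiplier(posts_today=4, posts_this_week=8))
```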
## Tasks and graders (30 steps each)
| Task | Difficulty | Grader focus |
|------|-----------|--------------|
| `monthly_engage` | Easier | Total engagement vs theoretical max; burnout penalty |
| `monthly_strategic` | Medium | + tag discovery/exploitation + energy + consistency |
| `monthly_competitive` | Hard | + growth vs competitors + differentiation + content diversity |
## Tool catalog
| Tool | Cost | Returns |
|------|------|---------|
| `query_trends` | 1 | Trending topics, tags, niche saturation |
| `query_competitor` | 2 | Recent posts, avg engagement, strategy |
| `query_tag_history` | 1 | Your historical signals per tag |
| `query_audience` | 2 | Segment affinities, active hours |
| `predict_engagement` | 3 | Simulated signals without committing |
| `draft_review` | 3 | Strengths/weaknesses of a plan |
| `query_creator_pool` | 1 | Available collab partners + overlap |
| `propose_collab` | 5 | Propose collaboration (max 2/month) |
API budget starts at 100 per episode.
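To see how quickly the budget runs out, the sketch below totals the cost of a daily tool plan using the costs from the table. It assumes each call deducts its cost from one episode-wide budget; `plan_cost` and the example plan are illustrative, not part of the environment's API.

```python
# Costs copied from the tool catalog above.
TOOL_COSTS = {
    "query_trends": 1,
    "query_competitor": 2,
    "query_tag_history": 1,
    "query_audience": 2,
    "predict_engagement": 3,
    "draft_review": 3,
    "query_creator_pool": 1,
    "propose_collab": 5,
}
API_BUDGET = 100  # per episode, as stated above


def plan_cost(tool_names: list[str]) -> int:
    """Total budget consumed by one list of tool calls."""
    return sum(TOOL_COSTS[name] for name in tool_names)


# Hypothetical daily routine: check trends, simulate, then review the draft.
daily_plan = ["query_trends", "predict_engagement", "draft_review"]
per_day = plan_cost(daily_plan)
days_affordable = API_BUDGET // per_day
print(per_day, days_affordable)
```

Running this routine on all 30 days would cost 210, more than twice the budget, so an agent has to ration its more expensive tools across the episode.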
## Sources & verifiability
Every constant is backed by a Tier 1–3 source. Full bibliography with DOIs, PMIDs, and methodology extracts: **[RESEARCH.md](RESEARCH.md)**.
| Tier | Count | Example |
|------|-------|---------|
| T1 (Peer-reviewed) | 7 papers | Van Dongen 2003, arxiv:2410.13108 |
| T2 (Industry, large-N) | 9 studies | Buffer 9.6M, Sprout 2B, Rival IQ 1.9M |
| T3 (Official) | 1 statement | Mosseri Jan-2025 |
| T4 (Survey) | 2 surveys | Awin 2024 (n=300+) |
| T5 (Rejected) | 13 sites | No methodology disclosed |
## Storytelling assets
- [HuggingFace blog](blog/hf_mini_blog.md)
- [YouTube script (<2 min)](blog/youtube_script.md)
- [Slide deck outline](blog/slide_outline.md)
## Local development
```bash
git clone <repo-url> && cd viraltest
uv sync
# Terminal 1 — API server
uvicorn viraltest.server.app:app --host 0.0.0.0 --port 8000
# Terminal 2 — inference
export HF_TOKEN=hf_...
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
.venv/bin/python inference.py
```
## Docker
```bash
docker build -t viraltest-env:latest .
docker run --rm -p 8000:8000 viraltest-env:latest
curl -s -X POST -H "Content-Type: application/json" -d '{}' http://localhost:8000/reset
```
## Project structure
```
.
├── inference.py              # Tool-discovery agent (no hint keys)
├── openenv.yaml              # OpenEnv manifest
├── models.py                 # Action/Observation + ToolCall, EngagementSignals
├── client.py                 # ViraltestEnv client (async)
├── Dockerfile
├── RESEARCH.md               # Full sourced bibliography (6+ pages)
├── DESIGN.md                 # Deep design notes
├── blog/
│   ├── hf_mini_blog.md
│   ├── youtube_script.md
│   └── slide_outline.md
├── server/
│   ├── app.py                # FastAPI + /tools endpoints
│   ├── viraltest_environment.py
│   ├── dashboard.html
│   └── data/
│       ├── tags.json         # ~120 tags, 4 tiers
│       ├── topics.json       # Niche multipliers + seasonal calendar
│       ├── competitors.json  # 7 archetypes
│       ├── hour_heatmap.json # 7×24 from Buffer+Sprout
│       ├── audience_segments.json
│       └── audience_overlap_matrix.json
├── training/
│   └── train_grpo.ipynb      # TRL GRPO on Qwen2.5-1.5B-Instruct
└── plots/
    ├── reward_curve.png
    └── before_after.png
```
## License
See `LICENSE` in the repository root (BSD-style per upstream OpenEnv examples).