---
title: Viraltest — Creator Optimization Agent
emoji: 📊
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Viraltest v2 — World-Modeling RL Environment for Instagram Strategy

> **Theme #3.1 — Professional Tasks (World Modeling)**
> An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where an LLM agent manages an Instagram creator account over 30 simulated days, discovering the world through tools rather than being told the rules.

## What this teaches the LLM

| Capability | How the environment tests it |
|---|---|
| **Tool discovery & orchestration** | 8 discoverable tools (`query_trends`, `query_competitor`, `predict_engagement`...). Agent must call `GET /tools` to learn what's available. |
| **Persistent world model** | 30-day horizon. Multi-episode brand chain carries state across months. |
| **Belief tracking** | `notes` field persists hypotheses day-to-day. Agent must update beliefs from tool results. |
| **Causal reasoning** | `coach_feedback` returns counterfactual delta (your plan vs. heatmap-optimal). `predict_engagement` lets agent test hypotheses before committing. |
| **Partial observability** | Default observation is sparse: energy, followers, reward. Rich data (trends, competitors, tags) only via tools. |
| **Multi-step workflow** | Per day: discover → query → draft → predict → commit → reply → learn from feedback. |

## Why this matters

The $250B creator economy ([Goldman Sachs, 2025](https://www.goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027)) has 67M creators, but 73% experience burnout ([Awin, 2024](https://www.prweb.com/releases/a-majority-of-content-creators-and-influencers-struggle-with-burnout-as-concerns-for-ai-begin-to-surface-according-to-a-new-awin-group-survey-research-302257152.html)). This environment turns the posting-vs-burnout tradeoff into a reproducible simulation calibrated against 10+ verifiable sources.

## Quick Start

```python
import asyncio
from viraltest import ViraltestAction, ViraltestEnv
from viraltest.models import ToolCall

async def main():
    env = ViraltestEnv(base_url="http://localhost:8000")
    try:
        result = await env.reset(task="monthly_strategic")
        action = ViraltestAction(
            tool_calls=[
                ToolCall(name="query_trends", arguments={"niche": "tech"}),
            ],
            scheduled_actions=[
                {"hour": 12, "action_type": "post", "content_type": "reel",
                 "topic": "AI tools", "tags": ["ai", "coding"], "intent": "watch_bait"},
            ],
            notes="Day 1: querying trends to establish baseline.",
        )
        result = await env.step(action)
        print(result.observation.engagement_signals)
    finally:
        await env.close()

asyncio.run(main())
```

## Simulation mechanics

### Engagement signals (Mosseri Jan-2025)

Instagram's head, Adam Mosseri, named the platform's top ranking signals in January 2025. Our reward decomposes engagement into four weighted signals accordingly (weights sum to 1.0):

| Signal | Weight | Best format | Source |
|--------|--------|-------------|--------|
| Watch time | 0.40 | Reels | Mosseri Jan-2025 |
| Sends per reach | 0.30 | Stories | Mosseri Jan-2025 |
| Saves | 0.20 | Carousels | Mosseri Jan-2025 |
| Likes per reach | 0.10 | Text posts | Mosseri Jan-2025 |
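
As a rough illustration of how the weights in the table above could combine into one score, here is a minimal sketch. The weights come from the table; the linear weighted sum, the `[0, 1]` normalization of each signal, and the names `SIGNAL_WEIGHTS` and `weighted_engagement` are assumptions for illustration, not the environment's confirmed implementation.

```python
# Weights copied from the table above.
# ASSUMPTION: signals are normalized to [0, 1] and combined linearly.
SIGNAL_WEIGHTS = {
    "watch_time": 0.40,
    "sends_per_reach": 0.30,
    "saves": 0.20,
    "likes_per_reach": 0.10,
}


def weighted_engagement(signals: dict[str, float]) -> float:
    """Combine per-signal scores into one scalar; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())


# Hypothetical signal values for a single post (illustrative only).
example_post = {
    "watch_time": 0.8,
    "sends_per_reach": 0.2,
    "saves": 0.1,
    "likes_per_reach": 0.5,
}
print(round(weighted_engagement(example_post), 3))
```

Under this assumption, a post that is strong on watch time dominates the score even when its other signals are weak, which matches the weighting in the table.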

### Hour heatmap

A 7×24 (day-of-week × hour) multiplier grid derived from [Buffer 9.6M posts](https://buffer.com/resources/when-is-the-best-time-to-post-on-instagram), cross-validated against [Sprout Social 2B engagements](https://sproutsocial.com/insights/best-times-to-post-on-social-media/).

### Sleep model

Piecewise-linear from [Van Dongen et al. 2003](https://pubmed.ncbi.nlm.nih.gov/12683469) (*Sleep*, PMID 12683469): no quality loss below 16h awake, then 6.25% per hour, floor at 30%.
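
The curve described above can be sketched as a small function. This is a minimal sketch, assuming the 6.25% per-hour loss is measured against full quality (1.0) and that the result is used as a multiplier; the name `sleep_quality` is hypothetical.

```python
def sleep_quality(hours_awake: float) -> float:
    """Quality multiplier: 1.0 up to 16 h awake, then -0.0625 per hour, floored at 0.30.

    ASSUMPTION: the per-hour loss is absolute (percentage points of full quality).
    """
    if hours_awake <= 16:
        return 1.0
    return max(0.30, 1.0 - 0.0625 * (hours_awake - 16))


for hours in (8, 16, 20, 24, 30):
    print(hours, sleep_quality(hours))
```

Under these assumptions the floor is reached after 27.2 hours awake, since 0.70 / 0.0625 = 11.2 hours past the 16-hour mark.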

### Audience fatigue

Tiered from [Buffer 2.1M study](https://buffer.com/resources/how-often-to-post-on-instagram/): 2 posts/day=1.0×, 3=0.75×, 4=0.50×, 5+=0.25×. Weekly cap at 7 posts → 0.75×.
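
The tiers above map onto a simple lookup. This is a minimal sketch with hypothetical function names. Two points are assumptions because the text does not state them: fewer than 2 posts/day is treated as 1.0×, and the weekly 0.75× multiplier is applied once the weekly count exceeds 7. How the daily and weekly multipliers combine is also not stated, so they are kept separate here.

```python
def daily_fatigue(posts_today: int) -> float:
    """Daily multiplier from the tiers above (ASSUMPTION: 0-1 posts also give 1.0)."""
    if posts_today <= 2:
        return 1.0
    if posts_today == 3:
        return 0.75
    if posts_today == 4:
        return 0.50
    return 0.25  # 5 or more posts in one day


def weekly_fatigue(posts_this_week: int) -> float:
    """Weekly multiplier (ASSUMPTION: 0.75 applies only past 7 posts in the week)."""
    return 0.75 if posts_this_week > 7 else 1.0


print([daily_fatigue(n) for n in range(1, 7)])
print(weekly_fatigue(7), weekly_fatigue(8))
```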

## Tasks and graders (30 steps each)

| Task | Difficulty | Grader focus |
|------|-----------|--------------|
| `monthly_engage` | Easy | Total engagement vs theoretical max; burnout penalty |
| `monthly_strategic` | Medium | + tag discovery/exploitation + energy + consistency |
| `monthly_competitive` | Hard | + growth vs competitors + differentiation + content diversity |

## Tool catalog

| Tool | Cost | Returns |
|------|------|---------|
| `query_trends` | 1 | Trending topics, tags, niche saturation |
| `query_competitor` | 2 | Recent posts, avg engagement, strategy |
| `query_tag_history` | 1 | Your historical signals per tag |
| `query_audience` | 2 | Segment affinities, active hours |
| `predict_engagement` | 3 | Simulated signals without committing |
| `draft_review` | 3 | Strengths/weaknesses of a plan |
| `query_creator_pool` | 1 | Available collab partners + overlap |
| `propose_collab` | 5 | Propose collaboration (max 2/month) |

API budget starts at 100 per episode.
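
To see why the budget forces selective tool use, it helps to total up a fixed daily routine. This is a minimal sketch: costs are copied from the catalog above, and it assumes each call deducts its listed cost from the 100-point budget. The names `TOOL_COSTS` and `plan_cost`, and the example routine, are hypothetical.

```python
# Costs copied from the tool catalog above.
TOOL_COSTS = {
    "query_trends": 1,
    "query_competitor": 2,
    "query_tag_history": 1,
    "query_audience": 2,
    "predict_engagement": 3,
    "draft_review": 3,
    "query_creator_pool": 1,
    "propose_collab": 5,
}
EPISODE_BUDGET = 100  # starting API budget per episode
EPISODE_DAYS = 30     # steps per task


def plan_cost(tool_names: list[str]) -> int:
    """Total cost of a list of tool calls (ASSUMPTION: cost units equal budget units)."""
    return sum(TOOL_COSTS[name] for name in tool_names)


# Hypothetical routine repeated every day of the episode.
daily_routine = ["query_trends", "query_audience", "predict_engagement"]
per_day = plan_cost(daily_routine)
total = per_day * EPISODE_DAYS
print(per_day, total, total <= EPISODE_BUDGET)
```

Under these assumptions, even this modest routine costs 6 per day and 180 over 30 days, well over the budget of 100, so an agent averaging more than about 3.3 points per day will run out before the episode ends.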

## Sources & verifiability

Every constant is backed by a Tier 1–3 source. Full bibliography with DOIs, PMIDs, and methodology extracts: **[RESEARCH.md](RESEARCH.md)**.

| Tier | Count | Example |
|------|-------|---------|
| T1 (Peer-reviewed) | 7 papers | Van Dongen 2003, arXiv:2410.13108 |
| T2 (Industry, large-N) | 9 studies | Buffer 9.6M, Sprout 2B, Rival IQ 1.9M |
| T3 (Official) | 1 statement | Mosseri Jan-2025 |
| T4 (Survey) | 2 surveys | Awin 2024 (n=300+) |
| T5 (Rejected) | 13 sites | No methodology disclosed |

## Storytelling assets

- [HuggingFace blog](blog/hf_mini_blog.md)
- [YouTube script (<2 min)](blog/youtube_script.md)
- [Slide deck outline](blog/slide_outline.md)

## Local development

```bash
git clone <repo-url> && cd viraltest
uv sync

# Terminal 1 — API server
uvicorn viraltest.server.app:app --host 0.0.0.0 --port 8000

# Terminal 2 — inference
export HF_TOKEN=hf_...
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
.venv/bin/python inference.py
```

## Docker

```bash
docker build -t viraltest-env:latest .
docker run --rm -p 8000:8000 viraltest-env:latest
curl -s -X POST -H "Content-Type: application/json" -d '{}' http://localhost:8000/reset
```

## Project structure

```
.
├── inference.py                # Tool-discovery agent (no hint keys)
├── openenv.yaml                # OpenEnv manifest
├── models.py                   # Action/Observation + ToolCall, EngagementSignals
├── client.py                   # ViraltestEnv client (async)
├── Dockerfile
├── RESEARCH.md                 # Full sourced bibliography (6+ pages)
├── DESIGN.md                   # Deep design notes
├── blog/
│   ├── hf_mini_blog.md
│   ├── youtube_script.md
│   └── slide_outline.md
├── server/
│   ├── app.py                  # FastAPI + /tools endpoints
│   ├── viraltest_environment.py
│   ├── dashboard.html
│   └── data/
│       ├── tags.json           # ~120 tags, 4 tiers
│       ├── topics.json         # Niche multipliers + seasonal calendar
│       ├── competitors.json    # 7 archetypes
│       ├── hour_heatmap.json   # 7×24 from Buffer+Sprout
│       ├── audience_segments.json
│       └── audience_overlap_matrix.json
├── training/
│   └── train_grpo.ipynb        # TRL GRPO on Qwen2.5-1.5B-Instruct
└── plots/
    ├── reward_curve.png
    └── before_after.png
```

## License

See `LICENSE` in the repository root (BSD-style per upstream OpenEnv examples).