# Viraltest v2: Teaching LLMs to Be Instagram Strategists Through World Modeling

**TL;DR:** We built an OpenEnv environment where an LLM agent manages an Instagram creator account for 30 simulated days. The agent receives sparse observations and must discover the world — trending topics, competitor behavior, audience segments, posting heatmaps — through a catalog of 8 tools. Every constant is calibrated against peer-reviewed research and large-N industry studies.
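Concretely, an episode is a 30-step reset/step loop over those sparse daily observations. The sketch below shows the shape of that loop; the class, field, and action names are illustrative stand-ins, not the environment's exact API.

```python
import random
from dataclasses import dataclass

@dataclass
class DayObservation:
    day: int
    energy: float       # 0..1, spent by posting, recovered by resting
    followers: int
    last_reward: float
    notes: str          # agent-owned scratchpad, echoed back each day

class ViraltestEnvStub:
    """Toy stand-in for the real 30-day simulator."""

    def reset(self) -> DayObservation:
        self.day = 0
        return DayObservation(day=0, energy=1.0, followers=1_000,
                              last_reward=0.0, notes="")

    def step(self, action: dict) -> tuple[DayObservation, float, bool]:
        self.day += 1
        reward = random.random() if action.get("type") == "post" else 0.0
        obs = DayObservation(day=self.day,
                             energy=max(0.0, 1.0 - 0.03 * self.day),
                             followers=1_000 + 10 * self.day,
                             last_reward=reward,
                             notes=action.get("notes", ""))
        return obs, reward, self.day >= 30

env = ViraltestEnvStub()
obs = env.reset()
done = False
while not done:
    # A real agent would call tools and draft posts here; we post every day.
    action = {"type": "post", "topic": "ai", "notes": obs.notes}
    obs, reward, done = env.step(action)
```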

## The Problem

The $250B creator economy (Goldman Sachs, 2025) has 67 million creators, but 73% experience burnout (Awin, 2024). The core tension: post enough to stay visible in the algorithm, but not so much that quality drops and audiences fatigue. No existing RL environment captures this tradeoff with realistic dynamics.

## The Environment

**Viraltest v2** simulates a 30-day Instagram creator lifecycle grounded in 10+ verified data sources:

- **Engagement signals** decomposed into watch_time, sends_per_reach, saves, and likes_per_reach — matching the ranking signals Adam Mosseri officially confirmed in January 2025
- **Hour-by-hour heatmap** from Buffer's 9.6M-post study, cross-validated against Sprout Social's 2B-engagement analysis
- **Sleep/cognitive model** based on Van Dongen et al. (2003, *Sleep*, PMID 12683469) — performance lapses grow linearly past 16 hours awake
- **Tiered audience fatigue** from Buffer's 2.1M-post frequency study — a gradual decay, not a cliff (see the sketch after this list)
- **7 competitor archetypes** with realistic posting cadences (3–5 per week, not per day)
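To give a feel for how these pieces compose, here is an illustrative sketch of a combined engagement multiplier. The functional forms and constants below are placeholders, not the calibrated values documented in RESEARCH.md.

```python
import math

# Stand-in hourly heatmap: a smooth curve peaking mid-afternoon.
HEATMAP = {h: 0.6 + 0.4 * math.sin(math.pi * h / 24) for h in range(24)}

def fatigue(posts_last_7_days: int) -> float:
    """Gradual decay with posting frequency, no hard cliff."""
    return math.exp(-0.08 * max(0, posts_last_7_days - 5))

def cognitive(hours_awake: float) -> float:
    """Linear lapse growth past 16 hours awake (after Van Dongen et al., 2003)."""
    return 1.0 if hours_awake <= 16 else max(0.4, 1.0 - 0.05 * (hours_awake - 16))

def engagement_multiplier(hour: int, posts_last_7_days: int, hours_awake: float) -> float:
    return HEATMAP[hour] * fatigue(posts_last_7_days) * cognitive(hours_awake)

# A tired creator over-posting at 7pm still gets a discounted, nonzero score.
print(engagement_multiplier(hour=19, posts_last_7_days=8, hours_awake=18))
```

The real environment's curves come from the calibrated sources above; the point is only that fatigue and sleep debt enter as smooth multipliers rather than hard thresholds.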

## Why This Is World Modeling

The agent starts each day with almost no information — just energy, followers, and last reward. To plan effectively, it must:

1. **Discover tools** (`GET /tools`) on day 1
2. **Query the world** — trending topics, competitor activity, audience preferences
3. **Form hypotheses** and persist them in a scratchpad (the `notes` field)
4. **Test plans** via `predict_engagement` before committing (sketched after this list)
5. **Learn from counterfactual feedback** — the environment shadow-runs the optimal heatmap plan and shows the delta
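Put together, a single agent turn might look like the sketch below. The `predict_engagement` tool and the `notes` scratchpad are named above; the stubbed tool output, its signature, and the JSON scratchpad layout are assumptions.

```python
import json

def predict_engagement(plan: dict) -> float:
    """Stub for the real tool: forecast a score for a draft plan."""
    return 0.8 if plan["hour"] == 19 else 0.5

def agent_turn(notes: str) -> tuple[dict, str]:
    memory = json.loads(notes) if notes else {"hypotheses": []}
    # Hypothesis from the last counterfactual delta: evenings beat mornings.
    candidates = [{"type": "post", "topic": "ai", "hour": h} for h in (9, 19)]
    scored = [(predict_engagement(p), p) for p in candidates]
    forecast, best = max(scored, key=lambda t: t[0])
    memory["hypotheses"].append({"plan": best, "forecast": forecast})
    return best, json.dumps(memory)  # notes carry the world model across days

action, notes = agent_turn("")
print(action, notes)
```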

This isn't prompt engineering. The agent must build and maintain an internal world model across 30 steps.

## Training

We trained Qwen2.5-1.5B-Instruct using TRL's GRPO trainer. Reward = per-step environment reward + 2× terminal grader score. After 200 episodes, the trained agent outperforms the untrained baseline on all three tasks (monthly_engage, monthly_strategic, monthly_competitive).
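In code, the reward shaping amounts to the following small sketch; the function name and the example numbers are illustrative.

```python
def episode_return(step_rewards: list[float], grader_score: float) -> float:
    """Sum of per-step environment rewards plus 2x the terminal grader score."""
    return sum(step_rewards) + 2.0 * grader_score

# e.g. a 30-day episode averaging 0.1 per step with a 0.7 grader score
print(episode_return([0.1] * 30, 0.7))  # ≈ 4.4 (3.0 + 1.4)
```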

## Every Number Is Verifiable

We classify our sources into 4 tiers (peer-reviewed → industry → official → survey) and explicitly reject SEO/affiliate blogs. The full bibliography with DOIs, PMIDs, arXiv IDs, methodology extracts, and sample sizes lives in [RESEARCH.md](../RESEARCH.md).
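For illustration, one entry in that bibliography could be shaped like this; the field names and schema here are hypothetical, not RESEARCH.md's actual format.

```python
SOURCE_TIERS = ("peer_reviewed", "industry", "official", "survey")

entry = {
    "claim": "Performance lapses grow linearly past 16 hours awake",
    "tier": "peer_reviewed",
    "citation": "Van Dongen et al. (2003), Sleep",
    "pmid": "12683469",
    "n": 48,  # sample size recorded alongside the methodology extract
}
assert entry["tier"] in SOURCE_TIERS
```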

[Environment on HF Spaces](#) | [GitHub repo](#) | [Training notebook](#)