vaibhav12332112312 committed
Commit 225cdfe · 1 Parent(s): 360c721
blog/blog.md DELETED
@@ -1,211 +0,0 @@
# Viraltest: We Taught an LLM to Run an Instagram Account for 30 Days — and It Started Getting Smart

> **Theme #3.1 — Professional Tasks (World Modeling)**
> An OpenEnv environment where an LLM doesn't *play* Instagram, it *runs* one. No reset button on bad days. No leaked rules. Just a sparse observation, eight discoverable tools, and a 30-day calendar quietly judging every choice.

---

## TL;DR

Most LLM benchmarks are one-shot trivia. Viraltest is different: **a 30-day, partially observable, research-calibrated simulation of an Instagram creator's life**, dropped into [OpenEnv](https://github.com/meta-pytorch/OpenEnv). Every constant — when audiences are awake, how reels decay, when sleep loss starts hurting decisions, what "burnout" actually looks like — comes from a peer-reviewed paper or a 1M+ post industry study. We trained Qwen2.5-3B with **two-phase reward-weighted LoRA** (first learn *when* to post, then learn *what* to post). The reward curve climbs. The agent stops spamming text posts at 3 AM. It starts asking the right questions on day 1.

This blog is the story of why, and how.

---

## 1. The Problem: LLMs Can Write a Caption, but Can They Run a Brand?

Ask any LLM to write you "an Instagram caption about morning coffee" — flawless. Ask it to run a creator account for a month, where:

- you have a finite energy budget,
- audiences sleep at night and skip work-hour reels,
- the algorithm punishes you for going dark for 3 days,
- spamming comments gets you shadowbanned,
- collabs only help if your audiences barely overlap,
- and burnout is a slow, accumulating thing — not a flag,

…and the model collapses. It posts ten reels on a Tuesday morning. It uses the same three hashtags forever. It schedules a story at 4 AM. It tries to "engage" by liking 80 posts. None of these are *wrong* tokens — they're wrong *strategies*.

That's the capability gap we wanted to test:

> **Can an LLM build and maintain an internal world model — across 30 long-horizon steps — when nobody hands it the rules?**

The creator economy is the perfect testbed. It's a $250B market with 67M creators ([Goldman Sachs, 2025](https://www.goldmansachs.com/insights/articles/the-creator-economy-could-approach-half-a-trillion-dollars-by-2027)), 73% of whom report burnout ([Awin, 2024](https://www.prweb.com/releases/a-majority-of-content-creators-and-influencers-struggle-with-burnout-as-concerns-for-ai-begin-to-surface-according-to-a-new-awin-group-survey-research-302257152.html)). The tradeoffs are real, the data is public, and — crucially — the domain is wildly underexplored in RL/LLM training. Most environments stop at chess, gridworlds, and toy text games. We wanted something a researcher could actually publish a paper on.

## 2. Meet the Environment

Every step is **one day**. Episodes run **30 days**. Each day the agent gets a deliberately *sparse* observation:

```python
observation = ViraltestObservation(
    creator_energy=0.78,
    followers=10_420,
    reward=0.31,
    engagement_rate=0.041,
    notes="Day 1: I have no idea what people like.",
    # ...and barely anything else, until you ask.
)
```

To learn the world, it must call tools — and it has to discover that they exist.

| Tool | Cost | What it reveals |
|---|---|---|
| `query_trends` | 1 | Trending topics + tags for a niche |
| `query_competitor` | 2 | What 7 archetypal creators are doing |
| `query_audience` | 2 | Segment affinities + active hours |
| `query_tag_history` | 1 | Your own past performance per tag |
| `predict_engagement` | 3 | Counterfactual: "what if I posted this?" |
| `draft_review` | 3 | Strengths/weaknesses of a plan |
| `query_creator_pool` | 1 | Available collab partners + overlap |
| `propose_collab` | 5 | Co-author with another creator |

The agent's **first move on day 1** has to be `GET /tools`. There's no list in the prompt. World modeling, by construction.
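Because tools have different costs and the deck's per-episode API budget is 100, an agent has to budget its curiosity. Here is a minimal client-side sketch of that bookkeeping; the cost table mirrors the one above, and `ToolBudget` is a hypothetical helper for illustration, not part of the env API.

```python
# Hypothetical client-side helper: track tool spend against the
# per-episode API budget (costs mirror the tool table above).
TOOL_COSTS = {
    "query_trends": 1, "query_competitor": 2, "query_audience": 2,
    "query_tag_history": 1, "predict_engagement": 3, "draft_review": 3,
    "query_creator_pool": 1, "propose_collab": 5,
}

class ToolBudget:
    def __init__(self, budget: int = 100):
        self.remaining = budget

    def can_afford(self, tool: str) -> bool:
        return TOOL_COSTS[tool] <= self.remaining

    def charge(self, tool: str) -> int:
        """Deduct a tool's cost; returns the remaining budget."""
        if not self.can_afford(tool):
            raise RuntimeError(f"budget exhausted: cannot call {tool}")
        self.remaining -= TOOL_COSTS[tool]
        return self.remaining
```

With only 100 units a month, five `propose_collab` calls already eat a quarter of the budget — the prioritization pressure is the point.
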

### The Reward, Decomposed Like Instagram Actually Ranks Posts

Instagram's head Adam Mosseri publicly confirmed the top ranking signals in January 2025. We don't reward "engagement" as one number — we decompose it:

```python
reward = 0.40 * watch_time
       + 0.30 * sends_per_reach
       + 0.20 * saves
       + 0.10 * likes_per_reach
       - fatigue_penalty
       - sleep_penalty
       - shadowban_penalty
       + collab_uplift
```

Each format has a natural strength. Reels are watch-time machines. Stories drive sends. Carousels get saved. Text posts get liked. The agent has to learn this — we don't tell it.
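As a runnable sketch, the decomposition is just a weighted sum minus penalties. The weights match the formula above; the assumption that engagement inputs are normalized to [0, 1] is ours, for illustration.

```python
def daily_reward(watch_time, sends_per_reach, saves, likes_per_reach,
                 fatigue_penalty=0.0, sleep_penalty=0.0,
                 shadowban_penalty=0.0, collab_uplift=0.0):
    """Mosseri-aligned engagement signals, weighted as in the formula above.

    Engagement inputs are assumed normalized to [0, 1] for this sketch.
    """
    base = (0.40 * watch_time
            + 0.30 * sends_per_reach
            + 0.20 * saves
            + 0.10 * likes_per_reach)
    return base - fatigue_penalty - sleep_penalty - shadowban_penalty + collab_uplift
```

A perfect day on every signal scores 1.0 before penalties, which is why a shadowban or sleep debt can wipe out an otherwise great post.
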

## 3. The Best Part: Every Number Comes From a Paper

This is where Viraltest stops being a hackathon toy and starts looking like research infrastructure. Here's how the literature shaped the simulation:

| Mechanic | What it does | Source |
|---|---|---|
| **Hour heatmap (7×24)** | When you post matters — Wed 12pm slaps, Sat 4 AM doesn't | [Buffer 9.6M posts](https://buffer.com/resources/when-is-the-best-time-to-post-on-instagram) cross-validated with [Sprout Social 2B engagements](https://sproutsocial.com/insights/best-times-to-post-on-social-media/) |
| **Sleep model** | Quality decays linearly past 16h awake, floor at 30% | [Van Dongen et al. 2003, *Sleep*, PMID 12683469](https://pubmed.ncbi.nlm.nih.gov/12683469) — the canonical sleep deprivation RCT |
| **Fatigue tiers** | 2 posts/day = 1.0×, 5+ collapse to 0.25× | [Buffer 2.1M posts × 102K accounts](https://buffer.com/resources/how-often-to-post-on-instagram/) |
| **Tiered diminishing returns (no hard caps)** | Marginal-cost over binary thresholds | [Cen et al. 2024, arXiv:2410.13108](https://arxiv.org/abs/2410.13108) — disengagement-aware policies |
| **Format reach multipliers** | Reels reach 2.25× static images | [Socialinsider 31M post study](https://www.socialinsider.io/blog/instagram-content-research) |
| **Niche × niche engagement curves** | Tech 0.33%, Higher Ed 2.10%, etc. | [Rival IQ 1.9M posts × 2,100 brands](https://www.rivaliq.com/blog/social-media-industry-benchmark-report/) |
| **Collab math** | Same niche + low overlap = HIGH; diff niche capped below | [Later 2023](https://later.com/blog/instagram-collab-posts) + [HypeAuditor 2024](https://hypeauditor.com/blog/influencer-collaboration) |
| **Burnout accumulator** | Stress → exhaustion → reduced perf | [Cao et al. 2024, *Educ Inf Technol*](https://doi.org/10.1007/s10639-023-12213-6) + [Wen et al. 2026, *Sci Rep*](https://www.nature.com/articles/s41598-026-42958-2) |
| **Reward decomposition (4 signals)** | Watch + sends + saves + likes, weighted | Mosseri Jan-2025 (Tier 3 official) |

We even maintain a **rejection list** — 13 SEO/affiliate blogs we *refused* to cite because they don't disclose methodology. The full bibliography (with DOIs, PMIDs, sample sizes) lives in [`RESEARCH.md`](../RESEARCH.md). Any reviewer can audit any number in this environment in under five minutes.
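To make one of these mechanics concrete, the Van Dongen-calibrated sleep model is piecewise linear: full quality up to 16 hours awake, then a linear decay floored at 30%. The 16h threshold and 0.30 floor come from the table above; the decay slope used here is an illustrative default, not the env's actual constant.

```python
def sleep_quality(hours_awake: float, decay_per_hour: float = 0.10) -> float:
    """Piecewise-linear sleep model (Van Dongen-style calibration).

    Full quality up to 16h awake, then linear decay, floored at 0.30.
    `decay_per_hour` is an illustrative default, not the env's value.
    """
    if hours_awake <= 16:
        return 1.0
    return max(0.30, 1.0 - decay_per_hour * (hours_awake - 16))
```

The floor matters for training: an all-nighter hurts, but it never zeroes the agent out, so the gradient toward "sleep more" stays smooth rather than cliff-shaped.
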

## 4. Two-Phase Training: The "Sweet Spot" Has Two Dimensions

Here's the design idea we're proudest of. Real creator success isn't one skill — it's at least two:

1. **WHEN to post** (timing, frequency, cadence — heatmap-driven)
2. **WHAT to post** (format mix, intent variety, tag discovery — content-driven)

A single reward signal makes the LLM split the difference and master neither. So we **split training into phases**, each with its own reward shaping:

| Phase | Reward focus | What the agent learns |
|---|---|---|
| **Phase 1 — Timing** | Heatmap multiplier, fatigue penalty, sleep model | Stop posting at 4 AM. Don't drop 6 reels on Monday. Sleep matters. |
| **Phase 2 — Content** | Format diversity, intent matching, tag discovery | Mix reels + carousels. Match `intent` to format. Explore tags before exploiting. |

Phase 1's LoRA adapter persists into Phase 2 — so timing competence isn't *forgotten*, it's *built on*. This is closer to how a human creator levels up: first you stop sabotaging yourself, then you get clever.

And the architecture is **extensible**. Want to train a "collab specialist"? Add a `collab` reward mode. Want to study "burnout-aware posting"? Add a `wellness` mode. Want to teach the agent to optimize for **a specific environment variable** — say, posts-per-day, or audience segment retention, or shadowban risk? Plug a new reward mode into `env.reset(reward_mode="...")` and a new system prompt into the phase config. The training loop doesn't care.

```python
PHASES = [
    {"name": "phase1_timing", "reward_mode": "timing", "system": SYSTEM_PROMPT_TIMING},
    {"name": "phase2_content", "reward_mode": "content", "system": SYSTEM_PROMPT_CONTENT},
    # add your own phase here ↓
    # {"name": "phase3_collab", "reward_mode": "collab", "system": SYSTEM_PROMPT_COLLAB},
]
```

This is the kind of design that researchers can fork. It's basically a curriculum-learning template for any multi-objective creator problem.
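Conceptually, the curriculum loop is just an iteration over that phase list, resetting the env with each phase's reward mode while the same adapter keeps training. The sketch below strips it to the bone: `StubEnv` is a hypothetical stand-in for the real client, system prompts are omitted, and the LoRA update is elided to a comment.

```python
# Simplified phase list (system prompts omitted for brevity).
PHASES = [
    {"name": "phase1_timing", "reward_mode": "timing"},
    {"name": "phase2_content", "reward_mode": "content"},
]

class StubEnv:
    """Hypothetical stand-in for the Viraltest client."""
    def __init__(self):
        self.modes_seen = []

    def reset(self, reward_mode: str):
        self.modes_seen.append(reward_mode)
        return {"day": 1}  # sparse day-1 observation

def run_curriculum(env, phases):
    # The adapter trained in each phase is carried into the next one.
    for phase in phases:
        obs = env.reset(reward_mode=phase["reward_mode"])
        # ... roll out episodes from `obs`, score them, take a LoRA update ...
    return env.modes_seen
```

Adding a third phase is a one-line change to the list, which is exactly what makes the ablation surface clean.
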

## 5. Did It Actually Learn? (The Bit That Counts for 20%)

Yes. Here are the real numbers from `run-output/plots/training_summary.json` — Qwen2.5-3B-Instruct, LoRA SFT, 2 rounds × 6 episodes:

**Reward climbs round-over-round:**

| Round | avg episode reward | max episode reward | avg grader | max grader | train loss |
|---|---|---|---|---|---|
| 1 | 3.904 | 4.514 | 0.620 | 0.827 | 2.672 |
| 2 | **4.215** | **4.658** | **0.732** | **0.870** | **2.593** |

That's **+8% mean reward**, **+18% mean grader score**, and **train loss dropping** — the model is genuinely learning weights, not just resampling prompts.
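The headline percentages come straight from the table; making the arithmetic auditable takes three lines:

```python
# Round-over-round gains, computed from the table above.
round1 = {"avg_reward": 3.904, "avg_grader": 0.620}
round2 = {"avg_reward": 4.215, "avg_grader": 0.732}

def pct_gain(before: float, after: float) -> float:
    return 100.0 * (after - before) / before

reward_gain = pct_gain(round1["avg_reward"], round2["avg_reward"])  # ~8%
grader_gain = pct_gain(round1["avg_grader"], round2["avg_grader"])  # ~18%
```
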

**Vs. baseline (the smart heuristic) on the held-out evaluation:**

| Task | Smart heuristic baseline | Trained agent (after) |
|---|---|---|
| `monthly_engage` | 0.7352 | **1.000** |
| `monthly_strategic` | 0.9043 | 0.842 |
| `monthly_competitive` | 0.9066 | **0.964** |

The trained agent **matches or beats** the rule-based heuristic on 2 of 3 tasks. The slight regression on `monthly_strategic` is honest: it's the most multi-objective of the three (tag discovery + energy management + consistency), and after only 2 rounds the LoRA hasn't fully traded off correctly. More rounds and a third "diversity" phase are the obvious next steps — and the architecture supports them without code changes.

**Plots:**
- `plots/reward_curve.png` — round-by-round reward
- `plots/before_after.png` — baseline vs trained
- `plots/training_trajectories.png` — per-task learning curves
- `plots/baseline_leaderboard.png` — 5 heuristic baselines we beat

## 6. Where We're Honest About Shortcomings

A research-quality environment has to admit what's mocked vs. real. Here's the unvarnished list:

| Concern | Status today | Why / Plan |
|---|---|---|
| **Negative comments / sentiment hits** | Not implemented — comments only ever *help* engagement right now | On real Instagram, posts can attract negativity; some go viral *for the wrong reasons*. Modeling this needs an LLM-based sentiment scorer in the env loop. **Future update:** add a `comment_sentiment` channel where mass negative comments suppress reach (mirrors Cen 2024's disengagement model). |
| **Followers always grow if you post** | Currently true | This is the biggest "video game" assumption. In reality, a tone-deaf post can lose followers. **Future update:** introduce `follower_loss_rate` driven by content-audience mismatch + sentiment. |
| **Abusive / unsafe content detection** | Not implemented | Detecting toxicity reliably needs an LLM-in-the-loop (à la Llama Guard). For the hackathon we kept the env deterministic and reproducible. **Future:** optional moderation hook that downgrades reach + adds a policy violation to `JudgeReport`. |
| **Sponsorship offers** | Mocked: deterministic schedule per archetype | Real sponsorships depend on niche, follower count, recency, and engagement quality. We have the building blocks — just not the marketplace yet. |
| **Collaborator follower counts** | Mocked from `audience_overlap_matrix.json` | Real follower numbers are noisy and platform-API-gated. The mock distribution matches Rival IQ's industry medians, so reasoning about collab uplift is still calibrated — just not personalized. |
| **Hour heatmap, fatigue tiers, sleep curve, niche multipliers, format reach** | **Real** — backed by the studies in §3 | These are the load-bearing numbers, and they're sourced. |

We list this openly because we want a researcher to read it and think *"these are tractable extensions, not foundational holes."* They are.

## 7. Why This Matters (and Who Should Care)

- **For RL/LLM researchers:** A reproducible, partially observable, long-horizon environment with a *believable* reward landscape — calibrated to public datasets. Multi-episode brand chains let you study **distribution shift** (`shift_label="baseline"` vs `"shifted"` in `reset()`). The headline `vs_baseline_pct`, `score_per_tool_call`, and `retention_under_shift` metrics are built into every final observation.
- **For curriculum-learning folks:** Two-phase training with reward-mode switching is a clean ablation surface. Add phases. Reorder them. See what catastrophically forgets.
- **For agent-eval people:** Every day emits a deterministic, explainable `JudgeReport(policy_compliance, sustainability_risk, strategic_quality, violations)`. Auditable rules cite their sources (Buffer 2.1M, Van Dongen, Cen 2024). It's basically a regulator built into the env.
- **For creators / agencies:** The `predict_engagement` tool is genuinely useful — it's a counterfactual sandbox for "what if I shifted my Monday reel to Wednesday afternoon?", calibrated to industry data.

> A reviewer should be able to read our README in 3–5 minutes and want to try the env. We've tried hard to earn that.

## 8. The Journey, In One Paragraph

We started with the same instinct everyone has — *"build a chess clone, but for tweets"* — and threw it out within a week. The interesting question wasn't "can the LLM win at engagement?" — it was *"can it learn the world from sparse signals?"* So we shrank the observation, exploded the tool catalog, and went paper-hunting. We rejected 13 SEO blogs that wouldn't show their math. We redid the heatmap when Sprout Social's 2B-engagement dataset disagreed with Buffer's 9.6M. We split training into two phases the moment we realized timing and content competence were genuinely different skills. We watched a 3B-parameter model go from posting carousels at 3 AM to politely asking `query_audience` for the segment's active hours. That moment — when the loss curve dropped and the agent stopped sabotaging itself — is why we built this.

## 9. Try It

- **HuggingFace Space:** [Viraltest live env](#) *(replace with your published Space URL)*
- **GitHub repo:** [`viraltest`](#)
- **Training notebook (Colab T4):** [`training/train_grpo.ipynb`](../training/train_grpo.ipynb)
- **Full bibliography:** [`RESEARCH.md`](../RESEARCH.md) — every constant traceable to a DOI / PMID / arXiv ID
- **Design notes:** [`DESIGN.md`](../DESIGN.md)
- **2-min video script:** [`blog/youtube_script.md`](youtube_script.md)
- **Pitch deck outline:** [`blog/slide_outline.md`](slide_outline.md)

Quick local spin-up:

```bash
git clone <repo-url> && cd viraltest
uv sync
uvicorn server.app:app --host 0.0.0.0 --port 8000
# in another terminal:
export HF_TOKEN=hf_... MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
.venv/bin/python inference.py
```

If you fork it to add a sentiment channel, a sponsorship marketplace, or a third training phase — please tell us. That's exactly the point.

---

*Built for the OpenEnv Hackathon. Numbers are from real runs in `run-output/plots/training_summary.json`. Every claim about Instagram dynamics traces to a Tier 1–3 source in [`RESEARCH.md`](../RESEARCH.md). If you can't audit it, we didn't cite it.*
blog/slide_outline.md DELETED
@@ -1,58 +0,0 @@
# Viraltest v2 — Pitch Deck Outline (8 slides)

## Slide 1: Title
- **Viraltest v2: Teaching LLMs World Modeling Through Instagram Strategy**
- Theme #3.1 — Professional Tasks
- OpenEnv Hackathon India 2026
- Team: [your team name]

## Slide 2: The Problem
- $250B creator economy, 67M creators (Goldman Sachs 2025)
- 73% experience burnout; Instagram drives 88% of it (Awin 2024)
- Algorithm changes constantly — no one tells you the rules
- Existing tools show analytics but don't teach strategy
- **Gap:** No RL environment captures this tradeoff with realistic dynamics

## Slide 3: The World
- 30-day Instagram simulation (monthly cycle)
- Mosseri-aligned signals: watch_time, sends, saves, likes (official Jan 2025)
- Hour-by-hour heatmap (Buffer 9.6M + Sprout 2B)
- 7 competitor archetypes, 5 audience segments, ~120 tags
- Piecewise-linear sleep model (Van Dongen 2003, *Sleep*)
- Tiered audience fatigue (Buffer 2.1M)

## Slide 4: The Tools (Theme #3.1 Fit)
- Agent starts with a SPARSE observation (energy, followers, reward)
- 8 discoverable tools: query_trends, query_competitor, query_audience, query_tag_history, predict_engagement, draft_review, query_creator_pool, propose_collab
- API budget (100/episode) — can't query everything, must prioritize
- Notes field for hypothesis tracking across days
- Counterfactual coach: "here's what would have happened with optimal timing"

## Slide 5: Training Pipeline
- TRL GRPO on Qwen2.5-1.5B-Instruct (free Colab T4)
- Reward: per-step env reward + 2× terminal grader score
- 200 episodes, batch 4, 50 GRPO steps
- 3 tasks: monthly_engage → monthly_strategic → monthly_competitive
- Multi-episode chain: brand state persists across months

## Slide 6: Results
- [Embed reward_curve.png — ascending curve over training]
- [Embed before_after.png — smart baseline vs trained agent per task]
- Trained agent: uses tools on day 1, adapts strategy by day 5, manages energy throughout
- Score improvement on monthly_competitive: [X% → Y%]

## Slide 7: Sources & Verifiability
- 4-tier source quality bar (peer-reviewed → industry → official → survey)
- 7 Tier-1 papers, 9 Tier-2 studies, 1 Tier-3 official statement
- Every constant has a DOI/PMID/arXiv ID
- Tier-5 SEO blogs explicitly rejected (13 sites listed with rationale)
- Full bibliography: RESEARCH.md (~6 pages)
- **Any number in this presentation can be debated — we welcome it**

## Slide 8: Try It
- HF Space: [link]
- GitHub: [link]
- Training notebook: [Colab link]
- Blog: [HF post link]
- Video: [YouTube link]
- **Questions?**
blog/youtube_script.md DELETED
@@ -1,40 +0,0 @@
# Viraltest v2 — YouTube Script (<2 minutes)

## Storyboard

### Shot 1: Hook (0:00–0:10)
**Visual:** Split screen — left: scrolling Instagram feed, right: an LLM terminal making decisions
**Voiceover:** "What if an AI agent could learn to run your Instagram account — not from a prompt, but by discovering the rules of the world itself?"
**On-screen text:** "Viraltest v2 — World Modeling for Instagram"

### Shot 2: The Problem (0:10–0:25)
**Visual:** Stats flying in — "$250B creator economy" (Goldman Sachs 2025), "73% burnout" (Awin 2024), "67M creators"
**Voiceover:** "67 million creators compete for attention. 73% burn out. The algorithm changes constantly. No one tells you the rules."
**Citation badge:** Goldman Sachs 2025 · Awin 2024

### Shot 3: The Environment (0:25–0:50)
**Visual:** Animated diagram — agent receives sparse observation → calls tools → gets data → plans day
**Voiceover:** "We built a 30-day Instagram simulation. The agent sees almost nothing — just energy, followers, and last reward. To learn, it must use 8 discoverable tools: query trends, check competitors, test plans before committing."
**On-screen text:** "8 tools · 5 audience segments · 7 competitor archetypes · 30-day horizon"
**Citation badge:** Buffer 9.6M · Sprout Social 2B · Van Dongen 2003

### Shot 4: The Science (0:50–1:10)
**Visual:** Side-by-side comparison tables showing env constants vs. source data
**Voiceover:** "Every number comes from real research. Engagement rates from Socialinsider's 31-million-post study. Peak hours from Buffer's 9.6-million-post analysis. Sleep decay from a 2003 *Sleep* journal paper. Algorithm signals from Instagram's own head, Adam Mosseri."
**Citation badge:** Mosseri Jan-2025 · Socialinsider 2026 · PMID 12683469

### Shot 5: Training Results (1:10–1:30)
**Visual:** Reward curve plot (ascending), before/after bar chart
**Voiceover:** "We trained Qwen 2.5 1.5B using TRL GRPO. After 200 episodes, the agent learned to use tools strategically, post at peak hours, diversify content types, and manage energy — outperforming the baseline on all three tasks."
**On-screen text:** reward curve + score comparison

### Shot 6: Theme Fit + Close (1:30–1:50)
**Visual:** Theme #3.1 checklist being checked off — tool discovery, partial observability, persistent state, causal reasoning, multi-step workflow
**Voiceover:** "This is Theme 3.1: World Modeling. Real tool interaction. Persistent state across months. Causal reasoning through counterfactual feedback. Not a toy — a simulation grounded in science."
**On-screen text:** "All sources: RESEARCH.md · Code: github.com/... · Try it: HF Spaces"

---

**Total runtime:** ~1:50
**Music:** Upbeat lo-fi instrumental (no lyrics)
**Aspect ratio:** 16:9 landscape
training/hf_run_space_train_job.sh CHANGED
@@ -8,7 +8,7 @@
 set -euo pipefail
 
 IMAGE="${HF_JOB_IMAGE:-pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime}"
-FLAVOR="${HF_JOB_FLAVOR:-a10g-largex4}"
+FLAVOR="${HF_JOB_FLAVOR:-a100x4}"
 TIMEOUT="${HF_JOB_TIMEOUT:-8h}"
 SPACE_REPO="${HF_SPACE_REPO_ID:-vaibhavkhandare/train-bhai-train}"
 NB_EXEC_TIMEOUT="${NB_EXEC_TIMEOUT:-3600}"