Ikshitha Janarthanan committed on
Commit 3e3afc7 · 1 Parent(s): b1f8065

feat: task 3 enhanced
Files changed (5)
  1. README.md +155 -5
  2. app.py +4 -2
  3. inference.py +56 -9
  4. models.py +54 -13
  5. requirements.txt +15 -5
README.md CHANGED
@@ -28,6 +28,8 @@ with a fully open, dataset-calibrated simulation grounded in:
  | [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog |
  | [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (Lognormal/hour) |
  | [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank |

  All datasets are **optional** — the environment falls back to published
  statistics so it runs out-of-the-box with zero downloads.
@@ -41,7 +43,8 @@ class Action(BaseModel):
      bid_price: float    # USD bid for the RTB auction (≥ 0)
      headline_id: int    # Index into the 6-slot headlines catalog (0–5)
      creative_id: int    # Index into the 6-slot creatives catalog (0–5)
-     generated_caption: str | None  # Free-text caption (hard_assembly only)
  ```

  ## Observation Space
@@ -60,6 +63,11 @@ class Observation(BaseModel):
      carryover_boost: float     # Brand-recall CTR boost [0, 0.30]
      last_ctr: float            # Previous step CTR
      cumulative_revenue: float  # Total revenue earned
  ```

  ## Reward Signal
@@ -69,6 +77,7 @@ class Observation(BaseModel):
  | Auction **won** | `adjusted_ctr × $15 − clearing_price` |
  | Auction **lost** | `−$0.10` (missed opportunity) |
  | Over-pacing (medium only) | `−$1.00` penalty |

  Rewards are **per-step** (not sparse), providing a continuous gradient signal.
@@ -84,9 +93,33 @@ Rewards are **per-step** (not sparse), providing continuous gradient signal.
  **Objective:** Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22).
  **Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue` | **Target:** 0.70

- ### Level 3 — `hard_assembly` (Hard)
- **Objective:** Generate captions aligned with the viral trend AND win auctions.
- **Budget:** $100 | **Grader:** `0.6×cosine_sim + 0.4×revenue_factor` | **Target:** 0.65

  ### Level 4 — `hard_sequencing` (Hard)
  **Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts.
@@ -96,6 +129,102 @@ a 20% diversity bonus.

  ---

  ## Setup & Usage

  ### Prerequisites
@@ -147,6 +276,7 @@ The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout.
  | `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
  | `AUCTIONEER_TASK` | No | Task to run (default: `all`) |
  | `MIND_SOURCE` | No | `local` / `huggingface` / `azure` |
  | `USE_LLM_SIMULATOR` | No | Set `1` to enable Llama-3 User Simulator |

  ---
@@ -157,7 +287,7 @@ The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout.
  |------|---------------|-------|
  | `easy_headline` | 0.55 – 0.80 | Context→headline matching is learnable |
  | `medium_pacing` | 0.45 – 0.70 | Requires budget discipline |
- | `hard_assembly` | 0.40 – 0.65 | Caption quality + auction wins |
  | `hard_sequencing` | 0.35 – 0.60 | Compared against DP oracle |

  Scores depend on LLM quality and market stochasticity. Run multiple episodes
@@ -170,6 +300,19 @@ for stable estimates.
  ```
  ├── models.py       # Pydantic models: Action, Observation, Reward, Info
  ├── environment.py  # OpenEnvAuctioneer + graders + dataset layers
  ├── app.py          # FastAPI server (runs inside Docker)
  ├── inference.py    # Baseline inference script (mandatory format)
  ├── openenv.yaml    # OpenEnv metadata & task definitions
@@ -179,6 +322,13 @@ for stable estimates.
  └── Datasets/       # Optional dataset mount point
  ```

  ## License

  MIT

  | [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog |
  | [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (Lognormal/hour) |
  | [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank |
+ | [MS-COCO Captions 2017](https://cocodataset.org/) | Ad + caption pool for `hard_assembly` |
+ | [Google Trends](https://github.com/GeneralMills/pytrends) / [Reddit](https://www.reddit.com/) | Live viral hashtag scraping |

  All datasets are **optional** — the environment falls back to published
  statistics so it runs out-of-the-box with zero downloads.
 
      bid_price: float    # USD bid for the RTB auction (≥ 0)
      headline_id: int    # Index into the 6-slot headlines catalog (0–5)
      creative_id: int    # Index into the 6-slot creatives catalog (0–5)
+     generated_caption: str | None         # [hard_assembly] Rewritten caption with viral hashtags
+     generated_hashtags: list[str] | None  # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
  ```

  ## Observation Space
 
      carryover_boost: float     # Brand-recall CTR boost [0, 0.30]
      last_ctr: float            # Previous step CTR
      cumulative_revenue: float  # Total revenue earned
+
+     # hard_assembly only:
+     live_hashtags: list[str]   # Real-time scraped viral hashtags
+     image_description: str     # Source ad image description
+     base_caption: str          # Base caption to rewrite
  ```

  ## Reward Signal
 
  | Auction **won** | `adjusted_ctr × $15 − clearing_price` |
  | Auction **lost** | `−$0.10` (missed opportunity) |
  | Over-pacing (medium only) | `−$1.00` penalty |
+ | Assembly bonus (hard_assembly) | `+composite_score × $8.00` |

  Rewards are **per-step** (not sparse), providing a continuous gradient signal.
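The reward table maps directly onto a small per-step function. The sketch below is illustrative only; the helper name and flag arguments are ours, not the environment's actual API:

```python
def step_reward(won: bool, adjusted_ctr: float = 0.0,
                clearing_price: float = 0.0, over_paced: bool = False,
                composite_score: float = 0.0) -> float:
    """Per-step reward mirroring the table above (hypothetical helper)."""
    if not won:
        return -0.10                                   # missed-opportunity penalty
    reward = adjusted_ctr * 15.0 - clearing_price      # expected revenue minus cost
    if over_paced:
        reward -= 1.00                                 # medium_pacing over-spend penalty
    reward += composite_score * 8.00                   # hard_assembly bonus (0 elsewhere)
    return reward
```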

  **Objective:** Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22).
  **Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue` | **Target:** 0.70

+ ### Level 3 — `hard_assembly` (Hard) 🔥
+ **Objective:** Given an ad image description + base caption + live viral hashtags,
+ **generate a new caption** that is simultaneously viral, coherent with the image,
+ and creatively novel — while also winning auctions profitably.
+
+ **Budget:** $120 | **Target:** 0.65
+
+ **The RL loop (what the LLM agent does each step):**
+ ```
+ 1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
+ 2. Agent must:
+    a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
+    b. Rewrite the base caption to weave those hashtags into natural ad copy
+    c. Add its own creative words (target 30–50% novel vocabulary)
+    d. Keep the caption coherent with the source image
+    e. Set a profitable bid price
+ 3. Grader scores the assembled caption on 4 axes:
+    • 35% — Hashtag relevance (cosine_sim of each hashtag vs viral_trend)
+    • 35% — Caption-trend alignment (cosine_sim of caption vs viral_trend)
+    • 20% — Caption-image coherence (cosine_sim of caption vs image_description)
+    • 10% — Novelty (fraction of new words vs base_caption, target ~40%)
+ 4. Reward = auction_reward + composite_score × $8.00 bonus
+ ```
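The novelty axis in step 3 can be made concrete with a word-level sketch. This assumes plain whitespace tokenisation; the actual grader may tokenise differently:

```python
def novelty_score(base_caption: str, final_caption: str) -> float:
    """Fraction of words in the final caption that do not appear in the
    base caption, rewarded for landing near the 40% target via
    1 - |novel_fraction - 0.40| / 0.60 (clamped at 0)."""
    base_words = set(base_caption.lower().split())
    final_words = [w.lower() for w in final_caption.split()]
    if not final_words:
        return 0.0
    novel_fraction = sum(w not in base_words for w in final_words) / len(final_words)
    return max(0.0, 1.0 - abs(novel_fraction - 0.40) / 0.60)
```

A caption with exactly 40% new vocabulary scores 1.0 on this axis; a verbatim copy of the base caption still scores 1/3, so novelty alone never dominates.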
+
+ **Data sources for hard_assembly:**
+ - **Ad creatives**: MS-COCO Captions 2017 (val annotations) bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to a 30-entry built-in seed pool.
+ - **Viral hashtags**: `ViralHashtagScraper` queries Google Trends (via `pytrends`) and Reddit `/r/popular/hot.json` (public, no auth). Blends with static seed hashtags per context and trend. Cached for 1 hour.

  ### Level 4 — `hard_sequencing` (Hard)
  **Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts.

  ---

+ ## Grading Details
+
+ ### `EasyHeadlineGrader`
+ ```
+ step_score  = CTR_selected / CTR_oracle
+ final_score = mean(step_scores)   // in [0.0, 1.0]
+ ```
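As a sketch (function name hypothetical, not the grader's real signature), the per-episode aggregation is just a mean of per-step CTR ratios:

```python
from statistics import mean

def easy_headline_score(selected_ctrs, oracle_ctrs) -> float:
    """Ratio of the chosen headline's CTR to the best available CTR
    for that context, averaged over all steps of the episode."""
    steps = [s / o for s, o in zip(selected_ctrs, oracle_ctrs) if o > 0]
    return mean(steps) if steps else 0.0
```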
+
+ ### `MediumPacingGrader`
+ ```
+ smoothness     = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
+ peak_survival  = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0
+ revenue_factor = min(1.0, total_revenue / $30)
+
+ final_score = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
+ ```
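The three pacing components combine as a straightforward weighted sum; a minimal sketch (names and defaults are ours) assuming an even $50/24 ideal spend:

```python
from statistics import mean

def pacing_score(hourly_spend, budget=50.0, total_revenue=0.0,
                 remaining_at_18=0.0) -> float:
    """Combine smoothness, peak survival, and revenue with the
    documented 0.30 / 0.30 / 0.40 weights."""
    ideal = budget / 24.0
    smoothness = max(0.0, 1.0 - mean(abs(s - ideal) / ideal for s in hourly_spend))
    peak_survival = 1.0 if remaining_at_18 >= 0.20 * budget else 0.0
    revenue_factor = min(1.0, total_revenue / 30.0)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor
```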
+
+ ### `HardAssemblyGrader` — 4-Axis Composite
+
+ | Axis | Weight | Metric |
+ |------|--------|--------|
+ | Hashtag Relevance | 0.35 | `mean(cosine_sim(hashtag, viral_trend))` |
+ | Caption-Trend Alignment | 0.35 | `cosine_sim(caption, viral_trend)` |
+ | Caption-Image Coherence | 0.20 | `cosine_sim(caption, image_description)` |
+ | Novelty | 0.10 | `1 − |novel_fraction − 0.40| / 0.60` |
+
+ ```
+ composite = Σ (weight × axis_score)
+
+ final_score = 0.60 × mean(composite_scores)
+             + 0.40 × min(1.0, total_revenue / $55)
+ ```
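The composite and the episode-level score reduce to a few lines; this is a sketch under the assumption that each axis score has already been computed in [0, 1] (the cosine-similarity calls themselves are omitted):

```python
WEIGHTS = {"hashtag": 0.35, "trend": 0.35, "coherence": 0.20, "novelty": 0.10}

def assembly_composite(axis_scores: dict) -> float:
    """Weighted sum of the four axis scores (each already in [0, 1])."""
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)

def assembly_final(composite_scores, total_revenue: float) -> float:
    """Episode score: 60% mean composite + 40% capped revenue factor."""
    mean_composite = sum(composite_scores) / len(composite_scores)
    return 0.60 * mean_composite + 0.40 * min(1.0, total_revenue / 55.0)
```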
+
+ ### `HardSequencingGrader`
+ ```
+ agent_conversions  = Σ [CTR_t × (1 + carryover_boost_t) × $15]
+ oracle_conversions = DP-optimal bid/skip sequence with carry-over
+
+ diversity_mult = 1.20 if ≥3 distinct contexts won, else 1.0
+
+ final_score = min(1.0, agent_conv / oracle_conv × diversity_mult)
+ ```
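A toy version of the DP oracle can be written in a few lines. The carry-over dynamics here are assumptions for illustration, not the repo's exact rules: a win raises a capped boost that decays every hour, and the budget is modelled simply as a maximum number of wins:

```python
from functools import lru_cache

def oracle_conversions(ctrs, costs, max_wins, boost=0.10, decay=0.5, cap=0.30):
    """DP sketch of the bid/skip oracle under assumed carry-over dynamics:
    winning hour t earns ctrs[t] * (1 + carry) * 15 - costs[t] and raises
    the boost by `boost` (capped at `cap`); the boost halves each hour."""
    n = len(ctrs)

    @lru_cache(maxsize=None)
    def best(t, wins_left, carry):
        if t == n:
            return 0.0
        decayed = round(carry * decay, 4)
        value = best(t + 1, wins_left, decayed)            # option 1: skip
        if wins_left > 0:                                  # option 2: bid & win
            gain = ctrs[t] * (1.0 + carry) * 15.0 - costs[t]
            after = round(min(cap, carry * decay + boost), 4)
            value = max(value, gain + best(t + 1, wins_left - 1, after))
        return value

    return best(0, max_wins, 0.0)
```

Rounding the carry state keeps the memo table small; with 24 hours and a coarse boost grid the DP is exact enough to serve as a stable denominator for `final_score`.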
+
+ ---
+
+ ## Architecture
+
+ ```
+ ┌──────────────────────────────────────────────────────────────┐
+ │ OpenEnvAuctioneer (Gym-style environment)                    │
+ │                                                              │
+ │  ┌──────────────────┐   ┌─────────────────────────────┐      │
+ │  │ Market Engine    │   │ User Simulator              │      │
+ │  │ (Statistical)    │   │ (Semantic / LLM)            │      │
+ │  │                  │   │                             │      │
+ │  │ iPinYou RTB logs │   │ SentenceTransformer         │      │
+ │  │ → Lognormal per  │   │ all-MiniLM-L6-v2            │      │
+ │  │   hour bucket    │   │ + optional Llama-3-8B       │      │
+ │  └──────────────────┘   └─────────────────────────────┘      │
+ │                                                              │
+ │  ┌────────────────────────────────────────────────────────┐  │
+ │  │ MIND Dataset Layer (Microsoft News Dataset)            │  │
+ │  │   behaviours.tsv → CTRCalibrator                       │  │
+ │  │   news.tsv       → MINDCreativePool (headlines)        │  │
+ │  └────────────────────────────────────────────────────────┘  │
+ │                                                              │
+ │  ┌────────────────────────────────────────────────────────┐  │
+ │  │ Ad + Caption Dataset (MS-COCO Captions 2017)           │  │
+ │  │   → image_description + base_caption per step          │  │
+ │  │   → ViralHashtagScraper (pytrends + Reddit + seeds)    │  │
+ │  │   → agent rewrites caption with viral hashtags         │  │
+ │  └────────────────────────────────────────────────────────┘  │
+ │                                                              │
+ │  ┌────────────────────────────────────────────────────────┐  │
+ │  │ Grader (task-specific, deterministic 0.0–1.0)          │  │
+ │  │   Level 1: easy_headline    → headline CTR lookup      │  │
+ │  │   Level 2: medium_pacing    → pacing + survival        │  │
+ │  │   Level 3: hard_assembly    → 4-axis composite score   │  │
+ │  │   Level 4: hard_sequencing  → DP oracle comparison     │  │
+ │  └────────────────────────────────────────────────────────┘  │
+ └──────────────────────────────────────────────────────────────┘
+ ```
+
+ ---
+
+ ## Models
+
+ | Model | Role | Always Active? |
+ |-------|------|----------------|
+ | `all-MiniLM-L6-v2` (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | ✅ Yes |
+ | `Meta-Llama-3-8B-Instruct` (4-bit) | Richer LLM-based CTR scoring | ❌ Optional (`USE_LLM_SIMULATOR=1`) |
+
+ When the LLM simulator is active: `final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr`
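The blend is a one-liner; a sketch with clamping (the function name is ours, and the clamp is an assumption for safety rather than documented behaviour):

```python
def blended_ctr(llm_ctr: float, semantic_ctr: float, use_llm: bool = True) -> float:
    """Blend the two user-simulator CTR estimates; with the LLM disabled,
    the semantic score is used alone. Clamped to a valid probability."""
    ctr = 0.60 * llm_ctr + 0.40 * semantic_ctr if use_llm else semantic_ctr
    return min(1.0, max(0.0, ctr))
```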
+
+ ---
+
  ## Setup & Usage

  ### Prerequisites
 
  | `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
  | `AUCTIONEER_TASK` | No | Task to run (default: `all`) |
  | `MIND_SOURCE` | No | `local` / `huggingface` / `azure` |
+ | `COCO_SOURCE` | No | `local` / `url` (auto-download COCO annotations) |
  | `USE_LLM_SIMULATOR` | No | Set `1` to enable Llama-3 User Simulator |

  ---
 
  |------|---------------|-------|
  | `easy_headline` | 0.55 – 0.80 | Context→headline matching is learnable |
  | `medium_pacing` | 0.45 – 0.70 | Requires budget discipline |
+ | `hard_assembly` | 0.40 – 0.65 | Caption quality + hashtag matching + auction wins |
  | `hard_sequencing` | 0.35 – 0.60 | Compared against DP oracle |

  Scores depend on LLM quality and market stochasticity. Run multiple episodes
 
  ```
  ├── models.py       # Pydantic models: Action, Observation, Reward, Info
  ├── environment.py  # OpenEnvAuctioneer + graders + dataset layers
+ │   ├── MINDLoader            # MIND dataset loader (HF / Azure / local)
+ │   ├── MarketCalibrator      # iPinYou-based auction price simulator
+ │   ├── CTRCalibrator         # MIND-based CTR lookup tables
+ │   ├── MINDCreativePool      # 6-slot headline/creative catalog from news.tsv
+ │   ├── PersonaBank           # Vogue Dialogue persona sampling
+ │   ├── ViralHashtagScraper   # Live hashtag scraping (pytrends + Reddit)
+ │   ├── AdCaptionDataset      # COCO-based ad image+caption pool
+ │   ├── UserSimulator         # Semantic + optional LLM CTR scoring
+ │   ├── EasyHeadlineGrader    # Level 1 grader
+ │   ├── MediumPacingGrader    # Level 2 grader
+ │   ├── HardAssemblyGrader    # Level 3 grader (4-axis composite)
+ │   ├── HardSequencingGrader  # Level 4 grader (DP oracle)
+ │   └── OpenEnvAuctioneer     # Main Gym-style env class
  ├── app.py          # FastAPI server (runs inside Docker)
  ├── inference.py    # Baseline inference script (mandatory format)
  ├── openenv.yaml    # OpenEnv metadata & task definitions
  └── Datasets/       # Optional dataset mount point
  ```
 
+ ## References
+
+ 1. **MIND**: Wu et al. (2020) — *"MIND: A Large-scale Dataset for News Recommendation"*, ACL 2020. [msnews.github.io](https://msnews.github.io/)
+ 2. **iPinYou RTB**: Zhang et al. (2014) — *"Real-Time Bidding Benchmarking with iPinYou Dataset"*. [contest.ipinyou.com](https://contest.ipinyou.com/)
+ 3. **MS-COCO Captions**: Lin et al. (2014) — *"Microsoft COCO: Common Objects in Context"*. [cocodataset.org](https://cocodataset.org/)
+ 4. **SentenceTransformers**: Reimers & Gurevych (2019) — *"Sentence-BERT"*. [sbert.net](https://www.sbert.net/)
+
  ## License

  MIT
app.py CHANGED
@@ -8,7 +8,7 @@ Runs inside the Docker container and exposes HTTP endpoints:
      GET /health → liveness check
  """

- from typing import Optional

  from fastapi import FastAPI, Query
  from pydantic import BaseModel
@@ -16,7 +16,7 @@ from pydantic import BaseModel
  from environment import OpenEnvAuctioneer
  from models import Action

- app = FastAPI(title="OpenEnv Creative Auctioneer", version="0.3.0")

  # ---------------------------------------------------------------------------
  # Global environment instance (one per container)
@@ -33,6 +33,7 @@ class StepRequest(BaseModel):
      headline_id: int
      creative_id: int
      generated_caption: Optional[str] = None


  class ResetResponse(BaseModel):
@@ -75,6 +76,7 @@ def step_env(action: StepRequest):
          headline_id=action.headline_id,
          creative_id=action.creative_id,
          generated_caption=action.generated_caption,
      )
      obs, reward, done, info = _env.step(act)
      return StepResponse(
      GET /health → liveness check
  """

+ from typing import List, Optional

  from fastapi import FastAPI, Query
  from pydantic import BaseModel
 
  from environment import OpenEnvAuctioneer
  from models import Action

+ app = FastAPI(title="OpenEnv Creative Auctioneer", version="0.4.0")

  # ---------------------------------------------------------------------------
  # Global environment instance (one per container)
 
      headline_id: int
      creative_id: int
      generated_caption: Optional[str] = None
+     generated_hashtags: Optional[List[str]] = None


  class ResetResponse(BaseModel):
 
          headline_id=action.headline_id,
          creative_id=action.creative_id,
          generated_caption=action.generated_caption,
+         generated_hashtags=action.generated_hashtags,
      )
      obs, reward, done, info = _env.step(act)
      return StepResponse(
inference.py CHANGED
@@ -73,9 +73,27 @@ SYSTEM_PROMPTS: Dict[str, str] = {
      If budget < $5 before hour 18, bid $0."""),

  "hard_assembly": textwrap.dedent("""\
-     You optimise for VIRAL TREND ALIGNMENT. Generate a short caption
-     (<=12 words) that aligns with the viral trend AND context.
-     Score = 60% cosine-similarity + 40% revenue. Bid $0.60-$1.50."""),

  "hard_sequencing": textwrap.dedent("""\
      You focus on CROSS-CONTEXT CAMPAIGN SEQUENCING.
@@ -184,12 +202,38 @@ def build_user_prompt(task_id: str, obs: dict) -> str:
      ]
      if task_id == "hard_sequencing":
          lines.append(f"Carryover boost: {obs.get('carryover_boost', 0):.2f}")
-     lines.append(CATALOG_CTX)
-     schema = '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>'
      if task_id == "hard_assembly":
-         schema += ', "generated_caption": "<caption>"'
-     schema += "}"
-     lines.append(f"Respond ONLY with JSON: {schema}")
      return "\n".join(lines)

@@ -239,6 +283,7 @@ async def run_task(task_id: str, image_name: str) -> float:
          headline_id=int(action_data.get("headline_id", 0)),
          creative_id=int(action_data.get("creative_id", 0)),
          generated_caption=action_data.get("generated_caption"),
      )

      result = await env.step(action)
@@ -251,6 +296,8 @@ async def run_task(task_id: str, image_name: str) -> float:
      act_str = f"bid({action.bid_price:.2f},h={action.headline_id},c={action.creative_id})"
      if action.generated_caption:
          act_str += f",cap={action.generated_caption[:25]}"

      log_step(step=step, action=act_str, reward=reward,
               done=result.done, error=None)
@@ -302,4 +349,4 @@ async def main() -> None:


  if __name__ == "__main__":
-     asyncio.run(main())
      If budget < $5 before hour 18, bid $0."""),

  "hard_assembly": textwrap.dedent("""\
+     You are an AI Account Manager and Creative Director for the hard_assembly task.
+
+     YOUR JOB each step:
+     1. You receive a SOURCE AD CREATIVE: an image description + a base caption.
+     2. You receive LIVE VIRAL HASHTAGS scraped from Google Trends / Reddit.
+     3. You receive the current VIRAL TREND token (cultural keyword).
+     4. You must ASSEMBLE a final ad by:
+        (a) Selecting 2–4 hashtags from the live list that best match the trend.
+        (b) Rewriting the base caption to weave those hashtags into natural, punchy
+            ad copy — DO NOT just append hashtags at the end. Blend them into prose.
+        (c) Adding your own creative words (target 30–50% new vocabulary).
+        (d) The final caption must stay coherent with the image description.
+
+     GRADER weights (what earns you points):
+        35% — Hashtag relevance: chosen hashtags semantically match viral_trend
+        35% — Caption-trend align: your caption text matches viral_trend vocabulary
+        20% — Image coherence: your caption stays faithful to the image
+        10% — Novelty: you added real creative words, not just copy-paste
+
+     REWARD: auction_base + (composite_score × $8 bonus per winning step)
+     BUDGET: $120 for 24 hours. Bid $0.60–$1.50 per step."""),

  "hard_sequencing": textwrap.dedent("""\
      You focus on CROSS-CONTEXT CAMPAIGN SEQUENCING.
 
      ]
      if task_id == "hard_sequencing":
          lines.append(f"Carryover boost: {obs.get('carryover_boost', 0):.2f}")
+
      if task_id == "hard_assembly":
+         # Show source creative and live hashtags
+         img_desc = obs.get("image_description", "")
+         base_cap = obs.get("base_caption", "")
+         live_tags = obs.get("live_hashtags", [])
+         hashtag_list = " ".join(live_tags) if live_tags else "(none scraped)"
+         lines.append("")
+         lines.append("━━━━━ SOURCE CREATIVE ━━━━━")
+         lines.append(f"Image description : {img_desc}")
+         lines.append(f"Base caption      : {base_cap}")
+         lines.append("")
+         lines.append("━━━━━ LIVE VIRAL HASHTAGS (scraped now) ━━━━━")
+         lines.append(f"  {hashtag_list}")
+         lines.append("")
+         lines.append("━━━━━ TASK ━━━━━")
+         lines.append(f"Select 2–4 hashtags from the list above that best match "
+                      f"the viral trend '{obs['viral_trend']}'.")
+         lines.append("Rewrite the base caption to weave them in naturally.")
+         lines.append("Stay coherent with the image. Add your own creative words.")
+         lines.append("")
+         schema = ('Respond ONLY with JSON:\n'
+                   '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>, '
+                   '"generated_caption": "<your caption>", '
+                   '"generated_hashtags": ["#Tag1", "#Tag2", ...]}')
+     else:
+         lines.append(CATALOG_CTX)
+         schema = ('Respond ONLY with JSON: '
+                   '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>}')
+
+     lines.append(schema)
      return "\n".join(lines)
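The prompt instructs the model to respond with JSON only, but LLM replies often carry stray prose around the object. A defensive parser along these lines (helper name hypothetical, not part of this diff) keeps a malformed reply from crashing the episode by falling back to a zero-bid action:

```python
import json
import re

def parse_action_json(raw: str) -> dict:
    """Extract the first {...} block from an LLM reply and parse it,
    falling back to a safe zero-bid action if parsing fails."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {"bid_price": 0.0, "headline_id": 0, "creative_id": 0}
```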
          headline_id=int(action_data.get("headline_id", 0)),
          creative_id=int(action_data.get("creative_id", 0)),
          generated_caption=action_data.get("generated_caption"),
+         generated_hashtags=action_data.get("generated_hashtags"),
      )

      result = await env.step(action)
 
      act_str = f"bid({action.bid_price:.2f},h={action.headline_id},c={action.creative_id})"
      if action.generated_caption:
          act_str += f",cap={action.generated_caption[:25]}"
+     if action.generated_hashtags:
+         act_str += f",tags={len(action.generated_hashtags)}"

      log_step(step=step, action=act_str, reward=reward,
               done=result.done, error=None)
 


  if __name__ == "__main__":
+     asyncio.run(main())
models.py CHANGED
@@ -4,13 +4,15 @@ models.py — Typed data contracts for the OpenEnv Creative Auctioneer.
  All tensors / vectors are represented as plain Python types so the environment
  stays framework-agnostic (no hard dependency on PyTorch at this layer).

- Dataset provenance (v0.3):
      CTR calibration → MIND (Microsoft News Dataset) behaviours.tsv + news.tsv
      Market engine   → iPinYou Global RTB logs (Lognormal per hour)
      Persona bank    → Vogue Dialogue Dataset
  """

- from typing import Optional
  from pydantic import BaseModel, Field
@@ -42,7 +44,7 @@ class Observation(BaseModel):
      # ── Contextual Signals (Privacy-Native — no user IDs) ──────────────────
      current_context: str = Field(...,
          description="Content category derived from MIND news.tsv taxonomy "
-                     "(e.g. 'sports', 'technology', 'lifestyle', 'entertainment').")
      news_category: str = Field(default="",
          description="Fine-grained MIND subcategory (e.g. 'nfl', 'gadgets'). "
                      "Provides richer signal than coarse context alone.")
@@ -50,6 +52,20 @@ class Observation(BaseModel):
          description="Current cultural viral token surfaced from Reels "
                      "(e.g. 'Quiet Luxury', 'Eco-Friendly', 'Cyberpunk', 'Minimalism').")

      # ── Market Signals ─────────────────────────────────────────────────────
      market_pressure: float = Field(default=0.5, ge=0.0, le=1.0,
          description="Normalised indicator of how competitive the auction is "
@@ -60,10 +76,10 @@ class Observation(BaseModel):
          description="Number of ads already shown; drives the fatigue penalty.")
      fatigue_level: float = Field(default=0.0, ge=0.0, le=1.0,
          description="Accumulated user-fatigue penalty (0 = fresh, 1 = fully fatigued).")

      # ── Performance Feedback (delayed by 1 step) ───────────────────────────
-     carryover_boost: float = Field(default=0.0, ge=0.0, le=0.30,
-         description="CTR boost from brand-recall carry-over (decaying from prior ad wins).")
      last_ctr: float = Field(default=0.0, ge=0.0, le=1.0,
          description="CTR returned by the User Simulator on the previous step.")
      cumulative_revenue: float = Field(default=0.0,
@@ -85,9 +101,18 @@ class Action(BaseModel):
      creative_id: int = Field(..., ge=0, le=5,
          description="Index into the Creatives Catalog (0–5).")

-     # Optional free-text caption (retained for backwards compatibility). No effect in hard_sequencing.
      generated_caption: Optional[str] = Field(default=None,
-         description="Free-form caption generated by the agent in old hard_assembly mode.")


  # ---------------------------------------------------------------------------
@@ -111,16 +136,32 @@ class Info(BaseModel):
      task_score: float = Field(..., ge=0.0, le=1.0,
          description="Final 0.0–1.0 task-completion score.")

-     # Level-specific sub-scores (populated per task_id)
      headline_alignment_score: float = Field(default=0.0, ge=0.0, le=1.0,
          description="[easy_headline] CTR_selected / CTR_best for this context.")
      pacing_score: float = Field(default=0.0, ge=0.0, le=1.0,
          description="[medium_pacing] Budget-smoothness and peak-hour survival bonus.")
      clip_similarity_score: float = Field(default=0.0, ge=0.0, le=1.0,
-         description="[hard_assembly] Cosine similarity between caption and viral token.")
      sequencing_score: float = Field(default=0.0, ge=0.0, le=1.0,
-         description="[hard_sequencing] Agent conversions / Oracle conversions × diversity bonus.")
-     contexts_covered: int = Field(default=0, ge=0, le=4,
-         description="[hard_sequencing] Number of distinct contexts that received ≥1 ad.")
      diversity_multiplier: float = Field(default=1.0,
-         description="[hard_sequencing] 1.2 if ≥3 contexts covered, else 1.0.")
  All tensors / vectors are represented as plain Python types so the environment
  stays framework-agnostic (no hard dependency on PyTorch at this layer).

+ Dataset provenance (v0.4):
      CTR calibration → MIND (Microsoft News Dataset) behaviours.tsv + news.tsv
      Market engine   → iPinYou Global RTB logs (Lognormal per hour)
      Persona bank    → Vogue Dialogue Dataset
+     Ad+Caption pool → MS-COCO Captions OR Google Conceptual Captions CC3M
+     Viral hashtags  → Pytrends / Hashtagify / static fallback table
  """

+ from typing import List, Optional
  from pydantic import BaseModel, Field

      # ── Contextual Signals (Privacy-Native — no user IDs) ──────────────────
      current_context: str = Field(...,
          description="Content category derived from MIND news.tsv taxonomy "
+                     "(e.g. 'Fitness', 'Tech', 'Fashion', 'Gaming').")
      news_category: str = Field(default="",
          description="Fine-grained MIND subcategory (e.g. 'nfl', 'gadgets'). "
                      "Provides richer signal than coarse context alone.")
 
          description="Current cultural viral token surfaced from Reels "
                      "(e.g. 'Quiet Luxury', 'Eco-Friendly', 'Cyberpunk', 'Minimalism').")

+     # ── hard_assembly: live scraped hashtags + source creative ─────────────
+     live_hashtags: List[str] = Field(default_factory=list,
+         description="[hard_assembly] Real-time scraped viral hashtags from "
+                     "Google Trends / Reddit. The agent selects which to use "
+                     "and weaves them into generated_caption. "
+                     "Example: ['#QuietLuxury', '#OOTD', '#SlowFashion'].")
+     image_description: str = Field(default="",
+         description="[hard_assembly] Text description of the source ad image "
+                     "from AdCaptionDataset (COCO or seed pool). "
+                     "Agent caption must stay coherent with this.")
+     base_caption: str = Field(default="",
+         description="[hard_assembly] Base caption from AdCaptionDataset. "
+                     "Agent rewrites this to incorporate viral hashtags.")
+
      # ── Market Signals ─────────────────────────────────────────────────────
      market_pressure: float = Field(default=0.5, ge=0.0, le=1.0,
          description="Normalised indicator of how competitive the auction is "
  description="Normalised indicator of how competitive the auction is "
 
          description="Number of ads already shown; drives the fatigue penalty.")
      fatigue_level: float = Field(default=0.0, ge=0.0, le=1.0,
          description="Accumulated user-fatigue penalty (0 = fresh, 1 = fully fatigued).")
+     carryover_boost: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_sequencing] Carry-over CTR boost from winning prior auctions.")

      # ── Performance Feedback (delayed by 1 step) ───────────────────────────
      last_ctr: float = Field(default=0.0, ge=0.0, le=1.0,
          description="CTR returned by the User Simulator on the previous step.")
      cumulative_revenue: float = Field(default=0.0,
 
      creative_id: int = Field(..., ge=0, le=5,
          description="Index into the Creatives Catalog (0–5).")

+     # ── hard_assembly fields ────────────────────────────────────────────────
      generated_caption: Optional[str] = Field(default=None,
+         description="[hard_assembly] Final assembled caption — should incorporate "
+                     "viral hashtags and remain coherent with the source image. "
+                     "Leave None for easy/medium tasks.")
+
+     generated_hashtags: Optional[List[str]] = Field(default=None,
+         description="[hard_assembly] List of hashtag strings (with #) that the agent "
+                     "chose to include, selected from the scraped live_hashtags "
+                     "and woven into generated_caption. "
+                     "Example: ['#QuietLuxury', '#OOTD', '#SlowFashion']. "
+                     "Leave None for easy/medium/sequencing tasks.")


  # ---------------------------------------------------------------------------
 
      task_score: float = Field(..., ge=0.0, le=1.0,
          description="Final 0.0–1.0 task-completion score.")

+     # Level 1 sub-score
      headline_alignment_score: float = Field(default=0.0, ge=0.0, le=1.0,
          description="[easy_headline] CTR_selected / CTR_best for this context.")
+
+     # Level 2 sub-score
      pacing_score: float = Field(default=0.0, ge=0.0, le=1.0,
          description="[medium_pacing] Budget-smoothness and peak-hour survival bonus.")
+
+     # Level 3 sub-scores (all four axes)
      clip_similarity_score: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_assembly] Composite grader score "
+                     "(0.35×hashtag + 0.35×align + 0.20×coherence + 0.10×novelty).")
+     hashtag_relevance_score: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_assembly] Mean cosine_sim(chosen_hashtag, viral_trend).")
+     caption_trend_alignment: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_assembly] cosine_sim(final_caption, viral_trend).")
+     caption_image_coherence: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_assembly] cosine_sim(final_caption, image_description).")
+     chosen_hashtags: List[str] = Field(default_factory=list,
+         description="[hard_assembly] Hashtags the agent chose this step.")
+     assembly_reward_bonus: float = Field(default=0.0,
+         description="[hard_assembly] Extra reward granted for viral alignment quality.")
+
+     # Level 4 sub-scores
      sequencing_score: float = Field(default=0.0, ge=0.0, le=1.0,
+         description="[hard_sequencing] agent_conversions / oracle_conversions × diversity.")
+     contexts_covered: int = Field(default=0,
+         description="[hard_sequencing] Number of distinct contexts won at least once.")
      diversity_multiplier: float = Field(default=1.0,
+         description="[hard_sequencing] Bonus multiplier for covering ≥3 contexts.")
requirements.txt CHANGED
@@ -9,18 +9,28 @@ openai>=1.0.0
  sentence-transformers>=2.2.2
  torch>=2.0.0

- # MIND dataset — Option A (HuggingFace streaming, no local disk)
- # Standard library urllib is used for direct TSV download (no extra deps needed).
- # Uncomment below ONLY if you want the full HuggingFace datasets library instead:
  # datasets>=2.18.0

  # Optional: LLM-based User Simulator (activate with USE_LLM_SIMULATOR=1)
- # Uncomment if running on GPU with ≥16 GB VRAM
  # transformers>=4.40.0
  # bitsandbytes>=0.43.0
  # accelerate>=0.29.0

- # Optional: true CLIP scoring for hard_assembly grader
  # open-clip-torch>=2.24.0

  # Serving / inference client
 
  sentence-transformers>=2.2.2
  torch>=2.0.0

+ # ── hard_assembly: ViralHashtagScraper ──────────────────────────────────────
+ # Source 1: Google Trends (free, no API key)
+ pytrends>=4.9.2
+ # Source 2: Reddit public REST API — uses stdlib urllib (no extra dep needed)
+
+ # ── hard_assembly: AdCaptionDataset ─────────────────────────────────────────
+ # MS-COCO Captions 2017 val annotations (~241 MB)
+ # Set COCO_SOURCE=url to auto-download on first run (uses stdlib urllib + zipfile)
+ # Set COCO_SOURCE=local (default) and place captions_val2017.json at Datasets/coco_captions/
+
+ # ── MIND dataset — Option A: HuggingFace (zero local disk) ──────────────────
+ # Uses stdlib urllib for direct TSV download (no extra deps needed)
+ # Uncomment below if you prefer the full HuggingFace datasets library:
  # datasets>=2.18.0

  # Optional: LLM-based User Simulator (activate with USE_LLM_SIMULATOR=1)
+ # Requires GPU with ≥16 GB VRAM
  # transformers>=4.40.0
  # bitsandbytes>=0.43.0
  # accelerate>=0.29.0

+ # Optional: true CLIP image+text scoring for hard_assembly grader
  # open-clip-torch>=2.24.0

  # Serving / inference client