Spaces: Sleeping

Ikshitha Janarthanan committed
Commit · 3e3afc7
Parent(s): b1f8065
feat: task 3 enhanced

Files changed:
- README.md +155 -5
- app.py +4 -2
- inference.py +56 -9
- models.py +54 -13
- requirements.txt +15 -5
README.md
CHANGED

@@ -28,6 +28,8 @@ with a fully open, dataset-calibrated simulation grounded in:
 | [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog |
 | [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (Lognormal/hour) |
 | [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank |
+| [MS-COCO Captions 2017](https://cocodataset.org/) | Ad + caption pool for `hard_assembly` |
+| [Google Trends](https://github.com/GeneralMills/pytrends) / [Reddit](https://www.reddit.com/) | Live viral hashtag scraping |

 All datasets are **optional** — the environment falls back to published
 statistics so it runs out-of-the-box with zero downloads.

@@ -41,7 +43,8 @@ class Action(BaseModel):
     bid_price: float     # USD bid for the RTB auction (≥ 0)
     headline_id: int     # Index into the 6-slot headlines catalog (0–5)
     creative_id: int     # Index into the 6-slot creatives catalog (0–5)
-    generated_caption: str | None
+    generated_caption: str | None          # [hard_assembly] Rewritten caption with viral hashtags
+    generated_hashtags: list[str] | None   # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
 ```

 ## Observation Space

@@ -60,6 +63,11 @@ class Observation(BaseModel):
     carryover_boost: float     # Brand-recall CTR boost [0, 0.30]
     last_ctr: float            # Previous step CTR
     cumulative_revenue: float  # Total revenue earned
+
+    # hard_assembly only:
+    live_hashtags: list[str]   # Real-time scraped viral hashtags
+    image_description: str     # Source ad image description
+    base_caption: str          # Base caption to rewrite
 ```

 ## Reward Signal

@@ -69,6 +77,7 @@ class Observation(BaseModel):
 | Auction **won** | `adjusted_ctr × $15 − clearing_price` |
 | Auction **lost** | `−$0.10` (missed opportunity) |
 | Over-pacing (medium only) | `−$1.00` penalty |
+| Assembly bonus (hard_assembly) | `+composite_score × $8.00` |

 Rewards are **per-step** (not sparse), providing continuous gradient signal.
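The reward table above can be sketched as a single per-step function. This is a minimal sketch: the constant names and the `over_paced` flag are illustrative, not the environment's actual API.

```python
def step_reward(won: bool, adjusted_ctr: float = 0.0,
                clearing_price: float = 0.0, over_paced: bool = False) -> float:
    """Per-step reward following the table above."""
    CONVERSION_VALUE = 15.00  # $ value of one expected conversion
    LOSS_PENALTY = -0.10      # missed opportunity on a lost auction
    PACING_PENALTY = -1.00    # over-pacing penalty (medium_pacing only)
    reward = adjusted_ctr * CONVERSION_VALUE - clearing_price if won else LOSS_PENALTY
    if over_paced:
        reward += PACING_PENALTY
    return reward
```

Because the win branch is a margin (expected value minus clearing price), a won auction can still yield a negative reward when the agent overbids.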
@@ -84,9 +93,33 @@ Rewards are **per-step** (not sparse), providing continuous gradient signal.
 **Objective:** Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22).
 **Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue` | **Target:** 0.70

-### Level 3 — `hard_assembly` (Hard)
-**Objective:**
-**
+### Level 3 — `hard_assembly` (Hard) 🔥
+**Objective:** Given an ad image description + base caption + live viral hashtags,
+**generate a new caption** that is simultaneously viral, coherent with the image,
+and creatively novel — while also winning auctions profitably.
+
+**Budget:** $120 | **Target:** 0.65
+
+**The RL loop (what the LLM agent does each step):**
+```
+1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
+2. Agent must:
+   a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
+   b. Rewrite the base caption to weave those hashtags into natural ad copy
+   c. Add its own creative words (target 30–50% novel vocabulary)
+   d. Keep the caption coherent with the source image
+   e. Set a profitable bid price
+3. Grader scores the assembled caption on 4 axes:
+   • 35% – Hashtag relevance (cosine_sim of each hashtag vs viral_trend)
+   • 35% – Caption-trend alignment (cosine_sim of caption vs viral_trend)
+   • 20% – Caption-image coherence (cosine_sim of caption vs image_description)
+   • 10% – Novelty (fraction of new words vs base_caption, target ~40%)
+4. Reward = auction_reward + composite_score × $8.00 bonus
+```
+
+**Data sources for hard_assembly:**
+- **Ad creatives**: MS-COCO Captions 2017 (val annotations) bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to a 30-entry built-in seed pool.
+- **Viral hashtags**: `ViralHashtagScraper` queries Google Trends (via `pytrends`) and Reddit `/r/popular/hot.json` (public, no auth). Blends with static seed hashtags per context and trend. Cached for 1 hour.
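The 30–50% novelty target described above can be measured with a plain word-set fraction. This is a sketch; the environment's actual tokenisation and novelty metric may differ.

```python
def novel_fraction(base_caption: str, new_caption: str) -> float:
    """Fraction of words in the rewritten caption absent from the base caption."""
    base_words = set(base_caption.lower().split())
    new_words = new_caption.lower().split()
    if not new_words:
        return 0.0
    return sum(1 for w in new_words if w not in base_words) / len(new_words)
```

For example, rewriting "a red dress" as "a red dress shines #ootd" introduces 2 novel words out of 5, a fraction of 0.4, which sits exactly on the grader's target.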
 ### Level 4 — `hard_sequencing` (Hard)
 **Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts.

@@ -96,6 +129,102 @@ a 20% diversity bonus.

 ---

+## Grading Details
+
+### `EasyHeadlineGrader`
+```
+step_score  = CTR_selected / CTR_oracle
+final_score = mean(step_scores)   # in [0.0, 1.0]
+```
+
+### `MediumPacingGrader`
+```
+smoothness     = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
+peak_survival  = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0
+revenue_factor = min(1.0, total_revenue / $30)
+
+final_score = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
+```
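A minimal sketch of this scoring formula, assuming smoothness is averaged over the 18 pre-peak hours and a $50 budget; the function and parameter names are illustrative, not the repo's `MediumPacingGrader` API.

```python
def pacing_score(hourly_spend: list[float], budget: float = 50.0,
                 revenue: float = 0.0, revenue_cap: float = 30.0) -> float:
    """0.30 x smoothness + 0.30 x peak_survival + 0.40 x revenue_factor."""
    ideal = budget / 24.0
    pre_peak = hourly_spend[:18]
    smoothness = max(0.0, 1.0 - sum(abs(s - ideal) / ideal for s in pre_peak) / len(pre_peak))
    remaining = budget - sum(pre_peak)
    peak_survival = 1.0 if remaining >= 0.20 * budget else 0.0  # >= 20% left at hour 18
    revenue_factor = min(1.0, revenue / revenue_cap)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor
```

Spending exactly the ideal hourly amount automatically leaves 12.5 of the $50 at hour 18 (more than the required $10), so perfect pacing satisfies peak survival for free.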
+### `HardAssemblyGrader` — 4-Axis Composite
+
+| Axis | Weight | Metric |
+|------|--------|--------|
+| Hashtag Relevance | 0.35 | `mean(cosine_sim(hashtag, viral_trend))` |
+| Caption-Trend Alignment | 0.35 | `cosine_sim(caption, viral_trend)` |
+| Caption-Image Coherence | 0.20 | `cosine_sim(caption, image_description)` |
+| Novelty | 0.10 | `1 − |novel_fraction − 0.40| / 0.60` |
+
+```
+composite = Σ (weight × axis_score)
+
+final_score = 0.60 × mean(composite_scores)
+            + 0.40 × min(1.0, total_revenue / $55)
+```
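The composite can be sketched end-to-end. Note that the real grader computes `cosine_sim` on `all-MiniLM-L6-v2` embeddings; this sketch substitutes a toy bag-of-words cosine so it runs with no model download, and all names are illustrative.

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Toy bag-of-words cosine, a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def composite_score(hashtags, caption, viral_trend, image_desc, novel_frac):
    """0.35 hashtag relevance + 0.35 trend alignment + 0.20 coherence + 0.10 novelty."""
    hashtag_rel = (sum(cosine_sim(h.lstrip("#"), viral_trend) for h in hashtags)
                   / max(len(hashtags), 1))
    novelty = 1.0 - abs(novel_frac - 0.40) / 0.60
    return (0.35 * hashtag_rel
            + 0.35 * cosine_sim(caption, viral_trend)
            + 0.20 * cosine_sim(caption, image_desc)
            + 0.10 * novelty)
```

With no hashtags and a caption unrelated to both trend and image, only the novelty axis can contribute, capping the composite at 0.10.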
| 164 |
+
|
| 165 |
+
### `HardSequencingGrader`
|
| 166 |
+
```
|
| 167 |
+
agent_conversions = Ξ£ [CTR_t Γ (1 + carryover_boost_t) Γ $15]
|
| 168 |
+
oracle_conversions = DP-optimal bid/skip sequence with carry-over
|
| 169 |
+
|
| 170 |
+
diversity_mult = 1.20 if β₯3 distinct contexts won, else 1.0
|
| 171 |
+
|
| 172 |
+
final_score = min(1.0, agent_conv / oracle_conv Γ diversity_mult)
|
| 173 |
+
```
|
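The DP oracle can be sketched as a memoised search over (hour, boost) states. The README does not specify the oracle's transition model, so this sketch assumes a win adds a fixed boost (capped at 0.30), a skip halves it, and auctions are deterministic for the oracle; every name and constant here is an assumption.

```python
from functools import lru_cache

def oracle_conversions(ctrs, prices, value=15.0, gain=0.10, decay=0.5, cap=0.30):
    """Best achievable value via DP over (hour, discretised carry-over boost)."""
    n = len(ctrs)

    @lru_cache(maxsize=None)
    def best(t: int, boost: int) -> float:  # boost stored in hundredths
        if t == n:
            return 0.0
        skip = best(t + 1, int(boost * decay))  # boost decays when idle
        win = ctrs[t] * (1 + boost / 100.0) * value - prices[t]
        bid = win + best(t + 1, min(int(cap * 100), boost + int(gain * 100)))
        return max(skip, bid)

    return best(0, 0)
```

The carry-over term is what makes greedy bidding suboptimal: an early win raises the value of later hours, so the oracle sometimes bids on marginal slots purely to build boost.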
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ OpenEnvAuctioneer (Gym-style environment)                   │
+│                                                             │
+│ ┌──────────────────┐  ┌───────────────────────────────┐     │
+│ │ Market Engine    │  │ User Simulator                │     │
+│ │ (Statistical)    │  │ (Semantic / LLM)              │     │
+│ │                  │  │                               │     │
+│ │ iPinYou RTB logs │  │ SentenceTransformer           │     │
+│ │ → Lognormal per  │  │ all-MiniLM-L6-v2              │     │
+│ │   hour bucket    │  │ + optional Llama-3-8B         │     │
+│ └──────────────────┘  └───────────────────────────────┘     │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ MIND Dataset Layer (Microsoft News Dataset)             │ │
+│ │ behaviours.tsv → CTRCalibrator                          │ │
+│ │ news.tsv       → MINDCreativePool (headlines)           │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ Ad + Caption Dataset (MS-COCO Captions 2017)            │ │
+│ │ → image_description + base_caption per step             │ │
+│ │ → ViralHashtagScraper (pytrends + Reddit + seeds)       │ │
+│ │ → agent rewrites caption with viral hashtags            │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ Grader (task-specific, deterministic 0.0–1.0)           │ │
+│ │ Level 1: easy_headline   → headline CTR lookup          │ │
+│ │ Level 2: medium_pacing   → pacing + survival            │ │
+│ │ Level 3: hard_assembly   → 4-axis composite score       │ │
+│ │ Level 4: hard_sequencing → DP oracle comparison         │ │
+│ └─────────────────────────────────────────────────────────┘ │
+└─────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Models
+
+| Model | Role | Always Active? |
+|-------|------|----------------|
+| `all-MiniLM-L6-v2` (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | ✅ Yes |
+| `Meta-Llama-3-8B-Instruct` (4-bit) | Richer LLM-based CTR scoring | ❌ Optional (`USE_LLM_SIMULATOR=1`) |
+
+When the LLM simulator is active: `final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr`
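A one-line sketch of that blend; the `use_llm` flag stands in for the `USE_LLM_SIMULATOR` switch and the names are illustrative.

```python
def blended_ctr(llm_ctr: float, semantic_ctr: float, use_llm: bool = True) -> float:
    """final_ctr = 0.60 * llm_ctr + 0.40 * semantic_ctr when the simulator is on."""
    return 0.60 * llm_ctr + 0.40 * semantic_ctr if use_llm else semantic_ctr
```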
+
+---
+
 ## Setup & Usage

 ### Prerequisites

@@ -147,6 +276,7 @@ The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout
 | `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
 | `AUCTIONEER_TASK` | No | Task to run (default: `all`) |
 | `MIND_SOURCE` | No | `local` / `huggingface` / `azure` |
+| `COCO_SOURCE` | No | `local` / `url` (auto-download COCO annotations) |
 | `USE_LLM_SIMULATOR` | No | Set `1` to enable Llama-3 User Simulator |

 ---

@@ -157,7 +287,7 @@ The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout
 |------|---------------|-------|
 | `easy_headline` | 0.55 – 0.80 | Context–headline matching is learnable |
 | `medium_pacing` | 0.45 – 0.70 | Requires budget discipline |
-| `hard_assembly` | 0.40 – 0.65 | Caption quality + auction wins |
+| `hard_assembly` | 0.40 – 0.65 | Caption quality + hashtag matching + auction wins |
 | `hard_sequencing` | 0.35 – 0.60 | Compared against DP oracle |

 Scores depend on LLM quality and market stochasticity. Run multiple episodes
 for stable estimates.

@@ -170,6 +300,19 @@ for stable estimates.
 ```
 ├── models.py        # Pydantic models: Action, Observation, Reward, Info
 ├── environment.py   # OpenEnvAuctioneer + graders + dataset layers
+│   ├── MINDLoader           # MIND dataset loader (HF / Azure / local)
+│   ├── MarketCalibrator     # iPinYou-based auction price simulator
+│   ├── CTRCalibrator        # MIND-based CTR lookup tables
+│   ├── MINDCreativePool     # 6-slot headline/creative catalog from news.tsv
+│   ├── PersonaBank          # Vogue Dialogue persona sampling
+│   ├── ViralHashtagScraper  # Live hashtag scraping (pytrends + Reddit)
+│   ├── AdCaptionDataset     # COCO-based ad image+caption pool
+│   ├── UserSimulator        # Semantic + optional LLM CTR scoring
+│   ├── EasyHeadlineGrader   # Level 1 grader
+│   ├── MediumPacingGrader   # Level 2 grader
+│   ├── HardAssemblyGrader   # Level 3 grader (4-axis composite)
+│   ├── HardSequencingGrader # Level 4 grader (DP oracle)
+│   └── OpenEnvAuctioneer    # Main Gym-style env class
 ├── app.py           # FastAPI server (runs inside Docker)
 ├── inference.py     # Baseline inference script (mandatory format)
 ├── openenv.yaml     # OpenEnv metadata & task definitions

@@ -179,6 +322,13 @@ for stable estimates.
 └── Datasets/        # Optional dataset mount point
 ```

+## References
+
+1. **MIND**: Wu et al. (2020) — *"MIND: A Large-scale Dataset for News Recommendation"*, ACL 2020. [msnews.github.io](https://msnews.github.io/)
+2. **iPinYou RTB**: Zhang et al. (2014) — *"Real-Time Bidding Benchmarking with iPinYou Dataset"*. [contest.ipinyou.com](https://contest.ipinyou.com/)
+3. **MS-COCO Captions**: Lin et al. (2014) — *"Microsoft COCO: Common Objects in Context"*. [cocodataset.org](https://cocodataset.org/)
+4. **SentenceTransformers**: Reimers & Gurevych (2019) — *"Sentence-BERT"*. [sbert.net](https://www.sbert.net/)
+
 ## License

 MIT
app.py
CHANGED

@@ -8,7 +8,7 @@ Runs inside the Docker container and exposes HTTP endpoints:
     GET  /health → liveness check
 """

-from typing import Optional
+from typing import List, Optional

 from fastapi import FastAPI, Query
 from pydantic import BaseModel

@@ -16,7 +16,7 @@ from pydantic import BaseModel
 from environment import OpenEnvAuctioneer
 from models import Action

-app = FastAPI(title="OpenEnv Creative Auctioneer", version="0.
+app = FastAPI(title="OpenEnv Creative Auctioneer", version="0.4.0")

 # ---------------------------------------------------------------------------
 # Global environment instance (one per container)

@@ -33,6 +33,7 @@ class StepRequest(BaseModel):
     headline_id: int
     creative_id: int
     generated_caption: Optional[str] = None
+    generated_hashtags: Optional[List[str]] = None


 class ResetResponse(BaseModel):

@@ -75,6 +76,7 @@ def step_env(action: StepRequest):
         headline_id=action.headline_id,
         creative_id=action.creative_id,
         generated_caption=action.generated_caption,
+        generated_hashtags=action.generated_hashtags,
     )
     obs, reward, done, info = _env.step(act)
     return StepResponse(
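A minimal client-side payload for `POST /step`, mirroring the `StepRequest` fields above; the caption and hashtag values are purely illustrative.

```python
import json

# Example body for POST /step, mirroring the StepRequest model above.
payload = {
    "bid_price": 0.85,
    "headline_id": 2,
    "creative_id": 4,
    "generated_caption": "Quiet luxury, loud results: #QuietLuxury meets #OOTD.",
    "generated_hashtags": ["#QuietLuxury", "#OOTD"],
}
body = json.dumps(payload)
```

For easy and medium tasks the two `generated_*` fields can simply be omitted, since both default to `None`.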
inference.py
CHANGED

@@ -73,9 +73,27 @@ SYSTEM_PROMPTS: Dict[str, str] = {
     If budget < $5 before hour 18, bid $0."""),

     "hard_assembly": textwrap.dedent("""\
-        You
+        You are an AI Account Manager and Creative Director for the hard_assembly task.
+
+        YOUR JOB each step:
+        1. You receive a SOURCE AD CREATIVE: an image description + a base caption.
+        2. You receive LIVE VIRAL HASHTAGS scraped from Google Trends / Reddit.
+        3. You receive the current VIRAL TREND token (cultural keyword).
+        4. You must ASSEMBLE a final ad by:
+           (a) Selecting 2–4 hashtags from the live list that best match the trend.
+           (b) Rewriting the base caption to weave those hashtags into natural, punchy
+               ad copy — DO NOT just append hashtags at the end. Blend them into prose.
+           (c) Adding your own creative words (target 30–50% new vocabulary).
+           (d) The final caption must stay coherent with the image description.
+
+        GRADER weights (what earns you points):
+          35% – Hashtag relevance: chosen hashtags semantically match viral_trend
+          35% – Caption-trend align: your caption text matches viral_trend vocabulary
+          20% – Image coherence: your caption stays faithful to the image
+          10% – Novelty: you added real creative words, not just copy-paste
+
+        REWARD: auction_base + (composite_score × $8 bonus per winning step)
+        BUDGET: $120 for 24 hours. Bid $0.60–$1.50 per step."""),

     "hard_sequencing": textwrap.dedent("""\
         You focus on CROSS-CONTEXT CAMPAIGN SEQUENCING.

@@ -184,12 +202,38 @@ def build_user_prompt(task_id: str, obs: dict) -> str:
     ]
     if task_id == "hard_sequencing":
         lines.append(f"Carryover boost: {obs.get('carryover_boost', 0):.2f}")
-
-    schema = '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>'
+
     if task_id == "hard_assembly":
+        # Show source creative and live hashtags
+        img_desc = obs.get("image_description", "")
+        base_cap = obs.get("base_caption", "")
+        live_tags = obs.get("live_hashtags", [])
+        hashtag_list = " ".join(live_tags) if live_tags else "(none scraped)"
+        lines.append("")
+        lines.append("───── SOURCE CREATIVE ─────")
+        lines.append(f"Image description : {img_desc}")
+        lines.append(f"Base caption      : {base_cap}")
+        lines.append("")
+        lines.append("───── LIVE VIRAL HASHTAGS (scraped now) ─────")
+        lines.append(f"  {hashtag_list}")
+        lines.append("")
+        lines.append("───── TASK ─────")
+        lines.append(f"Select 2–4 hashtags from the list above that best match "
+                     f"the viral trend '{obs['viral_trend']}'.")
+        lines.append("Rewrite the base caption to weave them in naturally.")
+        lines.append("Stay coherent with the image. Add your own creative words.")
+        lines.append("")
+        schema = ('Respond ONLY with JSON:\n'
+                  '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>, '
+                  '"generated_caption": "<your caption>", '
+                  '"generated_hashtags": ["#Tag1", "#Tag2", ...]}')
+    else:
+        lines.append(CATALOG_CTX)
+        schema = '{"bid_price": <float>, "headline_id": <int 0-5>, "creative_id": <int 0-5>}'
+    if task_id != "hard_assembly":
+        schema = f"Respond ONLY with JSON: {schema}"
+
+    lines.append(schema)
     return "\n".join(lines)

@@ -239,6 +283,7 @@ async def run_task(task_id: str, image_name: str) -> float:
             headline_id=int(action_data.get("headline_id", 0)),
             creative_id=int(action_data.get("creative_id", 0)),
             generated_caption=action_data.get("generated_caption"),
+            generated_hashtags=action_data.get("generated_hashtags"),
         )

         result = await env.step(action)

@@ -251,6 +296,8 @@ async def run_task(task_id: str, image_name: str) -> float:
         act_str = f"bid({action.bid_price:.2f},h={action.headline_id},c={action.creative_id})"
         if action.generated_caption:
             act_str += f",cap={action.generated_caption[:25]}"
+        if action.generated_hashtags:
+            act_str += f",tags={len(action.generated_hashtags)}"

         log_step(step=step, action=act_str, reward=reward,
                  done=result.done, error=None)

@@ -302,4 +349,4 @@ async def main() -> None:

 if __name__ == "__main__":
     asyncio.run(main())
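The diff does not show how `inference.py` turns the model's raw reply into `action_data`; a common regex-based sketch looks like this (illustrative, not the repo's actual parser).

```python
import json
import re

def parse_action(reply: str) -> dict:
    """Extract the first JSON object from an LLM reply; fall back to a no-bid."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {"bid_price": 0.0, "headline_id": 0, "creative_id": 0}
```

The fallback matters for the `[STEP]` log format: a malformed reply still yields a valid zero-bid action instead of crashing the episode.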
models.py
CHANGED
|
@@ -4,13 +4,15 @@ models.py β Typed data contracts for the OpenEnv Creative Auctioneer.
|
|
| 4 |
All tensors / vectors are represented as plain Python types so the environment
|
| 5 |
stays framework-agnostic (no hard dependency on PyTorch at this layer).
|
| 6 |
|
| 7 |
-
Dataset provenance (v0.
|
| 8 |
CTR calibration β MIND (Microsoft News Dataset) behaviours.tsv + news.tsv
|
| 9 |
Market engine β iPinYou Global RTB logs (Lognormal per hour)
|
| 10 |
Persona bank β Vogue Dialogue Dataset
|
|
|
|
|
|
|
| 11 |
"""
|
| 12 |
|
| 13 |
-
from typing import Optional
|
| 14 |
from pydantic import BaseModel, Field
|
| 15 |
|
| 16 |
|
|
@@ -42,7 +44,7 @@ class Observation(BaseModel):
|
|
| 42 |
# ββ Contextual Signals (Privacy-Native β no user IDs) ββββββββββββββββββ
|
| 43 |
current_context: str = Field(...,
|
| 44 |
description="Content category derived from MIND news.tsv taxonomy "
|
| 45 |
-
"(e.g. '
|
| 46 |
news_category: str = Field(default="",
|
| 47 |
description="Fine-grained MIND subcategory (e.g. 'nfl', 'gadgets'). "
|
| 48 |
"Provides richer signal than coarse context alone.")
|
|
@@ -50,6 +52,20 @@ class Observation(BaseModel):
|
|
| 50 |
description="Current cultural viral token surfaced from Reels "
|
| 51 |
"(e.g. 'Quiet Luxury', 'Eco-Friendly', 'Cyberpunk', 'Minimalism').")
|
| 52 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 53 |
# ββ Market Signals βββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 54 |
market_pressure: float = Field(default=0.5, ge=0.0, le=1.0,
|
| 55 |
description="Normalised indicator of how competitive the auction is "
|
|
@@ -60,10 +76,10 @@ class Observation(BaseModel):
|
|
| 60 |
description="Number of ads already shown; drives the fatigue penalty.")
|
| 61 |
fatigue_level: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 62 |
description="Accumulated user-fatigue penalty (0 = fresh, 1 = fully fatigued).")
|
|
|
|
|
|
|
| 63 |
|
| 64 |
# ββ Performance Feedback (delayed by 1 step) βββββββββββββββββββββββββββ
|
| 65 |
-
carryover_boost: float = Field(default=0.0, ge=0.0, le=0.30,
|
| 66 |
-
description="CTR boost from brand-recall carry-over (decaying from prior ad wins).")
|
| 67 |
last_ctr: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 68 |
description="CTR returned by the User Simulator on the previous step.")
|
| 69 |
cumulative_revenue: float = Field(default=0.0,
|
|
@@ -85,9 +101,18 @@ class Action(BaseModel):
|
|
| 85 |
creative_id: int = Field(..., ge=0, le=5,
|
| 86 |
description="Index into the Creatives Catalog (0β5).")
|
| 87 |
|
| 88 |
-
#
|
| 89 |
generated_caption: Optional[str] = Field(default=None,
|
| 90 |
-
description="
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 91 |
|
| 92 |
|
| 93 |
# ---------------------------------------------------------------------------
|
|
@@ -111,16 +136,32 @@ class Info(BaseModel):
|
|
| 111 |
task_score: float = Field(..., ge=0.0, le=1.0,
|
| 112 |
description="Final 0.0β1.0 task-completion score.")
|
| 113 |
|
| 114 |
-
# Level
|
| 115 |
headline_alignment_score: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 116 |
description="[easy_headline] CTR_selected / CTR_best for this context.")
|
|
|
|
|
|
|
| 117 |
pacing_score: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 118 |
description="[medium_pacing] Budget-smoothness and peak-hour survival bonus.")
|
|
|
|
|
|
|
| 119 |
clip_similarity_score: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 120 |
-
description="[hard_assembly]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 121 |
sequencing_score: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 122 |
-
description="[hard_sequencing]
|
| 123 |
-
contexts_covered: int = Field(default=0,
|
| 124 |
-
description="[hard_sequencing] Number of distinct contexts
|
| 125 |
diversity_multiplier: float = Field(default=1.0,
|
| 126 |
-
description="[hard_sequencing]
|
|
|
|
| 4 |
All tensors / vectors are represented as plain Python types so the environment
|
| 5 |
stays framework-agnostic (no hard dependency on PyTorch at this layer).
|
| 6 |
|
| 7 |
+
Dataset provenance (v0.4):
|
| 8 |
CTR calibration β MIND (Microsoft News Dataset) behaviours.tsv + news.tsv
|
| 9 |
Market engine β iPinYou Global RTB logs (Lognormal per hour)
|
| 10 |
Persona bank β Vogue Dialogue Dataset
|
| 11 |
+
Ad+Caption pool β MS-COCO Captions OR Google Conceptual Captions CC3M
|
| 12 |
+
Viral hashtags β Pytrends / Hashtagify / static fallback table
|
| 13 |
"""
|
| 14 |
|
| 15 |
+
from typing import List, Optional
|
| 16 |
from pydantic import BaseModel, Field
|
| 17 |
|
| 18 |
|
|
|
|
| 44 |
# ββ Contextual Signals (Privacy-Native β no user IDs) ββββββββββββββββββ
|
| 45 |
current_context: str = Field(...,
|
| 46 |
description="Content category derived from MIND news.tsv taxonomy "
|
| 47 |
+
"(e.g. 'Fitness', 'Tech', 'Fashion', 'Gaming').")
|
| 48 |
news_category: str = Field(default="",
|
| 49 |
description="Fine-grained MIND subcategory (e.g. 'nfl', 'gadgets'). "
|
| 50 |
"Provides richer signal than coarse context alone.")
|
|
|
|
| 52 |
description="Current cultural viral token surfaced from Reels "
|
| 53 |
"(e.g. 'Quiet Luxury', 'Eco-Friendly', 'Cyberpunk', 'Minimalism').")
|
| 54 |
|
| 55 |
+
# ββ hard_assembly: live scraped hashtags + source creative βββββββββββββ
|
| 56 |
+
live_hashtags: List[str] = Field(default_factory=list,
|
| 57 |
+
description="[hard_assembly] Real-time scraped viral hashtags from "
|
| 58 |
+
"Google Trends / Reddit. The agent selects which to use "
|
| 59 |
+
"and weaves them into generated_caption. "
|
| 60 |
+
"Example: ['#QuietLuxury', '#OOTD', '#SlowFashion'].")
|
| 61 |
+
image_description: str = Field(default="",
|
| 62 |
+
description="[hard_assembly] Text description of the source ad image "
|
| 63 |
+
"from AdCaptionDataset (COCO or seed pool). "
|
| 64 |
+
"Agent caption must stay coherent with this.")
|
| 65 |
+
base_caption: str = Field(default="",
|
| 66 |
+
description="[hard_assembly] Base caption from AdCaptionDataset. "
|
| 67 |
+
"Agent rewrites this to incorporate viral hashtags.")
|
| 68 |
+
|
| 69 |
# ββ Market Signals βββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 70 |
market_pressure: float = Field(default=0.5, ge=0.0, le=1.0,
|
| 71 |
description="Normalised indicator of how competitive the auction is "
|
|
|
|
| 76 |
description="Number of ads already shown; drives the fatigue penalty.")
|
| 77 |
fatigue_level: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 78 |
description="Accumulated user-fatigue penalty (0 = fresh, 1 = fully fatigued).")
|
| 79 |
+
carryover_boost: float = Field(default=0.0, ge=0.0, le=1.0,
|
| 80 |
+
description="[hard_sequencing] Carry-over CTR boost from winning prior auctions.")
|
| 81 |
|
| 82 |
    # ── Performance Feedback (delayed by 1 step) ────────────────────────────
    last_ctr: float = Field(default=0.0, ge=0.0, le=1.0,
        description="CTR returned by the User Simulator on the previous step.")
    cumulative_revenue: float = Field(default=0.0,
    # …
    creative_id: int = Field(..., ge=0, le=5,
        description="Index into the Creatives Catalog (0–5).")

    # ── hard_assembly fields ────────────────────────────────────────────────
    generated_caption: Optional[str] = Field(default=None,
        description="[hard_assembly] Final assembled caption – should incorporate "
                    "viral hashtags and remain coherent with the source image. "
                    "Leave None for easy/medium tasks.")
    generated_hashtags: Optional[List[str]] = Field(default=None,
        description="[hard_assembly] List of hashtag strings (with #) that the agent "
                    "chose to include. The agent must scrape these from ViralHashtagScraper "
                    "and select which ones to weave into generated_caption. "
                    "Example: ['#QuietLuxury', '#OOTD', '#SlowFashion']. "
                    "Leave None for easy/medium/sequencing tasks.")
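The `Action` constraints (non-negative bid, catalog indices in 0–5, hashtags prefixed with `#`) can be mirrored in plain Python. This is an illustrative validator, not the environment's own pydantic check:

```python
def validate_action(action: dict) -> list[str]:
    """Return a list of constraint violations for an action dict.

    Hypothetical mirror of the pydantic Field bounds above; the real
    environment validates via the Action model itself.
    """
    errors = []
    if action.get("bid_price", -1.0) < 0:
        errors.append("bid_price must be >= 0")
    for key in ("headline_id", "creative_id"):
        if not 0 <= action.get(key, -1) <= 5:
            errors.append(f"{key} must be in 0..5")
    tags = action.get("generated_hashtags")
    if tags is not None and any(not t.startswith("#") for t in tags):
        errors.append("generated_hashtags entries must start with '#'")
    return errors
```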
# ---------------------------------------------------------------------------
# …
    task_score: float = Field(..., ge=0.0, le=1.0,
        description="Final 0.0–1.0 task-completion score.")

    # Level 1 sub-score
    headline_alignment_score: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[easy_headline] CTR_selected / CTR_best for this context.")

    # Level 2 sub-score
    pacing_score: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[medium_pacing] Budget-smoothness and peak-hour survival bonus.")

    # Level 3 sub-scores (all three axes)
    clip_similarity_score: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[hard_assembly] Composite grader score "
                    "(0.35×hashtag + 0.35×align + 0.30×coherence).")
    hashtag_relevance_score: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[hard_assembly] Mean cosine_sim(chosen_hashtag, viral_trend).")
    caption_trend_alignment: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[hard_assembly] cosine_sim(final_caption, viral_trend).")
    caption_image_coherence: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[hard_assembly] cosine_sim(final_caption, image_description).")
    chosen_hashtags: List[str] = Field(default_factory=list,
        description="[hard_assembly] Hashtags the agent chose this step.")
    assembly_reward_bonus: float = Field(default=0.0,
        description="[hard_assembly] Extra reward granted for viral alignment quality.")

    # Level 4 sub-scores
    sequencing_score: float = Field(default=0.0, ge=0.0, le=1.0,
        description="[hard_sequencing] agent_conversions / oracle_conversions × diversity.")
    contexts_covered: int = Field(default=0,
        description="[hard_sequencing] Number of distinct contexts won at least once.")
    diversity_multiplier: float = Field(default=1.0,
        description="[hard_sequencing] Bonus multiplier for covering ≥3 contexts.")
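The Level 3 and Level 4 field descriptions embed their own formulas. A direct transcription — the [0, 1] clamp and the zero-oracle guard in `sequencing_score` are assumptions, the weights come straight from the `clip_similarity_score` description:

```python
def composite_grader_score(hashtag_relevance: float,
                           trend_alignment: float,
                           image_coherence: float) -> float:
    """clip_similarity_score as weighted in its field description:
    0.35 x hashtag relevance + 0.35 x trend alignment + 0.30 x coherence."""
    return 0.35 * hashtag_relevance + 0.35 * trend_alignment + 0.30 * image_coherence


def sequencing_score(agent_conversions: float, oracle_conversions: float,
                     diversity_multiplier: float) -> float:
    """agent_conversions / oracle_conversions x diversity multiplier.
    Guarding against a zero oracle and clamping to [0, 1] are assumptions."""
    if oracle_conversions <= 0:
        return 0.0
    return min(1.0, (agent_conversions / oracle_conversions) * diversity_multiplier)
```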
requirements.txt
CHANGED
@@ -9,18 +9,28 @@ openai>=1.0.0

sentence-transformers>=2.2.2
torch>=2.0.0

# ── hard_assembly: ViralHashtagScraper ──────────────────────────────────────
# Source 1: Google Trends (free, no API key)
pytrends>=4.9.2
# Source 2: Reddit public REST API – uses stdlib urllib (no extra dep needed)

# ── hard_assembly: AdCaptionDataset ─────────────────────────────────────────
# MS-COCO Captions 2017 val annotations (~241 MB)
# Set COCO_SOURCE=url to auto-download on first run (uses stdlib urllib + zipfile)
# Set COCO_SOURCE=local (default) + place captions_val2017.json at Datasets/coco_captions/

# ── MIND dataset – Option A HuggingFace (zero local disk) ───────────────────
# Uses stdlib urllib for direct TSV download (no extra deps needed)
# Uncomment below if you prefer the full HuggingFace datasets library:
# datasets>=2.18.0

# Optional: LLM-based User Simulator (activate with USE_LLM_SIMULATOR=1)
# Requires GPU with ≥16 GB VRAM
# transformers>=4.40.0
# bitsandbytes>=0.43.0
# accelerate>=0.29.0

# Optional: true CLIP image+text scoring for hard_assembly grader
# open-clip-torch>=2.24.0

# Serving / inference client
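The `COCO_SOURCE` comments imply a small source-selection step before any download happens. A sketch of just that decision — the annotations URL is the usual COCO mirror and is an assumption here; nothing is fetched:

```python
import os
from pathlib import Path

# Assumed official COCO annotations archive (contains captions_val2017.json)
COCO_URL = ("http://images.cocodataset.org/annotations/"
            "annotations_trainval2017.zip")

def resolve_coco_captions(root: str = "Datasets/coco_captions") -> str:
    """Decide where captions_val2017.json comes from, per the comments above.

    COCO_SOURCE=url   -> return the download URL (caller fetches + unzips)
    COCO_SOURCE=local -> return the expected local path (default)
    Illustrative only; no network or filesystem access is performed here.
    """
    source = os.environ.get("COCO_SOURCE", "local")
    if source == "url":
        return COCO_URL
    return str(Path(root) / "captions_val2017.json")
```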