hamverbot committed · Commit 7028d46 · verified · Parent: 3c25a45

Upload RESEARCH_RESOURCES.md

Files changed: RESEARCH_RESOURCES.md (added, +565 lines)

# RTB Bidding Algorithm Comparison — Complete Research Resource List

> Generated: 2026-05-05 | Author: ML Intern for hamverbot
> Repository: https://huggingface.co/hamverbot/rtb-bidding-comparison

---

## Table of Contents

1. [Bidding Algorithms](#1-bidding-algorithms)
2. [CTR Prediction Models](#2-ctr-prediction-models)
3. [Clearing Price / Market Price Prediction](#3-clearing-price--market-price-prediction)
4. [Datasets](#4-datasets)
5. [Codebases & Implementations](#5-codebases--implementations)
6. [Benchmark Leaderboards](#6-benchmark-leaderboards)
7. [Recommended Architecture](#7-recommended-architecture)

---

## 1. Bidding Algorithms

### 1.1 Lagrangian Dual + Online Gradient Descent (BEST MATCH)

| Property | Detail |
|----------|--------|
| **Paper** | "Learning to Bid in Repeated First-Price Auctions with Budgets" |
| **Authors** | Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong (2023) |
| **Venue** | NeurIPS 2023 (implied) |
| **arXiv** | [2304.13477](https://arxiv.org/abs/2304.13477) |
| **HF Papers** | https://huggingface.co/papers/2304.13477 |
| **Algorithm** | DualOGD — Lagrangian dual multiplier updated by online error gradient descent |
| **Auction Type** | First-price (also handles second-price) |
| **Constraints** | Budget cap: total spend ≤ ρT |
| **Regret Bound** | Õ(√T) for both full-information and one-sided feedback |
| **Key Formula** | λ_{t+1} = Proj_{λ≥0}(λ_t − ε·(ρ − c̃_t(b_t))) |
| **Bid Rule** | b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b)) |
| **Prediction Models Needed** | CTR predictor (for v_t), empirical CDF of competing bids (G̃) |
| **Why It's the Best Match** | You explicitly described "Lagrangian dual multiplier and updating the dual variables online by error gradient descent" — this is exactly Algorithm 1, line 7. |

### 1.2 Dual Mirror Descent (Second-Price)

| Property | Detail |
|----------|--------|
| **Paper** | "The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems" |
| **Authors** | Santiago Balseiro, Haihao Lu, Vahab Mirrokni (2020) |
| **Venue** | Operations Research (2023) / NeurIPS 2020 Workshop |
| **arXiv** | [2011.10124](https://arxiv.org/abs/2011.10124) |
| **HF Papers** | https://huggingface.co/papers/2011.10124 |
| **Citations** | 135+ |
| **Algorithm** | Dual mirror descent — generalizes OGD with Bregman divergences |
| **Auction Type** | Second-price (truthful) |
| **Bid Rule** | b_t = v_t / (1 + μ_t) |
| **Dual Update** | μ_{t+1} = Proj(μ_t − η·(ρ − payment_t)) |
| **Key Insight** | In second-price auctions, you don't need a market price model. The dual multiplier naturally paces spending. |
| **Prediction Models** | CTR predictor only (no market price model needed) |

### 1.3 Dual Descent with RoS + Budget (Multi-Constraint)

| Property | Detail |
|----------|--------|
| **Paper** | "Online Bidding Algorithms for Return-on-Spend Constrained Advertisers" |
| **Authors** | Zhe Feng, Swati Padmanabhan, Di Wang (2022) |
| **Venue** | ICML 2022 |
| **arXiv** | [2208.13713](https://arxiv.org/abs/2208.13713) |
| **Citations** | 38+ |
| **Algorithm** | Two dual variables: λ for RoS, μ for budget |
| **Bid Rule** | b_t = ((1+λ_t)/(μ_t+λ_t)) · v_t |
| **Updates** | λ_{t+1} = λ_t·exp(−α·(v_t·x_t(b_t) − p_t(b_t))) [multiplicative]; μ_{t+1} = Proj(μ_t − η·(ρ − p_t(b_t))) [sub-gradient] |
| **Key Insight** | Can be adapted for your "ensure k% spend" floor — use a second dual variable to enforce minimum spend |
| **Prediction Models** | CTR predictor (v_t); payment is observed |

### 1.4 RLB — Reinforcement Learning Bidding

| Property | Detail |
|----------|--------|
| **Paper** | "Real-Time Bidding by Reinforcement Learning in Display Advertising" |
| **Authors** | Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo (2017) |
| **Venue** | WSDM 2017 |
| **arXiv** | [1701.02490](https://arxiv.org/abs/1701.02490) |
| **HF Papers** | https://huggingface.co/papers/1701.02490 |
| **GitHub** | https://github.com/han-cai/rlb-dp (188 stars) |
| **Algorithm** | MDP + dynamic programming + neural value function approximation |
| **State** | (t remaining auctions, b remaining budget, x feature vector) |
| **Action** | bid price a ∈ [0, b] |
| **Results** | +22% clicks over linear bidding at tight budgets on iPinYou |
| **Prediction Models** | CTR θ(x) + market price distribution m(δ, x) |
| **Key Insight** | Foundational; explicitly models the budget-depletion tradeoff via DP. Superseded by dual methods for budget pacing but still influential. |

### 1.5 HiBid — Industrial Hierarchical Dual-RL

| Property | Detail |
|----------|--------|
| **Paper** | "HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning" |
| **Authors** | Yuhang Wang et al. (2023) |
| **arXiv** | [2312.17503](https://arxiv.org/abs/2312.17503) |
| **HF Papers** | https://huggingface.co/papers/2312.17503 |
| **Algorithm** | High-level RL budget allocation + low-level λ-parameterized bidding |
| **Scale** | 64K advertisers, 70M requests/day, 4 channels, deployed at Meituan |
| **Results** | Outperforms RL-based baselines (R-BCQ, BCQ, CQL) on clicks, CPC, CSR, ROI |

### 1.6 Contextual First-Price Extension (Very Recent!)

| Property | Detail |
|----------|--------|
| **Paper** | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| **Authors** | (2026) |
| **arXiv** | [2603.07207](https://arxiv.org/abs/2603.07207) |
| **Algorithm** | Dual OGD + quantile-based contextual censored regression |
| **Key Innovation** | Extends Wang et al. (2023) to contextual (feature-based) auctions with a novel quantile trick for censored data |
| **Regret** | Õ(√T) in contextual first-price auctions |

### 1.7 Unified View of Lagrangian Dual Multiplier Methods

All dual methods follow the same template:

```
For each auction t:
  1. Observe value v_t (from CTR prediction × click value)
  2. Compute bid: b_t = f(v_t, dual_multiplier_t)
  3. Observe outcome: payment c_t (if won) or 0 (if lost)
  4. Compute gradient: g_t = ρ − c_t
  5. Update multiplier: λ_{t+1} = Proj_{λ≥0}(λ_t − η·g_t)
```

| Method | Auction | Bid Function f(v, λ) |
|--------|---------|----------------------|
| Wang 2023 | First-price | argmax_b (r̃(v,b) − λ·c̃(b)) |
| Balseiro 2020 | Second-price | v / (1+λ) |
| Feng 2022 | Second-price | ((1+λ_RoS)/(λ_RoS+λ_budget)) · v |
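
The five-step template above is easy to simulate end to end. Below is a minimal self-contained sketch using the second-price rule f(v, λ) = v/(1+λ) from the table, with synthetic uniform values and competing bids; the step size, budget rate, and distributions are illustrative assumptions, not from any of the papers.

```python
import random

def dual_pacing(values, competing_bids, rho, eta=0.01):
    """Generic dual-OGD budget pacing with the second-price rule b = v/(1+lam)."""
    lam, spend, won_value = 0.0, 0.0, 0.0
    for v, d in zip(values, competing_bids):
        b = v / (1.0 + lam)                    # step 2: bid rule f(v, lam)
        c = d if b >= d else 0.0               # step 3: second-price payment if won
        lam = max(0.0, lam - eta * (rho - c))  # steps 4-5: projected OGD on the dual
        spend += c
        won_value += v if b >= d else 0.0
    return lam, spend, won_value

random.seed(0)
T = 10_000
values = [random.random() for _ in range(T)]      # v_t ~ U(0, 1)
bids = [0.5 * random.random() for _ in range(T)]  # competing bids ~ U(0, 0.5)
rho = 0.05                                        # per-round budget B/T
lam, spend, won_value = dual_pacing(values, bids, rho)
print(f"avg spend/round: {spend / T:.3f} (target rho = {rho}), final lam = {lam:.2f}")
```

Unpaced, the average payment in this toy setup would be roughly 0.17 per round; the multiplier grows until average spend settles near ρ, which is the pacing behavior all three methods share.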

### 1.8 Additional Papers (Supplementary)

| Paper | Key Idea | arXiv |
|-------|----------|-------|
| Dynamic Budget Throttling | Throttle participation rate to control spend | 2207.04690 |
| No-Regret Learning in Repeated First-Price Auctions | General no-regret framework for first-price | 2205.14572 |
| Robust Budget Pacing with a Single Sample | Near-optimal regret from 1 sample per distribution | 2302.02006 |
| Learning to Bid Optimally in Adversarial First-Price | Adversarial (non-i.i.d.) setting | 2007.04568 |
| Optimal No-Regret Learning in Repeated FPA | Minimax optimal bounds | 2003.09795 |
| Multi-Channel Autobidding with Budget and ROI | Per-channel optimization optimality | 2302.01523 |
| Leveraging the Hints: Adaptive Bidding | Uses hints/forecasts for better bidding | 2211.06358 |
| Adaptive Bidding under Non-stationarity | Handles distribution shift | 2505.02796 |
| Joint Value Estimation and Bidding | Simultaneous CTR learning + bidding | 2502.17292 |
| No-Regret is not enough! | Adaptive regret for constrained bandits | 2405.06575 |
| AIGB: Generative Auto-bidding | Diffusion models for bid trajectory generation | 2405.16141 |

### Two-Sided Budget Constraint (Your Specific Need)

You need: **maximize clicks s.t. spend ≤ B AND spend ≥ k·B**.

This requires two dual variables:
- **μ** for the budget cap: μ_{t+1} = Proj(μ_t − η₁·(ρ − spend_t))
- **ν** for the spend floor: ν_{t+1} = Proj(ν_t − η₂·(spend_t − kρ))

Bid function: b_t = v_t · f(μ_t, ν_t), where f decreases in μ and increases in ν.
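
A minimal sketch of one such update step. The shading f(μ, ν) = (1+ν)/(1+μ) is only an illustrative choice satisfying the monotonicity requirements above, not a form taken from a specific paper; the two projected dual updates mirror the equations for μ and ν.

```python
def two_sided_step(mu, nu, v, spend_t, rho, k, eta1=0.01, eta2=0.01):
    """One dual update for a budget cap (mu) and a spend floor (nu).

    rho is the per-round cap B/T and k*rho the per-round floor. The
    illustrative shading f(mu, nu) = (1 + nu) / (1 + mu) decreases in
    mu (cap binding) and increases in nu (floor binding).
    """
    bid = v * (1.0 + nu) / (1.0 + mu)
    mu = max(0.0, mu - eta1 * (rho - spend_t))      # cap: rises when overspending
    nu = max(0.0, nu - eta2 * (spend_t - k * rho))  # floor: rises when underspending
    return bid, mu, nu

# Underspending (spend_t = 0) leaves mu projected at zero and pushes nu up,
# which raises subsequent bids toward the spend floor:
bid, mu, nu = two_sided_step(mu=0.0, nu=0.0, v=1.0, spend_t=0.0, rho=0.05, k=0.8)
print(bid, mu, nu)
```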

---

## 2. CTR Prediction Models

### 2.1 FinalMLP (RECOMMENDED — Best AUC, Fastest Inference)

| Property | Detail |
|----------|--------|
| **Paper** | "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" |
| **Authors** | Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong (2023) |
| **Venue** | AAAI 2023 |
| **arXiv** | [2304.00902](https://arxiv.org/abs/2304.00902) |
| **HF Papers** | https://huggingface.co/papers/2304.00902 |
| **Datasets** | reczoo/Criteo_x1, reczoo/Avazu_x1, reczoo/MovielensLatest_x1, reczoo/Frappe_x1 |
| **Criteo AUC** | **0.8149** |
| **Avazu AUC** | **0.7666** |
| **Architecture** | Two-stream MLP: two independent MLP towers + feature gating (soft selection) + bilinear fusion |
| **Inference Speed** | Fastest among SOTA (pure MLP, ~400-dim hidden, no attention) |
| **Why Best for RTB** | Pure feed-forward, <1ms inference, easy to deploy |
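
To make the two-stream idea concrete, here is a heavily simplified PyTorch sketch of two MLP towers combined by a bilinear fusion head. The paper's feature gating and multi-head fusion are omitted, and all dimensions are illustrative, so this is a sketch of the core idea rather than the published architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Sketch of the FinalMLP idea: two MLP streams fused bilinearly."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.mlp1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mlp2 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.w1 = nn.Linear(hidden, 1)                        # linear term, stream 1
        self.w2 = nn.Linear(hidden, 1)                        # linear term, stream 2
        self.w3 = nn.Parameter(torch.zeros(hidden, hidden))   # bilinear interaction term

    def forward(self, x):
        h1, h2 = self.mlp1(x), self.mlp2(x)
        # logit = w1(h1) + w2(h2) + h1^T W3 h2, per row of the batch
        logit = self.w1(h1) + self.w2(h2) + (h1 @ self.w3 * h2).sum(-1, keepdim=True)
        return torch.sigmoid(logit)

model = TwoStreamFusion(in_dim=16)
p = model(torch.randn(4, 16))   # pCTR in (0, 1), shape (4, 1)
```

In the real model `x` would be the concatenated field embeddings; the bilinear term is what lets the two streams interact beyond a simple sum.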

### 2.2 GDCN — Gated Deep Cross Network

| Property | Detail |
|----------|--------|
| **Paper** | "Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction" |
| **Authors** | Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu (2023) |
| **Venue** | CIKM 2023 |
| **arXiv** | [2311.04635](https://arxiv.org/abs/2311.04635) |
| **HF Papers** | https://huggingface.co/papers/2311.04635 |
| **Criteo AUC** | **0.8161** (own split — not directly comparable) |
| **Architecture** | DCNv2 + learned information gate per cross layer + Field-level Dimension Optimization (FDO) |
| **Key Insight** | The gate filters noisy interactions; FDO compresses embeddings by 60%+. Good for memory-constrained RTB. |

### 2.3 DCNv2 — Industry Workhorse

| Property | Detail |
|----------|--------|
| **Paper** | "DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems" |
| **Authors** | Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi (2021) |
| **Venue** | WWW 2021 |
| **arXiv** | [2008.13535](https://arxiv.org/abs/2008.13535) |
| **HF Papers** | https://huggingface.co/papers/2008.13535 |
| **Criteo AUC** | **0.8142-0.8144** (retuned) |
| **Architecture** | Embedding → parallel CrossNetV2 + DNN → concat → sigmoid |
| **Key Insight** | Mixture-of-Experts-style low-rank decomposition. Battle-tested at Google. |

### 2.4 DeepFM — Simple, Strong Baseline

| Property | Detail |
|----------|--------|
| **Paper** | "DeepFM: A Factorization-Machine based Neural Network for CTR Prediction" |
| **Authors** | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He (2017) |
| **Venue** | IJCAI 2017 |
| **Criteo AUC** | **0.8138** (retuned) |
| **Architecture** | Shared embedding → parallel FM (2nd-order) + DNN → sum → sigmoid |
| **Key Insight** | The shared embedding between FM and DNN is the secret. End-to-end, no pre-training. |

### 2.5 FCN — Fusing Cross Network (Most Recent)

| Property | Detail |
|----------|--------|
| **Paper** | "FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction" |
| **Authors** | (2024) |
| **arXiv** | [2407.13349](https://arxiv.org/abs/2407.13349) |
| **HF Papers** | https://huggingface.co/papers/2407.13349 |
| **Architecture** | Two explicit cross sub-networks: LCN (linear, interaction order grows linearly) + ECN (exponential, order doubles per layer) |
| **Key Insight** | No DNN needed — all interactions explicit. 50% fewer params, 23% lower latency than DCNv2. |
| **Caveat** | Newer paper with less community validation. GitHub: https://github.com/salmon1802/FCN |

### 2.6 BARS Meta-Finding

| Property | Detail |
|----------|--------|
| **Paper** | "BARS-CTR: Open Benchmarking for Click-Through Rate Prediction" |
| **Authors** | Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He (2021) |
| **Venue** | CIKM 2021 |
| **arXiv** | [2009.05794](https://arxiv.org/abs/2009.05794) |
| **HF Papers** | https://huggingface.co/papers/2009.05794 |
| **Key Result** | After 7,000+ experiments and 12,000 GPU hours: **differences between SOTA deep CTR models are surprisingly small** (~0.1-0.3% AUC). Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering. All models converge to ~0.814 AUC on Criteo after proper tuning. |

### 2.7 Additional CTR Papers

| Paper | Key Idea | arXiv |
|-------|----------|-------|
| DIN (KDD 2018) | Attention over user behavior sequence | 1706.06978 |
| DIEN (AAAI 2019) | Interest evolution with GRU + attention | 1809.03672 |
| xDeepFM (KDD 2018) | Compressed Interaction Network (CIN) for vector-wise crosses | 1803.05170 |
| AutoInt (CIKM 2019) | Multi-head self-attention for feature interactions | 1810.11921 |
| DLRM (Meta, 2019) | Specialized for recommendation: MLP for dense + embedding for sparse | 1906.00091 |
| Wide & Deep (Google, 2016) | Memorization (wide) + generalization (deep) | 1606.07792 |
| FTRL-Proximal (KDD 2013) | "Ad Click Prediction: a View from the Trenches" — online learning for linear CTR | — |
| Streaming CTR (2023) | Online CTR with non-stationary data | 2307.07509 |

### 2.8 Latency Considerations for RTB

| Model | Architecture | Inference Speed | RTB-Suitable |
|-------|-------------|-----------------|--------------|
| **FinalMLP** | Pure MLP | ⭐⭐⭐⭐⭐ (<1ms) | ✅ Best |
| **DCNv2** | CrossNet + DNN | ⭐⭐⭐⭐ | ✅ |
| **GDCN** | Gated Cross + DNN | ⭐⭐⭐⭐ | ✅ |
| **DeepFM** | FM + DNN | ⭐⭐⭐⭐ | ✅ |
| **FCN** | LCN + ECN (no DNN) | ⭐⭐⭐⭐ | ✅ |
| DIN | Attention (user history) | ⭐⭐ | ❌ Too slow |
| DIEN | GRU + attention | ⭐ | ❌ Too slow |
| AutoInt | Multi-head attention | ⭐⭐ | ❌ Too slow |

---

## 3. Clearing Price / Market Price Prediction

### 3.1 Non-Parametric Empirical CDF (RECOMMENDED BASELINE)

| Property | Detail |
|----------|--------|
| **Source** | Wang et al. (2023), Algorithm 1, Section 3.1 |
| **arXiv** | [2304.13477](https://arxiv.org/abs/2304.13477) |
| **Method** | Maintain an array of observed competing bids d_s; estimate G̃_t(b) = (1/(t−1))·∑_s 𝟙{b ≥ d_s} |
| **Win Probability** | P(win\|b) = G̃_t(b) |
| **Expected Cost** | E[cost\|win, b] = (1/((t−1)·G̃_t(b))) · ∑_{d_s ≤ b} d_s, i.e. the mean of {d_s : d_s ≤ b} |
| **Pros** | No model training needed, theoretically sound (Õ(√T) regret), handles distribution shift naturally |
| **Cons** | No context/features; cold-start when t is small |
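
A minimal sketch of this estimator, assuming the full-information setting (the highest competing bid d_s is revealed after every auction, won or lost); tie handling and windowing are simplifications.

```python
from bisect import bisect_right, insort

class EmpiricalBidCDF:
    """Non-parametric estimate of the competing-bid CDF G(b)."""
    def __init__(self):
        self.bids = []  # observed competing bids, kept sorted

    def observe(self, d):
        insort(self.bids, d)

    def win_prob(self, b):
        """G(b): fraction of past competing bids at or below b."""
        if not self.bids:
            return 0.0
        return bisect_right(self.bids, b) / len(self.bids)

    def expected_cost_if_win(self, b):
        """Mean competing bid at or below b (the second-price cost; in a
        first-price auction the cost of winning is the bid b itself)."""
        k = bisect_right(self.bids, b)
        return sum(self.bids[:k]) / k if k else 0.0

cdf = EmpiricalBidCDF()
for d in (0.1, 0.2, 0.3, 0.4):
    cdf.observe(d)
print(cdf.win_prob(0.25))  # 0.5: two of the four observed bids are below 0.25
```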

### 3.2 Censored Linear Regression (Wu et al. 2015)

| Property | Detail |
|----------|--------|
| **Paper** | "Predicting Winning Price in Real Time Bidding with Censored Data" |
| **Authors** | Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen (2015) |
| **Venue** | KDD 2015 |
| **Citations** | ~101 |
| **Method** | Tobit-like model: log(market_price) = β·x + ε, ε ~ N(0, σ²) |
| **Key Insight** | Properly handles censoring via the likelihood: winning auctions contribute the density f(price\|x), losing auctions contribute the survival term S(bid\|x) |
| **Pros** | Contextual, simple, computationally cheap |
| **Cons** | Linear model — limited capacity for complex interactions |
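
The censored likelihood can be written down directly. A minimal pure-Python sketch with a single scalar feature, working in log-price space (in practice `beta` and `sigma` would be fit by an optimizer, and x would be a feature vector):

```python
import math

def tobit_nll(beta, sigma, samples):
    """Censored negative log-likelihood for log(price) = beta*x + eps, eps ~ N(0, sigma^2).

    samples: list of (x, y, won) with a single scalar feature x for simplicity.
    Won auctions observe the market price y exactly (Gaussian density of log y);
    lost auctions only know the price exceeded our bid y (survival term).
    """
    nll = 0.0
    for x, y, won in samples:
        z = (math.log(y) - beta * x) / sigma
        if won:
            # -log of the normal density of log(y)
            nll += 0.5 * z * z + math.log(sigma * math.sqrt(2 * math.pi))
        else:
            # -log P(log(price) > log(bid)) = -log(1 - Phi(z))
            surv = 0.5 * math.erfc(z / math.sqrt(2))
            nll += -math.log(max(surv, 1e-300))
    return nll

# A won auction near the model's predicted price is more likely (lower NLL)
# than one far from it:
nll_near = tobit_nll(1.0, 1.0, [(1.0, math.e, True)])      # log y = 1 = beta*x
nll_far = tobit_nll(1.0, 1.0, [(1.0, math.e ** 3, True)])  # log y = 3
print(nll_near < nll_far)
```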

### 3.3 Deep Censored Learning / Survival Analysis

| Property | Detail |
|----------|--------|
| **Paper** | "Deep Censored Learning of the Winning Price" (Zhu et al., WWW 2019) |
| **Method** | Neural network trained with a censored survival loss |
| **Loss** | Winning: −log f(price\|x); Losing: −log S(bid\|x) |
| **Library** | **TorchSurv** ([arXiv:2404.10761](https://arxiv.org/abs/2404.10761), Novartis, 200★ GitHub) |
| **TorchSurv URL** | https://github.com/Novartis/torchsurv |
| **TorchSurv Docs** | https://opensource.nibr.com/torchsurv/ |
| **PyPI** | `pip install torchsurv` |
| **Key Insight** | A proper survival framework handles censoring: win = exact price observed (uncensored); loss = only a lower bound (censored at our bid). |
| **Architecture** | Deep FC net predicting either the hazard rate λ(t\|x) (Cox PH) or distribution parameters (Weibull/log-normal AFT) |

```python
# TorchSurv pattern for market price prediction; `model` is any nn.Module
# mapping features to a scalar log-hazard.
from torchsurv.loss import cox

log_hazard = model(features).squeeze(-1)  # shape (batch,)
# event: True if we won (market price observed exactly), False if lost (censored)
# time:  market price if won, our bid (a lower bound on the price) if lost
loss = cox.neg_partial_log_likelihood(log_hazard, event, time)
loss.backward()
```

### 3.4 Win Probability Neural Network (Simplest ML)

| Property | Detail |
|----------|--------|
| **Method** | Direct binary classification: P(win\|bid_price, features) |
| **Pros** | Dead simple; works with standard BCELoss |
| **Cons** | Ignores the censored price information when you win — only uses the binary win/loss signal |
| **Input** | features + bid_price → sigmoid |
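
A minimal PyTorch sketch of this classifier; the feature width, hidden size, and the random batch below are placeholders for real auction logs.

```python
import torch
import torch.nn as nn

class WinProbNet(nn.Module):
    """P(win | features, bid): contextual features plus the bid as one extra input."""
    def __init__(self, n_feat, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feat + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features, bid):
        x = torch.cat([features, bid.unsqueeze(-1)], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = WinProbNet(n_feat=8)
loss_fn = nn.BCELoss()
feats, bids = torch.randn(16, 8), torch.rand(16)     # placeholder batch
wins = torch.randint(0, 2, (16,)).float()            # 1 = won, 0 = lost
loss = loss_fn(model(feats, bids), wins)
loss.backward()  # standard binary cross-entropy training step
```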

### 3.5 Parametric Distribution Fitting

| Property | Detail |
|----------|--------|
| **Paper** | Referenced in RLB (Cai et al. 2017) — "Functional Bid Landscape Forecasting" (ECML-PKDD 2016) |
| **Method** | Fit a log-normal or gamma distribution to observed winning prices; predict the parameters from features using GBDT |
| **Pros** | Parametric assumptions reduce variance |
| **Cons** | The distribution assumption may not hold; doesn't properly handle censoring |

### 3.6 Contextual Quantile-Based (2026)

| Property | Detail |
|----------|--------|
| **Paper** | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| **arXiv** | [2603.07207](https://arxiv.org/abs/2603.07207) |
| **Method** | Models the competing bid as d_t = α·x_t + z_t (linear contextual); quantile-based estimator for α |
| **Key Trick** | Splits samples by bid quantile and exploits identifiable conditional quantiles to circumvent full censoring |
| **Pros** | Theoretical guarantees in the contextual setting |
| **Cons** | Linear contextual model only; very recent |

### 3.7 Comparison Summary

| Method | Contextual? | Handles Censoring? | Model Training? | Complexity |
|--------|-------------|-------------------|-----------------|------------|
| Empirical CDF | ❌ | N/A (full info) | None | Minimal |
| Censored Linear Reg | ✅ | ✅ (proper likelihood) | Linear model | Low |
| Deep Survival (TorchSurv) | ✅ | ✅ (proper likelihood) | Neural net | Medium |
| Win Prob Classifier | ✅ | ❌ (binary only) | Neural net | Low |
| Parametric (log-normal) | Optional | ❌ | GBDT | Medium |
| Quantile Censored | ✅ | ✅ (quantile trick) | Linear | Medium-High |

---

## 4. Datasets

### 4.1 CTR Prediction Datasets

| Dataset | HF Hub Path | Size | Fields | Label | Verified |
|---------|------------|------|--------|-------|----------|
| **Criteo_x4** | `reczoo/Criteo_x4` | 45.8M rows, 5.6GB | 13 dense (I1-I13) + 26 categorical (C1-C26) | `Label` (0/1) | ✅ |
| **Avazu_x4** | `reczoo/Avazu_x4` | 40.4M rows, 1.8GB | 22 fields (mixed) | `click` (0/1) | ✅ |
| Criteo_x1 | `reczoo/Criteo_x1` | ~11M rows | Same as x4 | `Label` | ✅ |
| Avazu_x1 | `reczoo/Avazu_x1` | ~10M rows | Same as x4 | `click` | ✅ |

**Standard split**: 80% train / 10% val / 10% test (BARS protocol).

### 4.2 RTB Bidding Datasets

| Dataset | Source | Size | Format | Availability |
|---------|--------|------|--------|-------------|
| **iPinYou** | data.computational-advertising.org | 19.5M impressions, 9 campaigns, 10 days (2013) | Bid logs with market price | External download only (NOT on HF Hub) |
| **YOYI** | Various academic mirrors | ~400M records | Bid logs | External download only |

**iPinYou format**: `(click, paying_price, bid_price, slot_id, user_tags, ...)` — already includes the market price information needed for bidding simulation.

**Key Gap**: No RTB bid-log datasets exist on the HuggingFace Hub. Criteo/Avazu have click labels but no bid/price columns — they can only be used for CTR training and require synthetic price generation for bidding evaluation.
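
One way to fill that gap for simulation is to synthesize market prices keyed to the predicted CTR, so "hot" impressions clear higher. The base price, units, and log-normal noise below are illustrative assumptions, not a documented protocol:

```python
import math
import random

def synthetic_market_price(pctr, base=50.0, noise_sigma=0.4, rng=random):
    """Draw a synthetic market price for one impression.

    Log-normal noise around a price proportional to predicted CTR; all
    constants are placeholders to be calibrated against a real bid log.
    """
    return base * pctr * math.exp(rng.gauss(0.0, noise_sigma))

random.seed(42)
prices = [synthetic_market_price(0.05) for _ in range(1000)]
print(min(prices) > 0, sum(prices) / len(prices))
```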

### 4.3 Data Requirements for Each Algorithm

| Algorithm | Needs from Dataset |
|-----------|-------------------|
| Dual OGD (Wang) | click labels (CTR training) + competing bids (or synthetic prices for simulation) |
| Dual Mirror Descent (Balseiro) | click labels + auction payment (second-price) |
| RLB (Cai) | click labels + market prices + impression features |
| CTR models (all) | click labels + features (Criteo/Avazu: ✅) |
| Clearing price models | observed prices (won auctions) + bids (lost auctions) |

---

## 5. Codebases & Implementations

### 5.1 CTR Model Libraries

| Library | URL | Models | Framework | Notes |
|---------|-----|--------|-----------|-------|
| **FuxiCTR** | https://github.com/reczoo/FuxiCTR | 40+ (FinalMLP, DeepFM, DCNv2, GDCN, FCN, xDeepFM, AutoInt) | PyTorch | Config-driven (YAML). Used by all SOTA benchmark papers. |
| **DeepCTR-Torch** | https://github.com/shenweichen/DeepCTR-Torch | 20+ (DeepFM, DCN, DIN, DIEN, xDeepFM) | PyTorch | Simpler API (Python classes). Good for quick prototyping. |
| **TorchSurv** | https://github.com/Novartis/torchsurv | Cox PH, Weibull AFT, DeepSurv, DeepHit | PyTorch | Deep survival analysis for clearing price. |
| **BARS** | https://github.com/openbenchmark/BARS | Benchmarking | — | Standardized evaluation pipeline. 389★ |

### 5.2 Bidding Algorithm Implementations

| Repo | URL | Algorithms | Notes |
|------|-----|------------|-------|
| **rlb-dp** | https://github.com/han-cai/rlb-dp | RLB (MDP + DP) | 188 stars. Original implementation of RL for RTB. |
| **budget_constrained_bidding** | https://github.com/dingmu365/budget_constrained_bidding | Budget-constrained RTB | Contains multiple budget-constrained bidding algorithms. |
| **budget_constrained_bidding** (fork) | https://github.com/GinNie23/budget_constrained_bidding | Same | Fork with modifications. |
| **Budget_Constrained_Bidding** | https://github.com/venkatacrc/Budget_Constrained_Bidding | Same | Another implementation. |
| **hamverbot/rtb-bidding-comparison** | https://huggingface.co/hamverbot/rtb-bidding-comparison | DualOGD, Linear, ORTB, Threshold, MPC | **Your repo** — already has a working comparison framework! |

### 5.3 FuxiCTR Quick Start

```bash
pip install fuxictr
```

```yaml
# config/criteo_finalmlp.yaml
dataset_id: Criteo_x4
model: FinalMLP
embedding_dim: 10
hidden_units: [400, 400, 400]
batch_size: 4096
learning_rate: 1e-3
epochs: 10
metrics: [auc, logloss]
```

```python
# NB: the exact entry point varies by FuxiCTR version; the repo's demo
# scripts (e.g. run_expid.py) are the reference.
from fuxictr import autotuner
autotuner.run("config/criteo_finalmlp.yaml", "Criteo_x4", "FinalMLP")
```

### 5.4 DeepCTR-Torch Quick Start

```bash
pip install deepctr-torch
```

```python
from deepctr_torch.models import DeepFM
from deepctr_torch.inputs import SparseFeat, DenseFeat

# df, categorical_cols, numerical_cols, train_input, train_labels are
# assumed to come from your own preprocessing (e.g. label-encoded Criteo).
sparse_features = [SparseFeat(f, vocabulary_size=df[f].nunique(), embedding_dim=10)
                   for f in categorical_cols]
dense_features = [DenseFeat(f, 1) for f in numerical_cols]

model = DeepFM(linear_feature_columns=sparse_features + dense_features,
               dnn_feature_columns=sparse_features + dense_features,
               dnn_hidden_units=(400, 400, 400), device='cuda')
model.compile('adam', 'binary_crossentropy', metrics=['auc'])
model.fit(train_input, train_labels, batch_size=4096, epochs=10)
```

---

## 6. Benchmark Leaderboards

| Leaderboard | URL | Description |
|-------------|-----|-------------|
| **BARS CTR Criteo_x4** | https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x4.html | The definitive CTR benchmark — 24 models compared |
| **BARS CTR Criteo_x1** | https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x1.html | Smaller Criteo subset |
| **BARS CTR Avazu** | https://openbenchmark.github.io/BARS/CTR/leaderboard/avazu_x4.html | Avazu benchmark |
| **BARS Main** | https://openbenchmark.github.io/BARS | Full recommender-systems benchmark |

**Top Criteo_x4 AUC scores (from BARS):**
- FinalMLP: 0.8149
- DCNv2: 0.8142
- DeepFM: 0.8138
- xDeepFM: 0.8136
- AutoInt+: 0.8134

Key takeaway: the top 5 models are within 0.0015 AUC of each other.

---

## 7. Recommended Architecture

### For Your Problem: "Lagrangian Dual Multiplier with Online Error Gradient Descent"

```
┌──────────────────────────────────────────────────────────────┐
│                      BIDDING ALGORITHM                       │
│                                                              │
│   Dual OGD (Wang et al. 2023)                                │
│   λ_{t+1} = Proj(λ_t − ε·(ρ − c̃_t(b_t)))                     │
│   b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b))                  │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│                      PREDICTION MODELS                       │
│                                                              │
│   ┌──────────────────┐      ┌──────────────────────┐         │
│   │  CTR Predictor   │      │ Clearing Price Est.  │         │
│   │   (FinalMLP)     │      │  (Empirical CDF      │         │
│   │                  │      │   OR TorchSurv)      │         │
│   │ v_t = pCTR × V   │      │ G̃(b) = P(win | b)    │         │
│   └──────────────────┘      └──────────────────────┘         │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│                           DATASETS                           │
│                                                              │
│   Criteo_x4 (CTR training) + iPinYou (bidding simulation)    │
│   OR: Criteo_x4 + synthetic price generation                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### Implementation Priority

1. **Phase 1**: Improve the CTR model — replace the current LogisticRegression with FinalMLP trained on Criteo_x4 (via FuxiCTR)
2. **Phase 2**: Improve clearing-price estimation — add TorchSurv-based censored regression alongside the current empirical CDF
3. **Phase 3**: Add Balseiro dual mirror descent for comparison (simpler baseline, no market price model)
4. **Phase 4**: Add the two-sided budget constraint (cap + floor) with two dual variables
5. **Phase 5**: Full hyperparameter sweep: step size ε, budget fraction k%, value per click, CTR model architecture

### Online Learning Note

For production RTB, where the environment is non-stationary, implement periodic retraining:
- Save a model checkpoint every N hours
- Reload and train on a sliding window of the most recent data
- Deploy the updated model without restarting the bidding algorithm

The Lagrangian multiplier λ is intrinsically online (updated per auction). The CTR model needs separate periodic retraining.
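
A minimal sketch of that retraining control flow; `train_fn`, the window size, and the interval are placeholders for whatever CTR stack is in use (e.g. FuxiCTR):

```python
import collections
import time

class SlidingWindowRetrainer:
    """Retrain the CTR model periodically while the dual bidder keeps running.

    Only the control flow is shown: a bounded buffer of recent examples,
    a timer, and an atomic model swap. `train_fn` is a placeholder.
    """
    def __init__(self, train_fn, window_size=100_000, interval_s=3600):
        self.buffer = collections.deque(maxlen=window_size)  # sliding data window
        self.train_fn = train_fn
        self.interval_s = interval_s
        self.last_trained = time.monotonic()
        self.model = None

    def log(self, example):
        self.buffer.append(example)
        if time.monotonic() - self.last_trained >= self.interval_s:
            self.model = self.train_fn(list(self.buffer))  # swap in the new model
            self.last_trained = time.monotonic()

# With interval_s=0 the first logged example triggers a retrain immediately:
retrainer = SlidingWindowRetrainer(train_fn=lambda data: len(data), interval_s=0)
retrainer.log({"click": 1})
print(retrainer.model)
```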

---

## Paper Index (All Papers Referenced)

| # | Paper | arXiv | Venue | Year | Citations |
|---|-------|-------|-------|------|-----------|
| 1 | Wang et al. — Learning to Bid in Repeated First-Price Auctions with Budgets | 2304.13477 | NeurIPS | 2023 | Growing |
| 2 | Balseiro et al. — Dual Mirror Descent for Online Allocation | 2011.10124 | Ops Research | 2020 | 135+ |
| 3 | Feng et al. — Online Bidding for RoS Constrained Advertisers | 2208.13713 | ICML | 2022 | 38+ |
| 4 | Cai et al. — RTB by RL in Display Advertising | 1701.02490 | WSDM | 2017 | 300+ |
| 5 | Wang et al. — HiBid Hierarchical DRL Bidding | 2312.17503 | — | 2023 | New |
| 6 | — Online Bidding for Contextual First-Price (Quantile) | 2603.07207 | — | 2026 | New |
| 7 | Mao et al. — FinalMLP | 2304.00902 | AAAI | 2023 | Growing |
| 8 | Wang et al. — GDCN | 2311.04635 | CIKM | 2023 | Growing |
| 9 | Wang et al. — DCN V2 | 2008.13535 | WWW | 2021 | 500+ |
| 10 | Guo et al. — DeepFM | — | IJCAI | 2017 | 3000+ |
| 11 | — FCN: Fusing Cross Network | 2407.13349 | — | 2024 | New |
| 12 | Zhu et al. — BARS-CTR Benchmark | 2009.05794 | CIKM | 2021 | 100+ |
| 13 | Wu et al. — Predicting Winning Price with Censored Data | — | KDD | 2015 | 101 |
| 14 | — Deep Censored Learning of Winning Price | — | WWW | 2019 | Well-cited |
| 15 | Katzman et al. — DeepSurv | — | BMC | 2018 | 1000+ |
| 16 | — TorchSurv | 2404.10761 | — | 2024 | New |
| 17 | — Robust Budget Pacing with a Single Sample | 2302.02006 | — | 2023 | Growing |
| 18 | — Multi-Channel Autobidding with Budget and ROI | 2302.01523 | — | 2023 | Growing |
| 19 | — No-Regret in Repeated FPA with Budgets | 2205.14572 | — | 2022 | 14 |
| 20 | — Dynamic Budget Throttling | 2207.04690 | — | 2022 | 6 |
| 21 | — AIGB: Generative Auto-bidding | 2405.16141 | — | 2024 | New |
| 22 | — Adaptive Bidding under Non-Stationarity | 2505.02796 | — | 2025 | 2 |
| 23 | — Joint Value Estimation and Bidding | 2502.17292 | — | 2025 | 4 |
| 24 | — Leveraging Hints: Adaptive Bidding | 2211.06358 | — | 2022 | 13 |
| 25 | Zhou et al. — DIN | 1706.06978 | KDD | 2018 | 2000+ |
| 26 | Zhou et al. — DIEN | 1809.03672 | AAAI | 2019 | 1000+ |
| 27 | Lian et al. — xDeepFM | 1803.05170 | KDD | 2018 | 1000+ |
| 28 | Song et al. — AutoInt | 1810.11921 | CIKM | 2019 | 500+ |
| 29 | Naumov et al. — DLRM (Meta) | 1906.00091 | — | 2019 | 500+ |
| 30 | Cheng et al. — Wide & Deep | 1606.07792 | RecSys | 2016 | 4000+ |
| 31 | McMahan et al. — Ad Click Prediction (FTRL) | — | KDD | 2013 | 2000+ |
| 32 | Zhang et al. — Optimal RTB for Display Advertising | — | KDD | 2014 | 500+ |