	# RTB Bidding Algorithm Comparison — Complete Research Resource List

	> Generated: 2026-05-05 \| Author: ML Intern for hamverbot
	> Repository: https://huggingface.co/hamverbot/rtb-bidding-comparison

	---

	## Table of Contents

	1. [Bidding Algorithms](#1-bidding-algorithms)
	2. [CTR Prediction Models](#2-ctr-prediction-models)
	3. [Clearing Price / Market Price Prediction](#3-clearing-price--market-price-prediction)
	4. [Datasets](#4-datasets)
	5. [Codebases & Implementations](#5-codebases--implementations)
	6. [Benchmark Leaderboards](#6-benchmark-leaderboards)
	7. [Recommended Architecture](#7-recommended-architecture)

	---

	## 1. Bidding Algorithms

	### 1.1 Lagrangian Dual + Online Gradient Descent (BEST MATCH)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Learning to Bid in Repeated First-Price Auctions with Budgets" \|
	\| Authors \| Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong (2023) \|
	\| Venue \| NeurIPS 2023 (implied) \|
	\| arXiv \| [2304.13477](https://arxiv.org/abs/2304.13477) \|
	\| HF Papers \| https://huggingface.co/papers/2304.13477 \|
	\| Algorithm \| DualOGD — Lagrangian dual multiplier updated by online error gradient descent \|
	\| Auction Type \| First-price (also handles second-price) \|
	\| Constraints \| Budget cap: total spend ≤ ρT \|
	\| Regret Bound \| Õ(√T) for both full-information and one-sided feedback \|
	\| Key Formula \| λ_{t+1} = Proj_{λ≥0}(λ_t − ε·(ρ − c̃_t(b_t))) \|
	\| Bid Rule \| b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b)) \|
	\| Prediction Models Needed \| CTR predictor (for v_t), empirical CDF of competing bids (G̃) \|
	\| Why It's The Best Match \| You explicitly described "Lagrangian dual multiplier and updating the dual variables online by error gradient descent" — this is exactly Algorithm 1, line 7. \|
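
The bid rule and dual update in the table can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes full-information feedback, the first-price forms r̃(v, b) = (v − b)·G̃(b) and c̃(b) = b·G̃(b), and a fixed candidate bid grid. The function names, the grid, and the step size are hypothetical.

```python
from bisect import bisect_right


def win_prob(sorted_rivals, bid):
    """Empirical CDF G(b): fraction of past competing bids that are <= bid."""
    if not sorted_rivals:
        return 0.0
    return bisect_right(sorted_rivals, bid) / len(sorted_rivals)


def choose_bid(value, lam, sorted_rivals, grid):
    """Pick argmax over the grid of (value - b) * G(b) - lam * b * G(b)."""
    def score(b):
        g = win_prob(sorted_rivals, b)
        return (value - b) * g - lam * b * g
    return max(grid, key=score)


def dual_step(lam, rho, expected_cost, eps):
    """Projected update: lam_next = max(0, lam - eps * (rho - cost))."""
    return max(0.0, lam - eps * (rho - expected_cost))
```

With a larger multiplier the penalised objective favours cheaper bids, which is how the multiplier slows spending when the per-round cost exceeds ρ.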

	### 1.2 Dual Mirror Descent (Second-Price)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems" \|
	\| Authors \| Santiago Balseiro, Haihao Lu, Vahab Mirrokni (2020) \|
	\| Venue \| Operations Research (2023) / NeurIPS 2020 Workshop \|
	\| arXiv \| [2011.10124](https://arxiv.org/abs/2011.10124) \|
	\| HF Papers \| https://huggingface.co/papers/2011.10124 \|
	\| Citations \| 135+ \|
	\| Algorithm \| Dual mirror descent — generalizes OGD with Bregman divergences \|
	\| Auction Type \| Second-price (truthful) \|
	\| Bid Rule \| b_t = v_t / (1 + μ_t) \|
	\| Dual Update \| μ_{t+1} = Proj(μ_t − η·(ρ − payment_t)) \|
	\| Key Insight \| In second-price auctions, you don't need a market price model. The dual multiplier naturally paces spending. \|
	\| Prediction Models \| CTR predictor only (no market price model needed) \|

	### 1.3 Dual Descent with RoS + Budget (Multi-Constraint)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Online Bidding Algorithms for Return-on-Spend Constrained Advertisers" \|
	\| Authors \| Zhe Feng, Swati Padmanabhan, Di Wang (2022) \|
	\| Venue \| ICML 2022 \|
	\| arXiv \| [2208.13713](https://arxiv.org/abs/2208.13713) \|
	\| Citations \| 38+ \|
	\| Algorithm \| Two dual variables: λ for RoS, μ for budget \|
	\| Bid Rule \| b_t = ((1+λ_t)/(μ_t+λ_t)) · v_t \|
	\| Updates \| λ_{t+1} = λ_t·exp(-α·(v_t·x_t(b_t) − p_t(b_t))) [multiplicative]; μ_{t+1} = Proj(μ_t − η·(ρ − p_t(b_t))) [sub-gradient] \|
	\| Key Insight \| Can be adapted for your "ensure k% spend" floor — use second dual variable to enforce minimum spend \|
	\| Prediction Models \| CTR predictor (v_t), payment observed \|
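
A minimal sketch of the bid rule and the two updates listed in the table, assuming scalar per-round values and payments. The function names and the step sizes `alpha` and `eta` are illustrative, and the RoS multiplier is assumed to start strictly positive so the multiplicative update keeps it positive.

```python
import math


def ros_budget_bid(value, lam, mu):
    """Bid rule from the table: b = (1 + lam) / (mu + lam) * value."""
    return (1.0 + lam) / (mu + lam) * value


def update_duals(lam, mu, value_won, payment, rho, alpha, eta):
    """One round of dual updates.

    lam (RoS) uses a multiplicative update; mu (budget) uses a projected
    sub-gradient step. value_won is v_t * x_t(b_t), i.e. 0 if the auction
    was lost.
    """
    lam_next = lam * math.exp(-alpha * (value_won - payment))
    mu_next = max(0.0, mu - eta * (rho - payment))
    return lam_next, mu_next
```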

	### 1.4 RLB — Reinforcement Learning Bidding

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Real-Time Bidding by Reinforcement Learning in Display Advertising" \|
	\| Authors \| Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo (2017) \|
	\| Venue \| WSDM 2017 \|
	\| arXiv \| [1701.02490](https://arxiv.org/abs/1701.02490) \|
	\| HF Papers \| https://huggingface.co/papers/1701.02490 \|
	\| GitHub \| https://github.com/han-cai/rlb-dp (188 stars) \|
	\| Algorithm \| MDP + Dynamic Programming + Neural value function approximation \|
	\| State \| (t remaining auctions, b remaining budget, x feature vector) \|
	\| Action \| bid price a ∈ [0, b] \|
	\| Results \| +22% clicks over linear bidding at tight budgets on iPinYou \|
	\| Prediction Models \| CTR θ(x) + market price distribution m(δ, x) \|
	\| Key Insight \| Foundational; explicitly models the budget-depletion tradeoff via DP. Superseded by dual methods for budget pacing but still influential. \|
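
The budget-depletion tradeoff can be illustrated with a small tabular dynamic program. This is a simplified sketch in the spirit of the RLB recursion, not the released implementation: it assumes integer prices, a stationary market price distribution `price_pmf` that sums to 1, and a single average CTR `avg_ctr` in place of a per-impression predictor. All names are hypothetical.

```python
def rlb_value_table(horizon, budget, price_pmf, avg_ctr):
    """Return V where V[t][b] is the expected clicks with t auctions left
    and integer budget b.

    price_pmf[d] is the probability that the market price equals d.
    """
    max_price = len(price_pmf) - 1
    V = [[0.0] * (budget + 1) for _ in range(horizon + 1)]
    for t in range(1, horizon + 1):
        for b in range(budget + 1):
            best = V[t - 1][b]   # option of skipping this auction
            win_mass = 0.0       # P(price <= a)
            win_value = 0.0      # sum over d <= a of m(d) * (ctr + V[t-1][b-d])
            for a in range(min(b, max_price) + 1):
                win_mass += price_pmf[a]
                win_value += price_pmf[a] * (avg_ctr + V[t - 1][b - a])
                # Bid a: win when price <= a, otherwise keep the budget.
                candidate = win_value + (1.0 - win_mass) * V[t - 1][b]
                best = max(best, candidate)
            V[t][b] = best
    return V
```

The table grows as horizon × budget, which is one reason the original work pairs the recursion with a neural approximation of the value function.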

	### 1.5 HiBid — Industrial Hierarchical Dual-RL

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning" \|
	\| Authors \| Yuhang Wang et al. (2023) \|
	\| arXiv \| [2312.17503](https://arxiv.org/abs/2312.17503) \|
	\| HF Papers \| https://huggingface.co/papers/2312.17503 \|
	\| Algorithm \| High-level RL budget allocation + Low-level λ-parameterized bidding \|
	\| Scale \| 64K advertisers, 70M requests/day, 4 channels, deployed at Meituan \|
	\| Results \| Outperforms RL-based baselines (R-BCQ, BCQ, CQL) on clicks, CPC, CSR, ROI \|

	### 1.6 Contextual First-Price Extension (Very Recent!)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" \|
	\| Authors \| (2026) \|
	\| arXiv \| [2603.07207](https://arxiv.org/abs/2603.07207) \|
	\| Algorithm \| Dual OGD + quantile-based contextual censored regression \|
	\| Key Innovation \| Extends Wang et al. (2023) to handle contextual (feature-based) auctions with a novel quantile trick for censored data \|
	\| Regret \| Õ(√T) in contextual first-price auctions \|

	### 1.7 Unified View of Lagrangian Dual Multiplier Methods

	All dual methods follow the same template:

	```
	For each auction t:
	1. Observe value v_t (from CTR prediction × click value)
	2. Compute bid: b_t = f(v_t, dual_multiplier_t)
	3. Observe outcome: payment c_t (if won) or 0 (if lost)
	4. Compute gradient: g_t = ρ − c_t
	5. Update multiplier: λ_{t+1} = Proj_{λ≥0}(λ_t − η·g_t)
	```

	\| Method \| Auction \| Bid Function f(v, λ) \|
	\|--------\|---------\|----------------------\|
	\| Wang 2023 \| First-price \| argmax_b (r̃(v,b) − λ·c̃(b)) \|
	\| Balseiro 2020 \| Second-price \| v / (1+λ) \|
	\| Feng 2022 \| Second-price \| ((1+λ_RoS)/(λ_RoS+λ_budget)) · v \|
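
The five-step template can be exercised on logged data with a short simulation. The sketch below uses the second-price rule b = v / (1 + λ) from the table and assumes the highest competing bid is known for every round. The function name and arguments are illustrative.

```python
def simulate_pacing(values, rival_bids, rho, eta, lam0=0.0):
    """Run the dual pacing template over paired values and competing bids.

    Returns the final multiplier, total spend, and number of wins.
    """
    lam, spend, wins = lam0, 0.0, 0
    for v, d in zip(values, rival_bids):
        bid = v / (1.0 + lam)               # step 2: shade the value
        won = bid >= d
        cost = d if won else 0.0            # step 3: second-price payment
        spend += cost
        wins += int(won)
        grad = rho - cost                   # step 4
        lam = max(0.0, lam - eta * grad)    # step 5: projected update
    return lam, spend, wins
```

When the per-round cost exceeds ρ the multiplier grows and bids shrink; when the bidder loses, the multiplier relaxes again, so average spend is pulled toward ρ.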

	### 1.8 Additional Papers (Supplementary)

	\| Paper \| Key Idea \| arXiv \|
	\|-------\|----------\|-------\|
	\| Dynamic Budget Throttling \| Throttle participation rate to control spend \| 2207.04690 \|
	\| No-Regret Learning in Repeated First-Price Auctions \| General no-regret framework for first-price \| 2205.14572 \|
	\| Robust Budget Pacing with a Single Sample \| Near-optimal regret from 1 sample per distribution \| 2302.02006 \|
	\| Learning to Bid Optimally in Adversarial First-Price \| Adversarial (non-i.i.d.) setting \| 2007.04568 \|
	\| Optimal No-Regret Learning in Repeated FPA \| Minimax optimal bounds \| 2003.09795 \|
	\| Multi-Channel Autobidding with Budget and ROI \| Per-channel optimization optimality \| 2302.01523 \|
	\| Leveraging the Hints: Adaptive Bidding \| Uses hints/forecasts for better bidding \| 2211.06358 \|
	\| Adaptive Bidding under Non-stationarity \| Handles distribution shift \| 2505.02796 \|
	\| Joint Value Estimation and Bidding \| Simultaneous CTR learning + bidding \| 2502.17292 \|
	\| No-Regret is not enough! \| Adaptive regret for constrained bandits \| 2405.06575 \|
	\| AIGB: Generative Auto-bidding \| Diffusion models for bid trajectory generation \| 2405.16141 \|

	### 1.9 Two-Sided Budget Constraint (Your Specific Need)

	You need: maximize clicks s.t. spend ≤ B AND spend ≥ k·B.

	This requires two dual variables:
	- μ for the budget cap: μ_{t+1} = Proj(μ_t − η₁·(ρ − spend_t))
	- ν for the spend floor: ν_{t+1} = Proj(ν_t − η₂·(spend_t − kρ))

	Bid function: b_t = v_t · f(μ_t, ν_t) where f decreases with μ and increases with ν.
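
A minimal sketch of the two updates together with one possible f. The choice f(μ, ν) = 1 / max(δ, 1 + μ − ν) is an assumption made for illustration: it decreases with μ and increases with ν as required, and the floor δ (`min_denom`) only keeps the bid finite. It is not taken from any of the papers above.

```python
def two_sided_bid(value, mu, nu, min_denom=0.1):
    """Illustrative bid: value / max(min_denom, 1 + mu - nu)."""
    return value / max(min_denom, 1.0 + mu - nu)


def update_two_sided(mu, nu, spend, rho, k, eta_cap, eta_floor):
    """Projected updates for the cap multiplier mu and the floor multiplier nu."""
    mu_next = max(0.0, mu - eta_cap * (rho - spend))         # grows when spend > rho
    nu_next = max(0.0, nu - eta_floor * (spend - k * rho))   # grows when spend < k * rho
    return mu_next, nu_next
```

Underspending raises ν and therefore the bid; overspending raises μ and lowers it.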

	---

	## 2. CTR Prediction Models

	### 2.1 FinalMLP (RECOMMENDED — Best AUC, Fastest Inference)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" \|
	\| Authors \| Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong (2023) \|
	\| Venue \| AAAI 2023 \|
	\| arXiv \| [2304.00902](https://arxiv.org/abs/2304.00902) \|
	\| HF Papers \| https://huggingface.co/papers/2304.00902 \|
	\| Datasets \| reczoo/Criteo_x1, reczoo/Avazu_x1, reczoo/MovielensLatest_x1, reczoo/Frappe_x1 \|
	\| Criteo AUC \| 0.8149 \|
	\| Avazu AUC \| 0.7666 \|
	\| Architecture \| Two-stream MLP: two independent MLP towers + feature gating (soft selection) + bilinear fusion \|
	\| Inference Speed \| Fastest among SOTA (pure MLP, ~400-dim hidden, no attention) \|
	\| Why Best for RTB \| Pure feed-forward, <1ms inference, easy to deploy \|

	### 2.2 GDCN — Gated Deep Cross Network

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction" \|
	\| Authors \| Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu (2023) \|
	\| Venue \| CIKM 2023 \|
	\| arXiv \| [2311.04635](https://arxiv.org/abs/2311.04635) \|
	\| HF Papers \| https://huggingface.co/papers/2311.04635 \|
	\| Criteo AUC \| 0.8161 (own split — not directly comparable) \|
	\| Architecture \| DCNv2 + learned information gate per cross layer + Field-level Dimension Optimization (FDO) \|
	\| Key Insight \| Gate filters noisy interactions; FDO compresses embeddings 60%+. Good for memory-constrained RTB. \|

	### 2.3 DCNv2 — Industry Workhorse

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems" \|
	\| Authors \| Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi (2021) \|
	\| Venue \| WWW 2021 \|
	\| arXiv \| [2008.13535](https://arxiv.org/abs/2008.13535) \|
	\| HF Papers \| https://huggingface.co/papers/2008.13535 \|
	\| Criteo AUC \| 0.8142-0.8144 (retuned) \|
	\| Architecture \| Embedding → parallel CrossNetV2 + DNN → concat → sigmoid \|
	\| Key Insight \| Mixture-of-Experts-style low-rank decomposition. Battle-tested at Google. \|

	### 2.4 DeepFM — Simple, Strong Baseline

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "DeepFM: A Factorization-Machine based Neural Network for CTR Prediction" \|
	\| Authors \| Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He (2017) \|
	\| Venue \| IJCAI 2017 \|
	\| Criteo AUC \| 0.8138 (retuned) \|
	\| Architecture \| Shared embedding → parallel FM (2nd-order) + DNN → sum → sigmoid \|
	\| Key Insight \| Shared embedding between FM and DNN is the secret. End-to-end, no pre-training. \|

	### 2.5 FCN — Fusing Cross Network (Most Recent)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction" \|
	\| Authors \| (2024) \|
	\| arXiv \| [2407.13349](https://arxiv.org/abs/2407.13349) \|
	\| HF Papers \| https://huggingface.co/papers/2407.13349 \|
	\| Architecture \| Two explicit cross sub-networks: LCN (linear, order grows linearly) + ECN (exponential, order doubles per layer) \|
	\| Key Insight \| No DNN needed — all interactions explicit. 50% fewer params, 23% lower latency than DCNv2. \|
	\| Caveat \| Newer paper with less community validation. GitHub: https://github.com/salmon1802/FCN \|

	### 2.6 BARS Meta-Finding

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "BARS-CTR: Open Benchmarking for Click-Through Rate Prediction" \|
	\| Authors \| Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He (2021) \|
	\| Venue \| CIKM 2021 \|
	\| arXiv \| [2009.05794](https://arxiv.org/abs/2009.05794) \|
	\| HF Papers \| https://huggingface.co/papers/2009.05794 \|
	\| Key Result \| After 7,000+ experiments and 12,000 GPU hours: differences between SOTA deep CTR models are surprisingly small (~0.1-0.3% AUC). Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering. All models converge to ~0.814 AUC on Criteo after proper tuning. \|

	### 2.7 Additional CTR Papers

	\| Paper \| Key Idea \| arXiv \|
	\|-------\|----------\|-------\|
	\| DIN (KDD 2018) \| Attention over user behavior sequence \| 1706.06978 \|
	\| DIEN (AAAI 2019) \| Interest evolution with GRU + attention \| 1809.03672 \|
	\| xDeepFM (KDD 2018) \| Compressed Interaction Network (CIN) for vector-wise crosses \| 1803.05170 \|
	\| AutoInt (CIKM 2019) \| Multi-head self-attention for feature interactions \| 1810.11921 \|
	\| DLRM (Meta, 2019) \| Specialized for recommendation: MLP for dense + embedding for sparse \| 1906.00091 \|
	\| Wide & Deep (Google, 2016) \| Memorization (wide) + generalization (deep) \| 1606.07792 \|
	\| FTRL-Proximal (KDD 2013) \| "Ad Click Prediction: a View from the Trenches" — online learning for linear CTR \| — \|
	\| Streaming CTR (2023) \| Online CTR with non-stationary data \| 2307.07509 \|

	### 2.8 Latency Considerations for RTB

	\| Model \| Architecture \| Inference Speed \| RTB-Suitable \|
	\|-------\|-------------\|-----------------\|--------------\|
	\| FinalMLP \| Pure MLP \| ⭐⭐⭐⭐⭐ (<1ms) \| ✅ Best \|
	\| DCNv2 \| CrossNet + DNN \| ⭐⭐⭐⭐ \| ✅ \|
	\| GDCN \| Gated Cross + DNN \| ⭐⭐⭐⭐ \| ✅ \|
	\| DeepFM \| FM + DNN \| ⭐⭐⭐⭐ \| ✅ \|
	\| FCN \| LCN + ECN (no DNN) \| ⭐⭐⭐⭐ \| ✅ \|
	\| DIN \| Attention (user history) \| ⭐⭐ \| ❌ Too slow \|
	\| DIEN \| GRU + attention \| ⭐ \| ❌ Too slow \|
	\| AutoInt \| Multi-head attention \| ⭐⭐ \| ❌ Too slow \|

	---

	## 3. Clearing Price / Market Price Prediction

	### 3.1 Non-Parametric Empirical CDF (RECOMMENDED BASELINE)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Source \| Wang et al. (2023), Algorithm 1, Section 3.1 \|
	\| arXiv \| [2304.13477](https://arxiv.org/abs/2304.13477) \|
	\| Method \| Maintain array of observed competing bids d_s, estimate G̃_t(b) = (1/(t−1)) ∑_{s<t} 𝟙{b ≥ d_s} \|
	\| Win Probability \| P(win\\|b) = G̃_t(b) \|
	\| Expected Cost \| First-price: the winner pays its own bid, so c̃_t(b) = b · G̃_t(b). Second-price: E[cost\\|win,b] = mean of {d_s : d_s ≤ b} \|
	\| Pros \| No model training needed, theoretically sound (Õ(√T) regret), handles distribution shift naturally \|
	\| Cons \| No context/features, cold-start when t is small \|
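
A minimal sketch of such an estimator, keeping the observed competing bids in a sorted list so that each query is a binary search. The class and method names are illustrative. The cost helpers assume the winner pays its own bid in a first-price auction and the highest competing bid in a second-price auction.

```python
from bisect import bisect_right, insort


class EmpiricalBidCDF:
    """Non-parametric estimate of the competing-bid distribution."""

    def __init__(self):
        self._bids = []  # sorted competing bids d_s seen so far

    def observe(self, rival_bid):
        insort(self._bids, rival_bid)

    def win_prob(self, bid):
        """G(b): fraction of observed competing bids that are <= bid."""
        if not self._bids:
            return 0.0  # cold start: no information yet
        return bisect_right(self._bids, bid) / len(self._bids)

    def expected_cost_first_price(self, bid):
        """Unconditional expected payment b * G(b) in a first-price auction."""
        return bid * self.win_prob(bid)

    def mean_price_given_win(self, bid):
        """Average competing bid among those beaten by `bid` (second-price cost)."""
        beaten = self._bids[:bisect_right(self._bids, bid)]
        return sum(beaten) / len(beaten) if beaten else 0.0
```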

	### 3.2 Censored Linear Regression (Wu et al. 2015)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Predicting Winning Price in Real Time Bidding with Censored Data" \|
	\| Authors \| Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen (2015) \|
	\| Venue \| KDD 2015 \|
	\| Citations \| ~101 \|
	\| Method \| Tobit-like model: log(market_price) = β·x + ε, ε ~ N(0, σ²) \|
	\| Key Insight \| Properly handles censoring via likelihood: winning auctions contribute f(price\\|x), losing auctions contribute S(bid\\|x) \|
	\| Pros \| Contextual, simple, computationally cheap \|
	\| Cons \| Linear model — limited capacity for complex interactions \|
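
The censored likelihood can be written out directly. The sketch below assumes the log-normal model from the table, with y the log of the observed price for won auctions and the log of the bidder's own bid for lost ones. The function names and the row layout are illustrative, and the survival term will underflow for bids far above the predicted mean.

```python
import math


def _norm_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_logsf(z):
    """log P(Z > z) for a standard normal Z, via the complementary error function."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_nll(beta, sigma, rows):
    """Negative log-likelihood over rows of (features, won, y)."""
    total = 0.0
    for x, won, y in rows:
        mean = sum(b * xi for b, xi in zip(beta, x))
        z = (y - mean) / sigma
        if won:
            total -= _norm_logpdf(z) - math.log(sigma)  # density of the observed price
        else:
            total -= _norm_logsf(z)                     # P(price > bid): censored row
    return total
```

Minimising this quantity over `beta` and `sigma` with any gradient-based or derivative-free optimiser gives the fitted model.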

	### 3.3 Deep Censored Learning / Survival Analysis

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Deep Censored Learning of the Winning Price" (Zhu et al., WWW 2019) \|
	\| Method \| Neural network trained with censored survival loss \|
	\| Loss \| Winning: -log f(price\\|x); Losing: -log S(bid\\|x) \|
	\| Library \| TorchSurv ([arXiv:2404.10761](https://arxiv.org/abs/2404.10761), Novartis, 200★ GitHub) \|
	\| TorchSurv URL \| https://github.com/Novartis/torchsurv \|
	\| TorchSurv Docs \| https://opensource.nibr.com/torchsurv/ \|
	\| PyPI \| `pip install torchsurv` \|
	\| Key Insight \| Proper survival framework handles censoring. Win = exact price observed (uncensored). Loss = only lower bound (censored at bid). \|
	\| Architecture \| Deep FC predicting either hazard rate λ(t\\|x) (Cox PH) or distribution parameters (Weibull/log-normal AFT) \|

	```python
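	# TorchSurv pattern for market price. `model`, `features`, `won`, `market_price`,
	# and `bid` are placeholders for the caller's network and batch tensors.
	import torch
	from torchsurv.loss import cox

	log_hazard = model(features)  # shape (batch,)
	event = won.bool()  # True if won (price observed), False if lost (censored)
	time = torch.where(event, market_price, bid)  # market price if won, own bid if lost
	loss = cox.neg_partial_log_likelihood(log_hazard, event, time)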
	```

	### 3.4 Win Probability Neural Network (Simplest ML)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Method \| Direct binary classification: P(win\\|bid_price, features) \|
	\| Pros \| Dead simple, works with standard BCELoss \|
	\| Cons \| Ignores the observed (uncensored) price when you win; only the binary win/loss signal is used \|
	\| Input \| features + bid_price → sigmoid \|

	### 3.5 Parametric Distribution Fitting

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| Referenced in RLB (Cai et al. 2017) — "Functional Bid Landscape Forecasting" (ECML-PKDD 2016) \|
	\| Method \| Fit log-normal or gamma distribution to observed winning prices; predict parameters from features using GBDT \|
	\| Pros \| Parametric assumptions reduce variance \|
	\| Cons \| Distribution assumption may not hold; doesn't properly handle censoring \|

	### 3.6 Contextual Quantile-Based (2026)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" \|
	\| arXiv \| [2603.07207](https://arxiv.org/abs/2603.07207) \|
	\| Method \| Models competing bid as d_t = α·x_t + z_t (linear contextual); quantile-based estimator for α \|
	\| Key Trick \| Splits samples by bid quantile and exploits identifiable conditional quantiles to circumvent full censoring \|
	\| Pros \| Theoretical guarantees in contextual setting \|
	\| Cons \| Linear contextual model only; very recent \|

	### 3.7 Comparison Summary

	\| Method \| Contextual? \| Handles Censoring? \| Model Training? \| Complexity \|
	\|--------\|-------------\|-------------------\|-----------------\|------------\|
	\| Empirical CDF \| ❌ \| N/A (full info) \| None \| Minimal \|
	\| Censored Linear Reg \| ✅ \| ✅ (proper likelihood) \| Linear model \| Low \|
	\| Deep Survival (TorchSurv) \| ✅ \| ✅ (proper likelihood) \| Neural net \| Medium \|
	\| Win Prob Classifier \| ✅ \| ❌ (binary only) \| Neural net \| Low \|
	\| Parametric (log-normal) \| Optional \| ❌ \| GBDT \| Medium \|
	\| Quantile Censored \| ✅ \| ✅ (quantile trick) \| Linear \| Medium-High \|

	---

	## 4. Datasets

	### 4.1 CTR Prediction Datasets

	\| Dataset \| HF Hub Path \| Size \| Fields \| Label \| Verified \|
	\|---------\|------------\|------\|--------\|-------\|----------\|
	\| Criteo_x4 \| `reczoo/Criteo_x4` \| 45.8M rows, 5.6GB \| 13 dense (I1-I13) + 26 categorical (C1-C26) \| `Label` (0/1) \| ✅ \|
	\| Avazu_x4 \| `reczoo/Avazu_x4` \| 40.4M rows, 1.8GB \| 22 fields (mixed) \| `click` (0/1) \| ✅ \|
	\| Criteo_x1 \| `reczoo/Criteo_x1` \| ~11M rows \| Same as x4 \| `Label` \| ✅ \|
	\| Avazu_x1 \| `reczoo/Avazu_x1` \| ~10M rows \| Same as x4 \| `click` \| ✅ \|

	Standard split: 80% train / 10% val / 10% test (BARS protocol).

	### 4.2 RTB Bidding Datasets

	\| Dataset \| Source \| Size \| Format \| Availability \|
	\|---------\|--------\|------\|--------\|-------------\|
	\| iPinYou \| data.computational-advertising.org \| 19.5M impressions, 9 campaigns, 10 days (2013) \| Bid logs with market price \| External download only (NOT on HF Hub) \|
	\| YOYI \| Various academic mirrors \| ~400M records \| Bid logs \| External download only \|

	iPinYou format: `(click, paying_price, bid_price, slot_id, user_tags, ...)` — already includes market price info needed for bidding simulation.

	Key Gap: No RTB bid-log datasets on HuggingFace Hub. Criteo/Avazu have click labels but no bid/price columns — they can only be used for CTR training and require synthetic price generation for bidding evaluation.
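
One way to generate synthetic prices for a CTR-only dataset is to draw a log-normal market price per impression. The sketch below is purely illustrative: the link between predicted CTR and the price median, and the parameters `base_price`, `ctr_elasticity`, and `sigma`, are assumptions rather than properties of any real exchange.

```python
import math
import random


def synthetic_market_prices(pctrs, base_price=1.0, ctr_elasticity=0.5, sigma=0.5, seed=0):
    """Draw one log-normal market price per impression.

    pctrs must be strictly positive predicted click-through rates.
    """
    rng = random.Random(seed)
    mean_ctr = sum(pctrs) / len(pctrs)
    prices = []
    for p in pctrs:
        # Assumption: impressions with higher pCTR attract higher competing bids.
        median = base_price * (p / mean_ctr) ** ctr_elasticity
        prices.append(median * math.exp(rng.gauss(0.0, sigma)))
    return prices
```

Results obtained on such prices depend on the assumed generator, so they are best reported alongside results on iPinYou.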

	### 4.3 Data Requirements for Each Algorithm

	\| Algorithm \| Needs from Dataset \|
	\|-----------\|-------------------\|
	\| Dual OGD (Wang) \| click labels (CTR training) + competing bids (or synthetic prices for simulation) \|
	\| Dual Mirror Descent (Balseiro) \| click labels + auction payment (second-price) \|
	\| RLB (Cai) \| click labels + market prices + impression features \|
	\| CTR models (all) \| click labels + features (Criteo/Avazu: ✅) \|
	\| Clearing price models \| observed prices (won auctions) + bids (lost auctions) \|

	---

	## 5. Codebases & Implementations

	### 5.1 CTR Model Libraries

	\| Library \| URL \| Models \| Framework \| Notes \|
	\|---------\|-----\|--------\|-----------\|-------\|
	\| FuxiCTR \| https://github.com/reczoo/FuxiCTR \| 40+ (FinalMLP, DeepFM, DCNv2, GDCN, FCN, xDeepFM, AutoInt) \| PyTorch \| Config-driven (YAML). Used by the BARS benchmark and many recent CTR papers. \|
	\| DeepCTR-Torch \| https://github.com/shenweichen/DeepCTR-Torch \| 20+ (DeepFM, DCN, DIN, DIEN, xDeepFM) \| PyTorch \| Simpler API (Python class). Good for quick prototyping. \|
	\| TorchSurv \| https://github.com/Novartis/torchsurv \| Cox PH, Weibull AFT, DeepSurv, DeepHit \| PyTorch \| Deep survival analysis for clearing price. \|
	\| BARS \| https://github.com/openbenchmark/BARS \| Benchmarking \| — \| Standardized evaluation pipeline. 389★ \|

	### 5.2 Bidding Algorithm Implementations

	\| Repo \| URL \| Algorithms \| Notes \|
	\|------\|-----\|------------\|-------\|
	\| rlb-dp \| https://github.com/han-cai/rlb-dp \| RLB (MDP + DP) \| 188 stars. Original implementation of RL for RTB. \|
	\| budget_constrained_bidding \| https://github.com/dingmu365/budget_constrained_bidding \| Budget-constrained RTB \| Contains multiple budget-constrained bidding algorithms. \|
	\| budget_constrained_bidding (fork) \| https://github.com/GinNie23/budget_constrained_bidding \| Same \| Fork with modifications. \|
	\| Budget_Constrained_Bidding \| https://github.com/venkatacrc/Budget_Constrained_Bidding \| Same \| Another implementation. \|
	\| hamverbot/rtb-bidding-comparison \| https://huggingface.co/hamverbot/rtb-bidding-comparison \| DualOGD, Linear, ORTB, Threshold, MPC \| Your repo — already has a working comparison framework! \|

	### 5.3 FuxiCTR Quick Start

	```bash
	pip install fuxictr
	```

	```yaml
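	# config/criteo_finalmlp.yaml
	# Illustrative hyperparameters only; FuxiCTR's own configs are split into
	# dataset_config.yaml and model_config.yaml, keyed by experiment ID.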
	dataset_id: Criteo_x4
	model: FinalMLP
	embedding_dim: 10
	hidden_units: [400, 400, 400]
	batch_size: 4096
	learning_rate: 1e-3
	epochs: 10
	metrics: [auc, logloss]
	```

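	```bash
	# Run from a clone of the repo; the experiment ID is an assumption, so check the IDs under model_zoo/FinalMLP/config.
	git clone https://github.com/reczoo/FuxiCTR.git && cd FuxiCTR/model_zoo/FinalMLP
	python run_expid.py --config ./config --expid FinalMLP_test --gpu 0
	```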

	### 5.4 DeepCTR-Torch Quick Start

	```bash
	pip install deepctr-torch
	```

	```python
	from deepctr_torch.models import DeepFM
	from deepctr_torch.inputs import SparseFeat, DenseFeat

	# Assumes `df` holds label-encoded categorical columns and scaled numerical columns.
	sparse_features = [SparseFeat(f, vocabulary_size=df[f].nunique(), embedding_dim=10)
	                   for f in categorical_cols]
	dense_features = [DenseFeat(f, 1) for f in numerical_cols]
	feature_columns = sparse_features + dense_features

	model = DeepFM(linear_feature_columns=feature_columns,
	               dnn_feature_columns=feature_columns,
	               dnn_hidden_units=(400, 400, 400), device='cuda')
	model.compile('adam', 'binary_crossentropy', metrics=['auc'])
	# `train_input` is a dict mapping each feature name to its column array.
	model.fit(train_input, train_labels, batch_size=4096, epochs=10)
	```

	---

	## 6. Benchmark Leaderboards

	\| Leaderboard \| URL \| Description \|
	\|-------------\|-----\|-------------\|
	\| BARS CTR Criteo_x4 \| https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x4.html \| Definitive CTR benchmark; 24 models compared \|
	\| BARS CTR Criteo_x1 \| https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x1.html \| Smaller Criteo subset \|
	\| BARS CTR Avazu \| https://openbenchmark.github.io/BARS/CTR/leaderboard/avazu_x4.html \| Avazu benchmark \|
	\| BARS Main \| https://openbenchmark.github.io/BARS \| Full recommender systems benchmark \|

	Top Criteo_x4 AUC scores (from BARS):
	- FinalMLP: 0.8149
	- DCNv2: 0.8142
	- DeepFM: 0.8138
	- xDeepFM: 0.8136
	- AutoInt+: 0.8134

	Key takeaway: The five models listed above are within 0.0015 AUC (0.15 percentage points) of each other.

	---

	## 7. Recommended Architecture

	### For Your Problem: "Lagrangian Dual Multiplier with Online Error Gradient Descent"

	```
	┌─────────────────────────────────────────────────────────────┐
	│  BIDDING ALGORITHM                                          │
	│                                                             │
	│  Dual OGD (Wang et al. 2023)                                │
	│    λ_{t+1} = Proj(λ_t - ε·(ρ - c̃_t(b_t)))                   │
	│    b_t = argmax_b (r̃_t(v_t, b) - λ_t·c̃_t(b))                │
	│                                                             │
	├─────────────────────────────────────────────────────────────┤
	│  PREDICTION MODELS                                          │
	│                                                             │
	│  ┌──────────────────┐    ┌──────────────────────┐           │
	│  │ CTR Predictor    │    │ Clearing Price Est.  │           │
	│  │ (FinalMLP)       │    │ (Empirical CDF       │           │
	│  │                  │    │  OR TorchSurv)       │           │
	│  │ v_t = pCTR × V   │    │ G̃(b) = P(win \| b)   │           │
	│  └──────────────────┘    └──────────────────────┘           │
	│                                                             │
	├─────────────────────────────────────────────────────────────┤
	│  DATASETS                                                   │
	│                                                             │
	│  Criteo_x4 (CTR training) + iPinYou (bidding simulation)    │
	│  OR: Criteo_x4 + synthetic price generation                 │
	│                                                             │
	└─────────────────────────────────────────────────────────────┘
	```

	### Implementation Priority

	1. Phase 1: Improve CTR model — replace current LogisticRegression with FinalMLP trained on Criteo_x4 (via FuxiCTR)
	2. Phase 2: Improve clearing price — add TorchSurv-based censored regression alongside current empirical CDF
	3. Phase 3: Add Balseiro dual mirror descent for comparison (simpler baseline, no market price model)
	4. Phase 4: Add two-sided budget constraint (cap + floor) with two dual variables
	5. Phase 5: Full sweep over hyperparameters: step size ε, budget fraction k%, value per click, CTR model architecture

	### Online Learning Note

	For production RTB where the environment is non-stationary, implement periodic retraining:
	- Save model checkpoint every N hours
	- Reload and train on sliding window of most recent data
	- Deploy updated model without restarting the bidding algorithm

	The Lagrangian multiplier λ is intrinsically online (updated per auction). The CTR model needs separate periodic retraining.
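
The retraining loop can be sketched with a bounded buffer and a retrain trigger. This is an illustration only: it counts examples rather than hours, `train_fn` is a hypothetical callable that fits and returns a model from a list of examples, and checkpointing is left out.

```python
from collections import deque


class SlidingWindowRetrainer:
    """Keep the most recent examples and refit the CTR model periodically."""

    def __init__(self, train_fn, window_size, retrain_every):
        self.train_fn = train_fn
        self.window = deque(maxlen=window_size)  # oldest examples drop out automatically
        self.retrain_every = retrain_every
        self.model = None
        self._since_last = 0

    def add(self, example):
        """Record one example; retrain and swap the model when the trigger fires."""
        self.window.append(example)
        self._since_last += 1
        if self._since_last >= self.retrain_every:
            self.model = self.train_fn(list(self.window))
            self._since_last = 0
        return self.model
```

The bidding loop keeps its own multiplier state and only reads `model`, so swapping the model does not reset pacing.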

	---

	## Paper Index (All Papers Referenced)

	\| # \| Paper \| arXiv \| Venue \| Year \| Citations \|
	\|---\|-------\|-------\|-------\|------\|-----------\|
	\| 1 \| Wang et al. — Learning to Bid in Repeated First-Price Auctions with Budgets \| 2304.13477 \| NeurIPS \| 2023 \| Growing \|
	\| 2 \| Balseiro et al. — Dual Mirror Descent for Online Allocation \| 2011.10124 \| Ops Research \| 2020 \| 135+ \|
	\| 3 \| Feng et al. — Online Bidding for RoS Constrained Advertisers \| 2208.13713 \| ICML \| 2022 \| 38+ \|
	\| 4 \| Cai et al. — RTB by RL in Display Advertising \| 1701.02490 \| WSDM \| 2017 \| 300+ \|
	\| 5 \| Wang et al. — HiBid Hierarchical DRL Bidding \| 2312.17503 \| — \| 2023 \| New \|
	\| 6 \| — Online Bidding for Contextual First-Price (Quantile) \| 2603.07207 \| — \| 2026 \| New \|
	\| 7 \| Mao et al. — FinalMLP \| 2304.00902 \| AAAI \| 2023 \| Growing \|
	\| 8 \| Wang et al. — GDCN \| 2311.04635 \| CIKM \| 2023 \| Growing \|
	\| 9 \| Wang et al. — DCN V2 \| 2008.13535 \| WWW \| 2021 \| 500+ \|
	\| 10 \| Guo et al. — DeepFM \| — \| IJCAI \| 2017 \| 3000+ \|
	\| 11 \| — FCN: Fusing Cross Network \| 2407.13349 \| — \| 2024 \| New \|
	\| 12 \| Zhu et al. — BARS-CTR Benchmark \| 2009.05794 \| CIKM \| 2021 \| 100+ \|
	\| 13 \| Wu et al. — Predicting Winning Price with Censored Data \| — \| KDD \| 2015 \| 101 \|
	\| 14 \| — Deep Censored Learning of Winning Price \| — \| WWW \| 2019 \| Well-cited \|
	\| 15 \| Katzman et al. — DeepSurv \| — \| BMC \| 2018 \| 1000+ \|
	\| 16 \| — TorchSurv \| 2404.10761 \| — \| 2024 \| New \|
	\| 17 \| — Robust Budget Pacing with a Single Sample \| 2302.02006 \| — \| 2023 \| Growing \|
	\| 18 \| — Multi-Channel Autobidding with Budget and ROI \| 2302.01523 \| — \| 2023 \| Growing \|
	\| 19 \| — No-Regret in Repeated FPA with Budgets \| 2205.14572 \| — \| 2022 \| 14 \|
	\| 20 \| — Dynamic Budget Throttling \| 2207.04690 \| — \| 2022 \| 6 \|
	\| 21 \| — AIGB: Generative Auto-bidding \| 2405.16141 \| — \| 2024 \| New \|
	\| 22 \| — Adaptive Bidding under Non-Stationarity \| 2505.02796 \| — \| 2025 \| 2 \|
	\| 23 \| — Joint Value Estimation and Bidding \| 2502.17292 \| — \| 2025 \| 4 \|
	\| 24 \| — Leveraging Hints: Adaptive Bidding \| 2211.06358 \| — \| 2022 \| 13 \|
	\| 25 \| Zhou et al. — DIN \| 1706.06978 \| KDD \| 2018 \| 2000+ \|
	\| 26 \| Zhou et al. — DIEN \| 1809.03672 \| AAAI \| 2019 \| 1000+ \|
	\| 27 \| Lian et al. — xDeepFM \| 1803.05170 \| KDD \| 2018 \| 1000+ \|
	\| 28 \| Song et al. — AutoInt \| 1810.11921 \| CIKM \| 2019 \| 500+ \|
	\| 29 \| Naumov et al. — DLRM (Meta) \| 1906.00091 \| — \| 2019 \| 500+ \|
	\| 30 \| Cheng et al. — Wide & Deep \| 1606.07792 \| RecSys \| 2016 \| 4000+ \|
	\| 31 \| McMahan et al. — Ad Click Prediction (FTRL) \| — \| KDD \| 2013 \| 2000+ \|
	\| 32 \| Zhang et al. — Optimal RTB for Display Advertising \| — \| KDD \| 2014 \| 500+ \|