RTB Bidding Algorithm Comparison — Complete Research Resource List

Generated: 2026-05-05 | Author: ML Intern for hamverbot | Repository: https://huggingface.co/hamverbot/rtb-bidding-comparison

Bidding Algorithms
CTR Prediction Models
Clearing Price / Market Price Prediction
Datasets
Codebases & Implementations
Benchmark Leaderboards
Recommended Architecture

1. Bidding Algorithms

1.1 Lagrangian Dual + Online Gradient Descent (BEST MATCH)

Property	Detail
Paper	"Learning to Bid in Repeated First-Price Auctions with Budgets"
Authors	Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong (2023)
Venue	ICML 2023
arXiv	2304.13477
HF Papers	https://huggingface.co/papers/2304.13477
Algorithm	DualOGD — Lagrangian dual multiplier updated by online error gradient descent
Auction Type	First-price (also handles second-price)
Constraints	Budget cap: total spend ≤ ρT
Regret Bound	Õ(√T) for both full-information and one-sided feedback
Key Formula	λ_{t+1} = Proj_{λ≥0}(λ_t − ε·(ρ − c̃_t(b_t)))
Bid Rule	b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b))
Prediction Models Needed	CTR predictor (for v_t), empirical CDF of competing bids (G̃)
Why It's The Best Match	You explicitly described "Lagrangian dual multiplier and updating the dual variables online by error gradient descent" — this is exactly Algorithm 1, line 7.

1.2 Dual Mirror Descent (Second-Price)

Property	Detail
Paper	"The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems"
Authors	Santiago Balseiro, Haihao Lu, Vahab Mirrokni (2020)
Venue	Operations Research (2023) / NeurIPS 2020 Workshop
arXiv	2011.10124
HF Papers	https://huggingface.co/papers/2011.10124
Citations	135+
Algorithm	Dual mirror descent — generalizes OGD with Bregman divergences
Auction Type	Second-price (truthful)
Bid Rule	b_t = v_t / (1 + μ_t)
Dual Update	μ_{t+1} = Proj(μ_t − η·(ρ − payment_t))
Key Insight	In second-price auctions, you don't need a market price model. The dual multiplier naturally paces spending.
Prediction Models	CTR predictor only (no market price model needed)

1.3 Dual Descent with RoS + Budget (Multi-Constraint)

Property	Detail
Paper	"Online Bidding Algorithms for Return-on-Spend Constrained Advertisers"
Authors	Zhe Feng, Swati Padmanabhan, Di Wang (2022)
Venue	WWW 2023 (arXiv preprint 2022)
arXiv	2208.13713
Citations	38+
Algorithm	Two dual variables: λ for RoS, μ for budget
Bid Rule	b_t = ((1+λ_t)/(μ_t+λ_t)) · v_t
Updates	λ_{t+1} = λ_t·exp(-α·(v_t·x_t(b_t) − p_t(b_t))) [multiplicative]; μ_{t+1} = Proj(μ_t − η·(ρ − p_t(b_t))) [sub-gradient]
Key Insight	Can be adapted for your "ensure k% spend" floor — use second dual variable to enforce minimum spend
Prediction Models	CTR predictor (v_t), payment observed
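
The bid rule and the two updates in the table can be written out as below. This is a minimal sketch, not the authors' code: the function names, the step sizes `alpha` and `eta`, and the starting multipliers are assumptions chosen for illustration, and the budget multiplier is projected onto μ ≥ 0.

```python
import math

def ros_budget_bid(v, lam, mu):
    """Bid rule b = ((1 + lam) / (mu + lam)) * v, defined when mu + lam > 0."""
    return (1.0 + lam) / (mu + lam) * v

def ros_budget_update(lam, mu, v, won, payment, rho, alpha, eta):
    """One round of updates; payment must be 0.0 when the auction is lost."""
    gain = v if won else 0.0                          # v_t * x_t(b_t)
    lam = lam * math.exp(-alpha * (gain - payment))   # multiplicative step on the RoS multiplier
    mu = max(0.0, mu - eta * (rho - payment))         # projected sub-gradient step on the budget multiplier
    return lam, mu

# Illustrative starting point (assumed values, not taken from the paper).
lam, mu = 1.0, 0.5
bid = ros_budget_bid(1.0, lam, mu)
lam, mu = ros_budget_update(lam, mu, v=1.0, won=True, payment=0.4,
                            rho=0.1, alpha=0.5, eta=0.5)
```

In this example the win pays less than its value, so the RoS constraint has slack and λ shrinks, while the payment exceeds ρ, so μ grows.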

1.4 RLB — Reinforcement Learning Bidding

Property	Detail
Paper	"Real-Time Bidding by Reinforcement Learning in Display Advertising"
Authors	Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo (2017)
Venue	WSDM 2017
arXiv	1701.02490
HF Papers	https://huggingface.co/papers/1701.02490
GitHub	https://github.com/han-cai/rlb-dp (188 stars)
Algorithm	MDP + Dynamic Programming + Neural value function approximation
State	(t remaining auctions, b remaining budget, x feature vector)
Action	bid price a ∈ [0, b]
Results	+22% clicks over linear bidding at tight budgets on iPinYou
Prediction Models	CTR θ(x) + market price distribution m(δ, x)
Key Insight	Foundational; explicitly models the budget-depletion tradeoff via DP. Superseded by dual methods for budget pacing but still influential.
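
The budget-depletion tradeoff can be illustrated with a tabular value recursion. The sketch below is a simplified reading of the DP idea, not the `rlb-dp` implementation: it assumes integer market prices with a known distribution `market_pmf` that sums to 1, replaces the per-impression CTR with an average `theta_avg`, and uses a hypothetical function name.

```python
def rlb_value_table(market_pmf, theta_avg, horizon, budget):
    """V[t][b]: expected clicks with t auctions left and integer budget b.

    market_pmf[delta] -- probability that the market price equals delta
    theta_avg         -- average CTR used in place of the per-impression CTR
    """
    V = [[0.0] * (budget + 1) for _ in range(horizon + 1)]
    max_price = len(market_pmf) - 1
    for t in range(1, horizon + 1):
        for b in range(budget + 1):
            best, win_mass, win_value = 0.0, 0.0, 0.0
            for a in range(0, min(b, max_price) + 1):
                # Bidding a wins every auction whose market price is <= a.
                win_mass += market_pmf[a]
                win_value += market_pmf[a] * (theta_avg + V[t - 1][b - a])
                # Losing keeps the budget intact for the remaining t - 1 auctions.
                best = max(best, win_value + (1.0 - win_mass) * V[t - 1][b])
            V[t][b] = best
    return V

# Toy market: price is 1 or 2 with equal probability.
V = rlb_value_table([0.0, 0.5, 0.5], theta_avg=0.1, horizon=2, budget=2)
```

With two auctions left and a budget of 2, bidding 1 or 2 both yield 0.125 expected clicks here, more than the 0.1 obtained by skipping the first auction.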

1.5 HiBid — Industrial Hierarchical Dual-RL

Property	Detail
Paper	"HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning"
Authors	Yuhang Wang et al. (2023)
arXiv	2312.17503
HF Papers	https://huggingface.co/papers/2312.17503
Algorithm	High-level RL budget allocation + Low-level λ-parameterized bidding
Scale	64K advertisers, 70M requests/day, 4 channels, deployed at Meituan
Results	Outperforms RL-based baselines (R-BCQ, BCQ, CQL) on clicks, CPC, CSR, ROI

1.6 Contextual First-Price Extension (Very Recent!)

Property	Detail
Paper	"Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback"
Authors	(2026)
arXiv	2603.07207
Algorithm	Dual OGD + quantile-based contextual censored regression
Key Innovation	Extends Wang et al. (2023) to handle contextual (feature-based) auctions with a novel quantile trick for censored data
Regret	Õ(√T) in contextual first-price auctions

1.7 Unified View of Lagrangian Dual Multiplier Methods

All dual methods follow the same template:

For each auction t:
1. Observe value v_t (from CTR prediction × click value)
2. Compute bid: b_t = f(v_t, dual_multiplier_t)
3. Observe outcome: payment c_t (if won) or 0 (if lost)
4. Compute gradient: g_t = ρ − c_t
5. Update multiplier: λ_{t+1} = Proj_{λ≥0}(λ_t − η·g_t)

Method	Auction	Bid Function f(v, λ)
Wang 2023	First-price	argmax_b (r̃(v,b) − λ·c̃(b))
Balseiro 2020	Second-price	v / (1+λ)
Feng 2022	Second-price	((1+λ_RoS)/(λ_RoS+λ_budget)) · v
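
The template and the second-price bid function from the table above can be sketched as a short simulation. This is an illustration under stated assumptions, not the implementation from any cited paper: values and competing bids are drawn from uniform distributions purely for demonstration, the competing bid is assumed observable, and `run_dual_pacing`, `rho`, and `eta` are hypothetical names.

```python
import random

def run_dual_pacing(values, competing_bids, rho, eta, lam0=0.0):
    """Second-price pacing with a single dual multiplier.

    values[t]         -- estimated value v_t (e.g. pCTR x value per click)
    competing_bids[t] -- highest competing bid d_t
    rho               -- target spend per auction (budget / number of auctions)
    eta               -- step size of the dual update
    """
    lam, spend, wins = lam0, 0.0, 0
    budget = rho * len(values)
    for v, d in zip(values, competing_bids):
        bid = v / (1.0 + lam)                 # step 2: b_t = v_t / (1 + lambda_t)
        bid = min(bid, budget - spend)        # never bid more than the remaining budget
        won = bid > d
        cost = d if won else 0.0              # step 3: second-price payment, 0 if lost
        spend += cost
        wins += int(won)
        grad = rho - cost                     # step 4: g_t = rho - c_t
        lam = max(0.0, lam - eta * grad)      # step 5: projected gradient step
    return lam, spend, wins

rng = random.Random(0)
T = 5000
values = [rng.uniform(0.0, 1.0) for _ in range(T)]
competing = [rng.uniform(0.0, 1.0) for _ in range(T)]
final_lam, total_spend, n_wins = run_dual_pacing(values, competing, rho=0.05, eta=0.05)
```

Capping each bid at the remaining budget is a safety choice added for this sketch; it guarantees the hard budget is never exceeded regardless of the step size.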

1.8 Additional Papers (Supplementary)

Paper	Key Idea	arXiv
Dynamic Budget Throttling	Throttle participation rate to control spend	2207.04690
No-Regret Learning in Repeated First-Price Auctions with Budget Constraints	General no-regret framework for budgeted first-price bidding	2205.14572
Robust Budget Pacing with a Single Sample	Near-optimal regret from 1 sample per distribution	2302.02006
Learning to Bid Optimally in Adversarial First-Price	Adversarial (non-i.i.d.) setting	2007.04568
Optimal No-Regret Learning in Repeated FPA	Minimax optimal bounds	2003.09795
Multi-Channel Autobidding with Budget and ROI	Optimality of per-channel optimization	2302.01523
Leveraging the Hints: Adaptive Bidding	Uses hints/forecasts for better bidding	2211.06358
Adaptive Bidding under Non-stationarity	Handles distribution shift	2505.02796
Joint Value Estimation and Bidding	Simultaneous CTR learning + bidding	2502.17292
No-Regret is not enough!	Adaptive regret for constrained bandits	2405.06575
AIGB: Generative Auto-bidding	Diffusion models for bid trajectory generation	2405.16141

Two-Sided Budget Constraint (Your Specific Need)

You need: maximize clicks s.t. spend ≤ B AND spend ≥ k·B.

This requires two dual variables:

μ for the budget cap: μ_{t+1} = Proj(μ_t − η₁·(ρ − spend_t))
ν for the spend floor: ν_{t+1} = Proj(ν_t − η₂·(spend_t − kρ))

Bid function: b_t = v_t · f(μ_t, ν_t) where f decreases with μ and increases with ν.
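
One way to make this concrete is sketched below. The bid function f(μ, ν) = 1 / max(m, 1 + μ − ν) is an assumption chosen because it extends the second-price rule v / (1 + μ) and has the required monotonicity; it is not taken from any of the cited papers. The names `two_sided_bid`, `two_sided_update`, and `min_denominator` are hypothetical.

```python
def two_sided_bid(v, mu, nu, min_denominator=0.1):
    """Hypothetical bid function: v / max(min_denominator, 1 + mu - nu)."""
    return v / max(min_denominator, 1.0 + mu - nu)

def two_sided_update(mu, nu, spend_t, rho, k, eta_cap, eta_floor):
    """One projected gradient step for the cap multiplier mu and the floor multiplier nu."""
    mu = max(0.0, mu - eta_cap * (rho - spend_t))        # grows when spend_t > rho
    nu = max(0.0, nu - eta_floor * (spend_t - k * rho))  # grows when spend_t < k * rho
    return mu, nu

# Overspending relative to rho raises mu and lowers the bid.
mu_over, nu_over = two_sided_update(0.0, 0.0, spend_t=0.30, rho=0.10, k=0.8,
                                    eta_cap=0.5, eta_floor=0.5)
bid_after_overspend = two_sided_bid(1.0, mu_over, nu_over)

# Underspending relative to k * rho raises nu and lifts the bid.
mu_under, nu_under = two_sided_update(0.0, 0.0, spend_t=0.0, rho=0.10, k=0.8,
                                      eta_cap=0.5, eta_floor=0.5)
bid_after_underspend = two_sided_bid(1.0, mu_under, nu_under)
```

The floor on the denominator keeps the bid finite when ν grows faster than μ; its value is a tuning choice.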

2. CTR Prediction Models

2.1 FinalMLP (RECOMMENDED — Best AUC, Fastest Inference)

Property	Detail
Paper	"FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction"
Authors	Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong (2023)
Venue	AAAI 2023
arXiv	2304.00902
HF Papers	https://huggingface.co/papers/2304.00902
Datasets	reczoo/Criteo_x1, reczoo/Avazu_x1, reczoo/MovielensLatest_x1, reczoo/Frappe_x1
Criteo AUC	0.8149
Avazu AUC	0.7666
Architecture	Two-stream MLP: two independent MLP towers + feature gating (soft selection) + bilinear fusion
Inference Speed	Fastest among SOTA (pure MLP, ~400-dim hidden, no attention)
Why Best for RTB	Pure feed-forward, <1ms inference, easy to deploy

2.2 GDCN — Gated Deep Cross Network

Property	Detail
Paper	"Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction"
Authors	Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu (2023)
Venue	CIKM 2023
arXiv	2311.04635
HF Papers	https://huggingface.co/papers/2311.04635
Criteo AUC	0.8161 (own split — not directly comparable)
Architecture	DCNv2 + learned information gate per cross layer + Field-level Dimension Optimization (FDO)
Key Insight	Gate filters noisy interactions; FDO compresses embeddings 60%+. Good for memory-constrained RTB.

2.3 DCNv2 — Industry Workhorse

Property	Detail
Paper	"DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems"
Authors	Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi (2021)
Venue	WWW 2021
arXiv	2008.13535
HF Papers	https://huggingface.co/papers/2008.13535
Criteo AUC	0.8142-0.8144 (retuned)
Architecture	Embedding → parallel CrossNetV2 + DNN → concat → sigmoid
Key Insight	Mixture-of-Experts-style low-rank decomposition. Battle-tested at Google.

2.4 DeepFM — Simple, Strong Baseline

Property	Detail
Paper	"DeepFM: A Factorization-Machine based Neural Network for CTR Prediction"
Authors	Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He (2017)
Venue	IJCAI 2017
Criteo AUC	0.8138 (retuned)
Architecture	Shared embedding → parallel FM (2nd-order) + DNN → sum → sigmoid
Key Insight	Shared embedding between FM and DNN is the secret. End-to-end, no pre-training.

2.5 FCN — Fusing Cross Network (Most Recent)

Property	Detail
Paper	"FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction"
Authors	(2024)
arXiv	2407.13349
HF Papers	https://huggingface.co/papers/2407.13349
Architecture	Two explicit cross sub-networks: LCN (linear, order grows linearly) + ECN (exponential, order doubles per layer)
Key Insight	No DNN needed — all interactions explicit. 50% fewer params, 23% lower latency than DCNv2.
Caveat	Newer paper with less community validation. GitHub: https://github.com/salmon1802/FCN

2.6 BARS Meta-Finding

Property	Detail
Paper	"BARS-CTR: Open Benchmarking for Click-Through Rate Prediction"
Authors	Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He (2021)
Venue	CIKM 2021
arXiv	2009.05794
HF Papers	https://huggingface.co/papers/2009.05794
Key Result	After 7,000+ experiments and 12,000 GPU hours: differences between SOTA deep CTR models are surprisingly small (~0.1-0.3% AUC). Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering. All models converge to ~0.814 AUC on Criteo after proper tuning.

2.7 Additional CTR Papers

Paper	Key Idea	arXiv
DIN (KDD 2018)	Attention over user behavior sequence	1706.06978
DIEN (AAAI 2019)	Interest evolution with GRU + attention	1809.03672
xDeepFM (KDD 2018)	Compressed Interaction Network (CIN) for vector-wise crosses	1803.05170
AutoInt (CIKM 2019)	Multi-head self-attention for feature interactions	1810.11921
DLRM (Meta, 2019)	Specialized for recommendation: MLP for dense + embedding for sparse	1906.00091
Wide & Deep (Google, 2016)	Memorization (wide) + generalization (deep)	1606.07792
FTRL-Proximal (KDD 2013)	"Ad Click Prediction: a View from the Trenches" — online learning for linear CTR	—
Streaming CTR (2023)	Online CTR with non-stationary data	2307.07509

2.8 Latency Considerations for RTB

Model	Architecture	Inference Speed	RTB-Suitable
FinalMLP	Pure MLP	⭐⭐⭐⭐⭐ (<1ms)	✅ Best
DCNv2	CrossNet + DNN	⭐⭐⭐⭐	✅
GDCN	Gated Cross + DNN	⭐⭐⭐⭐	✅
DeepFM	FM + DNN	⭐⭐⭐⭐	✅
FCN	LCN + ECN (no DNN)	⭐⭐⭐⭐	✅
DIN	Attention (user history)	⭐⭐	❌ Too slow
DIEN	GRU + attention	⭐	❌ Too slow
AutoInt	Multi-head attention	⭐⭐	❌ Too slow

3. Clearing Price / Market Price Prediction

3.1 Non-Parametric Empirical CDF (RECOMMENDED BASELINE)

Property	Detail
Source	Wang et al. (2023), Algorithm 1, Section 3.1
arXiv	2304.13477
Method	Maintain array of observed competing bids d_s, estimate G̃_t(b) = (1/(t-1))∑𝟙{b ≥ d_s}
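Win Probability	P(win|b) = G̃_t(b)
Expected Cost	Second-price: E[cost|win,b] = mean of {d_s : d_s ≤ b}, i.e. (1/G̃_t(b)) · (1/(t-1))∑ d_s·𝟙{d_s ≤ b}. First-price: the winner pays its own bid, so the expected cost is c̃_t(b) = b·G̃_t(b)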
Pros	No model training needed, theoretically sound (Õ(√T) regret), handles distribution shift naturally
Cons	No context/features, cold-start when t is small
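
The estimator and its use in the first-price bid rule can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: it assumes the first-price forms r̃(v, b) = (v − b)·G̃(b) and c̃(b) = b·G̃(b), searches a fixed bid grid instead of solving the argmax exactly, and the class and function names are hypothetical.

```python
import bisect

class EmpiricalBidCDF:
    """Empirical CDF of observed competing bids: G(b) = fraction of d_s <= b."""

    def __init__(self):
        self._sorted = []

    def observe(self, d):
        bisect.insort(self._sorted, d)

    def win_prob(self, b):
        if not self._sorted:
            return 0.0
        return bisect.bisect_right(self._sorted, b) / len(self._sorted)

    def mean_price_given_win(self, b):
        """Mean of {d_s : d_s <= b}, the expected second-price payment on a win."""
        n_below = bisect.bisect_right(self._sorted, b)
        return sum(self._sorted[:n_below]) / n_below if n_below else 0.0

def first_price_bid(cdf, v, lam, grid):
    """Grid search for argmax_b (v - b) * G(b) - lam * b * G(b)."""
    def objective(b):
        g = cdf.win_prob(b)
        return (v - b) * g - lam * b * g
    return max(grid, key=objective)

cdf = EmpiricalBidCDF()
for d in [0.1, 0.2, 0.3, 0.9]:
    cdf.observe(d)

grid = [i / 100 for i in range(101)]
bid_unpaced = first_price_bid(cdf, v=1.0, lam=0.0, grid=grid)
bid_paced = first_price_bid(cdf, v=1.0, lam=2.0, grid=grid)
```

A larger multiplier penalizes expected cost more heavily, so the paced bid is lower than the unpaced one on the same history.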

3.2 Censored Linear Regression (Wu et al. 2015)

Property	Detail
Paper	"Predicting Winning Price in Real Time Bidding with Censored Data"
Authors	Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen (2015)
Venue	KDD 2015
Citations	~101
Method	Tobit-like model: log(market_price) = β·x + ε, ε ~ N(0, σ²)
Key Insight	Properly handles censoring via the likelihood: winning auctions contribute the density f(price|x), losing auctions contribute the survival probability S(bid|x) = P(price > bid | x)
Pros	Contextual, simple, computationally cheap
Cons	Linear model — limited capacity for complex interactions
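
The censored likelihood described above can be written down directly. The sketch below assumes the log-normal form from the Method row and works in log-price space (the change-of-variables term is omitted because it does not depend on the parameters); `censored_nll` and the row layout are hypothetical, and a production version would need a numerically stable log-survival function.

```python
import math

def normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_logsf(z):
    # log(1 - Phi(z)) via the complementary error function
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_nll(beta, sigma, rows):
    """Negative log-likelihood of log(price) = beta . x + eps, eps ~ N(0, sigma^2).

    rows: iterable of (x, won, amount); amount is the observed market price
    if won, otherwise the losing bid (a lower bound on the market price).
    """
    nll = 0.0
    for x, won, amount in rows:
        mean = sum(b * xi for b, xi in zip(beta, x))
        z = (math.log(amount) - mean) / sigma
        if won:
            nll -= normal_logpdf(z) - math.log(sigma)   # density term f(price | x)
        else:
            nll -= normal_logsf(z)                      # survival term S(bid | x)
    return nll

# One won auction and one lost auction, both at the predicted price level.
rows = [([1.0], True, math.e), ([1.0], False, math.e)]
nll_at_truth = censored_nll([1.0], 1.0, rows)

# A losing bid far below the predicted price carries almost no information.
nll_low_bid = censored_nll([1.0], 1.0, [([1.0], False, 0.01)])
```

Minimizing this quantity over `beta` and `sigma` with any gradient-based or derivative-free optimizer gives the censored regression fit.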

3.3 Deep Censored Learning / Survival Analysis

Property	Detail
Paper	"Deep Censored Learning of the Winning Price in the Real Time Bidding" (Wu et al., KDD 2018)
Method	Neural network trained with censored survival loss
Loss	Winning: -log f(price|x); Losing: -log S(bid|x)
Library	TorchSurv (arXiv:2404.10761, Novartis, 200★ GitHub)
TorchSurv URL	https://github.com/Novartis/torchsurv
TorchSurv Docs	https://opensource.nibr.com/torchsurv/
PyPI	`pip install torchsurv`
Key Insight	Proper survival framework handles censoring. Win = exact price observed (uncensored). Loss = only lower bound (censored at bid).
Architecture	Deep FC predicting either hazard rate λ(t|x) (Cox PH) or distribution parameters (Weibull/log-normal AFT)

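# TorchSurv pattern for market price (placeholder model and data for illustration):
import torch
from torchsurv.loss import cox

model = torch.nn.Linear(16, 1)             # stand-in for the deep network
features = torch.randn(32, 16)             # (batch, n_features)
event = torch.rand(32) > 0.5               # bool: True if won (price observed), False if lost (censored)
time = torch.rand(32) * 100 + 1            # market_price if won, own bid if lost
log_hazard = model(features).squeeze(-1)   # shape (batch,)
# Cox partial likelihood learns a relative ranking of prices, not a full price distribution
loss = cox.neg_partial_log_likelihood(log_hazard, event, time)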

3.4 Win Probability Neural Network (Simplest ML)

Property	Detail
Method	Direct binary classification: P(win|bid_price, features)
Pros	Dead simple, works with standard BCELoss
Cons	Ignores the market price observed on wins; uses only the binary win/loss signal
Input	features + bid_price → sigmoid

3.5 Parametric Distribution Fitting

Property	Detail
Paper	Referenced in RLB (Cai et al. 2017) — "Functional Bid Landscape Forecasting" (ECML-PKDD 2016)
Method	Fit log-normal or gamma distribution to observed winning prices; predict parameters from features using GBDT
Pros	Parametric assumptions reduce variance
Cons	Distribution assumption may not hold; doesn't properly handle censoring

3.6 Contextual Quantile-Based (2026)

Property	Detail
Paper	"Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback"
arXiv	2603.07207
Method	Models competing bid as d_t = α·x_t + z_t (linear contextual); quantile-based estimator for α
Key Trick	Splits samples by bid quantile and exploits identifiable conditional quantiles to circumvent full censoring
Pros	Theoretical guarantees in contextual setting
Cons	Linear contextual model only; very recent

3.7 Comparison Summary

Method	Contextual?	Handles Censoring?	Model Training?	Complexity
Empirical CDF	❌	N/A (full info)	None	Minimal
Censored Linear Reg	✅	✅ (proper likelihood)	Linear model	Low
Deep Survival (TorchSurv)	✅	✅ (proper likelihood)	Neural net	Medium
Win Prob Classifier	✅	❌ (binary only)	Neural net	Low
Parametric (log-normal)	Optional	❌	GBDT	Medium
Quantile Censored	✅	✅ (quantile trick)	Linear	Medium-High

4. Datasets

4.1 CTR Prediction Datasets

Dataset	HF Hub Path	Size	Fields	Label	Verified
Criteo_x4	`reczoo/Criteo_x4`	45.8M rows, 5.6GB	13 dense (I1-I13) + 26 categorical (C1-C26)	`Label` (0/1)	✅
Avazu_x4	`reczoo/Avazu_x4`	40.4M rows, 1.8GB	22 fields (mixed)	`click` (0/1)	✅
Criteo_x1	`reczoo/Criteo_x1`	~11M rows	Same as x4	`Label`	✅
Avazu_x1	`reczoo/Avazu_x1`	~10M rows	Same as x4	`click`	✅

Standard split: 80% train / 10% val / 10% test (BARS protocol).

4.2 RTB Bidding Datasets

Dataset	Source	Size	Format	Availability
iPinYou	data.computational-advertising.org	19.5M impressions, 9 campaigns, 10 days (2013)	Bid logs with market price	External download only (NOT on HF Hub)
YOYI	Various academic mirrors	~400M records	Bid logs	External download only

iPinYou format: (click, paying_price, bid_price, slot_id, user_tags, ...) — already includes market price info needed for bidding simulation.

Key Gap: No RTB bid-log datasets on HuggingFace Hub. Criteo/Avazu have click labels but no bid/price columns — they can only be used for CTR training and require synthetic price generation for bidding evaluation.
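
One possible way to generate synthetic prices for Criteo/Avazu impressions is sketched below. Everything in it is an assumption made for illustration: the log-normal shape, the link between predicted CTR and median price, and the parameter names `base_price`, `ctr_elasticity`, and `noise_sigma`. Results from such a simulation depend heavily on these choices and should not be read as evidence about real markets.

```python
import math
import random

def synthetic_market_prices(pctrs, base_price=50.0, ctr_elasticity=0.5,
                            noise_sigma=0.4, seed=0):
    """Draw one log-normal market price per impression.

    The median price scales with (pCTR / mean pCTR) ** ctr_elasticity, so
    impressions that look more valuable also tend to be more contested.
    All pCTR values must be positive.
    """
    rng = random.Random(seed)
    mean_pctr = sum(pctrs) / len(pctrs)
    prices = []
    for p in pctrs:
        median = base_price * (p / mean_pctr) ** ctr_elasticity
        prices.append(median * math.exp(rng.gauss(0.0, noise_sigma)))
    return prices

pctrs = [0.001, 0.002, 0.004]
prices = synthetic_market_prices(pctrs, seed=1)
noiseless = synthetic_market_prices(pctrs, noise_sigma=0.0)
```

Setting `ctr_elasticity=0` makes prices independent of the impression, which is a useful sanity check for bidders that should then gain nothing from a market price model.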

4.3 Data Requirements for Each Algorithm

Algorithm	Needs from Dataset
Dual OGD (Wang)	click labels (CTR training) + competing bids (or synthetic prices for simulation)
Dual Mirror Descent (Balseiro)	click labels + auction payment (second-price)
RLB (Cai)	click labels + market prices + impression features
CTR models (all)	click labels + features (Criteo/Avazu: ✅)
Clearing price models	observed prices (won auctions) + bids (lost auctions)

5. Codebases & Implementations

5.1 CTR Model Libraries

Library	URL	Models	Framework	Notes
FuxiCTR	https://github.com/reczoo/FuxiCTR	40+ (FinalMLP, DeepFM, DCNv2, GDCN, FCN, xDeepFM, AutoInt)	PyTorch	Config-driven (YAML). Used to produce the BARS benchmark results.
DeepCTR-Torch	https://github.com/shenweichen/DeepCTR-Torch	20+ (DeepFM, DCN, DIN, DIEN, xDeepFM)	PyTorch	Simpler API (Python class). Good for quick prototyping.
TorchSurv	https://github.com/Novartis/torchsurv	Cox PH, Weibull AFT, DeepSurv, DeepHit	PyTorch	Deep survival analysis for clearing price.
BARS	https://github.com/openbenchmark/BARS	Benchmarking	—	Standardized evaluation pipeline. 389★

5.2 Bidding Algorithm Implementations

Repo	URL	Algorithms	Notes
rlb-dp	https://github.com/han-cai/rlb-dp	RLB (MDP + DP)	188 stars. Original implementation of RL for RTB.
budget_constrained_bidding	https://github.com/dingmu365/budget_constrained_bidding	Budget-constrained RTB	Contains multiple budget-constrained bidding algorithms.
budget_constrained_bidding (fork)	https://github.com/GinNie23/budget_constrained_bidding	Same	Fork with modifications.
Budget_Constrained_Bidding	https://github.com/venkatacrc/Budget_Constrained_Bidding	Same	Another implementation.
hamverbot/rtb-bidding-comparison	https://huggingface.co/hamverbot/rtb-bidding-comparison	DualOGD, Linear, ORTB, Threshold, MPC	Your repo — already has a working comparison framework!

5.3 FuxiCTR Quick Start

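pip install fuxictr
git clone https://github.com/reczoo/FuxiCTR   # provides model_zoo/ and the run scripts

# Illustrative settings only. FuxiCTR reads them from dataset_config.yaml and
# model_config.yaml under the model's config/ directory; check the shipped
# FinalMLP config for the exact key names.
dataset_id: Criteo_x4
model: FinalMLP
embedding_dim: 10
hidden_units: [400, 400, 400]
batch_size: 4096
learning_rate: 1e-3
epochs: 10
metrics: [auc, logloss]

# Launch one experiment from the model directory (use an expid defined in config/):
cd FuxiCTR/model_zoo/FinalMLP
python run_expid.py --config ./config --expid FinalMLP_test --gpu 0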

5.4 DeepCTR-Torch Quick Start

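pip install deepctr-torch

import torch
from deepctr_torch.models import DeepFM
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_feature_names

# Assumes df is a pandas DataFrame whose categorical_cols are label-encoded to 0..n-1
# and whose numerical_cols are scaled; label_col names the click column.
sparse_features = [SparseFeat(f, vocabulary_size=df[f].nunique(), embedding_dim=10)
                   for f in categorical_cols]
dense_features = [DenseFeat(f, 1) for f in numerical_cols]
feature_columns = sparse_features + dense_features

# DeepCTR-Torch expects a dict mapping feature name to a column of values
train_input = {name: df[name].values for name in get_feature_names(feature_columns)}
train_labels = df[label_col].values

model = DeepFM(linear_feature_columns=feature_columns,
               dnn_feature_columns=feature_columns,
               dnn_hidden_units=(400, 400, 400),
               device='cuda' if torch.cuda.is_available() else 'cpu')
model.compile('adam', 'binary_crossentropy', metrics=['auc'])
model.fit(train_input, train_labels, batch_size=4096, epochs=10)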

6. Benchmark Leaderboards

Leaderboard	URL	Description
BARS CTR Criteo_x4	https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x4.html	Definitive CTR benchmark (24 models compared)
BARS CTR Criteo_x1	https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x1.html	Smaller Criteo subset
BARS CTR Avazu	https://openbenchmark.github.io/BARS/CTR/leaderboard/avazu_x4.html	Avazu benchmark
BARS Main	https://openbenchmark.github.io/BARS	Full recommender systems benchmark

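Top Criteo AUC scores cited in this document (FinalMLP's figure is reported on Criteo_x1, the dataset listed in Section 2.1, so the ranking is indicative rather than a single-split comparison):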

FinalMLP: 0.8149
DCNv2: 0.8142
DeepFM: 0.8138
xDeepFM: 0.8136
AutoInt+: 0.8134

Key takeaway: the top 5 models are within 0.0015 AUC (0.15 percentage points) of each other.

7. Recommended Architecture

For Your Problem: "Lagrangian Dual Multiplier with Online Error Gradient Descent"

┌─────────────────────────────────────────────────────────────┐
│                   BIDDING ALGORITHM                          │
│                                                              │
│  Dual OGD (Wang et al. 2023)                                 │
│  λ_{t+1} = Proj(λ_t - ε·(ρ - c̃_t(b_t)))                    │
│  b_t = argmax_b (r̃_t(v_t, b) - λ_t·c̃_t(b))                 │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│                 PREDICTION MODELS                            │
│                                                              │
│  ┌──────────────────┐    ┌──────────────────────┐            │
│  │  CTR Predictor   │    │  Clearing Price Est. │            │
│  │  (FinalMLP)      │    │  (Empirical CDF      │            │
│  │                  │    │   OR TorchSurv)      │            │
│  │  v_t = pCTR × V  │    │  G̃(b) = P(win | b)   │            │
│  └──────────────────┘    └──────────────────────┘            │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│                     DATASETS                                 │
│                                                              │
│  Criteo_x4 (CTR training) + iPinYou (bidding simulation)     │
│  OR: Criteo_x4 + synthetic price generation                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Implementation Priority

Phase 1: Improve CTR model — replace current LogisticRegression with FinalMLP trained on Criteo_x4 (via FuxiCTR)
Phase 2: Improve clearing price — add TorchSurv-based censored regression alongside current empirical CDF
Phase 3: Add Balseiro dual mirror descent for comparison (simpler baseline, no market price model)
Phase 4: Add two-sided budget constraint (cap + floor) with two dual variables
Phase 5: Full sweep over hyperparameters: step size ε, budget fraction k%, value per click, CTR model architecture

Online Learning Note

For production RTB where the environment is non-stationary, implement periodic retraining:

Save model checkpoint every N hours
Reload and train on sliding window of most recent data
Deploy updated model without restarting the bidding algorithm

The Lagrangian multiplier λ is intrinsically online (updated per auction). The CTR model needs separate periodic retraining.
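
The retraining loop can be sketched independently of any particular CTR model. In the sketch below the trigger is a count of new examples rather than wall-clock hours, and `train_fn` is a hypothetical callable that maps a list of recent examples to a fitted model; both are assumptions made to keep the example self-contained.

```python
from collections import deque

class SlidingWindowRetrainer:
    """Keeps the most recent examples and refits the model at a fixed cadence."""

    def __init__(self, train_fn, window_size, retrain_every):
        self.train_fn = train_fn                  # hypothetical: callable(list_of_examples) -> model
        self.window = deque(maxlen=window_size)   # old examples fall out automatically
        self.retrain_every = retrain_every        # number of new examples between retrains
        self.since_retrain = 0
        self.model = None

    def add(self, example):
        self.window.append(example)
        self.since_retrain += 1
        if self.since_retrain >= self.retrain_every:
            # Swap the model reference in one assignment; the bidder keeps its dual state.
            self.model = self.train_fn(list(self.window))
            self.since_retrain = 0
        return self.model

# Toy "model": the mean click rate over the window.
retrainer = SlidingWindowRetrainer(lambda rows: sum(rows) / len(rows),
                                   window_size=4, retrain_every=2)
models_seen = [retrainer.add(click) for click in [1, 0, 1, 1, 0, 0]]
```

Because the multiplier λ lives in the bidding loop rather than in the model object, swapping `retrainer.model` does not reset pacing.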

Paper Index (All Papers Referenced)

#	Paper	arXiv	Venue	Year	Citations
1	Wang et al. — Learning to Bid in Repeated First-Price Auctions with Budgets	2304.13477	ICML	2023	Growing
2	Balseiro et al. — Dual Mirror Descent for Online Allocation	2011.10124	Ops Research	2020	135+
3	Feng et al. — Online Bidding for RoS Constrained Advertisers	2208.13713	WWW	2023	38+
4	Cai et al. — RTB by RL in Display Advertising	1701.02490	WSDM	2017	300+
5	Wang et al. — HiBid Hierarchical DRL Bidding	2312.17503	—	2023	New
6	— Online Bidding for Contextual First-Price (Quantile)	2603.07207	—	2026	New
7	Mao et al. — FinalMLP	2304.00902	AAAI	2023	Growing
8	Wang et al. — GDCN	2311.04635	CIKM	2023	Growing
9	Wang et al. — DCN V2	2008.13535	WWW	2021	500+
10	Guo et al. — DeepFM	—	IJCAI	2017	3000+
11	— FCN: Fusing Cross Network	2407.13349	—	2024	New
12	Zhu et al. — BARS-CTR Benchmark	2009.05794	CIKM	2021	100+
13	Wu et al. — Predicting Winning Price with Censored Data	—	KDD	2015	101
14	Wu et al. — Deep Censored Learning of the Winning Price	—	KDD	2018	Well-cited
15	Katzman et al. — DeepSurv	—	BMC	2018	1000+
16	— TorchSurv	2404.10761	—	2024	New
17	— Robust Budget Pacing with a Single Sample	2302.02006	—	2023	Growing
18	— Multi-Channel Autobidding with Budget and ROI	2302.01523	—	2023	Growing
19	— No-Regret in Repeated FPA with Budgets	2205.14572	—	2022	14
20	— Dynamic Budget Throttling	2207.04690	—	2022	6
21	— AIGB: Generative Auto-bidding	2405.16141	—	2024	New
22	— Adaptive Bidding under Non-Stationarity	2505.02796	—	2025	2
23	— Joint Value Estimation and Bidding	2502.17292	—	2025	4
24	— Leveraging Hints: Adaptive Bidding	2211.06358	—	2022	13
25	Zhou et al. — DIN	1706.06978	KDD	2018	2000+
26	Zhou et al. — DIEN	1809.03672	AAAI	2019	1000+
27	Lian et al. — xDeepFM	1803.05170	KDD	2018	1000+
28	Song et al. — AutoInt	1810.11921	CIKM	2019	500+
29	Naumov et al. — DLRM (Meta)	1906.00091	—	2019	500+
30	Cheng et al. — Wide & Deep	1606.07792	RecSys	2016	4000+
31	McMahan et al. — Ad Click Prediction (FTRL)	—	KDD	2013	2000+
32	Zhang et al. — Optimal RTB for Display Advertising	—	KDD	2014	500+