RTB Bidding Algorithm Comparison: Complete Research Resource List
Generated: 2026-05-05 | Author: ML Intern for hamverbot
Repository: https://huggingface.co/hamverbot/rtb-bidding-comparison
Table of Contents
- Bidding Algorithms
- CTR Prediction Models
- Clearing Price / Market Price Prediction
- Datasets
- Codebases & Implementations
- Benchmark Leaderboards
- Recommended Architecture
1. Bidding Algorithms
1.1 Lagrangian Dual + Online Gradient Descent (BEST MATCH)
| Property | Detail |
|---|---|
| Paper | "Learning to Bid in Repeated First-Price Auctions with Budgets" |
| Authors | Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong (2023) |
| Venue | NeurIPS 2023 (implied) |
| arXiv | 2304.13477 |
| HF Papers | https://huggingface.co/papers/2304.13477 |
| Algorithm | DualOGD: Lagrangian dual multiplier updated by online error gradient descent |
| Auction Type | First-price (also handles second-price) |
| Constraints | Budget cap: total spend ≤ ρT |
| Regret Bound | Õ(√T) for both full-information and one-sided feedback |
| Key Formula | λ_{t+1} = Proj_{λ>0}(λ_t − ε·(ρ − c̃_t(b_t))) |
| Bid Rule | b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b)) |
| Prediction Models Needed | CTR predictor (for v_t), empirical CDF of competing bids (G̃) |
| Why It's The Best Match | You explicitly described "Lagrangian dual multiplier and updating the dual variables online by error gradient descent"; this is exactly Algorithm 1, line 7. |

1.2 Dual Mirror Descent (Second-Price)
| Property | Detail |
|---|---|
| Paper | "The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems" |
| Authors | Santiago Balseiro, Haihao Lu, Vahab Mirrokni (2020) |
| Venue | Operations Research (2023) / NeurIPS 2020 Workshop |
| arXiv | 2011.10124 |
| HF Papers | https://huggingface.co/papers/2011.10124 |
| Citations | 135+ |
| Algorithm | Dual mirror descent, which generalizes OGD with Bregman divergences |
| Auction Type | Second-price (truthful) |
| Bid Rule | b_t = v_t / (1 + μ_t) |
| Dual Update | μ_{t+1} = Proj(μ_t − η·(ρ − payment_t)) |
| Key Insight | In second-price auctions, you don't need a market price model. The dual multiplier naturally paces spending. |
| Prediction Models | CTR predictor only (no market price model needed) |

1.3 Dual Descent with RoS + Budget (Multi-Constraint)
| Property | Detail |
|---|---|
| Paper | "Online Bidding Algorithms for Return-on-Spend Constrained Advertisers" |
| Authors | Zhe Feng, Swati Padmanabhan, Di Wang (2022) |
| Venue | ICML 2022 |
| arXiv | 2208.13713 |
| Citations | 38+ |
| Algorithm | Two dual variables: λ for RoS, μ for budget |
| Bid Rule | b_t = ((1+λ_t)/(μ_t+λ_t)) · v_t |
| Updates | λ_{t+1} = λ_t·exp(−α·(v_t·x_t(b_t) − p_t(b_t))) [multiplicative]; μ_{t+1} = Proj(μ_t − η·(ρ − p_t(b_t))) [sub-gradient] |
| Key Insight | Can be adapted for your "ensure k% spend" floor: use a second dual variable to enforce minimum spend |
| Prediction Models | CTR predictor (v_t), payment observed |

1.4 RLB: Reinforcement Learning Bidding

| Property | Detail |
|---|---|
| Paper | "Real-Time Bidding by Reinforcement Learning in Display Advertising" |
| Authors | Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo (2017) |
| Venue | WSDM 2017 |
| arXiv | 1701.02490 |
| HF Papers | https://huggingface.co/papers/1701.02490 |
| GitHub | https://github.com/han-cai/rlb-dp (188 stars) |
| Algorithm | MDP + Dynamic Programming + Neural value function approximation |
| State | (t remaining auctions, b remaining budget, x feature vector) |
| Action | bid price a ∈ [0, b] |
| Results | +22% clicks over linear bidding at tight budgets on iPinYou |
| Prediction Models | CTR θ(x) + market price distribution m(δ, x) |
| Key Insight | Foundational; explicitly models the budget-depletion tradeoff via DP. Superseded by dual methods for budget pacing but still influential. |

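
The budget-depletion tradeoff that RLB models can be illustrated with a small tabular dynamic program. This is a simplified sketch, not the authors' implementation: it assumes integer prices, a feature-independent market price distribution `market_pmf` (standing in for m(δ, x)), a constant `avg_ctr` (standing in for θ(x)), and that the winner pays the market price. The function name and arguments are hypothetical.

```python
def rlb_value_table(num_auctions, budget, market_pmf, avg_ctr):
    """Tabulate V[t][b]: expected clicks with t auctions left and budget b.

    market_pmf[d] is the probability that the market price equals d
    (integer price units). avg_ctr is a constant stand-in for theta(x).
    Both simplifications are assumptions made for this sketch.
    """
    max_price = len(market_pmf) - 1
    V = [[0.0] * (budget + 1) for _ in range(num_auctions + 1)]
    policy = [[0] * (budget + 1) for _ in range(num_auctions + 1)]
    for t in range(1, num_auctions + 1):
        for b in range(budget + 1):
            best_val, best_bid = float("-inf"), 0
            win_term, win_mass = 0.0, 0.0
            # Bid a wins whenever the market price d <= a; the bidder pays d.
            for a in range(min(b, max_price) + 1):
                win_term += market_pmf[a] * (avg_ctr + V[t - 1][b - a])
                win_mass += market_pmf[a]
                # Losing keeps the budget intact for the remaining auctions.
                val = win_term + (1.0 - win_mass) * V[t - 1][b]
                if val > best_val:
                    best_val, best_bid = val, a
            V[t][b] = best_val
            policy[t][b] = best_bid
    return V, policy


if __name__ == "__main__":
    # Market price is 1 or 2 with equal probability; CTR is 10%.
    V, policy = rlb_value_table(2, 2, [0.0, 0.5, 0.5], 0.1)
    print(V[2][2], policy[2][2])
```

With two auctions left and a budget of 2, the table trades off winning now at a high price against keeping budget for the next auction, which is the effect the DP formulation is meant to capture.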
1.5 HiBid: Industrial Hierarchical Dual-RL

| Property | Detail |
|---|---|
| Paper | "HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning" |
| Authors | Yuhang Wang et al. (2023) |
| arXiv | 2312.17503 |
| HF Papers | https://huggingface.co/papers/2312.17503 |
| Algorithm | High-level RL budget allocation + low-level λ-parameterized bidding |
| Scale | 64K advertisers, 70M requests/day, 4 channels, deployed at Meituan |
| Results | Outperforms RL-based baselines (R-BCQ, BCQ, CQL) on clicks, CPC, CSR, ROI |

1.6 Contextual First-Price Extension (Very Recent!)
| Property | Detail |
|---|---|
| Paper | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| Authors | (2026) |
| arXiv | 2603.07207 |
| Algorithm | Dual OGD + quantile-based contextual censored regression |
| Key Innovation | Extends Wang et al. (2023) to handle contextual (feature-based) auctions with a novel quantile trick for censored data |
| Regret | Õ(√T) in contextual first-price auctions |

1.7 Unified View of Lagrangian Dual Multiplier Methods
All dual methods follow the same template:
For each auction t:
1. Observe value v_t (from CTR prediction × click value)
2. Compute bid: b_t = f(v_t, dual_multiplier_t)
3. Observe outcome: payment c_t (if won) or 0 (if lost)
4. Compute gradient: g_t = ρ − c_t
5. Update multiplier: λ_{t+1} = Proj_{λ≥0}(λ_t − η·g_t)

| Method | Auction | Bid Function f(v, λ) |
|---|---|---|
| Wang 2023 | First-price | argmax_b (r̃(v,b) − λ·c̃(b)) |
| Balseiro 2020 | Second-price | v / (1+λ) |
| Feng 2022 | Second-price | ((1+λ_RoS)/(λ_RoS+λ_budget)) · v |

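
The five-step template can be sketched as a short simulation loop. The version below is a minimal illustration under stated assumptions: it uses the second-price bid rule v / (1+λ) from the table, takes the highest competing bid as a given input, and does not enforce a hard budget stop. The function name `simulate_dual_pacing` and its arguments are hypothetical.

```python
import random


def simulate_dual_pacing(values, competing_bids, rho, eta):
    """Run the generic dual pacing template on a second-price auction stream.

    values[t]         : estimated impression value (pCTR x click value)
    competing_bids[t] : highest competing bid (assumed observable here)
    rho               : target spend per auction (budget / horizon)
    eta               : dual step size
    """
    lam = 0.0
    spend = 0.0
    value_won = 0.0
    for v, d in zip(values, competing_bids):
        bid = v / (1.0 + lam)            # step 2: bid from value and multiplier
        won = bid >= d
        cost = d if won else 0.0         # step 3: second-price payment or 0
        if won:
            value_won += v
        spend += cost
        grad = rho - cost                # step 4: g_t = rho - c_t
        lam = max(0.0, lam - eta * grad)  # step 5: projected update
    return spend, value_won, lam


if __name__ == "__main__":
    rng = random.Random(0)
    horizon = 5000
    vals = [rng.random() for _ in range(horizon)]
    comp = [rng.random() for _ in range(horizon)]
    spend, value_won, lam = simulate_dual_pacing(vals, comp, rho=0.05, eta=0.05)
    print(f"avg spend per auction: {spend / horizon:.4f}, final lambda: {lam:.3f}")
```

When spend runs above ρ the gradient is negative, λ rises, and bids are shaded down; when spend runs below ρ, λ falls back toward zero and bidding approaches truthful.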
1.8 Additional Papers (Supplementary)
| Paper | Key Idea | arXiv |
|---|---|---|
| Dynamic Budget Throttling | Throttle participation rate to control spend | 2207.04690 |
| No-Regret Learning in Repeated First-Price Auctions | General no-regret framework for first-price | 2205.14572 |
| Robust Budget Pacing with a Single Sample | Near-optimal regret from 1 sample per distribution | 2302.02006 |
| Learning to Bid Optimally in Adversarial First-Price | Adversarial (non-i.i.d.) setting | 2007.04568 |
| Optimal No-Regret Learning in Repeated FPA | Minimax optimal bounds | 2003.09795 |
| Multi-Channel Autobidding with Budget and ROI | Per-channel optimization optimality | 2302.01523 |
| Leveraging the Hints: Adaptive Bidding | Uses hints/forecasts for better bidding | 2211.06358 |
| Adaptive Bidding under Non-stationarity | Handles distribution shift | 2505.02796 |
| Joint Value Estimation and Bidding | Simultaneous CTR learning + bidding | 2502.17292 |
| No-Regret is not enough! | Adaptive regret for constrained bandits | 2405.06575 |
| AIGB: Generative Auto-bidding | Diffusion models for bid trajectory generation | 2405.16141 |

Two-Sided Budget Constraint (Your Specific Need)
You need: maximize clicks s.t. spend ≤ B AND spend ≥ k·B.
This requires two dual variables:
- μ for the budget cap: μ_{t+1} = Proj(μ_t − η₁·(ρ − spend_t))
- ν for the spend floor: ν_{t+1} = Proj(ν_t − η₂·(spend_t − kρ))
Bid function: b_t = v_t · f(μ_t, ν_t) where f decreases with μ and increases with ν.
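
A minimal sketch of the two-multiplier update follows. The text leaves f unspecified, so the form f(μ, ν) = (1+ν)/(1+μ) used here is only one hypothetical choice that decreases in μ and increases in ν; the second-price payment rule and the function name are likewise assumptions for illustration.

```python
def two_sided_pacing(values, competing_bids, rho, k, eta_cap, eta_floor):
    """Pace spend between a cap (rho per auction) and a floor (k * rho).

    Assumptions of this sketch: second-price payments, and the
    hypothetical bid multiplier f(mu, nu) = (1 + nu) / (1 + mu).
    """
    mu = 0.0   # multiplier for the budget cap
    nu = 0.0   # multiplier for the spend floor
    spend = 0.0
    for v, d in zip(values, competing_bids):
        bid = v * (1.0 + nu) / (1.0 + mu)
        cost = d if bid >= d else 0.0
        spend += cost
        # Cap: overspending (cost > rho) raises mu and shades bids down.
        mu = max(0.0, mu - eta_cap * (rho - cost))
        # Floor: underspending (cost < k * rho) raises nu and pushes bids up.
        nu = max(0.0, nu - eta_floor * (cost - k * rho))
    return spend, mu, nu


if __name__ == "__main__":
    # After losing the first auction the floor multiplier lifts the next bid.
    print(two_sided_pacing([1.0, 1.0], [2.0, 1.05], rho=1.0, k=0.5,
                           eta_cap=0.1, eta_floor=0.2))
```

In the demo the first auction is lost, ν rises to 0.1, and the second bid of 1.1 clears a competing bid of 1.05 that a cap-only pacer bidding 1.0 would have lost.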
2. CTR Prediction Models
2.1 FinalMLP (RECOMMENDED: Best AUC, Fastest Inference)

| Property | Detail |
|---|---|
| Paper | "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" |
| Authors | Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong (2023) |
| Venue | AAAI 2023 |
| arXiv | 2304.00902 |
| HF Papers | https://huggingface.co/papers/2304.00902 |
| Datasets | reczoo/Criteo_x1, reczoo/Avazu_x1, reczoo/MovielensLatest_x1, reczoo/Frappe_x1 |
| Criteo AUC | 0.8149 |
| Avazu AUC | 0.7666 |
| Architecture | Two-stream MLP: two independent MLP towers + feature gating (soft selection) + bilinear fusion |
| Inference Speed | Fastest among SOTA (pure MLP, ~400-dim hidden, no attention) |
| Why Best for RTB | Pure feed-forward, <1ms inference, easy to deploy |

2.2 GDCN: Gated Deep Cross Network

| Property | Detail |
|---|---|
| Paper | "Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction" |
| Authors | Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu (2023) |
| Venue | CIKM 2023 |
| arXiv | 2311.04635 |
| HF Papers | https://huggingface.co/papers/2311.04635 |
| Criteo AUC | 0.8161 (own split, not directly comparable) |
| Architecture | DCNv2 + learned information gate per cross layer + Field-level Dimension Optimization (FDO) |
| Key Insight | Gate filters noisy interactions; FDO compresses embeddings 60%+. Good for memory-constrained RTB. |

2.3 DCNv2: Industry Workhorse

| Property | Detail |
|---|---|
| Paper | "DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems" |
| Authors | Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi (2021) |
| Venue | WWW 2021 |
| arXiv | 2008.13535 |
| HF Papers | https://huggingface.co/papers/2008.13535 |
| Criteo AUC | 0.8142-0.8144 (retuned) |
| Architecture | Embedding → parallel CrossNetV2 + DNN → concat → sigmoid |
| Key Insight | Mixture-of-Experts-style low-rank decomposition. Battle-tested at Google. |

2.4 DeepFM: Simple, Strong Baseline

| Property | Detail |
|---|---|
| Paper | "DeepFM: A Factorization-Machine based Neural Network for CTR Prediction" |
| Authors | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He (2017) |
| Venue | IJCAI 2017 |
| Criteo AUC | 0.8138 (retuned) |
| Architecture | Shared embedding → parallel FM (2nd-order) + DNN → sum → sigmoid |
| Key Insight | The embedding shared between FM and DNN is the key design choice. End-to-end, no pre-training. |

2.5 FCN: Fusing Cross Network (Most Recent)

| Property | Detail |
|---|---|
| Paper | "FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction" |
| Authors | (2024) |
| arXiv | 2407.13349 |
| HF Papers | https://huggingface.co/papers/2407.13349 |
| Architecture | Two explicit cross sub-networks: LCN (linear, order grows linearly) + ECN (exponential, order doubles per layer) |
| Key Insight | No DNN needed: all interactions are explicit. 50% fewer params, 23% lower latency than DCNv2. |
| Caveat | Newer paper with less community validation. GitHub: https://github.com/salmon1802/FCN |

2.6 BARS Meta-Finding
| Property | Detail |
|---|---|
| Paper | "BARS-CTR: Open Benchmarking for Click-Through Rate Prediction" |
| Authors | Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He (2021) |
| Venue | CIKM 2021 |
| arXiv | 2009.05794 |
| HF Papers | https://huggingface.co/papers/2009.05794 |
| Key Result | After 7,000+ experiments and 12,000 GPU hours: differences between SOTA deep CTR models are surprisingly small (~0.1-0.3% AUC). Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering. All models converge to ~0.814 AUC on Criteo after proper tuning. |

2.7 Additional CTR Papers
| Paper | Key Idea | arXiv |
|---|---|---|
| DIN (KDD 2018) | Attention over user behavior sequence | 1706.06978 |
| DIEN (AAAI 2019) | Interest evolution with GRU + attention | 1809.03672 |
| xDeepFM (KDD 2018) | Compressed Interaction Network (CIN) for vector-wise crosses | 1803.05170 |
| AutoInt (CIKM 2019) | Multi-head self-attention for feature interactions | 1810.11921 |
| DLRM (Meta, 2019) | Specialized for recommendation: MLP for dense + embedding for sparse | 1906.00091 |
| Wide & Deep (Google, 2016) | Memorization (wide) + generalization (deep) | 1606.07792 |
| FTRL-Proximal (KDD 2013) | "Ad Click Prediction: a View from the Trenches": online learning for linear CTR | - |
| Streaming CTR (2023) | Online CTR with non-stationary data | 2307.07509 |

2.8 Latency Considerations for RTB
| Model | Architecture | Inference Speed (rating out of 5) | RTB-Suitable |
|---|---|---|---|
| FinalMLP | Pure MLP | 5/5 (<1ms) | ✅ Best |
| DCNv2 | CrossNet + DNN | 4/5 | ✅ |
| GDCN | Gated Cross + DNN | 4/5 | ✅ |
| DeepFM | FM + DNN | 4/5 | ✅ |
| FCN | LCN + ECN (no DNN) | 4/5 | ✅ |
| DIN | Attention (user history) | 2/5 | ❌ Too slow |
| DIEN | GRU + attention | 1/5 | ❌ Too slow |
| AutoInt | Multi-head attention | 2/5 | ❌ Too slow |

3. Clearing Price / Market Price Prediction
3.1 Non-Parametric Empirical CDF (RECOMMENDED BASELINE)
| Property | Detail |
|---|---|
| Source | Wang et al. (2023), Algorithm 1, Section 3.1 |
| arXiv | 2304.13477 |
| Method | Maintain array of observed competing bids d_s, estimate G̃_t(b) = (1/(t−1)) ∑_{s<t} 𝟙{b ≥ d_s} |
| Win Probability | P(win \| b) = G̃_t(b) |
| Expected Cost | E[cost \| win, b] = (1/G̃_t(b)) · (1/(t−1)) ∑_{s<t} d_s·𝟙{d_s ≤ b}, i.e. the mean of {d_s : d_s ≤ b}. This is the second-price payment; in a first-price auction the winner pays b. |
| Pros | No model training needed, theoretically sound (Õ(√T) regret), handles distribution shift naturally |
| Cons | No context/features, cold-start when t is small |

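
The empirical CDF and the first-price bid rule from Section 1.1 can be combined in a few lines. The sketch below assumes the first-price payoff forms r̃(v, b) = (v − b)·G̃(b) and c̃(b) = b·G̃(b), full-information feedback (every competing bid is observed), and a finite grid of candidate bids; the class and function names are hypothetical.

```python
import bisect


class EmpiricalBidCDF:
    """Non-parametric estimate of G(b) = P(competing bid <= b)."""

    def __init__(self):
        self._sorted_bids = []

    def observe(self, competing_bid):
        # Keep observations sorted so each query is a binary search.
        bisect.insort(self._sorted_bids, competing_bid)

    def win_prob(self, bid):
        if not self._sorted_bids:
            return 0.0  # cold start: no observations yet
        rank = bisect.bisect_right(self._sorted_bids, bid)
        return rank / len(self._sorted_bids)


def first_price_bid(cdf, value, lam, grid):
    """Grid search for argmax_b (value - b) * G(b) - lam * b * G(b).

    Returns 0.0 when no candidate has a positive objective.
    """
    best_bid, best_obj = 0.0, 0.0
    for b in grid:
        g = cdf.win_prob(b)
        obj = (value - b) * g - lam * b * g
        if obj > best_obj:
            best_bid, best_obj = b, obj
    return best_bid


if __name__ == "__main__":
    cdf = EmpiricalBidCDF()
    for d in [0.2, 0.4, 0.6, 0.8]:
        cdf.observe(d)
    grid = [0.2, 0.4, 0.6, 0.8]
    print(first_price_bid(cdf, 0.9, 0.0, grid), first_price_bid(cdf, 0.9, 1.0, grid))
```

A larger multiplier λ penalizes expected spend more heavily, so the chosen bid moves down the grid.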
3.2 Censored Linear Regression (Wu et al. 2015)
| Property | Detail |
|---|---|
| Paper | "Predicting Winning Price in Real Time Bidding with Censored Data" |
| Authors | Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen (2015) |
| Venue | KDD 2015 |
| Citations | ~101 |
| Method | Tobit-like model: log(market_price) = β·x + ε, ε ~ N(0, σ²) |
| Key Insight | Properly handles censoring via likelihood: winning auctions contribute f(price \| x), losing auctions contribute S(bid \| x) |
| Pros | Contextual, simple, computationally cheap |
| Cons | Linear model with limited capacity for complex interactions |

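
The censored likelihood described above can be written out directly. This is a minimal sketch under the table's model, log(market_price) = β·x + ε with Gaussian ε; the record layout `(features, bid, won, price)` and the function names are assumptions for illustration, and the constant Jacobian term of the log transform is omitted.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(z):
    """Log survival function log(1 - Phi(z)), floored to avoid log(0)."""
    return math.log(max(0.5 * math.erfc(z / math.sqrt(2.0)), 1e-300))


def censored_nll(records, beta, sigma):
    """Mean negative log-likelihood of a log-normal censored regression.

    records: iterable of (features, bid, won, price); price is used only
    when won is True, otherwise the market price is known to exceed bid.
    """
    total = 0.0
    count = 0
    for x, bid, won, price in records:
        mean = sum(b_j * x_j for b_j, x_j in zip(beta, x))
        if won:
            # Uncensored: exact market price observed, use the density f.
            z = (math.log(price) - mean) / sigma
            total -= normal_logpdf(z) - math.log(sigma)
        else:
            # Censored: market price exceeded our bid, use the survival S.
            z = (math.log(bid) - mean) / sigma
            total -= normal_logsf(z)
        count += 1
    return total / count


if __name__ == "__main__":
    data = [([1.0, 0.5], 2.0, True, 1.5), ([1.0, -0.5], 1.0, False, None)]
    print(censored_nll(data, beta=[0.2, 0.1], sigma=0.8))
```

Minimizing this quantity over β and σ with any gradient-based or derivative-free optimizer gives the censored fit; dropping the lost-auction term would bias the predicted prices downward.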
3.3 Deep Censored Learning / Survival Analysis
| Property | Detail |
|---|---|
| Paper | "Deep Censored Learning of the Winning Price" (Zhu et al., WWW 2019) |
| Method | Neural network trained with censored survival loss |
| Loss | Winning: -log f(price \| x); Losing: -log S(bid \| x) |
| Library | TorchSurv (arXiv:2404.10761, Novartis, 200★ GitHub) |
| TorchSurv URL | https://github.com/Novartis/torchsurv |
| TorchSurv Docs | https://opensource.nibr.com/torchsurv/ |
| PyPI | pip install torchsurv |
| Key Insight | Proper survival framework handles censoring. Win = exact price observed (uncensored). Loss = only lower bound (censored at bid). |
| Architecture | Deep FC predicting either hazard rate λ(t \| x) (Cox PH) or distribution parameters (Weibull/log-normal AFT) |

    from torchsurv.loss import cox

    # model and features are assumed to be defined elsewhere.
    # event: bool tensor, True where the auction was won (price observed).
    # time: float tensor with the observed price (won) or our bid (lost, censored).
    log_hazard = model(features)
    loss = cox.neg_partial_log_likelihood(log_hazard, event, time)

3.4 Win Probability Neural Network (Simplest ML)
| Property | Detail |
|---|---|
| Method | Direct binary classification: P(win \| bid_price, features) |
| Pros | Dead simple, works with standard BCELoss |
| Cons | Ignores the price observed when you win; uses only the binary win/loss signal |
| Input | features + bid_price → sigmoid |

3.5 Parametric Distribution Fitting
| Property | Detail |
|---|---|
| Paper | Referenced in RLB (Cai et al. 2017): "Functional Bid Landscape Forecasting" (ECML-PKDD 2016) |
| Method | Fit log-normal or gamma distribution to observed winning prices; predict parameters from features using GBDT |
| Pros | Parametric assumptions reduce variance |
| Cons | Distribution assumption may not hold; doesn't properly handle censoring |

3.6 Contextual Quantile-Based (2026)
| Property | Detail |
|---|---|
| Paper | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| arXiv | 2603.07207 |
| Method | Models competing bid as d_t = α·x_t + z_t (linear contextual); quantile-based estimator for α |
| Key Trick | Splits samples by bid quantile and exploits identifiable conditional quantiles to circumvent full censoring |
| Pros | Theoretical guarantees in contextual setting |
| Cons | Linear contextual model only; very recent |

3.7 Comparison Summary
| Method | Contextual? | Handles Censoring? | Model Training? | Complexity |
|---|---|---|---|---|
| Empirical CDF | ❌ | N/A (full info) | None | Minimal |
| Censored Linear Reg | ✅ | ✅ (proper likelihood) | Linear model | Low |
| Deep Survival (TorchSurv) | ✅ | ✅ (proper likelihood) | Neural net | Medium |
| Win Prob Classifier | ✅ | ❌ (binary only) | Neural net | Low |
| Parametric (log-normal) | Optional | ❌ | GBDT | Medium |
| Quantile Censored | ✅ | ✅ (quantile trick) | Linear | Medium-High |

4. Datasets
4.1 CTR Prediction Datasets
| Dataset | HF Hub Path | Size | Fields | Label | Verified |
|---|---|---|---|---|---|
| Criteo_x4 | reczoo/Criteo_x4 | 45.8M rows, 5.6GB | 13 dense (I1-I13) + 26 categorical (C1-C26) | Label (0/1) | ✅ |
| Avazu_x4 | reczoo/Avazu_x4 | 40.4M rows, 1.8GB | 22 fields (mixed) | click (0/1) | ✅ |
| Criteo_x1 | reczoo/Criteo_x1 | ~11M rows | Same as x4 | Label | ✅ |
| Avazu_x1 | reczoo/Avazu_x1 | ~10M rows | Same as x4 | click | ✅ |

Standard split: 80% train / 10% val / 10% test (BARS protocol).
4.2 RTB Bidding Datasets
| Dataset | Source | Size | Format | Availability |
|---|---|---|---|---|
| iPinYou | data.computational-advertising.org | 19.5M impressions, 9 campaigns, 10 days (2013) | Bid logs with market price | External download only (NOT on HF Hub) |
| YOYI | Various academic mirrors | ~400M records | Bid logs | External download only |

iPinYou format: (click, paying_price, bid_price, slot_id, user_tags, ...). It already includes the market price info needed for bidding simulation.
Key Gap: No RTB bid-log datasets on HuggingFace Hub. Criteo/Avazu have click labels but no bid/price columns, so they can only be used for CTR training and require synthetic price generation for bidding evaluation.
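
One way to attach synthetic prices to a click-only dataset is sketched below. The log-normal shape, its parameters, and the `market_price` column name are all assumptions chosen for illustration (log-normal is one of the parametric forms mentioned in Section 3.5); they are not calibrated to any real exchange, and prices here are independent of features and clicks.

```python
import random


def add_synthetic_prices(rows, log_mean=-1.0, log_std=0.5, seed=0):
    """Return copies of rows with a synthetic 'market_price' field.

    rows: list of dicts (e.g. one per Criteo/Avazu impression).
    Prices are drawn i.i.d. from a log-normal distribution, which is an
    assumption of this sketch rather than a property of the datasets.
    """
    rng = random.Random(seed)  # fixed seed keeps simulations reproducible
    priced = []
    for row in rows:
        new_row = dict(row)    # copy so the input rows are left untouched
        new_row["market_price"] = rng.lognormvariate(log_mean, log_std)
        priced.append(new_row)
    return priced


if __name__ == "__main__":
    sample = [{"click": 1}, {"click": 0}, {"click": 0}]
    for row in add_synthetic_prices(sample):
        print(row)
```

Because the prices carry no relationship to the features, results from such a simulation say little about contextual price models; they mainly exercise the pacing logic.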
4.3 Data Requirements for Each Algorithm
| Algorithm | Needs from Dataset |
|---|---|
| Dual OGD (Wang) | click labels (CTR training) + competing bids (or synthetic prices for simulation) |
| Dual Mirror Descent (Balseiro) | click labels + auction payment (second-price) |
| RLB (Cai) | click labels + market prices + impression features |
| CTR models (all) | click labels + features (Criteo/Avazu: ✅) |
| Clearing price models | observed prices (won auctions) + bids (lost auctions) |

5. Codebases & Implementations
5.1 CTR Model Libraries
| Library | URL | Models | Framework | Notes |
|---|---|---|---|---|
| FuxiCTR | https://github.com/reczoo/FuxiCTR | 40+ (FinalMLP, DeepFM, DCNv2, GDCN, FCN, xDeepFM, AutoInt) | PyTorch | Config-driven (YAML). Used by all SOTA benchmark papers. |
| DeepCTR-Torch | https://github.com/shenweichen/DeepCTR-Torch | 20+ (DeepFM, DCN, DIN, DIEN, xDeepFM) | PyTorch | Simpler API (Python class). Good for quick prototyping. |
| TorchSurv | https://github.com/Novartis/torchsurv | Cox PH, Weibull AFT, DeepSurv, DeepHit | PyTorch | Deep survival analysis for clearing price. |
| BARS | https://github.com/openbenchmark/BARS | Benchmarking | - | Standardized evaluation pipeline. 389★ |

5.2 Bidding Algorithm Implementations
5.3 FuxiCTR Quick Start
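Install:

    pip install fuxictr

Example config (YAML):

    dataset_id: Criteo_x4
    model: FinalMLP
    embedding_dim: 10
    hidden_units: [400, 400, 400]
    batch_size: 4096
    learning_rate: 1e-3
    epochs: 10
    metrics: [auc, logloss]

Launch a run. The FuxiCTR model zoo examples are started with `python run_expid.py --config ./config --expid <expid> --gpu 0`; the `autotuner.run` call below is an assumed convenience wrapper and may not exist in your installed version, so check the FuxiCTR documentation first.

    from fuxictr import autotuner
    autotuner.run("config/criteo_finalmlp.yaml", "Criteo_x4", "FinalMLP")
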
5.4 DeepCTR-Torch Quick Start
Install:

    pip install deepctr-torch

Train DeepFM:

    from deepctr_torch.models import DeepFM
    from deepctr_torch.inputs import SparseFeat, DenseFeat

    # Assumes df, categorical_cols, numerical_cols, train_input (dict of
    # feature name -> array) and train_labels already exist, and that the
    # categorical columns are label-encoded as integers starting at 0.
    sparse_features = [SparseFeat(f, vocabulary_size=df[f].nunique(), embedding_dim=10)
                       for f in categorical_cols]
    dense_features = [DenseFeat(f, 1) for f in numerical_cols]
    feature_columns = sparse_features + dense_features
    model = DeepFM(linear_feature_columns=feature_columns,
                   dnn_feature_columns=feature_columns,
                   dnn_hidden_units=(400, 400, 400), device='cuda')
    model.compile('adam', 'binary_crossentropy', metrics=['auc'])
    model.fit(train_input, train_labels, batch_size=4096, epochs=10)

6. Benchmark Leaderboards
Top Criteo_x4 AUC scores (from BARS):
- FinalMLP: 0.8149
- DCNv2: 0.8142
- DeepFM: 0.8138
- xDeepFM: 0.8136
- AutoInt+: 0.8134
Key takeaway: the top 5 models are within 0.0015 AUC (0.15 percentage points) of each other (0.8149 vs. 0.8134).
7. Recommended Architecture
For Your Problem: "Lagrangian Dual Multiplier with Online Error Gradient Descent"
┌─────────────────────────────────────────────────────────────┐
│                      BIDDING ALGORITHM                      │
│                                                             │
│  Dual OGD (Wang et al. 2023)                                │
│  λ_{t+1} = Proj(λ_t - ε·(ρ - c̃_t(b_t)))                     │
│  b_t = argmax_b (r̃_t(v_t, b) - λ_t·c̃_t(b))                  │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                      PREDICTION MODELS                      │
│                                                             │
│  ┌──────────────────┐    ┌──────────────────────┐           │
│  │ CTR Predictor    │    │ Clearing Price Est.  │           │
│  │ (FinalMLP)       │    │ (Empirical CDF       │           │
│  │                  │    │  OR TorchSurv)       │           │
│  │ v_t = pCTR × V   │    │ G̃(b) = P(win | b)    │           │
│  └──────────────────┘    └──────────────────────┘           │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                          DATASETS                           │
│                                                             │
│  Criteo_x4 (CTR training) + iPinYou (bidding simulation)    │
│  OR: Criteo_x4 + synthetic price generation                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Implementation Priority
- Phase 1: Improve the CTR model by replacing the current LogisticRegression with FinalMLP trained on Criteo_x4 (via FuxiCTR)
- Phase 2: Improve clearing price estimation by adding TorchSurv-based censored regression alongside the current empirical CDF
- Phase 3: Add Balseiro dual mirror descent for comparison (simpler baseline, no market price model)
- Phase 4: Add the two-sided budget constraint (cap + floor) with two dual variables
- Phase 5: Full sweep over hyperparameters: step size ε, budget fraction k%, value per click, CTR model architecture
Online Learning Note
For production RTB where the environment is non-stationary, implement periodic retraining:
- Save model checkpoint every N hours
- Reload and train on sliding window of most recent data
- Deploy updated model without restarting the bidding algorithm
The Lagrangian multiplier λ is intrinsically online (updated per auction). The CTR model needs separate periodic retraining.
Paper Index (All Papers Referenced)
| # | Paper | arXiv | Venue | Year | Citations |
|---|---|---|---|---|---|
| 1 | Wang et al.: Learning to Bid in Repeated First-Price Auctions with Budgets | 2304.13477 | NeurIPS | 2023 | Growing |
| 2 | Balseiro et al.: Dual Mirror Descent for Online Allocation | 2011.10124 | Ops Research | 2020 | 135+ |
| 3 | Feng et al.: Online Bidding for RoS Constrained Advertisers | 2208.13713 | ICML | 2022 | 38+ |
| 4 | Cai et al.: RTB by RL in Display Advertising | 1701.02490 | WSDM | 2017 | 300+ |
| 5 | Wang et al.: HiBid Hierarchical DRL Bidding | 2312.17503 | - | 2023 | New |
| 6 | Online Bidding for Contextual First-Price (Quantile) | 2603.07207 | - | 2026 | New |
| 7 | Mao et al.: FinalMLP | 2304.00902 | AAAI | 2023 | Growing |
| 8 | Wang et al.: GDCN | 2311.04635 | CIKM | 2023 | Growing |
| 9 | Wang et al.: DCN V2 | 2008.13535 | WWW | 2021 | 500+ |
| 10 | Guo et al.: DeepFM | - | IJCAI | 2017 | 3000+ |
| 11 | FCN: Fusing Cross Network | 2407.13349 | - | 2024 | New |
| 12 | Zhu et al.: BARS-CTR Benchmark | 2009.05794 | CIKM | 2021 | 100+ |
| 13 | Wu et al.: Predicting Winning Price with Censored Data | - | KDD | 2015 | 101 |
| 14 | Deep Censored Learning of Winning Price | - | WWW | 2019 | Well-cited |
| 15 | Katzman et al.: DeepSurv | - | BMC | 2018 | 1000+ |
| 16 | TorchSurv | 2404.10761 | - | 2024 | New |
| 17 | Robust Budget Pacing with a Single Sample | 2302.02006 | - | 2023 | Growing |
| 18 | Multi-Channel Autobidding with Budget and ROI | 2302.01523 | - | 2023 | Growing |
| 19 | No-Regret in Repeated FPA with Budgets | 2205.14572 | - | 2022 | 14 |
| 20 | Dynamic Budget Throttling | 2207.04690 | - | 2022 | 6 |
| 21 | AIGB: Generative Auto-bidding | 2405.16141 | - | 2024 | New |
| 22 | Adaptive Bidding under Non-Stationarity | 2505.02796 | - | 2025 | 2 |
| 23 | Joint Value Estimation and Bidding | 2502.17292 | - | 2025 | 4 |
| 24 | Leveraging Hints: Adaptive Bidding | 2211.06358 | - | 2022 | 13 |
| 25 | Zhou et al.: DIN | 1706.06978 | KDD | 2018 | 2000+ |
| 26 | Zhou et al.: DIEN | 1809.03672 | AAAI | 2019 | 1000+ |
| 27 | Lian et al.: xDeepFM | 1803.05170 | KDD | 2018 | 1000+ |
| 28 | Song et al.: AutoInt | 1810.11921 | CIKM | 2019 | 500+ |
| 29 | Naumov et al.: DLRM (Meta) | 1906.00091 | - | 2019 | 500+ |
| 30 | Cheng et al.: Wide & Deep | 1606.07792 | RecSys | 2016 | 4000+ |
| 31 | McMahan et al.: Ad Click Prediction (FTRL) | - | KDD | 2013 | 2000+ |
| 32 | Zhang et al.: Optimal RTB for Display Advertising | - | KDD | 2014 | 500+ |