
RTB Bidding Algorithm Comparison: Complete Research Resource List

Generated: 2026-05-05 | Author: ML Intern for hamverbot | Repository: https://huggingface.co/hamverbot/rtb-bidding-comparison


Table of Contents

  1. Bidding Algorithms
  2. CTR Prediction Models
  3. Clearing Price / Market Price Prediction
  4. Datasets
  5. Codebases & Implementations
  6. Benchmark Leaderboards
  7. Recommended Architecture

1. Bidding Algorithms

1.1 Lagrangian Dual + Online Gradient Descent (BEST MATCH)

| Property | Detail |
| --- | --- |
| Paper | "Learning to Bid in Repeated First-Price Auctions with Budgets" |
| Authors | Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong (2023) |
| Venue | NeurIPS 2023 (implied) |
| arXiv | 2304.13477 |
| HF Papers | https://huggingface.co/papers/2304.13477 |
| Algorithm | DualOGD: Lagrangian dual multiplier updated by online error gradient descent |
| Auction Type | First-price (also handles second-price) |
| Constraints | Budget cap: total spend ≤ ρT |
| Regret Bound | Õ(√T) for both full-information and one-sided feedback |
| Key Formula | λ_{t+1} = Proj_{λ>0}(λ_t − ε·(ρ − c̃_t(b_t))) |
| Bid Rule | b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b)) |
| Prediction Models Needed | CTR predictor (for v_t), empirical CDF of competing bids (G̃) |
| Why It's The Best Match | You explicitly described "Lagrangian dual multiplier and updating the dual variables online by error gradient descent"; this is exactly Algorithm 1, line 7. |
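
The bid rule and key formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's reference implementation: it assumes the first-price forms r̃(v, b) = (v − b)·G̃(b) and c̃(b) = b·G̃(b), full-information feedback (the competing bid is always revealed), a finite grid of candidate bids, and a projection onto λ ≥ 0. The function names, the synthetic uniform data, and the hard stop when the budget runs out are all illustrative choices.

```python
import bisect
import random


def ecdf(sorted_bids, b):
    """Empirical CDF G~(b): share of observed competing bids d_s <= b."""
    if not sorted_bids:
        return 0.0
    return bisect.bisect_right(sorted_bids, b) / len(sorted_bids)


def run_dual_ogd_first_price(values, competing_bids, rho, eps, grid):
    """One pass over the auctions; returns (final lambda, spend, wins)."""
    budget = rho * len(values)        # total budget B = rho * T
    lam, spend, wins = 0.0, 0.0, 0
    history = []                      # sorted competing bids seen so far
    for v, d in zip(values, competing_bids):
        def score(b):
            g = ecdf(history, b)
            # r~(v, b) - lam * c~(b), with r~ = (v - b) * G~(b), c~ = b * G~(b)
            return (v - b) * g - lam * b * g

        b_t = max(grid, key=score)    # grid search stands in for the argmax
        if b_t > budget - spend:      # never bid more than what is left
            b_t = 0.0
        if b_t >= d:                  # first-price: the winner pays its own bid
            spend += b_t
            wins += 1
        c_est = b_t * ecdf(history, b_t)
        lam = max(0.0, lam - eps * (rho - c_est))   # projected OGD step
        bisect.insort(history, d)     # full-information feedback
    return lam, spend, wins


random.seed(0)
T = 2000
values = [random.random() for _ in range(T)]
competing = [random.random() for _ in range(T)]
grid = [i / 50 for i in range(51)]
lam, spend, wins = run_dual_ogd_first_price(values, competing, rho=0.1, eps=0.05, grid=grid)
print(f"lambda={lam:.3f} spend={spend:.1f} wins={wins}")
```

A finer grid trades runtime for a closer approximation of the argmax; under one-sided feedback the `history` update would only see competing bids on lost auctions.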

1.2 Dual Mirror Descent (Second-Price)

| Property | Detail |
| --- | --- |
| Paper | "The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems" |
| Authors | Santiago Balseiro, Haihao Lu, Vahab Mirrokni (2020) |
| Venue | Operations Research (2023) / NeurIPS 2020 Workshop |
| arXiv | 2011.10124 |
| HF Papers | https://huggingface.co/papers/2011.10124 |
| Citations | 135+ |
| Algorithm | Dual mirror descent; generalizes OGD with Bregman divergences |
| Auction Type | Second-price (truthful) |
| Bid Rule | b_t = v_t / (1 + μ_t) |
| Dual Update | μ_{t+1} = Proj(μ_t − η·(ρ − payment_t)) |
| Key Insight | In second-price auctions, you don't need a market price model. The dual multiplier naturally paces spending. |
| Prediction Models | CTR predictor only (no market price model needed) |

1.3 Dual Descent with RoS + Budget (Multi-Constraint)

| Property | Detail |
| --- | --- |
| Paper | "Online Bidding Algorithms for Return-on-Spend Constrained Advertisers" |
| Authors | Zhe Feng, Swati Padmanabhan, Di Wang (2022) |
| Venue | ICML 2022 |
| arXiv | 2208.13713 |
| Citations | 38+ |
| Algorithm | Two dual variables: λ for RoS, μ for budget |
| Bid Rule | b_t = ((1+λ_t)/(μ_t+λ_t)) · v_t |
| Updates | λ_{t+1} = λ_t·exp(−α·(v_t·x_t(b_t) − p_t(b_t))) [multiplicative]; μ_{t+1} = Proj(μ_t − η·(ρ − p_t(b_t))) [sub-gradient] |
| Key Insight | Can be adapted for your "ensure k% spend" floor: use a second dual variable to enforce minimum spend |
| Prediction Models | CTR predictor (v_t), payment observed |
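
As a rough illustration of the bid rule and the two updates listed above, the following sketch performs a single auction step. It assumes a second-price auction in which `d` is the highest competing bid, x_t is the win indicator, and p_t is the payment; the function name and the numbers are made up for the example, and λ must start strictly positive so the multiplicative update keeps the denominator μ + λ away from zero.

```python
import math


def ros_budget_step(v, d, lam, mu, rho, alpha, eta):
    """One auction: bid, observe the outcome, update both dual variables."""
    bid = (1.0 + lam) / (mu + lam) * v
    won = bid >= d
    x = 1.0 if won else 0.0           # allocation x_t(b_t)
    p = d if won else 0.0             # second-price payment p_t(b_t)
    lam = lam * math.exp(-alpha * (v * x - p))      # multiplicative RoS update
    mu = max(0.0, mu - eta * (rho - p))             # projected budget update
    return bid, p, lam, mu


bid, p, lam, mu = ros_budget_step(v=1.0, d=0.5, lam=1.0, mu=1.0,
                                  rho=0.2, alpha=0.1, eta=0.1)
print(f"bid={bid:.2f} paid={p:.2f} lambda={lam:.4f} mu={mu:.2f}")
```

In this example the auction is won with a positive RoS margin (value 1.0 against payment 0.5), so λ shrinks, while the payment exceeds the per-auction budget rate ρ, so μ grows.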

1.4 RLB: Reinforcement Learning Bidding

| Property | Detail |
| --- | --- |
| Paper | "Real-Time Bidding by Reinforcement Learning in Display Advertising" |
| Authors | Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo (2017) |
| Venue | WSDM 2017 |
| arXiv | 1701.02490 |
| HF Papers | https://huggingface.co/papers/1701.02490 |
| GitHub | https://github.com/han-cai/rlb-dp (188 stars) |
| Algorithm | MDP + Dynamic Programming + Neural value function approximation |
| State | (t remaining auctions, b remaining budget, x feature vector) |
| Action | bid price a ∈ [0, b] |
| Results | +22% clicks over linear bidding at tight budgets on iPinYou |
| Prediction Models | CTR θ(x) + market price distribution m(δ, x) |
| Key Insight | Foundational; explicitly models the budget-depletion tradeoff via DP. Superseded by dual methods for budget pacing but still influential. |
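
To make the state and action definitions concrete, here is a simplified dynamic-programming sketch over (t, b). It is not the rlb-dp code: it assumes integer prices, a feature-independent market price distribution `market_pmf[δ]`, a single average CTR in place of θ(x), and a second-price rule where a bid a wins whenever a ≥ δ and pays δ. The function name is hypothetical.

```python
def rlb_value_table(T, B, market_pmf, avg_ctr):
    """V[t][b]: expected clicks with t auctions and budget b remaining."""
    max_price = len(market_pmf) - 1
    V = [[0.0] * (B + 1) for _ in range(T + 1)]     # V[0][b] = 0 for all b
    for t in range(1, T + 1):
        for b in range(B + 1):
            best = V[t - 1][b]                      # option: skip this auction
            win_mass = 0.0                          # P(market price <= a)
            win_value = 0.0                         # sum over prices won at bid a
            for a in range(0, min(b, max_price) + 1):
                win_mass += market_pmf[a]
                win_value += market_pmf[a] * (avg_ctr + V[t - 1][b - a])
                lose_value = (1.0 - win_mass) * V[t - 1][b]
                best = max(best, win_value + lose_value)
            V[t][b] = best
    return V


# Market price is 1 or 2 with equal probability; CTR is 10%.
V = rlb_value_table(T=2, B=2, market_pmf=[0.0, 0.5, 0.5], avg_ctr=0.1)
print(round(V[2][2], 3))  # 0.125
```

The table costs O(T·B·max_price) time, which is why the paper pairs the DP with a neural approximation of the value function at realistic scales.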

1.5 HiBid: Industrial Hierarchical Dual-RL

| Property | Detail |
| --- | --- |
| Paper | "HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning" |
| Authors | Yuhang Wang et al. (2023) |
| arXiv | 2312.17503 |
| HF Papers | https://huggingface.co/papers/2312.17503 |
| Algorithm | High-level RL budget allocation + low-level λ-parameterized bidding |
| Scale | 64K advertisers, 70M requests/day, 4 channels, deployed at Meituan |
| Results | Outperforms RL-based baselines (R-BCQ, BCQ, CQL) on clicks, CPC, CSR, ROI |

1.6 Contextual First-Price Extension (Very Recent!)

| Property | Detail |
| --- | --- |
| Paper | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| Authors | (2026) |
| arXiv | 2603.07207 |
| Algorithm | Dual OGD + quantile-based contextual censored regression |
| Key Innovation | Extends Wang et al. (2023) to handle contextual (feature-based) auctions with a novel quantile trick for censored data |
| Regret | Õ(√T) in contextual first-price auctions |

1.7 Unified View of Lagrangian Dual Multiplier Methods

All dual methods follow the same template:

For each auction t:
1. Observe value v_t (from CTR prediction × click value)
2. Compute bid: b_t = f(v_t, dual_multiplier_t)
3. Observe outcome: payment c_t (if won) or 0 (if lost)
4. Compute gradient: g_t = ρ − c_t
5. Update multiplier: λ_{t+1} = Proj_{λ≥0}(λ_t − η·g_t)

| Method | Auction | Bid Function f(v, λ) |
| --- | --- | --- |
| Wang 2023 | First-price | argmax_b (r̃(v,b) − λ·c̃(b)) |
| Balseiro 2020 | Second-price | v / (1+λ) |
| Feng 2022 | Second-price | ((1+λ_RoS)/(λ_RoS+λ_budget)) · v |
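
The five-step template can be written as one loop with a pluggable bid function. The sketch below uses the second-price rule v / (1 + λ) from the table and synthetic uniform values and market prices; `run_dual_pacing`, `second_price_bid`, and the hard stop at the total budget are illustrative assumptions, not code from any of the cited papers.

```python
import random


def second_price_bid(v, lam):
    """Shading rule f(v, lam) = v / (1 + lam)."""
    return v / (1.0 + lam)


def run_dual_pacing(values, market_prices, rho, eta, bid_fn=second_price_bid):
    """Apply steps 1-5 to every auction; returns (lambda, spend, value won)."""
    budget = rho * len(values)
    lam, spend, value_won = 0.0, 0.0, 0.0
    for v, d in zip(values, market_prices):          # step 1: v_t is given
        bid = bid_fn(v, lam)                         # step 2
        won = bid >= d and spend + d <= budget       # hard stop at the budget
        c = d if won else 0.0                        # step 3: second-price payment
        spend += c
        value_won += v if won else 0.0
        g = rho - c                                  # step 4
        lam = max(0.0, lam - eta * g)                # step 5: projection onto lam >= 0
    return lam, spend, value_won


rng = random.Random(1)
T = 5000
values = [rng.random() for _ in range(T)]
prices = [rng.random() for _ in range(T)]
lam, spend, value_won = run_dual_pacing(values, prices, rho=0.05, eta=0.02)
print(f"lambda={lam:.2f} spend={spend:.1f} of budget {0.05 * T:.1f}")
```

Swapping `bid_fn` for a first-price or RoS-aware rule changes only step 2; the gradient and projection steps stay the same.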

1.8 Additional Papers (Supplementary)

| Paper | Key Idea | arXiv |
| --- | --- | --- |
| Dynamic Budget Throttling | Throttle participation rate to control spend | 2207.04690 |
| No-Regret Learning in Repeated First-Price Auctions | General no-regret framework for first-price | 2205.14572 |
| Robust Budget Pacing with a Single Sample | Near-optimal regret from 1 sample per distribution | 2302.02006 |
| Learning to Bid Optimally in Adversarial First-Price | Adversarial (non-i.i.d.) setting | 2007.04568 |
| Optimal No-Regret Learning in Repeated FPA | Minimax optimal bounds | 2003.09795 |
| Multi-Channel Autobidding with Budget and ROI | Optimality of per-channel optimization | 2302.01523 |
| Leveraging the Hints: Adaptive Bidding | Uses hints/forecasts for better bidding | 2211.06358 |
| Adaptive Bidding under Non-stationarity | Handles distribution shift | 2505.02796 |
| Joint Value Estimation and Bidding | Simultaneous CTR learning + bidding | 2502.17292 |
| No-Regret is not enough! | Adaptive regret for constrained bandits | 2405.06575 |
| AIGB: Generative Auto-bidding | Diffusion models for bid trajectory generation | 2405.16141 |

Two-Sided Budget Constraint (Your Specific Need)

You need: maximize clicks s.t. spend ≤ B AND spend ≥ k·B. With ρ = B/T as the per-auction budget rate, the cap and the floor become ρ and kρ per auction.

This requires two dual variables:

  • μ for the budget cap: μ_{t+1} = Proj(μ_t − η₁·(ρ − spend_t))
  • ν for the spend floor: ν_{t+1} = Proj(ν_t − η₂·(spend_t − kρ))

Bid function: b_t = v_t · f(μ_t, ν_t) where f decreases with μ and increases with ν.
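
A minimal sketch of the two updates follows. The text above only constrains f to fall with μ and rise with ν; the concrete form f(μ, ν) = 1 / (1 + μ − ν), clipped away from zero, is an assumption made here for illustration, as are the function names and the second-price payment rule.

```python
def bid_multiplier(mu, nu, floor=1e-6):
    """Assumed form f(mu, nu) = 1 / (1 + mu - nu); falls in mu, rises in nu."""
    return 1.0 / max(1.0 + mu - nu, floor)


def two_sided_step(v, d, mu, nu, rho, k, eta1, eta2):
    """One auction with a spend cap (mu) and a spend floor (nu)."""
    bid = v * bid_multiplier(mu, nu)
    spend = d if bid >= d else 0.0                  # second-price payment
    mu = max(0.0, mu - eta1 * (rho - spend))        # grows when spend > rho
    nu = max(0.0, nu - eta2 * (spend - k * rho))    # grows when spend < k * rho
    return bid, spend, mu, nu


# A won auction followed by a lost one.
bid1, spend1, mu1, nu1 = two_sided_step(1.0, 0.5, mu=0.5, nu=0.0,
                                        rho=0.2, k=0.5, eta1=0.1, eta2=0.1)
bid2, spend2, mu2, nu2 = two_sided_step(0.2, 0.9, mu=mu1, nu=nu1,
                                        rho=0.2, k=0.5, eta1=0.1, eta2=0.1)
print(f"after win:  mu={mu1:.2f} nu={nu1:.2f}")
print(f"after loss: mu={mu2:.2f} nu={nu2:.2f}")
```

Overspending pushes μ up and bids down; a run of lost auctions pushes ν up and bids back up. In practice the multiplier would likely also need an upper cap so that a large ν cannot produce unbounded bids.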


2. CTR Prediction Models

2.1 FinalMLP (RECOMMENDED: Best AUC, Fastest Inference)

| Property | Detail |
| --- | --- |
| Paper | "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" |
| Authors | Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong (2023) |
| Venue | AAAI 2023 |
| arXiv | 2304.00902 |
| HF Papers | https://huggingface.co/papers/2304.00902 |
| Datasets | reczoo/Criteo_x1, reczoo/Avazu_x1, reczoo/MovielensLatest_x1, reczoo/Frappe_x1 |
| Criteo AUC | 0.8149 |
| Avazu AUC | 0.7666 |
| Architecture | Two-stream MLP: two independent MLP towers + feature gating (soft selection) + bilinear fusion |
| Inference Speed | Fastest among SOTA (pure MLP, ~400-dim hidden, no attention) |
| Why Best for RTB | Pure feed-forward, <1ms inference, easy to deploy |

2.2 GDCN: Gated Deep Cross Network

| Property | Detail |
| --- | --- |
| Paper | "Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction" |
| Authors | Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu (2023) |
| Venue | CIKM 2023 |
| arXiv | 2311.04635 |
| HF Papers | https://huggingface.co/papers/2311.04635 |
| Criteo AUC | 0.8161 (own split; not directly comparable) |
| Architecture | DCNv2 + learned information gate per cross layer + Field-level Dimension Optimization (FDO) |
| Key Insight | Gate filters noisy interactions; FDO compresses embeddings 60%+. Good for memory-constrained RTB. |

2.3 DCNv2: Industry Workhorse

| Property | Detail |
| --- | --- |
| Paper | "DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems" |
| Authors | Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi (2021) |
| Venue | WWW 2021 |
| arXiv | 2008.13535 |
| HF Papers | https://huggingface.co/papers/2008.13535 |
| Criteo AUC | 0.8142-0.8144 (retuned) |
| Architecture | Embedding → parallel CrossNetV2 + DNN → concat → sigmoid |
| Key Insight | Mixture-of-Experts-style low-rank decomposition. Battle-tested at Google. |

2.4 DeepFM: Simple, Strong Baseline

| Property | Detail |
| --- | --- |
| Paper | "DeepFM: A Factorization-Machine based Neural Network for CTR Prediction" |
| Authors | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He (2017) |
| Venue | IJCAI 2017 |
| Criteo AUC | 0.8138 (retuned) |
| Architecture | Shared embedding → parallel FM (2nd-order) + DNN → sum → sigmoid |
| Key Insight | Shared embedding between FM and DNN is the secret. End-to-end, no pre-training. |

2.5 FCN: Fusing Cross Network (Most Recent)

| Property | Detail |
| --- | --- |
| Paper | "FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction" |
| Authors | (2024) |
| arXiv | 2407.13349 |
| HF Papers | https://huggingface.co/papers/2407.13349 |
| Architecture | Two explicit cross sub-networks: LCN (linear, order grows linearly) + ECN (exponential, order doubles per layer) |
| Key Insight | No DNN needed; all interactions are explicit. 50% fewer params, 23% lower latency than DCNv2. |
| Caveat | Newer paper with less community validation. GitHub: https://github.com/salmon1802/FCN |

2.6 BARS Meta-Finding

| Property | Detail |
| --- | --- |
| Paper | "BARS-CTR: Open Benchmarking for Click-Through Rate Prediction" |
| Authors | Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He (2021) |
| Venue | CIKM 2021 |
| arXiv | 2009.05794 |
| HF Papers | https://huggingface.co/papers/2009.05794 |
| Key Result | After 7,000+ experiments and 12,000 GPU hours: differences between SOTA deep CTR models are surprisingly small (~0.1-0.3% AUC). Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering. All models converge to ~0.814 AUC on Criteo after proper tuning. |

2.7 Additional CTR Papers

| Paper | Key Idea | arXiv |
| --- | --- | --- |
| DIN (KDD 2018) | Attention over user behavior sequence | 1706.06978 |
| DIEN (AAAI 2019) | Interest evolution with GRU + attention | 1809.03672 |
| xDeepFM (KDD 2018) | Compressed Interaction Network (CIN) for vector-wise crosses | 1803.05170 |
| AutoInt (CIKM 2019) | Multi-head self-attention for feature interactions | 1810.11921 |
| DLRM (Meta, 2019) | Specialized for recommendation: MLP for dense + embedding for sparse | 1906.00091 |
| Wide & Deep (Google, 2016) | Memorization (wide) + generalization (deep) | 1606.07792 |
| FTRL-Proximal (KDD 2013) | "Ad Click Prediction: a View from the Trenches"; online learning for linear CTR | - |
| Streaming CTR (2023) | Online CTR with non-stationary data | 2307.07509 |

2.8 Latency Considerations for RTB

| Model | Architecture | Inference Speed | RTB-Suitable |
| --- | --- | --- | --- |
| FinalMLP | Pure MLP | ⭐⭐⭐⭐⭐ (<1ms) | ✅ Best |
| DCNv2 | CrossNet + DNN | ⭐⭐⭐⭐ | ✅ |
| GDCN | Gated Cross + DNN | ⭐⭐⭐⭐ | ✅ |
| DeepFM | FM + DNN | ⭐⭐⭐⭐ | ✅ |
| FCN | LCN + ECN (no DNN) | ⭐⭐⭐⭐ | ✅ |
| DIN | Attention (user history) | ⭐⭐ | ❌ Too slow |
| DIEN | GRU + attention | ⭐ | ❌ Too slow |
| AutoInt | Multi-head attention | ⭐⭐ | ❌ Too slow |

3. Clearing Price / Market Price Prediction

3.1 Non-Parametric Empirical CDF (RECOMMENDED BASELINE)

| Property | Detail |
| --- | --- |
| Source | Wang et al. (2023), Algorithm 1, Section 3.1 |
| arXiv | 2304.13477 |
| Method | Maintain array of observed competing bids d_s, estimate G̃_t(b) = (1/(t−1)) ∑_{s<t} 𝟙{b ≥ d_s} |
| Win Probability | P(win \| b) = G̃_t(b) |
| Expected Cost | First-price: the winner pays its own bid, so the expected cost is c̃_t(b) = b·G̃_t(b). Second-price: E[cost \| win, b] = mean of {d_s : d_s ≤ b} |
| Pros | No model training needed, theoretically sound (Õ(√T) regret), handles distribution shift naturally |
| Cons | No context/features, cold-start when t is small |
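
A small sketch of such an estimator is shown below. The class and method names are illustrative; the first-price cost b·G̃(b) follows from the winner paying its own bid, and the second-price variant averages the competing bids that the bid b would have beaten.

```python
import bisect


class EmpiricalBidCDF:
    """Non-parametric estimate of the competing-bid distribution."""

    def __init__(self):
        self._bids = []                     # observed competing bids, kept sorted

    def observe(self, d):
        bisect.insort(self._bids, d)

    def win_prob(self, b):
        """G~(b): share of observed competing bids d_s <= b."""
        if not self._bids:
            return 0.0                      # cold start: nothing observed yet
        return bisect.bisect_right(self._bids, b) / len(self._bids)

    def first_price_cost(self, b):
        """Expected payment per auction when the winner pays its own bid."""
        return b * self.win_prob(b)

    def second_price_cost_given_win(self, b):
        """Mean competing bid among those that b would have beaten."""
        k = bisect.bisect_right(self._bids, b)
        return sum(self._bids[:k]) / k if k else 0.0


cdf = EmpiricalBidCDF()
for d in [4.0, 1.0, 3.0, 2.0]:
    cdf.observe(d)
print(cdf.win_prob(2.5))                     # 0.5
print(cdf.first_price_cost(2.5))             # 1.25
print(cdf.second_price_cost_given_win(2.5))  # 1.5
```

Replacing the plain list with a bounded window of recent bids would be one way to react faster to a shifting market, at the cost of a noisier estimate.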

3.2 Censored Linear Regression (Wu et al. 2015)

| Property | Detail |
| --- | --- |
| Paper | "Predicting Winning Price in Real Time Bidding with Censored Data" |
| Authors | Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen (2015) |
| Venue | KDD 2015 |
| Citations | ~101 |
| Method | Tobit-like model: log(market_price) = β·x + ε, ε ~ N(0, σ²) |
| Key Insight | Properly handles censoring via likelihood: winning auctions contribute f(price \| x), losing auctions contribute S(bid \| x) |
| Pros | Contextual, simple, computationally cheap |
| Cons | Linear model with limited capacity for complex interactions |

3.3 Deep Censored Learning / Survival Analysis

| Property | Detail |
| --- | --- |
| Paper | "Deep Censored Learning of the Winning Price" (Zhu et al., WWW 2019) |
| Method | Neural network trained with censored survival loss |
| Loss | Winning: -log f(price \| x); Losing: -log S(bid \| x) |
| Library | TorchSurv (arXiv:2404.10761, Novartis, 200★ GitHub) |
| TorchSurv URL | https://github.com/Novartis/torchsurv |
| TorchSurv Docs | https://opensource.nibr.com/torchsurv/ |
| PyPI | pip install torchsurv |
| Key Insight | Proper survival framework handles censoring. Win = exact price observed (uncensored). Loss = only lower bound (censored at bid). |
| Architecture | Deep FC predicting either hazard rate λ(t \| x) (Cox PH) or distribution parameters (Weibull/log-normal AFT) |

```python
# TorchSurv pattern for market price (`model` and the tensors are placeholders):
from torchsurv.loss import cox

log_hazard = model(features)  # shape (batch,)
# event: bool tensor, True if won (price observed), False if lost (censored)
# time:  float tensor, market_price if won, own bid if lost
loss = cox.neg_partial_log_likelihood(log_hazard, event, time)
```

3.4 Win Probability Neural Network (Simplest ML)

| Property | Detail |
| --- | --- |
| Method | Direct binary classification: P(win \| bid_price, features) |
| Pros | Dead simple, works with standard BCELoss |
| Cons | Ignores the price observed when you win; only uses the binary win/loss signal |
| Input | features + bid_price → sigmoid |

3.5 Parametric Distribution Fitting

| Property | Detail |
| --- | --- |
| Paper | Referenced in RLB (Cai et al. 2017): "Functional Bid Landscape Forecasting" (ECML-PKDD 2016) |
| Method | Fit log-normal or gamma distribution to observed winning prices; predict parameters from features using GBDT |
| Pros | Parametric assumptions reduce variance |
| Cons | Distribution assumption may not hold; doesn't properly handle censoring |

3.6 Contextual Quantile-Based (2026)

| Property | Detail |
| --- | --- |
| Paper | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| arXiv | 2603.07207 |
| Method | Models competing bid as d_t = α·x_t + z_t (linear contextual); quantile-based estimator for α |
| Key Trick | Splits samples by bid quantile and exploits identifiable conditional quantiles to circumvent full censoring |
| Pros | Theoretical guarantees in contextual setting |
| Cons | Linear contextual model only; very recent |

3.7 Comparison Summary

| Method | Contextual? | Handles Censoring? | Model Training? | Complexity |
| --- | --- | --- | --- | --- |
| Empirical CDF | ❌ | N/A (full info) | None | Minimal |
| Censored Linear Reg | ✅ | ✅ (proper likelihood) | Linear model | Low |
| Deep Survival (TorchSurv) | ✅ | ✅ (proper likelihood) | Neural net | Medium |
| Win Prob Classifier | ✅ | ❌ (binary only) | Neural net | Low |
| Parametric (log-normal) | Optional | ❌ | GBDT | Medium |
| Quantile Censored | ✅ | ✅ (quantile trick) | Linear | Medium-High |

4. Datasets

4.1 CTR Prediction Datasets

| Dataset | HF Hub Path | Size | Fields | Label | Verified |
| --- | --- | --- | --- | --- | --- |
| Criteo_x4 | reczoo/Criteo_x4 | 45.8M rows, 5.6GB | 13 dense (I1-I13) + 26 categorical (C1-C26) | Label (0/1) | ✅ |
| Avazu_x4 | reczoo/Avazu_x4 | 40.4M rows, 1.8GB | 22 fields (mixed) | click (0/1) | ✅ |
| Criteo_x1 | reczoo/Criteo_x1 | ~11M rows | Same as x4 | Label | ✅ |
| Avazu_x1 | reczoo/Avazu_x1 | ~10M rows | Same as x4 | click | ✅ |

Standard split: 80% train / 10% val / 10% test (BARS protocol).

4.2 RTB Bidding Datasets

| Dataset | Source | Size | Format | Availability |
| --- | --- | --- | --- | --- |
| iPinYou | data.computational-advertising.org | 19.5M impressions, 9 campaigns, 10 days (2013) | Bid logs with market price | External download only (NOT on HF Hub) |
| YOYI | Various academic mirrors | ~400M records | Bid logs | External download only |

iPinYou format: (click, paying_price, bid_price, slot_id, user_tags, ...). It already includes the market price information needed for bidding simulation.

Key Gap: No RTB bid-log datasets on HuggingFace Hub. Criteo/Avazu have click labels but no bid/price columns, so they can only be used for CTR training and require synthetic price generation for bidding evaluation.
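
One possible way to generate synthetic prices for Criteo/Avazu rows is sketched below. Everything about the price model is an assumption made for illustration: market prices are drawn from a log-normal distribution whose median rises with the predicted CTR, and the parameters `base`, `ref_ctr`, `elasticity`, and `sigma` are arbitrary knobs rather than values fitted to any real bid log.

```python
import math
import random


def synth_market_price(pctr, rng, base=50.0, ref_ctr=0.01, elasticity=0.5, sigma=0.4):
    """Draw a hypothetical market price that rises with predicted CTR.

    pctr must be strictly positive. The median price is
    base * (pctr / ref_ctr) ** elasticity, with log-normal noise around it.
    """
    median = base * (pctr / ref_ctr) ** elasticity
    return rng.lognormvariate(math.log(median), sigma)


rng = random.Random(0)
low = [synth_market_price(0.01, rng) for _ in range(2000)]
high = [synth_market_price(0.04, rng) for _ in range(2000)]
print(f"mean price at pCTR=1%: {sum(low) / len(low):.1f}")
print(f"mean price at pCTR=4%: {sum(high) / len(high):.1f}")
```

Results from such a simulation depend heavily on the assumed price model, so calibrating these knobs against iPinYou price statistics where possible would make comparisons more credible.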

4.3 Data Requirements for Each Algorithm

| Algorithm | Needs from Dataset |
| --- | --- |
| Dual OGD (Wang) | click labels (CTR training) + competing bids (or synthetic prices for simulation) |
| Dual Mirror Descent (Balseiro) | click labels + auction payment (second-price) |
| RLB (Cai) | click labels + market prices + impression features |
| CTR models (all) | click labels + features (Criteo/Avazu: ✅) |
| Clearing price models | observed prices (won auctions) + bids (lost auctions) |

5. Codebases & Implementations

5.1 CTR Model Libraries

| Library | URL | Models | Framework | Notes |
| --- | --- | --- | --- | --- |
| FuxiCTR | https://github.com/reczoo/FuxiCTR | 40+ (FinalMLP, DeepFM, DCNv2, GDCN, FCN, xDeepFM, AutoInt) | PyTorch | Config-driven (YAML). Used by all SOTA benchmark papers. |
| DeepCTR-Torch | https://github.com/shenweichen/DeepCTR-Torch | 20+ (DeepFM, DCN, DIN, DIEN, xDeepFM) | PyTorch | Simpler API (Python class). Good for quick prototyping. |
| TorchSurv | https://github.com/Novartis/torchsurv | Cox PH, Weibull AFT, DeepSurv, DeepHit | PyTorch | Deep survival analysis for clearing price. |
| BARS | https://github.com/openbenchmark/BARS | Benchmarking | - | Standardized evaluation pipeline. 389★ |

5.2 Bidding Algorithm Implementations

| Repo | URL | Algorithms | Notes |
| --- | --- | --- | --- |
| rlb-dp | https://github.com/han-cai/rlb-dp | RLB (MDP + DP) | 188 stars. Original implementation of RL for RTB. |
| budget_constrained_bidding | https://github.com/dingmu365/budget_constrained_bidding | Budget-constrained RTB | Contains multiple budget-constrained bidding algorithms. |
| budget_constrained_bidding (fork) | https://github.com/GinNie23/budget_constrained_bidding | Same | Fork with modifications. |
| Budget_Constrained_Bidding | https://github.com/venkatacrc/Budget_Constrained_Bidding | Same | Another implementation. |
| hamverbot/rtb-bidding-comparison | https://huggingface.co/hamverbot/rtb-bidding-comparison | DualOGD, Linear, ORTB, Threshold, MPC | Your repo, which already has a working comparison framework! |

5.3 FuxiCTR Quick Start

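```shell
pip install fuxictr
```

```yaml
# config/criteo_finalmlp.yaml
dataset_id: Criteo_x4
model: FinalMLP
embedding_dim: 10
hidden_units: [400, 400, 400]
batch_size: 4096
learning_rate: 1.0e-3  # decimal point included so PyYAML parses a float, not a string
epochs: 10
metrics: [auc, logloss]
```

```python
# Entry point as recorded in this list; it has not been checked against the
# installed FuxiCTR version, so verify it against the FuxiCTR model zoo scripts.
from fuxictr import autotuner

autotuner.run("config/criteo_finalmlp.yaml", "Criteo_x4", "FinalMLP")
```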

5.4 DeepCTR-Torch Quick Start

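```shell
pip install deepctr-torch
```

```python
from deepctr_torch.inputs import DenseFeat, SparseFeat, get_feature_names
from deepctr_torch.models import DeepFM

# df, categorical_cols, numerical_cols and train_labels are assumed to exist;
# categorical columns must already be label-encoded to 0..n-1.
sparse_features = [SparseFeat(f, vocabulary_size=df[f].nunique(), embedding_dim=10)
                   for f in categorical_cols]
dense_features = [DenseFeat(f, 1) for f in numerical_cols]
feature_columns = sparse_features + dense_features

# The model input is a dict mapping each feature name to its column.
train_input = {name: df[name] for name in get_feature_names(feature_columns)}

model = DeepFM(linear_feature_columns=feature_columns,
               dnn_feature_columns=feature_columns,
               dnn_hidden_units=(400, 400, 400), device='cuda')
model.compile('adam', 'binary_crossentropy', metrics=['auc'])
model.fit(train_input, train_labels, batch_size=4096, epochs=10)
```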

6. Benchmark Leaderboards

| Leaderboard | URL | Description |
| --- | --- | --- |
| BARS CTR Criteo_x4 | https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x4.html | Definitive CTR benchmark; 24 models compared |
| BARS CTR Criteo_x1 | https://openbenchmark.github.io/BARS/CTR/leaderboard/criteo_x1.html | Smaller Criteo subset |
| BARS CTR Avazu | https://openbenchmark.github.io/BARS/CTR/leaderboard/avazu_x4.html | Avazu benchmark |
| BARS Main | https://openbenchmark.github.io/BARS | Full recommender systems benchmark |

Top Criteo_x4 AUC scores (from BARS):

  • FinalMLP: 0.8149
  • DCNv2: 0.8142
  • DeepFM: 0.8138
  • xDeepFM: 0.8136
  • AutoInt+: 0.8134

Key takeaway: The top 5 models are within 0.0015 AUC (0.15 percentage points) of each other.
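
The spread can be checked directly from the five scores listed above; the short snippet below only restates that arithmetic.

```python
# Criteo_x4 AUC scores copied from the list above.
scores = {
    "FinalMLP": 0.8149,
    "DCNv2": 0.8142,
    "DeepFM": 0.8138,
    "xDeepFM": 0.8136,
    "AutoInt+": 0.8134,
}
spread = max(scores.values()) - min(scores.values())
print(f"spread = {spread:.4f} AUC ({100 * spread:.2f} percentage points)")
```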


7. Recommended Architecture

For Your Problem: "Lagrangian Dual Multiplier with Online Error Gradient Descent"

```
┌──────────────────────────────────────────────────────────────┐
│                   BIDDING ALGORITHM                          │
│                                                              │
│  Dual OGD (Wang et al. 2023)                                 │
│  λ_{t+1} = Proj(λ_t - ε·(ρ - c̃_t(b_t)))                      │
│  b_t = argmax_b (r̃_t(v_t, b) - λ_t·c̃_t(b))                   │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│                 PREDICTION MODELS                            │
│                                                              │
│  ┌──────────────────┐    ┌──────────────────────┐            │
│  │  CTR Predictor   │    │  Clearing Price Est. │            │
│  │  (FinalMLP)      │    │  (Empirical CDF      │            │
│  │                  │    │   OR TorchSurv)      │            │
│  │  v_t = pCTR × V  │    │  G̃(b) = P(win | b)   │            │
│  └──────────────────┘    └──────────────────────┘            │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│                     DATASETS                                 │
│                                                              │
│  Criteo_x4 (CTR training) + iPinYou (bidding simulation)     │
│  OR: Criteo_x4 + synthetic price generation                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

Implementation Priority

  1. Phase 1: Improve the CTR model. Replace the current LogisticRegression with FinalMLP trained on Criteo_x4 (via FuxiCTR)
  2. Phase 2: Improve the clearing price model. Add TorchSurv-based censored regression alongside the current empirical CDF
  3. Phase 3: Add Balseiro dual mirror descent for comparison (simpler baseline, no market price model)
  4. Phase 4: Add the two-sided budget constraint (cap + floor) with two dual variables
  5. Phase 5: Full sweep over hyperparameters: step size ε, budget fraction k%, value per click, CTR model architecture

Online Learning Note

For production RTB where the environment is non-stationary, implement periodic retraining:

  • Save model checkpoint every N hours
  • Reload and train on sliding window of most recent data
  • Deploy updated model without restarting the bidding algorithm

The Lagrangian multiplier λ is intrinsically online (updated per auction). The CTR model needs separate periodic retraining.
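
The separation between the per-auction multiplier and the periodically retrained CTR model might look like the sketch below. It is a toy illustration: retraining is triggered every `retrain_every` observations rather than every N hours, the "model" is just the mean click rate of the sliding window standing in for a real training job, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowTrainer:
    """Keeps the most recent rows and swaps in a refreshed model periodically."""

    def __init__(self, window_size, retrain_every):
        self.window = deque(maxlen=window_size)   # most recent (features, click) rows
        self.retrain_every = retrain_every
        self.seen = 0
        self.model = lambda features: 0.0         # placeholder until the first fit

    def _fit(self):
        # Stand-in for real training: predict the window's mean click rate.
        rate = sum(click for _, click in self.window) / len(self.window)
        return lambda features: rate

    def observe(self, features, click):
        self.window.append((features, click))
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.model = self._fit()              # swap in place, no restart needed


trainer = SlidingWindowTrainer(window_size=4, retrain_every=2)
lam = 0.3                                         # pacing multiplier owned by the bidder
for click in [1, 1, 0, 0, 0, 0]:
    trainer.observe(features=None, click=click)
print(trainer.model(None), lam)                   # 0.0 0.3
```

Because the bidder only holds a reference to `trainer.model`, the multiplier state survives every swap; a production version would add checkpointing and validation before promoting a retrained model.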


Paper Index (All Papers Referenced)

| # | Paper | arXiv | Venue | Year | Citations |
| --- | --- | --- | --- | --- | --- |
| 1 | Wang et al., Learning to Bid in Repeated First-Price Auctions with Budgets | 2304.13477 | NeurIPS | 2023 | Growing |
| 2 | Balseiro et al., Dual Mirror Descent for Online Allocation | 2011.10124 | Ops Research | 2020 | 135+ |
| 3 | Feng et al., Online Bidding for RoS Constrained Advertisers | 2208.13713 | ICML | 2022 | 38+ |
| 4 | Cai et al., RTB by RL in Display Advertising | 1701.02490 | WSDM | 2017 | 300+ |
| 5 | Wang et al., HiBid Hierarchical DRL Bidding | 2312.17503 | - | 2023 | New |
| 6 | Online Bidding for Contextual First-Price (Quantile) | 2603.07207 | - | 2026 | New |
| 7 | Mao et al., FinalMLP | 2304.00902 | AAAI | 2023 | Growing |
| 8 | Wang et al., GDCN | 2311.04635 | CIKM | 2023 | Growing |
| 9 | Wang et al., DCN V2 | 2008.13535 | WWW | 2021 | 500+ |
| 10 | Guo et al., DeepFM | - | IJCAI | 2017 | 3000+ |
| 11 | FCN: Fusing Cross Network | 2407.13349 | - | 2024 | New |
| 12 | Zhu et al., BARS-CTR Benchmark | 2009.05794 | CIKM | 2021 | 100+ |
| 13 | Wu et al., Predicting Winning Price with Censored Data | - | KDD | 2015 | 101 |
| 14 | Deep Censored Learning of Winning Price | - | WWW | 2019 | Well-cited |
| 15 | Katzman et al., DeepSurv | - | BMC | 2018 | 1000+ |
| 16 | TorchSurv | 2404.10761 | - | 2024 | New |
| 17 | Robust Budget Pacing with a Single Sample | 2302.02006 | - | 2023 | Growing |
| 18 | Multi-Channel Autobidding with Budget and ROI | 2302.01523 | - | 2023 | Growing |
| 19 | No-Regret in Repeated FPA with Budgets | 2205.14572 | - | 2022 | 14 |
| 20 | Dynamic Budget Throttling | 2207.04690 | - | 2022 | 6 |
| 21 | AIGB: Generative Auto-bidding | 2405.16141 | - | 2024 | New |
| 22 | Adaptive Bidding under Non-Stationarity | 2505.02796 | - | 2025 | 2 |
| 23 | Joint Value Estimation and Bidding | 2502.17292 | - | 2025 | 4 |
| 24 | Leveraging Hints: Adaptive Bidding | 2211.06358 | - | 2022 | 13 |
| 25 | Zhou et al., DIN | 1706.06978 | KDD | 2018 | 2000+ |
| 26 | Zhou et al., DIEN | 1809.03672 | AAAI | 2019 | 1000+ |
| 27 | Lian et al., xDeepFM | 1803.05170 | KDD | 2018 | 1000+ |
| 28 | Song et al., AutoInt | 1810.11921 | CIKM | 2019 | 500+ |
| 29 | Naumov et al., DLRM (Meta) | 1906.00091 | - | 2019 | 500+ |
| 30 | Cheng et al., Wide & Deep | 1606.07792 | RecSys | 2016 | 4000+ |
| 31 | McMahan et al., Ad Click Prediction (FTRL) | - | KDD | 2013 | 2000+ |
| 32 | Zhang et al., Optimal RTB for Display Advertising | - | KDD | 2014 | 500+ |