bidding_algorithms_benchmark / RESEARCH_RESOURCES.md
hamverbot's picture
Upload RESEARCH_RESOURCES.md
a40d07a verified

Bidding Algorithms Benchmark β€” Research Resources

First-Price Auction Focus | Generated: 2026-05-05 Repo: https://huggingface.co/hamverbot/bidding_algorithms_benchmark


1. Bidding Algorithms for First-Price Auctions

1.1 DualOGD β€” Lagrangian Dual + Online Gradient Descent ⭐ PRIMARY

Property Detail
Paper "Learning to Bid in Repeated First-Price Auctions with Budgets"
Authors Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong
Venue 2023
arXiv 2304.13477
HF Papers https://huggingface.co/papers/2304.13477
Algorithm DualOGD β€” Lagrangian dual multiplier updated by online error gradient descent
Auction Type First-price
Constraints Budget cap: total spend ≀ ρT
Regret Bound Γ•(√T) for both full-information and one-sided feedback
Key Formula Ξ»_{t+1} = Proj_{Ξ»>0}(Ξ»_t βˆ’ Ρ·(ρ βˆ’ cΜƒ_t(b_t)))
Bid Rule b_t = argmax_b (rΜƒ_t(v_t, b) βˆ’ Ξ»_tΒ·cΜƒ_t(b))
Prediction Models CTR predictor (v_t) + empirical CDF of competing bids (G̃_t)
Code Pattern (Wang 2023, Algorithm 1)
# Initialization
λ = 0.0; Ρ = 1.0 / sqrt(T); ρ = B / T

for t in range(T):
    v_t = pCTR(features_t) * click_value  # from CTR model
    
    # Estimate reward and cost from historical competing bids
    r_tilde = lambda b: mean([(v_t - b) for d in d_history if b >= d])
    c_tilde = lambda b: mean([b for d in d_history if b >= d])
    
    # Bid: maximize cost-adjusted reward
    b_t = argmax_b (r_tilde(v_t, b) - Ξ» * c_tilde(b))
    
    # Observe maximum competing bid d_t (full feedback)
    won = (b_t >= d_t); cost = b_t if won else 0
    
    # Online gradient descent on dual multiplier
    λ = max(0, λ - Ρ * (ρ - c_tilde(b_t)))

1.2 TwoSidedDual β€” Budget Cap + Spend Floor

Property Detail
Base Extension of Wang et al. (2023)
Constraints Total spend ≀ B (cap) AND spend β‰₯ kΒ·B (floor)
Dual Variables ΞΌ for cap, Ξ½ for floor
Updates
ΞΌ_{t+1} = Proj_{ΞΌβ‰₯0}(ΞΌ_t βˆ’ η₁·(ρ βˆ’ cΜƒ_t(b_t))) cap penalty
Ξ½_{t+1} = Proj_{Ξ½β‰₯0}(Ξ½_t βˆ’ Ξ·β‚‚Β·(cΜƒ_t(b_t) βˆ’ kρ)) floor incentive
Bid Rule b_t = argmax_b (rΜƒ_t(v_t, b) βˆ’ (ΞΌ_t βˆ’ Ξ½_t)Β·cΜƒ_t(b))
Key Insight When ΞΌ > Ξ½: bidding is restrained (ahead on spend). When Ξ½ > ΞΌ: bidding is encouraged (behind on spend floor).

1.3 Adversarial Bidding β€” Non-Stationary Environments

Property Detail
Paper "Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity"
arXiv 2505.02796
Algorithm Adaptive dual OGD with change-point detection
Key Insight When distribution shifts, resets dual multiplier and restarts learning

1.4 Contextual First-Price (2026)

Property Detail
Paper "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback"
arXiv 2603.07207
Algorithm Dual OGD + quantile-based contextual censored regression
Key Innovation Extends Wang to contextual (feature-based) auctions

1.5 Joint Value Estimation + Bidding

Property Detail
Paper "Joint Value Estimation and Bidding in Repeated First-Price Auctions"
arXiv 2502.17292
Key Insight Simultaneously learn CTR and bidding strategy β€” no separate CTR model training phase

1.6 RLB β€” Reinforcement Learning (Baseline)

Property Detail
Paper "Real-Time Bidding by Reinforcement Learning in Display Advertising"
Authors Han Cai et al., WSDM 2017
arXiv 1701.02490
GitHub https://github.com/han-cai/rlb-dp (188 stars)
Algorithm MDP + Dynamic Programming + Neural value function
State (remaining auctions, remaining budget, features)
Action bid price a ∈ [0, budget]
Prediction Models CTR ΞΈ(x) + market price distribution m(Ξ΄, x)

1.7 Static Baselines

Algorithm Bid Rule Notes
Linear b = base_bid Γ— (pCTR / avg_pCTR) Simple proportional
Threshold b = fixed_bid if pCTR > Ο„ else 0 Binary decision
ValueShading b = v_t / (1 + Ξ») From second-price literature, adapted

Algorithm Comparison Matrix

Algorithm Adaptive? Market Price Model? Two-Sided? Provable Regret? Complexity
DualOGD βœ… Online Empirical CDF ❌ (cap only) βœ… Γ•(√T) Medium
TwoSidedDual βœ… Online Empirical CDF βœ… (cap+floor) ❌ (heuristic) Medium
RLB βœ… DP Neural dist. model ❌ ❌ High
Linear ❌ None ❌ ❌ Minimal
Threshold ❌ None ❌ ❌ Minimal

2. CTR Prediction Models

2.1 FinalMLP ⭐ RECOMMENDED

Property Detail
Paper "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction"
Authors Kelong Mao et al., AAAI 2023
arXiv 2304.00902
Criteo AUC 0.8149
Architecture Two independent MLP towers + feature gating (soft selection) + bilinear fusion
Why Best for RTB Pure feed-forward MLP β€” <1ms inference, no attention/RNN overhead
Library FuxiCTR (pip install fuxictr) or DeepCTR-Torch
# FinalMLP architecture
# Stream 1: MLP(features * gate_weights) β†’ feature selection
# Stream 2: MLP(features * (1-gate_weights)) β†’ complementary view
# Fusion: Bilinear(stream1_output, stream2_output) β†’ sigmoid

2.2 DeepFM β€” Simple Baseline

Property Detail
Paper "DeepFM: A Factorization-Machine based Neural Network"
Criteo AUC 0.8138
Architecture Shared embedding β†’ FM (2nd-order) + DNN β†’ sum β†’ sigmoid

2.3 DCNv2 β€” Industry Standard

Property Detail
Paper "DCN V2: Improved Deep & Cross Network" (WWW 2021)
arXiv 2008.13535
Criteo AUC 0.8142-0.8144
Architecture CrossNetV2 (low-rank) + DNN in parallel

2.4 BARS Meta-Finding ⚠️ IMPORTANT

Property Detail
Paper "BARS-CTR: Open Benchmarking" [2009.05794]
Finding After 7,000+ experiments: differences between SOTA CTR models are ≀0.1-0.3% AUC
Implication Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering

CTR Model Comparison for RTB

Model AUC (Criteo) Inference Speed RTB Latency OK?
FinalMLP 0.8149 ⭐⭐⭐⭐⭐ βœ… Yes
DCNv2 0.8142 ⭐⭐⭐⭐ βœ… Yes
DeepFM 0.8138 ⭐⭐⭐⭐ βœ… Yes
GDCN 0.8161* ⭐⭐⭐⭐ βœ… Yes
DIN β€” ⭐⭐ ❌ No
DIEN β€” ⭐ ❌ No

*Own data split, not directly comparable.


3. Clearing Price / Win Probability Prediction

3.1 Empirical CDF (Non-Parametric) ⭐ BASELINE

Property Detail
Source Wang et al. (2023), Algorithm 1, Section 3.1
Method Maintain array of observed competing bids d_s
Win Probability P(win
Expected Cost E[cost
Expected Reward r̃_t(v,b) = (v - b) · G̃_t(b)
Expected Cost (dual) c̃_t(b) = b · G̃_t(b)
Pros No training, theoretically sound, adapts online
Cons No context, cold-start issue, requires full feedback
class EmpiricalCDF:
    def __init__(self):
        self.competing_bids = []
    
    def update(self, d_t):
        """d_t = maximum competing bid (observed under full feedback)"""
        self.competing_bids.append(d_t)
    
    def win_prob(self, b):
        if not self.competing_bids:
            return 0.5
        return np.mean([1.0 if b >= d else 0.0 for d in self.competing_bids])
    
    def expected_cost(self, b):
        wins = [d for d in self.competing_bids if b >= d]
        if not wins:
            return b
        return np.mean(wins)

3.2 TorchSurv β€” Deep Censored Learning

Property Detail
Library TorchSurv (Novartis, 200β˜…)
Paper 2404.10761
GitHub https://github.com/Novartis/torchsurv
Install pip install torchsurv
Method Neural network with Cox PH or Weibull AFT loss
Censoring Win = uncensored (exact price), Loss = right-censored (price > bid)
Output Survival function S(b
Win Prob P(win
from torchsurv.loss import cox

# log_hazard = model(features)  # shape (batch,)
# event = 1 if won (uncensored), 0 if lost (censored)
# time = market_price if won, bid if lost
loss = cox.neg_partial_log_likelihood(log_hazard, event, time)

3.3 Win Probability NN (Simplest ML)

Property Detail
Method Binary classifier: P(win
Pros Dead simple, BCELoss
Cons Ignores price magnitude info when winning
Architecture features βŠ• bid_price β†’ MLP β†’ sigmoid

4. Datasets

Verified on HuggingFace Hub

Dataset HF Path Rows Features Label Status
Criteo_x4 reczoo/Criteo_x4 45.8M 13 dense + 26 cat Label βœ… Ready
Avazu_x4 reczoo/Avazu_x4 40.4M 22 fields click βœ… Ready

RTB-Specific (External Only)

Dataset Description Availability
iPinYou 19.5M impressions, 9 campaigns, market prices included data.computational-advertising.org
YOYI ~400M bid log records Various mirrors

Key Gap: No first-price auction bid logs on HF Hub. Criteo/Avazu have click labels but no bid/market price columns. We use synthetic market price generation conditioned on Criteo features for evaluation.


5. Codebases & Libraries

Library URL Use
FuxiCTR github.com/reczoo/FuxiCTR 40+ CTR models, config-driven
DeepCTR-Torch github.com/shenweichen/DeepCTR-Torch 20+ CTR models
TorchSurv github.com/Novartis/torchsurv Survival analysis for clearing price
BARS github.com/openbenchmark/BARS Standardized CTR benchmark
rlb-dp github.com/han-cai/rlb-dp RL for RTB (baseline reference)

6. Recommended Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FIRST-PRICE BIDDING ENGINE               β”‚
β”‚                                                       β”‚
β”‚  Dual OGD: Ξ»_{t+1} = max(0, Ξ»_t - Ρ·(ρ - cΜƒ_t(b_t))) β”‚
β”‚  Two-Sided: ΞΌ (cap) + Ξ½ (floor) dual variables       β”‚
β”‚                                                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                PREDICTION MODELS                       β”‚
β”‚                                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ CTR: FinalMLP   β”‚  β”‚ Win Prob: Empirical CDF   β”‚   β”‚
β”‚  β”‚ v_t = pCTR Γ— V  β”‚  β”‚ GΜƒ_t(b) = frac of d_s ≀ b β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ Optional: TorchSurv for contextual win prob  β”‚    β”‚
β”‚  β”‚ P(win|b,x) = 1 - S(b|x)                      β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                   DATASETS                            β”‚
β”‚  Criteo_x4 (CTR training) + synthetic market prices  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Paper Index

# Paper arXiv Focus
1 Wang et al. β€” Learning to Bid in Repeated FPA with Budgets 2304.13477 ⭐ Primary algorithm
2 β€” Adaptive Bidding under Non-Stationarity 2505.02796 Distribution shift
3 β€” Contextual First-Price (Quantile) 2603.07207 Contextual extension
4 β€” Joint Value Estimation and Bidding 2502.17292 Simultaneous CTR + bidding
5 β€” No-Regret in Repeated FPA with Budgets 2205.14572 General framework
6 Cai et al. β€” RLB 1701.02490 RL baseline
7 β€” Leveraging Hints: Adaptive Bidding 2211.06358 Hints/forecasts
8 Mao et al. β€” FinalMLP 2304.00902 CTR model
9 Wang et al. β€” DCN V2 2008.13535 CTR model
10 Guo et al. β€” DeepFM β€” CTR model
11 Zhu et al. β€” BARS-CTR 2009.05794 CTR benchmark
12 Wu et al. β€” Censored Price Prediction β€” Clearing price
13 β€” TorchSurv 2404.10761 Survival analysis library