bidding_algorithms_benchmark / RESEARCH_RESOURCES.md

Upload RESEARCH_RESOURCES.md

a40d07a verified 3 days ago

14.4 kB

	# Bidding Algorithms Benchmark — Research Resources

	> First-Price Auction Focus \| Generated: 2026-05-05
	> Repo: https://huggingface.co/hamverbot/bidding_algorithms_benchmark

	---

	## 1. Bidding Algorithms for First-Price Auctions

	### 1.1 DualOGD — Lagrangian Dual + Online Gradient Descent ⭐ PRIMARY

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Learning to Bid in Repeated First-Price Auctions with Budgets" \|
	\| Authors \| Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong \|
	\| Venue \| 2023 \|
	\| arXiv \| [2304.13477](https://arxiv.org/abs/2304.13477) \|
	\| HF Papers \| https://huggingface.co/papers/2304.13477 \|
	\| Algorithm \| DualOGD — Lagrangian dual multiplier updated by online error gradient descent \|
	\| Auction Type \| First-price \|
	\| Constraints \| Budget cap: total spend ≤ ρT \|
	\| Regret Bound \| Õ(√T) for both full-information and one-sided feedback \|
	\| Key Formula \| λ_{t+1} = Proj_{λ>0}(λ_t − ε·(ρ − c̃_t(b_t))) \|
	\| Bid Rule \| b_t = argmax_b (r̃_t(v_t, b) − λ_t·c̃_t(b)) \|
	\| Prediction Models \| CTR predictor (v_t) + empirical CDF of competing bids (G̃_t) \|
	\| Code Pattern (Wang 2023, Algorithm 1) \|
	```python
	# Initialization
	λ = 0.0; ε = 1.0 / sqrt(T); ρ = B / T

	for t in range(T):
	v_t = pCTR(features_t) * click_value # from CTR model

	# Estimate reward and cost from historical competing bids
	r_tilde = lambda b: mean([(v_t - b) for d in d_history if b >= d])
	c_tilde = lambda b: mean([b for d in d_history if b >= d])

	# Bid: maximize cost-adjusted reward
	b_t = argmax_b (r_tilde(v_t, b) - λ * c_tilde(b))

	# Observe maximum competing bid d_t (full feedback)
	won = (b_t >= d_t); cost = b_t if won else 0

	# Online gradient descent on dual multiplier
	λ = max(0, λ - ε * (ρ - c_tilde(b_t)))
	```

	### 1.2 TwoSidedDual — Budget Cap + Spend Floor

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Base \| Extension of Wang et al. (2023) \|
	\| Constraints \| Total spend ≤ B (cap) AND spend ≥ k·B (floor) \|
	\| Dual Variables \| μ for cap, ν for floor \|
	\| Updates \|
	\| μ_{t+1} = Proj_{μ≥0}(μ_t − η₁·(ρ − c̃_t(b_t))) \| cap penalty \|
	\| ν_{t+1} = Proj_{ν≥0}(ν_t − η₂·(c̃_t(b_t) − kρ)) \| floor incentive \|
	\| Bid Rule \| b_t = argmax_b (r̃_t(v_t, b) − (μ_t − ν_t)·c̃_t(b)) \|
	\| Key Insight \| When μ > ν: bidding is restrained (ahead on spend). When ν > μ: bidding is encouraged (behind on spend floor). \|

	### 1.3 Adversarial Bidding — Non-Stationary Environments

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity" \|
	\| arXiv \| [2505.02796](https://arxiv.org/abs/2505.02796) \|
	\| Algorithm \| Adaptive dual OGD with change-point detection \|
	\| Key Insight \| When distribution shifts, resets dual multiplier and restarts learning \|

	### 1.4 Contextual First-Price (2026)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" \|
	\| arXiv \| [2603.07207](https://arxiv.org/abs/2603.07207) \|
	\| Algorithm \| Dual OGD + quantile-based contextual censored regression \|
	\| Key Innovation \| Extends Wang to contextual (feature-based) auctions \|

	### 1.5 Joint Value Estimation + Bidding

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Joint Value Estimation and Bidding in Repeated First-Price Auctions" \|
	\| arXiv \| [2502.17292](https://arxiv.org/abs/2502.17292) \|
	\| Key Insight \| Simultaneously learn CTR and bidding strategy — no separate CTR model training phase \|

	### 1.6 RLB — Reinforcement Learning (Baseline)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "Real-Time Bidding by Reinforcement Learning in Display Advertising" \|
	\| Authors \| Han Cai et al., WSDM 2017 \|
	\| arXiv \| [1701.02490](https://arxiv.org/abs/1701.02490) \|
	\| GitHub \| https://github.com/han-cai/rlb-dp (188 stars) \|
	\| Algorithm \| MDP + Dynamic Programming + Neural value function \|
	\| State \| (remaining auctions, remaining budget, features) \|
	\| Action \| bid price a ∈ [0, budget] \|
	\| Prediction Models \| CTR θ(x) + market price distribution m(δ, x) \|

	### 1.7 Static Baselines

	\| Algorithm \| Bid Rule \| Notes \|
	\|-----------\|----------\|-------\|
	\| Linear \| b = base_bid × (pCTR / avg_pCTR) \| Simple proportional \|
	\| Threshold \| b = fixed_bid if pCTR > τ else 0 \| Binary decision \|
	\| ValueShading \| b = v_t / (1 + λ) \| From second-price literature, adapted \|

	### Algorithm Comparison Matrix

	\| Algorithm \| Adaptive? \| Market Price Model? \| Two-Sided? \| Provable Regret? \| Complexity \|
	\|-----------\|-----------\|-------------------\|------------\|-----------------\|------------\|
	\| DualOGD \| ✅ Online \| Empirical CDF \| ❌ (cap only) \| ✅ Õ(√T) \| Medium \|
	\| TwoSidedDual \| ✅ Online \| Empirical CDF \| ✅ (cap+floor) \| ❌ (heuristic) \| Medium \|
	\| RLB \| ✅ DP \| Neural dist. model \| ❌ \| ❌ \| High \|
	\| Linear \| ❌ \| None \| ❌ \| ❌ \| Minimal \|
	\| Threshold \| ❌ \| None \| ❌ \| ❌ \| Minimal \|

	---

	## 2. CTR Prediction Models

	### 2.1 FinalMLP ⭐ RECOMMENDED

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" \|
	\| Authors \| Kelong Mao et al., AAAI 2023 \|
	\| arXiv \| [2304.00902](https://arxiv.org/abs/2304.00902) \|
	\| Criteo AUC \| 0.8149 \|
	\| Architecture \| Two independent MLP towers + feature gating (soft selection) + bilinear fusion \|
	\| Why Best for RTB \| Pure feed-forward MLP — <1ms inference, no attention/RNN overhead \|
	\| Library \| FuxiCTR (`pip install fuxictr`) or DeepCTR-Torch \|

	```python
	# FinalMLP architecture
	# Stream 1: MLP(features * gate_weights) → feature selection
	# Stream 2: MLP(features * (1-gate_weights)) → complementary view
	# Fusion: Bilinear(stream1_output, stream2_output) → sigmoid
	```

	### 2.2 DeepFM — Simple Baseline

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "DeepFM: A Factorization-Machine based Neural Network" \|
	\| Criteo AUC \| 0.8138 \|
	\| Architecture \| Shared embedding → FM (2nd-order) + DNN → sum → sigmoid \|

	### 2.3 DCNv2 — Industry Standard

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "DCN V2: Improved Deep & Cross Network" (WWW 2021) \|
	\| arXiv \| [2008.13535](https://arxiv.org/abs/2008.13535) \|
	\| Criteo AUC \| 0.8142-0.8144 \|
	\| Architecture \| CrossNetV2 (low-rank) + DNN in parallel \|

	### 2.4 BARS Meta-Finding ⚠️ IMPORTANT

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Paper \| "BARS-CTR: Open Benchmarking" [2009.05794] \|
	\| Finding \| After 7,000+ experiments: differences between SOTA CTR models are ≤0.1-0.3% AUC \|
	\| Implication \| Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering \|

	### CTR Model Comparison for RTB

	\| Model \| AUC (Criteo) \| Inference Speed \| RTB Latency OK? \|
	\|-------\|-------------\|-----------------\|-----------------\|
	\| FinalMLP \| 0.8149 \| ⭐⭐⭐⭐⭐ \| ✅ Yes \|
	\| DCNv2 \| 0.8142 \| ⭐⭐⭐⭐ \| ✅ Yes \|
	\| DeepFM \| 0.8138 \| ⭐⭐⭐⭐ \| ✅ Yes \|
	\| GDCN \| 0.8161* \| ⭐⭐⭐⭐ \| ✅ Yes \|
	\| DIN \| — \| ⭐⭐ \| ❌ No \|
	\| DIEN \| — \| ⭐ \| ❌ No \|

	*Own data split, not directly comparable.

	---

	## 3. Clearing Price / Win Probability Prediction

	### 3.1 Empirical CDF (Non-Parametric) ⭐ BASELINE

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Source \| Wang et al. (2023), Algorithm 1, Section 3.1 \|
	\| Method \| Maintain array of observed competing bids d_s \|
	\| Win Probability \| P(win\|b) = G̃_t(b) = (1/(t-1))∑_{s=1}^{t-1} 𝟙{b ≥ d_s} \|
	\| Expected Cost \| E[cost\|win,b] = (1/G̃_t(b)) · mean({d_s : d_s ≤ b}) \|
	\| Expected Reward \| r̃_t(v,b) = (v - b) · G̃_t(b) \|
	\| Expected Cost (dual) \| c̃_t(b) = b · G̃_t(b) \|
	\| Pros \| No training, theoretically sound, adapts online \|
	\| Cons \| No context, cold-start issue, requires full feedback \|

	```python
	class EmpiricalCDF:
	def __init__(self):
	self.competing_bids = []

	def update(self, d_t):
	"""d_t = maximum competing bid (observed under full feedback)"""
	self.competing_bids.append(d_t)

	def win_prob(self, b):
	if not self.competing_bids:
	return 0.5
	return np.mean([1.0 if b >= d else 0.0 for d in self.competing_bids])

	def expected_cost(self, b):
	wins = [d for d in self.competing_bids if b >= d]
	if not wins:
	return b
	return np.mean(wins)
	```

	### 3.2 TorchSurv — Deep Censored Learning

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Library \| TorchSurv (Novartis, 200★) \|
	\| Paper \| [2404.10761](https://arxiv.org/abs/2404.10761) \|
	\| GitHub \| https://github.com/Novartis/torchsurv \|
	\| Install \| `pip install torchsurv` \|
	\| Method \| Neural network with Cox PH or Weibull AFT loss \|
	\| Censoring \| Win = uncensored (exact price), Loss = right-censored (price > bid) \|
	\| Output \| Survival function S(b\|x) = P(market_price > b \| features) \|
	\| Win Prob \| P(win\|b,x) = 1 − S(b\|x) \|

	```python
	from torchsurv.loss import cox

	# log_hazard = model(features) # shape (batch,)
	# event = 1 if won (uncensored), 0 if lost (censored)
	# time = market_price if won, bid if lost
	loss = cox.neg_partial_log_likelihood(log_hazard, event, time)
	```

	### 3.3 Win Probability NN (Simplest ML)

	\| Property \| Detail \|
	\|----------\|--------\|
	\| Method \| Binary classifier: P(win \| bid_price, features) \|
	\| Pros \| Dead simple, BCELoss \|
	\| Cons \| Ignores price magnitude info when winning \|
	\| Architecture \| features ⊕ bid_price → MLP → sigmoid \|

	---

	## 4. Datasets

	### Verified on HuggingFace Hub

	\| Dataset \| HF Path \| Rows \| Features \| Label \| Status \|
	\|---------\|---------\|------\|----------\|-------\|--------\|
	\| Criteo_x4 \| `reczoo/Criteo_x4` \| 45.8M \| 13 dense + 26 cat \| `Label` \| ✅ Ready \|
	\| Avazu_x4 \| `reczoo/Avazu_x4` \| 40.4M \| 22 fields \| `click` \| ✅ Ready \|

	### RTB-Specific (External Only)

	\| Dataset \| Description \| Availability \|
	\|---------\|-------------\|-------------\|
	\| iPinYou \| 19.5M impressions, 9 campaigns, market prices included \| data.computational-advertising.org \|
	\| YOYI \| ~400M bid log records \| Various mirrors \|

	Key Gap: No first-price auction bid logs on HF Hub. Criteo/Avazu have click labels but no bid/market price columns. We use synthetic market price generation conditioned on Criteo features for evaluation.

	---

	## 5. Codebases & Libraries

	\| Library \| URL \| Use \|
	\|---------\|-----\|-----\|
	\| FuxiCTR \| github.com/reczoo/FuxiCTR \| 40+ CTR models, config-driven \|
	\| DeepCTR-Torch \| github.com/shenweichen/DeepCTR-Torch \| 20+ CTR models \|
	\| TorchSurv \| github.com/Novartis/torchsurv \| Survival analysis for clearing price \|
	\| BARS \| github.com/openbenchmark/BARS \| Standardized CTR benchmark \|
	\| rlb-dp \| github.com/han-cai/rlb-dp \| RL for RTB (baseline reference) \|

	---

	## 6. Recommended Architecture

	```
	┌──────────────────────────────────────────────────────┐
	│ FIRST-PRICE BIDDING ENGINE │
	│ │
	│ Dual OGD: λ_{t+1} = max(0, λ_t - ε·(ρ - c̃_t(b_t))) │
	│ Two-Sided: μ (cap) + ν (floor) dual variables │
	│ │
	├──────────────────────────────────────────────────────┤
	│ PREDICTION MODELS │
	│ │
	│ ┌─────────────────┐ ┌──────────────────────────┐ │
	│ │ CTR: FinalMLP │ │ Win Prob: Empirical CDF │ │
	│ │ v_t = pCTR × V │ │ G̃_t(b) = frac of d_s ≤ b │ │
	│ └─────────────────┘ └──────────────────────────┘ │
	│ │
	│ ┌──────────────────────────────────────────────┐ │
	│ │ Optional: TorchSurv for contextual win prob │ │
	│ │ P(win\|b,x) = 1 - S(b\|x) │ │
	│ └──────────────────────────────────────────────┘ │
	├──────────────────────────────────────────────────────┤
	│ DATASETS │
	│ Criteo_x4 (CTR training) + synthetic market prices │
	└──────────────────────────────────────────────────────┘
	```

	---

	## Paper Index

	\| # \| Paper \| arXiv \| Focus \|
	\|---\|-------\|-------\|-------\|
	\| 1 \| Wang et al. — Learning to Bid in Repeated FPA with Budgets \| 2304.13477 \| ⭐ Primary algorithm \|
	\| 2 \| — Adaptive Bidding under Non-Stationarity \| 2505.02796 \| Distribution shift \|
	\| 3 \| — Contextual First-Price (Quantile) \| 2603.07207 \| Contextual extension \|
	\| 4 \| — Joint Value Estimation and Bidding \| 2502.17292 \| Simultaneous CTR + bidding \|
	\| 5 \| — No-Regret in Repeated FPA with Budgets \| 2205.14572 \| General framework \|
	\| 6 \| Cai et al. — RLB \| 1701.02490 \| RL baseline \|
	\| 7 \| — Leveraging Hints: Adaptive Bidding \| 2211.06358 \| Hints/forecasts \|
	\| 8 \| Mao et al. — FinalMLP \| 2304.00902 \| CTR model \|
	\| 9 \| Wang et al. — DCN V2 \| 2008.13535 \| CTR model \|
	\| 10 \| Guo et al. — DeepFM \| — \| CTR model \|
	\| 11 \| Zhu et al. — BARS-CTR \| 2009.05794 \| CTR benchmark \|
	\| 12 \| Wu et al. — Censored Price Prediction \| — \| Clearing price \|
	\| 13 \| — TorchSurv \| 2404.10761 \| Survival analysis library \|