bidding_algorithms_benchmark / RESEARCH_RESOURCES.md
hamverbot's picture
Upload RESEARCH_RESOURCES.md
a40d07a verified
# Bidding Algorithms Benchmark β€” Research Resources
> First-Price Auction Focus | Generated: 2026-05-05
> Repo: https://huggingface.co/hamverbot/bidding_algorithms_benchmark
---
## 1. Bidding Algorithms for First-Price Auctions
### 1.1 DualOGD β€” Lagrangian Dual + Online Gradient Descent ⭐ PRIMARY
| Property | Detail |
|----------|--------|
| **Paper** | "Learning to Bid in Repeated First-Price Auctions with Budgets" |
| **Authors** | Qian Wang, Zongjun Yang, Xiaotie Deng, Yuqing Kong |
| **Venue** | 2023 |
| **arXiv** | [2304.13477](https://arxiv.org/abs/2304.13477) |
| **HF Papers** | https://huggingface.co/papers/2304.13477 |
| **Algorithm** | DualOGD β€” Lagrangian dual multiplier updated by online error gradient descent |
| **Auction Type** | **First-price** |
| **Constraints** | Budget cap: total spend ≀ ρT |
| **Regret Bound** | Γ•(√T) for both full-information and one-sided feedback |
| **Key Formula** | Ξ»_{t+1} = Proj_{Ξ»>0}(Ξ»_t βˆ’ Ρ·(ρ βˆ’ cΜƒ_t(b_t))) |
| **Bid Rule** | b_t = argmax_b (rΜƒ_t(v_t, b) βˆ’ Ξ»_tΒ·cΜƒ_t(b)) |
| **Prediction Models** | CTR predictor (v_t) + empirical CDF of competing bids (G̃_t) |
| **Code Pattern (Wang 2023, Algorithm 1)** |
```python
# Initialization
λ = 0.0; Ρ = 1.0 / sqrt(T); ρ = B / T
for t in range(T):
v_t = pCTR(features_t) * click_value # from CTR model
# Estimate reward and cost from historical competing bids
r_tilde = lambda b: mean([(v_t - b) for d in d_history if b >= d])
c_tilde = lambda b: mean([b for d in d_history if b >= d])
# Bid: maximize cost-adjusted reward
b_t = argmax_b (r_tilde(v_t, b) - Ξ» * c_tilde(b))
# Observe maximum competing bid d_t (full feedback)
won = (b_t >= d_t); cost = b_t if won else 0
# Online gradient descent on dual multiplier
λ = max(0, λ - Ρ * (ρ - c_tilde(b_t)))
```
### 1.2 TwoSidedDual β€” Budget Cap + Spend Floor
| Property | Detail |
|----------|--------|
| **Base** | Extension of Wang et al. (2023) |
| **Constraints** | Total spend ≀ B (cap) AND spend β‰₯ kΒ·B (floor) |
| **Dual Variables** | ΞΌ for cap, Ξ½ for floor |
| **Updates** |
| ΞΌ_{t+1} = Proj_{ΞΌβ‰₯0}(ΞΌ_t βˆ’ η₁·(ρ βˆ’ cΜƒ_t(b_t))) | cap penalty |
| Ξ½_{t+1} = Proj_{Ξ½β‰₯0}(Ξ½_t βˆ’ Ξ·β‚‚Β·(cΜƒ_t(b_t) βˆ’ kρ)) | floor incentive |
| **Bid Rule** | b_t = argmax_b (rΜƒ_t(v_t, b) βˆ’ (ΞΌ_t βˆ’ Ξ½_t)Β·cΜƒ_t(b)) |
| **Key Insight** | When ΞΌ > Ξ½: bidding is restrained (ahead on spend). When Ξ½ > ΞΌ: bidding is encouraged (behind on spend floor). |
### 1.3 Adversarial Bidding β€” Non-Stationary Environments
| Property | Detail |
|----------|--------|
| **Paper** | "Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity" |
| **arXiv** | [2505.02796](https://arxiv.org/abs/2505.02796) |
| **Algorithm** | Adaptive dual OGD with change-point detection |
| **Key Insight** | When distribution shifts, resets dual multiplier and restarts learning |
### 1.4 Contextual First-Price (2026)
| Property | Detail |
|----------|--------|
| **Paper** | "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback" |
| **arXiv** | [2603.07207](https://arxiv.org/abs/2603.07207) |
| **Algorithm** | Dual OGD + quantile-based contextual censored regression |
| **Key Innovation** | Extends Wang to contextual (feature-based) auctions |
### 1.5 Joint Value Estimation + Bidding
| Property | Detail |
|----------|--------|
| **Paper** | "Joint Value Estimation and Bidding in Repeated First-Price Auctions" |
| **arXiv** | [2502.17292](https://arxiv.org/abs/2502.17292) |
| **Key Insight** | Simultaneously learn CTR and bidding strategy β€” no separate CTR model training phase |
### 1.6 RLB β€” Reinforcement Learning (Baseline)
| Property | Detail |
|----------|--------|
| **Paper** | "Real-Time Bidding by Reinforcement Learning in Display Advertising" |
| **Authors** | Han Cai et al., WSDM 2017 |
| **arXiv** | [1701.02490](https://arxiv.org/abs/1701.02490) |
| **GitHub** | https://github.com/han-cai/rlb-dp (188 stars) |
| **Algorithm** | MDP + Dynamic Programming + Neural value function |
| **State** | (remaining auctions, remaining budget, features) |
| **Action** | bid price a ∈ [0, budget] |
| **Prediction Models** | CTR ΞΈ(x) + market price distribution m(Ξ΄, x) |
### 1.7 Static Baselines
| Algorithm | Bid Rule | Notes |
|-----------|----------|-------|
| **Linear** | b = base_bid Γ— (pCTR / avg_pCTR) | Simple proportional |
| **Threshold** | b = fixed_bid if pCTR > Ο„ else 0 | Binary decision |
| **ValueShading** | b = v_t / (1 + Ξ») | From second-price literature, adapted |
### Algorithm Comparison Matrix
| Algorithm | Adaptive? | Market Price Model? | Two-Sided? | Provable Regret? | Complexity |
|-----------|-----------|-------------------|------------|-----------------|------------|
| **DualOGD** | βœ… Online | Empirical CDF | ❌ (cap only) | βœ… Γ•(√T) | Medium |
| **TwoSidedDual** | βœ… Online | Empirical CDF | βœ… (cap+floor) | ❌ (heuristic) | Medium |
| **RLB** | βœ… DP | Neural dist. model | ❌ | ❌ | High |
| **Linear** | ❌ | None | ❌ | ❌ | Minimal |
| **Threshold** | ❌ | None | ❌ | ❌ | Minimal |
---
## 2. CTR Prediction Models
### 2.1 FinalMLP ⭐ RECOMMENDED
| Property | Detail |
|----------|--------|
| **Paper** | "FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction" |
| **Authors** | Kelong Mao et al., AAAI 2023 |
| **arXiv** | [2304.00902](https://arxiv.org/abs/2304.00902) |
| **Criteo AUC** | **0.8149** |
| **Architecture** | Two independent MLP towers + feature gating (soft selection) + bilinear fusion |
| **Why Best for RTB** | Pure feed-forward MLP β€” <1ms inference, no attention/RNN overhead |
| **Library** | FuxiCTR (`pip install fuxictr`) or DeepCTR-Torch |
```python
# FinalMLP architecture
# Stream 1: MLP(features * gate_weights) β†’ feature selection
# Stream 2: MLP(features * (1-gate_weights)) β†’ complementary view
# Fusion: Bilinear(stream1_output, stream2_output) β†’ sigmoid
```
### 2.2 DeepFM β€” Simple Baseline
| Property | Detail |
|----------|--------|
| **Paper** | "DeepFM: A Factorization-Machine based Neural Network" |
| **Criteo AUC** | 0.8138 |
| **Architecture** | Shared embedding β†’ FM (2nd-order) + DNN β†’ sum β†’ sigmoid |
### 2.3 DCNv2 β€” Industry Standard
| Property | Detail |
|----------|--------|
| **Paper** | "DCN V2: Improved Deep & Cross Network" (WWW 2021) |
| **arXiv** | [2008.13535](https://arxiv.org/abs/2008.13535) |
| **Criteo AUC** | 0.8142-0.8144 |
| **Architecture** | CrossNetV2 (low-rank) + DNN in parallel |
### 2.4 BARS Meta-Finding ⚠️ IMPORTANT
| Property | Detail |
|----------|--------|
| **Paper** | "BARS-CTR: Open Benchmarking" [2009.05794] |
| **Finding** | After 7,000+ experiments: **differences between SOTA CTR models are ≀0.1-0.3% AUC** |
| **Implication** | Architecture choice matters less than data preprocessing, hyperparameter tuning, and feature engineering |
### CTR Model Comparison for RTB
| Model | AUC (Criteo) | Inference Speed | RTB Latency OK? |
|-------|-------------|-----------------|-----------------|
| **FinalMLP** | 0.8149 | ⭐⭐⭐⭐⭐ | βœ… Yes |
| DCNv2 | 0.8142 | ⭐⭐⭐⭐ | βœ… Yes |
| DeepFM | 0.8138 | ⭐⭐⭐⭐ | βœ… Yes |
| GDCN | 0.8161* | ⭐⭐⭐⭐ | βœ… Yes |
| DIN | β€” | ⭐⭐ | ❌ No |
| DIEN | β€” | ⭐ | ❌ No |
*Own data split, not directly comparable.
---
## 3. Clearing Price / Win Probability Prediction
### 3.1 Empirical CDF (Non-Parametric) ⭐ BASELINE
| Property | Detail |
|----------|--------|
| **Source** | Wang et al. (2023), Algorithm 1, Section 3.1 |
| **Method** | Maintain array of observed competing bids d_s |
| **Win Probability** | P(win|b) = GΜƒ_t(b) = (1/(t-1))βˆ‘_{s=1}^{t-1} πŸ™{b β‰₯ d_s} |
| **Expected Cost** | E[cost|win,b] = (1/GΜƒ_t(b)) Β· mean({d_s : d_s ≀ b}) |
| **Expected Reward** | r̃_t(v,b) = (v - b) · G̃_t(b) |
| **Expected Cost (dual)** | c̃_t(b) = b · G̃_t(b) |
| **Pros** | No training, theoretically sound, adapts online |
| **Cons** | No context, cold-start issue, requires full feedback |
```python
class EmpiricalCDF:
def __init__(self):
self.competing_bids = []
def update(self, d_t):
"""d_t = maximum competing bid (observed under full feedback)"""
self.competing_bids.append(d_t)
def win_prob(self, b):
if not self.competing_bids:
return 0.5
return np.mean([1.0 if b >= d else 0.0 for d in self.competing_bids])
def expected_cost(self, b):
wins = [d for d in self.competing_bids if b >= d]
if not wins:
return b
return np.mean(wins)
```
### 3.2 TorchSurv β€” Deep Censored Learning
| Property | Detail |
|----------|--------|
| **Library** | TorchSurv (Novartis, 200β˜…) |
| **Paper** | [2404.10761](https://arxiv.org/abs/2404.10761) |
| **GitHub** | https://github.com/Novartis/torchsurv |
| **Install** | `pip install torchsurv` |
| **Method** | Neural network with Cox PH or Weibull AFT loss |
| **Censoring** | Win = uncensored (exact price), Loss = right-censored (price > bid) |
| **Output** | Survival function S(b|x) = P(market_price > b | features) |
| **Win Prob** | P(win|b,x) = 1 βˆ’ S(b|x) |
```python
from torchsurv.loss import cox
# log_hazard = model(features) # shape (batch,)
# event = 1 if won (uncensored), 0 if lost (censored)
# time = market_price if won, bid if lost
loss = cox.neg_partial_log_likelihood(log_hazard, event, time)
```
### 3.3 Win Probability NN (Simplest ML)
| Property | Detail |
|----------|--------|
| **Method** | Binary classifier: P(win | bid_price, features) |
| **Pros** | Dead simple, BCELoss |
| **Cons** | Ignores price magnitude info when winning |
| **Architecture** | features βŠ• bid_price β†’ MLP β†’ sigmoid |
---
## 4. Datasets
### Verified on HuggingFace Hub
| Dataset | HF Path | Rows | Features | Label | Status |
|---------|---------|------|----------|-------|--------|
| **Criteo_x4** | `reczoo/Criteo_x4` | 45.8M | 13 dense + 26 cat | `Label` | βœ… Ready |
| Avazu_x4 | `reczoo/Avazu_x4` | 40.4M | 22 fields | `click` | βœ… Ready |
### RTB-Specific (External Only)
| Dataset | Description | Availability |
|---------|-------------|-------------|
| iPinYou | 19.5M impressions, 9 campaigns, market prices included | data.computational-advertising.org |
| YOYI | ~400M bid log records | Various mirrors |
**Key Gap**: No first-price auction bid logs on HF Hub. Criteo/Avazu have click labels but no bid/market price columns. We use **synthetic market price generation** conditioned on Criteo features for evaluation.
---
## 5. Codebases & Libraries
| Library | URL | Use |
|---------|-----|-----|
| FuxiCTR | github.com/reczoo/FuxiCTR | 40+ CTR models, config-driven |
| DeepCTR-Torch | github.com/shenweichen/DeepCTR-Torch | 20+ CTR models |
| TorchSurv | github.com/Novartis/torchsurv | Survival analysis for clearing price |
| BARS | github.com/openbenchmark/BARS | Standardized CTR benchmark |
| rlb-dp | github.com/han-cai/rlb-dp | RL for RTB (baseline reference) |
---
## 6. Recommended Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ FIRST-PRICE BIDDING ENGINE β”‚
β”‚ β”‚
β”‚ Dual OGD: Ξ»_{t+1} = max(0, Ξ»_t - Ρ·(ρ - cΜƒ_t(b_t))) β”‚
β”‚ Two-Sided: ΞΌ (cap) + Ξ½ (floor) dual variables β”‚
β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ PREDICTION MODELS β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ CTR: FinalMLP β”‚ β”‚ Win Prob: Empirical CDF β”‚ β”‚
β”‚ β”‚ v_t = pCTR Γ— V β”‚ β”‚ GΜƒ_t(b) = frac of d_s ≀ b β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Optional: TorchSurv for contextual win prob β”‚ β”‚
β”‚ β”‚ P(win|b,x) = 1 - S(b|x) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ DATASETS β”‚
β”‚ Criteo_x4 (CTR training) + synthetic market prices β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## Paper Index
| # | Paper | arXiv | Focus |
|---|-------|-------|-------|
| 1 | Wang et al. β€” Learning to Bid in Repeated FPA with Budgets | 2304.13477 | ⭐ Primary algorithm |
| 2 | β€” Adaptive Bidding under Non-Stationarity | 2505.02796 | Distribution shift |
| 3 | β€” Contextual First-Price (Quantile) | 2603.07207 | Contextual extension |
| 4 | β€” Joint Value Estimation and Bidding | 2502.17292 | Simultaneous CTR + bidding |
| 5 | β€” No-Regret in Repeated FPA with Budgets | 2205.14572 | General framework |
| 6 | Cai et al. β€” RLB | 1701.02490 | RL baseline |
| 7 | β€” Leveraging Hints: Adaptive Bidding | 2211.06358 | Hints/forecasts |
| 8 | Mao et al. β€” FinalMLP | 2304.00902 | CTR model |
| 9 | Wang et al. β€” DCN V2 | 2008.13535 | CTR model |
| 10 | Guo et al. β€” DeepFM | β€” | CTR model |
| 11 | Zhu et al. β€” BARS-CTR | 2009.05794 | CTR benchmark |
| 12 | Wu et al. β€” Censored Price Prediction | β€” | Clearing price |
| 13 | β€” TorchSurv | 2404.10761 | Survival analysis library |