File size: 9,115 Bytes
03e57c7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
"""
DualOGD Bidding Algorithm
Based on: Wang et al. "Learning to Bid in Repeated First-Price Auctions with Budgets" (2023)
arXiv: 2304.13477, Algorithm 1

The canonical Lagrangian dual multiplier approach with online gradient descent.

Core update:
  λ_{t+1} = Proj_{λ>0}(λ_t − ε · (ρ − c̃_t(b_t)))
  
Bid rule:
  b_t = argmax_b (r̃_t(v_t, b) − λ_t · c̃_t(b))
  
Where:
  v_t = value of winning = pCTR × value_per_click
  r̃_t(v,b) = (v-b) · G̃_t(b)   — empirical expected reward
  c̃_t(b) = b · G̃_t(b)          — empirical expected cost
  G̃_t(b) = empirical win probability from historical competing bids
  ρ = B/T = target spend per auction

The dual multiplier λ acts as a pace multiplier:
  - If you overspend → λ increases → future bids are penalized more → spend decreases
  - If you underspend → λ decreases → future bids are cheaper → spend increases
"""
import numpy as np


class DualOGDBidder:
    """
    Dual OGD bidder for first-price auctions with budget constraint.
    
    Full information feedback: observes all maximum competing bids d_t.
    """
    
    def __init__(
        self,
        budget,
        T,
        value_per_click,
        epsilon=None,
        empirical_cdf=None,
        name="DualOGD"
    ):
        """
        Args:
            budget: Total budget B
            T: Time horizon (number of auctions)
            value_per_click: Value of each click in currency units
            epsilon: Step size for dual update. Default: 1/sqrt(T)
            empirical_cdf: EmpiricalCDF instance for win prob estimation
            name: Algorithm name for logging
        """
        self.B = budget
        self.T = T
        self.rho = budget / T  # Target spend per auction
        self.vpc = value_per_click
        self.name = name
        
        # Dual multiplier λ
        self.lambd = 0.0
        
        # Step size
        self.epsilon = epsilon if epsilon is not None else 1.0 / np.sqrt(T)
        
        # Spend tracking
        self.total_spent = 0.0
        self.remaining_budget = budget
        self.t = 0
        self.total_wins = 0
        self.total_clicks = 0
        
        # History for empirical estimation
        self.competing_bids = []  # All observed d_t values
    
    def bid(self, pctr, features=None):
        """
        Compute bid for current auction.
        
        Args:
            pctr: Predicted click probability pCTR ∈ [0,1]
            features: Optional feature vector (unused in non-contextual version)
        
        Returns:
            bid_price: Optimal bid in [0, remaining_budget]
        """
        self.t += 1
        
        # Check if budget exhausted
        if self.remaining_budget <= 0:
            return 0.0
        
        v = pctr * self.vpc  # Value of winning this impression
        
        # Maximum possible bid: don't bid more than value or remaining budget
        max_bid = min(v * 2.0, self.remaining_budget)
        
        if max_bid <= 0.1:
            return 0.0
        
        # Find b_t = argmax_b (r̃_t(v,b) - λ · c̃_t(b))
        bid = self._find_optimal_bid(v, max_bid)
        
        return bid
    
    def _find_optimal_bid(self, v, max_bid, n_candidates=50):
        """Grid search for optimal bid."""
        if len(self.competing_bids) == 0:
            # No history: bid half of value as exploration
            return v * 0.5
        
        candidates = np.linspace(0.1, max_bid, n_candidates)
        best_score = -float('inf')
        best_bid = candidates[0]
        
        for b in candidates:
            win_prob = self._empirical_win_prob(b)
            reward = (v - b) * win_prob
            cost = b * win_prob
            score = reward - self.lambd * cost
            
            if score > best_score:
                best_score = score
                best_bid = b
        
        return float(best_bid)
    
    def _empirical_win_prob(self, b):
        """G̃_t(b) = fraction of historical competing bids ≤ b."""
        if not self.competing_bids:
            return 0.5
        return np.mean([1.0 if b >= d else 0.0 for d in self.competing_bids])
    
    def _empirical_expected_cost(self, b):
        """c̃_t(b) = b · G̃_t(b)."""
        return b * self._empirical_win_prob(b)
    
    def update(self, won, cost, pctr, d_t=None):
        """
        Update state after observing auction outcome.
        
        Args:
            won: bool, whether bid won
            cost: actual cost incurred (bid price in first-price)
            pctr: pCTR used (for logging)
            d_t: maximum competing bid (observed under full feedback)
        """
        if won:
            self.total_spent += cost
            self.remaining_budget -= cost
            self.total_wins += 1
        
        # Record competing bid for empirical estimation
        if d_t is not None:
            self.competing_bids.append(d_t)
        
        # Dual multiplier update: λ_{t+1} = max(0, λ_t - ε·(ρ - c̃_t(b_t)))
        # Use actual cost as feedback: gradient = ρ - cost
        cost_feedback = cost if won else 0.0
        gradient = self.rho - cost_feedback
        self.lambd = max(0.0, self.lambd - self.epsilon * gradient)
    
    def get_stats(self):
        """Get current algorithm statistics."""
        return {
            'name': self.name,
            'lambda': float(self.lambd),
            'spent': float(self.total_spent),
            'remaining': float(self.remaining_budget),
            'budget_used': float(self.total_spent / self.B) if self.B > 0 else 0,
            'wins': self.total_wins,
            't': self.t,
            'epsilon': float(self.epsilon),
            'rho': float(self.rho),
        }


class TwoSidedDualBidder(DualOGDBidder):
    """
    Two-sided dual multiplier bidder: budget cap + spend floor.
    
    Adds a second dual variable ν to enforce minimum spend (k%):
      μ: cap penalty    — restrains when ahead on spend
      ν: floor incentive — encourages when behind on spend
    
    Updates:
      μ_{t+1} = Proj(μ_t - η₁·(ρ - c̃_t(b_t)))     # cap
      ν_{t+1} = Proj(ν_t - η₂·(c̃_t(b_t) - kρ))    # floor
    
    Bid rule:
      b_t = argmax_b (r̃_t(v,b) - (μ_t - ν_t)·c̃_t(b))
    
    When μ > ν: cap dominates → bid conservatively
    When ν > μ: floor dominates → bid aggressively
    """
    
    def __init__(
        self,
        budget,
        T,
        value_per_click,
        k=0.8,  # Minimum spend fraction
        epsilon_cap=None,
        epsilon_floor=None,
        empirical_cdf=None,
        name="TwoSidedDual"
    ):
        super().__init__(budget, T, value_per_click, epsilon_cap, empirical_cdf, name)
        self.k = k  # Minimum spend fraction
        self.k_rho = k * self.rho  # Target minimum spend per auction
        
        # Floor dual multiplier ν
        self.nu = 0.0
        
        # Floor step size
        self.epsilon_floor = epsilon_floor if epsilon_floor is not None else 1.0 / np.sqrt(T)
        
        # Rename for clarity
        self.mu = self.lambd  # Cap multiplier
        self.epsilon_cap = self.epsilon
    
    def _find_optimal_bid(self, v, max_bid, n_candidates=50):
        """Bid with combined cap+floor penalty: (μ - ν) multiplier."""
        if len(self.competing_bids) == 0:
            return v * 0.5
        
        candidates = np.linspace(0.1, max_bid, n_candidates)
        best_score = -float('inf')
        best_bid = candidates[0]
        
        effective_multiplier = self.mu - self.nu
        
        for b in candidates:
            win_prob = self._empirical_win_prob(b)
            reward = (v - b) * win_prob
            cost = b * win_prob
            score = reward - effective_multiplier * cost
            
            if score > best_score:
                best_score = score
                best_bid = b
        
        return float(best_bid)
    
    def update(self, won, cost, pctr, d_t=None):
        """Update both dual variables."""
        if won:
            self.total_spent += cost
            self.remaining_budget -= cost
            self.total_wins += 1
        
        if d_t is not None:
            self.competing_bids.append(d_t)
        
        cost_feedback = cost if won else 0.0
        
        # Cap update: μ_{t+1} = max(0, μ_t - η₁·(ρ - cost))
        cap_gradient = self.rho - cost_feedback
        self.mu = max(0.0, self.mu - self.epsilon_cap * cap_gradient)
        
        # Floor update: ν_{t+1} = max(0, ν_t - η₂·(cost - kρ))
        floor_gradient = cost_feedback - self.k_rho
        self.nu = max(0.0, self.nu - self.epsilon_floor * floor_gradient)
        
        # Keep lambd in sync for stats
        self.lambd = self.mu
    
    def get_stats(self):
        stats = super().get_stats()
        stats.update({
            'mu': float(self.mu),
            'nu': float(self.nu),
            'effective_multiplier': float(self.mu - self.nu),
            'k': float(self.k),
            'k_rho': float(self.k_rho),
        })
        return stats