techfreakworm committed on
Commit
444a596
·
unverified ·
1 Parent(s): 1e054fe

fix(spaces): clamp _duration_for to 120s (ZeroGPU per-call hard max)

After fixing the contextvar propagation in 1e054fe, requests now reach
the scheduler with the user's X-IP-Token. The next failure mode surfaced:

ZeroGPU illegal duration (raised at spaces/zero/client.py:137 when
res.wait < timedelta(0) from the scheduler)

The server enforces a per-call duration cap of 120s on the standard
ZeroGPU tier, regardless of the user's remaining daily quota. Our
_duration_for was clamping to 900s, so the scheduler rejected any
preset+mode combination whose estimate exceeded ~2 minutes. Drop the
ceiling to 120s. On H200 every preset we ship completes well under that;
a longer requested duration just fails the guard check before any GPU is
attached.
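
For illustration, the effect of the ceiling change can be sketched as below. This is a minimal sketch, not the code in backend.py: the table values, the mode and preset names, and the helper names estimate_s and duration_for are hypothetical stand-ins, since the real _BASE_DURATION_S and _PRESET_MULT contents are not shown in this commit. Only the estimate formula, the fallback defaults (180 and 1.5), and the 60s/120s/900s bounds come from the diff.

```python
from typing import Mapping

# Hypothetical tables: the real contents of these dicts are not part of this commit.
_BASE_DURATION_S: Mapping[str, int] = {"image": 80, "t2v": 120}
_PRESET_MULT: Mapping[str, float] = {"fast": 0.5, "balanced": 1.0, "quality": 2.0}

MIN_DURATION_S = 60       # floor used both before and after this commit
ZEROGPU_MAX_S = 120       # per-call cap described in the commit message
OLD_CEILING_S = 900       # ceiling used before this commit


def estimate_s(mode: str, preset: str, frames: int, multiplier: float = 1.0) -> int:
    """Unclamped estimate, same formula as in the diff."""
    base = _BASE_DURATION_S.get(mode, 180)
    mult = _PRESET_MULT.get(preset.lower(), 1.5)
    return int((base * mult + 60 + frames * 0.3) * multiplier)


def duration_for(
    mode: str,
    preset: str,
    frames: int,
    multiplier: float = 1.0,
    ceiling: int = ZEROGPU_MAX_S,
) -> int:
    """Clamp the estimate to [MIN_DURATION_S, ceiling]."""
    return max(MIN_DURATION_S, min(estimate_s(mode, preset, frames, multiplier), ceiling))


if __name__ == "__main__":
    # With the old ceiling this request would ask for more than 120s and be rejected.
    print(duration_for("t2v", "quality", 81, ceiling=OLD_CEILING_S))
    # With the new ceiling the same request is capped at the per-call maximum.
    print(duration_for("t2v", "quality", 81))
    # Estimates already under the cap pass through unchanged.
    print(duration_for("image", "fast", 0))
```

With these made-up numbers, any estimate above 120s is requested as exactly 120s, so the clamp only changes behaviour for calls the scheduler would previously have refused.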

Files changed (1)
  1. backend.py +6 -3
backend.py CHANGED
@@ -109,14 +109,17 @@ def _duration_for(
     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.
 
     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
-    decode time) × retry multiplier, clamped to [60s, 900s]. The 900s ceiling
-    keeps a single failed call from torching the daily quota.
+    decode time) × retry multiplier, clamped to [60s, 120s]. The 120s ceiling
+    is ZeroGPU's per-call hard maximum: the server rejects requested durations
+    above it with "ZeroGPU illegal duration" (client.py:137, triggered when
+    res.wait < timedelta(0) from the scheduler). 120s on H200 is enough for
+    every preset we ship; longer estimates would just fail the guard check.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
-    return max(60, min(est, 900))
+    return max(60, min(est, 120))
 
 
 # Decorate at module load time so ZeroGPU's startup analyzer detects it.