fix(spaces): clamp _duration_for to 120s (ZeroGPU per-call hard max)

After fixing the contextvar propagation in 1e054fe, requests now reach
the scheduler with the user's X-IP-Token. The next failure mode surfaced:
ZeroGPU illegal duration (raised at spaces/zero/client.py:137 when
res.wait < timedelta(0) from the scheduler)
The server enforces a per-call duration cap of 120s on the standard
ZeroGPU tier, regardless of the user's remaining daily quota. Our
_duration_for was clamping to 900s, which the scheduler rejected for any
preset+mode combination above ~2 minutes of estimated work. Drop the
ceiling to 120s. On H200 every preset we ship completes well under that;
longer requested durations just fail the guard check before any GPU is
attached.
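
A minimal sketch of why the old ceiling failed and the new one passes, assuming the 120s per-call cap described above. The names clamp_duration, within_cap and ZEROGPU_PER_CALL_CAP_S are hypothetical and used only for illustration; they are not the code in backend.py or part of the spaces library.

```python
# Illustrative sketch only; all names here are hypothetical.
ZEROGPU_PER_CALL_CAP_S = 120  # per-call cap as described in this commit message


def clamp_duration(est: int, floor: int = 60,
                   ceiling: int = ZEROGPU_PER_CALL_CAP_S) -> int:
    """Clamp an estimated duration in seconds into [floor, ceiling]."""
    return max(floor, min(est, ceiling))


def within_cap(requested: int) -> bool:
    """Return True if a requested duration does not exceed the per-call cap."""
    return requested <= ZEROGPU_PER_CALL_CAP_S


est = 330  # example estimate above ~2 minutes of predicted work

old_request = clamp_duration(est, ceiling=900)  # old ceiling lets 330 through
new_request = clamp_duration(est)               # new ceiling pins it to the cap

print(old_request, within_cap(old_request))
print(new_request, within_cap(new_request))
```

With the 900s ceiling the estimate is requested unchanged and exceeds the cap; with the 120s ceiling the request never goes above it.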
backend.py +6 -3

--- a/backend.py
+++ b/backend.py
@@ -109,14 +109,17 @@ def _duration_for(
     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.

     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
-    decode time) × retry multiplier, clamped to [60s,
-
+    decode time) × retry multiplier, clamped to [60s, 120s]. The 120s ceiling
+    is ZeroGPU's per-call hard maximum — server rejects requested durations
+    above it with "ZeroGPU illegal duration" (client.py:137, triggered when
+    res.wait < timedelta(0) from the scheduler). 120s on H200 is enough for
+    every preset we ship; longer estimates would just fail the guard check.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
-    return max(60, min(est,
+    return max(60, min(est, 120))


 # Decorate at module load time so ZeroGPU's startup analyzer detects it.
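
To make the docstring's formula concrete, here is a standalone sketch of the same arithmetic. It assumes base and mult are passed in directly, because the contents of _BASE_DURATION_S and _PRESET_MULT are not shown in this diff; 180 and 1.5 are the fallback defaults visible above. The function name estimate_duration and the sample inputs are hypothetical.

```python
def estimate_duration(base: float, mult: float, frames: int,
                      multiplier: float = 1.0) -> int:
    """Mirror the estimate in the diff: scale, add buffers, clamp to [60, 120].

    base       -- base duration in seconds for the mode
    mult       -- preset multiplier
    frames     -- frame count; each frame adds 0.3s of VAE decode time
    multiplier -- retry multiplier applied to the whole estimate
    """
    cold_cache_buffer_s = 60
    est = int((base * mult + cold_cache_buffer_s + frames * 0.3) * multiplier)
    return max(60, min(est, 120))


# Fallback defaults from the diff: (180 * 1.5 + 60 + 0) * 1.0 = 330, clamped to 120.
fallback = estimate_duration(180, 1.5, frames=0)

# Hypothetical short job: (20 * 1.0 + 60 + 49 * 0.3) * 1.0 = 94.7, truncated to 94.
short_job = estimate_duration(20, 1.0, frames=49)

print(fallback, short_job)
```

Note that with the fallback defaults the raw estimate is already above the ceiling, so the function returns 120 for any non-negative frame count when the retry multiplier is at least 1.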