Spaces:

techfreakworm
/

LTX2.3-Studio

Running on Zero

App Files Files Community

techfreakworm commited on 15 days ago

Commit

444a596

unverified ·

1 Parent(s): 1e054fe

fix(spaces): clamp _duration_for to 120s (ZeroGPU per-call hard max)

Browse files

After fixing the contextvar propagation in 1e054fe, requests now reach
the scheduler with the user's X-IP-Token. The next failure mode surfaced:

ZeroGPU illegal duration (raised at spaces/zero/client.py:137 when
res.wait < timedelta(0) from the scheduler)

The server enforces a per-call duration cap of 120s on the standard
ZeroGPU tier, regardless of the user's remaining daily quota. Our
_duration_for was clamping to 900s which the scheduler rejected for any
preset+mode combination above ~2 minutes of estimated work. Drop the
ceiling to 120s. On H200 every preset we ship completes well under that;
longer requested durations just fail the guard check before any GPU is
attached.

Files changed (1) hide show

backend.py +6 -3

backend.py CHANGED Viewed

@@ -109,14 +109,17 @@ def _duration_for(
     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.
     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
-    decode time) × retry multiplier, clamped to [60s, 900s]. The 900s ceiling
-    keeps a single failed call from torching the daily quota.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
-    return max(60, min(est, 900))
 # Decorate at module load time so ZeroGPU's startup analyzer detects it.

     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.
     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
+    decode time) × retry multiplier, clamped to [60s, 120s]. The 120s ceiling
+    is ZeroGPU's per-call hard maximum — server rejects requested durations
+    above it with "ZeroGPU illegal duration" (client.py:137, triggered when
+    res.wait < timedelta(0) from the scheduler). 120s on H200 is enough for
+    every preset we ship; longer estimates would just fail the guard check.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
+    return max(60, min(est, 120))
 # Decorate at module load time so ZeroGPU's startup analyzer detects it.