Space: techfreakworm/LTX2.3-Studio (Running on Zero)

techfreakworm committed 15 days ago

Commit 3c69062 (unverified) · 1 Parent(s): b3e0fc9

fix(spaces): bump duration cap, drop broken auto-retry, add friendly errors + like banner

Logs from run #202 (style transfer) confirmed the cascade:
1. style job submitted, ran ~127s of GPU time
2. @spaces.GPU(duration=120) cap hit -> 'GPU task aborted' (line 217)
3. our auto-retry fired with the SAME captured request
4. retry's schedule call -> 'Expired ZeroGPU proxy token' (401, line 226)
   because the captured request's X-IP-Token had aged past its TTL during run 1

The visible 'expired token' error was a symptom of the abort. Two real bugs:
- the 120s cap was too tight for style + lipsync detailer paths (~120-180s
  actual on H200)
- auto-retry reused the stale token from the captured request, so the second
  attempt always 401'd
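
The cascade can be reproduced in a few lines. This is a minimal sketch, not ZeroGPU's actual token logic: the `Token` class, the `schedule` function, and the 120-second TTL are hypothetical stand-ins (the logs do not state the real TTL). Only the ~127s run time and the 401 on the retry come from the logs above.

```python
from dataclasses import dataclass

TOKEN_TTL_S = 120.0  # hypothetical TTL; the real value is not stated in the logs


@dataclass(frozen=True)
class Token:
    issued_at: float  # seconds on a simulated clock

    def is_valid(self, now: float) -> bool:
        return now - self.issued_at < TOKEN_TTL_S


def schedule(token: Token, now: float) -> str:
    """Stand-in for the scheduler call: rejects tokens older than the TTL."""
    return "ok" if token.is_valid(now) else "401 expired token"


# Run 1: the request (and its token) is captured at t=0 and accepted.
captured = Token(issued_at=0.0)
first = schedule(captured, now=0.0)

# The run is aborted after ~127s of GPU time (figure from the logs).
now = 127.0

# Old behaviour: the auto-retry reuses the captured token, which has aged out.
stale_retry = schedule(captured, now=now)

# New behaviour: the user clicks Generate, so a new request carries a new token.
fresh_retry = schedule(Token(issued_at=now), now=now)

print(first, "|", stale_retry, "|", fresh_retry)
```

Under these assumptions no in-process retry can succeed once the first run outlives the token, which is why the fix hands the retry back to the user.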

Fixes:
- _duration_for: clamp to [60, 240]. Pro identity accepts these values; if
  the server rejects one, the user sees a clear illegal_duration error.
- _on_generate: drop the 'for attempt in (0, 1)' retry loop. Single attempt;
  timeout/expired surfaces as a friendly message; user clicks Generate
  again -> fresh request -> fresh token -> succeeds.
- _classify expanded: expired_token, illegal_duration, unlogged, and
  quota_exceeded categories now surface distinctly.
- _FRIENDLY_ERRORS dict + _friendly_error helper: error popups now read
  'Hit the GPU time limit / Session timed out / Daily quota used up'
  etc. with actionable next steps, instead of raw exception strings.
- aio-tipbar at the top: 'Drop a heart at the top of this page to
  support it', a quick visibility nudge for the HF like button.
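
To make the new clamp concrete, here is a small sketch of the estimate formula used by `_duration_for`. The formula and the [60, 240] bounds are taken from the diff; the function name `estimate_duration` and the 60s base / 1.0 preset multiplier inputs are made up for illustration, since the commit does not show the real `_BASE_DURATION_S` or `_PRESET_MULT` tables.

```python
def estimate_duration(base_s: float, preset_mult: float, frames: int,
                      multiplier: float = 1.0,
                      floor_s: int = 60, ceiling_s: int = 240) -> int:
    """(base * preset multiplier + 60s cold-cache buffer + 0.3s per frame)
    * retry multiplier, clamped to [floor_s, ceiling_s]."""
    est = int((base_s * preset_mult + 60 + frames * 0.3) * multiplier)
    return max(floor_s, min(est, ceiling_s))


# Illustrative inputs only: a 60s base, 1.0 preset multiplier, 121 frames.
# Raw estimate: 60 * 1.0 + 60 + 121 * 0.3 = 156.3, truncated to 156.
new_cap = estimate_duration(60, 1.0, 121)                 # fits under 240
old_cap = estimate_duration(60, 1.0, 121, ceiling_s=120)  # old ceiling cuts it to 120

# The fallback values in the diff (base 180, multiplier 1.5) exceed the ceiling,
# so an unknown mode or preset always requests the maximum.
fallback = estimate_duration(180, 1.5, 121)

print(new_cap, old_cap, fallback)
```

With the old ceiling, any estimate above 120s was silently cut to 120s, so a run that needed ~127s of GPU time could not fit regardless of what the estimator computed.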

Files changed (2)

app.py +86 -41
backend.py +15 -8

app.py CHANGED

@@ -280,6 +280,19 @@ _CUSTOM_CSS = """
     border-radius: 4px;
 }
 /* === Drawer === */
 .aio-shell { position: relative; }
 .aio-drawer {
@@ -461,6 +474,13 @@ def build_app() -> gr.Blocks:
             '  <span class="aio-mode-tag" id="aio-mode-tag">T2V</span>'
             '</div>'
         )
         with gr.Row(elem_classes=["aio-shell"]):
             # Drawer (drawer behaves as fixed sidebar ≥1024 px;
@@ -748,6 +768,58 @@ def _stage_to_comfy_input(file_path) -> str | None:
 PRESET_DURATION = {"Fast": 60, "Balanced": 120, "Quality": 300}
 def _seconds_to_frames(seconds: float, fps: int) -> int:
     return max(9, int(round(float(seconds) * float(fps) / 8) * 8) + 1)
@@ -841,55 +913,28 @@ async def _on_generate(mode_name: str, *, progress: Any = None, **inputs: Any):
             video_update = event.video_path if event.video_path else gr.update()
             return (ui._render_idle(), video_update)
         if isinstance(event, backend_module.ErrorEvent):
             return (
                 f'<div class="status-card status-error">'
-                f'  <div class="status-row"><span class="status-stage">Error · {event.category}</span></div>'
-                f"  <div>{event.message}</div>"
                 f"</div>",
                 gr.update(),
             )
         return None
-    # Tier 1 + Tier 2: one normal attempt; if it aborts on ZeroGPU duration
-    # cap, retry once with a 2× duration multiplier. Each multiplier is
-    # capped at 900s server-side, so the second attempt never exceeds that.
     started = time.time()
-    multiplier = 1.0
-    timed_out = False
-    for attempt in (0, 1):
-        if attempt == 1:
-            # Show a friendly retry banner before the second submit
-            yield (
-                '<div class="status-card status-error">'
-                '  <div class="status-row"><span class="status-stage">'
-                "Retrying with extended GPU budget</span></div>"
-                "  <div>First attempt hit the per-call duration cap "
-                "(usually a cold model cache or a heavier mode than estimated). "
-                "Reserving 2× the budget and trying once more.</div>"
-                "</div>",
-                gr.update(),
-            )
-            multiplier = 2.0
-            started = time.time()  # reset so progress ETAs are sensible
-        timed_out = False
-        async for event in backend.submit(
-            mode_name, workflow,
-            preset=preset, duration_multiplier=multiplier,
-            progress=progress,
-        ):
-            if (
-                isinstance(event, backend_module.ErrorEvent)
-                and event.category == "gpu_timeout"
-                and attempt == 0
-            ):
-                timed_out = True
-                break  # don't yield the timeout error — auto-retry instead
-            translated = await _translate(event, started)
-            if translated is not None:
-                yield translated
-        if not timed_out:
-            return
 def _input_keys_for_mode(mode_name: str, h: dict) -> list[str]:

     border-radius: 4px;
 }
+.aio-tipbar {
+    margin: 0 0 6px 0;
+    padding: 6px 14px;
+    font-family: 'IBM Plex Sans', system-ui, sans-serif;
+    font-size: 12px;
+    color: #B5BCC6;
+    background: #1A1F26;
+    border-bottom: 1px solid #262C35;
+    text-align: center;
+}
+.aio-tipbar strong { color: #E6E8EB; font-weight: 500; }
+.aio-tipbar .aio-heart { color: #E55B6E; }
 /* === Drawer === */
 .aio-shell { position: relative; }
 .aio-drawer {
             '  <span class="aio-mode-tag" id="aio-mode-tag">T2V</span>'
             '</div>'
         )
+        gr.HTML(
+            '<div class="aio-tipbar">'
+            'Liking this project? '
+            '<strong>Drop a <span class="aio-heart">♥</span> at the top of this page</strong> '
+            'to support it.'
+            '</div>'
+        )
         with gr.Row(elem_classes=["aio-shell"]):
             # Drawer (drawer behaves as fixed sidebar ≥1024 px;
 PRESET_DURATION = {"Fast": 60, "Balanced": 120, "Quality": 300}
+_FRIENDLY_ERRORS: dict[str, tuple[str, str]] = {
+    "gpu_timeout": (
+        "Hit the GPU time limit",
+        "This run took longer than the GPU budget. Try the Fast preset, a "
+        "shorter video, or a smaller resolution — then click Generate again.",
+    ),
+    "expired_token": (
+        "Session timed out",
+        "Your sign-in session expired. Refresh the page and try again — "
+        "you'll keep your spot in the GPU queue.",
+    ),
+    "illegal_duration": (
+        "GPU budget too high",
+        "The estimator asked for more GPU time than the server allows. "
+        "Try Fast preset or a shorter video.",
+    ),
+    "unlogged": (
+        "Sign-in not detected",
+        "Make sure you're signed into huggingface.co (top-right avatar), "
+        "then refresh this page. Pro accounts get 25 min of GPU per day.",
+    ),
+    "quota_exceeded": (
+        "Daily GPU quota used up",
+        "You've used today's GPU minutes. Wait for the rolling 24-hour "
+        "reset, or upgrade Pro at huggingface.co/subscribe/pro for more.",
+    ),
+    "oom": (
+        "Ran out of GPU memory",
+        "Try a smaller resolution, fewer frames, or the Fast preset.",
+    ),
+    "interrupt": (
+        "Cancelled",
+        "Generation was cancelled. Click Generate to start a fresh run.",
+    ),
+    "download": (
+        "Model download failed",
+        "Couldn't fetch a required model file. Check your internet and try again.",
+    ),
+}
+def _friendly_error(category: str, raw_message: str) -> tuple[str, str]:
+    """Translate a backend error category into (title, body) the user can act on."""
+    if category in _FRIENDLY_ERRORS:
+        return _FRIENDLY_ERRORS[category]
+    return (
+        "Generation failed",
+        "Something went wrong. Click Generate to retry, or check the Space "
+        "logs if it keeps happening.",
+    )
 def _seconds_to_frames(seconds: float, fps: int) -> int:
     return max(9, int(round(float(seconds) * float(fps) / 8) * 8) + 1)
             video_update = event.video_path if event.video_path else gr.update()
             return (ui._render_idle(), video_update)
         if isinstance(event, backend_module.ErrorEvent):
+            title, body = _friendly_error(event.category, event.message)
             return (
                 f'<div class="status-card status-error">'
+                f'  <div class="status-row"><span class="status-stage">{title}</span></div>'
+                f"  <div>{body}</div>"
                 f"</div>",
                 gr.update(),
             )
         return None
+    # Single attempt. ZeroGPU-side abort (duration cap) and 401 expired-token
+    # surface as friendly messages via _friendly_error; user clicks Generate
+    # again to retry with a fresh request and fresh X-IP-Token.
     started = time.time()
+    async for event in backend.submit(
+        mode_name, workflow,
+        preset=preset, duration_multiplier=1.0,
+        progress=progress,
+    ):
+        translated = await _translate(event, started)
+        if translated is not None:
+            yield translated
 def _input_keys_for_mode(mode_name: str, h: dict) -> list[str]:

backend.py CHANGED

@@ -109,17 +109,17 @@ def _duration_for(
     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.
     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
-    decode time) × retry multiplier, clamped to [60s, 120s]. The 120s ceiling
-    is ZeroGPU's per-call hard maximum — server rejects requested durations
-    above it with "ZeroGPU illegal duration" (client.py:137, triggered when
-    res.wait < timedelta(0) from the scheduler). 120s on H200 is enough for
-    every preset we ship; longer estimates would just fail the guard check.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
-    return max(60, min(est, 120))
 # Decorate at module load time so ZeroGPU's startup analyzer detects it.
@@ -529,9 +529,16 @@ def _classify(exc: Exception) -> str:
     msg = str(exc).lower()
     if "outofmemory" in name or "cuda out of memory" in msg:
         return "oom"
     # ZeroGPU enforces the @spaces.GPU(duration=N) cap and re-raises as
-    # gradio.exceptions.Error('GPU task aborted'). Surface a distinct
-    # category so the handler can offer a retry with a bigger budget.
     if "gpu task aborted" in msg or ("gpu" in msg and "aborted" in msg):
         return "gpu_timeout"
     if "interrupt" in name:

     ZeroGPU can call us with the same arg list it'll use for _execute_workflow.
     Estimate = (base × preset multiplier + cold-cache buffer + per-frame VAE
+    decode time) × retry multiplier, clamped to [60s, 240s]. ZeroGPU rejects
+    durations above the server's per-call max with "ZeroGPU illegal duration"
+    (client.py:137); 240s is observed to work for Pro identity (~2 min runs
+    needed for style + lipsync detailer paths). If the server rejects values
+    in this range, the user will see a clear error and can retry.
     """
     base = _BASE_DURATION_S.get(mode, 180)
     mult = _PRESET_MULT.get(preset.lower(), 1.5)
     frames = _frames_from_workflow(workflow)
     est = int((base * mult + 60 + frames * 0.3) * multiplier)
+    return max(60, min(est, 240))
 # Decorate at module load time so ZeroGPU's startup analyzer detects it.
     msg = str(exc).lower()
     if "outofmemory" in name or "cuda out of memory" in msg:
         return "oom"
+    if "expired zerogpu proxy token" in msg or "expired" in msg and "token" in msg:
+        return "expired_token"
+    if "illegal duration" in msg:
+        return "illegal_duration"
+    if "unlogged user" in msg:
+        return "unlogged"
+    if "exceeded your" in msg and "gpu" in msg:
+        return "quota_exceeded"
     # ZeroGPU enforces the @spaces.GPU(duration=N) cap and re-raises as
+    # gradio.exceptions.Error('GPU task aborted').
     if "gpu task aborted" in msg or ("gpu" in msg and "aborted" in msg):
         return "gpu_timeout"
     if "interrupt" in name:
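
As a quick check of the branch ordering in `_classify`, the sketch below re-implements only the message-matching branches visible in this hunk and runs sample messages through them. The function name `classify_message` and the `"unknown"` fallback are hypothetical, since the real default lies outside the hunk. Only 'GPU task aborted', 'Expired ZeroGPU proxy token', and 'ZeroGPU illegal duration' are quoted from the commit; the unlogged and quota samples are invented strings that merely contain the matched substrings. Because Python parses `a or b and c` as `a or (b and c)`, the explicit parentheses here do not change the behaviour of the expired-token test.

```python
def classify_message(message: str) -> str:
    """Subset of the _classify branches that match on the message text."""
    msg = message.lower()
    if "cuda out of memory" in msg:
        return "oom"
    if "expired zerogpu proxy token" in msg or ("expired" in msg and "token" in msg):
        return "expired_token"
    if "illegal duration" in msg:
        return "illegal_duration"
    if "unlogged user" in msg:
        return "unlogged"
    if "exceeded your" in msg and "gpu" in msg:
        return "quota_exceeded"
    if "gpu task aborted" in msg or ("gpu" in msg and "aborted" in msg):
        return "gpu_timeout"
    return "unknown"  # hypothetical fallback; the real default is not shown


samples = [
    "GPU task aborted",                  # quoted in the commit message
    "Expired ZeroGPU proxy token",       # quoted in the commit message
    "ZeroGPU illegal duration",          # quoted in the docstring
    "Unlogged user is not allowed",      # invented sample
    "You have exceeded your GPU quota",  # invented sample
    "CUDA out of memory",
]
results = {m: classify_message(m) for m in samples}

for message, category in results.items():
    print(f"{category:18} <- {message}")
```

The expired-token message also contains "gpu" (inside "zerogpu"), so checking it before the broader `"gpu" in msg and "aborted" in msg` branch keeps the two categories from overlapping if a future message happens to mention both.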