Spaces: techfreakworm/LTX2.3-Studio (Running on Zero)

techfreakworm committed 15 days ago

Commit 1e054fe (unverified) · 1 Parent(s): 0a98b65

fix(spaces): propagate Gradio request contextvar to GPU worker thread

Real root cause of 'Space app has reached its GPU limit. Try re-running
outside of examples' (raised after a few gens, while the user dashboard
still shows 0/25 minutes used).

ZeroGPU's @spaces.GPU wrapper reads the user's identity from the current
Gradio request via gradio.context.LocalContext.request — a contextvar.
The wrapper at spaces/zero/wrappers.py:187 does:

    request = request_var.get(None)
    schedule_response = client.schedule(task_id, request=request, ...)

Then spaces/zero/client.py:_get_token_and_payload reads X-IP-Token from
the request headers (set by HF's reverse proxy when the user is signed
into huggingface.co). If the token is None, client.schedule:138 raises
the 'Space app has reached its GPU limit' message and falls back to the
per-Space anonymous rate limit — which exhausts after a few calls.

Our backend ran _execute_workflow from a plain threading.Thread for the
event-streaming pattern. Python threads don't inherit contextvars, so
inside the worker the request was None even though the Gradio handler
saw the real request. Token never reached the scheduler; user got
bucketed as Space-anonymous.

Fix: copy the calling task's contextvar context (contextvars.copy_context())
and run the worker inside it. The request — and therefore the user's Pro
quota attribution — now survives the thread boundary.
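
A minimal, self-contained sketch of the behaviour described above. It does not touch Gradio or the spaces package: `request_var`, `read_request` and `run_in_thread` are hypothetical stand-ins invented for illustration, and the string value stands in for a real request object.

```python
import contextvars
import threading

# Hypothetical stand-in for gradio.context.LocalContext.request.
request_var = contextvars.ContextVar("request", default=None)


def read_request(results, key):
    # Runs inside the worker thread and records what the contextvar holds there.
    results[key] = request_var.get()


def run_in_thread(target, *args):
    # Start a thread, wait for it, and return once it has finished.
    thread = threading.Thread(target=target, args=args, daemon=True)
    thread.start()
    thread.join()


results = {}
request_var.set("request-with-token")  # what the handler's context would hold

# Plain thread: on default CPython builds it starts with a fresh context,
# so it is expected to see the default value rather than the request.
run_in_thread(read_request, results, "plain")

# Copied context: the worker runs inside a snapshot of the caller's context.
ctx = contextvars.copy_context()
run_in_thread(ctx.run, read_request, results, "copied")

print(results)
```

The "copied" entry carries the value set by the caller, which is the property the fix relies on. Changes the worker makes to the variable stay inside the copied context and do not leak back to the caller.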

The earlier hf_oauth + LoginButton change was unrelated (red herring).
X-IP-Token is set from the standard HF login cookie at the proxy, not
via OAuth. The OAuth changes stay as harmless decoration.

Files changed (1)

backend.py +11 -1

@@ -7,6 +7,7 @@ divergence between local and HF Spaces deployment.
 from __future__ import annotations
 import asyncio
+import contextvars
 import os
 import pathlib
 import sys
@@ -492,7 +493,16 @@ class ComfyUILibraryBackend:
                 _free_memory()
                 _push(None)  # sentinel: stop the consumer
-        thread = threading.Thread(target=_worker, daemon=True)
+        # ZeroGPU's @spaces.GPU wrapper reads the user's identity from the
+        # current Gradio request via gradio.context.LocalContext.request,
+        # which is a contextvar. Plain threads don't inherit contextvars, so
+        # without this the worker sees request=None, X-IP-Token never gets
+        # read, and `client.schedule` raises "Space app has reached its GPU
+        # limit" (token-is-None branch in spaces/zero/client.py:138). Copy
+        # the calling task's context so the request — and therefore the Pro
+        # user's quota attribution — survives the thread boundary.
+        ctx = contextvars.copy_context()
+        thread = threading.Thread(target=ctx.run, args=(_worker,), daemon=True)
         thread.start()
         while True:
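
For context, the worker/consumer shape visible in this hunk can be sketched in isolation as follows. This is an assumption-laden illustration, not the code in backend.py: `stream_events`, `fake_workflow` and `current_user` are hypothetical names, and a synchronous `queue.Queue` is assumed for the hand-off, whereas the real backend may bridge into asyncio instead.

```python
import contextvars
import queue
import threading

# Hypothetical stand-in for the Gradio request contextvar.
current_user = contextvars.ContextVar("current_user", default=None)


def stream_events(produce):
    """Run produce(push) in a worker thread and yield each pushed event."""
    events = queue.Queue()

    def _worker():
        try:
            produce(events.put)
        finally:
            events.put(None)  # sentinel: stop the consumer

    # Snapshot the caller's context so the worker sees the same contextvars.
    ctx = contextvars.copy_context()
    thread = threading.Thread(target=ctx.run, args=(_worker,), daemon=True)
    thread.start()
    while True:
        event = events.get()
        if event is None:
            break
        yield event
    thread.join()


def fake_workflow(push):
    # Reads the contextvar from inside the worker thread.
    push(f"start:{current_user.get()}")
    push("done")


current_user.set("pro-user")
collected = list(stream_events(fake_workflow))
print(collected)
```

Because `stream_events` is a generator, the context is copied when iteration starts, so the snapshot reflects whoever consumes the stream. In this sketch `None` is reserved as the sentinel and cannot be used as an event value.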