fix(spaces): propagate Gradio request contextvar to GPU worker thread
Real root cause of the 'Space app has reached its GPU limit. Try re-running
outside of examples' error (raised after a few generations, while the user's
dashboard still shows 0/25 minutes used).
ZeroGPU's @spaces.GPU wrapper reads the user's identity from the current
Gradio request via gradio.context.LocalContext.request — a contextvar.
The wrapper at spaces/zero/wrappers.py:187 does:
request = request_var.get(None)
schedule_response = client.schedule(task_id, request=request, ...)
Then spaces/zero/client.py:_get_token_and_payload reads X-IP-Token from
the request headers (set by HF's reverse proxy when the user is signed
into huggingface.co). If the token is None, the call is counted against
the per-Space anonymous rate limit, which exhausts after a few calls;
client.schedule (spaces/zero/client.py:138) then raises the 'Space app
has reached its GPU limit' message.
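
The lookup chain described above can be sketched as follows. This is not the
actual spaces code: request_var, FakeRequest and quota_bucket are hypothetical
stand-ins, and only the X-IP-Token header name and the None fallback come from
the description above.

```python
from __future__ import annotations

import contextvars
from dataclasses import dataclass, field

# Hypothetical stand-in for gradio.context.LocalContext.request.
request_var: contextvars.ContextVar = contextvars.ContextVar("request")


@dataclass
class FakeRequest:
    """Hypothetical request object; only the headers matter here."""

    headers: dict[str, str] = field(default_factory=dict)


def quota_bucket() -> str:
    """Return which quota a GPU call would be counted against in this sketch."""
    request = request_var.get(None)  # None when no request is in context
    token = request.headers.get("X-IP-Token") if request is not None else None
    # No token means the call cannot be attributed to a signed-in user.
    return "user" if token is not None else "space-anonymous"
```

With no request in the current context, quota_bucket() returns
"space-anonymous", which is the situation the worker thread was in.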
Our backend ran _execute_workflow from a plain threading.Thread for the
event-streaming pattern. Python threads don't inherit contextvars, so
inside the worker the request was None even though the Gradio handler
saw the real request. Token never reached the scheduler; user got
bucketed as Space-anonymous.
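
A minimal reproduction of the thread-boundary problem, independent of Gradio
and ZeroGPU. request_var and read_from_plain_thread are hypothetical names used
only for illustration. The behaviour shown is the default for a plain
threading.Thread; Python 3.14 adds an option for threads to inherit the
caller's context, which is assumed to be off here.

```python
import contextvars
import threading

# Hypothetical stand-in for gradio.context.LocalContext.request.
request_var = contextvars.ContextVar("request")


def read_from_plain_thread(value):
    """Set the var in the caller, then read it from a plain Thread.

    Returns (value seen by the caller, value seen by the worker).
    """
    seen = []

    def worker():
        # By default a new thread starts with an empty context, so the
        # caller's value is not visible and the default (None) comes back.
        seen.append(request_var.get(None))

    token = request_var.set(value)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
        return request_var.get(None), seen[0]
    finally:
        request_var.reset(token)
```

Under those assumptions, read_from_plain_thread("req") gives the caller "req"
and the worker None, matching what the handler and the worker saw.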
Fix: copy the calling task's contextvar context (contextvars.copy_context())
and run the worker inside it. The request — and therefore the user's Pro
quota attribution — now survives the thread boundary.
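
The same pattern in isolation, as a sketch: request_var and
read_from_context_thread are hypothetical names, while copy_context() and
Context.run() are the standard-library calls used in the fix.

```python
import contextvars
import threading

# Hypothetical stand-in for gradio.context.LocalContext.request.
request_var = contextvars.ContextVar("request")


def read_from_context_thread(value):
    """Set the var in the caller and read it from a thread run via ctx.run."""
    seen = []

    def worker():
        seen.append(request_var.get(None))

    token = request_var.set(value)
    try:
        # Snapshot the caller's context, then execute the worker inside it.
        ctx = contextvars.copy_context()
        thread = threading.Thread(target=ctx.run, args=(worker,), daemon=True)
        thread.start()
        thread.join()
    finally:
        request_var.reset(token)
    return seen[0]
```

The copy is a snapshot: the worker sees the values that were set when
copy_context() was called, and anything it sets afterwards stays inside the
copy rather than leaking back to the caller.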
The earlier hf_oauth + LoginButton change was unrelated (red herring).
X-IP-Token is set from the standard HF login cookie at the proxy, not
via OAuth. The OAuth changes stay as harmless decoration.
- backend.py +11 -1
@@ -7,6 +7,7 @@ divergence between local and HF Spaces deployment.
 from __future__ import annotations
 
 import asyncio
+import contextvars
 import os
 import pathlib
 import sys
@@ -492,7 +493,16 @@ class ComfyUILibraryBackend:
             _free_memory()
             _push(None)  # sentinel: stop the consumer
 
-
+        # ZeroGPU's @spaces.GPU wrapper reads the user's identity from the
+        # current Gradio request via gradio.context.LocalContext.request,
+        # which is a contextvar. Plain threads don't inherit contextvars, so
+        # without this the worker sees request=None, X-IP-Token never gets
+        # read, and `client.schedule` raises "Space app has reached its GPU
+        # limit" (token-is-None branch in spaces/zero/client.py:138). Copy
+        # the calling task's context so the request — and therefore the Pro
+        # user's quota attribution — survives the thread boundary.
+        ctx = contextvars.copy_context()
+        thread = threading.Thread(target=ctx.run, args=(_worker,), daemon=True)
         thread.start()
 
         while True: