Commit 1e054fe (unverified) by techfreakworm · 1 parent: 0a98b65

fix(spaces): propagate Gradio request contextvar to GPU worker thread


Real root cause of 'Space app has reached its GPU limit. Try re-running
outside of examples' (raised after a few generations, while the user
dashboard still shows 0/25 minutes used).

ZeroGPU's @spaces.GPU wrapper reads the user's identity from the current
Gradio request via gradio.context.LocalContext.request — a contextvar.
The wrapper at spaces/zero/wrappers.py:187 does:

request = request_var.get(None)
schedule_response = client.schedule(task_id, request=request, ...)

Then spaces/zero/client.py:_get_token_and_payload reads X-IP-Token from
the request headers (set by HF's reverse proxy when the user is signed
into huggingface.co). If the token is None, the call is counted against
the per-Space anonymous rate limit, which is exhausted after a few calls;
client.schedule:138 then raises the 'Space app has reached its GPU limit'
message.

Our backend ran _execute_workflow from a plain threading.Thread for the
event-streaming pattern. By default a new thread starts with an empty
context instead of inheriting the caller's contextvars, so inside the
worker the request was None even though the Gradio handler saw the real
request. The token never reached the scheduler, and the user got
bucketed as Space-anonymous.

Fix: copy the calling task's contextvar context (contextvars.copy_context())
and run the worker inside it. The request — and therefore the user's Pro
quota attribution — now survives the thread boundary.
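
As an illustration of the fix, here is a small standalone sketch of the same pattern. `request_var` and `start_with_context` are hypothetical names used only for this example; backend.py inlines the two lines rather than using a helper.

```python
import contextvars
import threading

# Hypothetical stand-in for gradio.context.LocalContext.request.
request_var = contextvars.ContextVar("request", default=None)


def start_with_context(target):
    """Start `target` in a daemon thread inside a copy of the caller's context."""
    # copy_context() snapshots every contextvar visible to the caller.
    ctx = contextvars.copy_context()
    # ctx.run(target) executes target with that snapshot as the current context.
    thread = threading.Thread(target=ctx.run, args=(target,), daemon=True)
    thread.start()
    return thread


request_var.set({"x-ip-token": "demo-token"})  # made-up value
seen = []
worker_thread = start_with_context(lambda: seen.append(request_var.get(None)))
worker_thread.join()
print("worker sees:", seen[0])
```

Two properties are worth keeping in mind. The copy is a snapshot, so values the worker sets afterwards do not flow back to the caller. A Context object also cannot be entered by two callers at once, so each thread should get its own copy_context() result.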

The earlier hf_oauth + LoginButton change was unrelated (red herring).
X-IP-Token is set from the standard HF login cookie at the proxy, not
via OAuth. The OAuth changes stay as harmless decoration.

Files changed (1)
  1. backend.py +11 -1
@@ -7,6 +7,7 @@ divergence between local and HF Spaces deployment.
 from __future__ import annotations
 
 import asyncio
+import contextvars
 import os
 import pathlib
 import sys
@@ -492,7 +493,16 @@ class ComfyUILibraryBackend:
 _free_memory()
 _push(None) # sentinel: stop the consumer
 
-thread = threading.Thread(target=_worker, daemon=True)
+# ZeroGPU's @spaces.GPU wrapper reads the user's identity from the
+# current Gradio request via gradio.context.LocalContext.request,
+# which is a contextvar. Plain threads don't inherit contextvars, so
+# without this the worker sees request=None, X-IP-Token never gets
+# read, and `client.schedule` raises "Space app has reached its GPU
+# limit" (token-is-None branch in spaces/zero/client.py:138). Copy
+# the calling task's context so the request — and therefore the Pro
+# user's quota attribution — survives the thread boundary.
+ctx = contextvars.copy_context()
+thread = threading.Thread(target=ctx.run, args=(_worker,), daemon=True)
 thread.start()
 
 while True: