Spaces: ResembleAI/Dramabox (Running on Zero)
Manmay committed 1 day ago

Commit d621c93, 1 parent: e44cca0

Fix ZeroGPU duration: dynamic per-sentence sizing, cap at 120s

Root cause: @spaces.GPU(duration=600) was rejected at decorator registration
because 600 > ZeroGPU's per-call cap (120s on PRO). The registration failure
made *every* call fail with 'The requested GPU duration (600s) is larger
than the maximum allowed' — including single-sentence requests.

Fix: switch duration= to a callable (spaces.GPU evaluates it per request).
Empirically the dominant compute on a warm server is the 30-step euler
denoise, which is ~2.5 s per sentence on this hardware; everything else
(Gemma + VAE encode + decode) is a fixed ~10-12 s of overhead. So:

window = num_sentences * 3 + 12, clamped to [30, 120]

Short prompts pay near-overhead-only time, longer prompts scale linearly,
and the value is always within ZeroGPU's documented per-call ceiling.
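
The sizing rule stated above can be written out as a small function. This is a minimal sketch of the formula as the message gives it; `window_seconds` and its `num_sentences` argument are hypothetical names, and how sentences are counted is not specified here. Note that the `_gpu_duration` function in the diff sizes the window from `gen_dur` with a 60 s floor, so this illustrates the message's arithmetic rather than the committed code.

```python
def window_seconds(num_sentences: int) -> int:
    """GPU window per the commit message: ~3 s per sentence plus 12 s overhead."""
    raw = num_sentences * 3 + 12
    # Clamp to [30, 120]: 30 s floor, 120 s ZeroGPU PRO per-call cap.
    return max(30, min(raw, 120))


for n in (1, 10, 36, 60):
    print(n, window_seconds(n))
```

Under this rule the cap is reached at 36 sentences; anything longer requests the same 120 s window.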

Known limitation: multi-chunk long-form runs hold a single GPU window for
the whole loop, so total wall time must still fit under 120 s. The cleaner
fix (acquire a GPU per chunk) is left for a follow-up — this commit just
restores 'audio at all'.
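
One possible shape for that follow-up is sketched below, purely as an illustration. To stay runnable outside a Space it uses a stand-in decorator, `gpu_window`, in place of `@spaces.GPU(duration=...)`; `chunk_duration`, `generate_chunk`, and `generate_long_form` are hypothetical names, and the per-chunk sizing simply reuses the 1.5x + 25 s arithmetic from this commit as an assumption.

```python
from typing import Callable, List, Tuple

requested_windows: List[int] = []  # records each simulated GPU acquisition


def gpu_window(duration: Callable[..., int]):
    """Stand-in for @spaces.GPU(duration=callable): sizes the window per call."""
    def decorate(fn):
        def wrapper(*args, **kwargs):
            # The duration callable receives the same arguments as the function.
            requested_windows.append(duration(*args, **kwargs))
            return fn(*args, **kwargs)
        return wrapper
    return decorate


def chunk_duration(chunk_text: str, chunk_seconds: float) -> int:
    # Assumed sizing: 1.5x target audio plus 25 s overhead, clamped to [60, 120].
    return max(60, min(int(chunk_seconds * 1.5 + 25), 120))


@gpu_window(duration=chunk_duration)
def generate_chunk(chunk_text: str, chunk_seconds: float) -> str:
    # Placeholder for the real per-chunk generation pass.
    return f"<audio:{chunk_seconds:.0f}s>"


def generate_long_form(chunks: List[Tuple[str, float]]) -> List[str]:
    # The loop runs outside any GPU window; each chunk acquires its own.
    return [generate_chunk(text, seconds) for text, seconds in chunks]


parts = generate_long_form([("First part.", 37.0), ("Second part.", 37.0), ("Tail.", 12.0)])
print(requested_windows)
```

With three chunks the requested windows total more than 120 s, but each individual request stays under the per-call cap, which is the point of acquiring per chunk.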

Files changed (1)

app.py +37 -1

@@ -182,8 +182,44 @@ async def homepage():
         return f.read()
+def _gpu_duration(
+    prompt: str,
+    audio_ref: FileData | None,
+    cfg: float,
+    stg: float,
+    dur_mult: float,
+    gen_dur: float,
+    ref_dur: float,
+    seed: int,
+    denoise_ref: bool = True,
+    max_chunk_duration: float = 45.0,
+    target_chunk_duration: float = 37.0,
+    crossfade_ms: float = 50.0,
+) -> int:
+    """Per-call GPU window sizing.
+    ZeroGPU rejects any decorator value over the account's per-call cap (120 s
+    on PRO). It also supports a callable here that's evaluated per request, so
+    we ask only for what each call needs:
+      * short request: 60 s (sufficient for a single ≤30 s generation on
+        warm models — denoise + prompt encode + 30-step euler + decode).
+      * long request:  ceil(target_audio_s × 1.5) + 25 s overhead.
+      * cap:           120 s — the documented ZeroGPU PRO per-call ceiling.
+    Long-form prompts that internally chunk into >1 generate() pass run
+    sequentially inside one GPU window today, so multi-chunk total wall time
+    must still fit under the 120 s cap. Above that, the kernel kills the call
+    — the cleaner long-term fix is to acquire a GPU per chunk (separate
+    @spaces.GPU function) rather than holding one window across the loop.
+    """
+    target = float(gen_dur) if gen_dur and gen_dur > 0 else 30.0
+    needed = int(target * 1.5 + 25)
+    return max(60, min(needed, 120))
 @app.api()
-@spaces.GPU(duration=600)
+@spaces.GPU(duration=_gpu_duration)
 def generate_audio(
     prompt: str,
     audio_ref: FileData | None,