Fix ZeroGPU duration: dynamic per-sentence sizing, cap at 120s
Root cause: @spaces.GPU(duration=600) was rejected at decorator registration
because 600 > ZeroGPU's per-call cap (120s on PRO). The registration failure
made *every* call fail with 'The requested GPU duration (600s) is larger
than the maximum allowed' — including single-sentence requests.
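The failure mode, and the callable alternative this commit switches to, can be reproduced with a toy stand-in for the decorator. This is not the `spaces` implementation. `toy_gpu` and `PER_CALL_CAP_S` are hypothetical names, and the sketch assumes that a duration callable receives the same arguments as the decorated function, as the mirrored signature of `_gpu_duration` in the diff suggests. It models the observed symptom: a static value over the cap fails on every call, however small the request.

```python
from functools import wraps
from typing import Any, Callable, Union

# Hypothetical constant: the ZeroGPU PRO per-call ceiling quoted in this commit.
PER_CALL_CAP_S = 120

DurationSpec = Union[int, Callable[..., int]]


def toy_gpu(duration: DurationSpec) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for @spaces.GPU that resolves the window per call."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A callable is evaluated with the call's own arguments; an int is used as-is.
            requested = duration(*args, **kwargs) if callable(duration) else duration
            if requested > PER_CALL_CAP_S:
                raise ValueError(
                    f"The requested GPU duration ({requested}s) is larger "
                    "than the maximum allowed"
                )
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@toy_gpu(duration=600)  # static value over the cap: every call fails
def static_window(prompt: str) -> str:
    return prompt.upper()


@toy_gpu(duration=lambda prompt: 30)  # per-request value under the cap
def dynamic_window(prompt: str) -> str:
    return prompt.upper()
```

Calling `static_window` raises even for a one-sentence prompt, while `dynamic_window` runs normally.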
Fix: switch duration= to a callable (spaces.GPU evaluates it per request).
Empirically, the dominant compute on a warm server is the 30-step Euler
denoise, which is ~2.5 s per sentence on this hardware; everything else
(Gemma + VAE encode + decode) is a fixed ~10-12 s of overhead. So:
window = num_sentences * 3 + 12, clamped to [30, 120]
Short prompts pay near-overhead-only time, longer prompts scale linearly,
and the value is always within ZeroGPU's documented per-call ceiling.
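The sizing rule above can be written as a small function. This is a minimal sketch, assuming sentences are counted by splitting on terminal punctuation; the message does not say how `num_sentences` is obtained, and `count_sentences` and `window_for` are hypothetical names.

```python
import re

MIN_WINDOW_S = 30
MAX_WINDOW_S = 120


def count_sentences(text: str) -> int:
    """Hypothetical helper: split on runs of . ! ? followed by whitespace or end."""
    parts = [p for p in re.split(r"[.!?]+(?:\s+|$)", text.strip()) if p.strip()]
    return max(1, len(parts))  # treat empty or unpunctuated text as one sentence


def window_for(num_sentences: int) -> int:
    # ~2.5 s of Euler denoise per sentence, rounded up to 3 s, plus ~12 s of
    # fixed overhead (Gemma + VAE encode + decode), clamped to [30, 120].
    return max(MIN_WINDOW_S, min(num_sentences * 3 + 12, MAX_WINDOW_S))


print(window_for(count_sentences("Hello there. How are you? Fine!")))
```

Under this rule the 30 s floor covers prompts of up to 6 sentences (6 × 3 + 12 = 30), and the 120 s cap is reached at 36 sentences (36 × 3 + 12 = 120).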
Known limitation: multi-chunk long-form runs hold a single GPU window for
the whole loop, so total wall time must still fit under 120 s. The cleaner
fix (acquire a GPU per chunk) is left for a follow-up; this commit just
gets audio generating again.
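The limitation is easiest to see with the same cost model. This sketch assumes each chunk pays the full fixed overhead again; the message does not say whether Gemma or VAE work is shared across chunks. `chunk_cost`, `fits_single_window`, and `fits_per_chunk_windows` are hypothetical names.

```python
from typing import List

CAP_S = 120
OVERHEAD_S = 12      # upper end of the ~10-12 s fixed cost quoted above
PER_SENTENCE_S = 3   # ~2.5 s of denoise per sentence, rounded up


def chunk_cost(sentences: int) -> int:
    # Assumption: every generate() pass pays the fixed overhead again.
    return sentences * PER_SENTENCE_S + OVERHEAD_S


def fits_single_window(chunks: List[int]) -> bool:
    """Current behavior: all chunks run sequentially inside one GPU window."""
    return sum(chunk_cost(n) for n in chunks) <= CAP_S


def fits_per_chunk_windows(chunks: List[int]) -> bool:
    """Proposed follow-up: each chunk acquires its own GPU window."""
    return all(chunk_cost(n) <= CAP_S for n in chunks)
```

Three 20-sentence chunks cost 72 s each. Together that is 216 s, which exceeds the cap in a single window, but each chunk fits comfortably in its own window.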
@@ -182,8 +182,44 @@ async def homepage():
         return f.read()
 
 
+def _gpu_duration(
+    prompt: str,
+    audio_ref: FileData | None,
+    cfg: float,
+    stg: float,
+    dur_mult: float,
+    gen_dur: float,
+    ref_dur: float,
+    seed: int,
+    denoise_ref: bool = True,
+    max_chunk_duration: float = 45.0,
+    target_chunk_duration: float = 37.0,
+    crossfade_ms: float = 50.0,
+) -> int:
+    """Per-call GPU window sizing.
+
+    ZeroGPU rejects any decorator value over the account's per-call cap (120 s
+    on PRO). It also supports a callable here that's evaluated per request, so
+    we ask only for what each call needs:
+
+      * short request: 60 s (sufficient for a single ≤30 s generation on
+        warm models — denoise + prompt encode + 30-step euler + decode).
+      * long request: ceil(target_audio_s × 1.5) + 25 s overhead.
+      * cap: 120 s — the documented ZeroGPU PRO per-call ceiling.
+
+    Long-form prompts that internally chunk into >1 generate() pass run
+    sequentially inside one GPU window today, so multi-chunk total wall time
+    must still fit under the 120 s cap. Above that, the kernel kills the call
+    — the cleaner long-term fix is to acquire a GPU per chunk (separate
+    @spaces.GPU function) rather than holding one window across the loop.
+    """
+    target = float(gen_dur) if gen_dur and gen_dur > 0 else 30.0
+    needed = int(target * 1.5 + 25)
+    return max(60, min(needed, 120))
+
+
 @app.api()
-@spaces.GPU(duration=
+@spaces.GPU(duration=_gpu_duration)
 def generate_audio(
     prompt: str,
     audio_ref: FileData | None,
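Because `_gpu_duration` is pure arithmetic on `gen_dur`, it can be checked outside a Space. The sketch below copies its last three lines into a standalone function. The name `gpu_duration_from_diff` is hypothetical, and the arguments the original ignores have been dropped. It prints the window for a few target lengths.

```python
def gpu_duration_from_diff(gen_dur: float) -> int:
    """Same arithmetic as _gpu_duration above, keeping only the argument it reads."""
    # Missing or non-positive gen_dur falls back to a 30 s target.
    target = float(gen_dur) if gen_dur and gen_dur > 0 else 30.0
    # int() truncates toward zero; it does not round up.
    needed = int(target * 1.5 + 25)
    # Floor at 60 s, cap at the 120 s per-call ceiling.
    return max(60, min(needed, 120))


for seconds in (0, 10, 23, 24, 30, 45, 63, 64, 90):
    print(f"gen_dur={seconds:>3} s -> window {gpu_duration_from_diff(seconds)} s")
```

With this arithmetic, the 60 s floor applies to every `gen_dur` below 24 s, and the 120 s cap applies from about 63.3 s upward. A missing or non-positive `gen_dur` gets 70 s. The window is keyed on `gen_dur` rather than on a sentence count. Because `int()` truncates, `gen_dur=45` yields 92 s, while `ceil(45 × 1.5) + 25` would give 93 s.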