Fix ZeroGPU duration: dynamic per-sentence sizing, cap at 120s

#3
by Manmay - opened

Root cause: @spaces.GPU(duration=600) was rejected at decorator registration because 600 exceeds the ZeroGPU per-call cap (120s on PRO). As a result, every call, including single-sentence requests, failed with "The requested GPU duration (600s) is larger than the maximum allowed".

Fix: switch duration= to a callable. Empirically, the dominant compute on a warm server is the 30-step Euler denoise (~2.5s per sentence), plus a fixed ~10-12s for Gemma + VAE encode + decode. Budgeting 3s per sentence for headroom and 12s for the fixed overhead gives:

window = num_sentences * 3 + 12, clamped to [30, 120]
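
A minimal sketch of what such a duration callable could look like is shown here. The names gpu_duration and count_sentences, the constants, and the regex-based sentence split are illustrative assumptions, not the code from this PR; the actual Space may count sentences differently. ZeroGPU calls the duration callable with the same arguments as the decorated function, so the sketch accepts and ignores any extra arguments.

```python
import re

# Constants taken from the formula above; the names are hypothetical.
PER_SENTENCE_S = 3      # ~2.5s of denoise per sentence, rounded up
FIXED_OVERHEAD_S = 12   # Gemma + VAE encode + decode
MIN_WINDOW_S = 30
MAX_WINDOW_S = 120      # ZeroGPU per-call cap on PRO


def count_sentences(text: str) -> int:
    """Rough sentence count (assumption: split on '.', '!' and '?')."""
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    return max(1, len(parts))


def gpu_duration(text: str, *args, **kwargs) -> int:
    """Return the GPU window in seconds, clamped to [30, 120]."""
    window = count_sentences(text) * PER_SENTENCE_S + FIXED_OVERHEAD_S
    return max(MIN_WINDOW_S, min(MAX_WINDOW_S, window))


# Intended use (not executed here, requires the `spaces` package):
#   @spaces.GPU(duration=gpu_duration)
#   def generate(text, ...): ...
```

With these constants, any prompt of up to 6 sentences requests the 30s floor, and the 120s cap is reached at 36 sentences.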

Short prompts (up to 6 sentences) get the 30s floor, which is close to overhead-only time; longer prompts scale linearly until they reach the 120s cap at 36 sentences; and the value is always within the ZeroGPU per-call ceiling.

Known limitation: multi-chunk long-form runs still hold one GPU window for the whole loop, so total wall time must fit under 120s. Per-chunk @spaces.GPU acquisitions are left for a follow-up.

Manmay changed pull request status to open
tedi-resemble changed pull request status to merged
tedi-resemble deleted the refs/pr/3 ref