Spaces: ResembleAI/Dramabox


tedi-resemble

Manmay committed 1 day ago

Commit 48d32ab
1 Parent(s): e44cca0

Fix ZeroGPU duration: dynamic per-sentence sizing, cap at 120s (#3)


- Fix ZeroGPU duration: dynamic per-sentence sizing, cap at 120s (d621c9389cc82dd28fddd421518e8937c72cac60)
- Tighten GPU window: 10s base + 1s/sentence, quote-aware count (fc8ba6b960b627a6efce811ce1ea509f8d29ca18)

Co-authored-by: Manmay Nakhashi <Manmay@users.noreply.huggingface.co>
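
As a worked example of the sizing rule named in the commit message, the sketch below recomputes the GPU window for a few sentence counts. It assumes the constants that appear in the app.py diff (10 s base, 1 s per additional sentence, 110 s cap). The name `window_seconds` is hypothetical and used only for illustration; it is not a function in the Space.

```python
# Illustrative sketch only: mirrors the arithmetic in the app.py diff.
# Constant values are copied from the diff; the function name is hypothetical.
GPU_BASE_S = 10          # minimum window, even for one sentence
GPU_PER_SENTENCE_S = 1   # extra seconds per additional sentence
GPU_CAP_S = 110          # 10 s of headroom under the 120 s ceiling


def window_seconds(num_sentences: int) -> int:
    """Return the requested GPU window in seconds for a sentence count."""
    n = max(1, num_sentences)
    needed = GPU_BASE_S + (n - 1) * GPU_PER_SENTENCE_S
    # Clamp to [GPU_BASE_S, GPU_CAP_S].
    return max(GPU_BASE_S, min(needed, GPU_CAP_S))


for n in (1, 5, 50, 101, 500):
    print(n, window_seconds(n))
# 1 -> 10, 5 -> 14, 50 -> 59, 101 -> 110, 500 -> 110
```

With these values the cap is reached at 101 sentences; any longer prompt requests the same 110 s window.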

Files changed (1)

app.py +63 -1

app.py

@@ -182,8 +182,70 @@ async def homepage():
         return f.read()
 @app.api()
-@spaces.GPU(duration=600)
 def generate_audio(
     prompt: str,
     audio_ref: FileData | None,

         return f.read()
+_GPU_BASE_S = 10           # bare-minimum window even for a single sentence
+_GPU_PER_SENTENCE_S = 1    # add 1 s per additional sentence
+_GPU_CAP_S = 110           # leave 10 s headroom under ZeroGPU's 120 s ceiling
+def _count_sentences(prompt: str) -> int:
+    """Count TTS sentences in ``prompt`` using the same quote-aware splitter
+    the long-form chunker uses (``src/text_chunker``). Terminators inside
+    ``"..."`` dialogue do **not** count, so the GPU window calc agrees with
+    what the chunker sees — and dialogue-heavy prompts don't get over-budgeted.
+    Always returns ≥1 so a single fragment still gets a real window.
+    """
+    if not prompt or not prompt.strip():
+        return 1
+    try:
+        from text_chunker import split_sentences_outside_quotes
+        n = len(split_sentences_outside_quotes(prompt))
+    except Exception:
+        # Fallback: cheap punctuation count if the chunker import fails for any
+        # reason — preserves the ability to size GPU windows even on a broken
+        # import path.
+        n = sum(1 for ch in prompt if ch in ".!?")
+    return max(1, n)
+def _gpu_duration(
+    prompt: str,
+    audio_ref: FileData | None,
+    cfg: float,
+    stg: float,
+    dur_mult: float,
+    gen_dur: float,
+    ref_dur: float,
+    seed: int,
+    denoise_ref: bool = True,
+    max_chunk_duration: float = 45.0,
+    target_chunk_duration: float = 37.0,
+    crossfade_ms: float = 50.0,
+) -> int:
+    """Per-call ZeroGPU window sizing.
+    ZeroGPU rejects any static decorator value above the account's per-call
+    cap (120 s on PRO), but ``duration=`` also accepts a callable evaluated
+    per request — we ask only for what each call needs:
+        window = _GPU_BASE_S + (num_sentences - 1) × _GPU_PER_SENTENCE_S
+    Defaults: 10 s base + 1 s/extra sentence, capped at 110 s (a 10 s safety
+    margin under the 120 s ZeroGPU ceiling). Numbers tuned to observed
+    runtime on this Space's hardware.
+    Under-allocating is worse than over: if a call exceeds its allocated
+    duration ZeroGPU kills it (the user sees a generation failure) **and**
+    daily quota is still consumed against the time actually spent. Shorter
+    allocations *do* improve queue priority (per HF docs), which is why we
+    don't just pin everything at 110.
+    """
+    n = _count_sentences(prompt)
+    needed = _GPU_BASE_S + (n - 1) * _GPU_PER_SENTENCE_S
+    return max(_GPU_BASE_S, min(needed, _GPU_CAP_S))
 @app.api()
+@spaces.GPU(duration=_gpu_duration)
 def generate_audio(
     prompt: str,
     audio_ref: FileData | None,
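
The diff imports `split_sentences_outside_quotes` from `text_chunker`, but that module is not part of this commit. To illustrate why a quote-aware count matters for the window size, here is a minimal stand-in. It is an assumption, not the Space's implementation: the function name `count_sentences_outside_quotes` is hypothetical, it handles only straight double quotes, and it treats a run of terminators such as `...` as one sentence end.

```python
# Hypothetical stand-in for a quote-aware sentence count; the real splitter
# in src/text_chunker is not shown in this commit and may behave differently.
def count_sentences_outside_quotes(text: str) -> int:
    in_quote = False
    count = 0
    pending = False  # non-whitespace text seen since the last counted terminator
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
            pending = True
        elif ch in ".!?" and not in_quote:
            if pending:  # ignore repeated terminators like "..."
                count += 1
                pending = False
        elif not ch.isspace():
            pending = True
    if pending:  # trailing fragment without a terminator
        count += 1
    return max(1, count)


def naive_count(text: str) -> int:
    # Same idea as the fallback path in the diff: count every terminator.
    return max(1, sum(1 for ch in text if ch in ".!?"))


prompt = 'She said "Stop. Wait! Why?" and left. Then silence.'
aware = count_sentences_outside_quotes(prompt)
naive = naive_count(prompt)
# Window = 10 s base + 1 s per additional sentence, as in the diff.
print(aware, 10 + (aware - 1))   # 2 11
print(naive, 10 + (naive - 1))   # 5 14
```

For this dialogue-heavy prompt the naive count requests 14 s while the quote-aware count requests 11 s, which is the over-budgeting the docstring in the diff describes.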