Spaces: ResembleAI/Dramabox

Manmay Nakhashi committed 27 days ago

Commit

96ef84a

1 Parent(s): ac99a44

Use richer duration estimator in warm server (sentence + non-verbal aware)

The simple len*0.065+1.5 formula in inference_server.py undercounted long
expressive prompts (e.g. 09_villain_sinister_laugh estimated 19.2s but
actual content needs ~28s, so output was clipped). Delegate to
inference.estimate_speech_duration which budgets per-sentence pauses,
laugh repetitions, sighs/gasps and a 2s base padding.
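
To illustrate the gap described above, here is a minimal sketch that puts the removed formula next to a hypothetical sentence- and non-verbal-aware estimator. `old_estimate` reproduces the removed code. `sketch_estimate`, `NONVERBAL_COSTS`, and every constant except the 2s base padding are assumptions made for illustration. They are not the contents of inference.estimate_speech_duration, which this commit does not show.

```python
import re

def old_estimate(prompt, multiplier=1.1):
    """The removed formula: character count of the quoted text only."""
    quoted = re.findall(r'"([^"]*)"', prompt) or re.findall(r"'([^']*)'", prompt)
    text = " ".join(quoted) if quoted else prompt
    return max(3.0, round((len(text) * 0.065 + 1.5) * multiplier, 1))

# Hypothetical per-cue budgets in seconds (assumed values, not from the repo).
NONVERBAL_COSTS = {
    r"\blaugh\w*": 2.5,           # laughs, laughter, laughing
    r"\b(?:sigh|gasp)\w*": 1.0,   # sighs, gasps
}

def sketch_estimate(prompt, chars_per_sec=15.0, sentence_pause=0.4, base_padding=2.0):
    """Illustrative estimator: speech time + sentence pauses + non-verbal cues + padding.

    Only the 2s base padding comes from the commit message; the other rates are guesses.
    """
    quoted = re.findall(r'"([^"]*)"', prompt)
    speech = " ".join(quoted) if quoted else prompt
    sentences = [s for s in re.split(r"[.!?]+", speech) if s.strip()]
    speaking = len(speech) / chars_per_sec
    pauses = sentence_pause * max(0, len(sentences) - 1)
    # Cues are searched in the whole prompt, since stage directions sit outside the quotes.
    nonverbal = sum(
        cost * len(re.findall(pattern, prompt, flags=re.IGNORECASE))
        for pattern, cost in NONVERBAL_COSTS.items()
    )
    return round(speaking + pauses + nonverbal + base_padding, 1)

prompt = 'A villain speaks slowly. "You thought you could stop me? How naive." He laughs darkly, then sighs.'
print("old formula:", old_estimate(prompt))
print("sketch     :", sketch_estimate(prompt))
```

The old formula sees only the quoted characters, so the laugh and the sigh in the stage directions add nothing to its budget. The sketch charges for them explicitly, which is the kind of accounting the commit message attributes to the richer estimator.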

Files changed (1)

src/inference_server.py +5 -3

@@ -53,9 +53,11 @@ DEFAULT_NEG = "worst quality, inconsistent, robotic, distorted, noise, static, m
 def estimate_duration(prompt, multiplier=1.1):
-    quoted = re.findall(r'"([^"]*)"', prompt) or re.findall(r"'([^']*)'", prompt)
-    text = " ".join(quoted) if quoted else prompt
-    return max(3.0, round((len(text) * 0.065 + 1.5) * multiplier, 1))
+    """Defer to the richer CLI estimator (sentence-aware + non-verbal action
+    budget) so warm-server outputs match the lengths of the per-call CLI runs."""
+    from inference import estimate_speech_duration
+    base = estimate_speech_duration(prompt)
+    return max(3.0, round(base * multiplier, 1))
 class TTSServer:
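
The new wrapper keeps the old 1.1 multiplier and the 3.0s floor and only swaps the base estimate. A small sketch of that behaviour follows, with the base estimator injected so it runs without the repository. `make_estimator` and the stub lambdas are hypothetical stand-ins for inference.estimate_speech_duration, not part of the codebase.

```python
def make_estimator(base_estimator, multiplier=1.1, floor=3.0):
    """Build a wrapper shaped like the new estimate_duration.

    base_estimator is a hypothetical stand-in for
    inference.estimate_speech_duration (prompt -> seconds).
    """
    def estimate(prompt):
        base = base_estimator(prompt)
        # Same post-processing as the diff: scale, round to 0.1s, clamp to the floor.
        return max(floor, round(base * multiplier, 1))
    return estimate

# Stub estimators returning fixed base durations in seconds (assumed values).
short_clip = make_estimator(lambda prompt: 2.0)
long_clip = make_estimator(lambda prompt: 20.0)

print(short_clip("..."))  # 2.0 * 1.1 is below the floor, so the floor applies
print(long_clip("..."))   # above the floor, so the scaled value is returned
```

Because the floor and multiplier are unchanged, any difference in output length between the old and new server paths comes from the base estimate alone.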