Manmay Nakhashi committed
Commit 96ef84a · 1 parent: ac99a44
Use richer duration estimator in warm server (sentence + non-verbal aware)
The simple len*0.065+1.5 formula in inference_server.py underestimated long
expressive prompts (e.g. 09_villain_sinister_laugh was estimated at 19.2s
but the actual content needs ~28s, so the output was clipped). Delegate to
inference.estimate_speech_duration, which budgets per-sentence pauses,
laugh repetitions, sighs/gasps and a 2s base padding.
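
To illustrate why a length-only formula falls short on prompts full of non-verbal actions, here is a minimal sketch. Only `simple_estimate` follows a formula quoted in the message above. `rich_estimate_sketch`, its `<laugh>`-style tag syntax, and every weight other than the 2s base padding are assumptions made for illustration; the body of `inference.estimate_speech_duration` is not shown in this commit and may differ.

```python
import re

# Rate and padding from the old formula quoted in the commit message.
SECONDS_PER_CHAR = 0.065
OLD_PADDING = 1.5


def simple_estimate(prompt: str) -> float:
    """Length-only estimate: len * 0.065 + 1.5."""
    return len(prompt) * SECONDS_PER_CHAR + OLD_PADDING


# Everything below is hypothetical. The tag syntax and the weights are
# illustrative, not the values used by inference.estimate_speech_duration.
BASE_PADDING = 2.0        # the 2s base padding named in the commit message
PAUSE_PER_SENTENCE = 0.4  # assumed pause budget per sentence, in seconds
NONVERBAL_BUDGET = {"laugh": 1.5, "sigh": 1.0, "gasp": 0.8}  # assumed seconds


def rich_estimate_sketch(prompt: str) -> float:
    """Budget spoken text, sentence pauses and non-verbal actions separately."""
    # Assumption: non-verbal actions appear as tags such as <laugh>.
    tags = re.findall(r"<(\w+)>", prompt)
    spoken = re.sub(r"<\w+>", "", prompt)
    sentences = [s for s in re.split(r"[.!?]+", spoken) if s.strip()]

    speech = len(spoken.strip()) * SECONDS_PER_CHAR
    pauses = len(sentences) * PAUSE_PER_SENTENCE
    actions = sum(NONVERBAL_BUDGET.get(tag.lower(), 0.0) for tag in tags)
    return BASE_PADDING + speech + pauses + actions


if __name__ == "__main__":
    prompt = "You fool. <laugh> <laugh> <laugh> It ends now!"
    print("simple:", simple_estimate(prompt))
    print("sketch:", rich_estimate_sketch(prompt))
```

In this sketch each action tag adds a fixed time budget, whereas the length-only formula charges a tag only for its characters, so a short prompt with repeated laughs gets a longer estimate.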
- src/inference_server.py +5 -3
src/inference_server.py
@@ -53,9 +53,11 @@ DEFAULT_NEG = "worst quality, inconsistent, robotic, distorted, noise, static, m
 
 
 def estimate_duration(prompt, multiplier=1.1):
-
-
-
+    """Defer to the richer CLI estimator (sentence-aware + non-verbal action
+    budget) so warm-server outputs match the lengths of the per-call CLI runs."""
+    from inference import estimate_speech_duration
+    base = estimate_speech_duration(prompt)
+    return max(3.0, round(base * multiplier, 1))
 
 
 class TTSServer:
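
The wrapper added in this hunk scales the base estimate by `multiplier`, rounds to one decimal place and enforces a 3.0s floor. A minimal sketch of that behaviour follows. The `estimator` parameter and `fake_estimator` are hypothetical stand-ins added so the example runs without the `inference` module; the committed code imports `estimate_speech_duration` inside the function instead, which defers the import until the first call.

```python
from typing import Callable


def estimate_duration(prompt: str,
                      estimator: Callable[[str], float],
                      multiplier: float = 1.1) -> float:
    """Scale the base estimate, round to 0.1s and never return less than 3.0s."""
    # `estimator` is a hypothetical injection point for this sketch only.
    base = estimator(prompt)
    return max(3.0, round(base * multiplier, 1))


def fake_estimator(prompt: str) -> float:
    # Stand-in for inference.estimate_speech_duration (assumed: 0.5s per word).
    return 0.5 * len(prompt.split())


if __name__ == "__main__":
    short = "hi"
    long_prompt = " ".join(["word"] * 50)
    print(estimate_duration(short, fake_estimator))        # floor applies
    print(estimate_duration(long_prompt, fake_estimator))  # scaled estimate
```

The floor keeps very short prompts from requesting a near-zero duration, and the multiplier adds headroom on top of whatever the base estimator returns.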