Spaces: lablab-ai-amd-developer-hackathon/signbridge
LucasLooTan, Claude Opus 4.7 (1M context) committed 1 day ago

Commit 2ce7b16 (1 parent: e7597f3)

feat: gTTS as primary TTS — Coqui XTTS-v2 was never installed on Space

The HF Space had been emitting a silent-stub WAV for every Speak click
because the local Coqui TTS package is not in requirements.txt: it is a
2GB+ install, CPU inference is slow, and the XTTS-via-AMD path was
promised in a comment but never wired up. Switch to gTTS (Google's free
TTS), a tiny dependency that returns real audio in under a second.
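
To confirm the symptom described above, one option is to check whether a generated WAV holds only zero samples. The sketch below uses only the standard library and assumes signed integer PCM (for example 16-bit), where silence is all-zero bytes. `is_silent_wav` and `write_wav` are hypothetical helpers for illustration, not part of the Space.

```python
import wave


def is_silent_wav(path: str) -> bool:
    """Return True if every sample byte in the WAV file is zero.

    Assumption: signed integer PCM. 8-bit unsigned WAVs encode silence
    as 0x80, and float WAVs cannot be read by the wave module at all.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    # any() over bytes is True as soon as one byte is non-zero.
    return not any(frames)


def write_wav(path: str, samples: bytes, rate: int = 22050) -> None:
    """Write mono 16-bit PCM bytes to `path` (demo helper)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(samples)
```

Running `is_silent_wav` on the file returned by a Speak click would distinguish a stub from real audio, provided the file is integer PCM as assumed.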

Three-tier fallback:
1. Coqui XTTS-v2 if installed locally (best quality, slow).
2. gTTS: fast cloud TTS, free, MP3 output.
3. Silent stub: last resort so the Gradio Audio component doesn't error.
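
The three tiers above amount to a "first backend that succeeds wins" chain. A minimal, generic sketch of that pattern follows. The function name `synthesize_with_fallback` and the three stand-in backends are hypothetical and only illustrate the control flow. The commit itself inlines the tiers in `_TTSEngine.synthesize`.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# A tier is a display name plus a callable that returns an audio path or None.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def synthesize_with_fallback(text: str, tiers: Sequence[Tier]) -> Optional[str]:
    """Try each tier in order and return the first non-None audio path."""
    for name, backend in tiers:
        try:
            path = backend(text)
        except Exception as exc:  # any backend failure moves on to the next tier
            logger.warning("%s failed (%s); trying next tier.", name, type(exc).__name__)
            continue
        if path is not None:
            return path
    return None


# Stand-in backends (hypothetical), mimicking the situation on the Space.
def xtts_unavailable(text: str) -> Optional[str]:
    raise ImportError("Coqui TTS is not installed")


def fake_gtts(text: str) -> Optional[str]:
    return "speech.mp3"


def silent_stub(text: str) -> Optional[str]:
    return "silence.wav"
```

With the tiers ordered `xtts_unavailable`, `fake_gtts`, `silent_stub`, the call returns `"speech.mp3"`. Removing the middle tier makes it fall through to `"silence.wav"`.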

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)

requirements.txt +4 -1
signbridge/voice/tts.py +36 -15

requirements.txt

@@ -25,7 +25,10 @@ opencv-python-headless>=4.10
 # Inference clients
 openai>=1.54
-# Audio (silent-stub fallback uses soundfile)
+# Audio: gTTS = primary (free, fast, ~tiny dep, real audio).
+# Coqui XTTS-v2 would be higher quality but is 2GB and slow on basic CPU.
+gtts>=2.5
+# soundfile is the silent-stub fallback when even gTTS fails.
 soundfile>=0.12
 # Vision pipeline (MediaPipe Holistic for the pose-debug overlay)
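
Because Coqui TTS is deliberately left out of requirements.txt, the tier that actually serves audio depends on what is importable at runtime. The sketch below probes for each package without importing it. The mapping from tier to module name is an assumption (`TTS` is the import name of the Coqui package, `gtts` and `soundfile` match the pins above), and `available_tiers` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import List, Sequence, Tuple

# (tier label, top-level module name). Assumed mapping, adjust to the project.
DEFAULT_CANDIDATES: Tuple[Tuple[str, str], ...] = (
    ("xtts", "TTS"),
    ("gtts", "gtts"),
    ("silent-stub", "soundfile"),
)


def available_tiers(
    candidates: Sequence[Tuple[str, str]] = DEFAULT_CANDIDATES,
) -> List[str]:
    """Return the tier labels whose module can be found, in priority order."""
    # find_spec returns None for a missing top-level module and does not
    # execute the module, so the probe stays cheap even for heavy packages.
    return [tier for tier, module in candidates if find_spec(module) is not None]
```

Logging `available_tiers()` once at startup would have made the missing Coqui install visible immediately instead of surfacing as silent audio.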

signbridge/voice/tts.py

@@ -72,27 +72,48 @@ class _TTSEngine:
     def synthesize(self, text: str) -> str | None:
         if not text:
             return None
-        self._ensure_loaded()
-        if self._tts is None:
-            return self._silent_stub(text)
-        out_path = self._cache_dir / f"{_cache_key(text)}.wav"
-        if out_path.exists():
-            return str(out_path)
+        # Disk cache hit — same text already synthesised this session.
+        cached_wav = self._cache_dir / f"{_cache_key(text)}.wav"
+        if cached_wav.exists():
+            return str(cached_wav)
+        cached_mp3 = self._cache_dir / f"{_cache_key(text)}.mp3"
+        if cached_mp3.exists():
+            return str(cached_mp3)
+        # Tier 1: Coqui XTTS-v2 if installed locally (full quality, slow).
+        self._ensure_loaded()
+        if self._tts is not None:
+            try:
+                self._tts.tts_to_file(
+                    text=text,
+                    file_path=str(cached_wav),
+                    language="en",
+                )
+                return str(cached_wav)
+            except Exception as exc:  # noqa: BLE001
+                logger.warning(
+                    "XTTS synthesis failed (%s); falling through to gTTS.",
+                    type(exc).__name__,
+                )
+        # Tier 2: gTTS — tiny dep, free, fast (Google's TTS API).
         try:
-            self._tts.tts_to_file(
-                text=text,
-                file_path=str(out_path),
-                language="en",
-                # XTTS-v2 needs a speaker reference; omit to use the default voice.
-            )
+            from gtts import gTTS  # type: ignore[import-not-found]
+            tts = gTTS(text=text, lang="en", tld="com")
+            tts.save(str(cached_mp3))
+            print(f"[tts] gTTS synthesised: {cached_mp3}", flush=True)
+            return str(cached_mp3)
+        except ImportError:
+            logger.warning("gTTS not installed; falling through to silent stub.")
         except Exception as exc:  # noqa: BLE001
             logger.warning(
-                "XTTS synthesis failed (%s); emitting silent stub.", type(exc).__name__
+                "gTTS synthesis failed (%s); falling through to silent stub.",
+                type(exc).__name__,
             )
-            return self._silent_stub(text)
-        return str(out_path)
+        # Tier 3: silent placeholder — better than crashing the audio component.
+        return self._silent_stub(text)
     def _silent_stub(self, text: str) -> str | None:
         """Emit a 0.5 s silent WAV so the Gradio audio component has something to play.