Space: lablab-ai-amd-developer-hackathon/signbridge

LucasLooTan committed about 11 hours ago

Commit 2de9ac2 (parent: e831a7f)

fix: 4 BLOCKERs + 6 IMPORTANTs from deep-check audit

BLOCKERs:
- vlm.py: filter VLM-returned tokens against the closed vocabulary set.
Off-vocab strings ('letter', 'no_sign', 'n/a') were getting stamped
with a fake 0.85 confidence and leaking into the demo.
- space.py: pass the _new_session FACTORY (callable) to gr.State, not
the result. Guarantees per-tab isolation; the previous form has been
observed to share state across browser sessions in some Gradio 4.x
configurations.
- app.py: bind 0.0.0.0 by default. HF Spaces' reverse proxy expects
the app on the container's external interface; binding only to
127.0.0.1 silently rejects all incoming traffic on the live Space.
Local sandboxes can still override via SIGNBRIDGE_HOST=127.0.0.1.
- README.md: bump frontmatter sdk_version 4.44.0 -> 4.44.1 to match
the requirements.txt pin (avoids Gradio version-skew build issues on HF Spaces).

IMPORTANTs:
- imageio.py: apply EXIF rotation via ImageOps.exif_transpose so
phone-camera photos arrive at the VLM upright (was silently feeding
rotated images on every iPhone gold-set sample).
- imageio.py: handle float dtype properly in array_to_rgb. Naive
.astype(np.uint8) on float[0,1] was truncating to all-zeros — the
same black-frame failure mode the alpha fix already eliminated for
the path-load codepath.
- tts.py: silent-stub returns None on import failure, not '' (the
empty string was a type-annotation lie and broke gr.Audio).
- tts.py: cap retry attempts on transient XTTS load failure (3) and
use threading.Lock for the singleton — prevents a single bad cold
start from permanently muting the demo and prevents concurrent
double-loads of the 2 GB model.
- tts.py: switch cache key from abs(hash(text)) to sha256. Python's
hash() is salted per-process by default, so the cache effectively
reset every cold start; sha256 is stable across processes.
- composer/sentence.py + recognizer/vlm.py: log only exception type
name (not full message). httpx surfaces request URLs with embedded
credentials in error messages; logger.exception() was a credential-
leak surface in public HF Space stdout.
- composer/sentence.py: fall back to naive_join when the LLM returns
empty content. Previously this path returned '' silently and the UI
played no audio with no error.
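
The cache-key bullet above rests on hash randomization: `hash()` on a `str` is salted per interpreter, so two cold starts disagree, while a sha256 digest does not. A minimal stdlib sketch to see this directly; `cache_key` mirrors the `_cache_key` helper in the tts.py diff (first 16 hex characters of the digest), and `builtin_hash_in_fresh_process` is a hypothetical helper written only for this demonstration.

```python
import hashlib
import os
import subprocess
import sys


def cache_key(text: str) -> str:
    """Stable key: the same text yields the same digest in every process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def builtin_hash_in_fresh_process(text: str, seed: str) -> int:
    """Evaluate hash(text) in a child interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    text = "Hello, my name is Lucas."
    # Two simulated cold starts with different salts: the builtin hashes differ.
    print(builtin_hash_in_fresh_process(text, "1"))
    print(builtin_hash_in_fresh_process(text, "2"))
    # The sha256-derived key is identical no matter which process computes it.
    print(cache_key(text))
```

Keeping 16 hex characters retains 64 bits of the digest, which is plenty for a per-Space audio cache; a collision would at worst replay one cached WAV for a different sentence.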

68 tests still passing; ruff clean; live VLM smoke test confirms
recognizer + composer still work end-to-end (sign A -> 'A' 0.85,
composer 'Hello, my name is Lucas.').

Files changed (7)

README.md +1 -1
app.py +6 -4
signbridge/composer/sentence.py +15 -4
signbridge/imageio.py +35 -7
signbridge/recognizer/vlm.py +15 -3
signbridge/space.py +7 -1
signbridge/voice/tts.py +52 -25

README.md CHANGED

@@ -4,7 +4,7 @@ emoji: 🤟
 colorFrom: indigo
 colorTo: pink
 sdk: gradio
-sdk_version: 4.44.0
 app_file: app.py
 pinned: false
 thumbnail: assets/cover.png

 colorFrom: indigo
 colorTo: pink
 sdk: gradio
+sdk_version: 4.44.1
 app_file: app.py
 pinned: false
 thumbnail: assets/cover.png
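
Since this fix is about keeping two version strings in lockstep, a small consistency check could catch future drift, for example as a unit test. A minimal sketch, assuming requirements.txt pins Gradio with an exact `gradio==X.Y.Z` line (the commit mentions a pin but does not show its form); `frontmatter_sdk_version` and `pinned_gradio_version` are hypothetical helper names.

```python
from __future__ import annotations

import re


def frontmatter_sdk_version(readme_text: str) -> str | None:
    """Return sdk_version from the leading '---' YAML frontmatter, if any."""
    m = re.match(r"---\n(.*?)\n---", readme_text, flags=re.DOTALL)
    if not m:
        return None
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "sdk_version":
            return value.strip()
    return None


def pinned_gradio_version(requirements_text: str) -> str | None:
    """Return X from an exact 'gradio==X' (or 'gradio[extra]==X') line."""
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        m = re.fullmatch(r"gradio(\[[^\]]*\])?==([\w.]+)", line)
        if m:
            return m.group(2)
    return None
```

In a test, assert that both helpers return a non-`None` value before comparing them; otherwise an unrecognized pin format would pass silently as `None == None`.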

app.py CHANGED

@@ -17,10 +17,12 @@ from signbridge.space import build_demo
 def main() -> None:
     load_dotenv()
     demo = build_demo()
-    # On HF Spaces the SERVER_NAME env defaults to 0.0.0.0; for local dev we
-    # honour SIGNBRIDGE_HOST (default 127.0.0.1) so the boot test isn't blocked
-    # by sandbox/proxy localhost-accessibility checks.
-    host = os.getenv("SIGNBRIDGE_HOST", "127.0.0.1")
     port = int(os.getenv("SIGNBRIDGE_PORT", "7860"))
     demo.launch(
         server_name=host,

 def main() -> None:
     load_dotenv()
     demo = build_demo()
+    # Bind 0.0.0.0 by default — HF Spaces' reverse proxy expects the app to
+    # listen on the container's external interface, and 127.0.0.1-only would
+    # silently reject all incoming requests on the live Space. For local dev
+    # boot-tests inside sandboxes that can't talk to 0.0.0.0, override with
+    # `SIGNBRIDGE_HOST=127.0.0.1`.
+    host = os.getenv("SIGNBRIDGE_HOST", "0.0.0.0")
     port = int(os.getenv("SIGNBRIDGE_PORT", "7860"))
     demo.launch(
         server_name=host,
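
The default-host logic above can be pulled into a pure function so that the 0.0.0.0 default and the `SIGNBRIDGE_HOST` / `SIGNBRIDGE_PORT` overrides are unit-testable without launching Gradio. A minimal sketch; `resolve_bind` is a hypothetical name, and the defaults are the ones shown in the diff.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_bind(env: Mapping[str, str] | None = None) -> tuple[str, int]:
    """Return (host, port) using the diff's defaults: 0.0.0.0 and 7860."""
    if env is None:
        env = os.environ
    # 0.0.0.0 so the HF Spaces reverse proxy can reach the container;
    # sandboxed local boot tests set SIGNBRIDGE_HOST=127.0.0.1 instead.
    host = env.get("SIGNBRIDGE_HOST", "0.0.0.0")
    port = int(env.get("SIGNBRIDGE_PORT", "7860"))
    return host, port
```

In `main()` the result would feed `demo.launch(server_name=host, server_port=port)`; `server_port` is a standard `launch()` keyword, though the diff is cut off before showing whether the app passes it.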

signbridge/composer/sentence.py CHANGED

@@ -125,12 +125,23 @@ def compose_sentence(signs: Sequence[str]) -> str:
             temperature=0.2,
             max_tokens=120,
         )
-        text = resp.choices[0].message.content or ""
-        return _strip_quotes(text.strip())
-    except Exception:  # noqa: BLE001 — broad catch is intentional at the boundary
-        logger.exception("composer LLM call failed; falling back to naive joiner.")
         return _naive_join(signs)
 def _strip_quotes(text: str) -> str:
     return re.sub(r'^["\']|["\']$', "", text).strip()

             temperature=0.2,
             max_tokens=120,
         )
+        text = (resp.choices[0].message.content or "").strip()
+    except Exception as exc:  # noqa: BLE001 — broad catch is intentional at the boundary
+        # Log only the exception type; full message can include the request
+        # URL with embedded credentials when the OpenAI-compatible client
+        # surfaces an httpx error.
+        logger.warning("composer LLM call failed: %s", type(exc).__name__)
         return _naive_join(signs)
+    cleaned = _strip_quotes(text)
+    if not cleaned:
+        # LLM returned empty content — fall back to the naive joiner so the
+        # demo still produces *something* readable instead of silently
+        # playing no audio.
+        logger.info("composer LLM returned empty content; using naive joiner.")
+        return _naive_join(signs)
+    return cleaned
 def _strip_quotes(text: str) -> str:
     return re.sub(r'^["\']|["\']$', "", text).strip()
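
The control flow of the new `compose_sentence` can be sketched as a function with the LLM call injected, which makes both fallback paths (an exception, and empty content) easy to exercise without a provider. A minimal sketch: `call_llm` is a hypothetical callable standing in for the OpenAI-compatible client call, and `naive_join` is a guessed stand-in for `_naive_join`, whose body the diff does not show.

```python
from __future__ import annotations

import logging
import re
from collections.abc import Callable, Sequence

logger = logging.getLogger(__name__)


def naive_join(signs: Sequence[str]) -> str:
    """Hypothetical stand-in for _naive_join: space-join, capitalize, add a period."""
    words = [s for s in signs if s]
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def strip_quotes(text: str) -> str:
    # Same regex as _strip_quotes in the diff: drop one leading/trailing quote.
    return re.sub(r'^["\']|["\']$', "", text).strip()


def compose(
    signs: Sequence[str], call_llm: Callable[[Sequence[str]], str | None]
) -> str:
    """Fall back to naive_join on an LLM error AND on empty LLM output."""
    try:
        text = (call_llm(signs) or "").strip()
    except Exception as exc:  # broad on purpose at the boundary
        # Type name only: httpx messages can embed credential-bearing URLs.
        logger.warning("composer LLM call failed: %s", type(exc).__name__)
        return naive_join(signs)
    cleaned = strip_quotes(text)
    if not cleaned:
        logger.info("composer LLM returned empty content; using naive joiner.")
        return naive_join(signs)
    return cleaned
```

Note that a reply consisting only of quotes (`'""'`) is also caught, because the emptiness check runs after quote stripping rather than on the raw content.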

signbridge/imageio.py CHANGED

@@ -17,10 +17,11 @@ import numpy as np
 def load_rgb(source: str | Path | bytes | io.IOBase) -> np.ndarray:
     """Load an image as an RGB ndarray, compositing any alpha onto white.
-    Accepts a filesystem path, raw bytes, or any file-like object PIL
-    knows how to open.
     """
-    from PIL import Image
     if isinstance(source, (str, Path)):
         img = Image.open(source)
@@ -29,6 +30,14 @@ def load_rgb(source: str | Path | bytes | io.IOBase) -> np.ndarray:
     else:
         img = Image.open(source)
     return _composite_to_rgb(img)
@@ -36,21 +45,40 @@ def array_to_rgb(arr: np.ndarray) -> np.ndarray:
     """Convert an arbitrary-shape ndarray (H,W,3 or H,W,4) to RGB on white.
     Used at the recognizer's API boundary in case a caller hands us a
-    pre-decoded RGBA array.
     """
     from PIL import Image
     if arr.ndim == 2:
-        img = Image.fromarray(arr).convert("RGB")
         return np.asarray(img)
     if arr.shape[-1] == 3:
-        return arr if arr.dtype == np.uint8 else arr.astype(np.uint8)
     if arr.shape[-1] == 4:
-        img = Image.fromarray(arr, mode="RGBA")
         return _composite_to_rgb(img)
     raise ValueError(f"unsupported array shape for RGB conversion: {arr.shape}")
 def _composite_to_rgb(img) -> np.ndarray:  # noqa: ANN001
     from PIL import Image

 def load_rgb(source: str | Path | bytes | io.IOBase) -> np.ndarray:
     """Load an image as an RGB ndarray, compositing any alpha onto white.
+    Also applies EXIF rotation so phone-camera photos arrive at the VLM
+    upright. Accepts a filesystem path, raw bytes, or any file-like object
+    PIL knows how to open.
     """
+    from PIL import Image, ImageOps
     if isinstance(source, (str, Path)):
         img = Image.open(source)
     else:
         img = Image.open(source)
+    # Force-load before EXIF transpose so the image is in memory and the
+    # source file/buffer can be released. Required because the rest of the
+    # pipeline holds the array, not the PIL handle.
+    img.load()
+    # Honour EXIF orientation. Phone cameras often store landscape rotation
+    # in EXIF rather than rotating the pixel data; without this every
+    # portrait-mode photo arrives at the VLM rotated 90°/180°/270°.
+    img = ImageOps.exif_transpose(img)
     return _composite_to_rgb(img)
     """Convert an arbitrary-shape ndarray (H,W,3 or H,W,4) to RGB on white.
     Used at the recognizer's API boundary in case a caller hands us a
+    pre-decoded RGBA array. Float arrays in [0, 1] are scaled to uint8;
+    naive `.astype(np.uint8)` would truncate to all-zeros (the same
+    black-frame failure mode the alpha fix already eliminated for paths).
     """
     from PIL import Image
     if arr.ndim == 2:
+        img = Image.fromarray(_to_uint8(arr)).convert("RGB")
         return np.asarray(img)
+    if arr.ndim != 3:
+        raise ValueError(f"unsupported array shape for RGB conversion: {arr.shape}")
     if arr.shape[-1] == 3:
+        return _to_uint8(arr)
     if arr.shape[-1] == 4:
+        img = Image.fromarray(_to_uint8(arr), mode="RGBA")
         return _composite_to_rgb(img)
     raise ValueError(f"unsupported array shape for RGB conversion: {arr.shape}")
+def _to_uint8(arr: np.ndarray) -> np.ndarray:
+    """Coerce an ndarray to uint8 without truncating float [0, 1] to zero."""
+    if arr.dtype == np.uint8:
+        return arr
+    if np.issubdtype(arr.dtype, np.floating):
+        # Heuristic: if max is ≤ 1.0, it's a normalised [0, 1] image.
+        # Otherwise assume the caller already scaled to 0–255.
+        if arr.size and float(arr.max()) <= 1.0:
+            arr = arr * 255.0
+        return np.clip(arr, 0, 255).astype(np.uint8)
+    if np.issubdtype(arr.dtype, np.integer):
+        return np.clip(arr, 0, 255).astype(np.uint8)
+    return arr.astype(np.uint8)
 def _composite_to_rgb(img) -> np.ndarray:  # noqa: ANN001
     from PIL import Image
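
The black-frame failure is easy to reproduce with NumPy: casting floats in [0, 1] straight to `uint8` truncates every value to 0. The sketch below restates the diff's `_to_uint8` heuristic as a standalone `to_uint8` (an illustrative name, not the module's API) so the naive cast and the fix can be compared side by side.

```python
import numpy as np


def to_uint8(arr: np.ndarray) -> np.ndarray:
    """Coerce to uint8; floats whose max is <= 1.0 are treated as [0, 1]."""
    if arr.dtype == np.uint8:
        return arr
    if np.issubdtype(arr.dtype, np.floating):
        # Same heuristic as the diff: a max of at most 1.0 means normalised.
        if arr.size and float(arr.max()) <= 1.0:
            arr = arr * 255.0
        # astype truncates toward zero after clipping, so 0.5 -> 127.
        return np.clip(arr, 0, 255).astype(np.uint8)
    # Integers (and anything else) are clipped into range before the cast.
    return np.clip(arr, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    frame = np.full((2, 2, 3), 0.8)  # normalised float image, light grey
    print(frame.astype(np.uint8).max())  # naive cast: 0
    print(to_uint8(frame).max())  # scaled first: 204
```

One caveat of the max-based heuristic: a float image already on the 0 to 255 scale but almost entirely black (every value at most 1.0) would be multiplied by 255 as if it were normalised. Callers that know their input range can avoid the guess by scaling explicitly before handing the array over.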

signbridge/recognizer/vlm.py CHANGED

@@ -41,6 +41,11 @@ _VLM_VOCAB = (
     "see know understand think feel happy sad tired hungry wait "
     "unknown"
 )
 _PROMPT = (
     "You are an expert in American Sign Language (ASL). Look at this image of a "
@@ -163,10 +168,17 @@ def recognize_sign_from_frame(frame: np.ndarray) -> tuple[str, float]:
         )
         raw = (resp.choices[0].message.content or "").strip()
         token = _normalise(raw)
-    except Exception:  # noqa: BLE001 — broad at the boundary on purpose
-        logger.exception("VLM recognition failed; returning stub.")
         return "", 0.0
-    if token in {"", "unknown"}:
         return "", 0.0
     return token, 0.85

     "see know understand think feel happy sad tired hungry wait "
     "unknown"
 )
+# Pre-built set for membership tests at recognition time. Tokens not in this
+# set get suppressed (confidence 0.0) — VLMs hallucinate strings like
+# "letter", "no_sign", "n/a" that would otherwise leak into the demo with a
+# fake 0.85 confidence.
+_VLM_VOCAB_SET = frozenset(_VLM_VOCAB.split())
 _PROMPT = (
     "You are an expert in American Sign Language (ASL). Look at this image of a "
         )
         raw = (resp.choices[0].message.content or "").strip()
         token = _normalise(raw)
+    except Exception as exc:  # noqa: BLE001 — broad at the boundary on purpose
+        # Log only the exception type — full message can include the request
+        # URL with embedded credentials when the OpenAI-compatible client
+        # bubbles up an httpx error. We pay log fidelity to avoid leaking the
+        # provider key into a public HF Space stdout.
+        logger.warning("VLM recognition failed: %s", type(exc).__name__)
         return "", 0.0
+    # Suppress any token that isn't in the closed vocabulary the prompt
+    # explicitly requested. Without this, a VLM that returns "letter" or
+    # "no_sign" would be reported as a confident prediction.
+    if token in {"", "unknown"} or token not in _VLM_VOCAB_SET:
         return "", 0.0
     return token, 0.85
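
The closed-vocabulary guard reduces to a set-membership check after normalisation. A minimal sketch: the vocabulary string is abbreviated and partly hypothetical (the single letters and `hello`/`name` are assumed entries; only the tail from `see` to `unknown` appears in the diff), and `normalise` is a guessed stand-in for the diff's `_normalise`, whose body is not shown.

```python
from __future__ import annotations

import string

# Abbreviated, partly hypothetical vocabulary (see lead-in).
_VOCAB = (
    "A B C hello name "
    "see know understand think feel happy sad tired hungry wait "
    "unknown"
)
_VOCAB_SET = frozenset(_VOCAB.split())  # built once; O(1) average lookups
_FIXED_CONFIDENCE = 0.85  # placeholder confidence returned in the diff


def normalise(raw: str) -> str:
    """Hypothetical: first word, edge punctuation stripped; letters upper, words lower."""
    parts = raw.strip().split()
    if not parts:
        return ""
    token = parts[0].strip(string.punctuation)
    return token.upper() if len(token) == 1 else token.lower()


def filter_token(raw: str) -> tuple[str, float]:
    token = normalise(raw)
    # Reject empty, explicit "unknown", and anything outside the vocabulary.
    if token in {"", "unknown"} or token not in _VOCAB_SET:
        return "", 0.0
    return token, _FIXED_CONFIDENCE
```

Strings such as `letter`, `no_sign`, and `N/A` fall through to `("", 0.0)` because they never appear in the set, which is exactly the leak the commit closes.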

signbridge/space.py CHANGED

@@ -133,7 +133,13 @@ def build_demo() -> gr.Blocks:
             "sentence and hear it spoken aloud. Powered by AMD Instinct MI300X."
         )
-        state = gr.State(_new_session())
         with gr.Row():
             with gr.Column(scale=3):

             "sentence and hear it spoken aloud. Powered by AMD Instinct MI300X."
         )
+        # Pass the FACTORY (callable), not the result. Gradio invokes
+        # callable State values once per session — guarantees per-tab
+        # isolation. Using `gr.State(_new_session())` instead would create
+        # a single shared instance at module-load time, which has been
+        # observed to leak state across browser tabs in some Gradio 4.x
+        # configurations.
+        state = gr.State(_new_session)
         with gr.Row():
             with gr.Column(scale=3):
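
The difference between passing `_new_session()` and `_new_session` comes down to whether sessions share one mutable object or each get a fresh one from a factory. The toy `SessionStore` below is a hypothetical illustration of that hazard, not Gradio's implementation; Gradio's documentation states that a non-callable `gr.State` value is deep-copied, so treat this as the worst case the commit guards against. `new_session` is a stand-in for `_new_session`, whose body the diff does not show.

```python
from __future__ import annotations

from typing import Any


def new_session() -> dict[str, Any]:
    """Hypothetical stand-in for _new_session(): fresh mutable state per call."""
    return {"signs": []}


class SessionStore:
    """Toy per-session store (NOT Gradio's implementation).

    A callable initial value is invoked once per new session id; any other
    value is handed, as the very same object, to every session.
    """

    def __init__(self, initial: Any) -> None:
        self._initial = initial
        self._sessions: dict[str, Any] = {}

    def get(self, session_id: str) -> Any:
        if session_id not in self._sessions:
            init = self._initial
            self._sessions[session_id] = init() if callable(init) else init
        return self._sessions[session_id]
```

With `SessionStore(new_session())`, a sign appended in one tab shows up in every other tab; with `SessionStore(new_session)`, each tab starts from an empty list.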

signbridge/voice/tts.py CHANGED

@@ -7,9 +7,11 @@ so the Gradio app still produces something playable.
 from __future__ import annotations
 import logging
 import os
 import tempfile
 from pathlib import Path
 logger = logging.getLogger(__name__)
@@ -17,37 +19,55 @@ logger = logging.getLogger(__name__)
 DEFAULT_MODEL = os.getenv(
     "SIGNBRIDGE_TTS_MODEL", "tts_models/multilingual/multi-dataset/xtts_v2"
 )
 class _TTSEngine:
     def __init__(self) -> None:
         self._tts = None
-        self._loaded = False
-        self._unavailable_logged = False
         self._cache_dir = Path(tempfile.gettempdir()) / "signbridge_tts"
         self._cache_dir.mkdir(parents=True, exist_ok=True)
     def _ensure_loaded(self) -> None:
-        if self._loaded:
             return
-        try:
-            from TTS.api import TTS  # type: ignore[import-not-found]
-        except ImportError:
-            if not self._unavailable_logged:
-                logger.info(
                     "TTS package not installed; voice output will be silent. "
                     "Install via `pip install TTS>=0.22`."
                 )
-                self._unavailable_logged = True
-            self._loaded = True
-            return
-        try:
-            self._tts = TTS(model_name=DEFAULT_MODEL, progress_bar=False)
-        except Exception:  # noqa: BLE001
-            logger.exception("XTTS-v2 load failed; voice output will be silent.")
-            self._tts = None
-        self._loaded = True
     def synthesize(self, text: str) -> str | None:
         if not text:
@@ -56,7 +76,7 @@ class _TTSEngine:
         if self._tts is None:
             return self._silent_stub(text)
-        out_path = self._cache_dir / f"{abs(hash(text))}.wav"
         if out_path.exists():
             return str(out_path)
@@ -67,21 +87,28 @@ class _TTSEngine:
                 language="en",
                 # XTTS-v2 needs a speaker reference; omit to use the default voice.
             )
-        except Exception:  # noqa: BLE001
-            logger.exception("XTTS synthesis failed for %r; emitting silent stub.", text)
             return self._silent_stub(text)
         return str(out_path)
-    def _silent_stub(self, text: str) -> str:
-        """Emit a 0.5 s silent WAV so the Gradio audio component has something to play."""
-        out_path = self._cache_dir / f"silent_{abs(hash(text))}.wav"
         if out_path.exists():
             return str(out_path)
         try:
             import numpy as np
             import soundfile as sf  # type: ignore[import-not-found]
         except ImportError:
-            return ""
         sf.write(str(out_path), np.zeros(8000, dtype="int16"), 16000)
         return str(out_path)

 from __future__ import annotations
+import hashlib
 import logging
 import os
 import tempfile
+import threading
 from pathlib import Path
 logger = logging.getLogger(__name__)
 DEFAULT_MODEL = os.getenv(
     "SIGNBRIDGE_TTS_MODEL", "tts_models/multilingual/multi-dataset/xtts_v2"
 )
+# Cap on consecutive transient load failures before we give up retrying.
+# Without this we permanently mute the demo on a single bad cold-start.
+_MAX_LOAD_FAILURES = 3
+def _cache_key(text: str) -> str:
+    """Stable per-text cache key. Python's `hash()` is salted per-process
+    (PYTHONHASHSEED) so it changes every cold start and defeats the cache."""
+    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
 class _TTSEngine:
     def __init__(self) -> None:
         self._tts = None
+        self._import_failed = False
+        self._load_failures = 0
         self._cache_dir = Path(tempfile.gettempdir()) / "signbridge_tts"
         self._cache_dir.mkdir(parents=True, exist_ok=True)
+        self._load_lock = threading.Lock()
     def _ensure_loaded(self) -> None:
+        if self._tts is not None or self._import_failed:
             return
+        if self._load_failures >= _MAX_LOAD_FAILURES:
+            return
+        with self._load_lock:
+            # Re-check after acquiring lock (double-checked locking).
+            if self._tts is not None or self._import_failed:
+                return
+            try:
+                from TTS.api import TTS  # type: ignore[import-not-found]
+            except ImportError:
+                logger.warning(
                     "TTS package not installed; voice output will be silent. "
                     "Install via `pip install TTS>=0.22`."
                 )
+                self._import_failed = True
+                return
+            try:
+                self._tts = TTS(model_name=DEFAULT_MODEL, progress_bar=False)
+            except Exception as exc:  # noqa: BLE001
+                self._load_failures += 1
+                logger.warning(
+                    "XTTS-v2 load failed (attempt %d/%d): %s",
+                    self._load_failures,
+                    _MAX_LOAD_FAILURES,
+                    type(exc).__name__,
+                )
     def synthesize(self, text: str) -> str | None:
         if not text:
         if self._tts is None:
             return self._silent_stub(text)
+        out_path = self._cache_dir / f"{_cache_key(text)}.wav"
         if out_path.exists():
             return str(out_path)
                 language="en",
                 # XTTS-v2 needs a speaker reference; omit to use the default voice.
             )
+        except Exception as exc:  # noqa: BLE001
+            logger.warning(
+                "XTTS synthesis failed (%s); emitting silent stub.", type(exc).__name__
+            )
             return self._silent_stub(text)
         return str(out_path)
+    def _silent_stub(self, text: str) -> str | None:
+        """Emit a 0.5 s silent WAV so the Gradio audio component has something to play.
+        Returns None when even the stub can't be written (numpy/soundfile not
+        available); callers must handle None and present a clean "no audio"
+        UI rather than feeding "" into a Gradio Audio component.
+        """
+        out_path = self._cache_dir / f"silent_{_cache_key(text)}.wav"
         if out_path.exists():
             return str(out_path)
         try:
             import numpy as np
             import soundfile as sf  # type: ignore[import-not-found]
         except ImportError:
+            return None
         sf.write(str(out_path), np.zeros(8000, dtype="int16"), 16000)
         return str(out_path)
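
The new `_ensure_loaded` combines a retry budget with double-checked locking, a pattern that applies to any expensive singleton. Below is a minimal sketch with a hypothetical `BoundedLazyLoader` wrapping an arbitrary `load` callable; XTTS itself is not imported.

```python
from __future__ import annotations

import logging
import threading
from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class BoundedLazyLoader(Generic[T]):
    """Load an expensive object once; give up after max_failures errors."""

    def __init__(self, load: Callable[[], T], max_failures: int = 3) -> None:
        self._load = load
        self._max_failures = max_failures
        self._failures = 0
        self._value: T | None = None
        self._lock = threading.Lock()

    def _done(self) -> bool:
        # Either loaded, or the retry budget is spent.
        return self._value is not None or self._failures >= self._max_failures

    def get(self) -> T | None:
        if self._done():  # fast path, no lock
            return self._value
        with self._lock:
            # Double-checked: another thread may have finished while we waited.
            if self._done():
                return self._value
            try:
                self._value = self._load()
            except Exception as exc:  # any load error counts against the budget
                self._failures += 1
                logger.warning(
                    "load failed (attempt %d/%d): %s",
                    self._failures,
                    self._max_failures,
                    type(exc).__name__,
                )
            return self._value
```

One difference from the diff: this sketch re-checks the failure budget after acquiring the lock. In `_ensure_loaded`, threads already waiting on `_load_lock` re-check only `_tts` and `_import_failed`, so concurrent cold-start requests could push the attempt count past `_MAX_LOAD_FAILURES`. A loader that legitimately returns `None` would be retried here, since `None` doubles as the "not loaded" marker.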