---
title: SignBridge
emoji: 🤟
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: Real-time ASL → English speech on AMD MI300X.
---
# SignBridge: real-time ASL → speech

Two people who couldn't communicate, now can.

A deaf person signs into the webcam. SignBridge, a multi-stage vision + reasoning + voice pipeline running on a single AMD Instinct MI300X, translates the signs into spoken English in under 2 seconds.

Submission for the **AMD Developer Hackathon** (LabLab.ai, May 2026), **Track 3: Vision & Multimodal AI**.
## How it works

```
webcam frames → MediaPipe Holistic → trained sign classifier
  (1–5 fps)      (543 landmarks)     (WLASL Top-100 + alphabet)
                                                │
                                                ▼
                                  Llama-3.1-8B sentence composer
                                                │
                                                ▼
                                     Coqui XTTS-v2 → speech
```

All four stages run **concurrently on a single AMD Instinct MI300X** via AMD Developer Cloud. Total weights are ~22 GB on a 192 GB GPU, which leaves ample room for the KV cache and serving overhead.
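
As a rough illustration of how four stages can overlap instead of running back to back, the sketch below chains stub stages with `asyncio` queues. Every name in it (`run_stage`, `feed`, `extract_landmarks`, `classify_sign`, `compose_sentence`, `synthesize`) is hypothetical and only stands in for the real MediaPipe, classifier, Llama, and XTTS calls; it is not the project's actual backend.

```python
import asyncio

_STOP = object()  # sentinel that tells a stage to shut down


async def run_stage(fn, inbox, outbox):
    """Pull items from inbox, apply fn, and push the result to outbox."""
    while True:
        item = await inbox.get()
        if item is _STOP:
            await outbox.put(_STOP)  # propagate shutdown downstream
            return
        await outbox.put(fn(item))


async def feed(frames, inbox):
    """Push every frame into the first queue, then signal the end."""
    for frame in frames:
        await inbox.put(frame)
    await inbox.put(_STOP)


# Hypothetical stand-ins for the four real stages. They are plain
# synchronous stubs; real model calls would be awaited or offloaded.
def extract_landmarks(frame):
    return f"landmarks({frame})"


def classify_sign(landmarks):
    return f"gloss({landmarks})"


def compose_sentence(gloss):
    return f"sentence({gloss})"


def synthesize(sentence):
    return f"audio({sentence})"


async def run_pipeline(frames):
    stages = [extract_landmarks, classify_sign, compose_sentence, synthesize]
    # Bounded queues apply back-pressure so a slow stage cannot be flooded.
    queues = [asyncio.Queue(maxsize=4) for _ in range(len(stages) + 1)]
    tasks = [asyncio.create_task(feed(frames, queues[0]))]
    tasks += [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    results = []
    while True:
        out = await queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    await asyncio.gather(*tasks)
    return results
```

Calling `asyncio.run(run_pipeline(["f0", "f1"]))` pushes both frames through all four stubs in order. Each stage has a single worker reading a FIFO queue, so output order matches input order.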
## V1 use cases

1. **ASL fingerspelling alphabet**: sign A–Z and 0–9 → AI speaks the letters / numbers
2. **Top-50 WLASL signs** (hello, thank you, name, please, sorry, family, eat, drink, work, …) → AI composes grammatical English sentences

V1 is **one-way**: deaf signs → hearing hears. Reverse direction (speech → on-screen text) is V2.
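
One plausible way to turn noisy per-frame classifier output into the letter or gloss sequence that the sentence composer needs is to keep only labels that persist for several consecutive frames. The sketch below is an assumption for illustration: `collapse_predictions`, `build_composer_prompt`, the `min_run` threshold, the `blank` marker, and the prompt wording are all hypothetical and do not come from the SignBridge code base.

```python
from itertools import groupby


def collapse_predictions(frame_labels, min_run=3, blank="_"):
    """Collapse per-frame labels into a sequence of stable signs.

    A label is kept only if it repeats for at least `min_run` consecutive
    frames; `blank` marks frames where no sign was detected.
    """
    stable = []
    for label, run in groupby(frame_labels):
        if label != blank and len(list(run)) >= min_run:
            stable.append(label)
    return stable


def build_composer_prompt(glosses):
    """Format stable glosses into a prompt string (hypothetical wording)."""
    return (
        "Rewrite these ASL glosses as one grammatical English sentence: "
        + " ".join(glosses)
    )
```

With `min_run=3`, a label that flickers for one or two frames is treated as noise and dropped, while a sign held for three or more frames is emitted once.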
## Why AMD

The MI300X's 192 GB of HBM3 and 5.3 TB/s of memory bandwidth let the entire multi-stage pipeline (sign classifier + Llama-3.1-8B + XTTS-v2) run concurrently on a single GPU. A bandwidth-bound streaming workload like this one is the textbook MI300X use case. Practical accessibility tools that run globally need the cost-and-availability profile that AMD enables.
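
The memory argument can be checked with back-of-envelope arithmetic. The capacity, bandwidth, and ~22 GB weight total are the figures quoted in this README; the parameter count and 16-bit precision for the language model are assumptions, and the result is a theoretical ceiling, not a measured throughput.

```python
HBM_CAPACITY_GB = 192        # quoted in this README
HBM_BANDWIDTH_GBPS = 5300    # 5.3 TB/s, quoted in this README
TOTAL_WEIGHTS_GB = 22        # approximate total quoted in "How it works"

# Assumption: about 8e9 parameters stored in 16-bit precision.
LLM_PARAMS = 8e9
BYTES_PER_PARAM = 2


def headroom_gb(capacity_gb, weights_gb):
    """Memory left for KV cache, activations, and serving overhead."""
    return capacity_gb - weights_gb


def decode_ceiling_tokens_per_s(bandwidth_gbps, params, bytes_per_param):
    """Upper bound for single-stream decoding if each token reads all weights once."""
    weights_gb = params * bytes_per_param / 1e9
    return bandwidth_gbps / weights_gb


print(headroom_gb(HBM_CAPACITY_GB, TOTAL_WEIGHTS_GB))
print(decode_ceiling_tokens_per_s(HBM_BANDWIDTH_GBPS, LLM_PARAMS, BYTES_PER_PARAM))
```

Under these assumptions roughly 170 GB stays free after loading weights, and the bandwidth-only ceiling for one decoding stream is about 331 tokens per second. Real throughput will be lower because the other stages share the same memory system.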
## Why this matters (business case)

Sign-language interpreters cost **$50–200 per hour** and are scarce. Courts, hospitals, schools, and public services **must by law** provide interpretation (ADA Title II/III in the US, EAA 2025 in the EU). Sorenson VRS, the dominant relay-services provider, books **$4B+ in annual revenue** in this space. SignBridge is the open-source backbone that any country, NGO, or enterprise can deploy on their own AMD compute.
## Privacy

Session-only. Frames and audio are processed in-memory and not persisted server-side beyond the WebSocket / HTTP session.
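
A minimal sketch of what session-only handling could look like: frames live in a bounded in-memory buffer that is emptied when the session ends. `SessionBuffer` and its `max_frames` parameter are hypothetical and are not taken from the project's implementation.

```python
from collections import deque


class SessionBuffer:
    """Holds recent frames in memory only and drops them when the session closes."""

    def __init__(self, max_frames=30):
        # A bounded deque discards the oldest frame once the limit is reached.
        self._frames = deque(maxlen=max_frames)

    def add(self, frame):
        self._frames.append(frame)

    def __len__(self):
        return len(self._frames)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Nothing is written to disk; clearing releases the references.
        self._frames.clear()
        return False
```

Wrapping each WebSocket or HTTP session in `with SessionBuffer() as buf:` would tie the lifetime of the buffered frames to the session, including when the handler exits with an exception.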
## Local dev

```bash
# Setup
pip install -r requirements.txt
cp .env.example .env  # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)

# Run the Gradio app
python app.py

# Run the inference backend (point at AMD Dev Cloud or local ROCm)
python -m signbridge.backend

# Train the classifier on WLASL Top-100 (Day 2 task; run on AMD Dev Cloud)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30
```
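
Before starting the app it can help to confirm that the settings from `.env` are present once they have been loaded into the environment. The helper below is a hypothetical pre-flight check, not part of the repository. It relies only on the variable names mentioned in the setup comment and treats `AMD_DEV_CLOUD_` as a prefix, because the full variable names are not listed here.

```python
import os

REQUIRED_VARS = ("HF_TOKEN",)
REQUIRED_PREFIXES = ("AMD_DEV_CLOUD_",)
# OPENAI_API_KEY is described as a fallback, so it is not required here.


def missing_settings(env=None):
    """Return the names (or prefixes) that have no non-empty value in env."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    for prefix in REQUIRED_PREFIXES:
        # At least one non-empty variable must start with the prefix.
        if not any(key.startswith(prefix) and value for key, value in env.items()):
            missing.append(prefix + "*")
    return missing
```

An empty list means every required setting was found; anything else names what still needs to be filled in.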
## Datasets used

- [WLASL](https://github.com/dxli94/WLASL): Word-Level American Sign Language; we use the Top-100 subset
- ASL fingerspelling alphabet (open dataset)
## Models pulled from Hugging Face Hub

- `meta-llama/Llama-3.1-8B-Instruct`: sentence composer
- `coqui/XTTS-v2`: text-to-speech
- (V2 stretch) `openai/whisper-large-v3`: for the reverse direction
## License

MIT. See [`LICENSE`](LICENSE).
## Status

Active development; see `CLAUDE.md` for the working state and `docs/walkthrough.md` for the technical writeup.