Spaces:
Build error
Build error
File size: 4,278 Bytes
18d028b 2de9ac2 18d028b 96fe5d4 4499b6e d7d6b24 18d028b 8b64ea8 18d028b 277d6c0 18d028b 277d6c0 18d028b 4499b6e 18d028b | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | ---
title: SignBridge
emoji: π€
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
thumbnail: assets/cover.png
license: mit
short_description: Real-time ASL β English speech on AMD MI300X.
---
# SignBridge β real-time ASL β speech
Two people who couldn't communicate, now can.
A deaf person signs into the webcam. SignBridge β a multi-stage vision + reasoning + voice pipeline running on a single AMD Instinct MI300X β translates the signs into spoken English in under 2 seconds.
Submission for the **AMD Developer Hackathon** (LabLab.ai, May 2026) β **Track 3: Vision & Multimodal AI**.
## How it works
```
webcam frames β MediaPipe Holistic β trained sign classifier
(1β5 fps) (543-dim pose) (WLASL Top-100 + alphabet)
β
βΌ
Llama-3.1-8B sentence composer
β
βΌ
Coqui XTTS-v2 β speech
```
All four stages run **concurrently on a single AMD Instinct MI300X** via AMD Developer Cloud. Total weights ~22 GB on a 192 GB GPU β fits with margin for KV cache + serving overhead.
## V1 use cases
1. **ASL fingerspelling alphabet** β sign AβZ and 0β9 β AI speaks the letters / numbers
2. **Top-50 WLASL signs** (hello, thank you, name, please, sorry, family, eat, drink, work, β¦) β AI composes grammatical English sentences
V1 is **one-way**: deaf signs β hearing hears. Reverse direction (speech β on-screen text) is V2.
## Why AMD
The MI300X's 192 GB HBM3 fits the entire pipeline (Qwen3-VL-8B + Llama-3.1-8B + XTTS-v2) on one GPU with margin. NVIDIA H100 (80 GB) requires sharding, and the V2 plan to upgrade to a 70B reasoner is impossible on H100 without a 3-GPU cluster. Single-GPU concurrency + 5.3 TB/s memory bandwidth is the actual AMD pitch β practical accessibility tools running globally need the cost-and-availability profile that AMD enables.
## Why this matters (business case)
Sign-language interpreters cost **$50β200 per hour** and are scarce. Courts, hospitals, schools, and public services **must by law** provide interpretation (ADA Title II/III in the US, EAA 2025 in the EU). Sorenson VRS β the dominant relay-services provider β books **$4B+ in annual revenue** in this space. SignBridge is the open-source backbone that any country, NGO, or enterprise can deploy on their own AMD compute.
## Privacy
Session-only. Frames and audio are processed in-memory and not persisted server-side beyond the WebSocket / HTTP session.
## For Deaf-led teams
SignBridge is open-source under MIT license and intentionally scoped to ASL-only V1. The pipeline is a substrate, not a finished product β Deaf-led organisations (schools-for-the-Deaf, NGOs, ministries) are the intended deployers. Other sign languages (BSL, MSL, CSL, ISL, +200 more) deserve their own teams, training data, and Deaf community leadership. See [`docs/walkthrough.md`](docs/walkthrough.md) β "Deployment ethics" for the design principles drawn from the Deaf-led academic literature.
## Local dev
```bash
# Setup
pip install -r requirements.txt
cp .env.example .env # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)
# Run the Gradio app
python app.py
# Run the inference backend (point at AMD Dev Cloud or local ROCm)
python -m signbridge.backend
# Train the classifier on WLASL Top-100 (Day 2 task β run on AMD Dev Cloud)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30
```
## Datasets used
- [WLASL](https://github.com/dxli94/WLASL) β Word-Level American Sign Language; we use the Top-100 subset
- ASL fingerspelling alphabet (open dataset)
## Models pulled from Hugging Face Hub
- `meta-llama/Llama-3.1-8B-Instruct` β sentence composer
- `coqui/XTTS-v2` β text-to-speech
- (V2 stretch) `openai/whisper-large-v3` β for the reverse direction
## License
MIT. See [`LICENSE`](LICENSE).
## Status
Active development β see `CLAUDE.md` for the working state and `docs/walkthrough.md` for the technical writeup.
|