Spaces:

lablab-ai-amd-developer-hackathon
/

signbridge

Running

LucasLooTan commited on 3 days ago

Commit

549efd4

1 Parent(s): 277d6c0

docs: pitch deck + demo video script + lablab submission form content

Three pure-content deliverables that don't depend on AMD Dev Cloud
credits being live — Lucas can paste these directly when ready.

- docs/pitch-deck.md — 8-slide deck, slide-by-slide content + visual
notes. Built around the four judging criteria. Closes on the
substrate-not-product framing.
- docs/demo-video-script.md — 2:30 shot list, voice-over script,
recording order, editing checklist, export checklist.
- docs/lablab-submission-form.md — copy-paste content for every
field on lablab.ai's submission form, with character counts
pre-validated and tags pre-selected.

Files changed (3) hide show

docs/demo-video-script.md +162 -0
docs/lablab-submission-form.md +139 -0
docs/pitch-deck.md +166 -0

docs/demo-video-script.md ADDED Viewed

	@@ -0,0 +1,162 @@

+# SignBridge — Demo Video Script
+> Target length: **2:30 (≤ 3 min)**. Format: 1080p MP4, MP3 audio. Aspect ratio 16:9.
+> Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing.
+---
+## Story arc (3 acts)
+| Time | Act | Beat |
+|---|---|---|
+| 0:00–0:20 | **Hook** | Open with the human problem; viewer must feel the gap. |
+| 0:20–1:30 | **Demo** | Live SignBridge in action — both fingerspelling AND a motion sign. |
+| 1:30–2:30 | **Why AMD + close** | Architecture diagram + concrete MI300X comparison + open-source ethics + URL. |
+Hard rule: **no slide-by-slide voice-over reading**. The demo should *play live*; voice-over should narrate what we're seeing, not summarise text on screen.
+---
+## Shot list
+### Act 1 — Hook (0:00 → 0:20)
+**Visual A (5 s):** Plain background, bold text card fades in:
+> 70 million deaf people. Interpreters cost $50–200 / hour. They're scarce.
+**Visual B (5 s):** Text card → "What if your phone could just translate?"
+**Visual C (10 s):** Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence — let the viewer feel that the sign means nothing to them.
+**Voice-over:** *(starts at 0:15)*
+> "Most of us can't read this. SignBridge can."
+---
+### Act 2 — Live demo (0:20 → 1:30)
+**Setup (0:20 → 0:25):** 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck.
+**Beat 2A — Fingerspelling (0:25 → 0:55):**
+**Visual (split screen recommended):** Left = your face/hand on webcam, right = the Gradio app receiving frames.
+- Sign **L** clearly. Click **Capture sign**. App shows "detected: L (85%)".
+- Sign **U**. Capture.
+- Sign **C**. Capture.
+- Sign **A**. Capture.
+- Sign **S**. Capture.
+- Click **🔊 Speak**. App composes → speaks: **"Lucas."**
+**Voice-over during this beat:**
+> "First, fingerspelling. I sign each letter, the app captures it, and—" *(pause for the speak)* — *"composed in natural English."*
+**Beat 2B — Motion sign (0:55 → 1:25):**
+**Visual:** Switch tabs to **Record sign**. Hit Record, sign **HELLO** (the wave-from-forehead motion), stop, click Submit.
+- Detected: **hello (85%)**. Click Speak.
+- App says: **"Hello."**
+Repeat one more sign for variety: **THANK_YOU**.
+**Voice-over:**
+> "But fingerspelling alone isn't real ASL — most signs are *motion*. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." *(pause for the speak)*
+**Beat 2C — Two-person scene (1:25 → 1:30):** *(optional but high-impact)*
+**Visual:** You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds.
+**No voice-over** during this beat — let the moment land.
+---
+### Act 3 — Architecture + AMD pitch (1:30 → 2:30)
+**Beat 3A — Architecture diagram (1:30 → 1:55):**
+**Visual:** Static slide showing the pipeline:
+```
+Webcam frames → Qwen3-VL-8B (vision) → Llama-3.1-8B (composer) → XTTS-v2 (speech)
+                            All on a single AMD Instinct MI300X
+```
+**Voice-over:**
+> "Under the hood: a multi-modal pipeline running on a single AMD Instinct MI300X. Vision, reasoning, and voice — all concurrent on one GPU."
+**Beat 3B — The MI300X comparison (1:55 → 2:15):**
+**Visual:** The comparison table from the walkthrough:
+| | MI300X 1× | H100 80 GB |
+|---|---|---|
+| V1 pipeline (~34 GB) | ✅ comfortable | ⚠ tight |
+| V2 with Llama-3.1-70B FP8 (~70 GB extra) | ✅ still fits | ❌ doesn't fit |
+**Voice-over:**
+> "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables."
+**Beat 3C — Substrate + close (2:15 → 2:30):**
+**Visual:** Final slide:
+- "Open source, MIT — github.com/seekerPrice/signbridge"
+- "Hugging Face Space — huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge"
+- "ASL V1. Deaf-led teams own the rest."
+- 🤟 SignBridge
+**Voice-over:**
+> "SignBridge is open source under MIT. It's a substrate — Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching."
+---
+## Voice-over recording tips
+- Record voice **separately** from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6–12 inches away.
+- One take, then cut. Don't try to dub multiple takes line-by-line.
+- Cadence: ~140 words/min. Pause for 0.5 s after each section.
+- If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics.
+---
+## Editing notes
+- **Captions/subtitles required.** Burn in the spoken English text below the speaker's face throughout — both for accessibility and so judges can follow with sound off.
+- **Highlight the recognized token visually.** When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text — judges' eyes need to find it fast.
+- **Music: skip.** The demo is loud enough on its own; background music distracts from the speech-output beats.
+- **Smooth transitions only** — don't use fancy wipes; cut on action.
+- **Final cut export:** 1080p, H.264, MP4, ≤100 MB if possible (lablab uploader has size limits).
+---
+## Prep before recording
+- [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X — *this is the hackathon talk-track*); fall back to HF Inference if not.
+- [ ] Lighting: front-facing soft light. No back-window glare.
+- [ ] Plain background (white wall ideal).
+- [ ] Wear a contrasting solid colour (not patterns) — VLM accuracy improves.
+- [ ] Webcam height: at eye level. Hands need to be in frame for signs.
+- [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record.
+- [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45.
+---
+## Recording order (don't shoot in story order)
+1. **Live demo screen recording first** — 3 takes of the full demo flow, pick the cleanest.
+2. **Voice-over second** — record continuous narration over the picked demo take.
+3. **B-roll of you signing alone** (Act 1 silent shot, Act 2C two-person reaction) — last, since they're easier to re-shoot.
+4. Edit it together in iMovie / CapCut.
+5. Export.
+6. Upload to YouTube as **Unlisted**, copy URL.
+7. Paste URL into lablab.ai submission form's "Video Presentation" field.
+---
+## Export checklist
+- [ ] Length 2:00–3:00
+- [ ] Captions visible throughout
+- [ ] AMD Dev Cloud / MI300X mentioned by name ≥3 times
+- [ ] HF Space URL shown on screen at least once
+- [ ] GitHub URL shown on screen at least once
+- [ ] No copyrighted music / footage
+- [ ] Speaker face visible (judges remember faces)
+- [ ] Final shot: SignBridge logo + URLs

docs/lablab-submission-form.md ADDED Viewed

	@@ -0,0 +1,139 @@

+# SignBridge — lablab.ai Submission Form Content
+> Open https://lablab.ai/ai-hackathons/amd-developer → scroll to bottom → click **Submit project**. Paste each field below into the matching input.
+---
+## Project Title (≤ ~70 chars)
+```
+SignBridge — Real-time ASL → English speech on AMD Instinct MI300X
+```
+(63 characters; safe under platform limit.)
+---
+## Short Description (≤ 150 chars typical)
+```
+Two people who couldn't communicate, now can. Real-time ASL → English speech via Qwen3-VL + Llama-3.1 + XTTS, on a single AMD MI300X.
+```
+(132 characters.)
+---
+## Long Description (no hard limit, ~300 words is the sweet spot)
+```
+SignBridge is a real-time American Sign Language to English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI).
+The user signs at the webcam — either fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) — and SignBridge replies in spoken English. Two people who couldn't communicate, now can.
+Architecture: a multi-stage pipeline (Qwen3-VL-8B for sign recognition, Llama-3.1-8B for sentence composition, Coqui XTTS-v2 for speech synthesis), running concurrently on a single AMD Instinct MI300X via vLLM. The 192 GB HBM3 of one MI300X holds the entire pipeline with margin — the same workload on NVIDIA H100 needs three GPUs.
+For motion-dependent signs (HELLO, THANK_YOU, PLEASE, EAT) the Record-sign tab captures 1.5 s of webcam, samples 4 evenly-spaced frames, and sends them as a multi-image VLM call with NVIDIA-style sequential frame markers in the prompt — most ASL signs are motion, not held poses, so single-frame approaches fundamentally cannot translate them.
+Why this matters: sign-language interpreters cost $50–200 per hour and are scarce. Courts, hospitals, schools, and public services must by law (ADA, EAA 2025) provide interpretation. Sorenson VRS — the dominant relay-services provider — books $4B+ in annual revenue filling this gap. SignBridge is an open-source MIT-licensed substrate that any Deaf-led NGO, school, ministry, or enterprise can deploy on their own AMD compute.
+V1 is ASL-only, deliberately. Sign languages aren't interchangeable — BSL, MSL, CSL, ISL, and 200+ others each deserve their own teams, training data, and Deaf community leadership. (See Bragg et al., "Systemic Biases in Sign Language AI Research", arXiv 2403.02563.)
+Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
+```
+---
+## Technology & Category Tags
+Pick from lablab's tag dropdown — these are the tags that match SignBridge:
+**Primary (must-haves):**
+- `AMD Developer Cloud`
+- `AMD ROCm`
+- `HuggingFace Spaces`
+**Secondary (relevant):**
+- `LLaMA` (Llama-3.1-8B composer)
+- `Qwen` (Qwen3-VL-8B vision)
+- `Gradio`
+- `FastAPI`
+- `Vision`
+- `Multimodal`
+- `Accessibility`
+- `Open Source`
+**Track:** Track 3 — Vision & Multimodal AI
+---
+## Cover Image
+Upload `assets/cover.png` from the repo (1280×640 PNG, ~60 KB).
+If lablab requires a different aspect ratio (e.g. square 1:1), regenerate with `python -m signbridge.scripts.make_cover` after editing the `WIDTH, HEIGHT` constants in `signbridge/scripts/make_cover.py`.
+---
+## Video Presentation
+Paste the YouTube URL of the demo video (uploaded as **Unlisted**).
+Reference content: `docs/demo-video-script.md`.
+---
+## Slide Presentation
+Upload the deck PDF.
+Reference content: `docs/pitch-deck.md`. Build in Google Slides, File → Download → PDF, upload here.
+---
+## Public GitHub Repository
+```
+https://github.com/seekerPrice/signbridge
+```
+---
+## Demo Application Platform
+```
+Hugging Face Space
+```
+---
+## Application URL
+```
+https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
+```
+---
+## Final pre-submit checklist
+Before clicking Submit on lablab:
+- [ ] Title pasted (63 chars)
+- [ ] Short description pasted (132 chars)
+- [ ] Long description pasted (~300 words)
+- [ ] Tags selected (Track 3 + at minimum: AMD Developer Cloud, AMD ROCm, HuggingFace Spaces, Qwen, LLaMA)
+- [ ] Cover image uploaded (assets/cover.png)
+- [ ] Video URL pasted (YouTube unlisted)
+- [ ] Pitch deck PDF uploaded
+- [ ] GitHub URL pasted
+- [ ] HF Space URL pasted
+- [ ] **Track selection: Track 3 — Vision & Multimodal AI**
+- [ ] HF Space loads from a fresh browser (incognito test)
+- [ ] GitHub repo has a clean README
+- [ ] LICENSE file is MIT
+- [ ] All commits pushed to both remotes
+When all boxes are ticked → click Submit → wait for confirmation email → done.
+Time-target: submit by **2026-05-11 02:00 MYT** (1-hour buffer before the 03:00 cutoff).

docs/pitch-deck.md ADDED Viewed

	@@ -0,0 +1,166 @@

+# SignBridge — Pitch Deck (8 slides)
+> Open a Google Slides deck (or Pitch). Paste each slide's content into the matching blank slide. Visuals are described in italics — replace with actual screenshots / diagrams / table renders.
+> Aspect ratio: 16:9. Theme: indigo→pink gradient (matches HF Space card).
+---
+## Slide 1 — Title
+**Title (huge):**
+SignBridge
+**Subtitle:**
+Real-time ASL → English speech, on a single AMD Instinct MI300X.
+**Footer (small):**
+Track 3 · Vision & Multimodal AI · AMD Developer Hackathon 2026 · Lucas Loo Tan Yu Heng
+*Visual: the cover.png we already shipped (1280×640 indigo→pink gradient with 🤟 + project name).*
+---
+## Slide 2 — The problem
+**Headline:**
+70 million deaf people. Sign-language interpreters cost $50–200 per hour. They're scarce.
+**Body bullets:**
+- Courts, hospitals, schools, public services **must by law** provide interpretation (ADA Title II/III in the US; European Accessibility Act 2025 in the EU).
+- **Sorenson VRS**, the dominant sign-language relay-services provider, books **$4B+ in annual revenue** filling this gap — proof the demand is enormous and budgeted-for.
+- Existing AI alternatives (Be My Eyes, Microsoft Seeing AI) are turn-based, photo-only, English-default, and closed-source. Real ASL is *motion* — they fundamentally can't translate "HELLO" or "THANK YOU".
+*Visual: a row of three context icons — courthouse / hospital / classroom — labeled with the mandates.*
+---
+## Slide 3 — The solution
+**Headline:**
+Hold to record. Sign. Speak.
+**Body (3-step arc):**
+1. **Hold-to-record button** captures 1.5 seconds of your sign.
+2. A multi-stage pipeline (vision → reasoning → speech) translates it.
+3. The other person hears natural English.
+**Tag line under the arc:**
+Two people who couldn't communicate, now can.
+*Visual: 3 screenshots of the live Gradio Space — (a) user signing into webcam; (b) "detected: HELLO (85%)"; (c) audio waveform playing "Hello.".*
+*If single screenshot: just the Gradio "Record sign" tab mid-demo.*
+---
+## Slide 4 — Architecture (the AMD pitch)
+**Headline:**
+The whole pipeline fits on a single MI300X. NVIDIA H100 doesn't.
+**Diagram (build in Slides; described as bullets):**
+```
+[ Webcam frame burst (4 frames, 1.5 s) ]
+              │
+              ▼
+[ Qwen3-VL-8B  ── frame summariser, multi-image VLM call ]
+              │
+              ▼
+[ Llama-3.1-8B ── sentence composer (sign tokens → English) ]
+              │
+              ▼
+[ Coqui XTTS-v2 ── multilingual streaming TTS ]
+              │
+              ▼
+[ Audio out ── speaker / Gradio audio component ]
+```
+**Comparison table (small print under diagram):**
+| Component | Weights (FP16) | MI300X 1× (192 GB) | H100 80 GB |
+|---|---|---|---|
+| Qwen3-VL-8B | ~16 GB | ✅ fits | ✅ |
+| Llama-3.1-8B | ~16 GB | ✅ fits | ✅ |
+| XTTS-v2 + Whisper (V2) | ~5 GB | ✅ fits | ⚠ tight |
+| (V2) **Llama-3.1-70B FP8 reasoner** | ~70 GB | **✅ still fits** | **❌ doesn't fit at all** |
+**Closer:** The single-GPU concurrency story is the AMD pitch.
+*Visual: the diagram + table as a single composite slide. Use a brand colour for the AMD column to highlight.*
+---
+## Slide 5 — Live demo
+**Headline:**
+*(blank — this slide is the live demo)*
+**Speaker note:**
+Switch to the live HF Space at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge. 30 seconds:
+1. **Snapshot tab** — fingerspell L-U-C-A-S → click Speak → AI says "Lucas."
+2. **Record sign tab** — record HELLO → click Submit → "hello" detected → click Speak → AI says "Hello."
+If demo fails / network down → fall back to the pre-recorded 2-min video on slide 6.
+*Visual: leave the slide blank or use a single QR code linking to the Space URL for the audience to scan and try themselves.*
+---
+## Slide 6 — Demo video (fallback)
+**Headline:**
+*(blank — this slide embeds the demo video)*
+**Embed:**
+The 2–3 minute demo video, looping, autoplay-on-slide-show.
+*Visual: video player.*
+---
+## Slide 7 — Why this is the right submission for Track 3
+**Headline:**
+Four judging criteria, four deliberate choices.
+**Two-column layout:**
+| Judging criterion | Our choice |
+|---|---|
+| **Application of Technology** | Multi-modal pipeline (vision + reasoning + voice) running concurrently on a single MI300X — exactly what Track 3's "massive memory bandwidth of AMD GPUs" was for. |
+| **Presentation** | Demo is *experienced*: judge holds phone, signs HELLO, hears "Hello." 30 seconds, no explanation needed. |
+| **Business Value** | $4B+ existing market (Sorenson VRS comparable), legally-mandated interpretation budgets, open-source so any Deaf-led NGO / ministry / school can self-host on their own AMD compute. |
+| **Originality** | Streaming continuous multi-frame VLM agent for sign language — no peer-reviewed benchmark exists for this approach yet (we checked the literature). Real ASL motion-words, not just fingerspelling. |
+*Visual: 2×2 grid of icons, one per criterion.*
+---
+## Slide 8 — Substrate, not product · Open · Deaf-led future
+**Headline:**
+SignBridge is a substrate. Deaf-led teams are the deployers.
+**Body:**
+- **MIT-licensed**, code at github.com/seekerPrice/signbridge — anyone can self-host.
+- **ASL only V1 is a scope decision.** BSL, MSL, CSL, ISL, +200 sign languages each deserve their own teams, training data, and Deaf community leadership. (Citing Bragg et al., *"Systemic Biases in Sign Language AI Research"*, arXiv 2403.02563.)
+- **Privacy by default** — frames and audio are processed in-memory and not persisted server-side beyond the request lifetime.
+**Closing line (large):**
+The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible.
+*Visual: world map outline with sign-language regional dots; or just the SignBridge logo with the closing tagline.*
+---
+## Speaker-note tips (read these before recording)
+1. **Lead with the human problem (Slide 2), not the architecture.** Architecture is for criterion 1; emotion is what closes criteria 2–4.
+2. **Time the live demo** — 30 seconds max. If it fails, switch to fallback video without comment.
+3. **Always say "AMD MI300X" by name** at least 3 times in the talk track. Sponsors notice.
+4. **End on the substrate framing** — pre-empts the "savior tech" critique that Deaf-AI judges look out for.
+---
+## Export
+Once filled in: File → Download → PDF document → upload to lablab.ai submission form's "Slide Presentation" field.