SignBridge: real-time ASL → speech translation
Loaded when the working directory is inside /Users/lucaslt/Documents/side-gig/amd-hackathon/. Keep this file current: prepend a dated entry to the Progress log after every milestone. Prune entries older than 60 days unless they anchor a persistent fact.
Standing rules
- Never make assumptions: always look up answers online. Before coding, configuring, or recommending anything, verify against authoritative sources (use `context7` for libraries / SDKs / APIs, `WebSearch`/`WebFetch` for everything else). Training data is stale; default guesses waste time. This applies even to things that "seem obvious".
- Use Superpowers skills for every suitable use case, especially planning. Any planning, debugging, executing-from-plan, brainstorming, parallel-agent dispatch, TDD, or pre-completion verification goes through the matching `superpowers:*` skill (`superpowers:writing-plans`, `:executing-plans`, `:brainstorming`, `:systematic-debugging`, `:subagent-driven-development`, `:verification-before-completion`, `:test-driven-development`, `:dispatching-parallel-agents`). Free-form prose plans are not allowed.
- Use the `deep-research` skill for deep academic research. Multi-source comparison, literature review, state-of-the-art surveys, citation-tracked evidence: invoke `deep-research`, not ad-hoc web search.
- Always do deep research / online research BEFORE making non-trivial decisions. Any architectural choice, model pick, library selection, or competition-strategy call goes through `deep-research` (academic) or `WebSearch`/`context7` (practical) first. Document findings inline so the decision is auditable. Default guesses based on training data or "what feels right" are not allowed; the cost of looking things up is small, the cost of building on a wrong assumption is large.
- Use the `deep-check` skill for whole-repo audits before any submission, merge, or major checkpoint. Run a line-by-line bug + logic + security scan via `deep-check` after every meaningful change. Surface findings explicitly; fix blockers before declaring work done.
Competition requirements (authoritative)
Snapshot of the official AMD Developer Hackathon rules, captured 2026-05-08 from https://lablab.ai/ai-hackathons/amd-developer. Read-only: never edit. If the lablab page changes, re-snapshot the entire section.
Hackathon: AMD Developer Hackathon (lablab.ai · sponsored by AMD + Akash Systems · partners: Hugging Face, Qwen)
Hard deadlines (Malaysia Time)
| Event | Date / time |
|---|---|
| Hackathon kick-off | 2026-05-05 00:00 MYT |
| On-site (SF, by invitation only) | 2026-05-09 17:00 MYT – 2026-05-10 03:00 MYT |
| Online build phase | open since kick-off |
| Submission deadline | 2026-05-11 03:00 MYT |
| Live on-stage pitching (on-site only) | 2026-05-11 05:00 MYT |
Targeted track: Track 3 – Vision & Multimodal AI
Verbatim from the lablab page:
- Objective: Build applications that process and understand multiple data types (Images, Video, Audio) using the massive memory bandwidth of AMD GPUs.
- What to Build: High-throughput industrial inspection, medical imaging analysis, or multimodal conversational assistants.
- Tech Stack: Multimodal models (like Llama 3.2 Vision, Qwen-VL) optimized for ROCm.
- Compute Resource: Access to AMD Instinct MI300X instances via AMD Developer Cloud.
Submission flow (Hugging Face partnership)
Verbatim from the lablab page → "Technology Partners & Workshops" → Hugging Face section:
- Find a model on Hugging Face Hub to work with.
- Build or fine-tune it using your AMD Developer Cloud credits.
- Publish your completed project as a Hugging Face Space within the event organization, `lablab-ai-amd-developer-hackathon`.
- Submit your Space link on lablab when you submit your project.
Lucas joined the org and the Space lives at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge` (moved there 2026-05-08 as Fix A). Personal-namespace Spaces are NOT eligible for the HF Special Prize.
Required submission deliverables (verbatim from "What to submit?")
Basic Information:
1. Project Title
2. Short Description
3. Long Description
4. Technology & Category Tags

Cover Image and Presentation:
5. Cover Image
6. Video Presentation
7. Slide Presentation

App Hosting & Code Repository:
8. Public GitHub Repository
9. Demo Application Platform (= Hugging Face Space)
10. Application URL
Judging criteria (verbatim)
| Criterion | Definition |
|---|---|
| Application of Technology | How effectively the chosen model(s) are integrated into the solution. |
| Presentation | The clarity and effectiveness of the project presentation. |
| Business Value | The impact and practical value, considering how well it fits into business areas. |
| Originality | The uniqueness & creativity of the solution, highlighting approaches and ability to demonstrate behaviors. |
Prize structure (verbatim from "Prizes")
- Total prize pool: $21,500+, sponsored by AMD and Akash Systems, plus an AMD hardware reward and exclusive Hugging Face prizes.
- Grand Prize: $5,000 – overall top project.
- Exclusive Hardware Reward: AMD Radeon AI PRO R9700 GPU – awarded for outstanding social engagement or project promotion.
- Track 3 – Vision & Multimodal AI: 1st $2,500 · 2nd $1,500 · 3rd $1,000.
- Track 1 – AI Agents & Agentic Workflows: same tier.
- Track 2 – Fine-Tuning on AMD GPUs: same tier.
- Hugging Face Special Prize (Space with the most likes in the event org):
  - 1st: 1 Reachy Mini Wireless + 6 months Hugging Face PRO + $500 Hugging Face Credits.
  - 2nd: 3 months Hugging Face PRO + $300 Hugging Face Credits.
  - 3rd: 2 months Hugging Face PRO + $200 Hugging Face Credits.
Prize targets for SignBridge
- Track 3 (primary).
- HF Special Prize (most likes – requires the Space in the event org + sharing the link).
- Grand Prize (aspirational).
- Build-in-Public extra: dropped by user direction 2026-05-07 (no tweet obligations; walkthrough kept as internal doc only).
License rule
Per the Voluntary Participation & Prize Terms footer: "Submissions must be original and MIT-compliant." SignBridge ships under the MIT License (originally drafted as Apache 2.0; switched 2026-05-08 to satisfy the literal reading of "MIT-compliant").
Tech stack constraints (per Track 3)
- Compute: AMD Instinct MI300X via AMD Developer Cloud (datacenter GPU, 192 GB HBM3, 5.3 TB/s memory bandwidth). Not Ryzen, not Radeon Pro: those are different AMD product lines.
- Models: Multimodal models optimized for ROCm. Examples called out by the rules: Llama 3.2 Vision, Qwen-VL family. SignBridge uses `Qwen/Qwen3-VL-8B-Instruct` (Qwen-VL family) for sign recognition + `meta-llama/Llama-3.1-8B-Instruct` for sentence composition + `coqui/XTTS-v2` for speech.
- Frameworks: ROCm + PyTorch + Hugging Face Optimum-AMD + vLLM (per the rules).
Workshop references (provided by AMD)
- "Build and Deploy an AI App on AMD MI300X as a Hugging Face Space" – Steve Kimoi, lablab.ai
- "Getting Started on AMD Developer Cloud" – Maharshi Trivedi, AMD
- "AI Agents 101: Building AI Agents with MCP & Open-Source Inference" – Mahdi Ghodsi, AMD
Status
Day 1 of ~4. Pivoted from Iris to SignBridge on 2026-05-07. Submission deadline: 2026-05-11 03:00 MYT; ~3.5 days remaining. AMD Developer Hackathon, Track 3 – Vision & Multimodal AI only (Build-in-Public dropped 2026-05-07). Currently scaffolding + Day 1 hello-world.
Goal
Win the AMD Developer Hackathon (lablab.ai, May 2026), Track 3, with a real-time webcam-based ASL → English speech translator. A deaf person signs; the AI speaks. The demo IS the project: judges literally watch two people who couldn't communicate, now communicating.
Success criteria
- Submission accepted by 2026-05-11 03:00 MYT: live HF Space (Gradio) URL + 2–3 min demo video + lablab.ai submission form complete.
- End-to-end working flow: webcam frame → VLM recognizer → Llama-3.1-8B sentence composer → Coqui XTTS-v2 → speech output. ≤ 2 s from capture to start of speech.
- V1 use cases: (1) ASL fingerspelling alphabet A–Z + 0–9, (2) Top-50 WLASL signs (hello, thank you, name, please, …). Target ≥ 75% accuracy on a 30-sample gold set.
- Reverse direction (speech → on-screen text for the deaf user) is a stretch goal for the buffer day only.
- Track 3: top-3 finish at minimum; 1st place is the target.
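The ≥ 75% gate on the 30-sample gold set reduces to an exact-match scorer over (clip, expected sign) pairs. A minimal sketch; the file names, labels, and `predict` signature are illustrative assumptions, not the real `tests/golden/` schema:

```python
# Hypothetical gold-set scorer for the >= 75% acceptance gate.
from typing import Callable, Sequence


def gold_set_accuracy(
    predict: Callable[[str], str],
    samples: Sequence[tuple[str, str]],
) -> float:
    """Fraction of (clip, expected_sign) pairs the recognizer labels correctly."""
    if not samples:
        raise ValueError("gold set is empty")
    hits = sum(1 for clip, expected in samples if predict(clip) == expected)
    return hits / len(samples)


# Usage with a stub recognizer standing in for the real pipeline:
gold = [("hello.mp4", "HELLO"), ("thanks.mp4", "THANK-YOU"), ("name.mp4", "NAME")]
stub = {"hello.mp4": "HELLO", "thanks.mp4": "THANK-YOU", "name.mp4": "PLEASE"}.get
accuracy = gold_set_accuracy(lambda clip: stub(clip, ""), gold)  # 2 of 3 correct
```

In CI this would run as a pytest case asserting `accuracy >= 0.75` over the full 30-sample set.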
Workflow tools
| Task | Skill / Plugin | Why |
|---|---|---|
| Planning (any non-trivial change) | `superpowers:writing-plans` | Hard rule – no free-form prose plans |
| Early-stage exploration | `superpowers:brainstorming` | Use before requirements firm up |
| Executing the build plan | `superpowers:executing-plans` | Plan-driven implementation |
| Debugging | `superpowers:systematic-debugging` | Root-cause-first |
| Multi-agent / parallel sub-work | `superpowers:dispatching-parallel-agents` or `:subagent-driven-development` | Decompose by specialist |
| Pre-completion verification | `superpowers:verification-before-completion` | Don't claim done without checks |
| Test-driven implementation | `superpowers:test-driven-development` | Write the test before the code |
| Long-context cross-file analysis | `cc-gemini-plugin:gemini` | When a 1M-token context window helps |
| Online docs lookup | `context7` (search/resolve) | "Verify online" rule – ROCm + HF + WLASL + MediaPipe specifics |
| Multi-source research with citations | `deep-research` | WLASL prior art, sign-language ML state of the art, ROCm performance |
| Whole-repo bug + logic audit | `deep-check` | 16-category systematic scan before submission |
| Second-opinion / rescue / stuck | `codex:rescue` | Hand off to the Codex runtime |
| Code review (own work pre-submission) | `code-review:code-review` or `pr-review-toolkit:review-pr` | Style/bug/security pass before public release |
| Security review | `owasp-security` | OWASP Top 10 / ASVS – webcam + audio handling |
| Browser-based demo verification | `chrome-devtools-mcp:chrome-devtools` | Verify the HF Space before recording |
| Commit / push / PR | `commit-commands:commit-push-pr` | Standard commit flow |

Hard rule: every planning task goes through a `superpowers:*` skill – no free-form prose plans.
Tech stack (locked)
- Languages: Python 3.12 (primary)
- Submission deliverable: Hugging Face Space (Gradio app, public, MIT)
- Inference backend: FastAPI on AMD Developer Cloud (single MI300X instance), exposed as OpenAI-compatible API
- Transport: HTTPS for V1; WebSocket only if latency demands it post-Day-2
- Pipeline (concurrent on one MI300X):
  - Pose extraction: MediaPipe Holistic (Google) – frame → 543 landmarks per frame (33 pose + 468 face + 2 × 21 hands)
  - Sign classifier: trained-from-scratch small transformer over landmark sequences (WLASL Top-100 + ASL fingerspelling alphabet) → sign tokens
  - Sentence composer: `meta-llama/Llama-3.1-8B-Instruct` → grammatical English sentence from the sign-token stream
  - TTS: `coqui/XTTS-v2` → audio
  - (Stretch) STT: `openai/whisper-large-v3` → reverse direction (speech → on-screen text)
- Datasets: WLASL Top-100 subset + ASL fingerspelling alphabet (open)
- HF Hub artifact: `lucas-loo/signbridge-classifier` (trained classifier weights + model card with ROCm training config)
- License: MIT
- GitHub mirror: https://github.com/seekerPrice/signbridge
- HF Space URL: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
- Submission link: fill in once started on lablab.ai
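The 543 figure in the pose-extraction step is the MediaPipe Holistic landmark count per frame (33 pose + 468 face + 2 × 21 hands); the flat feature size the classifier sees depends on how many coordinates are kept per landmark. A quick sanity check; the 3-coordinate (x, y, z) flattening is an assumed design choice, not something the pipeline fixes:

```python
# Landmark budget for MediaPipe Holistic output.
# Counts per component are the documented Holistic landmark counts;
# coords_per_landmark = 3 (x, y, z) is an illustrative assumption.
POSE_LANDMARKS = 33
FACE_LANDMARKS = 468
HAND_LANDMARKS = 21  # per hand


def frame_feature_size(coords_per_landmark: int = 3) -> int:
    """Flat per-frame feature size once every landmark's coordinates are concatenated."""
    landmarks = POSE_LANDMARKS + FACE_LANDMARKS + 2 * HAND_LANDMARKS
    return landmarks * coords_per_landmark


total_landmarks = POSE_LANDMARKS + FACE_LANDMARKS + 2 * HAND_LANDMARKS  # 543
flat_size = frame_feature_size()  # 1629 floats per frame with (x, y, z)
```

A sequence of T frames would then feed the classifier as a (T, 1629) tensor under this flattening.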
Run Commands
```bash
# Setup (one-time)
pip install -r requirements.txt
cp .env.example .env   # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)

# Dev – run the Gradio Space locally
python app.py

# Dev – run the inference backend (local for dev; deploys to AMD Dev Cloud for production)
python -m signbridge.backend

# Train the sign classifier on WLASL Top-100 (run on AMD Dev Cloud, Day 2)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30

# Tests
pytest

# Lint / format / type-check
ruff check . && mypy signbridge/

# Push an HF Space update (auto-deploys on git push to the HF remote)
git push huggingface main
```
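One pipeline step the commands above exercise indirectly is the sentence composer: turning the classifier's sign-token stream into a grammatical English sentence via Llama-3.1-8B-Instruct. At its core that is prompt construction. A minimal sketch; the function name, system-prompt wording, and chat-message shape are assumptions (the real call would go through the OpenAI-compatible endpoint on the MI300X backend):

```python
# Hypothetical prompt builder for the sign-token -> sentence step.
def build_composer_messages(sign_tokens: list[str]) -> list[dict[str, str]]:
    """Build chat messages asking the LLM to compose one sentence from ASL gloss tokens."""
    if not sign_tokens:
        raise ValueError("no sign tokens to compose")
    gloss = " ".join(sign_tokens)
    return [
        {
            "role": "system",
            "content": (
                "You turn ASL gloss tokens into one short, grammatical "
                "English sentence. Reply with the sentence only."
            ),
        },
        {"role": "user", "content": f"Gloss: {gloss}"},
    ]


# Usage: fingerspelled name plus two signs from the Top-50 vocabulary.
messages = build_composer_messages(["ME", "NAME", "L-U-C-A-S"])
```

Keeping the system prompt to "sentence only" output avoids having to strip chatty preambles before handing the text to XTTS-v2.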
Workspace layout
```
/Users/lucaslt/Documents/side-gig/amd-hackathon/
├── README.md              # HF Space card via frontmatter
├── LICENSE                # MIT
├── CLAUDE.md
├── .claude/
├── requirements.txt
├── .env.example
├── app.py                 # HF Space entry – Gradio
├── signbridge/
│   ├── __init__.py
│   ├── space.py           # Gradio UI
│   ├── backend.py         # FastAPI inference server
│   ├── recognizer/
│   │   ├── __init__.py
│   │   ├── landmarks.py   # MediaPipe Holistic wrapper
│   │   └── classifier.py  # trained sign classifier
│   ├── composer/
│   │   ├── __init__.py
│   │   └── sentence.py    # Llama-3.1-8B sentence composer
│   ├── voice/
│   │   ├── __init__.py
│   │   └── tts.py         # Coqui XTTS-v2
│   └── scripts/
│       ├── __init__.py
│       └── train_classifier.py  # WLASL training script
├── data/
│   └── wlasl/             # gitignored – WLASL Top-100 dataset
├── assets/
│   └── cover.png          # 1280×640 HF Space + lablab cover
├── tests/
│   └── golden/            # 30-sample gold set (Top-50 + alphabet)
└── docs/
    └── walkthrough.md     # technical walkthrough for submission
```
References
- Owner: Lucas
- Working dir: `/Users/lucaslt/Documents/side-gig/amd-hackathon/`
- Hackathon page: https://lablab.ai/ai-hackathons/amd-developer
- AMD article: https://www.amd.com/en/developer/resources/technical-articles/2026/build-across-the-ai-stack--join-the-amd-x-lablab-ai-hackathon-.html
- Track: 3 (Vision & Multimodal AI). Extra Challenge (Build in Public) intentionally skipped 2026-05-07.
- WLASL dataset: https://github.com/dxli94/WLASL
- MediaPipe Holistic: https://developers.google.com/mediapipe/solutions/vision/holistic_landmarker
- HF Space: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge (moved to event org 2026-05-08)
- GitHub mirror: https://github.com/seekerPrice/signbridge (deployed 2026-05-07)
- Submission link: fill in once started on lablab.ai
- Plan file: `/Users/lucaslt/.claude/plans/first-need-to-change-sparkling-dawn.md`
Progress log (newest first)
2026-05-08 – Fix A: HF Space moved to the event org. Now at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge. Eligible for HF Special Prize ranking. Personal-namespace LucasLooTan/signbridge left as-is (will be marked private after the hackathon).

2026-05-07 – GitHub repo + HF Space live. GitHub: seekerPrice/signbridge. HF Space: LucasLooTan/signbridge (Gradio SDK 4.44.1, Apache 2.0). All 16 source files mirrored to both. Awaiting the AMD Dev Cloud credit email to wire up the real VLM endpoint.

2026-05-07 – Dropped the Build-in-Public extra challenge. Track 3 only. Frees ~2 hours that were earmarked for the two social posts + the external-facing walkthrough framing. Walkthrough doc kept as an internal technical record but no longer a submission deliverable.

2026-05-07 – Pivoted to SignBridge. Re-scored against the four judging criteria: SignBridge wins on Originality (10) and Presentation (10) thanks to the live deaf-person-to-hearing-person demo. Business value is also stronger (comparable to Sorenson VRS; mandated interpreter budgets). Replaced the Iris scaffold (iris/ package, README, requirements deps) with the signbridge/ package. CLAUDE.md, plan file, and README rewritten. Day 1 hello-world starts: MediaPipe Holistic on webcam, WLASL data download, Plan-B VLM test.

2026-05-07 – Initial Iris scaffold (deprecated). Bootstrapped the repo with the Iris (visually-impaired navigation) plan, requirements.txt, .gitignore, .env.example, README. Replaced the same day after re-evaluation; kept reusable pieces (.gitignore, structural choices).