
SignBridge – real-time ASL → speech translation

Loaded when the working directory is inside /Users/lucaslt/Documents/side-gig/amd-hackathon/. Keep this file current: prepend a dated entry to the Progress log after every milestone. Prune entries older than 60 days unless they anchor a persistent fact.


Standing rules

  • Never make assumptions – always look up answers online. Before coding, configuring, or recommending anything, verify against authoritative sources (use context7 for libraries / SDKs / APIs, WebSearch / WebFetch for everything else). Training data is stale; default-guesses waste time. This applies even to things that "seem obvious".
  • Use Superpowers skills for every suitable use case – especially planning. Any planning, debugging, executing-from-plan, brainstorming, parallel-agent dispatch, TDD, or pre-completion verification goes through the matching superpowers:* skill (superpowers:writing-plans, :executing-plans, :brainstorming, :systematic-debugging, :subagent-driven-development, :verification-before-completion, :test-driven-development, :dispatching-parallel-agents). Free-form prose plans are not allowed.
  • Use the deep-research skill for deep academic research. Multi-source comparison, literature review, state-of-the-art surveys, citation-tracked evidence – invoke deep-research, not ad-hoc web search.
  • Always do deep research / online research BEFORE making non-trivial decisions. Any architectural choice, model pick, library selection, or competition-strategy call goes through deep-research (academic) or WebSearch / context7 (practical) first. Document findings inline so the decision is auditable. Default-guesses based on training data or "what feels right" are not allowed; the cost of looking things up is small, the cost of building on a wrong assumption is large.
  • Use the deep-check skill for whole-repo audits before any submission, merge, or major checkpoint. Run line-by-line bug + logic + security scan via deep-check after every meaningful change. Surface findings explicitly; fix blockers before declaring work done.

Competition requirements (authoritative)

Snapshot of the official AMD Developer Hackathon rules, captured 2026-05-08 from https://lablab.ai/ai-hackathons/amd-developer. Read-only – never edit. If the lablab page changes, re-snapshot the entire section.

Hackathon: AMD Developer Hackathon (lablab.ai · sponsored by AMD + Akash Systems · partners: Hugging Face, Qwen)

Hard deadlines (Malaysia Time)

| Event | Date / time |
| --- | --- |
| Hackathon kick-off | 2026-05-05 00:00 MYT |
| On-site (SF, by invitation only) | 2026-05-09 17:00 MYT → 2026-05-10 03:00 MYT |
| Online build phase | open since kick-off |
| Submission deadline | 2026-05-11 03:00 MYT |
| Live on-stage pitching (on-site only) | 2026-05-11 05:00 MYT |

Targeted track: Track 3 – Vision & Multimodal AI

Verbatim from the lablab page:

  • Objective: Build applications that process and understand multiple data types (Images, Video, Audio) using the massive memory bandwidth of AMD GPUs.
  • What to Build: High-throughput industrial inspection, medical imaging analysis, or multimodal conversational assistants.
  • Tech Stack: Multimodal models (like Llama 3.2 Vision, Qwen-VL) optimized for ROCm.
  • Compute Resource: Access to AMD Instinct MI300X instances via AMD Developer Cloud.

Submission flow (Hugging Face partnership)

Verbatim from the lablab page → "Technology Partners & Workshops" → Hugging Face section:

  1. Find a model on Hugging Face Hub to work with.
  2. Build or fine-tune it using your AMD Developer Cloud credits.
  3. Publish your completed project as a Hugging Face Space within the event organization – lablab-ai-amd-developer-hackathon.
  4. Submit your Space link on lablab when you submit your project.

Lucas joined the org and the Space lives at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge (Fix A, landed 2026-05-08). Personal-namespace Spaces are NOT eligible for the HF Special Prize.

Required submission deliverables (verbatim from "What to submit?")

Basic Information:

  1. Project Title
  2. Short Description
  3. Long Description
  4. Technology & Category Tags

Cover Image and Presentation:

  5. Cover Image
  6. Video Presentation
  7. Slide Presentation

App Hosting & Code Repository:

  8. Public GitHub Repository
  9. Demo Application Platform (= Hugging Face Space)
  10. Application URL

Judging criteria (verbatim)

| Criterion | Definition |
| --- | --- |
| Application of Technology | How effectively the chosen model(s) are integrated into the solution. |
| Presentation | The clarity and effectiveness of the project presentation. |
| Business Value | The impact and practical value, considering how well it fits into business areas. |
| Originality | The uniqueness & creativity of the solution, highlighting approaches and ability to demonstrate behaviors. |

Prize structure (verbatim from "Prizes")

  • Total prize pool: $21,500+, sponsored by AMD and Akash Systems, plus an AMD hardware reward and exclusive Hugging Face prizes.
  • 🏆 Grand Prize: $5,000 – overall top project.
  • Exclusive Hardware Reward: AMD Radeon AI PRO R9700 GPU – awarded for outstanding social engagement or project promotion.
  • 🎨 Track 3 – Vision & Multimodal AI: 1st $2,500 · 2nd $1,500 · 3rd $1,000.
  • 🤖 Track 1 – AI Agents & Agentic Workflows: same tier.
  • ⚡ Track 2 – Fine-Tuning on AMD GPUs: same tier.
  • 🤗 Hugging Face Special Prize (Space with the most likes in the event org):
    • 1st: 1 Reachy Mini Wireless + 6 months Hugging Face PRO + $500 Hugging Face Credits.
    • 2nd: 3 months Hugging Face PRO + $300 Hugging Face Credits.
    • 3rd: 2 months Hugging Face PRO + $200 Hugging Face Credits.

Prize targets for SignBridge

  • 🥇 Track 3 (primary).
  • 🤗 HF Special Prize (most likes – requires the Space in the event org + sharing the link).
  • 🏆 Grand Prize (aspirational).
  • ❌ Build-in-Public extra: dropped by user direction 2026-05-07 (no tweet obligations; walkthrough kept as an internal doc only).

License rule

Per the Voluntary Participation & Prize Terms footer: "Submissions must be original and MIT-compliant." SignBridge ships under the MIT License (originally drafted as Apache 2.0; switched 2026-05-08 to satisfy the literal reading of "MIT-compliant").

Tech stack constraints (per Track 3)

  • Compute: AMD Instinct MI300X via AMD Developer Cloud (datacenter GPU, 192 GB HBM3, 5.3 TB/s memory bandwidth). Not Ryzen, not Radeon Pro – those are different AMD product lines.
  • Models: multimodal models optimized for ROCm. Examples called out by the rules: Llama 3.2 Vision, Qwen-VL family. SignBridge uses Qwen/Qwen3-VL-8B-Instruct (Qwen-VL family ✓) for sign recognition + meta-llama/Llama-3.1-8B-Instruct for sentence composition + coqui/XTTS-v2 for speech.
  • Frameworks: ROCm + PyTorch + Hugging Face Optimum-AMD + vLLM (per the rules).

Workshop references (provided by AMD)

  • "Build and Deploy an AI App on AMD MI300X as a Hugging Face Space" β€” Steve Kimoi, lablab.ai
  • "Getting Started on AMD Developer Cloud" β€” Maharshi Trivedi, AMD
  • "AI Agents 101: Building AI Agents with MCP & Open-Source Inference" β€” Mahdi Ghodsi, AMD

Status

Day 1 of ~4 – pivoted from Iris to SignBridge on 2026-05-07. Submission deadline: 2026-05-11 03:00 MYT; ~3.5 days remaining. AMD Developer Hackathon, Track 3 – Vision & Multimodal AI only (Build-in-Public dropped 2026-05-07). Currently scaffolding + Day 1 hello-world.

Goal

Win the AMD Developer Hackathon (lablab.ai, May 2026), Track 3, with a real-time webcam-based ASL → English speech translator. A deaf person signs → the AI speaks. The demo IS the project: judges literally watch two people who couldn't communicate before, communicating now.

Success criteria

  • Submission accepted by 2026-05-11 03:00 MYT – live HF Space (Gradio) URL + 2–3 min demo video + completed lablab.ai submission form.
  • End-to-end working flow: webcam frame → VLM recognizer → Llama-3.1-8B sentence composer → Coqui XTTS-v2 → speech output, with ≤ 2 s from capture to start of speech.
  • V1 use cases: (1) ASL fingerspelling alphabet A–Z + 0–9, (2) Top-50 WLASL signs (hello, thank you, name, please, …). Target ≥ 75% accuracy on a 30-sample gold set.
  • Reverse direction (speech → on-screen text for the deaf user) is a stretch goal for the buffer day only.
  • Track 3: top-3 finish at minimum; 1st place is the target.
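A quick way to keep the ≤ 2 s capture-to-speech budget honest is a per-stage timing harness. This is a minimal sketch under stated assumptions: the stage names and stub callables are placeholders, not the real SignBridge modules.

```python
import time

def measure_pipeline_latency(stages, frame):
    """Time each stage of a capture-to-speech pipeline plus the total.

    `stages` maps stage name -> callable; each callable consumes the
    previous stage's output. Names are placeholders, not the real modules.
    """
    timings = {}
    out = frame
    t0 = time.perf_counter()
    for name, fn in stages.items():
        t1 = time.perf_counter()
        out = fn(out)
        timings[name] = time.perf_counter() - t1
    timings["total"] = time.perf_counter() - t0
    return out, timings

# Stub stages standing in for recognizer -> composer -> TTS:
stages = {
    "recognize": lambda frame: ["HELLO", "NAME"],       # frame -> sign tokens
    "compose": lambda tokens: "Hello, my name is ...",  # tokens -> sentence
    "speak": lambda text: b"<wav bytes>",               # sentence -> audio
}
audio, timings = measure_pipeline_latency(stages, frame=None)
assert timings["total"] <= 2.0  # the <= 2 s budget from the criteria above
```

Wiring the real recognizer, composer, and TTS callables through the same harness gives per-stage numbers for the demo video and for deciding whether the post-Day-2 WebSocket transport is needed.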

Workflow tools

| Task | Skill / Plugin | Why |
| --- | --- | --- |
| Planning (any non-trivial change) | superpowers:writing-plans | Hard rule – no free-form prose plans |
| Early-stage exploration | superpowers:brainstorming | Use before requirements firm up |
| Executing the build plan | superpowers:executing-plans | Plan-driven implementation |
| Debugging | superpowers:systematic-debugging | Root-cause-first |
| Multi-agent / parallel sub-work | superpowers:dispatching-parallel-agents or :subagent-driven-development | Decompose by specialist |
| Pre-completion verification | superpowers:verification-before-completion | Don't claim done without checks |
| Test-driven implementation | superpowers:test-driven-development | Write the test before the code |
| Long-context cross-file analysis | cc-gemini-plugin:gemini | When the 1M context window helps |
| Online docs lookup | context7 (search/resolve) | "Verify online" rule – ROCm + HF + WLASL + MediaPipe specifics |
| Multi-source research with citations | deep-research | WLASL prior art, sign-language ML state of the art, ROCm performance |
| Whole-repo bug + logic audit | deep-check | 16-category systematic scan before submission |
| Second opinion / rescue when stuck | codex:rescue | Hand off to the Codex runtime |
| Code review (own work pre-submission) | code-review:code-review or pr-review-toolkit:review-pr | Style/bug/security pass before public release |
| Security review | owasp-security | OWASP Top 10 / ASVS – webcam + audio handling |
| Browser-based demo verification | chrome-devtools-mcp:chrome-devtools | Verify the HF Space before recording |
| Commit / push / PR | commit-commands:commit-push-pr | Standard commit flow |

Hard rule: every planning task goes through a superpowers:* skill – no free-form prose plans.


Tech stack (locked)

  • Languages: Python 3.12 (primary)
  • Submission deliverable: Hugging Face Space (Gradio app, public, MIT)
  • Inference backend: FastAPI on AMD Developer Cloud (single MI300X instance), exposed as OpenAI-compatible API
  • Transport: HTTPS for V1; WebSocket only if latency demands it post-Day-2
  • Pipeline (concurrent on one MI300X):
    • Pose extraction: MediaPipe Holistic (Google) – frame → 543-landmark vector (pose + face + both hands)
    • Sign classifier: small transformer trained from scratch over landmark sequences (WLASL Top-100 + ASL fingerspelling alphabet) → sign tokens
    • Sentence composer: meta-llama/Llama-3.1-8B-Instruct → grammatical English sentence from the sign-token stream
    • TTS: coqui/XTTS-v2 → audio
    • (Stretch) STT: openai/whisper-large-v3 → reverse direction (speech → on-screen text)
  • Datasets: WLASL Top-100 subset + ASL fingerspelling alphabet (open)
  • HF Hub artifact: lucas-loo/signbridge-classifier (trained classifier weights + model card with ROCm training config)
  • License: MIT
  • GitHub mirror: https://github.com/seekerPrice/signbridge
  • HF Space URL: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
  • Submission link: fill in once started on lablab.ai
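The pose-extraction step above can be made concrete with a flattening sketch. The counts follow MediaPipe Holistic's documented layout (33 pose + 468 face + 21 landmarks per hand = 543 landmarks, i.e. 1629 floats with x, y, z each); the function name and zero-padding scheme are illustrative, not the actual landmarks.py.

```python
# Flatten MediaPipe-Holistic-style landmark groups into a fixed-width
# vector for the classifier. Group sizes follow MediaPipe's layout;
# everything else here is an assumption, not the real SignBridge code.
POSE, FACE, HAND = 33, 468, 21

def flatten_landmarks(pose, face, left_hand, right_hand):
    """Concatenate (x, y, z) triples; absent groups (e.g. a hand out of
    frame) are zero-padded so the classifier always sees the same width."""
    vec = []
    for group, size in ((pose, POSE), (face, FACE),
                        (left_hand, HAND), (right_hand, HAND)):
        if group is None:
            vec.extend([0.0] * size * 3)
        else:
            for x, y, z in group:
                vec.extend([x, y, z])
    return vec

# Both hands out of frame -> still a 543 * 3 = 1629-float vector:
vec = flatten_landmarks([(0.1, 0.2, 0.0)] * POSE,
                        [(0.0, 0.0, 0.0)] * FACE, None, None)
assert len(vec) == 543 * 3
```

Zero-padding keeps the classifier input shape constant across frames, which matters because hands routinely drop out of the webcam frame mid-sign.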

Run Commands

```bash
# Setup (one-time)
pip install -r requirements.txt
cp .env.example .env  # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)

# Dev – run the Gradio Space locally
python app.py

# Dev – run the inference backend (locally for dev; deploys to AMD Dev Cloud for production)
python -m signbridge.backend

# Train the sign classifier on WLASL Top-100 (run on AMD Dev Cloud, Day 2)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30

# Tests
pytest

# Lint / format / type
ruff check . && mypy signbridge/

# Push an HF Space update (auto-deploys on git push to the HF remote)
git push huggingface main
```
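Since the backend is exposed as an OpenAI-compatible API (see Tech stack), the sentence-composer step reduces to building a chat-completions payload from sign tokens. A minimal sketch; the helper name and prompt wording are assumptions, not the actual sentence.py.

```python
# Build the chat messages that would be POSTed as the `messages` field of
# an OpenAI-compatible /v1/chat/completions request against the MI300X
# backend. `compose_messages` and the prompt text are illustrative.

def compose_messages(sign_tokens):
    gloss = " ".join(sign_tokens)
    return [
        {"role": "system",
         "content": ("You turn ASL gloss tokens into one short, natural "
                     "English sentence. Reply with the sentence only.")},
        {"role": "user", "content": f"Gloss: {gloss}"},
    ]

messages = compose_messages(["HELLO", "MY", "NAME", "L-U-C-A-S"])
assert messages[-1]["content"] == "Gloss: HELLO MY NAME L-U-C-A-S"
```

Keeping the composer behind the OpenAI-compatible shape means the same payload works against vLLM on the MI300X instance and against the OPENAI_API_KEY fallback named in .env.example.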

Workspace layout

```
/Users/lucaslt/Documents/side-gig/amd-hackathon/
├── README.md                       # HF Space card via frontmatter
├── LICENSE                         # MIT
├── CLAUDE.md
├── .claude/
├── requirements.txt
├── .env.example
├── app.py                          # HF Space entry – Gradio
├── signbridge/
│   ├── __init__.py
│   ├── space.py                    # Gradio UI
│   ├── backend.py                  # FastAPI inference server
│   ├── recognizer/
│   │   ├── __init__.py
│   │   ├── landmarks.py            # MediaPipe Holistic wrapper
│   │   └── classifier.py           # trained sign classifier
│   ├── composer/
│   │   ├── __init__.py
│   │   └── sentence.py             # Llama-3.1-8B sentence composer
│   ├── voice/
│   │   ├── __init__.py
│   │   └── tts.py                  # Coqui XTTS-v2
│   └── scripts/
│       ├── __init__.py
│       └── train_classifier.py     # WLASL training script
├── data/
│   └── wlasl/                      # gitignored – WLASL Top-100 dataset
├── assets/
│   └── cover.png                   # 1280×640 HF Space + lablab cover
├── tests/
│   └── golden/                     # 30-sample gold set (Top-50 + alphabet)
└── docs/
    └── walkthrough.md              # technical walkthrough for submission
```
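The ≥ 75% gold-set target from the success criteria translates into a pytest check over tests/golden/. A self-contained sketch with stubbed data; the label set and the idea of a loadable gold-set file format are assumptions, not the actual test suite.

```python
# Illustrative pytest-style check that the classifier clears the >= 75%
# accuracy bar on the 30-sample gold set. Data is stubbed inline here;
# the real suite would load predictions and labels from tests/golden/.

def accuracy(predictions, labels):
    """Fraction of positions where prediction matches the gold label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def test_gold_set_accuracy():
    labels = ["HELLO", "THANK-YOU", "NAME", "PLEASE"]        # gold labels (stub)
    predictions = ["HELLO", "THANK-YOU", "NAME", "SORRY"]    # model output (stub)
    assert accuracy(predictions, labels) >= 0.75

test_gold_set_accuracy()  # passes: 3/4 correct meets the 75% bar
```

Running this through plain `pytest` (already in Run Commands) makes the accuracy target a gating check rather than a number eyeballed before submission.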

References


Progress log (newest first)

2026-05-08 – Fix A: HF Space moved to the event org. Now at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge. Eligible for HF Special Prize ranking. Personal-namespace LucasLooTan/signbridge left as-is (will be marked private after the hackathon).

2026-05-07 – GitHub repo + HF Space live. GitHub: seekerPrice/signbridge. HF Space: LucasLooTan/signbridge (Gradio SDK 4.44.1, Apache 2.0). All 16 source files mirrored to both. Awaiting the AMD Dev Cloud credit email to wire up the real VLM endpoint.

2026-05-07 – Dropped the Build-in-Public extra challenge. Track 3 only. Frees ~2 hours that were earmarked for the 2 social posts + the external-facing walkthrough framing. Walkthrough doc kept as an internal technical record but no longer a submission deliverable.

2026-05-07 – Pivoted to SignBridge. Re-scored against the four judging criteria: SignBridge wins on Originality (10) and Presentation (10) thanks to the live deaf-person-to-hearing-person demo. Business value is also stronger (comparable to Sorenson VRS; mandated interpreter budgets). Replaced the Iris scaffold (iris/ package, README, requirements deps) with the signbridge/ package. CLAUDE.md, plan file, and README rewritten. Day 1 hello-world starts: MediaPipe Holistic on webcam, WLASL data download, Plan-B VLM test.

2026-05-07 – Initial Iris scaffold (deprecated). Bootstrapped the repo with the Iris (visually-impaired navigation) plan, requirements.txt, .gitignore, .env.example, README. Replaced the same day after re-evaluation; kept reusable pieces (.gitignore, structural choices).