# SignBridge – real-time ASL → speech translation
Loaded when the working directory is inside `/Users/lucaslt/Documents/side-gig/amd-hackathon/`. Keep this file current: prepend a dated entry to the Progress log after every milestone. Prune entries older than 60 days unless they anchor a persistent fact.
---
## Standing rules
- **Never make assumptions – always look up answers online.** Before coding, configuring, or recommending anything, verify against authoritative sources (use `context7` for libraries / SDKs / APIs, `WebSearch` / `WebFetch` for everything else). Training data is stale; default guesses waste time. This applies even to things that "seem obvious".
- **Use Superpowers skills for every suitable use case – especially planning.** Any planning, debugging, executing-from-plan, brainstorming, parallel-agent dispatch, TDD, or pre-completion verification goes through the matching `superpowers:*` skill (`superpowers:writing-plans`, `:executing-plans`, `:brainstorming`, `:systematic-debugging`, `:subagent-driven-development`, `:verification-before-completion`, `:test-driven-development`, `:dispatching-parallel-agents`). Free-form prose plans are not allowed.
- **Use the `deep-research` skill for deep academic research.** Multi-source comparison, literature review, state-of-the-art surveys, citation-tracked evidence – invoke `deep-research`, not ad-hoc web search.
- **Always do deep research / online research BEFORE making non-trivial decisions.** Any architectural choice, model pick, library selection, or competition-strategy call goes through `deep-research` (academic) or `WebSearch` / `context7` (practical) first. Document findings inline so the decision is auditable. Default guesses based on training data or "what feels right" are not allowed; the cost of looking things up is small, while the cost of building on a wrong assumption is large.
- **Use the `deep-check` skill for whole-repo audits before any submission, merge, or major checkpoint.** Run line-by-line bug + logic + security scan via `deep-check` after every meaningful change. Surface findings explicitly; fix blockers before declaring work done.
---
## Competition requirements (authoritative)
> Snapshot of the official AMD Developer Hackathon rules, captured 2026-05-08 from https://lablab.ai/ai-hackathons/amd-developer. **Read-only – never edit. If the lablab page changes, re-snapshot the entire section.**
### Hackathon: AMD Developer Hackathon (lablab.ai · sponsored by AMD + Akash Systems · partners: Hugging Face, Qwen)
### Hard deadlines (Malaysia Time)
| Event | Date / time |
|---|---|
| Hackathon kick-off | 2026-05-05 00:00 MYT |
| On-site (SF, by invitation only) | 2026-05-09 17:00 MYT – 2026-05-10 03:00 MYT |
| Online build phase | open since kick-off |
| **Submission deadline** | **2026-05-11 03:00 MYT** |
| Live on-stage pitching (on-site only) | 2026-05-11 05:00 MYT |
### Targeted track: Track 3 – Vision & Multimodal AI
Verbatim from the lablab page:
- **Objective:** Build applications that process and understand multiple data types (Images, Video, Audio) using the massive memory bandwidth of AMD GPUs.
- **What to Build:** High-throughput industrial inspection, medical imaging analysis, or multimodal conversational assistants.
- **Tech Stack:** Multimodal models (like Llama 3.2 Vision, Qwen-VL) optimized for ROCm.
- **Compute Resource:** Access to AMD Instinct MI300X instances via AMD Developer Cloud.
### Submission flow (Hugging Face partnership)
Verbatim from the lablab page → "Technology Partners & Workshops" → Hugging Face section:
1. Find a model on Hugging Face Hub to work with.
2. Build or fine-tune it using your AMD Developer Cloud credits.
3. **Publish your completed project as a Hugging Face Space within the event organization** – `lablab-ai-amd-developer-hackathon`.
4. Submit your Space link on lablab when you submit your project.
> Lucas joined the org and the Space lives at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge` (Fix A, landed 2026-05-08). Personal-namespace Spaces are NOT eligible for the HF Special Prize.
### Required submission deliverables (verbatim from "What to submit?")
**Basic Information:**
1. Project Title
2. Short Description
3. Long Description
4. Technology & Category Tags
**Cover Image and Presentation:**
5. Cover Image
6. Video Presentation
7. Slide Presentation
**App Hosting & Code Repository:**
8. Public GitHub Repository
9. Demo Application Platform (= Hugging Face Space)
10. Application URL
### Judging criteria (verbatim)
| Criterion | Definition |
|---|---|
| **Application of Technology** | How effectively the chosen model(s) are integrated into the solution. |
| **Presentation** | The clarity and effectiveness of the project presentation. |
| **Business Value** | The impact and practical value, considering how well it fits into business areas. |
| **Originality** | The uniqueness & creativity of the solution, highlighting approaches and ability to demonstrate behaviors. |
### Prize structure (verbatim from "Prizes")
- **Total prize pool: $21,500+**, sponsored by AMD and Akash Systems, plus an AMD hardware reward and exclusive Hugging Face prizes.
- **Grand Prize: $5,000** – overall top project.
- **Exclusive Hardware Reward:** AMD Radeon AI PRO R9700 GPU – awarded for outstanding social engagement or project promotion.
- **Track 3 – Vision & Multimodal AI**: 1st $2,500 · 2nd $1,500 · 3rd $1,000.
- Track 1 – AI Agents & Agentic Workflows: same tier.
- Track 2 – Fine-Tuning on AMD GPUs: same tier.
- **Hugging Face Special Prize** (Space with the most likes in the event org):
  - 1st: 1 Reachy Mini Wireless + 6 months Hugging Face PRO + $500 Hugging Face Credits.
  - 2nd: 3 months Hugging Face PRO + $300 Hugging Face Credits.
  - 3rd: 2 months Hugging Face PRO + $200 Hugging Face Credits.
### Prize targets for SignBridge
- **Track 3** (primary).
- **HF Special Prize** (most likes – requires Space in event org + sharing the link).
- Grand Prize (aspirational).
- Build-in-Public extra: **dropped** by user direction 2026-05-07 (no tweet obligations; walkthrough kept as internal doc only).
### License rule
Per the Voluntary Participation & Prize Terms footer: *"Submissions must be original and MIT-compliant."* SignBridge ships under the **MIT License** (originally drafted as Apache 2.0; switched 2026-05-08 to satisfy the literal reading of "MIT-compliant").
### Tech stack constraints (per Track 3)
- **Compute:** AMD Instinct MI300X via AMD Developer Cloud (datacenter GPU, 192 GB HBM3, 5.3 TB/s memory bandwidth). Not Ryzen or Radeon Pro; those are different AMD product lines.
- **Models:** Multimodal models optimized for ROCm. Examples called out by the rules: Llama 3.2 Vision, the Qwen-VL family. SignBridge uses `Qwen/Qwen3-VL-8B-Instruct` (Qwen-VL family, satisfying the rule) for sign recognition + `meta-llama/Llama-3.1-8B-Instruct` for sentence composition + `coqui/XTTS-v2` for speech.
- **Frameworks:** ROCm + PyTorch + Hugging Face Optimum-AMD + vLLM (per the rules).
### Workshop references (provided by AMD)
- "Build and Deploy an AI App on AMD MI300X as a Hugging Face Space" β Steve Kimoi, lablab.ai
- "Getting Started on AMD Developer Cloud" β Maharshi Trivedi, AMD
- "AI Agents 101: Building AI Agents with MCP & Open-Source Inference" β Mahdi Ghodsi, AMD
---
## Status
Day 1 of ~4 – pivoted from Iris to SignBridge on 2026-05-07. **Submission deadline: 2026-05-11 03:00 MYT.** ~3.5 days remaining. AMD Developer Hackathon, **Track 3 – Vision & Multimodal AI** only (Build-in-Public dropped 2026-05-07). Currently scaffolding + Day 1 hello-world.
## Goal
Win the AMD Developer Hackathon (lablab.ai, May 2026), Track 3, with a real-time webcam-based ASL → English speech translator. A deaf person signs; the AI speaks. The demo IS the project: judges watch two people who previously couldn't communicate now doing so live.
### Success criteria
- Submission accepted by 2026-05-11 03:00 MYT: live HF Space (Gradio) URL + 2–3 min demo video + completed lablab.ai submission form.
- End-to-end working flow: webcam frame → sign recognizer → Llama-3.1-8B sentence composer → Coqui XTTS-v2 → speech output. **≤ 2 s** from capture to start of speech.
- V1 use cases: (1) ASL fingerspelling alphabet A–Z + 0–9, (2) Top-50 WLASL signs (hello, thank you, name, please, …). Target ≥ 75% accuracy on a 30-sample gold set (see the sketch after this list).
- Reverse direction (speech → on-screen text for the deaf user) is a **stretch** for the buffer day only.
- Track 3: top-3 finish at minimum; 1st place is the target.
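
The gold-set bar above can be gated in CI. A minimal pytest sketch, assuming a hypothetical `tests/golden/manifest.json` mapping clip filenames to expected glosses and a hypothetical `recognize_clip` helper (neither exists in the repo yet):

```python
# tests/test_golden_accuracy.py – sketch of the ≥ 75% gold-set gate.
# Assumptions: the manifest format and recognize_clip are hypothetical.
import json
from pathlib import Path

from signbridge.recognizer.classifier import recognize_clip  # assumed helper

GOLDEN_DIR = Path(__file__).parent / "golden"


def test_gold_set_accuracy():
    # Assumed layout: {"hello_01.mp4": "HELLO", ...} with 30 entries.
    manifest = json.loads((GOLDEN_DIR / "manifest.json").read_text())
    assert len(manifest) == 30, "gold set should hold exactly 30 samples"
    hits = sum(
        recognize_clip(GOLDEN_DIR / clip) == expected
        for clip, expected in manifest.items()
    )
    accuracy = hits / len(manifest)
    assert accuracy >= 0.75, f"gold-set accuracy {accuracy:.0%} is below the 75% bar"
```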
---
## Workflow tools
| Task | Skill / Plugin | Why |
|---|---|---|
| Planning (any non-trivial change) | `superpowers:writing-plans` | Hard rule: no free-form prose plans |
| Early-stage exploration | `superpowers:brainstorming` | Use before requirements firm |
| Executing the build plan | `superpowers:executing-plans` | Plan-driven implementation |
| Debugging | `superpowers:systematic-debugging` | Root-cause-first |
| Multi-agent / parallel sub-work | `superpowers:dispatching-parallel-agents` or `:subagent-driven-development` | Decompose by specialist |
| Pre-completion verification | `superpowers:verification-before-completion` | Don't claim done without checks |
| Test-driven implementation | `superpowers:test-driven-development` | Write test before code |
| Long-context cross-file analysis | `cc-gemini-plugin:gemini` | When 1M context window helps |
| Online docs lookup | `context7` (search/resolve) | "Verify online" rule: ROCm + HF + WLASL + MediaPipe specifics |
| Multi-source research with citations | `deep-research` | WLASL prior art, sign-language ML state of the art, ROCm performance |
| Whole-repo bug + logic audit | `deep-check` | 16-category systematic scan before submission |
| Second-opinion / rescue / stuck | `codex:rescue` | Hand off to Codex runtime |
| Code review (own work pre-submission) | `code-review:code-review` or `pr-review-toolkit:review-pr` | Style/bug/security pass before public release |
| Security review | `owasp-security` | OWASP Top 10 / ASVS: webcam + audio handling |
| Browser-based demo verification | `chrome-devtools-mcp:chrome-devtools` | Verify the HF Space before recording |
| Commit / push / PR | `commit-commands:commit-push-pr` | Standard commit flow |
**Hard rule:** every planning task goes through a `superpowers:*` skill; no free-form prose plans.
---
## Tech stack (locked)
- Languages: Python 3.12 (primary)
- Submission deliverable: Hugging Face Space (Gradio app, public, MIT)
- Inference backend: FastAPI on AMD Developer Cloud (single MI300X instance), exposed as OpenAI-compatible API
- Transport: HTTPS for V1; WebSocket only if latency demands it post-Day-2
- Pipeline (concurrent on one MI300X; a dataflow sketch follows this list):
  - **Pose extraction:** MediaPipe Holistic (Google): frame → 543-landmark vector (33 pose + 468 face + 2 × 21 hands, each with x/y/z)
  - **Sign classifier:** trained-from-scratch small transformer over landmark sequences (WLASL Top-100 + ASL fingerspelling alphabet) → sign tokens
  - **Sentence composer:** `meta-llama/Llama-3.1-8B-Instruct` → grammatical English sentence from the sign-token stream
  - **TTS:** `coqui/XTTS-v2` → audio
  - **(Stretch) STT:** `openai/whisper-large-v3` → reverse direction (speech → on-screen text)
- Datasets: [WLASL](https://github.com/dxli94/WLASL) Top-100 subset + ASL fingerspelling alphabet (open)
- HF Hub artifact: `lucas-loo/signbridge-classifier` (trained classifier weights + model card with ROCm training config)
- License: MIT
- GitHub mirror: https://github.com/seekerPrice/signbridge
- HF Space URL: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
- Submission link: *fill in once started on lablab.ai*
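
A minimal dataflow sketch of the pipeline above. The MediaPipe Holistic calls are the real (legacy) API; `classify_sequence`, `compose_sentence`, and `synthesize_speech` are hypothetical stand-ins for the classifier, composer, and TTS stages, injected as arguments so the sketch stays self-contained:

```python
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic


def extract_landmarks(frame_rgb, holistic):
    """One RGB frame -> (543, 3) array: 33 pose + 468 face + 2 x 21 hand landmarks."""
    res = holistic.process(frame_rgb)
    parts = [
        (res.pose_landmarks, 33),
        (res.face_landmarks, 468),
        (res.left_hand_landmarks, 21),
        (res.right_hand_landmarks, 21),
    ]
    rows = [
        np.array([[p.x, p.y, p.z] for p in lm.landmark], dtype=np.float32)
        if lm is not None else np.zeros((n, 3), dtype=np.float32)
        for lm, n in parts
    ]
    return np.concatenate(rows)


def translate_clip(frames_rgb, classify_sequence, compose_sentence, synthesize_speech):
    """Frames -> landmark sequence -> sign tokens -> English sentence -> waveform."""
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        seq = np.stack([extract_landmarks(f, holistic) for f in frames_rgb])
    tokens = classify_sequence(seq)       # e.g. ["HELLO", "MY", "NAME"]
    sentence = compose_sentence(tokens)   # Llama-3.1-8B rewrites tokens into English
    return synthesize_speech(sentence)    # XTTS-v2 returns an audio array
```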
## Run commands
```bash
# Setup (one-time)
pip install -r requirements.txt
cp .env.example .env # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)
# Dev – run Gradio Space locally
python app.py
# Dev – run inference backend (locally for dev, deploys to AMD Dev Cloud for production)
python -m signbridge.backend
# Train the sign classifier on WLASL Top-100 (run on AMD Dev Cloud Day 2)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30
# Tests
pytest
# Lint / format / type
ruff check . && mypy signbridge/
# Push HF Space update (auto-deploys on git push to HF remote)
git push huggingface main
```
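
The `train_classifier` command above targets the "trained-from-scratch small transformer" from the pipeline. A PyTorch sketch of one plausible shape for it; every dimension and the 136-class count (Top-100 WLASL + A–Z + 0–9) are assumptions, not the committed architecture:

```python
import torch
import torch.nn as nn


class SignClassifier(nn.Module):
    """Small transformer over flattened landmark sequences (hypothetical sizes)."""

    def __init__(self, num_classes=136, landmark_dim=543 * 3,
                 d_model=256, nhead=4, num_layers=4, max_len=64):
        super().__init__()
        self.proj = nn.Linear(landmark_dim, d_model)               # landmarks -> model dim
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, frames, 543 * 3) – flattened per-frame landmark vectors
        h = self.proj(x) + self.pos[:, : x.shape[1]]
        h = self.encoder(h)
        return self.head(h.mean(dim=1))  # mean-pool over time, then classify


# Smoke test: logits = SignClassifier()(torch.randn(8, 64, 543 * 3))
```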
## Workspace layout
```
/Users/lucaslt/Documents/side-gig/amd-hackathon/
├── README.md                    # HF Space card via frontmatter
├── LICENSE                      # MIT
├── CLAUDE.md
├── .claude/
├── requirements.txt
├── .env.example
├── app.py                       # HF Space entry – Gradio
├── signbridge/
│   ├── __init__.py
│   ├── space.py                 # Gradio UI
│   ├── backend.py               # FastAPI inference server
│   ├── recognizer/
│   │   ├── __init__.py
│   │   ├── landmarks.py         # MediaPipe Holistic wrapper
│   │   └── classifier.py        # trained sign classifier
│   ├── composer/
│   │   ├── __init__.py
│   │   └── sentence.py          # Llama-3.1-8B sentence composer
│   ├── voice/
│   │   ├── __init__.py
│   │   └── tts.py               # Coqui XTTS-v2
│   └── scripts/
│       ├── __init__.py
│       └── train_classifier.py  # WLASL training script
├── data/
│   └── wlasl/                   # gitignored – WLASL Top-100 dataset
├── assets/
│   └── cover.png                # 1280×640 HF Space + lablab cover
├── tests/
│   └── golden/                  # 30-sample gold set (Top-50 + alphabet)
└── docs/
    └── walkthrough.md           # technical walkthrough for submission
```
## References
- **Owner:** Lucas
- **Working dir:** `/Users/lucaslt/Documents/side-gig/amd-hackathon/`
- **Hackathon page:** https://lablab.ai/ai-hackathons/amd-developer
- **AMD article:** https://www.amd.com/en/developer/resources/technical-articles/2026/build-across-the-ai-stack--join-the-amd-x-lablab-ai-hackathon-.html
- **Track:** 3 (Vision & Multimodal AI). Extra Challenge (Build in Public) intentionally skipped 2026-05-07.
- **WLASL dataset:** https://github.com/dxli94/WLASL
- **MediaPipe Holistic:** https://developers.google.com/mediapipe/solutions/vision/holistic_landmarker
- **HF Space:** https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge (moved to event org 2026-05-08)
- **GitHub mirror:** https://github.com/seekerPrice/signbridge (deployed 2026-05-07)
- **Submission link:** *fill in once started on lablab.ai*
- **Plan file:** `/Users/lucaslt/.claude/plans/first-need-to-change-sparkling-dawn.md`
---
## Progress log (newest first)
**2026-05-08 – Fix A: HF Space moved to event org.** Now at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. Eligible for HF Special Prize ranking. Personal-namespace `LucasLooTan/signbridge` left as-is (will mark private after the hackathon).
**2026-05-07 – GitHub repo + HF Space live.** GitHub: `seekerPrice/signbridge`. HF Space: `LucasLooTan/signbridge` (Gradio SDK 4.44.1, Apache 2.0). All 16 source files mirrored to both. Awaiting AMD Dev Cloud credit email to wire up real VLM endpoint.
**2026-05-07 – Dropped the Build-in-Public extra challenge.** Track 3 only. Frees ~2 hours that were earmarked for the two social posts + the external-facing walkthrough framing. Walkthrough doc kept as an internal technical record but no longer a submission deliverable.
**2026-05-07 – Pivoted to SignBridge.** Re-scored against the four judging criteria: SignBridge wins on Originality (10) and Presentation (10) thanks to the live deaf-person-to-hearing-person demo. Business value is also stronger (Sorenson VRS as a market comparable, mandated interpreter budgets). Replaced the Iris scaffold (`iris/` package, README, requirements deps) with the `signbridge/` package. CLAUDE.md, plan file, and README rewritten. Day 1 hello-world starts: MediaPipe Holistic on webcam, WLASL data download, Plan-B VLM test.
**2026-05-07 – Initial Iris scaffold (deprecated).** Bootstrapped repo with Iris (visually-impaired navigation) plan, requirements.txt, .gitignore, .env.example, README. Replaced same-day after re-evaluation; kept reusable pieces (.gitignore, structural choices).