# SignBridge β€” real-time ASL β†’ speech translation

Loaded when the working directory is inside `/Users/lucaslt/Documents/side-gig/amd-hackathon/`. Keep this file current: prepend a dated entry to the Progress log after every milestone. Prune entries older than 60 days unless they anchor a persistent fact.

---

## Standing rules

- **Never make assumptions β€” always look up answers online.** Before coding, configuring, or recommending anything, verify against authoritative sources (use `context7` for libraries / SDKs / APIs, `WebSearch` / `WebFetch` for everything else). Training data is stale; default-guesses waste time. This applies even to things that "seem obvious".
- **Use Superpowers skills for every suitable use case β€” especially planning.** Any planning, debugging, executing-from-plan, brainstorming, parallel-agent dispatch, TDD, or pre-completion verification goes through the matching `superpowers:*` skill (`superpowers:writing-plans`, `:executing-plans`, `:brainstorming`, `:systematic-debugging`, `:subagent-driven-development`, `:verification-before-completion`, `:test-driven-development`, `:dispatching-parallel-agents`). Free-form prose plans are not allowed.
- **Use the `deep-research` skill for deep academic research.** Multi-source comparison, literature review, state-of-the-art surveys, citation-tracked evidence β€” invoke `deep-research`, not ad-hoc web search.
- **Always do deep research / online research BEFORE making non-trivial decisions.** Any architectural choice, model pick, library selection, or competition-strategy call goes through `deep-research` (academic) or `WebSearch` / `context7` (practical) first. Document findings inline so the decision is auditable. Default-guesses based on training data or "what feels right" are not allowed; the cost of looking things up is small, the cost of building on a wrong assumption is large.
- **Use the `deep-check` skill for whole-repo audits before any submission, merge, or major checkpoint.** Run line-by-line bug + logic + security scan via `deep-check` after every meaningful change. Surface findings explicitly; fix blockers before declaring work done.

---

## Competition requirements (authoritative)

> Snapshot of the official AMD Developer Hackathon rules, captured 2026-05-08 from https://lablab.ai/ai-hackathons/amd-developer. **Read-only β€” never edit. If the lablab page changes, re-snapshot the entire section.**

### Hackathon: AMD Developer Hackathon (lablab.ai Β· sponsored by AMD + Akash Systems Β· partners: Hugging Face, Qwen)

### Hard deadlines (Malaysia Time)

| Event | Date / time |
|---|---|
| Hackathon kick-off | 2026-05-05 00:00 MYT |
| On-site (SF, by invitation only) | 2026-05-09 17:00 MYT β†’ 2026-05-10 03:00 MYT |
| Online build phase | open since kick-off |
| **Submission deadline** | **2026-05-11 03:00 MYT** |
| Live on-stage pitching (on-site only) | 2026-05-11 05:00 MYT |

### Targeted track: Track 3 β€” Vision & Multimodal AI

Verbatim from the lablab page:
- **Objective:** Build applications that process and understand multiple data types (Images, Video, Audio) using the massive memory bandwidth of AMD GPUs.
- **What to Build:** High-throughput industrial inspection, medical imaging analysis, or multimodal conversational assistants.
- **Tech Stack:** Multimodal models (like Llama 3.2 Vision, Qwen-VL) optimized for ROCm.
- **Compute Resource:** Access to AMD Instinct MI300X instances via AMD Developer Cloud.

### Submission flow (Hugging Face partnership)

Verbatim from lablab page β†’ "Technology Partners & Workshops" β†’ Hugging Face section:
1. Find a model on Hugging Face Hub to work with.
2. Build or fine-tune it using your AMD Developer Cloud credits.
3. **Publish your completed project as a Hugging Face Space within the event organization** β€” `lablab-ai-amd-developer-hackathon`.
4. Submit your Space link on lablab when you submit your project.

> Lucas joined the org; the Space lives at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge` (moved there 2026-05-08, per Fix A in the Progress log). Personal-namespace Spaces are NOT eligible for the HF Special Prize.

### Required submission deliverables (verbatim from "What to submit?")

**Basic Information:**
1. Project Title
2. Short Description
3. Long Description
4. Technology & Category Tags

**Cover Image and Presentation:**
5. Cover Image
6. Video Presentation
7. Slide Presentation

**App Hosting & Code Repository:**
8. Public GitHub Repository
9. Demo Application Platform (= Hugging Face Space)
10. Application URL

### Judging criteria (verbatim)

| Criterion | Definition |
|---|---|
| **Application of Technology** | How effectively the chosen model(s) are integrated into the solution. |
| **Presentation** | The clarity and effectiveness of the project presentation. |
| **Business Value** | The impact and practical value, considering how well it fits into business areas. |
| **Originality** | The uniqueness & creativity of the solution, highlighting approaches and ability to demonstrate behaviors. |

### Prize structure (verbatim from "Prizes")

- **Total prize pool: $21,500+**, sponsored by AMD and Akash Systems, plus an AMD hardware reward and exclusive Hugging Face prizes.
- πŸ† **Grand Prize: $5,000** β€” overall top project.
- **Exclusive Hardware Reward:** AMD Radeon AI PRO R9700 GPU β€” awarded for outstanding social engagement or project promotion.
- 🎨 **Track 3 β€” Vision & Multimodal AI**: 1st $2,500 Β· 2nd $1,500 Β· 3rd $1,000.
- πŸ€– Track 1 β€” AI Agents & Agentic Workflows: same tier.
- ⚑ Track 2 β€” Fine-Tuning on AMD GPUs: same tier.
- πŸ€— **Hugging Face Special Prize** (Space with the most likes in the event org):
  - 1st: 1 Reachy Mini Wireless + 6 months Hugging Face PRO + $500 Hugging Face Credits.
  - 2nd: 3 months Hugging Face PRO + $300 Hugging Face Credits.
  - 3rd: 2 months Hugging Face PRO + $200 Hugging Face Credits.

### Prize targets for SignBridge

- πŸ₯‡ **Track 3** (primary).
- πŸ€— **HF Special Prize** (most likes β€” requires Space in event org + sharing the link).
- πŸ† Grand Prize (aspirational).
- ❌ Build-in-Public extra: **dropped** by user direction 2026-05-07 (no tweet obligations; walkthrough kept as internal doc only).

### License rule

Per the Voluntary Participation & Prize Terms footer: *"Submissions must be original and MIT-compliant."* SignBridge ships under **MIT License** (originally drafted as Apache 2.0 β€” switched 2026-05-08 to satisfy the literal reading of "MIT-compliant").

### Tech stack constraints (per Track 3)

- **Compute:** AMD Instinct MI300X via AMD Developer Cloud (datacenter GPU, 192 GB HBM3, 5.3 TB/s memory bandwidth). Not Ryzen, not Radeon Pro β€” those are different AMD product lines.
- **Models:** Multimodal models optimized for ROCm. Examples called out by the rules: Llama 3.2 Vision, Qwen-VL family. SignBridge uses `Qwen/Qwen3-VL-8B-Instruct` (Qwen-VL family βœ“) for sign recognition + `meta-llama/Llama-3.1-8B-Instruct` for sentence composition + `coqui/XTTS-v2` for speech.
- **Frameworks:** ROCm + PyTorch + Hugging Face Optimum-AMD + vLLM (per the rules).

### Workshop references (provided by AMD)

- "Build and Deploy an AI App on AMD MI300X as a Hugging Face Space" β€” Steve Kimoi, lablab.ai
- "Getting Started on AMD Developer Cloud" β€” Maharshi Trivedi, AMD
- "AI Agents 101: Building AI Agents with MCP & Open-Source Inference" β€” Mahdi Ghodsi, AMD

---

## Status

Day 1 / ~4 β€” pivoted from Iris to SignBridge on 2026-05-07. **Submission deadline: 2026-05-11 03:00 MYT.** ~3.5 days remaining. AMD Developer Hackathon, **Track 3 β€” Vision & Multimodal AI** (only β€” Build-in-Public dropped 2026-05-07). Currently scaffolding + Day 1 hello-world.

## Goal

Win the AMD Developer Hackathon (LabLab.ai, May 2026), Track 3, with a real-time webcam-based ASL β†’ English speech translator. A deaf person signs β†’ AI speaks. The demo IS the project: judges literally watch two people who couldn't communicate before, communicating.

### Success criteria

- Submission accepted by 2026-05-11 03:00 MYT β€” live HF Space (Gradio) URL + 2–3 min demo video + lablab.ai submission form complete.
- End-to-end working flow: webcam frame β†’ sign recognizer β†’ Llama-3.1-8B sentence composer β†’ Coqui XTTS-v2 β†’ speech output. **≀ 2 s** from capture to start of speech (timing sketch after this list).
- V1 use cases: (1) ASL fingerspelling A–Z + digits 0–9, (2) Top-50 WLASL signs (hello, thank you, name, please, …). Target β‰₯ 75% accuracy on a 30-sample gold set.
- Reverse direction (speech β†’ on-screen text for the deaf user) is a **stretch** for the buffer day only.
- Track 3: top-3 finish at minimum; 1st place (gold) is the target.
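
A quick way to keep the ≀ 2 s budget honest is to time every stage on every run. A minimal harness sketch; the stage callables are whatever the pipeline modules end up exposing, so nothing here hardcodes their names:

```python
# Generic per-stage latency harness: threads one frame through the
# pipeline, printing each stage's wall-clock cost and the total
# against the 2 s budget.
import time
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any]]

def run_with_budget(stages: list[Stage], frame: Any, budget_s: float = 2.0) -> Any:
    t0 = time.perf_counter()
    data = frame
    for label, fn in stages:
        s0 = time.perf_counter()
        data = fn(data)  # each stage's output feeds the next
        print(f"{label}: {time.perf_counter() - s0:.3f}s")
    total = time.perf_counter() - t0
    verdict = "OK" if total <= budget_s else "OVER BUDGET"
    print(f"total: {total:.3f}s ({verdict}, budget {budget_s:.1f}s)")
    return data
```

Wiring is one tuple per stage, e.g. `run_with_budget([("landmarks", extract), ("classify", clf), ("compose", llm), ("tts", speak)], frame)`; those four names are placeholders for the real module entry points.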

---

## Workflow tools

| Task | Skill / Plugin | Why |
|---|---|---|
| Planning (any non-trivial change) | `superpowers:writing-plans` | Hard rule β€” no free-form prose plans |
| Early-stage exploration | `superpowers:brainstorming` | Use before requirements firm |
| Executing the build plan | `superpowers:executing-plans` | Plan-driven implementation |
| Debugging | `superpowers:systematic-debugging` | Root-cause-first |
| Multi-agent / parallel sub-work | `superpowers:dispatching-parallel-agents` or `:subagent-driven-development` | Decompose by specialist |
| Pre-completion verification | `superpowers:verification-before-completion` | Don't claim done without checks |
| Test-driven implementation | `superpowers:test-driven-development` | Write test before code |
| Long-context cross-file analysis | `cc-gemini-plugin:gemini` | When 1M context window helps |
| Online docs lookup | `context7` (search/resolve) | "Verify online" rule β€” ROCm + HF + WLASL + MediaPipe specifics |
| Multi-source research with citations | `deep-research` | WLASL prior art, sign-language ML state of the art, ROCm performance |
| Whole-repo bug + logic audit | `deep-check` | 16-category systematic scan before submission |
| Second-opinion / rescue / stuck | `codex:rescue` | Hand off to Codex runtime |
| Code review (own work pre-submission) | `code-review:code-review` or `pr-review-toolkit:review-pr` | Style/bug/security pass before public release |
| Security review | `owasp-security` | OWASP Top 10 / ASVS β€” webcam + audio handling |
| Browser-based demo verification | `chrome-devtools-mcp:chrome-devtools` | Verify the HF Space before recording |
| Commit / push / PR | `commit-commands:commit-push-pr` | Standard commit flow |

**Hard rule:** every planning task goes through a `superpowers:*` skill β€” no free-form prose plans.

---

## Tech stack (locked)

- Languages: Python 3.12 (primary)
- Submission deliverable: Hugging Face Space (Gradio app, public, MIT)
- Inference backend: FastAPI on AMD Developer Cloud (single MI300X instance), exposed as OpenAI-compatible API
- Transport: HTTPS for V1; WebSocket only if latency demands it post-Day-2
- Pipeline (concurrent on one MI300X; capture stage sketched after this list):
  - **Pose extraction:** MediaPipe Holistic (Google) β€” frame β†’ 543 landmarks (33 pose + 468 face + 2 Γ— 21 hands)
  - **Sign classifier:** trained-from-scratch small transformer over landmark sequences (WLASL Top-100 + ASL fingerspelling alphabet) β†’ sign tokens
  - **Sentence composer:** `meta-llama/Llama-3.1-8B-Instruct` β†’ grammatical English sentence from sign-token stream
  - **TTS:** `coqui/XTTS-v2` β†’ audio
  - **(Stretch) STT:** `openai/whisper-large-v3` β†’ reverse direction (speech β†’ on-screen text)
- Datasets: [WLASL](https://github.com/dxli94/WLASL) Top-100 subset + ASL fingerspelling alphabet (open)
- HF Hub artifact: `lucas-loo/signbridge-classifier` (trained classifier weights + model card with ROCm training config)
- License: MIT
- GitHub mirror: https://github.com/seekerPrice/signbridge
- HF Space URL: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
- Submission link: *fill in once started on lablab.ai*
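
The capture stage is fully determined by the stack above, so it can be sketched now. A minimal version of what `recognizer/landmarks.py` does; the (543, 3) layout and the zero-fill for undetected parts are assumptions, not a locked format:

```python
# Sketch of the landmarks stage: one BGR webcam frame in, one
# (543, 3) float array out. Zero rows stand in for anything
# MediaPipe failed to detect (hands drop out often).
import cv2
import mediapipe as mp
import numpy as np

POSE, FACE, HAND = 33, 468, 21  # 33 + 468 + 21 + 21 = 543 landmarks

holistic = mp.solutions.holistic.Holistic(
    static_image_mode=False,  # video mode: track landmarks across frames
    model_complexity=1,
)

def frame_to_landmarks(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a (543, 3) array of normalized (x, y, z) landmarks."""
    results = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    parts = [
        (results.pose_landmarks, POSE),
        (results.face_landmarks, FACE),
        (results.left_hand_landmarks, HAND),
        (results.right_hand_landmarks, HAND),
    ]
    rows = []
    for lms, n in parts:
        if lms is None:
            rows.append(np.zeros((n, 3), dtype=np.float32))
        else:
            rows.append(np.array([[p.x, p.y, p.z] for p in lms.landmark],
                                 dtype=np.float32))
    return np.concatenate(rows)
```

MediaPipe returns `None` for any part it misses in a frame, so the zero-fill keeps the vector shape constant for the classifier.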

## Run Commands

```bash
# Setup (one-time)
pip install -r requirements.txt
cp .env.example .env  # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)

# Dev β€” run Gradio Space locally
python app.py

# Dev β€” run inference backend (locally for dev, deploys to AMD Dev Cloud for production)
python -m signbridge.backend

# Train the sign classifier on WLASL Top-100 (run on AMD Dev Cloud Day 2)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30

# Tests
pytest

# Lint / format / type
ruff check . && mypy signbridge/

# Push HF Space update (auto-deploys on git push to HF remote)
git push huggingface main
```
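
`train_classifier` trains the "trained-from-scratch small transformer" from the locked stack. The architecture isn't pinned down beyond that one-liner; below is one plausible shape, with every hyperparameter (width, depth, sequence length, class count) an illustrative placeholder:

```python
# One plausible classifier shape: a sequence of per-frame landmark
# vectors -> one sign class. All sizes are illustrative, not locked.
import torch
import torch.nn as nn

class SignClassifier(nn.Module):
    def __init__(self, n_classes: int = 136,  # e.g. WLASL Top-100 + A-Z + 0-9
                 in_dim: int = 543 * 3, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 4, max_len: int = 64):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)  # per-frame embedding
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, 543 * 3) flattened landmark sequences
        h = self.proj(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h).mean(dim=1)  # mean-pool over time
        return self.head(h)              # (batch, n_classes) logits
```

Mean-pooling plus a learned positional embedding is the simplest order-aware readout; a CLS token or attention pooling are the obvious alternatives once the gold set exists to compare against.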

## Workspace layout

```
/Users/lucaslt/Documents/side-gig/amd-hackathon/
β”œβ”€β”€ README.md                       # HF Space card via frontmatter
β”œβ”€β”€ LICENSE                         # MIT
β”œβ”€β”€ CLAUDE.md
β”œβ”€β”€ .claude/
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”œβ”€β”€ app.py                          # HF Space entry β€” Gradio
β”œβ”€β”€ signbridge/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ space.py                    # Gradio UI
β”‚   β”œβ”€β”€ backend.py                  # FastAPI inference server
β”‚   β”œβ”€β”€ recognizer/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ landmarks.py            # MediaPipe Holistic wrapper
β”‚   β”‚   └── classifier.py           # trained sign classifier
β”‚   β”œβ”€β”€ composer/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── sentence.py             # Llama-3.1-8B sentence composer
β”‚   β”œβ”€β”€ voice/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── tts.py                  # Coqui XTTS-v2
β”‚   └── scripts/
β”‚       β”œβ”€β”€ __init__.py
β”‚       └── train_classifier.py     # WLASL training script
β”œβ”€β”€ data/
β”‚   └── wlasl/                      # gitignored β€” WLASL Top-100 dataset
β”œβ”€β”€ assets/
β”‚   └── cover.png                   # 1280Γ—640 HF Space + lablab cover
β”œβ”€β”€ tests/
β”‚   └── golden/                     # 30-sample gold set (Top-50 + alphabet)
└── docs/
    └── walkthrough.md              # technical walkthrough for submission
```
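
Because the backend is exposed as an OpenAI-compatible API, `composer/sentence.py` can stay a thin client. A hedged sketch; the `base_url`, prompt wording, and sampling settings are placeholders, not a decided contract:

```python
# Thin-client sketch for composer/sentence.py: sign tokens in,
# one grammatical English sentence out, via the backend's
# OpenAI-compatible chat endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def compose_sentence(sign_tokens: list[str]) -> str:
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system",
             "content": "Turn ASL gloss tokens into one natural English "
                        "sentence. Output only the sentence."},
            {"role": "user", "content": " ".join(sign_tokens)},
        ],
        temperature=0.2,  # low: composition, not creativity
    )
    return resp.choices[0].message.content.strip()
```

`voice/tts.py` then reduces to the stock Coqui call. XTTS-v2 needs a reference speaker clip plus a language code; the clip path below is a placeholder:

```python
# TTS sketch: XTTS-v2 clones the voice in speaker_wav.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(text=compose_sentence(["HELLO", "MY", "NAME", "L-U-C-A-S"]),
                speaker_wav="assets/ref_voice.wav",  # placeholder clip
                language="en", file_path="out.wav")
```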

## References

- **Owner:** Lucas
- **Working dir:** `/Users/lucaslt/Documents/side-gig/amd-hackathon/`
- **Hackathon page:** https://lablab.ai/ai-hackathons/amd-developer
- **AMD article:** https://www.amd.com/en/developer/resources/technical-articles/2026/build-across-the-ai-stack--join-the-amd-x-lablab-ai-hackathon-.html
- **Track:** 3 (Vision & Multimodal AI). Extra Challenge (Build in Public) intentionally skipped 2026-05-07.
- **WLASL dataset:** https://github.com/dxli94/WLASL
- **MediaPipe Holistic:** https://developers.google.com/mediapipe/solutions/vision/holistic_landmarker
- **HF Space:** https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge (moved to event org 2026-05-08)
- **GitHub mirror:** https://github.com/seekerPrice/signbridge (deployed 2026-05-07)
- **Submission link:** *fill in once started on lablab.ai*
- **Plan file:** `/Users/lucaslt/.claude/plans/first-need-to-change-sparkling-dawn.md`

---

## Progress log (newest first)

**2026-05-08 β€” Fix A: HF Space moved to event org.** Now at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. Eligible for HF Special Prize ranking. Personal-namespace `LucasLooTan/signbridge` left as-is (will mark private after the hackathon).

**2026-05-07 β€” GitHub repo + HF Space live.** GitHub: `seekerPrice/signbridge`. HF Space: `LucasLooTan/signbridge` (Gradio SDK 4.44.1, Apache 2.0). All 16 source files mirrored to both. Awaiting AMD Dev Cloud credit email to wire up real VLM endpoint.

**2026-05-07 β€” Dropped Build-in-Public extra challenge.** Track 3 only. Frees ~2 hours that were earmarked for the 2 social posts + the external-facing walkthrough framing. Walkthrough doc kept as an internal technical record but no longer a submission deliverable.

**2026-05-07 β€” Pivoted to SignBridge.** Re-scored against the four judging criteria: SignBridge wins on Originality (10) and Presentation (10) thanks to the live deaf-person-to-hearing-person demo. Business value also stronger (Sorenson VRS comparable, mandated interpreter budgets). Replaced Iris scaffold (`iris/` package, README, requirements deps) with `signbridge/` package. CLAUDE.md, plan file, README rewritten. Day 1 hello-world starts: MediaPipe Holistic on webcam, WLASL data download, Plan-B VLM test.

**2026-05-07 β€” Initial Iris scaffold (deprecated).** Bootstrapped repo with Iris (visually-impaired navigation) plan, requirements.txt, .gitignore, .env.example, README. Replaced same-day after re-evaluation; kept reusable pieces (.gitignore, structural choices).