# SignBridge — real-time ASL → speech translation
Loaded when the working directory is inside `/Users/lucaslt/Documents/side-gig/amd-hackathon/`. Keep this file current: prepend a dated entry to the Progress log after every milestone. Prune entries older than 60 days unless they anchor a persistent fact.

---

## Standing rules

- **Never make assumptions — always look up answers online.** Before coding, configuring, or recommending anything, verify against authoritative sources (use `context7` for libraries / SDKs / APIs, `WebSearch` / `WebFetch` for everything else). Training data is stale; default guesses waste time. This applies even to things that "seem obvious".
- **Use Superpowers skills for every suitable use case — especially planning.** Any planning, debugging, executing-from-plan, brainstorming, parallel-agent dispatch, TDD, or pre-completion verification goes through the matching `superpowers:*` skill (`superpowers:writing-plans`, `:executing-plans`, `:brainstorming`, `:systematic-debugging`, `:subagent-driven-development`, `:verification-before-completion`, `:test-driven-development`, `:dispatching-parallel-agents`). Free-form prose plans are not allowed.
- **Use the `deep-research` skill for deep academic research.** Multi-source comparison, literature review, state-of-the-art surveys, citation-tracked evidence — invoke `deep-research`, not ad-hoc web search.
- **Always do deep / online research BEFORE making non-trivial decisions.** Any architectural choice, model pick, library selection, or competition-strategy call goes through `deep-research` (academic) or `WebSearch` / `context7` (practical) first. Document findings inline so the decision is auditable. Default guesses based on training data or "what feels right" are not allowed; the cost of looking things up is small, and the cost of building on a wrong assumption is large.
- **Use the `deep-check` skill for whole-repo audits before any submission, merge, or major checkpoint.** Run a line-by-line bug + logic + security scan via `deep-check` after every meaningful change. Surface findings explicitly; fix blockers before declaring work done.

---
## Competition requirements (authoritative)

> Snapshot of the official AMD Developer Hackathon rules, captured 2026-05-08 from https://lablab.ai/ai-hackathons/amd-developer. **Read-only — never edit. If the lablab page changes, re-snapshot the entire section.**

### Hackathon: AMD Developer Hackathon (lablab.ai · sponsored by AMD + Akash Systems · partners: Hugging Face, Qwen)

### Hard deadlines (Malaysia Time)

| Event | Date / time |
|---|---|
| Hackathon kick-off | 2026-05-05 00:00 MYT |
| On-site (SF, by invitation only) | 2026-05-09 17:00 MYT – 2026-05-10 03:00 MYT |
| Online build phase | open since kick-off |
| **Submission deadline** | **2026-05-11 03:00 MYT** |
| Live on-stage pitching (on-site only) | 2026-05-11 05:00 MYT |
### Targeted track: Track 3 — Vision & Multimodal AI

Verbatim from the lablab page:

- **Objective:** Build applications that process and understand multiple data types (Images, Video, Audio) using the massive memory bandwidth of AMD GPUs.
- **What to Build:** High-throughput industrial inspection, medical imaging analysis, or multimodal conversational assistants.
- **Tech Stack:** Multimodal models (like Llama 3.2 Vision, Qwen-VL) optimized for ROCm.
- **Compute Resource:** Access to AMD Instinct MI300X instances via AMD Developer Cloud.

### Submission flow (Hugging Face partnership)

Verbatim from the lablab page, "Technology Partners & Workshops" → Hugging Face section:

1. Find a model on Hugging Face Hub to work with.
2. Build or fine-tune it using your AMD Developer Cloud credits.
3. **Publish your completed project as a Hugging Face Space within the event organization** — `lablab-ai-amd-developer-hackathon`.
4. Submit your Space link on lablab when you submit your project.

> Lucas joined the org and the Space lives at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge` (or will, once Fix A lands). Personal-namespace Spaces are NOT eligible for the HF Special Prize.
### Required submission deliverables (verbatim from "What to submit?")

**Basic Information:**

1. Project Title
2. Short Description
3. Long Description
4. Technology & Category Tags

**Cover Image and Presentation:**

5. Cover Image
6. Video Presentation
7. Slide Presentation

**App Hosting & Code Repository:**

8. Public GitHub Repository
9. Demo Application Platform (= Hugging Face Space)
10. Application URL
### Judging criteria (verbatim)

| Criterion | Definition |
|---|---|
| **Application of Technology** | How effectively the chosen model(s) are integrated into the solution. |
| **Presentation** | The clarity and effectiveness of the project presentation. |
| **Business Value** | The impact and practical value, considering how well it fits into business areas. |
| **Originality** | The uniqueness & creativity of the solution, highlighting approaches and ability to demonstrate behaviors. |
### Prize structure (verbatim from "Prizes")

- **Total prize pool: $21,500+**, sponsored by AMD and Akash Systems, plus an AMD hardware reward and exclusive Hugging Face prizes.
- **Grand Prize: $5,000** — overall top project.
- **Exclusive Hardware Reward:** AMD Radeon AI PRO R9700 GPU — awarded for outstanding social engagement or project promotion.
- **Track 3 — Vision & Multimodal AI**: 1st $2,500 · 2nd $1,500 · 3rd $1,000.
- Track 1 — AI Agents & Agentic Workflows: same tiers.
- Track 2 — Fine-Tuning on AMD GPUs: same tiers.
- **Hugging Face Special Prize** (Space with the most likes in the event org):
  - 1st: 1 Reachy Mini Wireless + 6 months Hugging Face PRO + $500 Hugging Face Credits.
  - 2nd: 3 months Hugging Face PRO + $300 Hugging Face Credits.
  - 3rd: 2 months Hugging Face PRO + $200 Hugging Face Credits.
### Prize targets for SignBridge

- **Track 3** (primary).
- **HF Special Prize** (most likes — requires the Space in the event org + sharing the link).
- Grand Prize (aspirational).
- Build-in-Public extra: **dropped** by user direction 2026-05-07 (no tweet obligations; walkthrough kept as an internal doc only).

### License rule

Per the Voluntary Participation & Prize Terms footer: *"Submissions must be original and MIT-compliant."* SignBridge ships under the **MIT License** (originally drafted as Apache 2.0; switched 2026-05-08 to satisfy the literal reading of "MIT-compliant").
### Tech stack constraints (per Track 3)

- **Compute:** AMD Instinct MI300X via AMD Developer Cloud (datacenter GPU, 192 GB HBM3, 5.3 TB/s memory bandwidth). Not Ryzen, not Radeon Pro — those are different AMD product lines.
- **Models:** Multimodal models optimized for ROCm. Examples called out by the rules: Llama 3.2 Vision, Qwen-VL family. SignBridge uses `Qwen/Qwen3-VL-8B-Instruct` (Qwen-VL family ✓) for sign recognition + `meta-llama/Llama-3.1-8B-Instruct` for sentence composition + `coqui/XTTS-v2` for speech.
- **Frameworks:** ROCm + PyTorch + Hugging Face Optimum-AMD + vLLM (per the rules).

### Workshop references (provided by AMD)

- "Build and Deploy an AI App on AMD MI300X as a Hugging Face Space" — Steve Kimoi, lablab.ai
- "Getting Started on AMD Developer Cloud" — Maharshi Trivedi, AMD
- "AI Agents 101: Building AI Agents with MCP & Open-Source Inference" — Mahdi Ghodsi, AMD
---

## Status

Day 1 of ~4 — pivoted from Iris to SignBridge on 2026-05-07. **Submission deadline: 2026-05-11 03:00 MYT.** ~3.5 days remaining. AMD Developer Hackathon, **Track 3 — Vision & Multimodal AI** (only track — Build-in-Public dropped 2026-05-07). Currently scaffolding + Day 1 hello-world.

## Goal

Win the AMD Developer Hackathon (lablab.ai, May 2026), Track 3, with a real-time webcam-based ASL → English speech translator. A deaf person signs → the AI speaks. The demo IS the project: judges literally watch two people who couldn't communicate before, communicating now.

### Success criteria

- Submission accepted by 2026-05-11 03:00 MYT — live HF Space (Gradio) URL + 2–3 min demo video + lablab.ai submission form complete.
- End-to-end working flow: webcam frame → VLM recognizer → Llama-3.1-8B sentence composer → Coqui XTTS-v2 → speech output, in **≤ 2 s** from capture to start of speech.
- V1 use cases: (1) ASL fingerspelling alphabet A–Z + 0–9, (2) Top-50 WLASL signs (hello, thank you, name, please, …). Target ≥ 75% accuracy on a 30-sample gold set.
- Reverse direction (speech → on-screen text for the deaf user) is a **stretch** for the buffer day only.
- Track 3: top-3 finish at minimum; 1st place is the gold target.
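The ≤ 2 s capture-to-speech criterion is easiest to enforce with a per-stage timer harness around the pipeline. A minimal sketch with stub stages — the stage names and the lambdas standing in for the recognizer / composer / TTS calls are illustrative, not the real module API:

```python
import time
from typing import Any, Callable

def timed_pipeline(frame: Any,
                   stages: dict[str, Callable[[Any], Any]]) -> tuple[Any, dict[str, float]]:
    """Run `stages` in order, threading each stage's output into the next
    and recording per-stage wall-clock seconds."""
    timings: dict[str, float] = {}
    out = frame
    for name, fn in stages.items():
        t0 = time.perf_counter()
        out = fn(out)
        timings[name] = time.perf_counter() - t0
    return out, timings

# Stub stages standing in for the real recognizer / composer / TTS calls.
audio, timings = timed_pipeline("frame", {
    "recognize": lambda f: ["HELLO", "MY", "NAME"],     # sign tokens
    "compose":   lambda toks: "Hello, my name is ...",  # English sentence
    "speak":     lambda s: b"wav-bytes",                # synthesized audio
})
total = sum(timings.values())  # compare against the 2.0 s budget per run
```

Logging `timings` per run also shows which stage to optimize first if the budget is blown.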
---

## Workflow tools

| Task | Skill / Plugin | Why |
|---|---|---|
| Planning (any non-trivial change) | `superpowers:writing-plans` | Hard rule — no free-form prose plans |
| Early-stage exploration | `superpowers:brainstorming` | Use before requirements firm up |
| Executing the build plan | `superpowers:executing-plans` | Plan-driven implementation |
| Debugging | `superpowers:systematic-debugging` | Root-cause-first |
| Multi-agent / parallel sub-work | `superpowers:dispatching-parallel-agents` or `:subagent-driven-development` | Decompose by specialist |
| Pre-completion verification | `superpowers:verification-before-completion` | Don't claim done without checks |
| Test-driven implementation | `superpowers:test-driven-development` | Write the test before the code |
| Long-context cross-file analysis | `cc-gemini-plugin:gemini` | When a 1M context window helps |
| Online docs lookup | `context7` (search/resolve) | "Verify online" rule — ROCm + HF + WLASL + MediaPipe specifics |
| Multi-source research with citations | `deep-research` | WLASL prior art, sign-language ML state of the art, ROCm performance |
| Whole-repo bug + logic audit | `deep-check` | 16-category systematic scan before submission |
| Second opinion / rescue when stuck | `codex:rescue` | Hand off to the Codex runtime |
| Code review (own work pre-submission) | `code-review:code-review` or `pr-review-toolkit:review-pr` | Style/bug/security pass before public release |
| Security review | `owasp-security` | OWASP Top 10 / ASVS — webcam + audio handling |
| Browser-based demo verification | `chrome-devtools-mcp:chrome-devtools` | Verify the HF Space before recording |
| Commit / push / PR | `commit-commands:commit-push-pr` | Standard commit flow |

**Hard rule:** every planning task goes through a `superpowers:*` skill — no free-form prose plans.
---

## Tech stack (locked)

- Languages: Python 3.12 (primary)
- Submission deliverable: Hugging Face Space (Gradio app, public, MIT)
- Inference backend: FastAPI on AMD Developer Cloud (single MI300X instance), exposed as an OpenAI-compatible API
- Transport: HTTPS for V1; WebSocket only if latency demands it post-Day-2
- Pipeline (concurrent on one MI300X):
  - **Pose extraction:** MediaPipe Holistic (Google) → frame → 543 landmarks per frame
  - **Sign classifier:** trained-from-scratch small transformer over landmark sequences (WLASL Top-100 + ASL fingerspelling alphabet) → sign tokens
  - **Sentence composer:** `meta-llama/Llama-3.1-8B-Instruct` → grammatical English sentence from the sign-token stream
  - **TTS:** `coqui/XTTS-v2` → audio
  - **(Stretch) STT:** `openai/whisper-large-v3` → reverse direction (speech → on-screen text)
- Datasets: [WLASL](https://github.com/dxli94/WLASL) Top-100 subset + ASL fingerspelling alphabet (open)
- HF Hub artifact: `lucas-loo/signbridge-classifier` (trained classifier weights + model card with ROCm training config)
- License: MIT
- GitHub mirror: https://github.com/seekerPrice/signbridge
- HF Space URL: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
- Submission link: *fill in once started on lablab.ai*
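The pose-extraction stage flattens MediaPipe Holistic output (33 pose + 468 face + 2 × 21 hand landmarks = 543) into one array per frame. A minimal sketch, assuming the legacy `mediapipe.solutions.holistic` result fields (`pose_landmarks`, `face_landmarks`, `left_hand_landmarks`, `right_hand_landmarks`); the helper name is hypothetical, not the repo's actual API:

```python
import numpy as np

# MediaPipe Holistic landmark counts: 33 pose + 468 face + 21 per hand = 543.
N_POSE, N_FACE, N_HAND = 33, 468, 21

def frame_to_landmarks(rgb_frame: np.ndarray, holistic) -> np.ndarray:
    """Flatten one frame's Holistic result into a (543, 3) x/y/z array,
    zero-filling any body part MediaPipe failed to detect.

    `holistic` is a mediapipe.solutions.holistic.Holistic instance
    (constructed by the caller); exact result fields may vary by version.
    """
    results = holistic.process(rgb_frame)
    parts = [
        (results.pose_landmarks, N_POSE),
        (results.face_landmarks, N_FACE),
        (results.left_hand_landmarks, N_HAND),
        (results.right_hand_landmarks, N_HAND),
    ]
    rows = []
    for lm, n in parts:
        if lm is None:
            # Undetected part: keep the slot so the vector length is stable.
            rows.append(np.zeros((n, 3), dtype=np.float32))
        else:
            rows.append(np.array([[p.x, p.y, p.z] for p in lm.landmark],
                                 dtype=np.float32))
    return np.concatenate(rows)  # shape (543, 3)
```

In the webcam loop this would run once per captured frame, with the resulting `(543, 3)` arrays stacked into the sequence the classifier consumes.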
## Run commands

```bash
# Setup (one-time)
pip install -r requirements.txt
cp .env.example .env  # fill in HF_TOKEN, AMD_DEV_CLOUD_*, OPENAI_API_KEY (fallback)

# Dev — run the Gradio Space locally
python app.py

# Dev — run the inference backend (locally for dev; deploys to AMD Dev Cloud for production)
python -m signbridge.backend

# Train the sign classifier on WLASL Top-100 (run on AMD Dev Cloud, Day 2)
python -m signbridge.scripts.train_classifier --dataset data/wlasl --epochs 30

# Tests
pytest

# Lint / format / type-check
ruff check . && mypy signbridge/

# Push an HF Space update (auto-deploys on git push to the HF remote)
git push huggingface main
```
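The `train_classifier` entry point trains the "small transformer over landmark sequences". A minimal sketch of what such a model could look like — every hyperparameter here (embedding size, layer/head counts, 110 classes for Top-100 signs plus fingerspelling) is an illustrative placeholder, not the tuned training config:

```python
import torch
import torch.nn as nn

class SignClassifier(nn.Module):
    """Tiny transformer over per-frame landmark vectors -> sign logits."""

    def __init__(self, n_landmarks: int = 543, n_classes: int = 110,
                 d_model: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        # Each frame: 543 landmarks x (x, y, z) flattened to one token.
        self.proj = nn.Linear(n_landmarks * 3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, 543, 3) landmark sequences
        b, t, n, c = x.shape
        h = self.proj(x.reshape(b, t, n * c))  # one token per frame
        h = self.encoder(h).mean(dim=1)        # mean-pool over time
        return self.head(h)                    # (batch, n_classes) logits

logits = SignClassifier()(torch.zeros(2, 16, 543, 3))  # 2 clips x 16 frames
```

Mean-pooling over time keeps the model invariant to clip length; a CLS token or attention pooling would be the obvious alternatives to try during Day-2 training.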
## Workspace layout

```
/Users/lucaslt/Documents/side-gig/amd-hackathon/
├── README.md                    # HF Space card via frontmatter
├── LICENSE                      # MIT
├── CLAUDE.md
├── .claude/
├── requirements.txt
├── .env.example
├── app.py                       # HF Space entry — Gradio
├── signbridge/
│   ├── __init__.py
│   ├── space.py                 # Gradio UI
│   ├── backend.py               # FastAPI inference server
│   ├── recognizer/
│   │   ├── __init__.py
│   │   ├── landmarks.py         # MediaPipe Holistic wrapper
│   │   └── classifier.py        # trained sign classifier
│   ├── composer/
│   │   ├── __init__.py
│   │   └── sentence.py          # Llama-3.1-8B sentence composer
│   ├── voice/
│   │   ├── __init__.py
│   │   └── tts.py               # Coqui XTTS-v2
│   └── scripts/
│       ├── __init__.py
│       └── train_classifier.py  # WLASL training script
├── data/
│   └── wlasl/                   # gitignored — WLASL Top-100 dataset
├── assets/
│   └── cover.png                # 1280×640 HF Space + lablab cover
├── tests/
│   └── golden/                  # 30-sample gold set (Top-50 + alphabet)
└── docs/
    └── walkthrough.md           # technical walkthrough for submission
```
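Since the backend is exposed as an OpenAI-compatible API, `composer/sentence.py` can reach Llama-3.1-8B through the standard chat-completions route. A sketch of building one such request — the base URL, system prompt, and helper name are assumptions; only the `/v1/chat/completions` shape follows the OpenAI convention:

```python
import json
import urllib.request  # used by the commented request at the bottom

def compose_request(sign_tokens: list[str],
                    base_url: str = "http://localhost:8000") -> tuple[str, bytes]:
    """Build the (url, body) for one chat-completion call that turns a
    stream of sign tokens into a grammatical English sentence."""
    body = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "system",
             "content": "Turn ASL gloss tokens into one natural English "
                        "sentence. Reply with the sentence only."},
            {"role": "user", "content": " ".join(sign_tokens)},
        ],
        "temperature": 0.2,  # low temperature: composition, not creativity
    }
    return f"{base_url}/v1/chat/completions", json.dumps(body).encode()

url, payload = compose_request(["HELLO", "ME", "NAME", "L-U-C-A-S"])
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# resp = json.loads(urllib.request.urlopen(req).read())
# sentence = resp["choices"][0]["message"]["content"]
```

Keeping the request OpenAI-shaped means the fallback `OPENAI_API_KEY` path from `.env.example` needs only a different base URL, not different code.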
## References

- **Owner:** Lucas
- **Working dir:** `/Users/lucaslt/Documents/side-gig/amd-hackathon/`
- **Hackathon page:** https://lablab.ai/ai-hackathons/amd-developer
- **AMD article:** https://www.amd.com/en/developer/resources/technical-articles/2026/build-across-the-ai-stack--join-the-amd-x-lablab-ai-hackathon-.html
- **Track:** 3 (Vision & Multimodal AI). Extra Challenge (Build in Public) intentionally skipped 2026-05-07.
- **WLASL dataset:** https://github.com/dxli94/WLASL
- **MediaPipe Holistic:** https://developers.google.com/mediapipe/solutions/vision/holistic_landmarker
- **HF Space:** https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge (moved to the event org 2026-05-08)
- **GitHub mirror:** https://github.com/seekerPrice/signbridge (deployed 2026-05-07)
- **Submission link:** *fill in once started on lablab.ai*
- **Plan file:** `/Users/lucaslt/.claude/plans/first-need-to-change-sparkling-dawn.md`
---

## Progress log (newest first)

**2026-05-08 — Fix A: HF Space moved to the event org.** Now at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. Eligible for HF Special Prize ranking. Personal-namespace `LucasLooTan/signbridge` left as-is (will be marked private after the hackathon).

**2026-05-07 — GitHub repo + HF Space live.** GitHub: `seekerPrice/signbridge`. HF Space: `LucasLooTan/signbridge` (Gradio SDK 4.44.1, Apache 2.0). All 16 source files mirrored to both. Awaiting the AMD Dev Cloud credit email to wire up the real VLM endpoint.

**2026-05-07 — Dropped the Build-in-Public extra challenge.** Track 3 only. Frees ~2 hours that were earmarked for the 2 social posts + the external-facing walkthrough framing. Walkthrough doc kept as an internal technical record but no longer a submission deliverable.

**2026-05-07 — Pivoted to SignBridge.** Re-scored against the four judging criteria: SignBridge wins on Originality (10) and Presentation (10) thanks to the live deaf-person-to-hearing-person demo. Business value is also stronger (comparable to Sorenson VRS; mandated interpreter budgets). Replaced the Iris scaffold (`iris/` package, README, requirements deps) with the `signbridge/` package. CLAUDE.md, plan file, and README rewritten. Day 1 hello-world starts: MediaPipe Holistic on webcam, WLASL data download, Plan-B VLM test.

**2026-05-07 — Initial Iris scaffold (deprecated).** Bootstrapped the repo with the Iris (visually-impaired navigation) plan, requirements.txt, .gitignore, .env.example, and README. Replaced the same day after re-evaluation; kept reusable pieces (.gitignore, structural choices).