Commit a5ffd9e
Parent(s): 7dc8ce6
docs: README — fine-tune narrative + hybrid pipeline diagram + model URLs

README.md (CHANGED)

@@ -22,11 +22,26 @@ Submission for the **AMD Developer Hackathon** (LabLab.ai, May 2026) — **Track

## How it works

```
                  ┌─► MediaPipe Hand → trained MLP (90% acc, 50 ms CPU)
webcam frame ─────┤                        │
                  └─► fine-tuned Qwen3-VL-8B (LoRA on AMD MI300X)
                                           │ (92% acc, motion + fallback)
                                           ▼
                                 Qwen3-8B sentence composer
                                           │ (AMD MI300X)
                                           ▼
                                 Coqui XTTS-v2 TTS
                                           │
                                           ▼
                                 🔊 speech
```

A hybrid pipeline: a small classical-ML classifier handles static fingerspelling at 90% accuracy with 50 ms CPU latency; a LoRA-fine-tuned Qwen3-VL-8B handles motion-dependent signs and ambiguous static frames; and Qwen3-8B turns sign tokens into natural English. The two LLMs run **concurrently on a single AMD Instinct MI300X** via vLLM 0.17.1 on ROCm 7.2 — combined ~34 GB on a 192 GB GPU.
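The fallback behaviour described above can be sketched as a confidence-gated router. This is an illustrative sketch only, not the repo's code: `run_mlp`, `run_vlm`, and the 0.85 threshold are assumptions.

```python
from dataclasses import dataclass

CONF_THRESHOLD = 0.85  # assumed cut-off; would be tuned on a validation set

@dataclass
class Prediction:
    label: str
    confidence: float
    source: str  # which branch produced the answer: "mlp" or "vlm"

def route(frame, run_mlp, run_vlm) -> Prediction:
    """Try the fast landmark MLP first; fall back to the VLM when the
    MLP is unsure (ambiguous static pose or motion-dependent sign)."""
    label, conf = run_mlp(frame)
    if conf >= CONF_THRESHOLD:
        return Prediction(label, conf, "mlp")  # fast ~50 ms CPU path
    label, conf = run_vlm(frame)               # slower GPU fallback
    return Prediction(label, conf, "vlm")
```

Gating on MLP confidence keeps the common case on the cheap CPU path and reserves the GPU model for hard frames.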

The fine-tune itself ran on a single MI300X in **54 minutes** with LoRA (rank 16, targeting the q/k/v/o projections, 2 epochs on 9,786 ASL Alphabet samples). Final eval loss: 0.48; gold-set accuracy: 92.3% — a 4.8× lift over the 19.2% zero-shot baseline.
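The 4.8× figure is just the ratio of the two accuracies quoted above, as a quick sanity check:

```python
zero_shot_acc = 19.2   # % gold-set accuracy, zero-shot baseline
fine_tuned_acc = 92.3  # % gold-set accuracy after the LoRA fine-tune

lift = fine_tuned_acc / zero_shot_acc
print(f"{lift:.1f}x")  # prints "4.8x"
```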
- Fine-tuned model: `huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl`
- Landmark classifier: `huggingface.co/LucasLooTan/signbridge-asl-classifier`
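For the landmark classifier, MediaPipe Hands emits 21 (x, y, z) landmarks per hand. A common preprocessing recipe, assumed here rather than taken from the repo, flattens them into a wrist-relative, scale-normalized 63-value feature vector for the MLP:

```python
def landmarks_to_features(landmarks):
    """landmarks: list of 21 (x, y, z) tuples -> flat 63-dim feature list."""
    wx, wy, wz = landmarks[0]  # MediaPipe indexes the wrist as landmark 0
    # Translate so the wrist is the origin (position invariance).
    rel = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Divide by the largest coordinate magnitude (scale invariance);
    # guard against a degenerate all-zero hand.
    scale = max(max(abs(c) for c in pt) for pt in rel) or 1.0
    return [c / scale for pt in rel for c in pt]  # 21 * 3 = 63 values
```

Normalizing out position and scale is what lets a small MLP reach high accuracy on static poses from raw landmarks.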

## V1 use cases

@@ -37,7 +52,7 @@ V1 is **one-way**: deaf signs → hearing hears. Reverse direction (speech → o

## Why AMD

The MI300X did three jobs in this project, all on a single GPU: (1) ran the LoRA fine-tune of Qwen3-VL-8B in 54 minutes; (2) hosted the merged model for inference via vLLM; (3) hosted the Qwen3-8B sentence composer in parallel. Its 192 GB of HBM3 meant we never had to reload weights, swap, or shard between training and serving. An NVIDIA H100 (80 GB) setup would require a 3-GPU cluster for the planned V2 70B reasoner upgrade — practical accessibility tools deployed globally need the cost-and-availability profile that AMD enables.
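Concurrent serving of both models on one card amounts to two vLLM processes splitting the GPU's memory budget. A hypothetical launch script follows; the ports and memory fractions are assumptions, not the project's actual deploy config:

```shell
# Vision branch: the fine-tuned Qwen3-VL-8B (merged LoRA weights)
vllm serve LucasLooTan/signbridge-qwen3vl-8b-asl \
  --port 8000 \
  --gpu-memory-utilization 0.25 &

# Sentence composer: Qwen3-8B, on the same MI300X
vllm serve Qwen/Qwen3-8B \
  --port 8001 \
  --gpu-memory-utilization 0.25 &

wait
```

With ~34 GB of combined weights against 192 GB of HBM3, even conservative per-process fractions leave generous room for KV cache.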

## Why this matters (business case)