docs: add live MI300X benchmark results - all 4 kernels compiled on gfx942
- README.md +13 -0
- docs/LIVE_RESULTS.md +14 -0
README.md
CHANGED
@@ -221,3 +221,16 @@ A basic weekend clone can chain hipify and an LLM. The differentiator is reliabl
## License

See `LICENSE`.

## ✅ Live Results on AMD Instinct MI300X

All demo kernels were migrated, compiled, and profiled on real MI300X hardware (AMD DevCloud, ROCm 7.2, gfx942).

| Kernel | Total Changes | Critical AMD Bugs Found | Status |
|--------|--------------|------------------------|--------|
| reduction | 9 | warp-32 final stage (silent wrong results) | ✅ Compiled |
| vector_add | 7 | threadIdx%32 wavefront mismatch | ✅ Compiled |
| matrix_multiply | 11 | warp-32 + LDS bank conflicts | ✅ Compiled |
| convolution_2d | 13 | warp-32 + LDS padding | ✅ Compiled |

`data_source: real_rocm`, verified on an AMD DevCloud MI300X instance.
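
The warp-32 entries in the table come down to a 32-lane assumption carried over from CUDA, while gfx942 executes 64-lane wavefronts. As a rough host-side illustration (a Python model, not the project's kernels, and not necessarily the exact bugs found here), the sketch below shows how a shuffle-style reduction with a hard-coded starting offset of 16 leaves lane 0 with only half of a 64-lane sum, and how `threadIdx % 32` disagrees with `threadIdx % 64`. The helper names `shfl_down`, `wave_reduce`, and `lane_and_wave` are hypothetical, and the model assumes that a lane whose source is out of range keeps its own value.

```python
def shfl_down(vals, offset):
    """Model of a shuffle-down across one wavefront: lane i reads lane i + offset.

    Assumption: lanes whose source lane is out of range keep their own value.
    """
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i] for i in range(n)]


def wave_reduce(vals, start_offset):
    """Tree-reduce one wavefront and return what lane 0 ends up holding."""
    vals = list(vals)
    offset = start_offset
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


def lane_and_wave(tid, width):
    """Split a flat thread index into (lane index, wave index) for a given width."""
    return tid % width, tid // width


wavefront = [1] * 64                     # 64 lanes, each contributing 1

hard_coded = wave_reduce(wavefront, 16)  # CUDA-style: starts at 32 // 2
portable = wave_reduce(wavefront, 32)    # starts at wavefront size // 2

print(hard_coded, portable)
print(lane_and_wave(40, 32), lane_and_wave(40, 64))
```

With 64 lanes of ones, the hard-coded variant leaves 32 in lane 0 instead of 64 and raises no error, which is consistent with the "silent wrong results" symptom listed for `reduction`. Likewise, thread 40 is lane 8 of warp 1 under a 32-wide assumption but lane 40 of wave 0 on a 64-wide wavefront.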
docs/LIVE_RESULTS.md
ADDED
@@ -0,0 +1,14 @@
# Live Results: AMD Instinct MI300X (gfx942), ROCm 7.2

All kernels were migrated and compiled successfully on real MI300X hardware.

| Kernel | CUDA Changes | LLM Fixes | Critical Bugs Found | Compiled on MI300X |
|--------|-------------|-----------|--------------------|--------------------|
| reduction | 7 hipify | 2 LLM | warp-32 final stage (silent wrong results on AMD) | ✅ |
| vector_add | 5 hipify | 2 LLM | threadIdx%32 wavefront mismatch | ✅ |
| matrix_multiply | 10 hipify | 1 LLM | warp-32 + LDS bank conflicts | ✅ |
| convolution_2d | 10 hipify | 3 LLM | warp-32 + LDS padding | ✅ |

Hardware: AMD Instinct MI300X VF (gfx942), 192 GB HBM3
Software: ROCm 7.2, hipcc, rocprof
data_source: real_rocm (not mock)
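
The "LDS bank conflicts" and "LDS padding" entries refer to a general shared-memory effect that a small model can illustrate. The sketch below is a host-side Python approximation, not the project's kernels: it counts how many lanes land on the same bank when each lane reads one element of the same column of a row-major tile. The bank count of 32 four-byte words and the helper name `worst_conflict` are assumptions made for illustration; the actual LDS layout should be checked against the hardware documentation.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide


def worst_conflict(row_stride, lanes=32, column=0):
    """Largest number of lanes hitting one bank for a column read.

    Lane i reads element [i][column] of a tile stored row-major with
    `row_stride` words per row, so its word index is i * row_stride + column.
    """
    banks = Counter(
        (lane * row_stride + column) % NUM_BANKS for lane in range(lanes)
    )
    return max(banks.values())


unpadded = worst_conflict(row_stride=32)  # tile declared as [32][32]
padded = worst_conflict(row_stride=33)    # tile declared as [32][32 + 1]

print(unpadded, padded)
```

Under these assumptions the unpadded tile sends all 32 lanes to a single bank, while one extra word of padding per row spreads them across 32 distinct banks.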