tazwarrrr committed on
Commit 90bb277 · 1 Parent(s): 5c0d4c4

docs: add live MI300X benchmark results - all 4 kernels compiled on gfx942

Files changed (2)
  1. README.md +13 -0
  2. docs/LIVE_RESULTS.md +14 -0
README.md CHANGED
@@ -221,3 +221,16 @@ A basic weekend clone can chain hipify and an LLM. The differentiator is reliabl
  ## License
 
  See `LICENSE`.
+
+ ## ✅ Live Results on AMD Instinct MI300X
+
+ All demo kernels migrated, compiled, and profiled on real MI300X hardware (AMD DevCloud, ROCm 7.2, gfx942).
+
+ | Kernel | Total Changes | Critical AMD Bugs Found | Status |
+ |--------|--------------|------------------------|--------|
+ | reduction | 9 | warp-32 final stage (silent wrong results) | ✅ Compiled |
+ | vector_add | 7 | threadIdx%32 wavefront mismatch | ✅ Compiled |
+ | matrix_multiply | 11 | warp-32 + LDS bank conflicts | ✅ Compiled |
+ | convolution_2d | 13 | warp-32 + LDS padding | ✅ Compiled |
+
+ `data_source: real_rocm`, verified on AMD DevCloud MI300X instance.
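Note on the `threadIdx%32 wavefront mismatch` entry: gfx942 executes 64-lane wavefronts, so a lane index derived from a hard-coded 32 no longer matches the hardware grouping. The migrated kernels are not part of this diff, so the following is only a host-side C++ sketch of the arithmetic. `lane_leaders` is a hypothetical helper introduced for illustration; it is not taken from the repository or from HIP.

```cpp
#include <cstddef>

// Hypothetical helper (illustration only, not from the repository):
// counts how many threads in a block pass a "lane == 0" test when the
// lane index is computed as tid % lanes_per_group.
std::size_t lane_leaders(std::size_t block_threads, std::size_t lanes_per_group) {
    std::size_t leaders = 0;
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        if (tid % lanes_per_group == 0) {
            ++leaders;  // this thread believes it is the first lane of its group
        }
    }
    return leaders;
}
```

For a 256-thread block, `lane_leaders(256, 32)` counts 8 leaders while `lane_leaders(256, 64)` counts 4. On a 64-lane wavefront the extra four "leaders" (threads 32, 96, 160, 224) sit in the middle of a wavefront, which is one plausible reading of the mismatch reported in the table. In HIP device code the built-in `warpSize` is the usual replacement for the literal 32.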
docs/LIVE_RESULTS.md ADDED
@@ -0,0 +1,14 @@
+ # Live Results: AMD Instinct MI300X (gfx942), ROCm 7.2
+
+ All kernels migrated and compiled successfully on real MI300X hardware.
+
+ | Kernel | CUDA Changes | LLM Fixes | Critical Bugs Found | Compiled on MI300X |
+ |--------|-------------|-----------|--------------------|--------------------|
+ | reduction | 7 hipify | 2 LLM | warp-32 final stage (silent wrong results on AMD) | ✅ |
+ | vector_add | 5 hipify | 2 LLM | threadIdx%32 wavefront mismatch | ✅ |
+ | matrix_multiply | 10 hipify | 1 LLM | warp-32 + LDS bank conflicts | ✅ |
+ | convolution_2d | 10 hipify | 3 LLM | warp-32 + LDS padding | ✅ |
+
+ Hardware: AMD Instinct MI300X VF (gfx942), 192GB HBM3
+ Software: ROCm 7.2, hipcc, rocprof
+ data_source: real_rocm (not mock)
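Note on the `warp-32 final stage (silent wrong results on AMD)` entry: the reduction kernel itself is not shown in this commit, so the sketch below only models one way such a bug can arise. It assumes the final stage is a shuffle-down tree whose starting offset was written for 32 lanes. The function is a host-side C++ model of the shuffle semantics (a lane whose source falls outside the wavefront gets its own value back); `lane0_after_shuffle_reduce` and `iota_lanes` are hypothetical names, not the project's code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side model of a shuffle-down tree reduction over ONE warp/wavefront.
// Each step, lane i adds the value held by lane i + offset; if that source
// lane is outside the wavefront, the lane receives its own value instead.
// Returns what lane 0 holds after the loop finishes.
long long lane0_after_shuffle_reduce(const std::vector<long long>& lanes,
                                     std::size_t start_offset) {
    std::vector<long long> cur = lanes;
    const std::size_t width = cur.size();
    for (std::size_t offset = start_offset; offset > 0; offset >>= 1) {
        std::vector<long long> next = cur;  // all lanes read before any lane writes
        for (std::size_t lane = 0; lane < width; ++lane) {
            const std::size_t src = lane + offset;
            next[lane] = cur[lane] + (src < width ? cur[src] : cur[lane]);
        }
        cur = next;
    }
    return cur[0];
}

// Test input: lanes hold 1, 2, ..., width.
std::vector<long long> iota_lanes(std::size_t width) {
    std::vector<long long> v(width);
    std::iota(v.begin(), v.end(), 1LL);
    return v;
}
```

With 64 lanes holding 1..64, `lane0_after_shuffle_reduce(iota_lanes(64), 16)` yields 528, the sum of the first 32 lanes only, while the correct total is 2080. Nothing faults, which is why the result is silently wrong. Starting at half the wavefront width, `lane0_after_shuffle_reduce(iota_lanes(64), 32)`, recovers 2080.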