fixing docs proof

- README.md +23 -2
- backend/agents/coordinator.py +3 -5
- backend/tools/rocprof_wrapper.py +1 -1
- docs/FAILURE_CASES.md +20 -0
README.md
CHANGED

```diff
@@ -233,9 +233,26 @@ A basic weekend clone can chain hipify and an LLM. The differentiator is reliabl
 | Backend unavailable | Verify FastAPI server is running on port `8000`. |
 | No improvement observed | Re-check baseline definition, kernel size, and profiler counters. |
 
-## License
 
-
+## Why Not Just Use hipify?
+
+hipify-clang is AMD's official translation tool. ROCmPort AI uses it as a first pass. The problem is what hipify cannot catch.
+
+**The reduction kernel example:**
+
+hipify successfully translates `reduction.cu` — it compiles, it runs, it returns a result. No errors. But the result is silently wrong on AMD hardware.
+
+The root cause: line 59 assumes `warpSize=32` in the final unrolled reduction stage. On AMD, wavefront size is 64. Lanes 32–63 are skipped entirely in the final summation. The output looks plausible but is numerically incorrect.
+
+hipify has no knowledge of this. It performs mechanical API renaming. It cannot reason about hardware architecture assumptions baked into kernel logic.
+
+ROCmPort AI catches this before execution:
+
+- Static scanner flags line 59 as CRITICAL risk: "hardcoded warp-32 conditional — assumes NVIDIA warpSize=32. On AMD wavefront=64 this silently skips lanes 32–63"
+- LLM correction pass rewrites the final reduction stage to be wavefront-64 aware
+- Compiler + rocprof verification confirms the fix compiles and executes correctly on gfx942
+
+This is the gap between "it compiles" and "it is correct."
 
 ## ✅ Live Results on AMD Instinct MI300X
 
@@ -249,3 +266,7 @@ All demo kernels migrated, compiled, and profiled on real MI300X hardware (AMD D
 | convolution_2d | 13 | warp-32 + LDS padding | ✅ Compiled |
 
 `data_source: real_rocm` — verified on AMD DevCloud MI300X instance.
+
+## License
+
+See `LICENSE`.
```
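The warp-width bug described in the README addition can be reproduced without a GPU. The following is a minimal Python sketch (illustrative, not project code) modeling the final reduction stage: folding only the first 32 partial sums is correct for a 32-lane NVIDIA warp but silently drops lanes 32-63 on a 64-lane AMD wavefront.

```python
# Model of the final unrolled reduction stage: one partial sum per lane.
# On NVIDIA a warp is 32 lanes wide; on AMD (e.g. MI300X) a wavefront is 64.

def final_stage_reduce(partials, assumed_width):
    """Fold partial sums, but only across the first `assumed_width` lanes."""
    return sum(partials[:assumed_width])

wavefront = list(range(64))  # lanes 0..63 each hold a partial sum

correct = final_stage_reduce(wavefront, 64)  # wavefront-64 aware
buggy = final_stage_reduce(wavefront, 32)    # hardcoded warp-32 assumption

print(correct, buggy)  # 2016 496
```

The buggy variant compiles and runs cleanly, which is exactly why a mechanical translation pass cannot catch it; only the numeric result reveals the skipped lanes.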
backend/agents/coordinator.py
CHANGED

```diff
@@ -24,15 +24,15 @@ def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
 
     if complexity <= 3:
         manual_weeks = "1-2 weeks"
-        savings = "
+        savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
         factor = "Low"
     elif complexity <= 7:
         manual_weeks = "3-6 weeks"
-        savings = "
+        savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
         factor = "Medium"
     else:
         manual_weeks = "6-10 weeks"
-        savings = "
+        savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
         factor = "High"
 
     return CostEstimate(
@@ -77,8 +77,6 @@ async def run_pipeline(
     simple_mode: bool = False,
 ) -> AsyncGenerator[AgentEvent, None]:
     """Run full pipeline and stream AgentEvent objects."""
-    _ = simple_mode
-
     yield AgentEvent(
         agent="analyzer",
         status=AgentStatus.RUNNING,
```
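For reference, the corrected branches can be read as a standalone function. This is a sketch mirroring the diff above, assuming `complexity` is the analyzer's 1-10 score; the surrounding `CostEstimate` construction is omitted.

```python
def estimate_savings(complexity: int) -> tuple[str, str, str]:
    """Return (manual_weeks, savings, factor) for a 1-10 complexity score.

    The savings string now scales with the complexity score instead of
    being a hardcoded figure, so small kernels get small estimates.
    """
    savings = (
        f"~{complexity * 5}-{complexity * 10} eng-days × team rate "
        f"(complexity {complexity}/10)"
    )
    if complexity <= 3:
        return "1-2 weeks", savings, "Low"
    elif complexity <= 7:
        return "3-6 weeks", savings, "Medium"
    return "6-10 weeks", savings, "High"

weeks, savings, factor = estimate_savings(5)
print(weeks, factor)  # 3-6 weeks Medium
print(savings)        # ~25-50 eng-days × team rate (complexity 5/10)
```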
backend/tools/rocprof_wrapper.py
CHANGED

```diff
@@ -56,7 +56,7 @@ class RocprofWrapper:
         """Run executable with rocprof profiling"""
         if not self.rocm_available:
             # Return mock profiling data
-            return self.
+            return self.get_mock_profiling_data()
 
         try:
             if args is None:
```
docs/FAILURE_CASES.md
CHANGED

````diff
@@ -36,3 +36,23 @@ __device__ __forceinline__ unsigned lane_id() {
 
 ### Trust note
 This is a deliberate example of where ROCmPort AI should report risk, not pretend full automation.
+
+## Failure Case: Library-Heavy CUDA Code (CUB, Thrust, cuDNN)
+
+**Input type**: CUDA kernels that call into CUB, Thrust, or cuDNN directly
+
+**Example pattern**:
+```cpp
+#include <cub/cub.cuh>
+cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
+```
+
+**What happens**: hipify-clang renames the include to `<hipcub/hipcub.hpp>` and the namespace to `hipcub`. ROCmPort AI passes this through. The translation is mechanically correct.
+
+**The limitation**: hipCUB API coverage is not 1:1 with CUB. Some primitives behave differently under ROCm, and performance characteristics differ significantly due to wavefront width. ROCmPort AI does not currently benchmark library calls against rocPRIM equivalents.
+
+**What ROCmPort AI does**: flags the library dependency in the static scan, marks it HIGH risk, and recommends manual review by a ROCm-experienced engineer.
+
+**What ROCmPort AI does not do**: guarantee correctness or performance parity for library-heavy code without human validation.
+
+**Fix requirement**: Manual comparison of CUB vs hipCUB primitive behavior for the specific use case, or replacement with rocPRIM equivalents.
````
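To make the documented behavior concrete, here is an illustrative Python sketch of the kind of static-scan rule this failure case implies. The pattern table and finding format are assumptions for demonstration, not the project's actual scanner.

```python
import re

# Hypothetical patterns for library-heavy CUDA code; each match is
# surfaced as HIGH risk rather than silently passed through.
LIBRARY_PATTERNS = {
    r"#include\s*<cub/": "CUB",
    r"#include\s*<thrust/": "Thrust",
    r"#include\s*<cudnn": "cuDNN",
}

def scan_for_library_risk(source: str) -> list[dict]:
    """Flag CUB/Thrust/cuDNN dependencies for manual review."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, lib in LIBRARY_PATTERNS.items():
            if re.search(pattern, line):
                findings.append({
                    "line": lineno,
                    "library": lib,
                    "risk": "HIGH",
                    "note": "hipCUB coverage is not 1:1; manual review required",
                })
    return findings

snippet = "#include <cub/cub.cuh>\nint main() { return 0; }"
print(scan_for_library_risk(snippet))
```

The point of such a rule is modest: it cannot verify hipCUB semantics, but it guarantees the dependency is reported instead of disappearing into a mechanically renamed include.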