tazwarrrr committed on
Commit
3de7600
·
1 Parent(s): b521314

fixing docs proof

README.md CHANGED
@@ -233,9 +233,26 @@ A basic weekend clone can chain hipify and an LLM. The differentiator is reliabl
 | Backend unavailable | Verify FastAPI server is running on port `8000`. |
 | No improvement observed | Re-check baseline definition, kernel size, and profiler counters. |

- ## License
-
- See `LICENSE`.
 ## ✅ Live Results on AMD Instinct MI300X

@@ -249,3 +266,7 @@ All demo kernels migrated, compiled, and profiled on real MI300X hardware (AMD D
 | convolution_2d | 13 | warp-32 + LDS padding | ✅ Compiled |

 `data_source: real_rocm` — verified on AMD DevCloud MI300X instance.
 | Backend unavailable | Verify FastAPI server is running on port `8000`. |
 | No improvement observed | Re-check baseline definition, kernel size, and profiler counters. |

+ ## Why Not Just Use hipify?
+
+ hipify-clang is AMD's official translation tool. ROCmPort AI uses it as a first pass. The problem is what hipify cannot catch.
+
+ **The reduction kernel example:**
+
+ hipify successfully translates `reduction.cu` — it compiles, it runs, it returns a result. No errors. But the result is silently wrong on AMD hardware.
+
+ The root cause: line 59 assumes `warpSize=32` in the final unrolled reduction stage. On AMD, wavefront size is 64. Lanes 32–63 are skipped entirely in the final summation. The output looks plausible but is numerically incorrect.
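The failure mode is easy to reproduce off the GPU. Below is a host-side Python sketch (illustrative only, not project code) that folds 64 partial sums the way a warp-32 unrolled final stage would, versus a wavefront-64-aware fold:

```python
# Illustrative simulation of the warp-32 assumption on a 64-wide wavefront.
# Each "lane" holds a partial sum; a warp-32 unrolled final stage folds
# only lanes 0-31, so lanes 32-63 never reach the result.

WAVEFRONT = 64  # AMD MI300X wavefront width

def final_stage_warp32(lanes):
    """Buggy final reduction: folds only the first 32 lanes (NVIDIA warpSize)."""
    acc = lanes[:32]
    step = 16
    while step >= 1:
        acc = [acc[i] + acc[i + step] for i in range(step)]
        step //= 2
    return acc[0]

def final_stage_wave64(lanes):
    """Wavefront-64-aware final reduction: folds all 64 lanes."""
    acc = lanes[:]
    step = WAVEFRONT // 2
    while step >= 1:
        acc = [acc[i] + acc[i + step] for i in range(step)]
        step //= 2
    return acc[0]

partials = list(range(1, WAVEFRONT + 1))   # lanes hold 1..64
print(final_stage_warp32(partials))        # 528: lanes 32-63 silently dropped
print(final_stage_wave64(partials))        # 2080: the true sum of 1..64
```

Both versions run without error, which is exactly why the bug survives a "does it compile and execute" check.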
+
+ hipify has no knowledge of this. It performs mechanical API renaming. It cannot reason about hardware architecture assumptions baked into kernel logic.
+
+ ROCmPort AI catches this before execution:
+
+ - Static scanner flags line 59 as CRITICAL risk: "hardcoded warp-32 conditional — assumes NVIDIA warpSize=32. On AMD wavefront=64 this silently skips lanes 32–63"
+ - LLM correction pass rewrites the final reduction stage to be wavefront-64 aware
+ - Compiler + rocprof verification confirms the fix compiles and executes correctly on gfx942
+
+ This is the gap between "it compiles" and "it is correct."
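The kind of flag the static scanner raises can be sketched as a toy pattern check (hypothetical; the patterns and message here are illustrative, not the project's actual rule set):

```python
import re

# Toy scan: flag source lines that hard-code the NVIDIA warp width.
WARP32_PATTERNS = [
    re.compile(r"\bwarpSize\s*==\s*32\b"),
    re.compile(r"\b(tid|lane|threadIdx\.x)\s*<\s*32\b"),
]

def scan_warp32(source: str):
    """Return (line, severity, message) findings for warp-32 assumptions."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in WARP32_PATTERNS):
            findings.append((
                lineno,
                "CRITICAL",
                "hardcoded warp-32 conditional: assumes NVIDIA warpSize=32; "
                "on AMD wavefront=64 this silently skips lanes 32-63",
            ))
    return findings

kernel_line = "if (tid < 32) { sdata[tid] += sdata[tid + 32]; }"
print(scan_warp32(kernel_line))
```

A real scanner would work on a parsed AST rather than raw lines, but the principle is the same: the check is about hardware assumptions, not API names.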

 ## ✅ Live Results on AMD Instinct MI300X

 | convolution_2d | 13 | warp-32 + LDS padding | ✅ Compiled |

 `data_source: real_rocm` — verified on AMD DevCloud MI300X instance.
+
+ ## License
+
+ See `LICENSE`.
backend/agents/coordinator.py CHANGED
@@ -24,15 +24,15 @@ def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
 if complexity <= 3:
     manual_weeks = "1-2 weeks"
-     savings = "$5,000-$10,000"
     factor = "Low"
 elif complexity <= 7:
     manual_weeks = "3-6 weeks"
-     savings = "$20,000-$50,000"
     factor = "Medium"
 else:
     manual_weeks = "6-10 weeks"
-     savings = "$50,000-$100,000"
     factor = "High"

 return CostEstimate(
@@ -77,8 +77,6 @@ async def run_pipeline(
 simple_mode: bool = False,
 ) -> AsyncGenerator[AgentEvent, None]:
     """Run full pipeline and stream AgentEvent objects."""
-     _ = simple_mode
-
     yield AgentEvent(
         agent="analyzer",
         status=AgentStatus.RUNNING,
 
 if complexity <= 3:
     manual_weeks = "1-2 weeks"
+     savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
     factor = "Low"
 elif complexity <= 7:
     manual_weeks = "3-6 weeks"
+     savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
     factor = "Medium"
 else:
     manual_weeks = "6-10 weeks"
+     savings = f"~{complexity * 5}-{complexity * 10} eng-days × team rate (complexity {complexity}/10)"
     factor = "High"

 return CostEstimate(
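The new `savings` string replaces fixed dollar bands with an estimate tied to the complexity score. Re-evaluating that f-string in isolation shows what it produces:

```python
def savings_estimate(complexity: int) -> str:
    # Mirrors the f-string in the diff above: effort scales with the
    # 1-10 complexity score instead of a hardcoded dollar range.
    return (
        f"~{complexity * 5}-{complexity * 10} eng-days × team rate "
        f"(complexity {complexity}/10)"
    )

print(savings_estimate(3))   # ~15-30 eng-days × team rate (complexity 3/10)
print(savings_estimate(8))   # ~40-80 eng-days × team rate (complexity 8/10)
```

Expressing savings in engineer-days times the buyer's own rate avoids presenting invented dollar figures as data.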
 
 simple_mode: bool = False,
 ) -> AsyncGenerator[AgentEvent, None]:
     """Run full pipeline and stream AgentEvent objects."""

     yield AgentEvent(
         agent="analyzer",
         status=AgentStatus.RUNNING,
backend/tools/rocprof_wrapper.py CHANGED
@@ -56,7 +56,7 @@ class RocprofWrapper:
     """Run executable with rocprof profiling"""
     if not self.rocm_available:
         # Return mock profiling data
-         return self._get_mock_profiling_data()

     try:
         if args is None:
 
     """Run executable with rocprof profiling"""
     if not self.rocm_available:
         # Return mock profiling data
+         return self.get_mock_profiling_data()

     try:
         if args is None:
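The fallback shape of the wrapper, reduced to a minimal standalone sketch (the `profile` method name and the mock payload fields are hypothetical; `RocprofWrapper`, `rocm_available`, and `get_mock_profiling_data` come from the diff):

```python
import shutil

class RocprofWrapper:
    def __init__(self):
        # Treat ROCm as available when the rocprof binary is on PATH.
        self.rocm_available = shutil.which("rocprof") is not None

    def get_mock_profiling_data(self):
        # Clearly labeled synthetic counters for hosts without ROCm,
        # so downstream consumers can tell mock data from real runs.
        return {"data_source": "mock", "kernel_time_us": 0.0}

    def profile(self, executable, args=None):
        """Run executable with rocprof profiling, or fall back to mock data."""
        if not self.rocm_available:
            # Return mock profiling data
            return self.get_mock_profiling_data()
        if args is None:
            args = []
        # ... real rocprof invocation would go here ...
        return {"data_source": "real_rocm", "cmd": [executable, *args]}
```

Tagging every result with `data_source` is what lets the README claim `data_source: real_rocm` honestly.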
docs/FAILURE_CASES.md CHANGED
@@ -36,3 +36,23 @@ __device__ __forceinline__ unsigned lane_id() {

 ### Trust note
 This is a deliberate example of where ROCmPort AI should report risk, not pretend full automation.
+
+ ## Failure Case: Library-Heavy CUDA Code (CUB, Thrust, cuDNN)
+
+ **Input type**: CUDA kernels that call into CUB, Thrust, or cuDNN directly
+
+ **Example pattern**:
+ ```cpp
+ #include <cub/cub.cuh>
+ cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
+ ```
+
+ **What happens**: hipify-clang renames the include to `<hipcub/hipcub.hpp>` and the namespace to `hipcub`. ROCmPort AI passes this through. The translation is mechanically correct.
+
+ **The limitation**: hipCUB API coverage is not 1:1 with CUB. Some primitives behave differently under ROCm, and performance characteristics differ significantly due to wavefront width. ROCmPort AI does not currently benchmark library calls against rocPRIM equivalents.
+
+ **What ROCmPort AI does**: flags the library dependency in the static scan, marks it HIGH risk, and recommends manual review by a ROCm-experienced engineer.
+
+ **What ROCmPort AI does not do**: guarantee correctness or performance parity for library-heavy code without human validation.
+
+ **Fix requirement**: Manual comparison of CUB vs hipCUB primitive behavior for the specific use case, or replacement with rocPRIM equivalents.
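The mechanical rename described in "What happens" can be illustrated with a naive substitution (hipify-clang actually rewrites the clang AST; this sketch only reproduces the visible outcome for the example above):

```python
# Naive illustration of the CUB -> hipCUB rename outcome.
# hipify-clang does this via AST rewriting, not string replacement.
RENAMES = {
    "#include <cub/cub.cuh>": "#include <hipcub/hipcub.hpp>",
    "cub::": "hipcub::",
}

def sketch_hipify(source: str) -> str:
    """Apply the include and namespace renames to a CUDA source string."""
    for old, new in RENAMES.items():
        source = source.replace(old, new)
    return source

cuda_src = (
    "#include <cub/cub.cuh>\n"
    "cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, "
    "d_in, d_out, num_items);\n"
)
print(sketch_hipify(cuda_src))
```

The point of the sketch is that the rename succeeds trivially; nothing in it checks whether the hipCUB primitive actually matches the CUB one in behavior or performance, which is exactly the gap flagged above.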