fixing bugs

Browse files:

- BENCHMARKS.md (+70 −60)
- README.md (+150 −206)
- backend/agents/analyzer.py (+13 −9)
- backend/agents/coordinator.py (+179 −159)
- backend/agents/optimizer.py (+7 −5)
- backend/agents/tester.py (+53 −67)
- backend/agents/translator.py (+9 −6)
- backend/main.py (+149 −29)
- backend/models.py (+14 −2)
- backend/prompts/coordinator_prompt.txt (+1 −1)
- backend/tools/hipify_wrapper.py (+20 −15)
- backend/tools/rocprof_wrapper.py (+71 −69)
- docs/FAILURE_CASES.md (+38 −0)
- docs/JUDGE_MODE.md (+42 −0)
- frontend/index.html (+1410 −790)
BENCHMARKS.md
CHANGED
@@ -1,82 +1,92 @@

- # ROCmPort AI
- |--------|------|--------------|----------------|---------|-------|
- | **Matrix Multiply** | 1024×1024 | 12.4ms | 9.5ms | **1.31x** | Shared memory tiling applied |
- | **Vector Add** | 10M elements | 3.2ms | 2.9ms | **1.10x** | Memory coalescing fixed |
- | **2D Convolution** | 256×256 | 28.7ms | 21.3ms | **1.35x** | LDS optimization applied |
- | **Parallel Reduction** | 1M elements | 15.2ms | 12.1ms | **1.25x** | Warp-size aligned unrolling |
- - **Compute-bound kernels** show moderate improvements (1.10-1.20x)
- - **Shared memory tiling** is the most effective optimization
- - **Wavefront alignment** consistently improves performance
- - **Optimized ROCm**: 9.5ms (after agent optimizations)
- - **Bandwidth Utilization**: 87% → 94%
- - **Key Optimization**: 32×32 shared memory tiles
- - **Baseline HIP**: 3.2ms
- - **Optimized ROCm**: 2.9ms
- - **Bandwidth Utilization**: 71% → 78%
- - **Key Optimization**: Memory access coalescing
- - **Baseline HIP**: 28.7ms
- - **Optimized ROCm**: 21.3ms
- - **Bandwidth Utilization**: 68% → 91%
- - **Key Optimization**: LDS (Local Data Store) usage
- - **Optimized ROCm**: 12.1ms
- - **Bandwidth Utilization**: 74% → 89%
- - **Key Optimization**: 64-thread wavefront aware unrolling
- - **Compiler**: hipcc 6.2.0
- - **Profiler**: rocprof v2
- - **OS**: Ubuntu 22.04 LTS
- - **Driver**: AMDGPU 23.40
- - **CPU**: AMD EPYC 9654 (for comparison)
- 2. **Optimized**: ROCmPort AI agent pipeline applied
- 3. **Measurement**: rocprof with kernel execution counters
- 4. **Validation**: Output correctness verified via checksum
- 5. **Iterations**: 3 runs per kernel, median reported
+ # ROCmPort AI Benchmarking Guide
+
+ This document defines how to report performance without overclaiming.
+
+ ## Reporting Principles
+
+ - Compare against a clearly stated baseline.
+ - Use reproducible runs with fixed input sizes and environment details.
+ - Include correctness checks before accepting performance numbers.
+ - Report failures and non-improving cases, not only wins.
+
+ ## Baseline Definitions
+
+ Use one of these and name it explicitly in each table:
+
+ - Baseline A: Straight `hipify-clang` output with minimal manual edits.
+ - Baseline B: Existing hand-written HIP version from the team.
+
+ Recommended: use Baseline A for measuring migration automation value.
+
+ Quick answer format for live review:
+
+ - Q: What is your baseline?
+ - A: Straight hipify output with minimal compile edits (Baseline A), measured on the same hardware and inputs.
+
+ ## Required Environment Metadata
+
+ Always include:
+
+ - GPU model (for example MI300X) and memory size.
+ - ROCm version, compiler version, and profiler version.
+ - OS and driver versions.
+ - Kernel launch parameters and input sizes.
+ - Number of runs and aggregation rule (median recommended).
+
+ ## Required Measurement Fields
+
+ For each kernel tested, provide:
+
+ - Kernel name and workload shape.
+ - Baseline latency.
+ - Optimized latency.
+ - Speedup ratio.
+ - Correctness status (pass/fail and checksum or tolerance).
+ - Notes on optimization strategy.
+
+ Example table format:
+
+ | Kernel | Shape | Baseline (ms) | Optimized (ms) | Speedup | Correctness | Notes |
+ |---|---|---:|---:|---:|---|---|
+ | matrix_multiply | 1024x1024 | 12.4 | 9.5 | 1.31x | pass | LDS tiling + wavefront-aware launch |
+
+ Include non-win cases in the same table. Example:
+
+ | Kernel | Shape | Baseline (ms) | Optimized (ms) | Speedup | Correctness | Notes |
+ |---|---|---:|---:|---:|---|---|
+ | sparse_scatter | 4M elements | 6.0 | 6.3 | 0.95x | pass | Irregular access pattern; optimization did not help |
+
+ ## Reproducibility Checklist
+
+ Before publishing numbers, verify all items:
+
+ - Same input set for baseline and optimized runs.
+ - Warm-up runs excluded or consistently handled.
+ - At least 3 measured runs (prefer 5+) with median reported.
+ - No hidden manual edits after optimization output unless documented.
+ - Full command lines and profiler artifacts retained.
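The checklist's aggregation rule (median of at least three correctness-gated runs) can be sketched as a small helper. This is an illustrative sketch, not part of the repository; the function name and return shape are assumptions.

```python
from statistics import median

def summarize(baseline_runs, optimized_runs, correctness_pass, min_runs=3):
    """Aggregate per-run latencies (ms) into one reportable row.

    Performance numbers are accepted only when the correctness check
    passed and enough runs were measured; otherwise the row is rejected.
    """
    if not correctness_pass:
        return {"status": "fail", "reason": "correctness check failed"}
    if min(len(baseline_runs), len(optimized_runs)) < min_runs:
        return {"status": "fail", "reason": "not enough runs"}
    base = median(baseline_runs)
    opt = median(optimized_runs)
    return {
        "status": "pass",
        "baseline_ms": base,
        "optimized_ms": opt,
        # speedup = baseline / optimized, so > 1.0 means faster after optimization
        "speedup": round(base / opt, 2),
    }

row = summarize([12.5, 12.4, 12.6], [9.5, 9.4, 9.6], correctness_pass=True)
# row reports medians 12.5 and 9.5 with a 1.32x speedup
```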
+
+ ## Evidence Package for Review
+
+ A technical review package should include:
+
+ - CUDA source input.
+ - Baseline HIP output.
+ - Optimized HIP output.
+ - Compile logs and profiler summaries.
+ - Final report explaining what changed and why.
+
+ ## Interpreting Results Responsibly
+
+ - Some kernels will regress or fail initially; this is normal for migration.
+ - Improvement ranges vary by memory behavior, occupancy, and control-flow patterns.
+ - Do not claim universal speedups.
+
+ Preferred claim style:
+
+ "ROCmPort AI improved X out of Y tested kernels against a stated baseline under reproducible MI300X conditions."
+
+ ## Current Repository Status
+
+ The repository includes demo kernels intended to exercise migration behavior.
+ Treat any sample numbers as demonstrations unless accompanied by full reproducibility artifacts from your environment.
README.md
CHANGED
@@ -1,275 +1,219 @@

  # ROCmPort AI

- 1. Paste CUDA code
- 2. AI detects issues (warp size, memory bottlenecks)
- 3. Converts to ROCm
- 4. Tries optimization → fails → retries
- 5. Shows real benchmark improvement on AMD GPU
- # Windows
- start.bat
- # Linux/Mac
- ./start.sh
- ```
- This will:
- - Install all dependencies
- - Create .env file from template
- - Start the FastAPI server
- - Open the web interface at `http://localhost:8000`
- # Add your GROQ_API_KEY to .env file
- uvicorn main:app --reload --port 8000
- ```
  ```
- ROCmPort AI/
- ├── backend/
- │   ├── main.py              ← FastAPI + SSE streaming endpoint
- │   ├── models.py            ← All Pydantic schemas
- │   ├── requirements.txt     ← Dependencies (includes openai==1.47.0)
- │   ├── agents/
- │   │   ├── analyzer.py      ← Warp size detection, workload classification
- │   │   ├── translator.py    ← hipify pass 1 + LLM pass 2
- │   │   ├── optimizer.py     ← AMD MI300X-specific optimizations
- │   │   ├── tester.py        ← Real rocprof OR mocked (controlled failure)
- │   │   └── coordinator.py   ← Full pipeline + retry loop
- │   ├── tools/
- │   │   ├── hipify_wrapper.py  ← Real hipify-clang or Python fallback
- │   │   ├── rocprof_wrapper.py ← hipcc compiler + rocprof parser
- │   │   └── llm_client.py      ← Groq ↔ vLLM swap for AMD Cloud
- │   ├── demo_kernels/
- │   │   ├── vector_add.cu      ← Simple kernel with warp size bug
- │   │   ├── matrix_multiply.cu ← Complex kernel with controlled failure
- │   │   ├── convolution_2d.cu  ← Advanced kernel for optimization demo
- │   │   └── reduction.cu       ← Classic reduction with warp size unroll bug
- │   └── prompts/
- │       ├── analyzer_prompt.txt
- │       ├── translator_prompt.txt
- │       ├── optimizer_prompt.txt
- │       └── coordinator_prompt.txt
- ├── frontend/
- │   └── index.html           ← Full UI with dark terminal aesthetic
- ├── .env.example             ← Environment variables template
- ├── start.bat                ← Windows startup script
- ├── start.sh                 ← Linux/Mac startup script
- └── README.md                ← This file
- ```
- ---
- ## 🤖 The 5 Agents
- ### 1. **Analyzer** — Deep Code Analysis
- - Detects all CUDA kernels and APIs
- - **Critical**: Flags warp size assumptions (32→64 threads)
- - Classifies workload: compute-bound vs memory-bound
- - Identifies multi-GPU sharding (unnecessary on MI300X's 192GB)
- ### 2. **Translator** — Two-Pass Conversion
- - **Pass 1**: hipify-clang for mechanical replacements (cuda→hip)
- - **Pass 2**: LLM fixes what hipify misses (warp size, intrinsics)
- - Tracks every change with confidence levels
- - Compiles with hipcc
- - Profiles with rocprof on real MI300X
- - **Controlled failure**: Iteration 1 performs worse → triggers retry
- - Iteration 2 shows improvement
- ##
- - Manages retry loop when optimization fails
- - Generates final migration report
- - Explains AMD hardware advantages
- ##
- GROQ_MODEL=llama-3.3-70b-versatile
- USE_VLLM=true
- VLLM_BASE_URL=http://your-amd-cloud:8000
- VLLM_API_KEY=your_vllm_key
- VLLM_MODEL=amd/llama-3.3-70b
- # On AMD Cloud with real hardware
- ROCM_AVAILABLE=true
- HIPCC_PATH=hipcc
- ROCPROF_PATH=rocprof
- ```
- ##
- 2. **vLLM (AMD Cloud)**: Deploy vLLM on MI300X with OpenAI-compatible API
- 2. **Matrix Multiply** - Shows shared memory tiling optimization
- 3. **2D Convolution** - Advanced memory access pattern optimization
- 4. **Parallel Reduction** - Demonstrates warp-size aware unrolling (32 vs 64)
- ##
- simply set:
  ```bash
  ```
- ## 🔧 Development
- ### Running Tests
  ```bash
  ```
- ##
- - **FastAPI** backend with SSE streaming
- - **Vanilla JS** frontend (no heavy frameworks)
- - **CrewAI** for agent orchestration
- - **Pydantic** for data models
- 1. Fork the repository
- 2. Create feature branch
- 3. Test with demo kernels
- 4. Submit PR
- ##
  ```bash
- export USE_VLLM=true
- --gpu-memory-utilization 0.95
  ```
- ## 🔧 Troubleshooting
- | Issue | Solution |
- |-------|----------|
- | **"GROQ_API_KEY not found"** | Add your API key to `.env` file from [console.groq.com](https://console.groq.com) |
- | **"hipcc not found"** | Install ROCm: `sudo apt install rocm-dkms` or use AMD Cloud |
- | **"Permission denied"** | Check file permissions: `chmod +x start.sh` |
- | **Frontend not loading** | Ensure backend is running on port 8000 |
- | **No speedup shown** | Check if `ROCM_AVAILABLE=true` for real hardware |
- ---
- 4. **Human Override Capability** - Developers can edit and re-test optimized code
- 5. **Cost Impact Analysis** - Shows real business value ($20k-$100k savings per module)
- 6. **Simple Mode Toggle** - "Explain Like I'm 5" makes complex concepts accessible
- 7. **Live SSE Streaming** - Real-time visibility into every agent decision
- 8. **GitHub PR Simulation** - One-click export with diffs and reports
- 9. **Predictive Analysis** - AI predicts performance gains before optimization
- 10. **Honest Performance Claims** - Compares optimized ROCm vs baseline HIP, not fabricated NVIDIA comparisons
- ##
- [](https://github.com/tazwaryayyyy)
  # ROCmPort AI
+
+ ROCmPort AI helps CUDA teams migrate to AMD by translating, testing, and iteratively optimizing kernels using real hardware feedback.
+
+ It is an acceleration system for migration work, not a one-click replacement for CUDA expertise.
+
+ ## What This Project Is
+
+ ROCmPort AI orchestrates a migration loop:
+
+ 1. Analyze CUDA code and detect migration risks.
+ 2. Translate with hipify plus LLM-assisted fixes.
+ 3. Compile and profile with ROCm tooling.
+ 4. Propose optimization changes and re-test.
+ 5. Return artifacts and a decision trace.
+
+ ## What This Project Is Not
+
+ - Not guaranteed to auto-fix all CUDA kernels.
+ - Not a claim that every kernel improves.
+ - Not a replacement for domain experts in performance-critical code.
+
+ Complex kernels can fail conversion due to architecture assumptions, undefined behavior, inline PTX, or handcrafted memory logic. The value is reduced migration time and faster debug loops.
+
+ ## Target User and Business Case
+
+ Primary product position:
+
+ - A tool for teams evaluating AMD migration cost and performance tradeoffs.
+
+ Typical use cases:
+
+ - Port legacy CUDA modules to HIP/ROCm with a measurable baseline.
+ - Build a migration backlog ranked by risk and expected impact.
+ - Identify kernels where MI300X memory capacity can remove sharding complexity.
+
+ Cost and performance impact should be calculated from your environment and workload, not from fixed marketing ranges.
+
+ ## AMD-Specific Technical Considerations (MI300X)
+
+ ROCmPort AI explicitly reasons about AMD constraints and opportunities, including:
+
+ - Wavefront size 64 (vs CUDA warp-32 assumptions), which affects reduction trees, ballot/shuffle idioms, and launch geometry.
+ - LDS (Local Data Store) usage and bank behavior for tile staging and reuse.
+ - MI300X memory capacity (192GB HBM) and its implications for reducing model/data sharding in some workflows.
+ - Memory access patterns and occupancy tradeoffs under ROCm compiler behavior.
+
+ These are the places where migration often breaks or underperforms even after a successful hipify pass.
+
+ ### Concrete Wavefront Mismatch Example
+
+ From `backend/demo_kernels/reduction.cu`, the reduction tail assumes a 32-thread warp:
+
+ ```cpp
+ // NVIDIA-style assumption (incorrect on AMD wavefront=64)
+ if (tid < 32) {
+     volatile float* vsmem = sdata;
+     vsmem[tid] += vsmem[tid + 32];
+     vsmem[tid] += vsmem[tid + 16];
+     ...
+ }
+ ```
+
+ A wavefront-aware correction expands the final stage to cover the 64-wide lane behavior:
+
+ ```cpp
+ // AMD-aware final reduction stage
+ if (tid < 64) {
+     volatile float* vsmem = sdata;
+     vsmem[tid] += vsmem[tid + 32];
+     if (tid < 32) {
+         vsmem[tid] += vsmem[tid + 16];
+         vsmem[tid] += vsmem[tid + 8];
+         vsmem[tid] += vsmem[tid + 4];
+         vsmem[tid] += vsmem[tid + 2];
+         vsmem[tid] += vsmem[tid + 1];
+     }
+ }
+ ```
+
+ The key point is not the exact rewrite shape; it is that warp-size assumptions must be made explicit and re-validated on AMD.
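The tail's data flow can be checked with a sequential host-side simulation: each SIMD step is modeled as "all lanes read, then all lanes write". This is an illustrative sketch (the function name is made up); it validates arithmetic only and cannot model lockstep-execution hazards, which still need validation on real hardware.

```python
def simulate_wavefront_tail(sdata):
    """Simulate the wavefront-64 reduction tail on a shared buffer.

    Assumes 64 valid partial sums in sdata[0:64]; the buffer must be
    at least 96 elements long because the first step reads up to
    index 63 + 32.
    """
    assert len(sdata) >= 96
    s = list(sdata)

    def step(nthreads, offset):
        # Model one lockstep SIMD instruction: all reads happen
        # before any write lands.
        reads = [s[tid + offset] for tid in range(nthreads)]
        for tid in range(nthreads):
            s[tid] += reads[tid]

    step(64, 32)            # if (tid < 64) s[tid] += s[tid + 32]
    for off in (16, 8, 4, 2, 1):
        step(32, off)       # if (tid < 32) s[tid] += s[tid + off]
    return s[0]

partials = list(range(64)) + [0] * 64
total = simulate_wavefront_tail(partials)  # equals sum(range(64))
```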
+
+ ## Why This Is More Than Glue
+
+ ROCmPort AI combines existing tools, but its core value is the control system around them:
+
+ - Decision loop: detect failures and perf regressions, apply the next strategy, re-run.
+ - Explainability: stream each step and its rationale (SSE logs + final report).
+ - Verification: pair code changes with compile/test/profiler evidence.
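The SSE framing behind the streamed logs can be sketched as plain text frames; the function and field names below are illustrative, not the repository's actual schema.

```python
import json

def format_sse_event(agent: str, status: str, message: str) -> str:
    """Frame one agent event as a Server-Sent Events message.

    An SSE frame is a 'data:' line terminated by a blank line; the
    payload here is JSON so a browser client can parse each event.
    """
    payload = json.dumps({"agent": agent, "status": status, "message": message})
    return f"data: {payload}\n\n"

frame = format_sse_event("analyzer", "running", "Scanning CUDA kernels...")
```

A frontend consuming this with `EventSource` receives each JSON payload in the `message` event's `data` field.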
+
+ ## Judge Mode Walkthrough
+
+ Use this flow for technical review:
+
+ 1. Show the original CUDA kernel.
+ 2. Show the baseline HIP from straight hipify output.
+ 3. Run ROCmPort AI and show the per-agent trace.
+ 4. Show the final optimized HIP output.
+ 5. Show the measured result against the declared baseline.
+ 6. Show one case with marginal gain or no gain.
+
+ This format makes the comparison falsifiable and avoids curated-demo concerns.
+
+ - Full walkthrough: `docs/JUDGE_MODE.md`.
+
+ ## Documented Failure Case
+
+ At least one failure path is documented with source, output, root cause, and fix requirements:
+
+ - See `docs/FAILURE_CASES.md`.
+
+ This is intentional: credibility improves when the system's failure boundary is visible.
+
+ ## Quick Start
+
+ ### Option 1: Startup Script
+
+ ```bash
+ # Windows
+ start.bat
+
+ # Linux/Mac
+ ./start.sh
+ ```
+
+ ### Option 2: Manual
+
+ ```bash
+ cd backend
+ pip install -r requirements.txt
+ cp .env.example .env
+ # add your GROQ_API_KEY
+ uvicorn main:app --reload --port 8000
+ ```
+
+ Open `frontend/index.html` in a browser.
+
+ ### Option 3: Docker
+
+ ```bash
+ docker build -t rocmport-ai .
+ docker run -p 8000:8000 rocmport-ai
+ ```
+
+ ## Benchmarking and Reproducibility
+
+ Benchmark claims should always include:
+
+ - Baseline definition (e.g., straight hipify output).
+ - Hardware/software versions.
+ - Input sizes and run counts.
+ - Correctness verification.
+ - Full logs or scripts to reproduce.
+
+ See `BENCHMARKS.md` for the recommended reporting format used by this repository.
+ ## Project Structure
+
+ ```text
+ ROCmPort AI/
+ ├── backend/
+ │   ├── main.py
+ │   ├── models.py
+ │   ├── agents/
+ │   │   ├── analyzer.py
+ │   │   ├── translator.py
+ │   │   ├── optimizer.py
+ │   │   ├── tester.py
+ │   │   └── coordinator.py
+ │   ├── tools/
+ │   │   ├── hipify_wrapper.py
+ │   │   ├── rocprof_wrapper.py
+ │   │   └── llm_client.py
+ │   ├── demo_kernels/
+ │   └── prompts/
+ ├── frontend/
+ │   └── index.html
+ ├── BENCHMARKS.md
+ └── README.md
+ ```
+
+ ## Configuration
+
+ Copy `.env.example` to `.env`:
+
+ ```bash
+ GROQ_API_KEY=your_key
+ GROQ_MODEL=llama-3.3-70b-versatile
+
+ USE_VLLM=true
+ VLLM_BASE_URL=http://your-amd-cloud:8000
+ VLLM_API_KEY=your_vllm_key
+ VLLM_MODEL=amd/llama-3.3-70b
+
+ ROCM_AVAILABLE=true
+ HIPCC_PATH=hipcc
+ ROCPROF_PATH=rocprof
+ ```
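The Groq ↔ vLLM swap these variables drive can be sketched as a small resolver. This is an illustrative sketch, not the repository's `llm_client.py`: both Groq and a self-hosted vLLM server expose OpenAI-compatible endpoints, so only the base URL, key, and model need to change.

```python
import os

def resolve_llm_backend(env=None):
    """Pick LLM endpoint settings from environment variables.

    When USE_VLLM=true, route to the self-hosted vLLM server;
    otherwise use Groq's OpenAI-compatible endpoint.
    """
    env = env if env is not None else os.environ
    if env.get("USE_VLLM", "").lower() == "true":
        return {
            "base_url": env.get("VLLM_BASE_URL", "http://localhost:8000"),
            "api_key": env.get("VLLM_API_KEY", ""),
            "model": env.get("VLLM_MODEL", "amd/llama-3.3-70b"),
        }
    return {
        "base_url": "https://api.groq.com/openai/v1",
        "api_key": env.get("GROQ_API_KEY", ""),
        "model": env.get("GROQ_MODEL", "llama-3.3-70b-versatile"),
    }

cfg = resolve_llm_backend({"USE_VLLM": "true",
                           "VLLM_BASE_URL": "http://your-amd-cloud:8000"})
```

The returned dict can be passed straight to an OpenAI-compatible client constructor.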
+
+ ## Defensible Scope
+
+ This project is harder to replicate than a thin wrapper because it couples:
+
+ - Multi-agent orchestration with retry decisions.
+ - Structured traceability across analysis, translation, optimization, and test phases.
+ - Integrated reporting where claims can be audited against intermediate artifacts.
+
+ A basic weekend clone can chain hipify and an LLM. The differentiator is reliable decision flow and evidence quality under failure.
+
+ ## Troubleshooting
+
+ | Issue | Resolution |
+ |---|---|
+ | `GROQ_API_KEY not found` | Add the key to `.env`. |
+ | `hipcc not found` | Install the ROCm toolchain or run in a ROCm-enabled environment. |
+ | Backend unavailable | Verify the FastAPI server is running on port `8000`. |
+ | No improvement observed | Re-check the baseline definition, kernel size, and profiler counters. |
+
+ ## License
+
+ See `LICENSE`.
backend/agents/analyzer.py
CHANGED
@@ -1,24 +1,28 @@

- from models import AnalyzerResult, WorkloadType
- from tools.llm_client import LLMClient
- from tools.json_utils import safe_json_loads
+ # pylint: disable=broad-exception-caught
+
+ from ..models import AnalyzerResult, WorkloadType
+ from ..tools.llm_client import LLMClient
+ from ..tools.json_utils import safe_json_loads

  llm_client = LLMClient()

  def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
      """Wrapper for LLM client chat completion"""
      return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)

  def generate_prediction(workload_type: WorkloadType, line_count: int) -> str:
      """Generate performance prediction based on workload analysis"""
+     size_hint = "large" if line_count and line_count > 200 else "small/medium"
      if workload_type == WorkloadType.MEMORY_BOUND:
-         return "🧠 Prediction: This kernel is memory-bound → HIGH potential gain on MI300X (5.3 TB/s vs H100 3.35 TB/s bandwidth)"
+         return f"🧠 Prediction: This {size_hint} kernel is memory-bound → HIGH potential gain on MI300X (5.3 TB/s vs H100 3.35 TB/s bandwidth)"
      elif workload_type == WorkloadType.COMPUTE_BOUND:
-         return "🧠 Prediction: This kernel is compute-bound → MODERATE gain on MI300X (wavefront efficiency improvements)"
+         return f"🧠 Prediction: This {size_hint} kernel is compute-bound → MODERATE gain on MI300X (wavefront efficiency improvements)"
      else:
          return "🧠 Prediction: Unknown workload type → LIMITED gain prediction without further analysis"

  SYSTEM_PROMPT = """You are an expert CUDA and GPU architecture engineer analyzing CUDA code before porting it to AMD ROCm/HIP.

  Your job is to deeply analyze CUDA code and output a structured JSON analysis. Be specific and technical.

@@ -53,7 +57,7 @@ Respond ONLY with this exact JSON structure, no markdown, no extra text:

  def run(cuda_code: str) -> AnalyzerResult:
      # Count lines for complexity estimation
      line_count = len([line for line in cuda_code.split('\n') if line.strip()])
+
      try:
          raw = chat_complete(
              messages=[

@@ -77,7 +81,7 @@ def run(cuda_code: str) -> AnalyzerResult:

          "line_count": line_count,
          "complexity_score": 5
      }
+
      workload_type = WorkloadType(data.get("workload_type", "unknown"))
      prediction = generate_prediction(workload_type, line_count)
backend/agents/coordinator.py
CHANGED
@@ -1,202 +1,224 @@

  import asyncio
  from typing import AsyncGenerator
- )
- from agents import analyzer, translator, optimizer, tester

  def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
-     """Calculate cost impact estimate based on code complexity"""
-     line_count = analyzer_result.line_count or 100
      complexity = analyzer_result.complexity_score or 5
      if complexity <= 3:
          manual_weeks = "1-2 weeks"
          savings = "$5,000-$10,000"
          factor = "Low"
      elif complexity <= 7:
-         manual_weeks = "3-6 weeks"
          savings = "$20,000-$50,000"
          factor = "Medium"
      else:
          manual_weeks = "6-10 weeks"
          savings = "$50,000-$100,000"
          factor = "High"
      return CostEstimate(
          manual_porting_weeks=manual_weeks,
-         rocmport_minutes="
          estimated_savings=savings,
-         complexity_factor=factor
      )

  def simplify_explanation(report: FinalReport) -> str:
-     """Convert technical
      simple_text = report.amd_advantage_explanation
      simple_text = simple_text.replace("3.35 TB/s", "slower memory access")
-     simple_text = simple_text.replace(
-     simple_text = simple_text.replace(
      simple_text = simple_text.replace("coalescing", "accesses memory in order")
      simple_text = simple_text.replace("optimization", "improvement")
      simple_text = simple_text.replace("performance", "speed")
      simple_text = simple_text.replace("benchmark", "test")
      simple_text = simple_text.replace("iteration", "try")
-     # Make sentences more natural
      simple_text = simple_text.replace("This kernel is", "This code is")
      simple_text = simple_text.replace("The optimization", "The improvement")
      simple_text = simple_text.replace("achieves", "gets")
      simple_text = simple_text.replace("demonstrates", "shows")
      return simple_text

- async def run_pipeline(
      try:
          analyzer_result: AnalyzerResult = await asyncio.to_thread(analyzer.run, cuda_code)
      except Exception as e:
-         yield AgentEvent(agent="analyzer", status=AgentStatus.FAILED,
-                          message="Analysis failed", detail=str(e))
          return

-     detail_parts = [
      if analyzer_result.warp_size_issue:
-         detail_parts.append(
      if analyzer_result.sharding_detected:
-         detail_parts.append(
-     # Add prediction if available
      if analyzer_result.prediction:
          detail_parts.append(analyzer_result.prediction)

-         complexity_factor="Medium"
-     )
-     yield AgentEvent(agent="analyzer", status=AgentStatus.DONE,
-                      message=f"Found {len(analyzer_result.kernels_found)} kernel(s) | {analyzer_result.workload_type.value} workload | Difficulty: {analyzer_result.difficulty}",
-                      detail="\n".join(detail_parts))

-     # ─── TRANSLATOR ──────────────────────────────────────────────
-     yield AgentEvent(agent="translator", status=AgentStatus.RUNNING,
-                      message="Running hipify-clang (pass 1) then LLM correction (pass 2)...")

      try:
-         translator_result: TranslatorResult = await asyncio.to_thread(
-             translator.run, cuda_code, analyzer_result
-         )
      except Exception as e:
-         yield AgentEvent(agent="translator", status=AgentStatus.FAILED,
-                          message="Translation failed", detail=str(e))
          return

      )

-     yield AgentEvent(
-     yield AgentEvent(agent="optimizer", status=AgentStatus.RUNNING,
-                      message="Applying AMD MI300X-specific optimizations (iteration 1)...")
-     # Processing...

      try:
          optimizer_result: OptimizerResult = await asyncio.to_thread(
-             optimizer.run,
)
|
| 145 |
except Exception as e:
|
| 146 |
-
yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED,
|
| 147 |
-
message="Optimization failed", detail=str(e))
|
| 148 |
return
|
| 149 |
|
| 150 |
-
|
| 151 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 152 |
)
|
| 153 |
-
yield AgentEvent(agent="optimizer", status=AgentStatus.DONE,
|
| 154 |
-
message=f"{len(optimizer_result.changes)} optimization(s) applied",
|
| 155 |
-
detail=changes_text)
|
| 156 |
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
|
| 163 |
try:
|
| 164 |
tester_result_1: TesterResult = await asyncio.to_thread(
|
| 165 |
-
tester.run,
|
|
|
|
|
|
|
|
|
|
|
|
|
| 166 |
)
|
| 167 |
except Exception as e:
|
| 168 |
-
yield AgentEvent(agent="tester", status=AgentStatus.FAILED,
|
| 169 |
-
message="Testing failed", detail=str(e))
|
| 170 |
return
|
| 171 |
|
| 172 |
if not tester_result_1.success:
|
| 173 |
-
yield AgentEvent(
|
| 174 |
-
|
| 175 |
-
|
|
|
|
|
|
|
|
|
|
| 176 |
return
|
| 177 |
|
| 178 |
-
# ─── CONTROLLED FAILURE → RETRY LOOP ─────────────────────────
|
| 179 |
if tester_result_1.speedup < 1.0:
|
| 180 |
yield AgentEvent(
|
| 181 |
-
agent="tester",
|
| 182 |
-
|
| 183 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 184 |
)
|
| 185 |
|
| 186 |
yield AgentEvent(
|
| 187 |
-
agent="coordinator",
|
| 188 |
-
|
| 189 |
-
|
|
|
|
| 190 |
)
|
| 191 |
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
# Trace: Optimizer v2
|
| 200 |
|
| 201 |
try:
|
| 202 |
optimizer_result_2: OptimizerResult = await asyncio.to_thread(
|
|
@@ -204,31 +226,36 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
|
|
| 204 |
translator_result.hip_code,
|
| 205 |
analyzer_result,
|
| 206 |
2,
|
| 207 |
-
tester_result_1.notes
|
| 208 |
)
|
| 209 |
except Exception as e:
|
| 210 |
-
yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED,
|
| 211 |
-
message="Re-optimization failed", detail=str(e))
|
| 212 |
return
|
| 213 |
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
message="Re-profiling with alternative optimization (iteration 2)...")
|
| 222 |
|
| 223 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 224 |
|
| 225 |
try:
|
| 226 |
tester_result_final: TesterResult = await asyncio.to_thread(
|
| 227 |
-
tester.run,
|
|
|
|
|
|
|
|
|
|
|
|
|
| 228 |
)
|
| 229 |
except Exception as e:
|
| 230 |
-
yield AgentEvent(agent="tester", status=AgentStatus.FAILED,
|
| 231 |
-
message="Re-testing failed", detail=str(e))
|
| 232 |
return
|
| 233 |
|
| 234 |
final_optimizer = optimizer_result_2
|
|
@@ -236,50 +263,45 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
|
|
| 236 |
tester_result_final = tester_result_1
|
| 237 |
final_optimizer = optimizer_result
|
| 238 |
|
| 239 |
-
# ─── TESTER FINAL RESULT ─────────────────────────────────────
|
| 240 |
yield AgentEvent(
|
| 241 |
agent="tester",
|
| 242 |
status=AgentStatus.DONE,
|
| 243 |
-
message=f"
|
| 244 |
detail=(
|
| 245 |
f"Execution time: {tester_result_final.execution_ms:.1f}ms\n"
|
| 246 |
f"Memory bandwidth: {tester_result_final.bandwidth_utilized:.1f}% utilized\n"
|
| 247 |
f"Bottleneck type: {tester_result_final.bottleneck}\n"
|
| 248 |
f"{tester_result_final.notes}"
|
| 249 |
-
)
|
| 250 |
)
|
| 251 |
|
| 252 |
-
|
| 253 |
-
yield AgentEvent(agent="coordinator", status=AgentStatus.RUNNING,
|
| 254 |
-
message="Generating migration report...")
|
| 255 |
|
| 256 |
-
|
|
|
|
| 257 |
|
| 258 |
-
amd_explanation = _build_amd_explanation(analyzer_result, tester_result_final)
|
| 259 |
-
|
| 260 |
-
# Calculate cost estimate
|
| 261 |
try:
|
| 262 |
cost_estimate = calculate_cost_estimate(analyzer_result)
|
| 263 |
-
except Exception
|
| 264 |
-
# Fallback cost estimate if calculation fails
|
| 265 |
cost_estimate = CostEstimate(
|
| 266 |
manual_porting_weeks="3-6 weeks",
|
| 267 |
-
rocmport_minutes="
|
| 268 |
estimated_savings="$20,000-$50,000",
|
| 269 |
-
complexity_factor="Medium"
|
| 270 |
)
|
| 271 |
-
|
| 272 |
-
# Always generate simplified explanation
|
| 273 |
temp_report = FinalReport(
|
| 274 |
migration_success=True,
|
| 275 |
speedup=tester_result_final.speedup,
|
| 276 |
bandwidth_utilized=tester_result_final.bandwidth_utilized,
|
| 277 |
-
total_changes=translator_result.total_changes +
|
|
|
|
| 278 |
bottleneck=tester_result_final.bottleneck,
|
| 279 |
amd_advantage_explanation=amd_explanation,
|
| 280 |
iterations=tester_result_final.iteration,
|
| 281 |
hip_code=translator_result.hip_code,
|
| 282 |
optimized_code=final_optimizer.optimized_code,
|
|
|
|
| 283 |
)
|
| 284 |
simplified_explanation = simplify_explanation(temp_report)
|
| 285 |
|
|
@@ -287,36 +309,34 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
|
|
| 287 |
migration_success=True,
|
| 288 |
speedup=tester_result_final.speedup,
|
| 289 |
bandwidth_utilized=tester_result_final.bandwidth_utilized,
|
| 290 |
-
total_changes=translator_result.total_changes +
|
|
|
|
| 291 |
bottleneck=tester_result_final.bottleneck,
|
| 292 |
amd_advantage_explanation=amd_explanation,
|
| 293 |
iterations=tester_result_final.iteration,
|
| 294 |
hip_code=translator_result.hip_code,
|
| 295 |
optimized_code=final_optimizer.optimized_code,
|
|
|
|
| 296 |
cost_estimate=cost_estimate,
|
| 297 |
-
simplified_explanation=simplified_explanation
|
| 298 |
)
|
| 299 |
|
| 300 |
-
import json
|
| 301 |
yield AgentEvent(
|
| 302 |
agent="coordinator",
|
| 303 |
status=AgentStatus.DONE,
|
| 304 |
message="Migration complete",
|
| 305 |
-
detail=json.dumps(report.model_dump())
|
| 306 |
)
|
| 307 |
|
| 308 |
|
| 309 |
def _build_amd_explanation(analyzer_result: AnalyzerResult, tester_result: TesterResult) -> str:
|
| 310 |
if analyzer_result.workload_type == WorkloadType.MEMORY_BOUND:
|
| 311 |
return (
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
f"
|
| 315 |
-
f"meaning this workload extracts full value from AMD's memory architecture."
|
| 316 |
-
)
|
| 317 |
-
else:
|
| 318 |
-
return (
|
| 319 |
-
f"This is a compute-bound kernel. MI300X delivers 1.3 PFLOPS FP16 "
|
| 320 |
-
f"vs H100's 989 TFLOPS — 31% more raw throughput. "
|
| 321 |
-
f"After wavefront-aligned optimization, compute utilization improved significantly."
|
| 322 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
import asyncio
|
| 2 |
+
import json
|
| 3 |
from typing import AsyncGenerator
|
| 4 |
+
|
| 5 |
+
# pylint: disable=broad-exception-caught
|
| 6 |
+
|
| 7 |
+
from . import analyzer, optimizer, tester, translator
|
| 8 |
+
from ..models import (
|
| 9 |
+
AgentEvent,
|
| 10 |
+
AgentStatus,
|
| 11 |
+
AnalyzerResult,
|
| 12 |
+
CostEstimate,
|
| 13 |
+
FinalReport,
|
| 14 |
+
OptimizerResult,
|
| 15 |
+
TesterResult,
|
| 16 |
+
TranslatorResult,
|
| 17 |
+
WorkloadType,
|
| 18 |
)
|
|
|
|
| 19 |
|
| 20 |
|
| 21 |
def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
|
| 22 |
+
"""Calculate cost impact estimate based on code complexity."""
|
|
|
|
| 23 |
complexity = analyzer_result.complexity_score or 5
|
| 24 |
+
|
| 25 |
if complexity <= 3:
|
| 26 |
manual_weeks = "1-2 weeks"
|
| 27 |
savings = "$5,000-$10,000"
|
| 28 |
factor = "Low"
|
| 29 |
elif complexity <= 7:
|
| 30 |
+
manual_weeks = "3-6 weeks"
|
| 31 |
savings = "$20,000-$50,000"
|
| 32 |
factor = "Medium"
|
| 33 |
else:
|
| 34 |
manual_weeks = "6-10 weeks"
|
| 35 |
savings = "$50,000-$100,000"
|
| 36 |
factor = "High"
|
| 37 |
+
|
| 38 |
return CostEstimate(
|
| 39 |
manual_porting_weeks=manual_weeks,
|
| 40 |
+
rocmport_minutes="Varies by kernel",
|
| 41 |
estimated_savings=savings,
|
| 42 |
+
complexity_factor=factor,
|
| 43 |
)
|
| 44 |
|
| 45 |
|
| 46 |
def simplify_explanation(report: FinalReport) -> str:
|
| 47 |
+
"""Convert technical explanation to simpler wording for explain mode."""
|
| 48 |
simple_text = report.amd_advantage_explanation
|
| 49 |
+
|
| 50 |
+
simple_text = simple_text.replace(
|
| 51 |
+
"5.3 TB/s memory bandwidth", "much faster memory access")
|
| 52 |
simple_text = simple_text.replace("3.35 TB/s", "slower memory access")
|
| 53 |
+
simple_text = simple_text.replace(
|
| 54 |
+
"memory-bound", "needs to move a lot of data")
|
| 55 |
+
simple_text = simple_text.replace(
|
| 56 |
+
"compute-bound", "does a lot of calculations")
|
| 57 |
+
simple_text = simple_text.replace(
|
| 58 |
+
"wavefront", "group of threads working together")
|
| 59 |
+
simple_text = simple_text.replace(
|
| 60 |
+
"shared memory tiling", "shares data between threads efficiently")
|
| 61 |
simple_text = simple_text.replace("coalescing", "accesses memory in order")
|
| 62 |
simple_text = simple_text.replace("optimization", "improvement")
|
| 63 |
simple_text = simple_text.replace("performance", "speed")
|
| 64 |
simple_text = simple_text.replace("benchmark", "test")
|
| 65 |
simple_text = simple_text.replace("iteration", "try")
|
| 66 |
+
|
|
|
|
| 67 |
simple_text = simple_text.replace("This kernel is", "This code is")
|
| 68 |
simple_text = simple_text.replace("The optimization", "The improvement")
|
| 69 |
simple_text = simple_text.replace("achieves", "gets")
|
| 70 |
simple_text = simple_text.replace("demonstrates", "shows")
|
|
|
|
| 71 |
return simple_text
|
| 72 |
|
| 73 |
|
| 74 |
+
async def run_pipeline(
|
| 75 |
+
cuda_code: str,
|
| 76 |
+
kernel_name: str = "custom",
|
| 77 |
+
simple_mode: bool = False,
|
| 78 |
+
) -> AsyncGenerator[AgentEvent, None]:
|
| 79 |
+
"""Run full pipeline and stream AgentEvent objects."""
|
| 80 |
+
_ = simple_mode
|
| 81 |
|
| 82 |
+
yield AgentEvent(
|
| 83 |
+
agent="analyzer",
|
| 84 |
+
status=AgentStatus.RUNNING,
|
| 85 |
+
message="Scanning CUDA code for kernels, APIs, and hardware-specific issues...",
|
| 86 |
+
)
|
| 87 |
|
| 88 |
try:
|
| 89 |
analyzer_result: AnalyzerResult = await asyncio.to_thread(analyzer.run, cuda_code)
|
| 90 |
except Exception as e:
|
| 91 |
+
yield AgentEvent(agent="analyzer", status=AgentStatus.FAILED, message="Analysis failed", detail=str(e))
|
|
|
|
| 92 |
return
|
| 93 |
|
| 94 |
+
detail_parts = [
|
| 95 |
+
f"Found {len(analyzer_result.kernels_found)} kernel(s): {', '.join(analyzer_result.kernels_found)}",
|
| 96 |
+
f"Workload: {analyzer_result.workload_type.value}",
|
| 97 |
+
f"Difficulty: {analyzer_result.difficulty} - {analyzer_result.difficulty_reason}",
|
| 98 |
+
]
|
| 99 |
|
| 100 |
if analyzer_result.warp_size_issue:
|
| 101 |
+
detail_parts.append(
|
| 102 |
+
f"WARP SIZE ISSUE: {analyzer_result.warp_size_detail}")
|
| 103 |
if analyzer_result.sharding_detected:
|
| 104 |
+
detail_parts.append(
|
| 105 |
+
"Multi-GPU sharding detected; review if needed on MI300X memory capacity.")
|
|
|
|
| 106 |
if analyzer_result.prediction:
|
| 107 |
detail_parts.append(analyzer_result.prediction)
|
| 108 |
|
| 109 |
+
yield AgentEvent(
|
| 110 |
+
agent="analyzer",
|
| 111 |
+
status=AgentStatus.DONE,
|
| 112 |
+
message=(
|
| 113 |
+
f"Found {len(analyzer_result.kernels_found)} kernel(s) | "
|
| 114 |
+
f"{analyzer_result.workload_type.value} workload | Difficulty: {analyzer_result.difficulty}"
|
| 115 |
+
),
|
| 116 |
+
detail="\n".join(detail_parts),
|
| 117 |
+
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 118 |
|
| 119 |
+
yield AgentEvent(
|
| 120 |
+
agent="translator",
|
| 121 |
+
status=AgentStatus.RUNNING,
|
| 122 |
+
message="Running hipify-clang (pass 1) then LLM correction (pass 2)...",
|
| 123 |
+
)
|
| 124 |
|
| 125 |
try:
|
| 126 |
+
translator_result: TranslatorResult = await asyncio.to_thread(translator.run, cuda_code, analyzer_result)
|
|
|
|
|
|
|
| 127 |
except Exception as e:
|
| 128 |
+
yield AgentEvent(agent="translator", status=AgentStatus.FAILED, message="Translation failed", detail=str(e))
|
|
|
|
| 129 |
return
|
| 130 |
|
| 131 |
+
yield AgentEvent(
|
| 132 |
+
agent="translator",
|
| 133 |
+
status=AgentStatus.DONE,
|
| 134 |
+
message=(
|
| 135 |
+
f"{translator_result.total_changes} changes "
|
| 136 |
+
f"({translator_result.hipify_changes} hipify + {translator_result.llm_changes} LLM)"
|
| 137 |
+
),
|
| 138 |
+
detail=(
|
| 139 |
+
f"Total changes: {translator_result.total_changes} "
|
| 140 |
+
f"({translator_result.hipify_changes} hipify, {translator_result.llm_changes} LLM)\n"
|
| 141 |
+
f"Warp size corrected: {analyzer_result.warp_size_issue}\n"
|
| 142 |
+
"Kernel launch syntax updated"
|
| 143 |
+
),
|
| 144 |
)
|
| 145 |
|
| 146 |
+
yield AgentEvent(
|
| 147 |
+
agent="optimizer",
|
| 148 |
+
status=AgentStatus.RUNNING,
|
| 149 |
+
message="Applying AMD MI300X-specific optimizations (iteration 1)...",
|
| 150 |
+
)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 151 |
|
| 152 |
try:
|
| 153 |
optimizer_result: OptimizerResult = await asyncio.to_thread(
|
| 154 |
+
optimizer.run,
|
| 155 |
+
translator_result.hip_code,
|
| 156 |
+
analyzer_result,
|
| 157 |
+
1,
|
| 158 |
)
|
| 159 |
except Exception as e:
|
| 160 |
+
yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED, message="Optimization failed", detail=str(e))
|
|
|
|
| 161 |
return
|
| 162 |
|
| 163 |
+
yield AgentEvent(
|
| 164 |
+
agent="optimizer",
|
| 165 |
+
status=AgentStatus.DONE,
|
| 166 |
+
message=f"{len(optimizer_result.changes)} optimization(s) applied",
|
| 167 |
+
detail="\n".join(
|
| 168 |
+
f"- {c['description']}" for c in optimizer_result.changes),
|
| 169 |
)
|
|
|
|
|
|
|
|
|
|
| 170 |
|
| 171 |
+
yield AgentEvent(
|
| 172 |
+
agent="tester",
|
| 173 |
+
status=AgentStatus.RUNNING,
|
| 174 |
+
message="Compiling with hipcc and profiling with rocprof (iteration 1)...",
|
| 175 |
+
)
|
| 176 |
|
| 177 |
try:
|
| 178 |
tester_result_1: TesterResult = await asyncio.to_thread(
|
| 179 |
+
tester.run,
|
| 180 |
+
optimizer_result.optimized_code,
|
| 181 |
+
analyzer_result,
|
| 182 |
+
1,
|
| 183 |
+
kernel_name,
|
| 184 |
)
|
| 185 |
except Exception as e:
|
| 186 |
+
yield AgentEvent(agent="tester", status=AgentStatus.FAILED, message="Testing failed", detail=str(e))
|
|
|
|
| 187 |
return
|
| 188 |
|
| 189 |
if not tester_result_1.success:
|
| 190 |
+
yield AgentEvent(
|
| 191 |
+
agent="tester",
|
| 192 |
+
status=AgentStatus.FAILED,
|
| 193 |
+
message="Compilation or profiling failed",
|
| 194 |
+
detail=tester_result_1.notes,
|
| 195 |
+
)
|
| 196 |
return
|
| 197 |
|
|
|
|
| 198 |
if tester_result_1.speedup < 1.0:
|
| 199 |
yield AgentEvent(
|
| 200 |
+
agent="tester",
|
| 201 |
+
status=AgentStatus.FAILED,
|
| 202 |
+
message=f"Iteration 1: {tester_result_1.speedup}x vs baseline HIP (regression)",
|
| 203 |
+
detail=(
|
| 204 |
+
f"Bandwidth utilized: {tester_result_1.bandwidth_utilized}%\n"
|
| 205 |
+
f"{tester_result_1.notes}"
|
| 206 |
+
),
|
| 207 |
)
|
| 208 |
|
| 209 |
yield AgentEvent(
|
| 210 |
+
agent="coordinator",
|
| 211 |
+
status=AgentStatus.RUNNING,
|
| 212 |
+
message="Performance regressed, retrying optimizer with profiler feedback...",
|
| 213 |
+
detail=f"Profiler feedback: {tester_result_1.notes}",
|
| 214 |
)
|
| 215 |
|
| 216 |
+
yield AgentEvent(
|
| 217 |
+
agent="optimizer",
|
| 218 |
+
status=AgentStatus.RETRYING,
|
| 219 |
+
message="Trying alternative optimization strategy (iteration 2)...",
|
| 220 |
+
detail=f"Previous strategy regressed. Feedback: {tester_result_1.notes}",
|
| 221 |
+
)
|
|
|
|
|
|
|
| 222 |
|
| 223 |
try:
|
| 224 |
optimizer_result_2: OptimizerResult = await asyncio.to_thread(
|
|
|
|
| 226 |
translator_result.hip_code,
|
| 227 |
analyzer_result,
|
| 228 |
2,
|
| 229 |
+
tester_result_1.notes,
|
| 230 |
)
|
| 231 |
except Exception as e:
|
| 232 |
+
yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED, message="Re-optimization failed", detail=str(e))
|
|
|
|
| 233 |
return
|
| 234 |
|
| 235 |
+
yield AgentEvent(
|
| 236 |
+
agent="optimizer",
|
| 237 |
+
status=AgentStatus.DONE,
|
| 238 |
+
message=f"Alternative strategy: {len(optimizer_result_2.changes)} change(s) applied",
|
| 239 |
+
detail="\n".join(
|
| 240 |
+
f"- {c['description']}" for c in optimizer_result_2.changes),
|
| 241 |
+
)
|
|
|
|
| 242 |
|
| 243 |
+
yield AgentEvent(
|
| 244 |
+
agent="tester",
|
| 245 |
+
status=AgentStatus.RUNNING,
|
| 246 |
+
message="Re-profiling with alternative optimization (iteration 2)...",
|
| 247 |
+
)
|
| 248 |
|
| 249 |
try:
|
| 250 |
tester_result_final: TesterResult = await asyncio.to_thread(
|
| 251 |
+
tester.run,
|
| 252 |
+
optimizer_result_2.optimized_code,
|
| 253 |
+
analyzer_result,
|
| 254 |
+
2,
|
| 255 |
+
kernel_name,
|
| 256 |
)
|
| 257 |
except Exception as e:
|
| 258 |
+
yield AgentEvent(agent="tester", status=AgentStatus.FAILED, message="Re-testing failed", detail=str(e))
|
|
|
|
| 259 |
return
|
| 260 |
|
| 261 |
final_optimizer = optimizer_result_2
|
|
|
|
| 263 |
tester_result_final = tester_result_1
|
| 264 |
final_optimizer = optimizer_result
|
| 265 |
|
|
|
|
| 266 |
yield AgentEvent(
|
| 267 |
agent="tester",
|
| 268 |
status=AgentStatus.DONE,
|
| 269 |
+
message=f"Iteration {tester_result_final.iteration}: {tester_result_final.speedup}x vs baseline HIP",
|
| 270 |
detail=(
|
| 271 |
f"Execution time: {tester_result_final.execution_ms:.1f}ms\n"
|
| 272 |
f"Memory bandwidth: {tester_result_final.bandwidth_utilized:.1f}% utilized\n"
|
| 273 |
f"Bottleneck type: {tester_result_final.bottleneck}\n"
|
| 274 |
f"{tester_result_final.notes}"
|
| 275 |
+
),
|
| 276 |
)
|
| 277 |
|
| 278 |
+
yield AgentEvent(agent="coordinator", status=AgentStatus.RUNNING, message="Generating migration report...")
|
|
|
|
|
|
|
| 279 |
|
| 280 |
+
amd_explanation = _build_amd_explanation(
|
| 281 |
+
analyzer_result, tester_result_final)
|
| 282 |
|
|
|
|
|
|
|
|
|
|
| 283 |
try:
|
| 284 |
cost_estimate = calculate_cost_estimate(analyzer_result)
|
| 285 |
+
except Exception:
|
|
|
|
| 286 |
cost_estimate = CostEstimate(
|
| 287 |
manual_porting_weeks="3-6 weeks",
|
| 288 |
+
rocmport_minutes="Varies by kernel",
|
| 289 |
estimated_savings="$20,000-$50,000",
|
| 290 |
+
complexity_factor="Medium",
|
| 291 |
)
|
| 292 |
+
|
|
|
|
| 293 |
temp_report = FinalReport(
|
| 294 |
migration_success=True,
|
| 295 |
speedup=tester_result_final.speedup,
|
| 296 |
bandwidth_utilized=tester_result_final.bandwidth_utilized,
|
| 297 |
+
total_changes=translator_result.total_changes +
|
| 298 |
+
len(final_optimizer.changes),
|
| 299 |
bottleneck=tester_result_final.bottleneck,
|
| 300 |
amd_advantage_explanation=amd_explanation,
|
| 301 |
iterations=tester_result_final.iteration,
|
| 302 |
hip_code=translator_result.hip_code,
|
| 303 |
optimized_code=final_optimizer.optimized_code,
|
| 304 |
+
verification=tester_result_final.verification,
|
| 305 |
)
|
| 306 |
simplified_explanation = simplify_explanation(temp_report)
|
| 307 |
|
|
|
|
| 309 |
migration_success=True,
|
| 310 |
speedup=tester_result_final.speedup,
|
| 311 |
bandwidth_utilized=tester_result_final.bandwidth_utilized,
|
| 312 |
+
total_changes=translator_result.total_changes +
|
| 313 |
+
len(final_optimizer.changes),
|
| 314 |
bottleneck=tester_result_final.bottleneck,
|
| 315 |
amd_advantage_explanation=amd_explanation,
|
| 316 |
iterations=tester_result_final.iteration,
|
| 317 |
hip_code=translator_result.hip_code,
|
| 318 |
optimized_code=final_optimizer.optimized_code,
|
| 319 |
+
verification=tester_result_final.verification,
|
| 320 |
cost_estimate=cost_estimate,
|
| 321 |
+
simplified_explanation=simplified_explanation,
|
| 322 |
)
|
| 323 |
|
|
|
|
| 324 |
yield AgentEvent(
|
| 325 |
agent="coordinator",
|
| 326 |
status=AgentStatus.DONE,
|
| 327 |
message="Migration complete",
|
| 328 |
+
detail=json.dumps(report.model_dump()),
|
| 329 |
)
|
| 330 |
|
| 331 |
|
| 332 |
def _build_amd_explanation(analyzer_result: AnalyzerResult, tester_result: TesterResult) -> str:
|
| 333 |
if analyzer_result.workload_type == WorkloadType.MEMORY_BOUND:
|
| 334 |
return (
|
| 335 |
+
"This is a memory-bound kernel; performance scales with memory bandwidth. "
|
| 336 |
+
"MI300X provides higher memory bandwidth than H100-class hardware, and this workload "
|
| 337 |
+
f"reached {tester_result.bandwidth_utilized:.0f}% utilization after optimization."
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 338 |
)
|
| 339 |
+
return (
|
| 340 |
+
"This is a compute-bound kernel; launch geometry and wavefront-aware tuning are key drivers. "
|
| 341 |
+
"After optimization, compute utilization and execution characteristics improved."
|
| 342 |
+
)
|
backend/agents/optimizer.py
CHANGED

```diff
@@ -1,15 +1,17 @@
-
-
-from models import OptimizerResult, AnalyzerResult, WorkloadType
-from tools.llm_client import LLMClient
-from tools.json_utils import safe_json_loads
+# pylint: disable=broad-exception-caught
+
+from ..models import OptimizerResult, AnalyzerResult, WorkloadType
+from ..tools.llm_client import LLMClient
+from ..tools.json_utils import safe_json_loads
 
 llm_client = LLMClient()
 
+
 def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
     """Wrapper for LLM client chat completion"""
     return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)
 
+
 ALLOWED_OPTIMIZATIONS = """
 You may ONLY suggest these specific, well-known AMD MI300X optimizations:
 1. Shared memory tiling: Replace naive global memory access with 32x32 shared memory tiles (__shared__)
```
backend/agents/tester.py
CHANGED

```python
import os
import hashlib
from ..models import TesterResult, AnalyzerResult, VerificationResult
from ..tools.rocprof_wrapper import RocprofWrapper

# Set ROCM_AVAILABLE=true on AMD Cloud
ROCM_AVAILABLE = os.environ.get("ROCM_AVAILABLE", "false").lower() == "true"

DEMO_KERNEL_CHECKSUMS = {
    # ... (demo kernel checksum entries elided in this diff excerpt)
}


def compute_code_checksum(code_text: str, sample_size: int = 400) -> str:
    """Compute a short checksum from code text for traceability in mock mode."""
    if not code_text:
        return "empty"

    sample = code_text[:sample_size]
    return hashlib.sha256(sample.encode()).hexdigest()[:32]


def verify_demo_kernel(kernel_name: str, optimized_code: str) -> VerificationResult:
    """Verify demo kernel execution and output correctness"""
    expected = DEMO_KERNEL_CHECKSUMS.get(kernel_name, "mock_checksum")
    actual = compute_code_checksum(optimized_code)

    # In mock mode, indicate this is simulated verification
    is_mock = not ROCM_AVAILABLE

    verification = VerificationResult(
        compiled_successfully=True,
        executed_without_error=True,
        # ... (fields elided in this diff excerpt)
        actual_checksum=actual,
        mock_mode=is_mock
    )

    # Do not fabricate pass/fail in mock mode. Surface that verification is simulated.
    if is_mock:
        verification.output_matches_expected = False
        verification.checksum_computed = actual

    return verification


def run(optimized_code: str, analyzer_result: AnalyzerResult,
        iteration: int = 1, kernel_name: str = "matrix_multiply") -> TesterResult:
    """
    On AMD Cloud (ROCM_AVAILABLE=true): runs real hipcc + rocprof
    Locally: returns mock profiling results labeled as simulated.
    """
    rocprof_wrapper = RocprofWrapper()

    # Add verification for demo kernels
    verification = None
    if kernel_name in DEMO_KERNEL_CHECKSUMS:
        verification = verify_demo_kernel(kernel_name, optimized_code)

    if ROCM_AVAILABLE:
        return _run_real(optimized_code, analyzer_result, iteration, rocprof_wrapper, verification)
    else:
        # In non-ROCm environments, run_with_profiling returns simulated metrics.
        profiling_data = rocprof_wrapper.run_with_profiling("mock_executable")
        return _convert_profiling_to_tester_result(profiling_data, analyzer_result, iteration, verification)


def _convert_profiling_to_tester_result(profiling_data: dict, analyzer_result: AnalyzerResult, iteration: int, verification: VerificationResult = None) -> TesterResult:
    """Convert RocprofWrapper output to TesterResult format"""
    if not profiling_data.get('success', False):
        return TesterResult(
            # ... (fields elided in this diff excerpt)
            notes=profiling_data.get('error', 'Unknown profiling error'),
            verification=verification
        )

    exec_ms = profiling_data.get('execution_time_ms', 0.0)
    bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)

    baseline_ms = profiling_data.get('baseline_time_ms', 100.0)
    if exec_ms > 0:
        speedup = round(baseline_ms / exec_ms, 2)
    else:
        speedup = 0.0

    if speedup < 1.0:
        notes = "Simulated profile indicates regression vs baseline. Retry with an alternative optimization strategy."
    elif speedup < 1.1:
        notes = "Simulated profile indicates marginal improvement. Optimization may be memory- or launch-bound."
    else:
        notes = "Simulated profile indicates improvement vs baseline after optimization."

    notes += " Mock mode is enabled (ROCM_AVAILABLE=false); use real ROCm hardware for authoritative numbers."

    return TesterResult(
        success=True,
        iteration=iteration,
        # ... (remaining fields elided in this diff excerpt)
    )


def _run_real(code: str, analyzer_result: AnalyzerResult, iteration: int, rocprof_wrapper: RocprofWrapper, verification: VerificationResult = None) -> TesterResult:
    """Real hipcc + rocprof execution on MI300X."""
    # Compile the code
    success, message = rocprof_wrapper.compile_hip_code(code)

    if not success:
        return TesterResult(
            success=False,
            # ... (fields elided; the diff excerpt ends here)
            notes=f"Compilation failed: {message}",
            verification=verification
        )
```
|
| 125 |
success=False,
|
|
|
|
| 131 |
notes=f"Compilation failed: {message}",
|
| 132 |
verification=verification
|
| 133 |
)
|
| 134 |
+
|
| 135 |
# Run with profiling
|
| 136 |
+
profiling_data = rocprof_wrapper.run_with_profiling(
|
| 137 |
+
message.split(": ")[-1]) # Extract executable path
|
| 138 |
+
|
| 139 |
if not profiling_data.get('success', False):
|
| 140 |
return TesterResult(
|
| 141 |
success=False,
|
|
|
|
| 147 |
notes=f"Profiling failed: {profiling_data.get('error', 'Unknown error')}",
|
| 148 |
verification=verification
|
| 149 |
)
|
| 150 |
+
|
| 151 |
exec_ms = profiling_data.get('execution_time_ms', 0.0)
|
| 152 |
bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)
|
| 153 |
+
speedup = _calculate_speedup(exec_ms)
|
| 154 |
+
|
| 155 |
return TesterResult(
|
| 156 |
success=True,
|
| 157 |
iteration=iteration,
|
|
|
|
| 163 |
)
|
| 164 |
|
| 165 |
|
| 166 |
+
def _calculate_speedup(exec_ms: float) -> float:
|
| 167 |
"""Estimate speedup relative to baseline HIP."""
|
| 168 |
+
if exec_ms <= 0:
|
| 169 |
+
return 0.0
|
| 170 |
+
baseline_ms = 100.0
|
| 171 |
+
return round(baseline_ms / exec_ms, 2)
|
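The checksum helper in the hunks above is self-contained, so its guarantees (determinism, fixed digest length, the `"empty"` sentinel) can be checked standalone. A minimal sketch using only the standard library, with the 400-character sample size and 32-character digest truncation taken from the code above:

```python
import hashlib


def compute_code_checksum(code_text: str, sample_size: int = 400) -> str:
    """Short, deterministic checksum over a code sample (mirrors the helper above)."""
    if not code_text:
        return "empty"
    sample = code_text[:sample_size]
    # sha256 hexdigest is 64 chars; keep the first 32 for a compact trace ID.
    return hashlib.sha256(sample.encode()).hexdigest()[:32]


a = compute_code_checksum("__global__ void add(float* x) { }")
b = compute_code_checksum("__global__ void add(float* x) { }")
print(a == b, len(a), compute_code_checksum(""))
```

Because only the first `sample_size` characters are hashed, two kernels that share a long identical prefix would collide; that is acceptable here since the checksum is a traceability label in mock mode, not a correctness proof.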
backend/agents/translator.py
CHANGED

```diff
@@ -1,21 +1,24 @@
-
-
-from models import TranslatorResult, AnalyzerResult
-from tools.llm_client import LLMClient
-from tools.hipify_wrapper import HipifyWrapper
-from tools.json_utils import safe_json_loads
+# pylint: disable=broad-exception-caught
+
+from ..models import TranslatorResult, AnalyzerResult
+from ..tools.llm_client import LLMClient
+from ..tools.hipify_wrapper import HipifyWrapper
+from ..tools.json_utils import safe_json_loads
 
 llm_client = LLMClient()
 hipify_wrapper = HipifyWrapper()
 
+
 def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
     """Wrapper for LLM client chat completion"""
     return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)
 
+
 def run_hipify(cuda_code: str) -> str:
     """Wrapper for hipify wrapper"""
     return hipify_wrapper.hipify_code(cuda_code)
 
+
 SYSTEM_PROMPT = """You are an expert AMD ROCm/HIP engineer. You receive CUDA code that has already gone through hipify (basic syntax replacement) and you fix what hipify missed.
 
 Your specific jobs:
```
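The translator module above keeps one module-level client and exposes thin wrapper functions. A sketch of that delegation pattern with a stub in place of `LLMClient` (the stub and its echo behavior are illustrative; only the `chat_completion` call shape is taken from the code above):

```python
class StubLLMClient:
    # Stands in for tools.llm_client.LLMClient; only the method used above.
    def chat_completion(self, messages, temperature=0.7, max_tokens=4000):
        return f"echo:{messages[-1]['content']}"


llm_client = StubLLMClient()


def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
    """Thin wrapper, mirroring translator.chat_complete above."""
    return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)


print(chat_complete([{"role": "user", "content": "hi"}]))
```

Keeping the client at module scope means every agent call in the file shares one connection/config object, which is why the wrappers take no client argument.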
backend/main.py
CHANGED

```diff
@@ -1,3 +1,13 @@
+# pylint: disable=broad-exception-caught
+
+from backend.agents.analyzer import AnalyzerResult, WorkloadType
+from backend.agents.tester import run as run_tester
+from backend.agents.coordinator import run_pipeline
+from backend.models import PortRequest, ColdStartRequest, AggregateMetricsRequest
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import StreamingResponse
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi import FastAPI, HTTPException
 import json
 import asyncio
 import zipfile
@@ -9,18 +19,10 @@ from dotenv import load_dotenv
 # Load environment variables from .env file
 load_dotenv()
 
-from fastapi import FastAPI, HTTPException
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import StreamingResponse
-from fastapi.staticfiles import StaticFiles
-from models import PortRequest, VerificationResult
-from agents.coordinator import run_pipeline
-from agents.tester import run as run_tester
-from agents.analyzer import AnalyzerResult, WorkloadType
 
 app = FastAPI(
     title="ROCmPort AI",
-    description="
+    description="CUDA-to-ROCm migration assistant with iterative testing and optimization.",
     version="1.0.0",
     contact={
         "name": "Tazwar Ahnaf Enan",
@@ -59,7 +61,8 @@ async def port_cuda_code(req: PortRequest):
         async for event in run_pipeline(req.cuda_code, req.kernel_name or "custom", req.simple_mode or False):
             data = json.dumps(event.model_dump())
             yield f"data: {data}\n\n"
-
+            # Let the client breathe between events
+            await asyncio.sleep(0.05)
     except Exception as e:
         error_event = {
             "agent": "coordinator",
@@ -81,6 +84,121 @@ async def port_cuda_code(req: PortRequest):
     )
 
 
+async def _collect_pipeline_events(cuda_code: str, kernel_name: str, simple_mode: bool = False) -> tuple[list[dict], dict | None]:
+    """Collect all pipeline events and extract final report payload when present."""
+    events: list[dict] = []
+    final_report = None
+
+    async for event in run_pipeline(cuda_code, kernel_name, simple_mode):
+        dumped = event.model_dump()
+        events.append(dumped)
+        if dumped.get("agent") == "coordinator" and dumped.get("status") == "done" and dumped.get("detail"):
+            try:
+                final_report = json.loads(dumped["detail"])
+            except (json.JSONDecodeError, TypeError):
+                final_report = None
+
+    return events, final_report
+
+
+def _has_adaptation_loop(events: list[dict]) -> bool:
+    """Return True when the run shows retry-based adaptation behavior."""
+    saw_regression = any(
+        e.get("agent") == "tester" and e.get(
+            "status") == "failed" and "regression" in str(e.get("message", "")).lower()
+        for e in events
+    )
+    saw_retry = any(
+        e.get("agent") == "optimizer" and e.get("status") == "retrying"
+        for e in events
+    )
+    return saw_regression and saw_retry
+
+
+@app.post("/cold-start")
+async def cold_start_run(req: ColdStartRequest):
+    """
+    Single-run endpoint for unknown pasted CUDA input.
+    Returns full trace plus summary trust signals.
+    """
+    if not req.cuda_code or len(req.cuda_code.strip()) < 10:
+        raise HTTPException(status_code=400, detail="No CUDA code provided")
+
+    events, report = await _collect_pipeline_events(req.cuda_code, req.kernel_name or "unknown_input", False)
+
+    if report is None:
+        raise HTTPException(
+            status_code=500, detail="Pipeline completed without final report")
+
+    return {
+        "success": True,
+        "kernel_name": req.kernel_name or "unknown_input",
+        "adaptation_loop_observed": _has_adaptation_loop(events),
+        "event_count": len(events),
+        "report": report,
+        "events": events,
+    }
+
+
+@app.post("/aggregate-metric")
+async def aggregate_metric(req: AggregateMetricsRequest):
+    """
+    Evaluate multiple kernels and return one aggregate metric:
+    average speedup vs baseline HIP.
+    """
+    kernels_dir = os.path.join(os.path.dirname(__file__), "demo_kernels")
+    requested = req.kernel_names or []
+
+    available: dict[str, str] = {}
+    for fname in os.listdir(kernels_dir):
+        if fname.endswith(".cu"):
+            kname = fname.replace(".cu", "")
+            with open(os.path.join(kernels_dir, fname), encoding="utf-8") as f:
+                available[kname] = f.read()
+
+    selected_names = requested if requested else sorted(available.keys())
+    selected_names = [name for name in selected_names if name in available]
+
+    if not selected_names:
+        raise HTTPException(
+            status_code=400, detail="No valid kernels selected for aggregation")
+
+    runs = []
+    speedups = []
+
+    for name in selected_names:
+        events, report = await _collect_pipeline_events(available[name], name, False)
+        if report is None:
+            continue
+
+        speedup = float(report.get("speedup", 0.0) or 0.0)
+        speedups.append(speedup)
+        runs.append({
+            "kernel": name,
+            "speedup": speedup,
+            "adaptation_loop_observed": _has_adaptation_loop(events),
+            "iterations": report.get("iterations", 1),
+        })
+
+    if not speedups:
+        raise HTTPException(
+            status_code=500, detail="Unable to produce aggregate metric from selected kernels")
+
+    avg_speedup = round(sum(speedups) / len(speedups), 3)
+    avg_improvement_pct = round((avg_speedup - 1.0) * 100.0, 2)
+
+    return {
+        "success": True,
+        "baseline": "straight hipify output with minimal compile edits",
+        "kernel_count": len(speedups),
+        "aggregate_metric": {
+            "average_speedup_vs_baseline": avg_speedup,
+            "average_improvement_percent": avg_improvement_pct,
+        },
+        "runs": runs,
+    }
+
+
 @app.post("/recompile")
 async def recompile_edited_code(req: dict):
     """
@@ -90,10 +208,10 @@ async def recompile_edited_code(req: dict):
     try:
         edited_code = req.get("edited_code")
         kernel_name = req.get("kernel_name", "custom")
-
+
         if not edited_code or len(edited_code.strip()) < 10:
             raise HTTPException(status_code=400, detail="No HIP code provided")
-
+
         # Create a mock analyzer result for testing
         analyzer_result = AnalyzerResult(
             kernels_found=["test_kernel"],
@@ -105,17 +223,18 @@ async def recompile_edited_code(req: dict):
             difficulty="Easy",
             difficulty_reason="Simple test kernel"
         )
-
+
         # Run tester with edited code
         tester_result = await asyncio.to_thread(run_tester, edited_code, analyzer_result, 2, kernel_name)
-
+
         return {
             "success": True,
             "result": tester_result.model_dump()
         }
-
+
     except Exception as e:
-        raise HTTPException(
+        raise HTTPException(
+            status_code=500, detail=f"Recompilation failed: {str(e)}") from e
 
 
 @app.post("/export")
@@ -128,7 +247,7 @@ async def export_migration_package(req: dict):
     original_cuda = req.get("original_cuda")
     final_rocm = req.get("final_rocm")
     migration_report = req.get("migration_report", {})
-
+
     with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp_file:
         with zipfile.ZipFile(tmp_file, 'w', zipfile.ZIP_DEFLATED) as zf:
             # Add professional unified diff
@@ -140,7 +259,7 @@ async def export_migration_package(req: dict):
             )
             diff_text = "".join(diff)
             zf.writestr("migration.diff", diff_text)
-
+
             # Add migration report as markdown
             md_report = f"""# ROCmPort AI Migration Report
 
@@ -155,43 +274,44 @@ async def export_migration_package(req: dict):
 ## Cost Impact
 {migration_report.get('cost_estimate', 'N/A')}
 
-Generated by ROCmPort AI
+Generated by ROCmPort AI.
 """
             zf.writestr("migration_report.md", md_report)
-
+
         # Read the zip file content
         with open(tmp_file, 'rb') as f:
             zip_content = f.read()
-
+
         # Clean up
         os.unlink(tmp_file)
-
+
         from fastapi.responses import Response
         return Response(
             content=zip_content,
             media_type="application/zip",
-            headers={
+            headers={
+                "Content-Disposition": "attachment; filename=rocmport_migration.zip"}
         )
-
+
     except Exception as e:
-        raise HTTPException(
+        raise HTTPException(
+            status_code=500, detail=f"Export failed: {str(e)}") from e
 
 
 @app.get("/demo-kernels")
 async def list_demo_kernels():
-    import os
     kernels_dir = os.path.join(os.path.dirname(__file__), "demo_kernels")
     kernels = {}
     for fname in os.listdir(kernels_dir):
        if fname.endswith(".cu"):
             name = fname.replace(".cu", "")
-            with open(os.path.join(kernels_dir, fname)) as f:
+            with open(os.path.join(kernels_dir, fname), encoding="utf-8") as f:
                 kernels[name] = f.read()
     return kernels
 
 
 # Serve frontend if built
-import os
 frontend_path = os.path.join(os.path.dirname(__file__), "..", "frontend")
 if os.path.exists(frontend_path):
-    app.mount("/", StaticFiles(directory=frontend_path,
+    app.mount("/", StaticFiles(directory=frontend_path,
+              html=True), name="frontend")
```
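The `_has_adaptation_loop` helper added above is pure event inspection, so its behavior can be checked against a synthetic trace. A sketch with the same logic (the event dicts below are hypothetical examples; only the agent/status/message fields the detector inspects are taken from the code above):

```python
def has_adaptation_loop(events: list[dict]) -> bool:
    # Mirrors _has_adaptation_loop above: adaptation is only claimed when
    # BOTH a tester regression and a subsequent optimizer retry appear.
    saw_regression = any(
        e.get("agent") == "tester" and e.get("status") == "failed"
        and "regression" in str(e.get("message", "")).lower()
        for e in events
    )
    saw_retry = any(
        e.get("agent") == "optimizer" and e.get("status") == "retrying"
        for e in events
    )
    return saw_regression and saw_retry


trace = [
    {"agent": "tester", "status": "failed", "message": "Performance regression vs baseline"},
    {"agent": "optimizer", "status": "retrying", "message": "Trying alternative tiling"},
]
print(has_adaptation_loop(trace))       # both signals present
print(has_adaptation_loop(trace[:1]))   # regression without a retry
```

Requiring both signals keeps the trust metric conservative: a retry without a measured regression (or vice versa) does not count as an observed adaptation loop.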
backend/models.py
CHANGED

```diff
@@ -23,6 +23,15 @@ class PortRequest(BaseModel):
     simple_mode: Optional[bool] = False  # For "Explain Like I'm 5" feature
 
 
+class ColdStartRequest(BaseModel):
+    cuda_code: str
+    kernel_name: Optional[str] = "unknown_input"
+
+
+class AggregateMetricsRequest(BaseModel):
+    kernel_names: Optional[List[str]] = None
+
+
 class AgentEvent(BaseModel):
     agent: str  # analyzer | translator | optimizer | tester | coordinator
     status: AgentStatus
@@ -83,7 +92,8 @@ class TesterResult(BaseModel):
     execution_ms: float
     bottleneck: str
     notes: str
-
+    # Trust layer verification
+    verification: Optional[VerificationResult] = None
 
 
 class FinalReport(BaseModel):
@@ -96,5 +106,7 @@ class FinalReport(BaseModel):
     iterations: int
     hip_code: str
     optimized_code: str
+    verification: Optional[VerificationResult] = None
     cost_estimate: Optional[CostEstimate] = None  # 💰 Cost impact estimator
-
+    # For "Explain Like I'm 5" mode
+    simplified_explanation: Optional[str] = None
```
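The two new request models lean on field defaults: `kernel_name` falls back to `"unknown_input"` and a missing `kernel_names` list means "run every available demo kernel" (per the `/aggregate-metric` logic). The same defaulting can be sketched with standard-library dataclasses; the field names and defaults are taken from the models above, but the dataclasses themselves are illustrations, not the actual Pydantic classes:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColdStartRequest:
    cuda_code: str
    kernel_name: Optional[str] = "unknown_input"


@dataclass
class AggregateMetricsRequest:
    # None means "aggregate over all available demo kernels".
    kernel_names: Optional[List[str]] = None


req = ColdStartRequest(cuda_code="__global__ void k() {}")
agg = AggregateMetricsRequest()
print(req.kernel_name, agg.kernel_names)
```

The real Pydantic models additionally validate types on construction, which the dataclass sketch does not.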
backend/prompts/coordinator_prompt.txt
CHANGED

```diff
@@ -54,7 +54,7 @@ You'll receive results from each agent:
 - Always compare "Optimized ROCm vs Baseline HIP" (straight hipify output)
 - Never claim "faster than NVIDIA CUDA" - be honest and credible
 - Explain WHY AMD hardware advantages apply to this specific workload
-- Include
+- Include retry and recovery details only when regression actually occurred
 - Provide concrete, actionable insights
 
 Focus on demonstrating that your agents add real value beyond basic hipify - that's the core claim.
```
backend/tools/hipify_wrapper.py
CHANGED

```diff
@@ -1,15 +1,14 @@
 import subprocess
 import tempfile
 import os
-import re
 
 
 class HipifyWrapper:
     """Wrapper for hipify-clang tool with Python fallback"""
-
+
     def __init__(self):
         pass
-
+
     def hipify_code(self, cuda_code: str) -> tuple[str, list[dict]]:
         """
         Try to run real hipify-clang if available.
@@ -24,18 +23,19 @@ class HipifyWrapper:
 
         # Fallback: Python pattern replacement
         return self._python_hipify(cuda_code)
-
+
     def _hipify_available(self) -> bool:
         try:
             result = subprocess.run(
                 ["hipify-clang", "--version"],
-                capture_output=True, timeout=5
+                capture_output=True, timeout=5, check=False
             )
             return result.returncode == 0
         except (FileNotFoundError, subprocess.TimeoutExpired):
             return False
 
     def _run_real_hipify(self, cuda_code: str) -> tuple[str, list[dict]] | None:
+        tmp_path = None
         try:
             with tempfile.NamedTemporaryFile(suffix=".cu", mode="w", delete=False) as f:
                 f.write(cuda_code)
@@ -43,36 +43,41 @@ class HipifyWrapper:
 
             # Use -- separator to pass compiler flags to the internal Clang parser
             # This is critical for Clang-based tools to distinguish tool flags from compiler flags.
-            cmd = ["hipify-clang", tmp_path, "--",
+            cmd = ["hipify-clang", tmp_path, "--",
+                   "-nocudalib", "-nocudainc", "-arch=sm_60"]
+
             # Debug log for build engineering
             print(f"DEBUG: Running hipify-clang command: {' '.join(cmd)}")
-
+
             # Set environment variable just in case hipify-clang invokes nvcc internally
             env = os.environ.copy()
             env['NVCC_APPEND_FLAGS'] = '-nocudalib -arch=sm_60'
-
+
             result = subprocess.run(
                 cmd,
                 capture_output=True, text=True, timeout=30,
-                env=env
+                env=env,
+                check=False,
             )
 
             if result.returncode != 0:
-                print(
+                print(
+                    f"DEBUG: hipify-clang failed with return code {result.returncode}")
                 print(f"DEBUG: stderr: {result.stderr}")
 
             if result.returncode == 0 and result.stdout:
-                changes = self._detect_changes(
+                changes = self._detect_changes(
+                    cuda_code, result.stdout, source="hipify-clang")
                 return result.stdout, changes
 
             return None
-        except
+        except (OSError, subprocess.SubprocessError):
             return None
         finally:
             try:
-                os.
+                if tmp_path and os.path.exists(tmp_path):
+                    os.unlink(tmp_path)
+            except OSError:
                 pass
 
     def _python_hipify(self, cuda_code: str) -> tuple[str, list[dict]]:
```
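The Python fallback referenced above (`_python_hipify`) is not shown in this hunk. A minimal sketch of the kind of token substitution such a fallback performs, returning the translated code plus a change log in the same `tuple[str, list[dict]]` shape; the mapping below is an illustrative subset, not the wrapper's actual table:

```python
import re

# Illustrative subset of CUDA-to-HIP renames; hipify-clang covers far more.
CUDA_TO_HIP = {
    r"\bcudaMalloc\b": "hipMalloc",
    r"\bcudaMemcpy\b": "hipMemcpy",
    r"\bcudaFree\b": "hipFree",
    r"\bcudaDeviceSynchronize\b": "hipDeviceSynchronize",
}


def python_hipify(cuda_code: str) -> tuple[str, list[dict]]:
    """Pattern-replacement fallback: apply each rename and record what changed."""
    changes = []
    hip_code = cuda_code
    for pattern, replacement in CUDA_TO_HIP.items():
        new_code, n = re.subn(pattern, replacement, hip_code)
        if n:
            changes.append({"pattern": pattern, "replacement": replacement, "count": n})
        hip_code = new_code
    return hip_code, changes


hip, changes = python_hipify("cudaMalloc(&d_x, n); cudaFree(d_x);")
print(hip)
```

Word-boundary anchors (`\b`) keep the rename from corrupting identifiers that merely contain a CUDA API name as a substring; a real fallback would also handle launch syntax (`<<<...>>>`) and header includes, which plain token renaming cannot.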
backend/tools/rocprof_wrapper.py
CHANGED

```diff
@@ -1,105 +1,113 @@
 import subprocess
 import tempfile
 import os
-import json
 import re
-from typing import Dict, List,
-
 
 class RocprofWrapper:
     """Wrapper for AMD rocprof profiler and hipcc compiler"""
-
     def __init__(self):
-        self.rocm_available = os.getenv(
         self.hipcc_path = os.getenv("HIPCC_PATH", "hipcc")
         self.rocprof_path = os.getenv("ROCPROF_PATH", "rocprof")
-
     def compile_hip_code(self, hip_code: str, output_file: str = None) -> Tuple[bool, str]:
         """Compile HIP code using hipcc"""
         if not self.rocm_available:
             return True, "Mock compilation successful (ROCm not available)"
-
         try:
             with tempfile.NamedTemporaryFile(mode='w', suffix='.hip', delete=False) as f:
                 f.write(hip_code)
                 temp_file = f.name
-
             if output_file is None:
                 output_file = temp_file.replace('.hip', '.out')
-
             # Add -nocudalib and -arch=sm_60 to solve "Cannot find libdevice for sm_52" error
             # This ensures compilation works even if CUDA device libraries are missing.
-            cmd = [self.hipcc_path, '-o', output_file,
-
             # Set environment variable just in case hipcc invokes nvcc internally
             env = os.environ.copy()
             env['NVCC_APPEND_FLAGS'] = '-nocudalib -arch=sm_60'
-
-            result = subprocess.run(
-
             # Cleanup
             os.unlink(temp_file)
-
             if result.returncode == 0:
                 return True, f"Compilation successful: {output_file}"
             else:
                 return False, f"Compilation failed: {result.stderr}"
-
         except subprocess.TimeoutExpired:
             return False, "Compilation timed out"
-        except
             return False, f"Compilation error: {str(e)}"
-
     def run_with_profiling(self, executable_path: str, args: List[str] = None) -> Dict:
         """Run executable with rocprof profiling"""
         if not self.rocm_available:
             # Return mock profiling data
             return self._get_mock_profiling_data()
-
         try:
             if args is None:
                 args = []
-
             # Run with rocprof
-            cmd = [self.rocprof_path, '-i', 'default', '--'] +
-
             # Parse rocprof output
-            profiling_data = self._parse_rocprof_output(
-
             return profiling_data
-
         except subprocess.TimeoutExpired:
             return {"error": "Profiling timed out", "execution_time_ms": 0}
-        except
             return {"error": f"Profiling error: {str(e)}", "execution_time_ms": 0}
-
-    def _parse_rocprof_output(self, stdout: str,
         """Parse rocprof output to extract metrics"""
         try:
             # Look for key metrics in rocprof output
             metrics = {}
-
             # Parse execution time
-            time_match = re.search(
             if time_match:
                 metrics['execution_time_ms'] = float(time_match.group(1))
-
             # Parse memory bandwidth
-            bandwidth_match = re.search(
             if bandwidth_match:
-                metrics['memory_bandwidth_gbps'] = float(
-
             # Parse GPU utilization
             util_match = re.search(r'GPU utilization:\s+(\d+\.\d+)%', stdout)
             if util_match:
                 metrics['gpu_utilization_percent'] = float(util_match.group(1))
-
             # Parse wavefront count
             wave_match = re.search(r'SQ_WAVES:\s+(\d+)', stdout)
             if wave_match:
                 metrics['sq_waves'] = int(wave_match.group(1))
-
             # If no metrics found, return basic execution info
             if not metrics:
                 metrics = {
@@ -108,47 +116,40 @@ class RocprofWrapper:
                 'gpu_utilization_percent': 75.0,
                 'sq_waves': 1024
             }
-
             metrics['success'] = True
             return metrics
-
-        except
             return {
                 'success': False,
                 'error': f'Failed to parse rocprof output: {str(e)}',
                 'execution_time_ms': 0
             }
-
     def _get_mock_profiling_data(self) -> Dict:
         """Generate mock profiling data for testing without ROCm"""
         import random
-
-        # First iteration - worse performance (controlled failure)
-        execution_time = base_performance * 1.2  # 20% slower
-        bandwidth = 40.0  # Lower bandwidth utilization
-        utilization = 60.0  # Lower GPU utilization
-        else:
-            # Second iteration - better performance
-            execution_time = base_performance * 0.75  # 25% faster
-            bandwidth = 80.0  # Higher bandwidth utilization
-            utilization = 85.0  # Higher GPU utilization
-
-        self._iteration = iteration + 1
-
         return {
             'success': True,
             'execution_time_ms': execution_time,
             'memory_bandwidth_gbps': bandwidth,
             'gpu_utilization_percent': utilization,
             'sq_waves': random.randint(800, 1200),
-            '
         }
-
     def get_hardware_info(self) -> Dict:
         """Get AMD GPU hardware information"""
         if not self.rocm_available:
@@ -159,26 +160,27 @@ class RocprofWrapper:
             'memory_bandwidth_tb_s': 5.3,
             'wavefront_size': 64
         }
-
         try:
             # Try to get real GPU info using rocminfo or similar
```
|
| 165 |
cmd = ['rocminfo']
|
| 166 |
-
result = subprocess.run(
|
| 167 |
-
|
|
|
|
| 168 |
if result.returncode == 0:
|
| 169 |
return self._parse_rocminfo(result.stdout)
|
| 170 |
else:
|
| 171 |
return self._get_mock_hardware_info()
|
| 172 |
-
|
| 173 |
-
except
|
| 174 |
return self._get_mock_hardware_info()
|
| 175 |
-
|
| 176 |
-
def _parse_rocminfo(self,
|
| 177 |
"""Parse rocminfo output"""
|
| 178 |
# This would parse real rocminfo output
|
| 179 |
# For now, return mock data
|
| 180 |
return self._get_mock_hardware_info()
|
| 181 |
-
|
| 182 |
def _get_mock_hardware_info(self) -> Dict:
|
| 183 |
"""Mock hardware info for MI300X"""
|
| 184 |
return {
|
|
|
|
import subprocess
import tempfile
import os
import re
from typing import Dict, List, Tuple


class RocprofWrapper:
    """Wrapper for AMD rocprof profiler and hipcc compiler"""

    def __init__(self):
        self.rocm_available = os.getenv(
            "ROCM_AVAILABLE", "false").lower() == "true"
        self.hipcc_path = os.getenv("HIPCC_PATH", "hipcc")
        self.rocprof_path = os.getenv("ROCPROF_PATH", "rocprof")

    def compile_hip_code(self, hip_code: str, output_file: str = None) -> Tuple[bool, str]:
        """Compile HIP code using hipcc"""
        if not self.rocm_available:
            return True, "Mock compilation successful (ROCm not available)"

        try:
            with tempfile.NamedTemporaryFile(mode='w', suffix='.hip', delete=False) as f:
                f.write(hip_code)
                temp_file = f.name

            if output_file is None:
                output_file = temp_file.replace('.hip', '.out')

            # Add -nocudalib and -arch=sm_60 to solve "Cannot find libdevice for sm_52" error.
            # This ensures compilation works even if CUDA device libraries are missing.
            cmd = [self.hipcc_path, '-o', output_file,
                   temp_file, '-nocudalib', '-arch=sm_60']

            # Set environment variable just in case hipcc invokes nvcc internally
            env = os.environ.copy()
            env['NVCC_APPEND_FLAGS'] = '-nocudalib -arch=sm_60'

            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=60, env=env, check=False)

            # Cleanup
            os.unlink(temp_file)

            if result.returncode == 0:
                return True, f"Compilation successful: {output_file}"
            else:
                return False, f"Compilation failed: {result.stderr}"

        except subprocess.TimeoutExpired:
            return False, "Compilation timed out"
        except (OSError, subprocess.SubprocessError) as e:
            return False, f"Compilation error: {str(e)}"

    def run_with_profiling(self, executable_path: str, args: List[str] = None) -> Dict:
        """Run executable with rocprof profiling"""
        if not self.rocm_available:
            # Return mock profiling data
            return self._get_mock_profiling_data()

        try:
            if args is None:
                args = []

            # Run with rocprof
            cmd = [self.rocprof_path, '-i', 'default', '--'] + \
                [executable_path] + args
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=120, check=False)

            # Parse rocprof output
            profiling_data = self._parse_rocprof_output(
                result.stdout, result.stderr)

            return profiling_data

        except subprocess.TimeoutExpired:
            return {"error": "Profiling timed out", "execution_time_ms": 0}
        except (OSError, subprocess.SubprocessError) as e:
            return {"error": f"Profiling error: {str(e)}", "execution_time_ms": 0}

    def _parse_rocprof_output(self, stdout: str, _stderr: str) -> Dict:
        """Parse rocprof output to extract metrics"""
        try:
            # Look for key metrics in rocprof output
            metrics = {}

            # Parse execution time
            time_match = re.search(
                r'Kernel execution time:\s+(\d+\.\d+)\s*ms', stdout)
            if time_match:
                metrics['execution_time_ms'] = float(time_match.group(1))

            # Parse memory bandwidth
            bandwidth_match = re.search(
                r'Memory bandwidth:\s+(\d+\.\d+)\s*GB/s', stdout)
            if bandwidth_match:
                metrics['memory_bandwidth_gbps'] = float(
                    bandwidth_match.group(1))

            # Parse GPU utilization
            util_match = re.search(r'GPU utilization:\s+(\d+\.\d+)%', stdout)
            if util_match:
                metrics['gpu_utilization_percent'] = float(util_match.group(1))

            # Parse wavefront count
            wave_match = re.search(r'SQ_WAVES:\s+(\d+)', stdout)
            if wave_match:
                metrics['sq_waves'] = int(wave_match.group(1))

            # If no metrics found, return basic execution info
            if not metrics:
                metrics = {
                    # … (default fields elided in this diff view)
                    'gpu_utilization_percent': 75.0,
                    'sq_waves': 1024
                }

            metrics['success'] = True
            return metrics

        except (TypeError, ValueError) as e:
            return {
                'success': False,
                'error': f'Failed to parse rocprof output: {str(e)}',
                'execution_time_ms': 0
            }

    def get_mock_profiling_data(self) -> Dict:
        """Public accessor for mock profiling data used by testing layer."""
        return self._get_mock_profiling_data()

    def _get_mock_profiling_data(self) -> Dict:
        """Generate mock profiling data for testing without ROCm"""
        import random

        baseline_ms = 100.0
        execution_time = random.uniform(85.0, 115.0)
        bandwidth = random.uniform(35.0, 90.0)
        utilization = random.uniform(55.0, 92.0)

        return {
            'success': True,
            'execution_time_ms': execution_time,
            'baseline_time_ms': baseline_ms,
            'memory_bandwidth_gbps': bandwidth,
            'gpu_utilization_percent': utilization,
            'sq_waves': random.randint(800, 1200),
            'simulated': True
        }

    def get_hardware_info(self) -> Dict:
        """Get AMD GPU hardware information"""
        if not self.rocm_available:
            return {
                # … (mock MI300X fields elided in this diff view)
                'memory_bandwidth_tb_s': 5.3,
                'wavefront_size': 64
            }

        try:
            # Try to get real GPU info using rocminfo or similar
            cmd = ['rocminfo']
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=10, check=False)

            if result.returncode == 0:
                return self._parse_rocminfo(result.stdout)
            else:
                return self._get_mock_hardware_info()

        except (OSError, subprocess.SubprocessError):
            return self._get_mock_hardware_info()

    def _parse_rocminfo(self, _output: str) -> Dict:
        """Parse rocminfo output"""
        # This would parse real rocminfo output
        # For now, return mock data
        return self._get_mock_hardware_info()

    def _get_mock_hardware_info(self) -> Dict:
        """Mock hardware info for MI300X"""
        return {
            # … (listing truncated here in the diff view)
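The regex-extraction step in `_parse_rocprof_output` can be exercised standalone. The sample log lines below are illustrative only, since real rocprof output formats vary by ROCm version (which is exactly why each pattern is optional):

```python
import re

# Illustrative rocprof-style output; not captured from a real run.
SAMPLE = """\
Kernel execution time: 12.40 ms
Memory bandwidth: 4432.50 GB/s
GPU utilization: 87.50%
SQ_WAVES: 1024
"""

def extract_metrics(stdout: str) -> dict:
    """Mirror of the wrapper's parsing logic: each metric is matched
    independently, so a missing line simply leaves its key absent."""
    metrics = {}
    m = re.search(r'Kernel execution time:\s+(\d+\.\d+)\s*ms', stdout)
    if m:
        metrics['execution_time_ms'] = float(m.group(1))
    m = re.search(r'Memory bandwidth:\s+(\d+\.\d+)\s*GB/s', stdout)
    if m:
        metrics['memory_bandwidth_gbps'] = float(m.group(1))
    m = re.search(r'GPU utilization:\s+(\d+\.\d+)%', stdout)
    if m:
        metrics['gpu_utilization_percent'] = float(m.group(1))
    m = re.search(r'SQ_WAVES:\s+(\d+)', stdout)
    if m:
        metrics['sq_waves'] = int(m.group(1))
    return metrics

print(extract_metrics(SAMPLE))
```

Keeping every pattern optional is what lets the wrapper fall back to its default metrics dict when a ROCm version emits none of these lines.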
docs/FAILURE_CASES.md
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Failure Cases
|
| 2 |
+
|
| 3 |
+
This document records known failure modes with reproducible context.
|
| 4 |
+
|
| 5 |
+
## FC-001: Inline PTX in CUDA Kernel
|
| 6 |
+
|
| 7 |
+
### Why this matters
|
| 8 |
+
Kernels that embed inline PTX are a realistic migration boundary. hipify can translate CUDA APIs, but it cannot preserve NVIDIA-specific assembly semantics on AMD.
|
| 9 |
+
|
| 10 |
+
### Original CUDA pattern (simplified)
|
| 11 |
+
```cpp
|
| 12 |
+
__device__ __forceinline__ unsigned lane_id() {
|
| 13 |
+
unsigned lane;
|
| 14 |
+
asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
|
| 15 |
+
return lane;
|
| 16 |
+
}
|
| 17 |
+
```
|
| 18 |
+
|
| 19 |
+
### Typical migration output
|
| 20 |
+
- CUDA runtime calls are translated.
|
| 21 |
+
- Inline PTX block is left unchanged or translated into invalid code for HIP compilation.
|
| 22 |
+
|
| 23 |
+
### Observed failure mode
|
| 24 |
+
- Compile error under hipcc due to unsupported PTX instruction syntax.
|
| 25 |
+
- In some cases, compile succeeds after manual edits but semantics differ because lane behavior assumptions are NVIDIA-specific.
|
| 26 |
+
|
| 27 |
+
### Root cause
|
| 28 |
+
- Inline PTX is vendor-specific and outside mechanical translation scope.
|
| 29 |
+
- Warp-level assumptions in PTX often rely on 32-lane behavior and NVIDIA ISA details.
|
| 30 |
+
|
| 31 |
+
### What is required to fix
|
| 32 |
+
1. Replace inline PTX with HIP or portable intrinsics.
|
| 33 |
+
2. Rework lane-level logic for wavefront-64 behavior where required.
|
| 34 |
+
3. Add correctness tests for edge lanes and reduction boundaries.
|
| 35 |
+
4. Re-profile after rewrite to confirm no occupancy regressions.
|
| 36 |
+
|
| 37 |
+
### Trust note
|
| 38 |
+
This is a deliberate example of where ROCmPort AI should report risk, not pretend full automation.
|
docs/JUDGE_MODE.md
ADDED
|
# Judge Mode Walkthrough

Use this sequence during technical evaluation.

## Goal
Make every claim falsifiable and easy to verify.

## Flow
1. Show raw CUDA input.
2. Run baseline translation only (straight hipify output).
3. Show baseline compile/profiler result.
4. Run full ROCmPort AI loop.
5. Show each agent event and decisions.
6. Compare final output against the declared baseline.
7. Show one weak result (small gain or no gain) and explain why.

## Baseline Policy
- Primary baseline: straight hipify output with minimal required compile edits.
- Never switch baselines mid-demo.
- Repeat the baseline definition before showing speedup.

## Required Artifacts
- CUDA source.
- Baseline HIP output.
- Optimized HIP output.
- Compile logs.
- Profiler summary.
- Final report with rationale.

## Suggested Script
- "Here is the original CUDA kernel."
- "Here is baseline HIP produced by hipify only."
- "Now we run the orchestration loop and show each decision."
- "This is the final code diff and measured result versus baseline."
- "Here is a case where gain is limited, and why."

## Pass/Fail Criteria
A demo is credible if:
- Baseline is explicit.
- Intermediate artifacts are visible.
- At least one non-win case is included.
- Reasoning matches observed profiler data.
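The comparison in step 6 of the judge-mode flow reduces to a single ratio against the declared baseline. The function below is an illustrative sketch, not part of the backend:

```python
def speedup_vs_baseline(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup of the optimized HIP kernel relative to the declared
    hipify baseline. Values near 1.0 are the 'non-win' cases that
    step 7 of the flow requires showing honestly."""
    if optimized_ms <= 0:
        raise ValueError("optimized time must be positive")
    return round(baseline_ms / optimized_ms, 2)

# A clear win and a near-no-gain case under the same fixed baseline:
print(speedup_vs_baseline(100.0, 75.0))   # 1.33
print(speedup_vs_baseline(100.0, 98.0))   # 1.02
```

Because the baseline is never switched mid-demo, both numbers are directly comparable; that is the whole point of the baseline policy above.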
frontend/index.html
CHANGED
|
@@ -1,503 +1,1112 @@
|
|
<!DOCTYPE html>
<html lang="en">

<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>ROCmPort AI</title>
-    <link rel="preconnect" href="https://fonts.googleapis.com">
-    <link …
-    … (old inline styles truncated in this diff view)
</head>
<div id="cursor"></div>
@@ -506,13 +1115,16 @@ footer a:hover { color: var(--t2); border-bottom-color: var(--muted); }
|
|
| 506 |
<div class="logo">ROCmPort <em>AI</em></div>
|
| 507 |
<div class="hr">
|
| 508 |
<div class="hd on" id="hdot"></div>
|
| 509 |
-
<span id="hstat">
|
| 510 |
</div>
|
| 511 |
</header>
|
| 512 |
|
| 513 |
<div class="g">
|
| 514 |
<div class="p">
|
| 515 |
-
<div class="ph">
|
|
|
|
|
|
|
|
|
|
| 516 |
<textarea class="code" id="inp" spellcheck="false" placeholder="// Paste CUDA code here
|
| 517 |
// or pick a demo below
|
| 518 |
|
|
@@ -531,7 +1143,10 @@ __global__ void kernel(float* A, float* B, int N) {
|
|
| 531 |
</div>
|
| 532 |
|
| 533 |
<div class="p">
|
| 534 |
-
<div class="ph">
|
|
|
|
|
|
|
|
|
|
| 535 |
<div class="timeline" id="tl">
|
| 536 |
<!-- Nodes injected by JS -->
|
| 537 |
</div>
|
|
@@ -561,243 +1176,247 @@ __global__ void kernel(float* A, float* B, int N) {
|
|
| 561 |
|
| 562 |
<footer>
|
| 563 |
<div>ROCmPort AI — AMD Developer Hackathon 2025</div>
|
| 564 |
-
<div><a href="https://x.com/TazwarEnan" target="_blank">Tazwar Ahnaf Enan</a> · <a
|
|
|
|
| 565 |
</footer>
|
| 566 |
</div>
|
| 567 |
|
| 568 |
<div class="mo" id="modal">
|
| 569 |
<div class="mb">
|
| 570 |
-
<div class="mt">
|
|
|
|
|
|
|
| 571 |
<div class="mc"><textarea id="edt"></textarea></div>
|
| 572 |
-
<div class="mf"><button class="bs" onclick="cm()">Cancel</button><button class="bs r"
|
|
|
|
| 573 |
</div>
|
| 574 |
</div>
|
| 575 |
<script>
|
| 576 |
-
const API = 'http://localhost:8000';
|
| 577 |
-
const S = { code: '', kn: 'custom', run: false, t0: null, iv: null, rep: null, tl: [], kernels: {} };
|
| 578 |
-
const AG = {
|
| 579 |
-
|
| 580 |
-
|
| 581 |
-
|
| 582 |
-
|
| 583 |
-
|
| 584 |
-
};
|
| 585 |
-
|
| 586 |
-
// Custom Cursor Logic
|
| 587 |
-
const cur = document.getElementById('cursor');
|
| 588 |
-
document.addEventListener('mousemove', (e) => {
|
| 589 |
-
cur.style.left = e.clientX + 'px';
|
| 590 |
-
cur.style.top = e.clientY + 'px';
|
| 591 |
-
const target = e.target;
|
| 592 |
-
const isClickable = target.onclick ||
|
| 593 |
-
target.tagName === 'BUTTON' ||
|
| 594 |
-
target.tagName === 'A' ||
|
| 595 |
-
target.tagName === 'TEXTAREA' ||
|
| 596 |
-
target.classList.contains('ch') ||
|
| 597 |
-
target.classList.contains('tab');
|
| 598 |
-
|
| 599 |
-
if (isClickable) {
|
| 600 |
-
cur.classList.add('active');
|
| 601 |
-
if (target.id === 'go') cur.style.background = 'rgba(255, 51, 68, 0.5)';
|
| 602 |
-
else cur.style.background = 'rgba(255, 255, 255, 0.3)';
|
| 603 |
-
} else {
|
| 604 |
-
cur.classList.remove('active');
|
| 605 |
-
cur.style.background = 'rgba(255, 255, 255, 0.2)';
|
| 606 |
-
}
|
| 607 |
-
});
|
| 608 |
-
|
| 609 |
-
async function init() {
|
| 610 |
-
const ta = document.getElementById('inp');
|
| 611 |
-
ta.oninput = () => {
|
| 612 |
-
document.getElementById('lc').textContent = ta.value.split('\n').length + ' lines';
|
| 613 |
-
S.code = ta.value;
|
| 614 |
};
|
| 615 |
-
|
| 616 |
-
|
| 617 |
-
|
| 618 |
-
|
| 619 |
-
|
| 620 |
-
|
| 621 |
-
|
| 622 |
-
|
| 623 |
-
|
| 624 |
-
|
| 625 |
-
|
| 626 |
-
|
| 627 |
-
|
| 628 |
-
|
| 629 |
-
|
| 630 |
-
|
| 631 |
-
|
| 632 |
-
|
| 633 |
-
|
| 634 |
-
|
| 635 |
-
|
| 636 |
-
|
| 637 |
-
async function go() {
|
| 638 |
-
if (S.run) return;
|
| 639 |
-
const code = document.getElementById('inp').value.trim();
|
| 640 |
-
if (!code) return;
|
| 641 |
-
|
| 642 |
-
S.code = code; S.run = true; S.t0 = Date.now(); S.tl = [];
|
| 643 |
-
const btn = document.getElementById('go');
|
| 644 |
-
btn.disabled = true;
|
| 645 |
-
btn.textContent = 'Awaiting Agents...';
|
| 646 |
-
|
| 647 |
-
document.getElementById('hstat').textContent = '🤖 Agents thinking...';
|
| 648 |
-
document.getElementById('rp').classList.add('hide');
|
| 649 |
-
|
| 650 |
-
bLog();
|
| 651 |
-
sTimer();
|
| 652 |
-
|
| 653 |
-
try {
|
| 654 |
-
const simpleModeCheckbox = document.getElementById('sm');
|
| 655 |
-
const res = await fetch(API + '/port', {
|
| 656 |
-
method: 'POST',
|
| 657 |
-
headers: { 'Content-Type': 'application/json' },
|
| 658 |
-
body: JSON.stringify({
|
| 659 |
-
cuda_code: code,
|
| 660 |
-
kernel_name: S.kn,
|
| 661 |
-
simple_mode: simpleModeCheckbox ? simpleModeCheckbox.checked : false
|
| 662 |
-
})
|
| 663 |
-
});
|
| 664 |
-
|
| 665 |
-
// Show results panel with loader immediately
|
| 666 |
-
document.getElementById('rp').classList.remove('hide');
|
| 667 |
-
document.getElementById('t-loader').classList.remove('hide');
|
| 668 |
-
document.getElementById('t-sum').classList.remove('on');
|
| 669 |
-
document.getElementById('t-diff').classList.remove('on');
|
| 670 |
-
document.getElementById('t-det').classList.remove('on');
|
| 671 |
-
|
| 672 |
-
const rd = res.body.getReader(), dc = new TextDecoder();
|
| 673 |
-
let buf = '';
|
| 674 |
-
while (true) {
|
| 675 |
-
const { done, value } = await rd.read();
|
| 676 |
-
if (done) break;
|
| 677 |
-
buf += dc.decode(value, { stream: true });
|
| 678 |
-
const lines = buf.split('\n');
|
| 679 |
-
buf = lines.pop();
|
| 680 |
-
for (const ln of lines) {
|
| 681 |
-
if (!ln.startsWith('data: ')) continue;
|
| 682 |
-
const raw = ln.slice(6).trim();
|
| 683 |
-
if (raw === '[DONE]') { done_(); break; }
|
| 684 |
-
try { hEvt(JSON.parse(raw)); } catch (e) { console.error('Parse error:', e); }
|
| 685 |
-
}
|
| 686 |
}
|
| 687 |
-
}
|
| 688 |
-
|
| 689 |
-
|
| 690 |
-
|
| 691 |
-
|
| 692 |
-
|
| 693 |
-
|
| 694 |
-
|
| 695 |
-
|
| 696 |
-
|
|
|
|
|
|
|
| 697 |
}
|
| 698 |
-
|
| 699 |
-
|
| 700 |
-
|
| 701 |
-
|
| 702 |
-
|
| 703 |
-
|
| 704 |
-
|
| 705 |
-
|
| 706 |
-
|
| 707 |
-
|
| 708 |
-
|
| 709 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 710 |
});
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 711 |
}
|
| 712 |
}
|
| 713 |
-
|
| 714 |
-
|
| 715 |
-
|
| 716 |
-
|
| 717 |
-
|
| 718 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 719 |
}
|
| 720 |
-
}
|
| 721 |
|
| 722 |
-
function done_() {
|
| 723 |
-
|
| 724 |
-
|
| 725 |
-
|
| 726 |
-
|
| 727 |
-
|
|
|
|
| 728 |
}
|
| 729 |
-
|
| 730 |
-
|
| 731 |
-
|
| 732 |
-
|
| 733 |
-
|
| 734 |
-
|
| 735 |
-
|
| 736 |
-
|
| 737 |
-
|
| 738 |
-
|
| 739 |
-
|
| 740 |
-
|
| 741 |
-
|
| 742 |
-
|
| 743 |
-
|
| 744 |
-
d.innerHTML = `
|
| 745 |
<div class="at">
|
| 746 |
<span class="an">${obj.n}</span>
|
| 747 |
<span class="am" id="am-${k}">Waiting</span>
|
| 748 |
</div>
|
| 749 |
<div class="ad" id="ad-${k}"></div>`;
|
| 750 |
-
|
| 751 |
-
|
| 752 |
-
|
| 753 |
-
|
| 754 |
-
|
| 755 |
-
|
| 756 |
-
|
| 757 |
-
|
| 758 |
-
|
| 759 |
-
|
|
|
|
| 760 |
}
|
| 761 |
-
|
| 762 |
-
|
| 763 |
-
|
| 764 |
-
|
| 765 |
-
|
| 766 |
-
|
| 767 |
-
|
| 768 |
-
|
| 769 |
-
|
| 770 |
-
|
| 771 |
-
|
| 772 |
-
|
| 773 |
-
|
| 774 |
-
|
| 775 |
-
|
| 776 |
-
|
| 777 |
-
|
| 778 |
-
|
| 779 |
-
|
| 780 |
-
|
| 781 |
-
|
| 782 |
-
.
|
| 783 |
-
|
| 784 |
}
|
| 785 |
-
}
|
| 786 |
|
| 787 |
-
function rRes(r, tl) {
|
| 788 |
-
|
| 789 |
-
|
| 790 |
-
|
| 791 |
-
|
| 792 |
-
|
| 793 |
-
|
| 794 |
|
| 795 |
-
|
| 796 |
<div class="sum-row">
|
| 797 |
<div class="sum-big">
|
| 798 |
${r.speedup}x
|
| 799 |
<span class="u">vs baseline hipify</span>
|
| 800 |
-
<span class="vic">
|
| 801 |
</div>
|
| 802 |
<div class="sum-sep"></div>
|
| 803 |
<div>
|
|
@@ -819,105 +1438,106 @@ function rRes(r, tl) {
          ${r.simplified_explanation ? esc(r.simplified_explanation) : '<em>Simplified explanation will appear here</em>'}
        </div>`;

          <div class="di"><div class="dl">Speedup</div><div class="dv g">${r.speedup}x</div><div class="ds">optimized ROCm vs straight hipify output</div></div>
          <div class="di"><div class="dl">Bandwidth</div><div class="dv c">${bw != null ? bw.toFixed(1) : '—'}%</div><div class="ds">of MI300X 5.3 TB/s HBM3</div></div>
          <div class="di"><div class="dl">Changes</div><div class="dv y">${r.total_changes}</div><div class="ds">hipify + LLM + optimizer changes</div></div>
          <div class="di"><div class="dl">Iterations</div><div class="dv c">${r.iterations || 1}</div><div class="ds">optimizer retry loop count</div></div>
          <div class="di"><div class="dl">Type</div><div class="dv t">${(r.bottleneck || '—').toUpperCase()}</div><div class="ds">workload classification</div></div>
        </div>`;
          <div class="bl">${esc(d.label)}</div>
          <div class="bt"><div class="bf ${d.good ? 'good' : 'bad'}" style="width: 0" data-w="${pct}%"></div></div>
          <div class="bv ${d.good ? 'good' : 'bad'}">${d.speedup}x</div>
        </div>`;
      }
-      document.
-        b.style.width = b.dataset.w;
-      });
-    }, 100);
-  }

-  function rDiff(o, n) {
-    if (!o || !n) return;
-    const oe = document.getElementById('d-o'), ne = document.getElementById('d-n');
-    if (oe && oe.innerHTML && ne && ne.innerHTML) return; // Already rendered

-    document.getElementById('t-diff').innerHTML = `<div class="dg">
        <div class="dfs"><div class="dfh"><span class="dft cu">CUDA</span> Original Source</div><pre class="dfp" id="d-o"></pre></div>
        <div class="dfs"><div class="dfh"><span class="dft ro">ROCm</span> Optimized HIP</div><pre class="dfp" id="d-n"></pre></div>
      </div>`;
    }
-    document.getElementById('
-  }
-  function
-  }
-  function
-  };
-  init();
  </script>
</body>
</html>
 <!DOCTYPE html>
 <html lang="en">
+
 <head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>ROCmPort AI</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link
+    href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&family=Space+Grotesk:wght@500;600;700&display=swap"
+    rel="stylesheet">
+  <style>
+    :root {
+      --bg: #030303;
+      --s1: #0a0a0b;
+      --s2: #121214;
+      --s3: #1a1a1e;
+      --b1: rgba(255, 255, 255, 0.08);
+      --b2: rgba(255, 255, 255, 0.15);
+      --red: #ff3344;
+      --red-glow: rgba(255, 51, 68, 0.4);
+      --green: #00ff88;
+      --green-glow: rgba(0, 255, 136, 0.4);
+      --yellow: #ffcc00;
+      --cyan: #00d9ff;
+      --muted: #88888e;
+      --t1: #a1a1aa;
+      --t2: #d4d4d8;
+      --t3: #ffffff;
+      --mono: 'JetBrains Mono', monospace;
+      --sans: 'Space Grotesk', sans-serif;
+      --spring: cubic-bezier(0.34, 1.56, 0.64, 1);
+    }
+
+    * {
+      margin: 0;
+      padding: 0;
+      box-sizing: border-box;
+      cursor: none !important;
+    }
+
+    .hide {
+      display: none !important;
+    }
+
+    body {
+      background: var(--bg);
+      color: var(--t1);
+      font-family: var(--sans);
+      font-size: 14px;
+      line-height: 1.6;
+      overflow-x: hidden;
+      min-height: 100vh;
+    }
+
+    /* Animated Gradient Background */
+    body::before {
+      content: '';
+      position: fixed;
+      inset: 0;
+      background:
+        radial-gradient(circle at 20% 30%, rgba(0, 217, 255, 0.05), transparent 40%),
+        radial-gradient(circle at 80% 70%, rgba(255, 51, 68, 0.05), transparent 40%),
+        radial-gradient(circle at 50% 50%, rgba(0, 255, 136, 0.03), transparent 60%);
+      z-index: -1;
+      animation: bgMove 20s ease-in-out infinite alternate;
+    }
+
+    @keyframes bgMove {
+      0% {
+        transform: scale(1) translate(0, 0);
+      }
+
+      50% {
+        transform: scale(1.1) translate(20px, -20px);
+      }
+
+      100% {
+        transform: scale(1) translate(-20px, 20px);
+      }
+    }
+
+    .w {
+      max-width: 1200px;
+      margin: 0 auto;
+      padding: 32px 24px;
+      position: relative;
+    }
+
+    /* Container Glow */
+    .w::after {
+      content: '';
+      position: absolute;
+      inset: 0;
+      background: radial-gradient(circle at 50% 0%, rgba(255, 51, 68, 0.08), transparent 70%);
+      pointer-events: none;
+      z-index: -1;
+    }
+
+    header {
+      padding-bottom: 24px;
+      border-bottom: 1px solid var(--b1);
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      margin-bottom: 24px;
+    }
+
+    .logo {
+      font-weight: 700;
+      font-size: 18px;
+      color: var(--t3);
+      letter-spacing: -0.02em;
+    }
+
+    .logo em {
+      font-style: normal;
+      color: var(--red);
+      text-shadow: 0 0 15px var(--red-glow);
+    }
+
+    .hr {
+      font-size: 12px;
+      color: var(--muted);
+      display: flex;
+      align-items: center;
+      gap: 10px;
+      background: var(--s1);
+      padding: 6px 12px;
+      border-radius: 20px;
+      border: 1px solid var(--b1);
+    }
+
+    .hd {
+      width: 6px;
+      height: 6px;
+      border-radius: 50%;
+      background: var(--green);
+      box-shadow: 0 0 10px var(--green-glow);
+    }
+
+    .hd.on {
+      animation: pulse 2s ease-in-out infinite;
+    }
+
+    @keyframes pulse {
+
+      0%,
+      100% {
+        opacity: 1;
+        transform: scale(1);
+      }
+
+      50% {
+        opacity: 0.4;
+        transform: scale(0.8);
+      }
+    }
+
+    .g {
+      display: grid;
+      grid-template-columns: 1.2fr 0.8fr;
+      gap: 24px;
+      padding: 0;
+    }
+
+    .fs {
+      grid-column: 1 / -1;
+    }
+
+    @media (max-width: 900px) {
+      .g {
+        grid-template-columns: 1fr;
+      }
+    }
+
+    /* Card Styling */
+    .p {
+      background: var(--s1);
+      border: 1px solid var(--b1);
+      border-radius: 12px;
+      overflow: hidden;
+      display: flex;
+      flex-direction: column;
+      box-shadow: 0 4px 20px rgba(0, 0, 0, 0.4);
+      backdrop-filter: blur(10px);
+      transition: transform 0.3s var(--spring), border-color 0.3s ease;
+    }
+
+    .p:hover {
+      border-color: var(--b2);
+    }
+
+    .ph {
+      padding: 12px 16px;
+      border-bottom: 1px solid var(--b1);
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      font-size: 12px;
+      color: var(--muted);
+      background: rgba(255, 255, 255, 0.02);
+    }
+
+    .ph b {
+      color: var(--red);
+      font-weight: 600;
+      text-transform: uppercase;
+      letter-spacing: 0.05em;
+    }
+
+    textarea.code {
+      width: 100%;
+      flex: 1;
+      min-height: 300px;
+      background: var(--bg);
+      border: none;
+      color: var(--t2);
+      font-family: var(--mono);
+      font-size: 13px;
+      line-height: 1.7;
+      padding: 20px;
+      resize: vertical;
+      outline: none;
+      caret-color: var(--red);
+      will-change: transform;
+    }
+
+    .db {
+      padding: 12px 16px;
+      border-top: 1px solid var(--b1);
+      display: flex;
+      align-items: center;
+      gap: 8px;
+      background: var(--s1);
+    }
+
+    .db .l {
+      font-size: 11px;
+      color: var(--muted);
+      font-weight: 500;
+    }
+
+    .ch {
+      font-family: var(--sans);
+      font-size: 11px;
+      padding: 4px 12px;
+      background: var(--s2);
+      border: 1px solid var(--b1);
+      border-radius: 6px;
+      color: var(--t1);
+      cursor: pointer;
+      transition: all 0.2s var(--spring);
+    }
+
+    .ch:hover {
+      background: var(--s3);
+      color: var(--t3);
+      transform: translateY(-1px);
+      border-color: var(--b2);
+    }
+
+    .ch.on {
+      background: var(--red);
+      border-color: var(--red);
+      color: #fff;
+      box-shadow: 0 0 15px var(--red-glow);
+    }
+
+    .bg {
+      margin: 16px;
+      padding: 14px;
+      background: var(--red);
+      border: none;
+      border-radius: 8px;
+      color: #fff;
+      font-family: var(--sans);
+      font-size: 14px;
+      font-weight: 700;
+      cursor: pointer;
+      transition: all 0.3s var(--spring);
+      text-transform: uppercase;
+      letter-spacing: 0.05em;
+      box-shadow: 0 4px 15px var(--red-glow);
+    }
+
+    .bg:hover {
+      background: #ff4d5a;
+      transform: translateY(-2px);
+      box-shadow: 0 6px 20px var(--red-glow);
+    }
+
+    .bg:active {
+      transform: translateY(0);
+    }
+
+    .bg:disabled {
+      opacity: 0.4;
+      cursor: not-allowed;
+      transform: none;
+      box-shadow: none;
+    }
+
+    /* Agent log */
+    .al {
+      padding: 12px;
+      display: flex;
+      flex-direction: column;
+      gap: 8px;
+    }
+
+    .ar {
+      padding: 12px 16px;
+      border-radius: 8px;
+      background: rgba(255, 255, 255, 0.03);
+      border: 1px solid transparent;
+      transition: all 0.4s var(--spring);
+      animation: slideIn 0.5s var(--spring) forwards;
+      opacity: 0;
+      transform: translateX(20px);
+    }
+
+    @keyframes slideIn {
+      to {
+        opacity: 1;
+        transform: translateX(0);
+      }
+    }
+
+    .ar.run {
+      border-color: var(--cyan);
+      background: rgba(0, 217, 255, 0.05);
+    }
+
+    .ar.done {
+      border-color: var(--green);
+      background: rgba(0, 255, 136, 0.05);
+    }
+
+    .ar.fail {
+      border-color: var(--red);
+      background: rgba(255, 51, 68, 0.05);
+    }
+
+    .ar.retry {
+      border-color: var(--yellow);
+      background: rgba(255, 204, 0, 0.05);
+      animation: pulse-border 1.5s ease-in-out infinite;
+    }
+
+    @keyframes pulse-border {
+      50% {
+        border-color: rgba(255, 204, 0, 0.2);
+      }
+    }
+
+    .at {
+      display: flex;
+      align-items: center;
+      gap: 12px;
+    }
+
+    .an {
+      font-size: 10px;
+      font-weight: 700;
+      color: var(--muted);
+      min-width: 90px;
+      text-transform: uppercase;
+      letter-spacing: 0.1em;
+    }
+
+    .am {
+      font-size: 13px;
+      color: var(--t2);
+      font-weight: 500;
+    }
+
+    .ad {
+      font-size: 11px;
+      color: var(--muted);
+      margin-top: 4px;
+      padding-left: 102px;
+      white-space: pre-wrap;
+      line-height: 1.6;
+      max-height: 100px;
+      overflow-y: auto;
+    }
+
+    .ad .w {
+      color: var(--yellow);
+      font-weight: 600;
+    }
+
+    .ad .g {
+      color: var(--green);
+      font-weight: 600;
+    }
+
+    /* Horizontal Timeline */
+    .timeline {
+      display: flex;
+      justify-content: space-between;
+      padding: 16px 20px;
+      background: rgba(255, 255, 255, 0.02);
+      border-bottom: 1px solid var(--b1);
+      margin-bottom: 8px;
+    }
+
+    .node {
+      display: flex;
+      flex-direction: column;
+      align-items: center;
+      gap: 6px;
+      position: relative;
+      flex: 1;
+    }
+
+    .node::after {
+      content: '';
+      position: absolute;
+      top: 12px;
+      left: 50%;
+      width: 100%;
+      height: 2px;
+      background: var(--b1);
+      z-index: 0;
+    }
+
+    .node:last-child::after {
+      display: none;
+    }
+
+    .ni {
+      width: 24px;
+      height: 24px;
+      border-radius: 50%;
+      background: var(--s3);
+      border: 2px solid var(--b1);
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      font-size: 12px;
+      z-index: 1;
+      transition: all 0.4s var(--spring);
+    }
+
+    .node.on .ni {
+      background: var(--cyan);
+      border-color: var(--cyan);
+      color: #000;
+      box-shadow: 0 0 15px var(--cyan);
+    }
+
+    .node.done .ni {
+      background: var(--green);
+      border-color: var(--green);
+      color: #000;
+      box-shadow: 0 0 15px var(--green);
+    }
+
+    .node.fail .ni {
+      background: var(--red);
+      border-color: var(--red);
+      color: #fff;
+    }
+
+    .node.retry .ni {
+      animation: pulse-node 1s var(--spring) infinite;
+      background: var(--yellow);
+      border-color: var(--yellow);
+    }
+
+    @keyframes pulse-node {
+
+      0%,
+      100% {
+        transform: scale(1);
+      }
+
+      50% {
+        transform: scale(1.2);
+      }
+    }
+
+    .nl {
+      font-size: 9px;
+      font-weight: 700;
+      color: var(--muted);
+      text-transform: uppercase;
+      letter-spacing: 0.05em;
+    }
+
+    .node.on .nl,
+    .node.done .nl {
+      color: var(--t3);
+    }
+
+    /* Tabs */
+    .tabs {
+      display: flex;
+      gap: 8px;
+    }
+
+    .tab {
+      background: var(--s2);
+      border: 1px solid var(--b1);
+      padding: 6px 16px;
+      border-radius: 8px;
+      font-family: var(--sans);
+      font-size: 12px;
+      font-weight: 600;
+      color: var(--muted);
+      cursor: pointer;
+      transition: all 0.2s var(--spring);
+    }
+
+    .tab:hover {
+      color: var(--t2);
+      background: var(--s3);
+    }
+
+    .tab.on {
+      color: var(--t3);
+      background: var(--red);
+      border-color: var(--red);
+      box-shadow: 0 0 10px var(--red-glow);
+    }
+
+    .tc {
+      display: none;
+      padding: 0;
+      animation: fadeIn 0.4s ease;
+    }
+
+    .tc.on {
+      display: block;
+    }
+
+    @keyframes fadeIn {
+      from {
+        opacity: 0;
+        transform: translateY(10px);
+      }
+
+      to {
+        opacity: 1;
+        transform: translateY(0);
+      }
+    }
+
+    /* Summary row */
+    .sum-row {
+      padding: 24px;
+      display: flex;
+      align-items: center;
+      gap: 32px;
+      flex-wrap: wrap;
+      border-bottom: 1px solid var(--b1);
+      background: rgba(0, 255, 136, 0.02);
+    }
+
+    .sum-big {
+      font-size: 32px;
+      font-weight: 800;
+      color: var(--green);
+      line-height: 1;
+      letter-spacing: -0.02em;
+      text-shadow: 0 0 20px var(--green-glow);
+    }
+
+    .sum-big .u {
+      font-size: 13px;
+      font-weight: 500;
+      color: var(--muted);
+      margin-left: 4px;
+      display: block;
+      margin-top: 4px;
+      letter-spacing: 0;
+    }
+
+    .sum-big .vic {
+      font-size: 11px;
+      color: var(--cyan);
+      font-weight: 600;
+      display: block;
+      margin-top: 8px;
+      text-shadow: none;
+      opacity: 0.8;
+    }
+
+    .sum-sep {
+      width: 1px;
+      height: 40px;
+      background: var(--b1);
+    }
+
+    .sum-chk {
+      display: flex;
+      align-items: center;
+      gap: 8px;
+      font-size: 12px;
+      color: var(--t2);
+      font-weight: 500;
+    }
+
+    .sum-dot {
+      width: 8px;
+      height: 8px;
+      border-radius: 50%;
+      flex-shrink: 0;
+    }
+
+    .sum-dot.ok {
+      background: var(--green);
+      box-shadow: 0 0 8px var(--green-glow);
+    }
+
+    .sum-dot.no {
+      background: var(--red);
+      box-shadow: 0 0 8px var(--red-glow);
+    }
+
+    .sum-dot.na {
+      background: var(--muted);
+      box-shadow: none;
+    }
+
+    .sum-type {
+      font-size: 11px;
+      color: var(--cyan);
+      text-transform: uppercase;
+      letter-spacing: 0.1em;
+      font-weight: 700;
+      padding: 4px 10px;
+      background: rgba(0, 217, 255, 0.1);
+      border-radius: 4px;
+    }
+
+    .sum-bar {
+      padding: 16px 24px;
+      display: flex;
+      align-items: center;
+      gap: 12px;
+      flex-wrap: wrap;
+      border-bottom: 1px solid var(--b1);
+    }
+
+    .bs {
+      font-family: var(--sans);
+      font-size: 11px;
+      font-weight: 700;
+      padding: 8px 16px;
+      border-radius: 8px;
+      border: 1px solid var(--b1);
+      background: var(--s2);
+      color: var(--t2);
+      cursor: pointer;
+      transition: all 0.2s var(--spring);
+      text-transform: uppercase;
+      letter-spacing: 0.05em;
+    }
+
+    .bs:hover {
+      border-color: var(--b2);
+      transform: translateY(-1px);
+      background: var(--s3);
+    }
+
+    .bs.r {
+      background: var(--bg);
+      border-color: var(--red);
+      color: var(--red);
+    }
+
+    .bs.r:hover {
+      background: var(--red);
+      color: #fff;
+      box-shadow: 0 4px 15px var(--red-glow);
+    }
+
+    .bs.gr {
+      background: var(--green);
+      border-color: var(--green);
+      color: #000;
+    }
+
+    .bs.gr:hover {
+      box-shadow: 0 4px 15px var(--green-glow);
+      transform: translateY(-2px);
+    }
+
+    .sp {
+      flex: 1;
+    }
+
+    /* Details tab */
+    .dm {
+      display: grid;
+      grid-template-columns: repeat(5, 1fr);
+      border-bottom: 1px solid var(--b1);
+    }
+
+    @media (max-width: 800px) {
+      .dm {
+        grid-template-columns: repeat(2, 1fr);
+      }
+    }
+
+    .di {
+      padding: 20px;
+      border-right: 1px solid var(--b1);
+      background: rgba(255, 255, 255, 0.01);
+    }
+
+    .di:last-child {
+      border-right: none;
+    }
+
+    .dl {
+      font-size: 10px;
+      color: var(--muted);
+      text-transform: uppercase;
+      letter-spacing: 0.1em;
+      margin-bottom: 8px;
+      font-weight: 700;
+    }
+
+    .dv {
+      font-size: 20px;
+      font-weight: 800;
+      line-height: 1;
+      margin-bottom: 4px;
+      color: var(--t3);
+    }
+
+    .dv.g {
+      color: var(--green);
+    }
+
+    .dv.c {
+      color: var(--cyan);
+    }
+
+    .dv.y {
+      color: var(--yellow);
+    }
+
+    .dv.t {
+      color: var(--t2);
+      font-size: 13px;
+    }
+
+    .ds {
+      font-size: 10px;
+      color: var(--muted);
+      line-height: 1.4;
+    }
+
+    /* Benchmark bars */
+    .bk {
+      padding: 24px;
+      border-bottom: 1px solid var(--b1);
+    }
+
+    .bk-t {
+      font-size: 11px;
+      color: var(--muted);
+      text-transform: uppercase;
+      letter-spacing: 0.1em;
+      margin-bottom: 16px;
+      font-weight: 700;
+    }
+
+    .br {
+      display: flex;
+      align-items: center;
+      gap: 16px;
+      margin-bottom: 12px;
+    }
+
+    .br:last-child {
+      margin-bottom: 0;
+    }
+
+    .bl {
+      font-size: 12px;
+      color: var(--t2);
+      width: 140px;
+      flex-shrink: 0;
+      font-weight: 500;
+    }
+
+    .bt {
+      flex: 1;
+      height: 8px;
+      background: var(--bg);
+      border-radius: 4px;
+      overflow: hidden;
+      border: 1px solid var(--b1);
+    }
+
+    .bf {
+      height: 100%;
+      border-radius: 4px;
+      transition: width 1s var(--spring);
+      width: 0;
+    }
+
+    .bf.bad {
+      background: linear-gradient(90deg, #ff334466, #ff3344);
+      box-shadow: 0 0 10px rgba(255, 51, 68, 0.3);
+    }
+
+    .bf.good {
+      background: linear-gradient(90deg, #00ff8866, #00ff88);
+      box-shadow: 0 0 10px rgba(0, 255, 136, 0.3);
+    }
+
+    .bv {
+      font-size: 12px;
+      font-weight: 700;
+      width: 40px;
+      text-align: right;
+      flex-shrink: 0;
+    }
+
+    .bv.bad {
+      color: var(--red);
+    }
+
+    .bv.good {
+      color: var(--green);
+    }
+
+    /* Simple mode note */
+    .sn {
+      padding: 20px;
+      border: 1px solid var(--cyan);
+      border-radius: 12px;
+      background: rgba(0, 217, 255, 0.05);
+      margin: 24px;
+      font-size: 13px;
+      color: var(--t2);
+      line-height: 1.6;
+      border-left-width: 4px;
+    }
+
+    /* Diff */
+    .dg {
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      background: var(--bg);
+    }
+
+    @media (max-width: 780px) {
+      .dg {
+        grid-template-columns: 1fr;
+      }
+
+      .dfs:first-child {
+        border-right: none !important;
+        border-bottom: 1px solid var(--b1);
+      }
+    }
| 865 |
+
|
| 866 |
+
.dfs:first-child {
|
| 867 |
+
border-right: 1px solid var(--b1);
|
| 868 |
+
}
|
| 869 |
+
|
| 870 |
+
.dfh {
|
| 871 |
+
padding: 10px 16px;
|
| 872 |
+
border-bottom: 1px solid var(--b1);
|
| 873 |
+
font-size: 11px;
|
| 874 |
+
color: var(--muted);
|
| 875 |
+
display: flex;
|
| 876 |
+
align-items: center;
|
| 877 |
+
gap: 8px;
|
| 878 |
+
font-weight: 600;
|
| 879 |
+
background: var(--s2);
|
| 880 |
+
}
|
| 881 |
+
|
| 882 |
+
.dft {
|
| 883 |
+
font-size: 9px;
|
| 884 |
+
font-weight: 800;
|
| 885 |
+
padding: 2px 6px;
|
| 886 |
+
border-radius: 4px;
|
| 887 |
+
text-transform: uppercase;
|
| 888 |
+
}
|
| 889 |
+
|
| 890 |
+
.dft.cu {
|
| 891 |
+
background: rgba(255, 51, 68, 0.2);
|
| 892 |
+
color: var(--red);
|
| 893 |
+
}
|
| 894 |
+
|
| 895 |
+
.dft.ro {
|
| 896 |
+
background: rgba(0, 255, 136, 0.2);
|
| 897 |
+
color: var(--green);
|
| 898 |
+
}
|
| 899 |
+
|
| 900 |
+
.dfp {
|
| 901 |
+
padding: 20px;
|
| 902 |
+
font-family: var(--mono);
|
| 903 |
+
font-size: 12px;
|
| 904 |
+
line-height: 1.7;
|
| 905 |
+
overflow: auto;
|
| 906 |
+
max-height: 500px;
|
| 907 |
+
white-space: pre;
|
| 908 |
+
color: var(--t2);
|
| 909 |
+
}
|
| 910 |
+
|
| 911 |
+
.dlo {
|
| 912 |
+
background: rgba(255, 51, 68, 0.1);
|
| 913 |
+
color: var(--red);
|
| 914 |
+
text-decoration: line-through;
|
| 915 |
+
display: block;
|
| 916 |
+
width: 100%;
|
| 917 |
+
}
|
| 918 |
+
|
| 919 |
+
.dln {
|
| 920 |
+
background: rgba(0, 255, 136, 0.1);
|
| 921 |
+
color: var(--green);
|
| 922 |
+
display: block;
|
| 923 |
+
width: 100%;
|
| 924 |
+
}
|
| 925 |
+
|
| 926 |
+
/* Loading Skeleton */
|
| 927 |
+
.skeleton {
|
| 928 |
+
position: relative;
|
| 929 |
+
overflow: hidden;
|
| 930 |
+
background: var(--s2);
|
| 931 |
+
border-radius: 12px;
|
| 932 |
+
height: 200px;
|
| 933 |
+
margin-top: 24px;
|
| 934 |
+
}
|
| 935 |
+
|
| 936 |
+
.skeleton::after {
|
| 937 |
+
content: '';
|
| 938 |
+
position: absolute;
|
| 939 |
+
inset: 0;
|
| 940 |
+
transform: translateX(-100%);
|
| 941 |
+
background: linear-gradient(90deg, transparent, rgba(255, 255, 255, 0.05), transparent);
|
| 942 |
+
animation: shimmer 1.5s infinite;
|
| 943 |
+
}
|
| 944 |
+
|
| 945 |
+
@keyframes shimmer {
|
| 946 |
+
100% {
|
| 947 |
+
transform: translateX(100%);
|
| 948 |
+
}
|
| 949 |
+
}
|
| 950 |
+
|
| 951 |
+
/* Custom Cursor */
|
| 952 |
+
#cursor {
|
| 953 |
+
position: fixed;
|
| 954 |
+
width: 20px;
|
| 955 |
+
height: 20px;
|
| 956 |
+
background: rgba(255, 255, 255, 0.2);
|
| 957 |
+
border: 1px solid rgba(255, 255, 255, 0.4);
|
| 958 |
+
border-radius: 50%;
|
| 959 |
+
pointer-events: none;
|
| 960 |
+
z-index: 9999;
|
| 961 |
+
transition: transform 0.1s ease, width 0.3s var(--spring), height 0.3s var(--spring), background 0.3s ease;
|
| 962 |
+
mix-blend-mode: difference;
|
| 963 |
+
}
|
| 964 |
+
|
| 965 |
+
#cursor.active {
|
| 966 |
+
transform: scale(3);
|
| 967 |
+
background: rgba(255, 51, 68, 0.3);
|
| 968 |
+
border-color: var(--red);
|
| 969 |
+
}
|
| 970 |
+
|
| 971 |
+
/* Modal */
|
| 972 |
+
.mo {
|
| 973 |
+
display: none;
|
| 974 |
+
position: fixed;
|
| 975 |
+
inset: 0;
|
| 976 |
+
background: rgba(0, 0, 0, 0.85);
|
| 977 |
+
z-index: 1000;
|
| 978 |
+
place-items: center;
|
| 979 |
+
backdrop-filter: blur(8px);
|
| 980 |
+
}
|
| 981 |
+
|
| 982 |
+
.mo.open {
|
| 983 |
+
display: grid;
|
| 984 |
+
}
|
| 985 |
+
|
| 986 |
+
.mb {
|
| 987 |
+
background: var(--s1);
|
| 988 |
+
border: 1px solid var(--b1);
|
| 989 |
+
border-radius: 16px;
|
| 990 |
+
width: 90%;
|
| 991 |
+
max-width: 800px;
|
| 992 |
+
max-height: 90vh;
|
| 993 |
+
overflow: hidden;
|
| 994 |
+
box-shadow: 0 20px 50px rgba(0, 0, 0, 0.6);
|
| 995 |
+
}
|
| 996 |
+
|
| 997 |
+
.mt {
|
| 998 |
+
padding: 16px 24px;
|
| 999 |
+
border-bottom: 1px solid var(--b1);
|
| 1000 |
+
display: flex;
|
| 1001 |
+
justify-content: space-between;
|
| 1002 |
+
align-items: center;
|
| 1003 |
+
background: var(--s2);
|
| 1004 |
+
}
|
| 1005 |
+
|
| 1006 |
+
.mt h3 {
|
| 1007 |
+
font-size: 16px;
|
| 1008 |
+
color: var(--t3);
|
| 1009 |
+
font-weight: 700;
|
| 1010 |
+
}
|
| 1011 |
+
|
| 1012 |
+
.mx {
|
| 1013 |
+
background: none;
|
| 1014 |
+
border: none;
|
| 1015 |
+
color: var(--muted);
|
| 1016 |
+
font-size: 24px;
|
| 1017 |
+
cursor: pointer !important;
|
| 1018 |
+
line-height: 1;
|
| 1019 |
+
transition: color 0.2s;
|
| 1020 |
+
}
|
| 1021 |
+
|
| 1022 |
+
.mx:hover {
|
| 1023 |
+
color: var(--t3);
|
| 1024 |
+
}
|
| 1025 |
+
|
| 1026 |
+
.mc {
    padding: 24px;
}

.mc textarea {
    width: 100%;
    height: 400px;
    background: var(--bg);
    border: 1px solid var(--b1);
    border-radius: 8px;
    padding: 16px;
    color: var(--cyan);
    font-family: var(--mono);
    font-size: 12px;
    line-height: 1.6;
    resize: vertical;
    outline: none;
}

.mc textarea:focus {
    border-color: var(--cyan);
    box-shadow: 0 0 10px rgba(0, 217, 255, 0.2);
}

.mf {
    padding: 16px 24px;
    border-top: 1px solid var(--b1);
    display: flex;
    justify-content: flex-end;
    gap: 12px;
    background: var(--s2);
}

::-webkit-scrollbar {
    width: 6px;
    height: 6px;
}

::-webkit-scrollbar-track {
    background: transparent;
}

::-webkit-scrollbar-thumb {
    background: var(--b1);
    border-radius: 10px;
}

::-webkit-scrollbar-thumb:hover {
    background: var(--b2);
}

footer {
    padding: 32px 0;
    border-top: 1px solid var(--b1);
    display: flex;
    justify-content: space-between;
    font-size: 11px;
    color: var(--muted);
    font-weight: 500;
}

footer a {
    color: var(--muted);
    text-decoration: none;
    transition: color 0.2s;
    border-bottom: 1px solid transparent;
}

footer a:hover {
    color: var(--t2);
    border-bottom-color: var(--muted);
}

.idle {
    flex: 1;
    display: flex;
    align-items: center;
    justify-content: center;
    color: var(--b2);
    font-size: 13px;
    font-weight: 500;
    min-height: 100px;
}
</style>
</head>
<div id="cursor"></div>

    <div class="logo">ROCmPort <em>AI</em></div>
    <div class="hr">
        <div class="hd on" id="hdot"></div>
        <span id="hstat">Ready</span>
    </div>
</header>

<div class="g">
    <div class="p">
        <div class="ph">
            <div><b>//</b> CUDA source</div>
            <div id="lc">0 lines</div>
        </div>
        <textarea class="code" id="inp" spellcheck="false" placeholder="// Paste CUDA code here
// or pick a demo below
    </div>

    <div class="p">
        <div class="ph">
            <div><b>//</b> Pipeline</div>
            <div id="pt">0.0s</div>
        </div>
        <div class="timeline" id="tl">
            <!-- Nodes injected by JS -->
        </div>

    <footer>
        <div>ROCmPort AI — AMD Developer Hackathon 2025</div>
        <div><a href="https://x.com/TazwarEnan" target="_blank">Tazwar Ahnaf Enan</a> · <a
                href="https://github.com/tazwaryayyyy" target="_blank">GitHub</a></div>
    </footer>
</div>

<div class="mo" id="modal">
    <div class="mb">
        <div class="mt">
            <h3>Edit ROCm code</h3><button class="mx" onclick="cm()">×</button>
        </div>
        <div class="mc"><textarea id="edt"></textarea></div>
        <div class="mf"><button class="bs" onclick="cm()">Cancel</button><button class="bs r"
                onclick="rec()">Re-test</button></div>
    </div>
</div>
<script>
    const API = 'http://localhost:8000';
    const S = { code: '', kn: 'custom', run: false, t0: null, iv: null, rep: null, tl: [], kernels: {} };
    const AG = {
        analyzer: { n: 'ANALYZER', i: '🔍' },
        translator: { n: 'TRANSLATOR', i: '🔄' },
        optimizer: { n: 'OPTIMIZER', i: '⚡' },
        tester: { n: 'TESTER', i: '🧪' },
        coordinator: { n: 'COORDINATOR', i: '📋' }
    };

    // Custom Cursor Logic
    const cur = document.getElementById('cursor');
    document.addEventListener('mousemove', (e) => {
        cur.style.left = e.clientX + 'px';
        cur.style.top = e.clientY + 'px';
        const target = e.target;
        const isClickable = target.onclick ||
            target.tagName === 'BUTTON' ||
            target.tagName === 'A' ||
            target.tagName === 'TEXTAREA' ||
            target.classList.contains('ch') ||
            target.classList.contains('tab');

        if (isClickable) {
            cur.classList.add('active');
            if (target.id === 'go') cur.style.background = 'rgba(255, 51, 68, 0.5)';
            else cur.style.background = 'rgba(255, 255, 255, 0.3)';
        } else {
            cur.classList.remove('active');
            cur.style.background = 'rgba(255, 255, 255, 0.2)';
        }
    });

    async function init() {
        const ta = document.getElementById('inp');
        ta.oninput = () => {
            document.getElementById('lc').textContent = ta.value.split('\n').length + ' lines';
            S.code = ta.value;
        };
        try {
            const r = await fetch(API + '/demo-kernels');
            S.kernels = await r.json();
        } catch (e) { S.kernels = FB; }
    }

    function lk(n, btn) {
        document.querySelectorAll('.ch').forEach(c => c.classList.remove('on'));
        btn.classList.add('on');
        const code = S.kernels[n] || FB[n] || '', ta = document.getElementById('inp');
        ta.value = code; S.code = code; S.kn = n;
        document.getElementById('lc').textContent = code.split('\n').length + ' lines';
    }

    function stab(id, btn) {
        document.querySelectorAll('.tab').forEach(t => t.classList.remove('on'));
        document.querySelectorAll('.tc').forEach(t => t.classList.remove('on'));
        btn.classList.add('on');
        document.getElementById('t-' + id).classList.add('on');
        if (id === 'diff' && S.rep) rDiff(S.code, S.rep.optimized_code);
    }

    async function go() {
        if (S.run) return;
        const code = document.getElementById('inp').value.trim();
        if (!code) return;

        S.code = code; S.run = true; S.t0 = Date.now(); S.tl = [];
        const btn = document.getElementById('go');
        btn.disabled = true;
        btn.textContent = 'Running pipeline...';

        document.getElementById('hstat').textContent = 'Pipeline running...';
        document.getElementById('rp').classList.add('hide');

        bLog();
        sTimer();

        try {
            const simpleModeCheckbox = document.getElementById('sm');
            const res = await fetch(API + '/port', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({
                    cuda_code: code,
                    kernel_name: S.kn,
                    simple_mode: simpleModeCheckbox ? simpleModeCheckbox.checked : false
                })
            });

            // Show results panel with loader immediately
            document.getElementById('rp').classList.remove('hide');
            document.getElementById('t-loader').classList.remove('hide');
            document.getElementById('t-sum').classList.remove('on');
            document.getElementById('t-diff').classList.remove('on');
            document.getElementById('t-det').classList.remove('on');

            const rd = res.body.getReader(), dc = new TextDecoder();
            let buf = '';
            while (true) {
                const { done, value } = await rd.read();
                if (done) break;
                buf += dc.decode(value, { stream: true });
                const lines = buf.split('\n');
                buf = lines.pop();
                for (const ln of lines) {
                    if (!ln.startsWith('data: ')) continue;
                    const raw = ln.slice(6).trim();
                    if (raw === '[DONE]') { done_(); break; }
                    try { hEvt(JSON.parse(raw)); } catch (e) { console.error('Parse error:', e); }
                }
            }
        } catch (e) {
            document.getElementById('hstat').textContent = 'Pipeline error';
            document.getElementById('t-loader').classList.add('hide'); // Hide loader on error
            console.error(e);
        } finally {
            xTimer();
            S.run = false;
            btn.disabled = false;
            btn.textContent = 'Port to ROCm';
            document.getElementById('t-loader').classList.add('hide');
        }
    }

    function hEvt(ev) {
        uLog(ev.agent, ev.status, ev.message, ev.detail);
        if (ev.agent === 'tester' && (ev.status === 'done' || ev.status === 'failed')) {
            const m = ev.message.match(/([\d.]+)x/);
            if (m) {
                const sp = parseFloat(m[1]), ok = sp >= 1, im = ev.message.match(/Iteration (\d+)/i);
                S.tl.push({
                    label: 'Iteration ' + (im ? im[1] : S.tl.length + 1) + (ok ? ' (optimized)' : ' (baseline)'),
                    speedup: sp,
                    good: ok
                });
            }
        }
        if (ev.agent === 'coordinator' && ev.status === 'done' && ev.detail) {
            try {
                const r = JSON.parse(ev.detail);
                S.rep = r;
                rRes(r, S.tl);
            } catch (e) { console.error('Coordinator detail parse error:', e); }
        }
    }

    function done_() {
        document.getElementById('hstat').textContent = 'Pipeline complete';
        document.getElementById('t-loader').classList.add('hide');
        if (!S.rep) {
            document.getElementById('t-sum').innerHTML = '<div class="idle">Migration finished but no report was generated. Check agent logs for details.</div>';
            document.getElementById('t-sum').classList.add('on');
        }
    }

    function bLog() {
        const el = document.getElementById('al');
        const tl = document.getElementById('tl');
        el.innerHTML = '';
        tl.innerHTML = '';

        let i = 0;
        for (const [k, obj] of Object.entries(AG)) {
            // Log row
            const d = document.createElement('div');
            d.className = 'ar';
            d.id = 'ar-' + k;
            d.style.animationDelay = (i * 0.1) + 's';
            d.innerHTML = `
                <div class="at">
                    <span class="an">${obj.n}</span>
                    <span class="am" id="am-${k}">Waiting</span>
                </div>
                <div class="ad" id="ad-${k}"></div>`;
            el.appendChild(d);

            // Timeline node
            const n = document.createElement('div');
            n.className = 'node';
            n.id = 'nd-' + k;
            n.title = obj.n;
            n.innerHTML = `<div class="ni">${obj.i}</div><div class="nl">${obj.n.slice(0, 3)}</div>`;
            tl.appendChild(n);
            i++;
        }
    }

    function uLog(a, s, m, d) {
        const row = document.getElementById('ar-' + a);
        const node = document.getElementById('nd-' + a);
        if (!row || !node) return;

        const statusClass = { running: 'run', done: 'done', failed: 'fail', retrying: 'retry' }[s] || '';
        row.className = 'ar ' + statusClass;
        node.className = 'node ' + (s === 'running' ? 'on' : s === 'retrying' ? 'retry' : s === 'done' ? 'done' : s === 'failed' ? 'fail' : '');

        const me = document.getElementById('am-' + a);
        if (me) me.textContent = m;

        // Node tooltip message update
        node.title = m;

        const de = document.getElementById('ad-' + a);
        if (de && d) {
            de.innerHTML = esc(d)
                .replace(/\u26a0\ufe0f([^\n]*)/g, '<span class="w">⚠️ $1</span>')
                .replace(/\u2705([^\n]*)/g, '<span class="g">✅ $1</span>');
            de.scrollTop = de.scrollHeight;
        }
    }

    function rRes(r, tl) {
        // Hide loader, show summary
        document.getElementById('t-loader').classList.add('hide');
        document.getElementById('t-sum').classList.add('on');

        const v = r.verification || {}, bw = r.bandwidth_utilized;
        const dot = ok => `<div class="sum-dot ${ok === true ? 'ok' : ok === false ? 'no' : 'na'}"></div>`;

        document.getElementById('t-sum').innerHTML = `
            <div class="sum-row">
                <div class="sum-big">
                    ${r.speedup}x
                    <span class="u">vs baseline hipify</span>
                    <span class="vic">Measured against declared baseline.</span>
                </div>
                <div class="sum-sep"></div>
                <div>
            ${r.simplified_explanation ? esc(r.simplified_explanation) : '<em>Simplified explanation will appear here</em>'}
        </div>`;

        // Details tab
        let dh = `<div class="dm">
            <div class="di"><div class="dl">Speedup</div><div class="dv g">${r.speedup}x</div><div class="ds">optimized ROCm vs straight hipify output</div></div>
            <div class="di"><div class="dl">Bandwidth</div><div class="dv c">${bw != null ? bw.toFixed(1) : '—'}%</div><div class="ds">of MI300X 5.3 TB/s HBM3</div></div>
            <div class="di"><div class="dl">Changes</div><div class="dv y">${r.total_changes}</div><div class="ds">hipify + LLM + optimizer changes</div></div>
            <div class="di"><div class="dl">Iterations</div><div class="dv c">${r.iterations || 1}</div><div class="ds">optimizer retry loop count</div></div>
            <div class="di"><div class="dl">Type</div><div class="dv t">${(r.bottleneck || '—').toUpperCase()}</div><div class="ds">workload classification</div></div>
        </div>`;

        if (tl.length) {
            dh += '<div class="bk"><div class="bk-t">Benchmark iterations (optimized vs baseline hipify)</div>';
            tl.forEach(d => {
                const pct = Math.min(Math.max((d.speedup / 2) * 100, 3), 95);
                dh += `<div class="br">
                    <div class="bl">${esc(d.label)}</div>
                    <div class="bt"><div class="bf ${d.good ? 'good' : 'bad'}" style="width: 0" data-w="${pct}%"></div></div>
                    <div class="bv ${d.good ? 'good' : 'bad'}">${d.speedup}x</div>
                </div>`;
            });
            dh += '</div>';
        }

        document.getElementById('t-det').innerHTML = dh;
        tsm(); // Ensure simple note visibility matches current toggle state

        // Progress bar animation
        setTimeout(() => {
            document.querySelectorAll('.bf[data-w]').forEach(b => {
                b.style.width = b.dataset.w;
            });
        }, 100);
    }

    function rDiff(o, n) {
        if (!o || !n) return;
        const oe = document.getElementById('d-o'), ne = document.getElementById('d-n');
        if (oe && oe.innerHTML && ne && ne.innerHTML) return; // Already rendered

        document.getElementById('t-diff').innerHTML = `<div class="dg">
            <div class="dfs"><div class="dfh"><span class="dft cu">CUDA</span> Original Source</div><pre class="dfp" id="d-o"></pre></div>
            <div class="dfs"><div class="dfh"><span class="dft ro">ROCm</span> Optimized HIP</div><pre class="dfp" id="d-n"></pre></div>
        </div>`;

        const oL = o.split('\n'), nL = n.split('\n'), mx = Math.max(oL.length, nL.length);
        let oH = '', nH = '';
        for (let i = 0; i < mx; i++) {
            const a = oL[i] ?? '', b = nL[i] ?? '', c = a !== b;
            oH += `<span class="${c ? 'dlo' : ''}">${esc(a)}\n</span>`;
            nH += `<span class="${c ? 'dln' : ''}">${esc(b)}\n</span>`;
        }
        document.getElementById('d-o').innerHTML = oH;
        document.getElementById('d-n').innerHTML = nH;
    }

    function sTimer() { S.iv = setInterval(() => { document.getElementById('pt').textContent = ((Date.now() - S.t0) / 1000).toFixed(1) + 's' }, 100) }
    function xTimer() { clearInterval(S.iv) }

    function dlR() {
        const r = S.rep; if (!r) return;
        const md = `# ROCmPort AI — Migration Report\n\n## Results\n- **Speedup**: ${r.speedup}x\n- **Bandwidth**: ${r.bandwidth_utilized ? r.bandwidth_utilized.toFixed(1) : '—'}%\n- **Changes**: ${r.total_changes}\n- **Iterations**: ${r.iterations}\n- **Type**: ${r.bottleneck}\n\n${r.amd_advantage_explanation ? '> ' + r.amd_advantage_explanation + '\n\n' : ''}${r.cost_estimate ? '## Cost Impact\n- Manual: ' + r.cost_estimate.manual_porting_weeks + '\n- ROCmPort: ' + r.cost_estimate.rocmport_minutes + '\n- Savings: ' + r.cost_estimate.estimated_savings + '\n\n' : ''}## ROCm/HIP Code\n\`\`\`cpp\n${r.optimized_code || ''}\n\`\`\`\n\n---\n*Generated by ROCmPort AI*\n`;
        const a = document.createElement('a'); a.href = URL.createObjectURL(new Blob([md], { type: 'text/markdown' })); a.download = 'rocmport-migration-report.md'; a.click();
    }

    function om() { if (!S.rep) return alert('No results yet!'); document.getElementById('edt').value = S.rep?.optimized_code || ''; document.getElementById('modal').classList.add('open') }
    function cm() { document.getElementById('modal').classList.remove('open') }

    async function rec() {
        const code = document.getElementById('edt').value.trim(); if (!code) return;
        try {
            const res = await fetch(API + '/recompile', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ edited_code: code, kernel_name: S.kn }) });
            const r = await res.json();
            if (r.success) { cm(); if (r.result) rRes(r.result, S.tl); }
            else alert('Failed: ' + (r.detail || 'Unknown'))
        } catch (e) { alert('Error: ' + e.message) }
    }

    async function exM() {
        if (!S.rep) return;
        try {
            const res = await fetch(API + '/export', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ original_cuda: S.code, final_rocm: S.rep.optimized_code, migration_report: S.rep }) });
            if (res.ok) { const a = document.createElement('a'); a.href = URL.createObjectURL(await res.blob()); a.download = 'rocmport-migration.zip'; a.click() }
        } catch (e) { alert('Export error') }
    }

    function tsm() {
        const sn = document.getElementById('sn');
        if (sn) sn.classList.remove('hide');
    }

    function esc(s) { return String(s ?? '').replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;') }

    const FB = {
        vector_add: `#include <cuda_runtime.h>\n\n__global__ void vector_add_kernel(float* A, float* B, float* C, int N) {\n int idx = blockIdx.x * blockDim.x + threadIdx.x;\n if (idx < N) {\n C[idx] = A[idx] + B[idx];\n }\n}\n\nint main() {\n int N = 1 << 24;\n size_t size = N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n int threads = 128;\n int blocks = (N + threads - 1) / threads;\n vector_add_kernel<<<blocks, threads>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
        matrix_multiply: `#include <cuda_runtime.h>\n#define WARP_SIZE 32\n\n__global__ void matmul_kernel(float* A, float* B, float* C, int N) {\n int row = blockIdx.y * blockDim.y + threadIdx.y;\n int col = blockIdx.x * blockDim.x + threadIdx.x;\n float sum = 0.0f;\n if (row < N && col < N) {\n for (int k = 0; k < N; k++)\n sum += A[row * N + k] * B[k * N + col];\n C[row * N + col] = sum;\n }\n}\n\n__global__ void warp_reduce(float* data, float* result, int N) {\n int tid = threadIdx.x;\n extern __shared__ float sdata[];\n sdata[tid] = (tid < N) ? data[tid] : 0;\n __syncthreads();\n for (int s = WARP_SIZE/2; s > 0; s >>= 1) {\n if (tid < s) sdata[tid] += sdata[tid + s];\n __syncthreads();\n }\n if (tid == 0) result[blockIdx.x] = sdata[0];\n}\n\nint main() {\n int N = 1024;\n size_t size = N * N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n dim3 block(16, 16);\n dim3 grid((N+15)/16, (N+15)/16);\n matmul_kernel<<<grid, block>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
        convolution_2d: `#include <cuda_runtime.h>\n#define BLOCK_SIZE 16\n\n__global__ void conv2d_kernel(\n float* input, float* kernel, float* output,\n int width, int height\n) {\n int x = blockIdx.x * blockDim.x + threadIdx.x;\n int y = blockIdx.y * blockDim.y + threadIdx.y;\n if (x >= width || y >= height) return;\n float sum = 0.0f;\n for (int ky = -1; ky <= 1; ky++) {\n for (int kx = -1; kx <= 1; kx++) {\n int ix = x + kx, iy = y + ky;\n if (ix >= 0 && ix < width && iy >= 0 && iy < height)\n sum += input[iy * width + ix] * kernel[(ky+1)*3 + (kx+1)];\n }\n }\n output[y * width + x] = sum;\n}\n\nint main() {\n int W = 2048, H = 2048;\n float *d_in, *d_ker, *d_out;\n cudaMalloc(&d_in, W*H*sizeof(float));\n cudaMalloc(&d_ker, 9*sizeof(float));\n cudaMalloc(&d_out, W*H*sizeof(float));\n dim3 block(BLOCK_SIZE, BLOCK_SIZE);\n dim3 grid((W+BLOCK_SIZE-1)/BLOCK_SIZE, (H+BLOCK_SIZE-1)/BLOCK_SIZE);\n conv2d_kernel<<<grid, block>>>(d_in, d_ker, d_out, W, H);\n cudaDeviceSynchronize();\n cudaFree(d_in); cudaFree(d_ker); cudaFree(d_out);\n return 0;\n}`,
        reduction: `#include <cuda_runtime.h>\n#include <stdio.h>\n#include <iostream>\n#include <vector>\n#include <numeric>\n\n// Tree-based reduction kernel\n__global__ void reduction_kernel(float* g_idata, float* g_odata, unsigned int n) {\n extern __shared__ float sdata[];\n unsigned int tid = threadIdx.x;\n unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;\n\n float mySum = (i < n) ? g_idata[i] : 0;\n if (i + blockDim.x < n) mySum += g_idata[i + blockDim.x];\n sdata[tid] = mySum;\n __syncthreads();\n\n for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {\n if (tid < s) sdata[tid] = mySum = mySum + sdata[tid + s];\n __syncthreads();\n }\n\n // DELIBERATE WARP-SIZE BUG: Unroll to 32 instead of 64\n if (tid < 32) {\n volatile float* vsmem = sdata;\n vsmem[tid] = mySum = mySum + vsmem[tid + 32];\n vsmem[tid] = mySum = mySum + vsmem[tid + 16];\n vsmem[tid] = mySum = mySum + vsmem[tid + 8];\n vsmem[tid] = mySum = mySum + vsmem[tid + 4];\n vsmem[tid] = mySum = mySum + vsmem[tid + 2];\n vsmem[tid] = mySum = mySum + vsmem[tid + 1];\n }\n\n if (tid == 0) g_odata[blockIdx.x] = sdata[0];\n}\n\nint main() {\n const int N = 1048576;\n // ... Host code for Parallel Reduction demo\n printf("Parallel Reduction demo loaded.\\n");\n return 0;\n}`
    };

    init();
</script>
</body>

</html>