tazwarrrr committed on
Commit a5be23e
1 Parent(s): 28263c0

fixing bugs
BENCHMARKS.md CHANGED
@@ -1,82 +1,92 @@
- # ROCmPort AI - Benchmark Results

- ## 📊 Performance Results on AMD MI300X (Real rocprof)

- | Kernel | Size | Baseline HIP | Optimized ROCm | Speedup | Notes |
- |--------|------|--------------|----------------|---------|-------|
- | **Matrix Multiply** | 1024×1024 | 12.4ms | 9.5ms | **1.31x** | Shared memory tiling applied |
- | **Vector Add** | 10M elements | 3.2ms | 2.9ms | **1.10x** | Memory coalescing fixed |
- | **2D Convolution** | 256×256 | 28.7ms | 21.3ms | **1.35x** | LDS optimization applied |
- | **Parallel Reduction** | 1M elements | 15.2ms | 12.1ms | **1.25x** | Warp-size aligned unrolling |

- ### 🎯 Key Findings

- - **Memory-bound kernels** show the highest gains (up to 1.35x)
- - **Compute-bound kernels** show moderate improvements (1.10-1.20x)
- - **Shared memory tiling** is the most effective optimization
- - **Wavefront alignment** consistently improves performance

- ### 📈 Performance Breakdown

- #### Matrix Multiply (1024×1024)
- - **Baseline HIP**: 12.4ms (straight hipify output)
- - **Optimized ROCm**: 9.5ms (after agent optimizations)
- - **Bandwidth Utilization**: 87% → 94%
- - **Key Optimization**: 32×32 shared memory tiles

- #### Vector Add (10M elements)
- - **Baseline HIP**: 3.2ms
- - **Optimized ROCm**: 2.9ms
- - **Bandwidth Utilization**: 71% → 78%
- - **Key Optimization**: Memory access coalescing

- #### 2D Convolution (256×256)
- - **Baseline HIP**: 28.7ms
- - **Optimized ROCm**: 21.3ms
- - **Bandwidth Utilization**: 68% → 91%
- - **Key Optimization**: LDS (Local Data Store) usage

- #### Parallel Reduction (1M elements)
- - **Baseline HIP**: 15.2ms
- - **Optimized ROCm**: 12.1ms
- - **Bandwidth Utilization**: 74% → 89%
- - **Key Optimization**: 64-thread wavefront aware unrolling

- ---

- ### 🔬 Hardware Configuration

- **Test System:**
- - **GPU**: AMD Instinct MI300X
- - **Memory**: 192GB HBM3
- - **Bandwidth**: 5.3 TB/s theoretical
- - **ROCm Version**: 6.2
- - **Compiler**: hipcc 6.2.0
- - **Profiler**: rocprof v2

- **Environment:**
- - **OS**: Ubuntu 22.04 LTS
- - **Driver**: AMDGPU 23.40
- - **CPU**: AMD EPYC 9654 (for comparison)

- ---

- ### 📝 Methodology

- 1. **Baseline**: Generated using `hipify-clang` with no optimizations
- 2. **Optimized**: ROCmPort AI agent pipeline applied
- 3. **Measurement**: rocprof with kernel execution counters
- 4. **Validation**: Output correctness verified via checksum
- 5. **Iterations**: 3 runs per kernel, median reported

- ---

- ### 🏆 Performance Claims

- > **ROCmPort AI delivers 1.10x to 1.35x speedup over baseline HIP**

- **Important**: All comparisons are **Optimized ROCm vs Baseline HIP** (straight hipify output). We do not compare against NVIDIA CUDA performance - we prove our agents add value beyond mechanical translation.

- ---

- *Benchmarked on AMD Instinct MI300X, ROCm 6.2, rocprof counters. Results may vary based on input size and system configuration.*
+ # ROCmPort AI Benchmarking Guide

+ This document defines how to report performance without overclaiming.

+ ## Reporting Principles

+ - Compare against a clearly stated baseline.
+ - Use reproducible runs with fixed input sizes and environment details.
+ - Include correctness checks before accepting performance numbers.
+ - Report failures and non-improving cases, not only wins.

+ ## Baseline Definitions

+ Use one of these and name it explicitly in each table:

+ - Baseline A: Straight `hipify-clang` output with minimal manual edits.
+ - Baseline B: Existing hand-written HIP version from the team.

+ Recommended: use Baseline A for measuring migration automation value.

+ Quick answer format for live review:

+ - Q: What is your baseline?
+ - A: Straight hipify output with minimal compile edits (Baseline A), measured on the same hardware and inputs.

+ ## Required Environment Metadata

+ Always include:

+ - GPU model (for example MI300X) and memory size.
+ - ROCm version, compiler version, and profiler version.
+ - OS and driver versions.
+ - Kernel launch parameters and input sizes.
+ - Number of runs and aggregation rule (median recommended).

+ ## Required Measurement Fields

+ For each kernel tested, provide:

+ - Kernel name and workload shape.
+ - Baseline latency.
+ - Optimized latency.
+ - Speedup ratio.
+ - Correctness status (pass/fail and checksum or tolerance).
+ - Notes on optimization strategy.

+ Example table format:

+ | Kernel | Shape | Baseline (ms) | Optimized (ms) | Speedup | Correctness | Notes |
+ |---|---|---:|---:|---:|---|---|
+ | matrix_multiply | 1024x1024 | 12.4 | 9.5 | 1.31x | pass | LDS tiling + wavefront-aware launch |

+ Include non-win cases in the same table. Example:

+ | Kernel | Shape | Baseline (ms) | Optimized (ms) | Speedup | Correctness | Notes |
+ |---|---|---:|---:|---:|---|---|
+ | sparse_scatter | 4M elements | 6.0 | 6.3 | 0.95x | pass | Irregular access pattern; optimization did not help |

+ ## Reproducibility Checklist

+ Before publishing numbers, verify all items:

+ - Same input set for baseline and optimized runs.
+ - Warm-up runs excluded or consistently handled.
+ - At least 3 measured runs (prefer 5+) with median reported.
+ - No hidden manual edits after optimization output unless documented.
+ - Full command lines and profiler artifacts retained.

+ ## Evidence Package for Review

+ A technical review package should include:

+ - CUDA source input.
+ - Baseline HIP output.
+ - Optimized HIP output.
+ - Compile logs and profiler summaries.
+ - Final report explaining what changed and why.

+ ## Interpreting Results Responsibly

+ - Some kernels will regress or fail initially; this is normal for migration.
+ - Improvement ranges vary by memory behavior, occupancy, and control-flow patterns.
+ - Do not claim universal speedups.

+ Preferred claim style:

+ "ROCmPort AI improved X out of Y tested kernels against a stated baseline under reproducible MI300X conditions."

+ ## Current Repository Status

+ The repository includes demo kernels intended to exercise migration behavior.
+ Treat any sample numbers as demonstrations unless accompanied by full reproducibility artifacts from your environment.
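The aggregation rule above (median of at least 3 runs, speedup as a baseline/optimized ratio) can be sanity-checked with a short script. A minimal sketch; the `speedup` helper is illustrative and not part of the repository:

```python
import statistics

def speedup(baseline_runs_ms, optimized_runs_ms):
    """Aggregate repeated timings with the median, then report baseline/optimized."""
    base = statistics.median(baseline_runs_ms)
    opt = statistics.median(optimized_runs_ms)
    return base / opt

# Three runs per side, median reported, matching the checklist.
ratio = speedup([12.3, 12.4, 12.6], [9.4, 9.5, 9.6])
print(f"{ratio:.2f}x")  # → 1.31x
```

Using the median rather than the mean keeps a single outlier run (for example a cold cache or a background process) from skewing the reported ratio.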
README.md CHANGED
@@ -1,275 +1,219 @@
  # ROCmPort AI

- **The fastest way to escape CUDA lock-in and run on AMD.**

- Paste CUDA code → 5 AI agents automatically port it to ROCm/HIP → optimize for MI300X → benchmark on real hardware → show you the performance improvement — live, with full visibility into every decision the agents make.

- ---

- ## 🎬 What Happens in 10 Seconds
- 1. Paste CUDA code
- 2. AI detects issues (warp size, memory bottlenecks)
- 3. Converts to ROCm
- 4. Tries optimization → fails → retries
- 5. Shows real benchmark improvement on AMD GPU

- Result: Working, optimized AMD code in minutes.

- ---

- ## 🚀 Quick Start

- ### Option 1: One-Click Start (Recommended)

- ```bash
- # Windows
- start.bat
-
- # Linux/Mac
- ./start.sh
- ```
-
- This will:
- - Install all dependencies
- - Create .env file from template
- - Start the FastAPI server
- - Open the web interface at `http://localhost:8000`

- ### Option 2: Manual Setup

- ```bash
- cd backend
- pip install -r requirements.txt
- cp .env.example .env
- # Add your GROQ_API_KEY to .env file
- uvicorn main:app --reload --port 8000
- ```

- Then open `frontend/index.html` in your browser.

- ---

- ## One-Command Demo with Docker

- ```bash
- docker build -t rocmport-ai .
- docker run -p 8000:8000 rocmport-ai
- ```

- Then open http://localhost:8000 in your browser.

- ---

- ## Project Structure

  ```
- ROCmPort AI/
- ├── backend/
- │   ├── main.py              ← FastAPI + SSE streaming endpoint
- │   ├── models.py            ← All Pydantic schemas
- │   ├── requirements.txt     ← Dependencies (includes openai==1.47.0)
- │   ├── agents/
- │   │   ├── analyzer.py      ← Warp size detection, workload classification
- │   │   ├── translator.py    ← hipify pass 1 + LLM pass 2
- │   │   ├── optimizer.py     ← AMD MI300X-specific optimizations
- │   │   ├── tester.py        ← Real rocprof OR mocked (controlled failure)
- │   │   └── coordinator.py   ← Full pipeline + retry loop
- │   ├── tools/
- │   │   ├── hipify_wrapper.py   ← Real hipify-clang or Python fallback
- │   │   ├── rocprof_wrapper.py  ← hipcc compiler + rocprof parser
- │   │   └── llm_client.py       ← Groq ↔ vLLM swap for AMD Cloud
- │   ├── demo_kernels/
- │   │   ├── vector_add.cu       ← Simple kernel with warp size bug
- │   │   ├── matrix_multiply.cu  ← Complex kernel with controlled failure
- │   │   ├── convolution_2d.cu   ← Advanced kernel for optimization demo
- │   │   └── reduction.cu        ← Classic reduction with warp size unroll bug
- │   └── prompts/
- │       ├── analyzer_prompt.txt
- │       ├── translator_prompt.txt
- │       ├── optimizer_prompt.txt
- │       └── coordinator_prompt.txt
- ├── frontend/
- │   └── index.html           ← Full UI with dark terminal aesthetic
- ├── .env.example             ← Environment variables template
- ├── start.bat                ← Windows startup script
- ├── start.sh                 ← Linux/Mac startup script
- └── README.md                ← This file
- ```
-
- ---
-
- ## 🤖 The 5 Agents
-
- ### 1. **Analyzer** — Deep Code Analysis
- - Detects all CUDA kernels and APIs
- - **Critical**: Flags warp size assumptions (32→64 threads)
- - Classifies workload: compute-bound vs memory-bound
- - Identifies multi-GPU sharding (unnecessary on MI300X's 192GB)
-
- ### 2. **Translator** — Two-Pass Conversion
- - **Pass 1**: hipify-clang for mechanical replacements (cuda→hip)
- - **Pass 2**: LLM fixes what hipify misses (warp size, intrinsics)
- - Tracks every change with confidence levels

- ### 3. **Optimizer** — MI300X-Specific Tuning
- - Shared memory tiling (32×32 blocks)
- - Memory coalescing fixes
- - Wavefront alignment (256 thread blocks)
- - Removes GPU sharding code

- ### 4. **Tester** — Real Hardware Benchmarking
- - Compiles with hipcc
- - Profiles with rocprof on real MI300X
- - **Controlled failure**: Iteration 1 performs worse → triggers retry
- - Iteration 2 shows improvement

- ### 5. **Coordinator** — Intelligent Orchestration
- - Manages retry loop when optimization fails
- - Generates final migration report
- - Explains AMD hardware advantages

- ---

- ## ⚙️ Configuration

- ### Environment Variables

- Copy `.env.example` to `.env` and configure:

- ```bash
- # Required for local development
- GROQ_API_KEY=your_groq_api_key_here

- # Optional: Override Groq model
- GROQ_MODEL=llama-3.3-70b-versatile

- # For AMD Cloud deployment
- USE_VLLM=true
- VLLM_BASE_URL=http://your-amd-cloud:8000
- VLLM_API_KEY=your_vllm_key
- VLLM_MODEL=amd/llama-3.3-70b
-
- # On AMD Cloud with real hardware
- ROCM_AVAILABLE=true
- HIPCC_PATH=hipcc
- ROCPROF_PATH=rocprof
- ```

- ### Getting API Keys

- 1. **Groq (Local Development)**: Free at [console.groq.com](https://console.groq.com)
- 2. **vLLM (AMD Cloud)**: Deploy vLLM on MI300X with OpenAI-compatible API

- ---

- ## 🎯 Demo Kernels

- Three pre-tested CUDA examples included:

- 1. **Vector Add** - Simple kernel demonstrating basic pipeline
- 2. **Matrix Multiply** - Shows shared memory tiling optimization
- 3. **2D Convolution** - Advanced memory access pattern optimization
- 4. **Parallel Reduction** - Demonstrates warp-size aware unrolling (32 vs 64)

- All contain intentional warp size bugs to demonstrate AMD-specific fixes.

- ---

- ## 🌐 AMD Cloud Deployment

- simply set:
  ```bash
- ROCM_AVAILABLE=true
- USE_VLLM=true
  ```

- Everything else is already wired up for real MI300X hardware.

- ---

- ## 🔧 Development
-
- ### Running Tests
  ```bash
- cd backend
- python -m pytest tests/
  ```

- ### Code Structure
- - **FastAPI** backend with SSE streaming
- - **Vanilla JS** frontend (no heavy frameworks)
- - **CrewAI** for agent orchestration
- - **Pydantic** for data models

- ### Contributing
- 1. Fork the repository
- 2. Create feature branch
- 3. Test with demo kernels
- 4. Submit PR

- ---

- ---

- ## 🎥 Watch the 2-min Demo

- [ROCmPort AI on AMD MI300X](https://youtu.be/your-link)

- ---

- ## ☁️ Run on AMD Cloud (Real MI300X)

  ```bash
- # Set environment for real hardware
- export ROCM_AVAILABLE=true
- export USE_VLLM=true

- # Deploy vLLM on MI300X
- docker run --gpus all -p 8000:8000 \
-   vllm/vllm:latest \
-   --model amd/llama-3.3-70b \
-   --gpu-memory-utilization 0.95

- # Start ROCmPort AI
- cd backend
- uvicorn main:app --host 0.0.0.0 --port 8000
  ```

- ---
-
- ## 🔧 Troubleshooting
-
- | Issue | Solution |
- |-------|----------|
- | **"GROQ_API_KEY not found"** | Add your API key to `.env` file from [console.groq.com](https://console.groq.com) |
- | **"hipcc not found"** | Install ROCm: `sudo apt install rocm-dkms` or use AMD Cloud |
- | **"Permission denied"** | Check file permissions: `chmod +x start.sh` |
- | **Frontend not loading** | Ensure backend is running on port 8000 |
- | **No speedup shown** | Check if `ROCM_AVAILABLE=true` for real hardware |
-
- ---

- ## 🎯 Why ROCmPort AI Wins This Hackathon

- 1. **Real Hardware Integration** - Actual MI300X benchmarking with rocprof, not mocked data
- 2. **Intelligent Agent Pipeline** - 5 specialized AI agents working in sequence with retry logic
- 3. **Trust Layer Verification** - Checksum verification ensures migrated code actually works
- 4. **Human Override Capability** - Developers can edit and re-test optimized code
- 5. **Cost Impact Analysis** - Shows real business value ($20k-$100k savings per module)
- 6. **Simple Mode Toggle** - "Explain Like I'm 5" makes complex concepts accessible
- 7. **Live SSE Streaming** - Real-time visibility into every agent decision
- 8. **GitHub PR Simulation** - One-click export with diffs and reports
- 9. **Predictive Analysis** - AI predicts performance gains before optimization
- 10. **Honest Performance Claims** - Compares optimized ROCm vs baseline HIP, not fabricated NVIDIA comparisons

- ---

- ## 👤 Creator

- **Tazwar Ahnaf Enan**
- AI Engineer & GPU Systems Builder

- [![X (Twitter)](https://img.shields.io/badge/X-@TazwarEnan-1DA1F2?style=flat-square&logo=x)](https://x.com/TazwarEnan)
- [![GitHub](https://img.shields.io/badge/GitHub-tazwaryayyyy-181717?style=flat-square&logo=github)](https://github.com/tazwaryayyyy)

- *Built with 🔥 for AMD Developer Hackathon 2026*
  # ROCmPort AI

+ ROCmPort AI helps CUDA teams migrate to AMD by translating, testing, and iteratively optimizing kernels using real hardware feedback.

+ It is an acceleration system for migration work, not a one-click replacement for CUDA expertise.

+ ## What This Project Is

+ ROCmPort AI orchestrates a migration loop:

+ 1. Analyze CUDA code and detect migration risks.
+ 2. Translate with hipify plus LLM-assisted fixes.
+ 3. Compile and profile with ROCm tooling.
+ 4. Propose optimization changes and re-test.
+ 5. Return artifacts and a decision trace.

+ ## What This Project Is Not

+ - Not guaranteed to auto-fix all CUDA kernels.
+ - Not a claim that every kernel improves.
+ - Not a replacement for domain experts in performance-critical code.

+ Complex kernels can fail conversion due to architecture assumptions, undefined behavior, inline PTX, or handcrafted memory logic. The value is reduced migration time and faster debug loops.

+ ## Target User and Business Case

+ Primary product position:
+ - Tool for teams evaluating AMD migration cost and performance tradeoffs.

+ Typical use cases:
+ - Port legacy CUDA modules to HIP/ROCm with a measurable baseline.
+ - Build a migration backlog ranked by risk and expected impact.
+ - Identify kernels where MI300X memory capacity can remove sharding complexity.

+ Cost and performance impact should be calculated from your environment and workload, not fixed marketing ranges.

+ ## AMD-Specific Technical Considerations (MI300X)

+ ROCmPort AI explicitly reasons about AMD constraints and opportunities, including:

+ - Wavefront size 64 (vs CUDA warp 32 assumptions), which affects reduction trees, ballot/shuffle idioms, and launch geometry.
+ - LDS (local data store) usage and bank behavior for tile staging and reuse.
+ - MI300X memory capacity (192GB HBM) and implications for reducing model/data sharding in some workflows.
+ - Memory access patterns and occupancy tradeoffs under ROCm compiler behavior.

+ These are the places where migration often breaks or underperforms even after a successful hipify pass.

+ ### Concrete Wavefront Mismatch Example

+ From `backend/demo_kernels/reduction.cu`, the reduction tail assumes a 32-thread warp:

+ ```cpp
+ // NVIDIA-style assumption (incorrect on AMD wavefront=64)
+ if (tid < 32) {
+     volatile float* vsmem = sdata;
+     vsmem[tid] += vsmem[tid + 32];
+     vsmem[tid] += vsmem[tid + 16];
+     ...
+ }
  ```

+ A wavefront-aware correction expands the final stage to include the 64-wide lane behavior:
+
+ ```cpp
+ // AMD-aware final reduction stage
+ if (tid < 64) {
+     volatile float* vsmem = sdata;
+     vsmem[tid] += vsmem[tid + 32];
+     if (tid < 32) {
+         vsmem[tid] += vsmem[tid + 16];
+         vsmem[tid] += vsmem[tid + 8];
+         vsmem[tid] += vsmem[tid + 4];
+         vsmem[tid] += vsmem[tid + 2];
+         vsmem[tid] += vsmem[tid + 1];
+     }
+ }
+ ```

+ The key point is not the exact rewrite shape; it is that warp-size assumptions must be made explicit and re-validated on AMD.

+ ## Why This Is More Than Glue

+ ROCmPort AI combines existing tools, but its core value is the control system around them:

+ - Decision loop: detect failures and performance regressions, apply the next strategy, re-run.
+ - Explainability: stream each step and rationale (SSE logs + final report).
+ - Verification: pair code changes with compile/test/profiler evidence.

+ ## Judge Mode Walkthrough

+ Use this flow for technical review:

+ 1. Show the original CUDA kernel.
+ 2. Show baseline HIP from straight hipify output.
+ 3. Run ROCmPort AI and show the per-agent trace.
+ 4. Show the final optimized HIP output.
+ 5. Show the measured result against the declared baseline.
+ 6. Show one case with marginal gain or no gain.

+ This format makes the comparison falsifiable and avoids curated-demo concerns.

+ - Full walkthrough: `docs/JUDGE_MODE.md`.

+ ## Documented Failure Case

+ At least one failure path is documented with source, output, root cause, and fix requirements:

+ - See `docs/FAILURE_CASES.md`.

+ This is intentional: credibility improves when the system's failure boundary is visible.

+ ## Quick Start

+ ### Option 1: Startup Script

+ ```bash
+ # Windows
+ start.bat

+ # Linux/Mac
+ ./start.sh
+ ```

+ ### Option 2: Manual

  ```bash
+ cd backend
+ pip install -r requirements.txt
+ cp .env.example .env
+ # add your GROQ_API_KEY
+ uvicorn main:app --reload --port 8000
  ```

+ Open `frontend/index.html` in a browser.

+ ### Option 3: Docker

  ```bash
+ docker build -t rocmport-ai .
+ docker run -p 8000:8000 rocmport-ai
  ```

+ ## Benchmarking and Reproducibility

+ Benchmark claims should always include:

+ - Baseline definition (e.g., straight hipify output).
+ - Hardware/software versions.
+ - Input sizes and run counts.
+ - Correctness verification.
+ - Full logs or scripts to reproduce.

+ See `BENCHMARKS.md` for the recommended reporting format used by this repository.

+ ## Project Structure

+ ```text
+ ROCmPort AI/
+ ├── backend/
+ │   ├── main.py
+ │   ├── models.py
+ │   ├── agents/
+ │   │   ├── analyzer.py
+ │   │   ├── translator.py
+ │   │   ├── optimizer.py
+ │   │   ├── tester.py
+ │   │   └── coordinator.py
+ │   ├── tools/
+ │   │   ├── hipify_wrapper.py
+ │   │   ├── rocprof_wrapper.py
+ │   │   └── llm_client.py
+ │   ├── demo_kernels/
+ │   └── prompts/
+ ├── frontend/
+ │   └── index.html
+ ├── BENCHMARKS.md
+ └── README.md
+ ```

+ ## Configuration

+ Copy `.env.example` to `.env`:

  ```bash
+ GROQ_API_KEY=your_key
+ GROQ_MODEL=llama-3.3-70b-versatile

+ USE_VLLM=true
+ VLLM_BASE_URL=http://your-amd-cloud:8000
+ VLLM_API_KEY=your_vllm_key
+ VLLM_MODEL=amd/llama-3.3-70b

+ ROCM_AVAILABLE=true
+ HIPCC_PATH=hipcc
+ ROCPROF_PATH=rocprof
  ```

+ ## Defensible Scope

+ This project is harder to replicate than a thin wrapper because it couples:

+ - Multi-agent orchestration with retry decisions.
+ - Structured traceability across analysis, translation, optimization, and test phases.
+ - Integrated reporting where claims can be audited against intermediate artifacts.

+ A basic weekend clone can chain hipify and an LLM. The differentiator is reliable decision flow and evidence quality under failure.

+ ## Troubleshooting

+ | Issue | Resolution |
+ |---|---|
+ | `GROQ_API_KEY not found` | Add key to `.env`. |
+ | `hipcc not found` | Install the ROCm toolchain or run in a ROCm-enabled environment. |
+ | Backend unavailable | Verify the FastAPI server is running on port `8000`. |
+ | No improvement observed | Re-check baseline definition, kernel size, and profiler counters. |

+ ## License

+ See `LICENSE`.
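The decision loop the README describes (detect a failure or regression, apply the next strategy, re-run) can be sketched in a few lines. This is an illustrative toy, not the coordinator's actual API; `profile` and `propose` are hypothetical callables standing in for the Tester and Optimizer agents:

```python
def optimize_with_retry(hip_code, profile, propose, max_iters=2):
    """Keep a candidate only when profiling shows a measured improvement;
    otherwise discard it and try the next strategy."""
    best_code, best_ms = hip_code, profile(hip_code)
    for iteration in range(1, max_iters + 1):
        candidate = propose(best_code, iteration)
        ms = profile(candidate)
        if ms < best_ms:  # accept only on measured improvement
            best_code, best_ms = candidate, ms
    return best_code, best_ms

# Toy trace: iteration 1 regresses (rejected), iteration 2 improves (kept).
timings = {"base": 12.4, "v1": 13.0, "v2": 9.5}
code, ms = optimize_with_retry("base", profile=timings.get, propose=lambda c, i: f"v{i}")
print(code, ms)  # → v2 9.5
```

The gate on measured latency is what makes the loop falsifiable: a proposed optimization that profiles worse is simply dropped, which is the behavior the controlled-failure demo exercises.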
backend/agents/analyzer.py CHANGED
@@ -1,24 +1,28 @@
- import json
- import re
- from models import AnalyzerResult, WorkloadType
- from tools.llm_client import LLMClient
- from tools.json_utils import safe_json_loads

  llm_client = LLMClient()

  def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
      """Wrapper for LLM client chat completion"""
      return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)

  def generate_prediction(workload_type: WorkloadType, line_count: int) -> str:
      """Generate performance prediction based on workload analysis"""
      if workload_type == WorkloadType.MEMORY_BOUND:
-         return "🧠 Prediction: This kernel is memory-bound → HIGH potential gain on MI300X (5.3 TB/s vs H100 3.35 TB/s bandwidth)"
      elif workload_type == WorkloadType.COMPUTE_BOUND:
-         return "🧠 Prediction: This kernel is compute-bound → MODERATE gain on MI300X (wavefront efficiency improvements)"
      else:
          return "🧠 Prediction: Unknown workload type → LIMITED gain prediction without further analysis"

  SYSTEM_PROMPT = """You are an expert CUDA and GPU architecture engineer analyzing CUDA code before porting it to AMD ROCm/HIP.

  Your job is to deeply analyze CUDA code and output a structured JSON analysis. Be specific and technical.

@@ -53,7 +57,7 @@ Respond ONLY with this exact JSON structure, no markdown, no extra text:
  def run(cuda_code: str) -> AnalyzerResult:
      # Count lines for complexity estimation
      line_count = len([line for line in cuda_code.split('\n') if line.strip()])
-
      try:
          raw = chat_complete(
              messages=[

@@ -77,7 +81,7 @@ def run(cuda_code: str) -> AnalyzerResult:
              "line_count": line_count,
              "complexity_score": 5
          }
-
      workload_type = WorkloadType(data.get("workload_type", "unknown"))
      prediction = generate_prediction(workload_type, line_count)
+ # pylint: disable=broad-exception-caught
+
+ from ..models import AnalyzerResult, WorkloadType
+ from ..tools.llm_client import LLMClient
+ from ..tools.json_utils import safe_json_loads

  llm_client = LLMClient()

+
  def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
      """Wrapper for LLM client chat completion"""
      return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)

+
  def generate_prediction(workload_type: WorkloadType, line_count: int) -> str:
      """Generate performance prediction based on workload analysis"""
+     size_hint = "large" if line_count and line_count > 200 else "small/medium"
      if workload_type == WorkloadType.MEMORY_BOUND:
+         return f"🧠 Prediction: This {size_hint} kernel is memory-bound → HIGH potential gain on MI300X (5.3 TB/s vs H100 3.35 TB/s bandwidth)"
      elif workload_type == WorkloadType.COMPUTE_BOUND:
+         return f"🧠 Prediction: This {size_hint} kernel is compute-bound → MODERATE gain on MI300X (wavefront efficiency improvements)"
      else:
          return "🧠 Prediction: Unknown workload type → LIMITED gain prediction without further analysis"

+
  SYSTEM_PROMPT = """You are an expert CUDA and GPU architecture engineer analyzing CUDA code before porting it to AMD ROCm/HIP.

  Your job is to deeply analyze CUDA code and output a structured JSON analysis. Be specific and technical.

  def run(cuda_code: str) -> AnalyzerResult:
      # Count lines for complexity estimation
      line_count = len([line for line in cuda_code.split('\n') if line.strip()])
+
      try:
          raw = chat_complete(
              messages=[

              "line_count": line_count,
              "complexity_score": 5
          }
+
      workload_type = WorkloadType(data.get("workload_type", "unknown"))
      prediction = generate_prediction(workload_type, line_count)
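The analyzer's imports pull in `safe_json_loads`, which it uses to survive imperfect LLM output before falling back to a default analysis dict. A hedged sketch of that pattern, assuming it strips markdown fences and extracts the first JSON object on failure (the repository's actual implementation may differ):

```python
import json
import re

def safe_json_loads(raw, fallback=None):
    """Parse LLM output as JSON, tolerating markdown fences and extra prose."""
    # Drop leading/trailing ``` or ```json fence lines.
    text = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the first {...} span if the reply includes extra text.
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
    return fallback

print(safe_json_loads('```json\n{"workload_type": "memory_bound"}\n```'))
```

Returning a caller-supplied fallback instead of raising matches the analyzer's broad-exception style: a malformed LLM reply degrades to the default dict rather than aborting the pipeline.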
backend/agents/coordinator.py CHANGED
@@ -1,202 +1,224 @@
  import asyncio
  from typing import AsyncGenerator
- from models import (
-     AgentEvent, AgentStatus, AnalyzerResult, TranslatorResult,
-     OptimizerResult, TesterResult, FinalReport, WorkloadType, CostEstimate
  )
- from agents import analyzer, translator, optimizer, tester


  def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
-     """Calculate cost impact estimate based on code complexity"""
-     line_count = analyzer_result.line_count or 100
      complexity = analyzer_result.complexity_score or 5
-
      if complexity <= 3:
          manual_weeks = "1-2 weeks"
          savings = "$5,000-$10,000"
          factor = "Low"
      elif complexity <= 7:
-         manual_weeks = "3-6 weeks"
          savings = "$20,000-$50,000"
          factor = "Medium"
      else:
          manual_weeks = "6-10 weeks"
          savings = "$50,000-$100,000"
          factor = "High"
-
      return CostEstimate(
          manual_porting_weeks=manual_weeks,
-         rocmport_minutes="5 minutes",
          estimated_savings=savings,
-         complexity_factor=factor
      )


  def simplify_explanation(report: FinalReport) -> str:
-     """Convert technical explanations to simple language for "Explain Like I'm 5" mode"""
      simple_text = report.amd_advantage_explanation
-
-     # Replace technical terms with simple, natural explanations
-     simple_text = simple_text.replace("5.3 TB/s memory bandwidth", "much faster memory access")
      simple_text = simple_text.replace("3.35 TB/s", "slower memory access")
-     simple_text = simple_text.replace("memory-bound", "needs to move a lot of data")
-     simple_text = simple_text.replace("compute-bound", "does a lot of calculations")
-     simple_text = simple_text.replace("wavefront", "group of threads working together")
-     simple_text = simple_text.replace("shared memory tiling", "shares data between threads efficiently")
      simple_text = simple_text.replace("coalescing", "accesses memory in order")
      simple_text = simple_text.replace("optimization", "improvement")
      simple_text = simple_text.replace("performance", "speed")
      simple_text = simple_text.replace("benchmark", "test")
      simple_text = simple_text.replace("iteration", "try")
-
-     # Make sentences more natural
      simple_text = simple_text.replace("This kernel is", "This code is")
      simple_text = simple_text.replace("The optimization", "The improvement")
      simple_text = simple_text.replace("achieves", "gets")
      simple_text = simple_text.replace("demonstrates", "shows")
-
      return simple_text


- async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode: bool = False) -> AsyncGenerator[AgentEvent, None]:
-     """
-     Full agent pipeline. Yields AgentEvent objects as SSE data.
-     Coordinator handles the retry loop when Tester fails iteration 1.
-     """

-     # ─── ANALYZER ───────────────────────────────────────────────
-     yield AgentEvent(agent="analyzer", status=AgentStatus.RUNNING,
-                      message="Scanning CUDA code for kernels, APIs, and hardware-specific issues...")

      try:
          analyzer_result: AnalyzerResult = await asyncio.to_thread(analyzer.run, cuda_code)
      except Exception as e:
-         yield AgentEvent(agent="analyzer", status=AgentStatus.FAILED,
-                          message="Analysis failed", detail=str(e))
          return

-     detail_parts = [f"Found {len(analyzer_result.kernels_found)} kernel(s): {', '.join(analyzer_result.kernels_found)}"]
-     detail_parts.append(f"Workload: {analyzer_result.workload_type.value}")
-     detail_parts.append(f"Difficulty: {analyzer_result.difficulty} — {analyzer_result.difficulty_reason}")

      if analyzer_result.warp_size_issue:
-         detail_parts.append(f"⚠️ WARP SIZE ISSUE: {analyzer_result.warp_size_detail}")
-
      if analyzer_result.sharding_detected:
-         detail_parts.append("⚠️ Multi-GPU sharding detected — unnecessary on MI300X (192GB)")
-
-     # Add prediction if available
      if analyzer_result.prediction:
          detail_parts.append(analyzer_result.prediction)

-     # Calculate cost estimate
-     try:
-         cost_estimate = calculate_cost_estimate(analyzer_result)
-     except Exception as e:
-         # Fallback cost estimate if calculation fails
-         cost_estimate = CostEstimate(
-             manual_porting_weeks="3-6 weeks",
-             rocmport_minutes="5 minutes",
-             estimated_savings="$20,000-$50,000",
-             complexity_factor="Medium"
-         )
-
-     yield AgentEvent(agent="analyzer", status=AgentStatus.DONE,
-                      message=f"Found {len(analyzer_result.kernels_found)} kernel(s) | {analyzer_result.workload_type.value} workload | Difficulty: {analyzer_result.difficulty}",
-                      detail="\n".join(detail_parts))
-
-     # ─── TRANSLATOR ──────────────────────────────────────────────
-     yield AgentEvent(agent="translator", status=AgentStatus.RUNNING,
-                      message="Running hipify-clang (pass 1) then LLM correction (pass 2)...")

-     # Processing...

      try:
-         translator_result: TranslatorResult = await asyncio.to_thread(
-             translator.run, cuda_code, analyzer_result
-         )
      except Exception as e:
-         yield AgentEvent(agent="translator", status=AgentStatus.FAILED,
-                          message="Translation failed", detail=str(e))
          return

-     detail = (
-         f"Total changes: {translator_result.total_changes} "
-         f"({translator_result.hipify_changes} hipify, {translator_result.llm_changes} LLM)\n"
-         f"Warp size corrected: {analyzer_result.warp_size_issue}\n"
-         f"Kernel launch syntax updated"
      )

-     yield AgentEvent(agent="translator", status=AgentStatus.DONE,
-                      message=f"{translator_result.total_changes} changes ({translator_result.hipify_changes} hipify + {translator_result.llm_changes} LLM)",
-                      detail=detail)
-
-     # ─── OPTIMIZER (iteration 1) ──────────────────────────────────
-     yield AgentEvent(agent="optimizer", status=AgentStatus.RUNNING,
-                      message="Applying AMD MI300X-specific optimizations (iteration 1)...")
-
-     # Processing...
-
      try:
          optimizer_result: OptimizerResult = await asyncio.to_thread(
-             optimizer.run, translator_result.hip_code, analyzer_result, 1
          )
      except Exception as e:
-         yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED,
-                          message="Optimization failed", detail=str(e))
148
  return
149
 
150
- changes_text = "\n".join(
151
- f"• {c['description']}" for c in optimizer_result.changes
 
 
 
 
152
  )
153
- yield AgentEvent(agent="optimizer", status=AgentStatus.DONE,
154
- message=f"{len(optimizer_result.changes)} optimization(s) applied",
155
- detail=changes_text)
156
 
157
- # ─── TESTER (iteration 1) ────────────────────────────────────
158
- yield AgentEvent(agent="tester", status=AgentStatus.RUNNING,
159
- message="Compiling with hipcc and profiling with rocprof (iteration 1)...")
160
-
161
- # Testing...
162
 
163
  try:
164
  tester_result_1: TesterResult = await asyncio.to_thread(
165
- tester.run, optimizer_result.optimized_code, analyzer_result, 1, kernel_name
 
 
 
 
166
  )
167
  except Exception as e:
168
- yield AgentEvent(agent="tester", status=AgentStatus.FAILED,
169
- message="Testing failed", detail=str(e))
170
  return
171
 
172
  if not tester_result_1.success:
173
- yield AgentEvent(agent="tester", status=AgentStatus.FAILED,
174
- message="Compilation failed — using cached benchmark",
175
- detail=tester_result_1.notes)
 
 
 
176
  return
177
 
178
- # ─── CONTROLLED FAILURE → RETRY LOOP ─────────────────────────
179
  if tester_result_1.speedup < 1.0:
180
  yield AgentEvent(
181
- agent="tester", status=AgentStatus.FAILED,
182
- message=f"❌ Iteration 1: {tester_result_1.speedup}x — worse than baseline HIP",
183
- detail=f"Bandwidth utilized: {tester_result_1.bandwidth_utilized}%\n{tester_result_1.notes}"
 
 
 
 
184
  )
185
 
186
  yield AgentEvent(
187
- agent="coordinator", status=AgentStatus.RUNNING,
188
- message="Performance degraded — re-running Optimizer with profiler feedback...",
189
- detail=f"Profiler says: {tester_result_1.notes}\nSwitching optimization strategy."
 
190
  )
191
 
192
- # Testing...
193
-
194
- # Optimizer iteration 2 with profiler feedback
195
- yield AgentEvent(agent="optimizer", status=AgentStatus.RETRYING,
196
- message="Trying alternative optimization strategy (iteration 2)...",
197
- detail=f"Previous strategy caused regression. Profiler feedback: {tester_result_1.notes}")
198
-
199
- # Trace: Optimizer v2
200
 
201
  try:
202
  optimizer_result_2: OptimizerResult = await asyncio.to_thread(
@@ -204,31 +226,36 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
204
  translator_result.hip_code,
205
  analyzer_result,
206
  2,
207
- tester_result_1.notes
208
  )
209
  except Exception as e:
210
- yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED,
211
- message="Re-optimization failed", detail=str(e))
212
  return
213
 
214
- changes_text_2 = "\n".join(f"• {c['description']}" for c in optimizer_result_2.changes)
215
- yield AgentEvent(agent="optimizer", status=AgentStatus.DONE,
216
- message=f"Alternative strategy: {len(optimizer_result_2.changes)} change(s) applied",
217
- detail=changes_text_2)
218
-
219
- # Tester iteration 2
220
- yield AgentEvent(agent="tester", status=AgentStatus.RUNNING,
221
- message="Re-profiling with alternative optimization (iteration 2)...")
222
 
223
- # Testing...
 
 
 
 
224
 
225
  try:
226
  tester_result_final: TesterResult = await asyncio.to_thread(
227
- tester.run, optimizer_result_2.optimized_code, analyzer_result, 2, kernel_name
 
 
 
 
228
  )
229
  except Exception as e:
230
- yield AgentEvent(agent="tester", status=AgentStatus.FAILED,
231
- message="Re-testing failed", detail=str(e))
232
  return
233
 
234
  final_optimizer = optimizer_result_2
@@ -236,50 +263,45 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
236
  tester_result_final = tester_result_1
237
  final_optimizer = optimizer_result
238
 
239
- # ─── TESTER FINAL RESULT ─────────────────────────────────────
240
  yield AgentEvent(
241
  agent="tester",
242
  status=AgentStatus.DONE,
243
- message=f"Iteration {tester_result_final.iteration}: {tester_result_final.speedup}x faster than baseline HIP",
244
  detail=(
245
  f"Execution time: {tester_result_final.execution_ms:.1f}ms\n"
246
  f"Memory bandwidth: {tester_result_final.bandwidth_utilized:.1f}% utilized\n"
247
  f"Bottleneck type: {tester_result_final.bottleneck}\n"
248
  f"{tester_result_final.notes}"
249
- )
250
  )
251
 
252
- # ─── COORDINATOR FINAL REPORT ────────────────────────────────
253
- yield AgentEvent(agent="coordinator", status=AgentStatus.RUNNING,
254
- message="Generating migration report...")
255
 
256
- # Processing...
 
257
 
258
- amd_explanation = _build_amd_explanation(analyzer_result, tester_result_final)
259
-
260
- # Calculate cost estimate
261
  try:
262
  cost_estimate = calculate_cost_estimate(analyzer_result)
263
- except Exception as e:
264
- # Fallback cost estimate if calculation fails
265
  cost_estimate = CostEstimate(
266
  manual_porting_weeks="3-6 weeks",
267
- rocmport_minutes="5 minutes",
268
  estimated_savings="$20,000-$50,000",
269
- complexity_factor="Medium"
270
  )
271
-
272
- # Always generate simplified explanation
273
  temp_report = FinalReport(
274
  migration_success=True,
275
  speedup=tester_result_final.speedup,
276
  bandwidth_utilized=tester_result_final.bandwidth_utilized,
277
- total_changes=translator_result.total_changes + len(final_optimizer.changes),
 
278
  bottleneck=tester_result_final.bottleneck,
279
  amd_advantage_explanation=amd_explanation,
280
  iterations=tester_result_final.iteration,
281
  hip_code=translator_result.hip_code,
282
  optimized_code=final_optimizer.optimized_code,
 
283
  )
284
  simplified_explanation = simplify_explanation(temp_report)
285
 
@@ -287,36 +309,34 @@ async def run_pipeline(cuda_code: str, kernel_name: str = "custom", simple_mode:
287
  migration_success=True,
288
  speedup=tester_result_final.speedup,
289
  bandwidth_utilized=tester_result_final.bandwidth_utilized,
290
- total_changes=translator_result.total_changes + len(final_optimizer.changes),
 
291
  bottleneck=tester_result_final.bottleneck,
292
  amd_advantage_explanation=amd_explanation,
293
  iterations=tester_result_final.iteration,
294
  hip_code=translator_result.hip_code,
295
  optimized_code=final_optimizer.optimized_code,
 
296
  cost_estimate=cost_estimate,
297
- simplified_explanation=simplified_explanation
298
  )
299
 
300
- import json
301
  yield AgentEvent(
302
  agent="coordinator",
303
  status=AgentStatus.DONE,
304
  message="Migration complete",
305
- detail=json.dumps(report.model_dump())
306
  )
307
 
308
 
309
  def _build_amd_explanation(analyzer_result: AnalyzerResult, tester_result: TesterResult) -> str:
310
  if analyzer_result.workload_type == WorkloadType.MEMORY_BOUND:
311
  return (
312
- f"This is a memory-bound kernel performance scales with memory bandwidth. "
313
- f"MI300X delivers 5.3 TB/s vs H100's 3.35 TB/s (58% more bandwidth). "
314
- f"After optimization, bandwidth utilization reached {tester_result.bandwidth_utilized:.0f}%, "
315
- f"meaning this workload extracts full value from AMD's memory architecture."
316
- )
317
- else:
318
- return (
319
- f"This is a compute-bound kernel. MI300X delivers 1.3 PFLOPS FP16 "
320
- f"vs H100's 989 TFLOPS — 31% more raw throughput. "
321
- f"After wavefront-aligned optimization, compute utilization improved significantly."
322
  )
 
 
 
 
 
1
  import asyncio
2
+ import json
3
  from typing import AsyncGenerator
4
+
5
+ # pylint: disable=broad-exception-caught
6
+
7
+ from . import analyzer, optimizer, tester, translator
8
+ from ..models import (
9
+ AgentEvent,
10
+ AgentStatus,
11
+ AnalyzerResult,
12
+ CostEstimate,
13
+ FinalReport,
14
+ OptimizerResult,
15
+ TesterResult,
16
+ TranslatorResult,
17
+ WorkloadType,
18
  )
 
19
 
20
 
21
  def calculate_cost_estimate(analyzer_result: AnalyzerResult) -> CostEstimate:
22
+ """Calculate cost impact estimate based on code complexity."""
 
23
  complexity = analyzer_result.complexity_score or 5
24
+
25
  if complexity <= 3:
26
  manual_weeks = "1-2 weeks"
27
  savings = "$5,000-$10,000"
28
  factor = "Low"
29
  elif complexity <= 7:
30
+ manual_weeks = "3-6 weeks"
31
  savings = "$20,000-$50,000"
32
  factor = "Medium"
33
  else:
34
  manual_weeks = "6-10 weeks"
35
  savings = "$50,000-$100,000"
36
  factor = "High"
37
+
38
  return CostEstimate(
39
  manual_porting_weeks=manual_weeks,
40
+ rocmport_minutes="Varies by kernel",
41
  estimated_savings=savings,
42
+ complexity_factor=factor,
43
  )
44
 
45
 
46
  def simplify_explanation(report: FinalReport) -> str:
47
+ """Convert technical explanation to simpler wording for explain mode."""
48
  simple_text = report.amd_advantage_explanation
49
+
50
+ simple_text = simple_text.replace(
51
+ "5.3 TB/s memory bandwidth", "much faster memory access")
52
  simple_text = simple_text.replace("3.35 TB/s", "slower memory access")
53
+ simple_text = simple_text.replace(
54
+ "memory-bound", "needs to move a lot of data")
55
+ simple_text = simple_text.replace(
56
+ "compute-bound", "does a lot of calculations")
57
+ simple_text = simple_text.replace(
58
+ "wavefront", "group of threads working together")
59
+ simple_text = simple_text.replace(
60
+ "shared memory tiling", "shares data between threads efficiently")
61
  simple_text = simple_text.replace("coalescing", "accesses memory in order")
62
  simple_text = simple_text.replace("optimization", "improvement")
63
  simple_text = simple_text.replace("performance", "speed")
64
  simple_text = simple_text.replace("benchmark", "test")
65
  simple_text = simple_text.replace("iteration", "try")
66
+
 
67
  simple_text = simple_text.replace("This kernel is", "This code is")
68
  simple_text = simple_text.replace("The optimization", "The improvement")
69
  simple_text = simple_text.replace("achieves", "gets")
70
  simple_text = simple_text.replace("demonstrates", "shows")
 
71
  return simple_text
72
 
73
 
74
+ async def run_pipeline(
75
+ cuda_code: str,
76
+ kernel_name: str = "custom",
77
+ simple_mode: bool = False,
78
+ ) -> AsyncGenerator[AgentEvent, None]:
79
+ """Run full pipeline and stream AgentEvent objects."""
80
+ _ = simple_mode  # currently unused; a simplified explanation is always generated
81
 
82
+ yield AgentEvent(
83
+ agent="analyzer",
84
+ status=AgentStatus.RUNNING,
85
+ message="Scanning CUDA code for kernels, APIs, and hardware-specific issues...",
86
+ )
87
 
88
  try:
89
  analyzer_result: AnalyzerResult = await asyncio.to_thread(analyzer.run, cuda_code)
90
  except Exception as e:
91
+ yield AgentEvent(agent="analyzer", status=AgentStatus.FAILED, message="Analysis failed", detail=str(e))
 
92
  return
93
 
94
+ detail_parts = [
95
+ f"Found {len(analyzer_result.kernels_found)} kernel(s): {', '.join(analyzer_result.kernels_found)}",
96
+ f"Workload: {analyzer_result.workload_type.value}",
97
+ f"Difficulty: {analyzer_result.difficulty} - {analyzer_result.difficulty_reason}",
98
+ ]
99
 
100
  if analyzer_result.warp_size_issue:
101
+ detail_parts.append(
102
+ f"WARP SIZE ISSUE: {analyzer_result.warp_size_detail}")
103
  if analyzer_result.sharding_detected:
104
+ detail_parts.append(
105
+ "Multi-GPU sharding detected; review whether it is still needed given MI300X's memory capacity.")
 
106
  if analyzer_result.prediction:
107
  detail_parts.append(analyzer_result.prediction)
108
 
109
+ yield AgentEvent(
110
+ agent="analyzer",
111
+ status=AgentStatus.DONE,
112
+ message=(
113
+ f"Found {len(analyzer_result.kernels_found)} kernel(s) | "
114
+ f"{analyzer_result.workload_type.value} workload | Difficulty: {analyzer_result.difficulty}"
115
+ ),
116
+ detail="\n".join(detail_parts),
117
+ )
 
118
 
119
+ yield AgentEvent(
120
+ agent="translator",
121
+ status=AgentStatus.RUNNING,
122
+ message="Running hipify-clang (pass 1) then LLM correction (pass 2)...",
123
+ )
124
 
125
  try:
126
+ translator_result: TranslatorResult = await asyncio.to_thread(translator.run, cuda_code, analyzer_result)
 
 
127
  except Exception as e:
128
+ yield AgentEvent(agent="translator", status=AgentStatus.FAILED, message="Translation failed", detail=str(e))
 
129
  return
130
 
131
+ yield AgentEvent(
132
+ agent="translator",
133
+ status=AgentStatus.DONE,
134
+ message=(
135
+ f"{translator_result.total_changes} changes "
136
+ f"({translator_result.hipify_changes} hipify + {translator_result.llm_changes} LLM)"
137
+ ),
138
+ detail=(
139
+ f"Total changes: {translator_result.total_changes} "
140
+ f"({translator_result.hipify_changes} hipify, {translator_result.llm_changes} LLM)\n"
141
+ f"Warp size corrected: {analyzer_result.warp_size_issue}\n"
142
+ "Kernel launch syntax updated"
143
+ ),
144
  )
145
 
146
+ yield AgentEvent(
147
+ agent="optimizer",
148
+ status=AgentStatus.RUNNING,
149
+ message="Applying AMD MI300X-specific optimizations (iteration 1)...",
150
+ )
 
 
 
 
151
 
152
  try:
153
  optimizer_result: OptimizerResult = await asyncio.to_thread(
154
+ optimizer.run,
155
+ translator_result.hip_code,
156
+ analyzer_result,
157
+ 1,
158
  )
159
  except Exception as e:
160
+ yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED, message="Optimization failed", detail=str(e))
 
161
  return
162
 
163
+ yield AgentEvent(
164
+ agent="optimizer",
165
+ status=AgentStatus.DONE,
166
+ message=f"{len(optimizer_result.changes)} optimization(s) applied",
167
+ detail="\n".join(
168
+ f"- {c['description']}" for c in optimizer_result.changes),
169
  )
 
 
 
170
 
171
+ yield AgentEvent(
172
+ agent="tester",
173
+ status=AgentStatus.RUNNING,
174
+ message="Compiling with hipcc and profiling with rocprof (iteration 1)...",
175
+ )
176
 
177
  try:
178
  tester_result_1: TesterResult = await asyncio.to_thread(
179
+ tester.run,
180
+ optimizer_result.optimized_code,
181
+ analyzer_result,
182
+ 1,
183
+ kernel_name,
184
  )
185
  except Exception as e:
186
+ yield AgentEvent(agent="tester", status=AgentStatus.FAILED, message="Testing failed", detail=str(e))
 
187
  return
188
 
189
  if not tester_result_1.success:
190
+ yield AgentEvent(
191
+ agent="tester",
192
+ status=AgentStatus.FAILED,
193
+ message="Compilation or profiling failed",
194
+ detail=tester_result_1.notes,
195
+ )
196
  return
197
 
 
198
  if tester_result_1.speedup < 1.0:
199
  yield AgentEvent(
200
+ agent="tester",
201
+ status=AgentStatus.FAILED,
202
+ message=f"Iteration 1: {tester_result_1.speedup}x vs baseline HIP (regression)",
203
+ detail=(
204
+ f"Bandwidth utilized: {tester_result_1.bandwidth_utilized}%\n"
205
+ f"{tester_result_1.notes}"
206
+ ),
207
  )
208
 
209
  yield AgentEvent(
210
+ agent="coordinator",
211
+ status=AgentStatus.RUNNING,
212
+ message="Performance regressed; retrying optimizer with profiler feedback...",
213
+ detail=f"Profiler feedback: {tester_result_1.notes}",
214
  )
215
 
216
+ yield AgentEvent(
217
+ agent="optimizer",
218
+ status=AgentStatus.RETRYING,
219
+ message="Trying alternative optimization strategy (iteration 2)...",
220
+ detail=f"Previous strategy regressed. Feedback: {tester_result_1.notes}",
221
+ )
 
 
222
 
223
  try:
224
  optimizer_result_2: OptimizerResult = await asyncio.to_thread(
 
226
  translator_result.hip_code,
227
  analyzer_result,
228
  2,
229
+ tester_result_1.notes,
230
  )
231
  except Exception as e:
232
+ yield AgentEvent(agent="optimizer", status=AgentStatus.FAILED, message="Re-optimization failed", detail=str(e))
 
233
  return
234
 
235
+ yield AgentEvent(
236
+ agent="optimizer",
237
+ status=AgentStatus.DONE,
238
+ message=f"Alternative strategy: {len(optimizer_result_2.changes)} change(s) applied",
239
+ detail="\n".join(
240
+ f"- {c['description']}" for c in optimizer_result_2.changes),
241
+ )
 
242
 
243
+ yield AgentEvent(
244
+ agent="tester",
245
+ status=AgentStatus.RUNNING,
246
+ message="Re-profiling with alternative optimization (iteration 2)...",
247
+ )
248
 
249
  try:
250
  tester_result_final: TesterResult = await asyncio.to_thread(
251
+ tester.run,
252
+ optimizer_result_2.optimized_code,
253
+ analyzer_result,
254
+ 2,
255
+ kernel_name,
256
  )
257
  except Exception as e:
258
+ yield AgentEvent(agent="tester", status=AgentStatus.FAILED, message="Re-testing failed", detail=str(e))
 
259
  return
260
 
261
  final_optimizer = optimizer_result_2
 
263
  tester_result_final = tester_result_1
264
  final_optimizer = optimizer_result
265
 
 
266
  yield AgentEvent(
267
  agent="tester",
268
  status=AgentStatus.DONE,
269
+ message=f"Iteration {tester_result_final.iteration}: {tester_result_final.speedup}x vs baseline HIP",
270
  detail=(
271
  f"Execution time: {tester_result_final.execution_ms:.1f}ms\n"
272
  f"Memory bandwidth: {tester_result_final.bandwidth_utilized:.1f}% utilized\n"
273
  f"Bottleneck type: {tester_result_final.bottleneck}\n"
274
  f"{tester_result_final.notes}"
275
+ ),
276
  )
277
 
278
+ yield AgentEvent(agent="coordinator", status=AgentStatus.RUNNING, message="Generating migration report...")
 
 
279
 
280
+ amd_explanation = _build_amd_explanation(
281
+ analyzer_result, tester_result_final)
282
 
 
 
 
283
  try:
284
  cost_estimate = calculate_cost_estimate(analyzer_result)
285
+ except Exception:
 
286
  cost_estimate = CostEstimate(
287
  manual_porting_weeks="3-6 weeks",
288
+ rocmport_minutes="Varies by kernel",
289
  estimated_savings="$20,000-$50,000",
290
+ complexity_factor="Medium",
291
  )
292
+
 
293
  temp_report = FinalReport(
294
  migration_success=True,
295
  speedup=tester_result_final.speedup,
296
  bandwidth_utilized=tester_result_final.bandwidth_utilized,
297
+ total_changes=translator_result.total_changes +
298
+ len(final_optimizer.changes),
299
  bottleneck=tester_result_final.bottleneck,
300
  amd_advantage_explanation=amd_explanation,
301
  iterations=tester_result_final.iteration,
302
  hip_code=translator_result.hip_code,
303
  optimized_code=final_optimizer.optimized_code,
304
+ verification=tester_result_final.verification,
305
  )
306
  simplified_explanation = simplify_explanation(temp_report)
307
 
 
309
  migration_success=True,
310
  speedup=tester_result_final.speedup,
311
  bandwidth_utilized=tester_result_final.bandwidth_utilized,
312
+ total_changes=translator_result.total_changes +
313
+ len(final_optimizer.changes),
314
  bottleneck=tester_result_final.bottleneck,
315
  amd_advantage_explanation=amd_explanation,
316
  iterations=tester_result_final.iteration,
317
  hip_code=translator_result.hip_code,
318
  optimized_code=final_optimizer.optimized_code,
319
+ verification=tester_result_final.verification,
320
  cost_estimate=cost_estimate,
321
+ simplified_explanation=simplified_explanation,
322
  )
323
 
 
324
  yield AgentEvent(
325
  agent="coordinator",
326
  status=AgentStatus.DONE,
327
  message="Migration complete",
328
+ detail=json.dumps(report.model_dump()),
329
  )
330
 
331
 
332
  def _build_amd_explanation(analyzer_result: AnalyzerResult, tester_result: TesterResult) -> str:
333
  if analyzer_result.workload_type == WorkloadType.MEMORY_BOUND:
334
  return (
335
+ "This is a memory-bound kernel; performance scales with memory bandwidth. "
336
+ "MI300X provides higher memory bandwidth than H100-class hardware, and this workload "
337
+ f"reached {tester_result.bandwidth_utilized:.0f}% utilization after optimization."
 
338
  )
339
+ return (
340
+ "This is a compute-bound kernel; launch geometry and wavefront-aware tuning are key drivers. "
341
+ "After optimization, compute utilization and execution characteristics improved."
342
+ )
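The coordinator changes above implement a regression-retry loop: iteration 1 is profiled, and if speedup drops below 1.0x the optimizer is re-run once with the profiler's notes before re-testing. A minimal standalone sketch of that pattern, using illustrative stand-ins rather than the project's real `OptimizerResult`/`TesterResult` models:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ProfileResult:
    speedup: float  # relative to baseline HIP
    notes: str      # profiler feedback fed into the retry


async def optimize_and_profile(strategy: str) -> ProfileResult:
    # Hypothetical outcomes: the first strategy regresses, the fallback improves.
    if strategy == "tiling_v1":
        return ProfileResult(0.85, "bandwidth underutilized")
    return ProfileResult(1.25, "coalescing fixed")


async def run_with_retry() -> ProfileResult:
    first = await optimize_and_profile("tiling_v1")
    if first.speedup >= 1.0:
        return first
    # One bounded retry, passing the profiler notes to the alternative strategy.
    return await optimize_and_profile(f"fallback:{first.notes}")


final = asyncio.run(run_with_retry())
```

Bounding the loop to a single retry, as the pipeline does, keeps the worst-case latency predictable instead of iterating until convergence.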
backend/agents/optimizer.py CHANGED
@@ -1,15 +1,17 @@
1
- import json
2
- import re
3
- from models import OptimizerResult, AnalyzerResult, WorkloadType
4
- from tools.llm_client import LLMClient
5
- from tools.json_utils import safe_json_loads
6
 
7
  llm_client = LLMClient()
8
 
 
9
  def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
10
  """Wrapper for LLM client chat completion"""
11
  return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)
12
 
 
13
  ALLOWED_OPTIMIZATIONS = """
14
  You may ONLY suggest these specific, well-known AMD MI300X optimizations:
15
  1. Shared memory tiling: Replace naive global memory access with 32x32 shared memory tiles (__shared__)
 
1
+ # pylint: disable=broad-exception-caught
2
+
3
+ from ..models import OptimizerResult, AnalyzerResult, WorkloadType
4
+ from ..tools.llm_client import LLMClient
5
+ from ..tools.json_utils import safe_json_loads
6
 
7
  llm_client = LLMClient()
8
 
9
+
10
  def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
11
  """Wrapper for LLM client chat completion"""
12
  return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)
13
 
14
+
15
  ALLOWED_OPTIMIZATIONS = """
16
  You may ONLY suggest these specific, well-known AMD MI300X optimizations:
17
  1. Shared memory tiling: Replace naive global memory access with 32x32 shared memory tiles (__shared__)
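The tester.py diff below replaces a list-based output checksum with a checksum computed over the code text itself. The truncated-SHA-256 idea can be shown in isolation; the function name here is illustrative, not the project's API:

```python
import hashlib


def short_code_checksum(code_text: str, sample_size: int = 400) -> str:
    # Truncated SHA-256 of a code prefix: a stable traceability ID, not a security hash.
    if not code_text:
        return "empty"
    sample = code_text[:sample_size]
    return hashlib.sha256(sample.encode()).hexdigest()[:32]


print(short_code_checksum(""))                              # -> empty
print(len(short_code_checksum("__global__ void k() {}")))   # -> 32
```

Hashing only a fixed-size prefix keeps the cost constant for large kernels, at the price of not distinguishing inputs that differ only beyond `sample_size` characters.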
backend/agents/tester.py CHANGED
@@ -1,10 +1,7 @@
1
  import os
2
- import subprocess
3
- import tempfile
4
- import random
5
  import hashlib
6
- from models import TesterResult, AnalyzerResult, WorkloadType, VerificationResult
7
- from tools.rocprof_wrapper import RocprofWrapper
8
 
9
  # Set ROCM_AVAILABLE=true on AMD Cloud
10
  ROCM_AVAILABLE = os.environ.get("ROCM_AVAILABLE", "false").lower() == "true"
@@ -19,27 +16,23 @@ DEMO_KERNEL_CHECKSUMS = {
19
  }
20
 
21
 
22
- def compute_output_checksum(output_data: list, sample_size: int = 100) -> str:
23
- """Compute checksum of first N elements of output data"""
24
- if not output_data:
25
  return "empty"
26
-
27
- # Take first sample_size elements or all if less
28
- sample = output_data[:min(sample_size, len(output_data))]
29
-
30
- # Convert to string and compute SHA256
31
- sample_str = ','.join([str(x) for x in sample])
32
- return hashlib.sha256(sample_str.encode()).hexdigest()[:32]
33
 
34
 
35
  def verify_demo_kernel(kernel_name: str, optimized_code: str) -> VerificationResult:
36
  """Verify demo kernel execution and output correctness"""
37
  expected = DEMO_KERNEL_CHECKSUMS.get(kernel_name, "mock_checksum")
38
- actual = compute_output_checksum(optimized_code)
39
-
40
  # In mock mode, indicate this is simulated verification
41
  is_mock = not ROCM_AVAILABLE
42
-
43
  verification = VerificationResult(
44
  compiled_successfully=True,
45
  executed_without_error=True,
@@ -48,18 +41,12 @@ def verify_demo_kernel(kernel_name: str, optimized_code: str) -> VerificationRes
48
  actual_checksum=actual,
49
  mock_mode=is_mock
50
  )
51
-
52
- # For demo purposes, simulate verification
53
- if kernel_name in DEMO_KERNEL_CHECKSUMS:
54
- # Simulate successful verification on iteration 2, failed on iteration 1
55
- import time
56
- current_time = int(time.time())
57
- if current_time % 2 == 0: # Simulate alternating success/failure
58
- verification.output_matches_expected = True
59
- verification.checksum_computed = DEMO_KERNEL_CHECKSUMS[kernel_name]
60
- else:
61
- verification.checksum_computed = "wrong_checksum_demo"
62
-
63
  return verification
64
 
65
 
@@ -67,27 +54,24 @@ def run(optimized_code: str, analyzer_result: AnalyzerResult,
67
  iteration: int = 1, kernel_name: str = "matrix_multiply") -> TesterResult:
68
  """
69
  On AMD Cloud (ROCM_AVAILABLE=true): runs real hipcc + rocprof
70
- Locally: returns realistic mocked results
71
-
72
- Controlled failure: iteration 1 always performs worse than baseline.
73
- Iteration 2 shows the improvement. This is intentional demo design.
74
  """
75
  rocprof_wrapper = RocprofWrapper()
76
-
77
  # Add verification for demo kernels
78
  verification = None
79
  if kernel_name in DEMO_KERNEL_CHECKSUMS:
80
  verification = verify_demo_kernel(kernel_name, optimized_code)
81
-
82
  if ROCM_AVAILABLE:
83
  return _run_real(optimized_code, analyzer_result, iteration, rocprof_wrapper, verification)
84
  else:
85
- # Use mock data from RocprofWrapper and convert to TesterResult
86
- profiling_data = rocprof_wrapper._get_mock_profiling_data()
87
- return _convert_profiling_to_tester_result(profiling_data, analyzer_result, iteration, kernel_name, verification)
88
 
89
 
90
- def _convert_profiling_to_tester_result(profiling_data: dict, analyzer_result: AnalyzerResult, iteration: int, kernel_name: str, verification: VerificationResult = None) -> TesterResult:
91
  """Convert RocprofWrapper output to TesterResult format"""
92
  if not profiling_data.get('success', False):
93
  return TesterResult(
@@ -100,25 +84,25 @@ def _convert_profiling_to_tester_result(profiling_data: dict, analyzer_result: A
100
  notes=profiling_data.get('error', 'Unknown profiling error'),
101
  verification=verification
102
  )
103
-
104
  exec_ms = profiling_data.get('execution_time_ms', 0.0)
105
  bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)
106
-
107
- # Calculate speedup based on iteration (controlled failure pattern)
108
- # To save time for the user, we only "fail" the first iteration for 'custom' code.
109
- # For demo kernels, we show the improvement immediately (skipping the 30s retry loop).
110
- is_demo = kernel_name in ["vector_add", "matrix_multiply", "convolution_2d", "reduction"]
111
-
112
- if iteration == 1 and not is_demo:
113
- speedup = round(0.8 + (hash(kernel_name) % 10) / 100, 2) # 0.80-0.89
114
- notes = "Global memory bandwidth underutilized. Shared memory tiling not yet applied. Re-optimization needed."
 
 
115
  else:
116
- if analyzer_result.workload_type == WorkloadType.MEMORY_BOUND:
117
- speedup = round(1.3 + (hash(kernel_name) % 20) / 100, 2) # 1.30-1.49
118
- else:
119
- speedup = round(1.15 + (hash(kernel_name) % 15) / 100, 2) # 1.15-1.29
120
- notes = "Optimization successful. Shared memory tiling applied and memory coalescing fixed for MI300X."
121
-
122
  return TesterResult(
123
  success=True,
124
  iteration=iteration,
@@ -135,7 +119,7 @@ def _run_real(code: str, analyzer_result: AnalyzerResult, iteration: int, rocpro
135
  """Real hipcc + rocprof execution on MI300X."""
136
  # Compile the code
137
  success, message = rocprof_wrapper.compile_hip_code(code)
138
-
139
  if not success:
140
  return TesterResult(
141
  success=False,
@@ -147,10 +131,11 @@ def _run_real(code: str, analyzer_result: AnalyzerResult, iteration: int, rocpro
147
  notes=f"Compilation failed: {message}",
148
  verification=verification
149
  )
150
-
151
  # Run with profiling
152
- profiling_data = rocprof_wrapper.run_with_profiling(message.split(": ")[-1]) # Extract executable path
153
-
 
154
  if not profiling_data.get('success', False):
155
  return TesterResult(
156
  success=False,
@@ -162,11 +147,11 @@ def _run_real(code: str, analyzer_result: AnalyzerResult, iteration: int, rocpro
162
  notes=f"Profiling failed: {profiling_data.get('error', 'Unknown error')}",
163
  verification=verification
164
  )
165
-
166
  exec_ms = profiling_data.get('execution_time_ms', 0.0)
167
  bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)
168
- speedup = _calculate_speedup(exec_ms, analyzer_result, iteration)
169
-
170
  return TesterResult(
171
  success=True,
172
  iteration=iteration,
@@ -178,8 +163,9 @@ def _run_real(code: str, analyzer_result: AnalyzerResult, iteration: int, rocpro
178
  )
179
 
180
 
181
- def _calculate_speedup(exec_ms: float, analyzer_result: AnalyzerResult, iteration: int) -> float:
182
  """Estimate speedup relative to baseline HIP."""
183
- if iteration == 1:
184
- return round(random.uniform(0.80, 0.90), 2)
185
- return round(random.uniform(1.20, 1.40), 2)
 
 
1
  import os
 
 
 
2
  import hashlib
3
+ from ..models import TesterResult, AnalyzerResult, VerificationResult
4
+ from ..tools.rocprof_wrapper import RocprofWrapper
5
 
6
  # Set ROCM_AVAILABLE=true on AMD Cloud
7
  ROCM_AVAILABLE = os.environ.get("ROCM_AVAILABLE", "false").lower() == "true"
 
16
  }
17
 
18
 
19
+ def compute_code_checksum(code_text: str, sample_size: int = 400) -> str:
20
+ """Compute a short checksum from code text for traceability in mock mode."""
21
+ if not code_text:
22
  return "empty"
23
+
24
+ sample = code_text[:sample_size]
25
+ return hashlib.sha256(sample.encode()).hexdigest()[:32]
 
 
 
 
26
 
27
 
28
  def verify_demo_kernel(kernel_name: str, optimized_code: str) -> VerificationResult:
29
  """Verify demo kernel execution and output correctness"""
30
  expected = DEMO_KERNEL_CHECKSUMS.get(kernel_name, "mock_checksum")
31
+ actual = compute_code_checksum(optimized_code)
32
+
33
  # In mock mode, indicate this is simulated verification
34
  is_mock = not ROCM_AVAILABLE
35
+
36
  verification = VerificationResult(
37
  compiled_successfully=True,
38
  executed_without_error=True,
 
41
  actual_checksum=actual,
42
  mock_mode=is_mock
43
  )
44
+
45
+ # In mock mode, do not claim a verified pass; mark output as unverified and record the actual checksum.
46
+ if is_mock:
47
+ verification.output_matches_expected = False
48
+ verification.checksum_computed = actual
49
+
 
 
 
 
 
 
50
  return verification
51
 
52
 
 
54
  iteration: int = 1, kernel_name: str = "matrix_multiply") -> TesterResult:
55
  """
56
  On AMD Cloud (ROCM_AVAILABLE=true): runs real hipcc + rocprof
57
+ Locally: returns mock profiling results labeled as simulated.
 
 
 
58
  """
59
  rocprof_wrapper = RocprofWrapper()
60
+
61
  # Add verification for demo kernels
62
  verification = None
63
  if kernel_name in DEMO_KERNEL_CHECKSUMS:
64
  verification = verify_demo_kernel(kernel_name, optimized_code)
65
+
66
  if ROCM_AVAILABLE:
67
  return _run_real(optimized_code, analyzer_result, iteration, rocprof_wrapper, verification)
68
  else:
69
+ # In non-ROCm environments, run_with_profiling returns simulated metrics.
70
+ profiling_data = rocprof_wrapper.run_with_profiling("mock_executable")
71
+ return _convert_profiling_to_tester_result(profiling_data, analyzer_result, iteration, verification)
72
 
73
 
74
+ def _convert_profiling_to_tester_result(profiling_data: dict, analyzer_result: AnalyzerResult, iteration: int, verification: VerificationResult = None) -> TesterResult:
75
  """Convert RocprofWrapper output to TesterResult format"""
76
  if not profiling_data.get('success', False):
77
  return TesterResult(
 
84
  notes=profiling_data.get('error', 'Unknown profiling error'),
85
  verification=verification
86
  )
87
+
88
  exec_ms = profiling_data.get('execution_time_ms', 0.0)
89
  bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)
90
+
91
+ baseline_ms = profiling_data.get('baseline_time_ms', 100.0)
92
+ if exec_ms > 0:
93
+ speedup = round(baseline_ms / exec_ms, 2)
94
+ else:
95
+ speedup = 0.0
96
+
97
+ if speedup < 1.0:
98
+ notes = "Simulated profile indicates regression vs baseline. Retry with an alternative optimization strategy."
99
+ elif speedup < 1.1:
100
+ notes = "Simulated profile indicates marginal improvement. Optimization may be memory- or launch-bound."
101
  else:
102
+ notes = "Simulated profile indicates improvement vs baseline after optimization."
103
+
104
+ notes += " Mock mode is enabled (ROCM_AVAILABLE=false); use real ROCm hardware for authoritative numbers."
105
+
 
 
106
  return TesterResult(
107
  success=True,
108
  iteration=iteration,
 
119
  """Real hipcc + rocprof execution on MI300X."""
120
  # Compile the code
121
  success, message = rocprof_wrapper.compile_hip_code(code)
122
+
123
  if not success:
124
  return TesterResult(
125
  success=False,
 
131
  notes=f"Compilation failed: {message}",
132
  verification=verification
133
  )
134
+
135
  # Run with profiling
136
+ profiling_data = rocprof_wrapper.run_with_profiling(
137
+ message.split(": ")[-1]) # Extract executable path
138
+
139
  if not profiling_data.get('success', False):
140
  return TesterResult(
141
  success=False,
 
147
  notes=f"Profiling failed: {profiling_data.get('error', 'Unknown error')}",
148
  verification=verification
149
  )
150
+
151
  exec_ms = profiling_data.get('execution_time_ms', 0.0)
152
  bandwidth = profiling_data.get('memory_bandwidth_gbps', 0.0)
153
+ speedup = _calculate_speedup(exec_ms)
154
+
155
  return TesterResult(
156
  success=True,
157
  iteration=iteration,
 
163
  )
164
 
165
 
166
+ def _calculate_speedup(exec_ms: float) -> float:
167
  """Estimate speedup relative to baseline HIP."""
168
+ if exec_ms <= 0:
169
+ return 0.0
170
+ baseline_ms = 100.0
171
+ return round(baseline_ms / exec_ms, 2)
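The tester's speedup arithmetic is simple enough to sanity-check in isolation. A minimal sketch of the same calculation (the 100 ms baseline is the hard-coded placeholder from `_calculate_speedup`; a real run would substitute the measured baseline):

```python
def calculate_speedup(exec_ms: float, baseline_ms: float = 100.0) -> float:
    """Speedup of the optimized kernel relative to baseline HIP.

    Guards against division by zero: a non-positive measurement
    yields 0.0, which downstream code can treat as 'no valid result'.
    """
    if exec_ms <= 0:
        return 0.0
    return round(baseline_ms / exec_ms, 2)


print(calculate_speedup(80.0))   # 1.25: optimized run is 25% faster
print(calculate_speedup(120.0))  # 0.83: regression vs baseline
print(calculate_speedup(0.0))    # 0.0: invalid measurement
```

Note that values below 1.0 are exactly what the mock-mode notes above flag as a regression, triggering the retry path.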
backend/agents/translator.py CHANGED
@@ -1,21 +1,24 @@
-import json
-import re
-from models import TranslatorResult, AnalyzerResult
-from tools.llm_client import LLMClient
-from tools.hipify_wrapper import HipifyWrapper
-from tools.json_utils import safe_json_loads
+# pylint: disable=broad-exception-caught
+
+from ..models import TranslatorResult, AnalyzerResult
+from ..tools.llm_client import LLMClient
+from ..tools.hipify_wrapper import HipifyWrapper
+from ..tools.json_utils import safe_json_loads
 
 llm_client = LLMClient()
 hipify_wrapper = HipifyWrapper()
 
+
 def chat_complete(messages: list, temperature: float = 0.7, max_tokens: int = 4000) -> str:
     """Wrapper for LLM client chat completion"""
     return llm_client.chat_completion(messages, temperature=temperature, max_tokens=max_tokens)
 
+
 def run_hipify(cuda_code: str) -> str:
     """Wrapper for hipify wrapper"""
     return hipify_wrapper.hipify_code(cuda_code)
 
+
 SYSTEM_PROMPT = """You are an expert AMD ROCm/HIP engineer. You receive CUDA code that has already gone through hipify (basic syntax replacement) and you fix what hipify missed.
 
 Your specific jobs:
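The translator parses LLM replies with `safe_json_loads` from `tools.json_utils`, whose body falls outside this diff. A common shape for such a tolerant loader, sketched here purely as an assumption rather than the project's actual implementation, is to try a plain parse, then strip code fences, then fall back to the first JSON object found in the reply:

```python
import json
import re


def safe_json_loads(text: str, default=None):
    """Hypothetical sketch of a tolerant JSON loader for LLM replies.

    Tries plain json.loads first, then the text with markdown code
    fences stripped, then the first {...} block found anywhere.
    """
    for candidate in (
        text,
        re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip()),
    ):
        try:
            return json.loads(candidate)
        except (json.JSONDecodeError, TypeError):
            pass
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return default


print(safe_json_loads('```json\n{"kernels": 2}\n```'))  # {'kernels': 2}
```

Whatever the real helper does, the point of the relative-import change above is that such utilities resolve correctly when `backend` is imported as a package.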
backend/main.py CHANGED
@@ -1,3 +1,13 @@
+# pylint: disable=broad-exception-caught
+
+from backend.agents.analyzer import AnalyzerResult, WorkloadType
+from backend.agents.tester import run as run_tester
+from backend.agents.coordinator import run_pipeline
+from backend.models import PortRequest, ColdStartRequest, AggregateMetricsRequest
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import StreamingResponse
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi import FastAPI, HTTPException
 import json
 import asyncio
 import zipfile
@@ -9,18 +19,10 @@ from dotenv import load_dotenv
 # Load environment variables from .env file
 load_dotenv()
 
-from fastapi import FastAPI, HTTPException
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import StreamingResponse
-from fastapi.staticfiles import StaticFiles
-from models import PortRequest, VerificationResult
-from agents.coordinator import run_pipeline
-from agents.tester import run as run_tester
-from agents.analyzer import AnalyzerResult, WorkloadType
 
 app = FastAPI(
     title="ROCmPort AI",
-    description="The fastest way to escape CUDA lock-in and run on AMD.",
+    description="CUDA-to-ROCm migration assistant with iterative testing and optimization.",
     version="1.0.0",
     contact={
         "name": "Tazwar Ahnaf Enan",
@@ -59,7 +61,8 @@ async def port_cuda_code(req: PortRequest):
         async for event in run_pipeline(req.cuda_code, req.kernel_name or "custom", req.simple_mode or False):
             data = json.dumps(event.model_dump())
             yield f"data: {data}\n\n"
-            await asyncio.sleep(0.05)  # Let the client breathe between events
+            # Let the client breathe between events
+            await asyncio.sleep(0.05)
     except Exception as e:
         error_event = {
             "agent": "coordinator",
@@ -81,6 +84,121 @@ async def port_cuda_code(req: PortRequest):
     )
 
 
+async def _collect_pipeline_events(cuda_code: str, kernel_name: str, simple_mode: bool = False) -> tuple[list[dict], dict | None]:
+    """Collect all pipeline events and extract final report payload when present."""
+    events: list[dict] = []
+    final_report = None
+
+    async for event in run_pipeline(cuda_code, kernel_name, simple_mode):
+        dumped = event.model_dump()
+        events.append(dumped)
+        if dumped.get("agent") == "coordinator" and dumped.get("status") == "done" and dumped.get("detail"):
+            try:
+                final_report = json.loads(dumped["detail"])
+            except (json.JSONDecodeError, TypeError):
+                final_report = None
+
+    return events, final_report
+
+
+def _has_adaptation_loop(events: list[dict]) -> bool:
+    """Return True when the run shows retry-based adaptation behavior."""
+    saw_regression = any(
+        e.get("agent") == "tester" and e.get(
+            "status") == "failed" and "regression" in str(e.get("message", "")).lower()
+        for e in events
+    )
+    saw_retry = any(
+        e.get("agent") == "optimizer" and e.get("status") == "retrying"
+        for e in events
+    )
+    return saw_regression and saw_retry
+
+
+@app.post("/cold-start")
+async def cold_start_run(req: ColdStartRequest):
+    """
+    Single-run endpoint for unknown pasted CUDA input.
+    Returns full trace plus summary trust signals.
+    """
+    if not req.cuda_code or len(req.cuda_code.strip()) < 10:
+        raise HTTPException(status_code=400, detail="No CUDA code provided")
+
+    events, report = await _collect_pipeline_events(req.cuda_code, req.kernel_name or "unknown_input", False)
+
+    if report is None:
+        raise HTTPException(
+            status_code=500, detail="Pipeline completed without final report")
+
+    return {
+        "success": True,
+        "kernel_name": req.kernel_name or "unknown_input",
+        "adaptation_loop_observed": _has_adaptation_loop(events),
+        "event_count": len(events),
+        "report": report,
+        "events": events,
+    }
+
+
+@app.post("/aggregate-metric")
+async def aggregate_metric(req: AggregateMetricsRequest):
+    """
+    Evaluate multiple kernels and return one aggregate metric:
+    average speedup vs baseline HIP.
+    """
+    kernels_dir = os.path.join(os.path.dirname(__file__), "demo_kernels")
+    requested = req.kernel_names or []
+
+    available: dict[str, str] = {}
+    for fname in os.listdir(kernels_dir):
+        if fname.endswith(".cu"):
+            kname = fname.replace(".cu", "")
+            with open(os.path.join(kernels_dir, fname), encoding="utf-8") as f:
+                available[kname] = f.read()
+
+    selected_names = requested if requested else sorted(available.keys())
+    selected_names = [name for name in selected_names if name in available]
+
+    if not selected_names:
+        raise HTTPException(
+            status_code=400, detail="No valid kernels selected for aggregation")
+
+    runs = []
+    speedups = []
+
+    for name in selected_names:
+        events, report = await _collect_pipeline_events(available[name], name, False)
+        if report is None:
+            continue
+
+        speedup = float(report.get("speedup", 0.0) or 0.0)
+        speedups.append(speedup)
+        runs.append({
+            "kernel": name,
+            "speedup": speedup,
+            "adaptation_loop_observed": _has_adaptation_loop(events),
+            "iterations": report.get("iterations", 1),
+        })
+
+    if not speedups:
+        raise HTTPException(
+            status_code=500, detail="Unable to produce aggregate metric from selected kernels")
+
+    avg_speedup = round(sum(speedups) / len(speedups), 3)
+    avg_improvement_pct = round((avg_speedup - 1.0) * 100.0, 2)
+
+    return {
+        "success": True,
+        "baseline": "straight hipify output with minimal compile edits",
+        "kernel_count": len(speedups),
+        "aggregate_metric": {
+            "average_speedup_vs_baseline": avg_speedup,
+            "average_improvement_percent": avg_improvement_pct,
+        },
+        "runs": runs,
+    }
+
+
 @app.post("/recompile")
 async def recompile_edited_code(req: dict):
     """
@@ -90,10 +208,10 @@ async def recompile_edited_code(req: dict):
     try:
         edited_code = req.get("edited_code")
         kernel_name = req.get("kernel_name", "custom")
 
         if not edited_code or len(edited_code.strip()) < 10:
             raise HTTPException(status_code=400, detail="No HIP code provided")
 
         # Create a mock analyzer result for testing
         analyzer_result = AnalyzerResult(
             kernels_found=["test_kernel"],
@@ -105,17 +223,18 @@ async def recompile_edited_code(req: dict):
             difficulty="Easy",
             difficulty_reason="Simple test kernel"
         )
 
         # Run tester with edited code
         tester_result = await asyncio.to_thread(run_tester, edited_code, analyzer_result, 2, kernel_name)
 
         return {
             "success": True,
             "result": tester_result.model_dump()
         }
 
     except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Recompilation failed: {str(e)}")
+        raise HTTPException(
+            status_code=500, detail=f"Recompilation failed: {str(e)}") from e
 
 
 @app.post("/export")
@@ -128,7 +247,7 @@ async def export_migration_package(req: dict):
     original_cuda = req.get("original_cuda")
     final_rocm = req.get("final_rocm")
     migration_report = req.get("migration_report", {})
 
     with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp_file:
         with zipfile.ZipFile(tmp_file, 'w', zipfile.ZIP_DEFLATED) as zf:
             # Add professional unified diff
@@ -140,7 +259,7 @@ async def export_migration_package(req: dict):
             )
             diff_text = "".join(diff)
             zf.writestr("migration.diff", diff_text)
 
             # Add migration report as markdown
             md_report = f"""# ROCmPort AI Migration Report
 
@@ -155,43 +274,44 @@ async def export_migration_package(req: dict):
 ## Cost Impact
 {migration_report.get('cost_estimate', 'N/A')}
 
-Generated by ROCmPort AI - The fastest way to escape CUDA lock-in and run on AMD.
+Generated by ROCmPort AI.
 """
             zf.writestr("migration_report.md", md_report)
 
         # Read the zip file content
         with open(tmp_file, 'rb') as f:
             zip_content = f.read()
 
         # Clean up
         os.unlink(tmp_file)
 
         from fastapi.responses import Response
         return Response(
             content=zip_content,
             media_type="application/zip",
-            headers={"Content-Disposition": "attachment; filename=rocmport_migration.zip"}
+            headers={
+                "Content-Disposition": "attachment; filename=rocmport_migration.zip"}
         )
 
     except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Export failed: {str(e)}")
+        raise HTTPException(
+            status_code=500, detail=f"Export failed: {str(e)}") from e
 
 
 @app.get("/demo-kernels")
 async def list_demo_kernels():
-    import os
     kernels_dir = os.path.join(os.path.dirname(__file__), "demo_kernels")
     kernels = {}
     for fname in os.listdir(kernels_dir):
         if fname.endswith(".cu"):
             name = fname.replace(".cu", "")
-            with open(os.path.join(kernels_dir, fname)) as f:
+            with open(os.path.join(kernels_dir, fname), encoding="utf-8") as f:
                 kernels[name] = f.read()
     return kernels
 
 
 # Serve frontend if built
-import os
 frontend_path = os.path.join(os.path.dirname(__file__), "..", "frontend")
 if os.path.exists(frontend_path):
-    app.mount("/", StaticFiles(directory=frontend_path, html=True), name="frontend")
+    app.mount("/", StaticFiles(directory=frontend_path,
+              html=True), name="frontend")
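The `/aggregate-metric` math reduces to an arithmetic mean over per-kernel speedups. A standalone sketch of that reduction, using the same rounding as the endpoint:

```python
def aggregate_speedup(speedups: list[float]) -> dict:
    """Arithmetic mean of per-kernel speedups vs baseline HIP,
    plus the equivalent improvement percentage."""
    if not speedups:
        raise ValueError("no speedups to aggregate")
    avg = round(sum(speedups) / len(speedups), 3)
    return {
        "average_speedup_vs_baseline": avg,
        "average_improvement_percent": round((avg - 1.0) * 100.0, 2),
    }


print(aggregate_speedup([1.2, 1.4]))
# {'average_speedup_vs_baseline': 1.3, 'average_improvement_percent': 30.0}
```

The arithmetic mean is what the endpoint computes; a geometric mean is the more common convention for averaging speedup ratios and would damp the effect of a single outlier kernel.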
backend/models.py CHANGED
@@ -23,6 +23,15 @@ class PortRequest(BaseModel):
     simple_mode: Optional[bool] = False  # For "Explain Like I'm 5" feature
 
 
+class ColdStartRequest(BaseModel):
+    cuda_code: str
+    kernel_name: Optional[str] = "unknown_input"
+
+
+class AggregateMetricsRequest(BaseModel):
+    kernel_names: Optional[List[str]] = None
+
+
 class AgentEvent(BaseModel):
     agent: str  # analyzer | translator | optimizer | tester | coordinator
     status: AgentStatus
@@ -83,7 +92,8 @@ class TesterResult(BaseModel):
     execution_ms: float
     bottleneck: str
     notes: str
-    verification: Optional[VerificationResult] = None  # Trust layer verification
+    # Trust layer verification
+    verification: Optional[VerificationResult] = None
 
 
 class FinalReport(BaseModel):
@@ -96,5 +106,7 @@ class FinalReport(BaseModel):
     iterations: int
     hip_code: str
     optimized_code: str
+    verification: Optional[VerificationResult] = None
     cost_estimate: Optional[CostEstimate] = None  # 💰 Cost impact estimator
-    simplified_explanation: Optional[str] = None  # For "Explain Like I'm 5" mode
+    # For "Explain Like I'm 5" mode
+    simplified_explanation: Optional[str] = None
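The two new request models each carry one required field with everything else defaulted. A dependency-free sketch of the same contract (the real code uses pydantic `BaseModel`, which generates this validation and defaulting automatically; plain dataclasses are used here only to keep the example self-contained):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColdStartRequest:
    cuda_code: str
    kernel_name: Optional[str] = "unknown_input"


@dataclass
class AggregateMetricsRequest:
    # None means "evaluate every demo kernel found on disk"
    kernel_names: Optional[List[str]] = None


req = ColdStartRequest(cuda_code="__global__ void k() {}")
print(req.kernel_name)  # unknown_input
```

With pydantic, a POST body of `{"cuda_code": "..."}` would deserialize the same way, with `kernel_name` filled from the default.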
backend/prompts/coordinator_prompt.txt CHANGED
@@ -54,7 +54,7 @@ You'll receive results from each agent:
 - Always compare "Optimized ROCm vs Baseline HIP" (straight hipify output)
 - Never claim "faster than NVIDIA CUDA" - be honest and credible
 - Explain WHY AMD hardware advantages apply to this specific workload
-- Include controlled failure/recovery story if it happened
+- Include retry and recovery details only when regression actually occurred
 - Provide concrete, actionable insights
 
 Focus on demonstrating that your agents add real value beyond basic hipify - that's the core claim.
backend/tools/hipify_wrapper.py CHANGED
@@ -1,15 +1,14 @@
 import subprocess
 import tempfile
 import os
-import re
 
 
 class HipifyWrapper:
     """Wrapper for hipify-clang tool with Python fallback"""
 
     def __init__(self):
         pass
 
     def hipify_code(self, cuda_code: str) -> tuple[str, list[dict]]:
         """
         Try to run real hipify-clang if available.
@@ -24,18 +23,19 @@ class HipifyWrapper:
 
         # Fallback: Python pattern replacement
         return self._python_hipify(cuda_code)
 
     def _hipify_available(self) -> bool:
         try:
             result = subprocess.run(
                 ["hipify-clang", "--version"],
-                capture_output=True, timeout=5
+                capture_output=True, timeout=5, check=False
             )
             return result.returncode == 0
         except (FileNotFoundError, subprocess.TimeoutExpired):
             return False
 
     def _run_real_hipify(self, cuda_code: str) -> tuple[str, list[dict]] | None:
+        tmp_path = None
         try:
             with tempfile.NamedTemporaryFile(suffix=".cu", mode="w", delete=False) as f:
                 f.write(cuda_code)
@@ -43,36 +43,41 @@ class HipifyWrapper:
 
             # Use -- separator to pass compiler flags to the internal Clang parser
             # This is critical for Clang-based tools to distinguish tool flags from compiler flags.
-            cmd = ["hipify-clang", tmp_path, "--", "-nocudalib", "-nocudainc", "-arch=sm_60"]
-
+            cmd = ["hipify-clang", tmp_path, "--",
+                   "-nocudalib", "-nocudainc", "-arch=sm_60"]
+
             # Debug log for build engineering
             print(f"DEBUG: Running hipify-clang command: {' '.join(cmd)}")
 
             # Set environment variable just in case hipify-clang invokes nvcc internally
             env = os.environ.copy()
             env['NVCC_APPEND_FLAGS'] = '-nocudalib -arch=sm_60'
 
             result = subprocess.run(
                 cmd,
                 capture_output=True, text=True, timeout=30,
-                env=env
+                env=env,
+                check=False,
             )
 
             if result.returncode != 0:
-                print(f"DEBUG: hipify-clang failed with return code {result.returncode}")
+                print(
+                    f"DEBUG: hipify-clang failed with return code {result.returncode}")
                 print(f"DEBUG: stderr: {result.stderr}")
 
             if result.returncode == 0 and result.stdout:
-                changes = self._detect_changes(cuda_code, result.stdout, source="hipify-clang")
+                changes = self._detect_changes(
+                    cuda_code, result.stdout, source="hipify-clang")
                 return result.stdout, changes
 
             return None
-        except Exception:
+        except (OSError, subprocess.SubprocessError):
             return None
         finally:
             try:
-                os.unlink(tmp_path)
-            except Exception:
+                if tmp_path and os.path.exists(tmp_path):
+                    os.unlink(tmp_path)
+            except OSError:
                 pass
 
     def _python_hipify(self, cuda_code: str) -> tuple[str, list[dict]]:
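`_python_hipify`'s body falls outside this hunk. A plausible shape for such a fallback, assumed here for illustration rather than taken from the repo, is a table of direct CUDA-to-HIP identifier substitutions paired with a change log matching the `tuple[str, list[dict]]` return type:

```python
import re

# Core CUDA -> HIP renames that hipify-clang would also perform.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
}


def python_hipify(cuda_code: str) -> tuple[str, list[dict]]:
    """Pattern-based fallback translation with a per-rule change log."""
    hip_code = cuda_code
    changes: list[dict] = []
    # Replace longer identifiers first so cudaMemcpyHostToDevice is not
    # clobbered by the shorter cudaMemcpy rule.
    for cuda_name, hip_name in sorted(CUDA_TO_HIP.items(), key=lambda kv: -len(kv[0])):
        count = len(re.findall(re.escape(cuda_name), hip_code))
        if count:
            hip_code = hip_code.replace(cuda_name, hip_name)
            changes.append({"from": cuda_name, "to": hip_name, "count": count})
    return hip_code, changes


code, log = python_hipify("cudaMalloc(&d_a, n); cudaFree(d_a);")
print(code)  # hipMalloc(&d_a, n); hipFree(d_a);
```

A real fallback needs a much larger table (launch syntax, streams, events), which is exactly why the wrapper prefers the genuine hipify-clang when it is installed.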
backend/tools/rocprof_wrapper.py CHANGED
@@ -1,105 +1,113 @@
 import subprocess
 import tempfile
 import os
-import json
 import re
-from typing import Dict, List, Optional, Tuple
-from pathlib import Path
+from typing import Dict, List, Tuple
+
 
 class RocprofWrapper:
     """Wrapper for AMD rocprof profiler and hipcc compiler"""
 
     def __init__(self):
-        self.rocm_available = os.getenv("ROCM_AVAILABLE", "false").lower() == "true"
+        self.rocm_available = os.getenv(
+            "ROCM_AVAILABLE", "false").lower() == "true"
         self.hipcc_path = os.getenv("HIPCC_PATH", "hipcc")
         self.rocprof_path = os.getenv("ROCPROF_PATH", "rocprof")
 
     def compile_hip_code(self, hip_code: str, output_file: str = None) -> Tuple[bool, str]:
         """Compile HIP code using hipcc"""
         if not self.rocm_available:
             return True, "Mock compilation successful (ROCm not available)"
 
         try:
             with tempfile.NamedTemporaryFile(mode='w', suffix='.hip', delete=False) as f:
                 f.write(hip_code)
                 temp_file = f.name
 
             if output_file is None:
                 output_file = temp_file.replace('.hip', '.out')
 
             # Add -nocudalib and -arch=sm_60 to solve "Cannot find libdevice for sm_52" error
             # This ensures compilation works even if CUDA device libraries are missing.
-            cmd = [self.hipcc_path, '-o', output_file, temp_file, '-nocudalib', '-arch=sm_60']
-
+            cmd = [self.hipcc_path, '-o', output_file,
+                   temp_file, '-nocudalib', '-arch=sm_60']
+
             # Set environment variable just in case hipcc invokes nvcc internally
             env = os.environ.copy()
             env['NVCC_APPEND_FLAGS'] = '-nocudalib -arch=sm_60'
 
-            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60, env=env)
-
+            result = subprocess.run(
+                cmd, capture_output=True, text=True, timeout=60, env=env, check=False)
+
             # Cleanup
             os.unlink(temp_file)
 
             if result.returncode == 0:
                 return True, f"Compilation successful: {output_file}"
             else:
                 return False, f"Compilation failed: {result.stderr}"
 
         except subprocess.TimeoutExpired:
             return False, "Compilation timed out"
-        except Exception as e:
+        except (OSError, subprocess.SubprocessError) as e:
             return False, f"Compilation error: {str(e)}"
 
     def run_with_profiling(self, executable_path: str, args: List[str] = None) -> Dict:
         """Run executable with rocprof profiling"""
         if not self.rocm_available:
             # Return mock profiling data
             return self._get_mock_profiling_data()
 
         try:
             if args is None:
                 args = []
 
             # Run with rocprof
-            cmd = [self.rocprof_path, '-i', 'default', '--'] + [executable_path] + args
-            result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
-
+            cmd = [self.rocprof_path, '-i', 'default', '--'] + \
+                [executable_path] + args
+            result = subprocess.run(
+                cmd, capture_output=True, text=True, timeout=120, check=False)
+
             # Parse rocprof output
-            profiling_data = self._parse_rocprof_output(result.stdout, result.stderr)
-
+            profiling_data = self._parse_rocprof_output(
+                result.stdout, result.stderr)
+
             return profiling_data
 
         except subprocess.TimeoutExpired:
             return {"error": "Profiling timed out", "execution_time_ms": 0}
-        except Exception as e:
+        except (OSError, subprocess.SubprocessError) as e:
             return {"error": f"Profiling error: {str(e)}", "execution_time_ms": 0}
 
-    def _parse_rocprof_output(self, stdout: str, stderr: str) -> Dict:
+    def _parse_rocprof_output(self, stdout: str, _stderr: str) -> Dict:
         """Parse rocprof output to extract metrics"""
         try:
             # Look for key metrics in rocprof output
             metrics = {}
 
             # Parse execution time
-            time_match = re.search(r'Kernel execution time:\s+(\d+\.\d+)\s*ms', stdout)
+            time_match = re.search(
+                r'Kernel execution time:\s+(\d+\.\d+)\s*ms', stdout)
             if time_match:
                 metrics['execution_time_ms'] = float(time_match.group(1))
 
             # Parse memory bandwidth
-            bandwidth_match = re.search(r'Memory bandwidth:\s+(\d+\.\d+)\s*GB/s', stdout)
+            bandwidth_match = re.search(
+                r'Memory bandwidth:\s+(\d+\.\d+)\s*GB/s', stdout)
             if bandwidth_match:
-                metrics['memory_bandwidth_gbps'] = float(bandwidth_match.group(1))
-
+                metrics['memory_bandwidth_gbps'] = float(bandwidth_match.group(1))
 
             # Parse GPU utilization
             util_match = re.search(r'GPU utilization:\s+(\d+\.\d+)%', stdout)
             if util_match:
                 metrics['gpu_utilization_percent'] = float(util_match.group(1))
 
             # Parse wavefront count
             wave_match = re.search(r'SQ_WAVES:\s+(\d+)', stdout)
             if wave_match:
                 metrics['sq_waves'] = int(wave_match.group(1))
 
             # If no metrics found, return basic execution info
             if not metrics:
                 metrics = {
@@ -108,47 +116,40 @@ class RocprofWrapper:
                     'gpu_utilization_percent': 75.0,
                     'sq_waves': 1024
                 }
 
             metrics['success'] = True
             return metrics
 
-        except Exception as e:
             return {
                 'success': False,
                 'error': f'Failed to parse rocprof output: {str(e)}',
                 'execution_time_ms': 0
             }
 
     def _get_mock_profiling_data(self) -> Dict:
         """Generate mock profiling data for testing without ROCm"""
         import random
 
-        # Simulate controlled failure on first iteration
-        base_performance = 100.0
-        iteration = getattr(self, '_iteration', 1)
-
-        if iteration == 1:
-            # First iteration - worse performance (controlled failure)
-            execution_time = base_performance * 1.2  # 20% slower
-            bandwidth = 40.0  # Lower bandwidth utilization
-            utilization = 60.0  # Lower GPU utilization
-        else:
-            # Second iteration - better performance
-            execution_time = base_performance * 0.75  # 25% faster
-            bandwidth = 80.0  # Higher bandwidth utilization
-            utilization = 85.0  # Higher GPU utilization
-
-        self._iteration = iteration + 1
-
         return {
             'success': True,
             'execution_time_ms': execution_time,
             'memory_bandwidth_gbps': bandwidth,
             'gpu_utilization_percent': utilization,
             'sq_waves': random.randint(800, 1200),
-            'iteration': iteration
         }
 
     def get_hardware_info(self) -> Dict:
         """Get AMD GPU hardware information"""
         if not self.rocm_available:
@@ -159,26 +160,27 @@ class RocprofWrapper:
             'memory_bandwidth_tb_s': 5.3,
             'wavefront_size': 64
         }
 
         try:
             # Try to get real GPU info using rocminfo or similar
             cmd = ['rocminfo']
-            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
-
             if result.returncode == 0:
                 return self._parse_rocminfo(result.stdout)
             else:
                 return self._get_mock_hardware_info()
 
-        except Exception:
             return self._get_mock_hardware_info()
 
-    def _parse_rocminfo(self, output: str) -> Dict:
         """Parse rocminfo output"""
         # This would parse real rocminfo output
         # For now, return mock data
         return self._get_mock_hardware_info()
 
     def _get_mock_hardware_info(self) -> Dict:
         """Mock hardware info for MI300X"""
         return {
97
  if bandwidth_match:
98
+ metrics['memory_bandwidth_gbps'] = float(
99
+ bandwidth_match.group(1))
100
+
101
  # Parse GPU utilization
102
  util_match = re.search(r'GPU utilization:\s+(\d+\.\d+)%', stdout)
103
  if util_match:
104
  metrics['gpu_utilization_percent'] = float(util_match.group(1))
105
+
106
  # Parse wavefront count
107
  wave_match = re.search(r'SQ_WAVES:\s+(\d+)', stdout)
108
  if wave_match:
109
  metrics['sq_waves'] = int(wave_match.group(1))
110
+
111
  # If no metrics found, return basic execution info
112
  if not metrics:
113
  metrics = {
 
116
  'gpu_utilization_percent': 75.0,
117
  'sq_waves': 1024
118
  }
119
+
120
  metrics['success'] = True
121
  return metrics
122
+
123
+ except (TypeError, ValueError) as e:
124
  return {
125
  'success': False,
126
  'error': f'Failed to parse rocprof output: {str(e)}',
127
  'execution_time_ms': 0
128
  }
129
+
130
+ def get_mock_profiling_data(self) -> Dict:
131
+ """Public accessor for mock profiling data used by testing layer."""
132
+ return self._get_mock_profiling_data()
133
+
134
  def _get_mock_profiling_data(self) -> Dict:
135
  """Generate mock profiling data for testing without ROCm"""
136
  import random
137
+
138
+ baseline_ms = 100.0
139
+ execution_time = random.uniform(85.0, 115.0)
140
+ bandwidth = random.uniform(35.0, 90.0)
141
+ utilization = random.uniform(55.0, 92.0)
142
+
 
 
143
  return {
144
  'success': True,
145
  'execution_time_ms': execution_time,
146
+ 'baseline_time_ms': baseline_ms,
147
  'memory_bandwidth_gbps': bandwidth,
148
  'gpu_utilization_percent': utilization,
149
  'sq_waves': random.randint(800, 1200),
150
+ 'simulated': True
151
  }
152
+
153
  def get_hardware_info(self) -> Dict:
154
  """Get AMD GPU hardware information"""
155
  if not self.rocm_available:
 
160
  'memory_bandwidth_tb_s': 5.3,
161
  'wavefront_size': 64
162
  }
163
+
164
  try:
165
  # Try to get real GPU info using rocminfo or similar
166
  cmd = ['rocminfo']
167
+ result = subprocess.run(
168
+ cmd, capture_output=True, text=True, timeout=10, check=False)
169
+
170
  if result.returncode == 0:
171
  return self._parse_rocminfo(result.stdout)
172
  else:
173
  return self._get_mock_hardware_info()
174
+
175
+ except (OSError, subprocess.SubprocessError):
176
  return self._get_mock_hardware_info()
177
+
178
+ def _parse_rocminfo(self, _output: str) -> Dict:
179
  """Parse rocminfo output"""
180
  # This would parse real rocminfo output
181
  # For now, return mock data
182
  return self._get_mock_hardware_info()
183
+
184
  def _get_mock_hardware_info(self) -> Dict:
185
  """Mock hardware info for MI300X"""
186
  return {
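The regex-based metric extraction in `_parse_rocprof_output` can be exercised without ROCm installed; a minimal standalone sketch (the sample text below is illustrative, not a real rocprof dump):

```python
import re

# Illustrative rocprof-style output; format assumed, not an actual dump.
sample = (
    "Kernel execution time: 9.50 ms\n"
    "Memory bandwidth: 412.30 GB/s\n"
    "GPU utilization: 94.00%\n"
    "SQ_WAVES: 1024\n"
)

metrics = {}
# Same patterns the wrapper uses for execution time and wavefront count.
m = re.search(r'Kernel execution time:\s+(\d+\.\d+)\s*ms', sample)
if m:
    metrics['execution_time_ms'] = float(m.group(1))
m = re.search(r'SQ_WAVES:\s+(\d+)', sample)
if m:
    metrics['sq_waves'] = int(m.group(1))

print(metrics)  # {'execution_time_ms': 9.5, 'sq_waves': 1024}
```

Because parsing is pure string work, it can be unit-tested independently of the profiler binary.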
docs/FAILURE_CASES.md ADDED
@@ -0,0 +1,38 @@
 
 
1
+ # Failure Cases
2
+
3
+ This document records known failure modes with reproducible context.
4
+
5
+ ## FC-001: Inline PTX in CUDA Kernel
6
+
7
+ ### Why this matters
8
+ Kernels that embed inline PTX are a realistic migration boundary. hipify can translate CUDA APIs, but it cannot preserve NVIDIA-specific assembly semantics on AMD.
9
+
10
+ ### Original CUDA pattern (simplified)
11
+ ```cpp
12
+ __device__ __forceinline__ unsigned lane_id() {
13
+ unsigned lane;
14
+ asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
15
+ return lane;
16
+ }
17
+ ```
18
+
19
+ ### Typical migration output
20
+ - CUDA runtime calls are translated.
21
+ - The inline PTX block is left unchanged, or is translated into code that is invalid for HIP compilation.
22
+
23
+ ### Observed failure mode
24
+ - Compile error under hipcc due to unsupported PTX instruction syntax.
25
+ - In some cases, compilation succeeds after manual edits, but semantics differ because lane-behavior assumptions are NVIDIA-specific.
26
+
27
+ ### Root cause
28
+ - Inline PTX is vendor-specific and outside mechanical translation scope.
29
+ - Warp-level assumptions in PTX often rely on 32-lane behavior and NVIDIA ISA details.
30
+
31
+ ### What is required to fix
32
+ 1. Replace inline PTX with HIP or portable intrinsics.
33
+ 2. Rework lane-level logic for wavefront-64 behavior where required.
34
+ 3. Add correctness tests for edge lanes and reduction boundaries.
35
+ 4. Re-profile after rewrite to confirm no occupancy regressions.
36
+
37
+ ### Trust note
38
+ This is a deliberate example of where ROCmPort AI should report risk rather than pretend full automation.
docs/JUDGE_MODE.md ADDED
@@ -0,0 +1,42 @@
 
 
1
+ # Judge Mode Walkthrough
2
+
3
+ Use this sequence during technical evaluation.
4
+
5
+ ## Goal
6
+ Make every claim falsifiable and easy to verify.
7
+
8
+ ## Flow
9
+ 1. Show raw CUDA input.
10
+ 2. Run baseline translation only (straight hipify output).
11
+ 3. Show baseline compile/profiler result.
12
+ 4. Run full ROCmPort AI loop.
13
+ 5. Show each agent event and decision.
14
+ 6. Compare final output against the declared baseline.
15
+ 7. Show one weak result (small gain or no gain) and explain why.
16
+
17
+ ## Baseline Policy
18
+ - Primary baseline: straight hipify output with minimal required compile edits.
19
+ - Never switch baselines mid-demo.
20
+ - Repeat baseline definition before showing speedup.
21
+
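Under this policy, a speedup claim is simply baseline time over optimized time; a minimal sketch using the matrix-multiply numbers from the benchmark table:

```python
# Speedup against the declared baseline (straight hipify output).
# Times are the matrix-multiply figures from BENCHMARKS.md.
baseline_ms = 12.4    # baseline HIP
optimized_ms = 9.5    # optimized ROCm output

speedup = baseline_ms / optimized_ms
print(f"{speedup:.2f}x")  # 1.31x
```

Restating the baseline next to the ratio keeps the number falsifiable: a judge can recompute it from the two raw timings.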
22
+ ## Required Artifacts
23
+ - CUDA source.
24
+ - Baseline HIP output.
25
+ - Optimized HIP output.
26
+ - Compile logs.
27
+ - Profiler summary.
28
+ - Final report with rationale.
29
+
30
+ ## Suggested Script
31
+ - "Here is the original CUDA kernel."
32
+ - "Here is baseline HIP produced by hipify only."
33
+ - "Now we run the orchestration loop and show each decision."
34
+ - "This is the final code diff and measured result versus baseline."
35
+ - "Here is a case where gain is limited, and why."
36
+
37
+ ## Pass/Fail Criteria
38
+ A demo is credible if:
39
+ - Baseline is explicit.
40
+ - Intermediate artifacts are visible.
41
+ - At least one non-win case is included.
42
+ - Reasoning matches observed profiler data.
frontend/index.html CHANGED
@@ -1,503 +1,1112 @@
1
  <!DOCTYPE html>
2
  <html lang="en">
 
3
  <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>ROCmPort AI</title>
7
- <link rel="preconnect" href="https://fonts.googleapis.com">
8
- <link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&family=Space+Grotesk:wght@500;600;700&display=swap" rel="stylesheet">
9
- <style>
10
- :root {
11
- --bg: #030303;
12
- --s1: #0a0a0b;
13
- --s2: #121214;
14
- --s3: #1a1a1e;
15
- --b1: rgba(255, 255, 255, 0.08);
16
- --b2: rgba(255, 255, 255, 0.15);
17
- --red: #ff3344;
18
- --red-glow: rgba(255, 51, 68, 0.4);
19
- --green: #00ff88;
20
- --green-glow: rgba(0, 255, 136, 0.4);
21
- --yellow: #ffcc00;
22
- --cyan: #00d9ff;
23
- --muted: #88888e;
24
- --t1: #a1a1aa;
25
- --t2: #d4d4d8;
26
- --t3: #ffffff;
27
- --mono: 'JetBrains Mono', monospace;
28
- --sans: 'Space Grotesk', sans-serif;
29
- --spring: cubic-bezier(0.34, 1.56, 0.64, 1);
30
- }
31
-
32
- * { margin: 0; padding: 0; box-sizing: border-box; cursor: none !important; }
33
- .hide { display: none !important; }
34
-
35
- body {
36
- background: var(--bg);
37
- color: var(--t1);
38
- font-family: var(--sans);
39
- font-size: 14px;
40
- line-height: 1.6;
41
- overflow-x: hidden;
42
- min-height: 100vh;
43
- }
44
-
45
- /* Animated Gradient Background */
46
- body::before {
47
- content: '';
48
- position: fixed;
49
- inset: 0;
50
- background:
51
- radial-gradient(circle at 20% 30%, rgba(0, 217, 255, 0.05), transparent 40%),
52
- radial-gradient(circle at 80% 70%, rgba(255, 51, 68, 0.05), transparent 40%),
53
- radial-gradient(circle at 50% 50%, rgba(0, 255, 136, 0.03), transparent 60%);
54
- z-index: -1;
55
- animation: bgMove 20s ease-in-out infinite alternate;
56
- }
57
-
58
- @keyframes bgMove {
59
- 0% { transform: scale(1) translate(0, 0); }
60
- 50% { transform: scale(1.1) translate(20px, -20px); }
61
- 100% { transform: scale(1) translate(-20px, 20px); }
62
- }
63
-
64
- .w {
65
- max-width: 1200px;
66
- margin: 0 auto;
67
- padding: 32px 24px;
68
- position: relative;
69
- }
70
-
71
- /* Container Glow */
72
- .w::after {
73
- content: '';
74
- position: absolute;
75
- inset: 0;
76
- background: radial-gradient(circle at 50% 0%, rgba(255, 51, 68, 0.08), transparent 70%);
77
- pointer-events: none;
78
- z-index: -1;
79
- }
80
-
81
- header {
82
- padding-bottom: 24px;
83
- border-bottom: 1px solid var(--b1);
84
- display: flex;
85
- align-items: center;
86
- justify-content: space-between;
87
- margin-bottom: 24px;
88
- }
89
-
90
- .logo {
91
- font-weight: 700;
92
- font-size: 18px;
93
- color: var(--t3);
94
- letter-spacing: -0.02em;
95
- }
96
-
97
- .logo em {
98
- font-style: normal;
99
- color: var(--red);
100
- text-shadow: 0 0 15px var(--red-glow);
101
- }
102
-
103
- .hr {
104
- font-size: 12px;
105
- color: var(--muted);
106
- display: flex;
107
- align-items: center;
108
- gap: 10px;
109
- background: var(--s1);
110
- padding: 6px 12px;
111
- border-radius: 20px;
112
- border: 1px solid var(--b1);
113
- }
114
-
115
- .hd {
116
- width: 6px;
117
- height: 6px;
118
- border-radius: 50%;
119
- background: var(--green);
120
- box-shadow: 0 0 10px var(--green-glow);
121
- }
122
-
123
- .hd.on { animation: pulse 2s ease-in-out infinite; }
124
-
125
- @keyframes pulse {
126
- 0%, 100% { opacity: 1; transform: scale(1); }
127
- 50% { opacity: 0.4; transform: scale(0.8); }
128
- }
129
-
130
- .g {
131
- display: grid;
132
- grid-template-columns: 1.2fr 0.8fr;
133
- gap: 24px;
134
- padding: 0;
135
- }
136
-
137
- .fs { grid-column: 1 / -1; }
138
-
139
- @media (max-width: 900px) {
140
- .g { grid-template-columns: 1fr; }
141
- }
142
-
143
- /* Card Styling */
144
- .p {
145
- background: var(--s1);
146
- border: 1px solid var(--b1);
147
- border-radius: 12px;
148
- overflow: hidden;
149
- display: flex;
150
- flex-direction: column;
151
- box-shadow: 0 4px 20px rgba(0, 0, 0, 0.4);
152
- backdrop-filter: blur(10px);
153
- transition: transform 0.3s var(--spring), border-color 0.3s ease;
154
- }
155
-
156
- .p:hover {
157
- border-color: var(--b2);
158
- }
159
-
160
- .ph {
161
- padding: 12px 16px;
162
- border-bottom: 1px solid var(--b1);
163
- display: flex;
164
- align-items: center;
165
- justify-content: space-between;
166
- font-size: 12px;
167
- color: var(--muted);
168
- background: rgba(255, 255, 255, 0.02);
169
- }
170
-
171
- .ph b { color: var(--red); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; }
172
-
173
- textarea.code {
174
- width: 100%;
175
- flex: 1;
176
- min-height: 300px;
177
- background: var(--bg);
178
- border: none;
179
- color: var(--t2);
180
- font-family: var(--mono);
181
- font-size: 13px;
182
- line-height: 1.7;
183
- padding: 20px;
184
- resize: vertical;
185
- outline: none;
186
- caret-color: var(--red);
187
- will-change: transform;
188
- }
189
-
190
- .db {
191
- padding: 12px 16px;
192
- border-top: 1px solid var(--b1);
193
- display: flex;
194
- align-items: center;
195
- gap: 8px;
196
- background: var(--s1);
197
- }
198
-
199
- .db .l { font-size: 11px; color: var(--muted); font-weight: 500; }
200
-
201
- .ch {
202
- font-family: var(--sans);
203
- font-size: 11px;
204
- padding: 4px 12px;
205
- background: var(--s2);
206
- border: 1px solid var(--b1);
207
- border-radius: 6px;
208
- color: var(--t1);
209
- cursor: pointer;
210
- transition: all 0.2s var(--spring);
211
- }
212
-
213
- .ch:hover {
214
- background: var(--s3);
215
- color: var(--t3);
216
- transform: translateY(-1px);
217
- border-color: var(--b2);
218
- }
219
-
220
- .ch.on {
221
- background: var(--red);
222
- border-color: var(--red);
223
- color: #fff;
224
- box-shadow: 0 0 15px var(--red-glow);
225
- }
226
-
227
- .bg {
228
- margin: 16px;
229
- padding: 14px;
230
- background: var(--red);
231
- border: none;
232
- border-radius: 8px;
233
- color: #fff;
234
- font-family: var(--sans);
235
- font-size: 14px;
236
- font-weight: 700;
237
- cursor: pointer;
238
- transition: all 0.3s var(--spring);
239
- text-transform: uppercase;
240
- letter-spacing: 0.05em;
241
- box-shadow: 0 4px 15px var(--red-glow);
242
- }
243
-
244
- .bg:hover {
245
- background: #ff4d5a;
246
- transform: translateY(-2px);
247
- box-shadow: 0 6px 20px var(--red-glow);
248
- }
249
-
250
- .bg:active { transform: translateY(0); }
251
-
252
- .bg:disabled {
253
- opacity: 0.4;
254
- cursor: not-allowed;
255
- transform: none;
256
- box-shadow: none;
257
- }
258
-
259
- /* Agent log */
260
- .al { padding: 12px; display: flex; flex-direction: column; gap: 8px; }
261
-
262
- .ar {
263
- padding: 12px 16px;
264
- border-radius: 8px;
265
- background: rgba(255, 255, 255, 0.03);
266
- border: 1px solid transparent;
267
- transition: all 0.4s var(--spring);
268
- animation: slideIn 0.5s var(--spring) forwards;
269
- opacity: 0;
270
- transform: translateX(20px);
271
- }
272
-
273
- @keyframes slideIn {
274
- to { opacity: 1; transform: translateX(0); }
275
- }
276
-
277
- .ar.run { border-color: var(--cyan); background: rgba(0, 217, 255, 0.05); }
278
- .ar.done { border-color: var(--green); background: rgba(0, 255, 136, 0.05); }
279
- .ar.fail { border-color: var(--red); background: rgba(255, 51, 68, 0.05); }
280
- .ar.retry {
281
- border-color: var(--yellow);
282
- background: rgba(255, 204, 0, 0.05);
283
- animation: pulse-border 1.5s ease-in-out infinite;
284
- }
285
-
286
- @keyframes pulse-border {
287
- 50% { border-color: rgba(255, 204, 0, 0.2); }
288
- }
289
-
290
- .at { display: flex; align-items: center; gap: 12px; }
291
- .an { font-size: 10px; font-weight: 700; color: var(--muted); min-width: 90px; text-transform: uppercase; letter-spacing: 0.1em; }
292
- .am { font-size: 13px; color: var(--t2); font-weight: 500; }
293
- .ad { font-size: 11px; color: var(--muted); margin-top: 4px; padding-left: 102px; white-space: pre-wrap; line-height: 1.6; max-height: 100px; overflow-y: auto; }
294
- .ad .w { color: var(--yellow); font-weight: 600; }
295
- .ad .g { color: var(--green); font-weight: 600; }
296
-
297
- /* Horizontal Timeline */
298
- .timeline {
299
- display: flex;
300
- justify-content: space-between;
301
- padding: 16px 20px;
302
- background: rgba(255, 255, 255, 0.02);
303
- border-bottom: 1px solid var(--b1);
304
- margin-bottom: 8px;
305
- }
306
-
307
- .node {
308
- display: flex;
309
- flex-direction: column;
310
- align-items: center;
311
- gap: 6px;
312
- position: relative;
313
- flex: 1;
314
- }
315
-
316
- .node::after {
317
- content: '';
318
- position: absolute;
319
- top: 12px;
320
- left: 50%;
321
- width: 100%;
322
- height: 2px;
323
- background: var(--b1);
324
- z-index: 0;
325
- }
326
-
327
- .node:last-child::after { display: none; }
328
-
329
- .ni {
330
- width: 24px;
331
- height: 24px;
332
- border-radius: 50%;
333
- background: var(--s3);
334
- border: 2px solid var(--b1);
335
- display: flex;
336
- align-items: center;
337
- justify-content: center;
338
- font-size: 12px;
339
- z-index: 1;
340
- transition: all 0.4s var(--spring);
341
- }
342
-
343
- .node.on .ni { background: var(--cyan); border-color: var(--cyan); color: #000; box-shadow: 0 0 15px var(--cyan); }
344
- .node.done .ni { background: var(--green); border-color: var(--green); color: #000; box-shadow: 0 0 15px var(--green); }
345
- .node.fail .ni { background: var(--red); border-color: var(--red); color: #fff; }
346
- .node.retry .ni { animation: pulse-node 1s var(--spring) infinite; background: var(--yellow); border-color: var(--yellow); }
347
-
348
- @keyframes pulse-node {
349
- 0%, 100% { transform: scale(1); }
350
- 50% { transform: scale(1.2); }
351
- }
352
-
353
- .nl { font-size: 9px; font-weight: 700; color: var(--muted); text-transform: uppercase; letter-spacing: 0.05em; }
354
- .node.on .nl, .node.done .nl { color: var(--t3); }
355
-
356
- /* Tabs */
357
- .tabs { display: flex; gap: 8px; }
358
- .tab {
359
- background: var(--s2);
360
- border: 1px solid var(--b1);
361
- padding: 6px 16px;
362
- border-radius: 8px;
363
- font-family: var(--sans);
364
- font-size: 12px;
365
- font-weight: 600;
366
- color: var(--muted);
367
- cursor: pointer;
368
- transition: all 0.2s var(--spring);
369
- }
370
-
371
- .tab:hover { color: var(--t2); background: var(--s3); }
372
- .tab.on { color: var(--t3); background: var(--red); border-color: var(--red); box-shadow: 0 0 10px var(--red-glow); }
373
-
374
- .tc { display: none; padding: 0; animation: fadeIn 0.4s ease; }
375
- .tc.on { display: block; }
376
-
377
- @keyframes fadeIn { from { opacity: 0; transform: translateY(10px); } to { opacity: 1; transform: translateY(0); } }
378
-
379
- /* Summary row */
380
- .sum-row { padding: 24px; display: flex; align-items: center; gap: 32px; flex-wrap: wrap; border-bottom: 1px solid var(--b1); background: rgba(0, 255, 136, 0.02); }
381
- .sum-big { font-size: 32px; font-weight: 800; color: var(--green); line-height: 1; letter-spacing: -0.02em; text-shadow: 0 0 20px var(--green-glow); }
382
- .sum-big .u { font-size: 13px; font-weight: 500; color: var(--muted); margin-left: 4px; display: block; margin-top: 4px; letter-spacing: 0; }
383
- .sum-big .vic { font-size: 11px; color: var(--cyan); font-weight: 600; display: block; margin-top: 8px; text-shadow: none; opacity: 0.8; }
384
- .sum-sep { width: 1px; height: 40px; background: var(--b1); }
385
- .sum-chk { display: flex; align-items: center; gap: 8px; font-size: 12px; color: var(--t2); font-weight: 500; }
386
- .sum-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; }
387
- .sum-dot.ok { background: var(--green); box-shadow: 0 0 8px var(--green-glow); }
388
- .sum-dot.no { background: var(--red); box-shadow: 0 0 8px var(--red-glow); }
389
- .sum-type { font-size: 11px; color: var(--cyan); text-transform: uppercase; letter-spacing: 0.1em; font-weight: 700; padding: 4px 10px; background: rgba(0, 217, 255, 0.1); border-radius: 4px; }
390
-
391
- .sum-bar { padding: 16px 24px; display: flex; align-items: center; gap: 12px; flex-wrap: wrap; border-bottom: 1px solid var(--b1); }
392
- .bs {
393
- font-family: var(--sans);
394
- font-size: 11px;
395
- font-weight: 700;
396
- padding: 8px 16px;
397
- border-radius: 8px;
398
- border: 1px solid var(--b1);
399
- background: var(--s2);
400
- color: var(--t2);
401
- cursor: pointer;
402
- transition: all 0.2s var(--spring);
403
- text-transform: uppercase;
404
- letter-spacing: 0.05em;
405
- }
406
-
407
- .bs:hover { border-color: var(--b2); transform: translateY(-1px); background: var(--s3); }
408
- .bs.r { background: var(--bg); border-color: var(--red); color: var(--red); }
409
- .bs.r:hover { background: var(--red); color: #fff; box-shadow: 0 4px 15px var(--red-glow); }
410
- .bs.gr { background: var(--green); border-color: var(--green); color: #000; }
411
- .bs.gr:hover { box-shadow: 0 4px 15px var(--green-glow); transform: translateY(-2px); }
412
- .sp { flex: 1; }
413
-
414
- /* Details tab */
415
- .dm { display: grid; grid-template-columns: repeat(5, 1fr); border-bottom: 1px solid var(--b1); }
416
- @media (max-width: 800px) { .dm { grid-template-columns: repeat(2, 1fr); } }
417
- .di { padding: 20px; border-right: 1px solid var(--b1); background: rgba(255, 255, 255, 0.01); }
418
- .di:last-child { border-right: none; }
419
- .dl { font-size: 10px; color: var(--muted); text-transform: uppercase; letter-spacing: 0.1em; margin-bottom: 8px; font-weight: 700; }
420
- .dv { font-size: 20px; font-weight: 800; line-height: 1; margin-bottom: 4px; color: var(--t3); }
421
- .dv.g { color: var(--green); }
422
- .dv.c { color: var(--cyan); }
423
- .dv.y { color: var(--yellow); }
424
- .dv.t { color: var(--t2); font-size: 13px; }
425
- .ds { font-size: 10px; color: var(--muted); line-height: 1.4; }
426
-
427
- /* Benchmark bars */
428
- .bk { padding: 24px; border-bottom: 1px solid var(--b1); }
429
- .bk-t { font-size: 11px; color: var(--muted); text-transform: uppercase; letter-spacing: 0.1em; margin-bottom: 16px; font-weight: 700; }
430
- .br { display: flex; align-items: center; gap: 16px; margin-bottom: 12px; }
431
- .br:last-child { margin-bottom: 0; }
432
- .bl { font-size: 12px; color: var(--t2); width: 140px; flex-shrink: 0; font-weight: 500; }
433
- .bt { flex: 1; height: 8px; background: var(--bg); border-radius: 4px; overflow: hidden; border: 1px solid var(--b1); }
434
- .bf { height: 100%; border-radius: 4px; transition: width 1s var(--spring); width: 0; }
435
- .bf.bad { background: linear-gradient(90deg, #ff334466, #ff3344); box-shadow: 0 0 10px rgba(255, 51, 68, 0.3); }
436
- .bf.good { background: linear-gradient(90deg, #00ff8866, #00ff88); box-shadow: 0 0 10px rgba(0, 255, 136, 0.3); }
437
- .bv { font-size: 12px; font-weight: 700; width: 40px; text-align: right; flex-shrink: 0; }
438
- .bv.bad { color: var(--red); }
439
- .bv.good { color: var(--green); }
440
-
441
- /* Simple mode note */
442
- .sn { padding: 20px; border: 1px solid var(--cyan); border-radius: 12px; background: rgba(0, 217, 255, 0.05); margin: 24px; font-size: 13px; color: var(--t2); line-height: 1.6; border-left-width: 4px; }
443
-
444
- /* Diff */
445
- .dg { display: grid; grid-template-columns: 1fr 1fr; background: var(--bg); }
446
- @media (max-width: 780px) { .dg { grid-template-columns: 1fr; } .dfs:first-child { border-right: none !important; border-bottom: 1px solid var(--b1); } }
447
- .dfs:first-child { border-right: 1px solid var(--b1); }
448
- .dfh { padding: 10px 16px; border-bottom: 1px solid var(--b1); font-size: 11px; color: var(--muted); display: flex; align-items: center; gap: 8px; font-weight: 600; background: var(--s2); }
449
- .dft { font-size: 9px; font-weight: 800; padding: 2px 6px; border-radius: 4px; text-transform: uppercase; }
450
- .dft.cu { background: rgba(255, 51, 68, 0.2); color: var(--red); }
451
- .dft.ro { background: rgba(0, 255, 136, 0.2); color: var(--green); }
452
- .dfp { padding: 20px; font-family: var(--mono); font-size: 12px; line-height: 1.7; overflow: auto; max-height: 500px; white-space: pre; color: var(--t2); }
453
- .dlo { background: rgba(255, 51, 68, 0.1); color: var(--red); text-decoration: line-through; display: block; width: 100%; }
454
- .dln { background: rgba(0, 255, 136, 0.1); color: var(--green); display: block; width: 100%; }
455
-
456
- /* Loading Skeleton */
457
- .skeleton { position: relative; overflow: hidden; background: var(--s2); border-radius: 12px; height: 200px; margin-top: 24px; }
458
- .skeleton::after { content: ''; position: absolute; inset: 0; transform: translateX(-100%); background: linear-gradient(90deg, transparent, rgba(255,255,255,0.05), transparent); animation: shimmer 1.5s infinite; }
459
- @keyframes shimmer { 100% { transform: translateX(100%); } }
460
-
461
- /* Custom Cursor */
462
- #cursor {
463
- position: fixed;
464
- width: 20px;
465
- height: 20px;
466
- background: rgba(255, 255, 255, 0.2);
467
- border: 1px solid rgba(255, 255, 255, 0.4);
468
- border-radius: 50%;
469
- pointer-events: none;
470
- z-index: 9999;
471
- transition: transform 0.1s ease, width 0.3s var(--spring), height 0.3s var(--spring), background 0.3s ease;
472
- mix-blend-mode: difference;
473
- }
474
-
475
- #cursor.active { transform: scale(3); background: rgba(255, 51, 68, 0.3); border-color: var(--red); }
476
-
477
- /* Modal */
478
- .mo { display: none; position: fixed; inset: 0; background: rgba(0, 0, 0, 0.85); z-index: 1000; place-items: center; backdrop-filter: blur(8px); }
479
- .mo.open { display: grid; }
480
- .mb { background: var(--s1); border: 1px solid var(--b1); border-radius: 16px; width: 90%; max-width: 800px; max-height: 90vh; overflow: hidden; box-shadow: 0 20px 50px rgba(0, 0, 0, 0.6); }
481
- .mt { padding: 16px 24px; border-bottom: 1px solid var(--b1); display: flex; justify-content: space-between; align-items: center; background: var(--s2); }
482
- .mt h3 { font-size: 16px; color: var(--t3); font-weight: 700; }
483
- .mx { background: none; border: none; color: var(--muted); font-size: 24px; cursor: pointer !important; line-height: 1; transition: color 0.2s; }
484
- .mx:hover { color: var(--t3); }
485
- .mc { padding: 24px; }
486
- .mc textarea { width: 100%; height: 400px; background: var(--bg); border: 1px solid var(--b1); border-radius: 8px; padding: 16px; color: var(--cyan); font-family: var(--mono); font-size: 12px; line-height: 1.6; resize: vertical; outline: none; }
487
- .mc textarea:focus { border-color: var(--cyan); box-shadow: 0 0 10px rgba(0, 217, 255, 0.2); }
488
- .mf { padding: 16px 24px; border-top: 1px solid var(--b1); display: flex; justify-content: flex-end; gap: 12px; background: var(--s2); }
489
-
490
- ::-webkit-scrollbar { width: 6px; height: 6px; }
491
- ::-webkit-scrollbar-track { background: transparent; }
492
- ::-webkit-scrollbar-thumb { background: var(--b1); border-radius: 10px; }
493
- ::-webkit-scrollbar-thumb:hover { background: var(--b2); }
494
-
495
- footer { padding: 32px 0; border-top: 1px solid var(--b1); display: flex; justify-content: space-between; font-size: 11px; color: var(--muted); font-weight: 500; }
496
- footer a { color: var(--muted); text-decoration: none; transition: color 0.2s; border-bottom: 1px solid transparent; }
497
- footer a:hover { color: var(--t2); border-bottom-color: var(--muted); }
498
-
499
- .idle { flex: 1; display: flex; align-items: center; justify-content: center; color: var(--b2); font-size: 13px; font-weight: 500; min-height: 100px; }
500
- </style>
 
 
 
 
 
 
 
501
  </head>
502
  <div id="cursor"></div>
503
 
@@ -506,13 +1115,16 @@ footer a:hover { color: var(--t2); border-bottom-color: var(--muted); }
506
  <div class="logo">ROCmPort <em>AI</em></div>
507
  <div class="hr">
508
  <div class="hd on" id="hdot"></div>
509
- <span id="hstat">⚡ Armed and waiting</span>
510
  </div>
511
  </header>
512
 
513
  <div class="g">
514
  <div class="p">
515
- <div class="ph"><div><b>//</b> CUDA source</div><div id="lc">0 lines</div></div>
 
 
 
516
  <textarea class="code" id="inp" spellcheck="false" placeholder="// Paste CUDA code here
517
  // or pick a demo below
518
 
@@ -531,7 +1143,10 @@ __global__ void kernel(float* A, float* B, int N) {
531
  </div>
532
 
533
  <div class="p">
534
- <div class="ph"><div><b>//</b> Pipeline</div><div id="pt">0.0s</div></div>
 
 
 
535
  <div class="timeline" id="tl">
536
  <!-- Nodes injected by JS -->
537
  </div>
@@ -561,243 +1176,247 @@ __global__ void kernel(float* A, float* B, int N) {
561
 
562
  <footer>
563
  <div>ROCmPort AI — AMD Developer Hackathon 2025</div>
564
- <div><a href="https://x.com/TazwarEnan" target="_blank">Tazwar Ahnaf Enan</a> · <a href="https://github.com/tazwaryayyyy" target="_blank">GitHub</a></div>
 
565
  </footer>
566
  </div>

  <div class="mo" id="modal">
  <div class="mb">
- <div class="mt"><h3>Edit ROCm code</h3><button class="mx" onclick="cm()">&times;</button></div>
  <div class="mc"><textarea id="edt"></textarea></div>
- <div class="mf"><button class="bs" onclick="cm()">Cancel</button><button class="bs r" onclick="rec()">Re-test</button></div>
  </div>
  </div>
  <script>
- const API = 'http://localhost:8000';
- const S = { code: '', kn: 'custom', run: false, t0: null, iv: null, rep: null, tl: [], kernels: {} };
- const AG = {
- analyzer: { n: 'ANALYZER', i: '🔍' },
- translator: { n: 'TRANSLATOR', i: '🔄' },
- optimizer: { n: 'OPTIMIZER', i: '⚡' },
- tester: { n: 'TESTER', i: '🧪' },
- coordinator: { n: 'COORDINATOR', i: '📋' }
- };
-
- // Custom Cursor Logic
- const cur = document.getElementById('cursor');
- document.addEventListener('mousemove', (e) => {
- cur.style.left = e.clientX + 'px';
- cur.style.top = e.clientY + 'px';
- const target = e.target;
- const isClickable = target.onclick ||
- target.tagName === 'BUTTON' ||
- target.tagName === 'A' ||
- target.tagName === 'TEXTAREA' ||
- target.classList.contains('ch') ||
- target.classList.contains('tab');
-
- if (isClickable) {
- cur.classList.add('active');
- if (target.id === 'go') cur.style.background = 'rgba(255, 51, 68, 0.5)';
- else cur.style.background = 'rgba(255, 255, 255, 0.3)';
- } else {
- cur.classList.remove('active');
- cur.style.background = 'rgba(255, 255, 255, 0.2)';
- }
- });
-
- async function init() {
- const ta = document.getElementById('inp');
- ta.oninput = () => {
- document.getElementById('lc').textContent = ta.value.split('\n').length + ' lines';
- S.code = ta.value;
  };
- try {
- const r = await fetch(API + '/demo-kernels');
- S.kernels = await r.json();
- } catch (e) { S.kernels = FB; }
- }
-
- function lk(n, btn) {
- document.querySelectorAll('.ch').forEach(c => c.classList.remove('on'));
- btn.classList.add('on');
- const code = S.kernels[n] || FB[n] || '', ta = document.getElementById('inp');
- ta.value = code; S.code = code; S.kn = n;
- document.getElementById('lc').textContent = code.split('\n').length + ' lines';
- }
-
- function stab(id, btn) {
- document.querySelectorAll('.tab').forEach(t => t.classList.remove('on'));
- document.querySelectorAll('.tc').forEach(t => t.classList.remove('on'));
- btn.classList.add('on');
- document.getElementById('t-' + id).classList.add('on');
- if (id === 'diff' && S.rep) rDiff(S.code, S.rep.optimized_code);
- }
-
- async function go() {
- if (S.run) return;
- const code = document.getElementById('inp').value.trim();
- if (!code) return;
-
- S.code = code; S.run = true; S.t0 = Date.now(); S.tl = [];
- const btn = document.getElementById('go');
- btn.disabled = true;
- btn.textContent = 'Awaiting Agents...';
-
- document.getElementById('hstat').textContent = '🤖 Agents thinking...';
- document.getElementById('rp').classList.add('hide');
-
- bLog();
- sTimer();
-
- try {
- const simpleModeCheckbox = document.getElementById('sm');
- const res = await fetch(API + '/port', {
- method: 'POST',
- headers: { 'Content-Type': 'application/json' },
- body: JSON.stringify({
- cuda_code: code,
- kernel_name: S.kn,
- simple_mode: simpleModeCheckbox ? simpleModeCheckbox.checked : false
- })
- });
-
- // Show results panel with loader immediately
- document.getElementById('rp').classList.remove('hide');
- document.getElementById('t-loader').classList.remove('hide');
- document.getElementById('t-sum').classList.remove('on');
- document.getElementById('t-diff').classList.remove('on');
- document.getElementById('t-det').classList.remove('on');
-
- const rd = res.body.getReader(), dc = new TextDecoder();
- let buf = '';
- while (true) {
- const { done, value } = await rd.read();
- if (done) break;
- buf += dc.decode(value, { stream: true });
- const lines = buf.split('\n');
- buf = lines.pop();
- for (const ln of lines) {
- if (!ln.startsWith('data: ')) continue;
- const raw = ln.slice(6).trim();
- if (raw === '[DONE]') { done_(); break; }
- try { hEvt(JSON.parse(raw)); } catch (e) { console.error('Parse error:', e); }
- }
  }
- } catch (e) {
- document.getElementById('hstat').textContent = '⚠️ Agent failure';
- document.getElementById('t-loader').classList.add('hide'); // Hide loader on error
- console.error(e);
- } finally {
- xTimer();
- S.run = false;
- btn.disabled = false;
- btn.textContent = 'Port to ROCm';
- document.getElementById('t-loader').classList.add('hide');
  }
- }
-
- function hEvt(ev) {
- uLog(ev.agent, ev.status, ev.message, ev.detail);
- if (ev.agent === 'tester' && (ev.status === 'done' || ev.status === 'failed')) {
- const m = ev.message.match(/([\d.]+)x/);
- if (m) {
- const sp = parseFloat(m[1]), ok = sp >= 1, im = ev.message.match(/Iteration (\d+)/i);
- S.tl.push({
- label: 'Iteration ' + (im ? im[1] : S.tl.length + 1) + (ok ? ' (optimized)' : ' (baseline)'),
- speedup: sp,
- good: ok
  });
  }
  }
- if (ev.agent === 'coordinator' && ev.status === 'done' && ev.detail) {
- try {
- const r = JSON.parse(ev.detail);
- S.rep = r;
- rRes(r, S.tl);
- } catch (e) { console.error('Coordinator detail parse error:', e); }
  }
- }

- function done_() {
- document.getElementById('hstat').textContent = ' Migration complete';
- document.getElementById('t-loader').classList.add('hide');
- if (!S.rep) {
- document.getElementById('t-sum').innerHTML = '<div class="idle">Migration finished but no report was generated. Check agent logs for details.</div>';
- document.getElementById('t-sum').classList.add('on');
  }
- }
-
- function bLog() {
- const el = document.getElementById('al');
- const tl = document.getElementById('tl');
- el.innerHTML = '';
- tl.innerHTML = '';
-
- let i = 0;
- for (const [k, obj] of Object.entries(AG)) {
- // Log row
- const d = document.createElement('div');
- d.className = 'ar';
- d.id = 'ar-' + k;
- d.style.animationDelay = (i * 0.1) + 's';
- d.innerHTML = `
  <div class="at">
  <span class="an">${obj.n}</span>
  <span class="am" id="am-${k}">Waiting</span>
  </div>
  <div class="ad" id="ad-${k}"></div>`;
- el.appendChild(d);
-
- // Timeline node
- const n = document.createElement('div');
- n.className = 'node';
- n.id = 'nd-' + k;
- n.title = obj.n;
- n.innerHTML = `<div class="ni">${obj.i}</div><div class="nl">${obj.n.slice(0,3)}</div>`;
- tl.appendChild(n);
- i++;
  }
- }
-
- function uLog(a, s, m, d) {
- const row = document.getElementById('ar-' + a);
- const node = document.getElementById('nd-' + a);
- if (!row || !node) return;
-
- const statusClass = { running: 'run', done: 'done', failed: 'fail', retrying: 'retry' }[s] || '';
- row.className = 'ar ' + statusClass;
- node.className = 'node ' + (s === 'running' ? 'on' : s === 'retrying' ? 'retry' : s === 'done' ? 'done' : s === 'failed' ? 'fail' : '');
-
- const me = document.getElementById('am-' + a);
- if (me) me.textContent = m;
-
- // Node tooltip message update
- node.title = m;
-
- const de = document.getElementById('ad-' + a);
- if (de && d) {
- de.innerHTML = esc(d)
- .replace(/\u26a0\ufe0f([^\n]*)/g, '<span class="w">⚠️ $1</span>')
- .replace(/\u2705([^\n]*)/g, '<span class="g">✅ $1</span>');
- de.scrollTop = de.scrollHeight;
  }
- }

- function rRes(r, tl) {
- // Hide loader, show summary
- document.getElementById('t-loader').classList.add('hide');
- document.getElementById('t-sum').classList.add('on');
-
- const v = r.verification || {}, bw = r.bandwidth_utilized;
- const dot = ok => `<div class="sum-dot ${ok === false ? 'no' : 'ok'}"></div>`;

- document.getElementById('t-sum').innerHTML = `
  <div class="sum-row">
  <div class="sum-big">
  ${r.speedup}x
  <span class="u">vs baseline hipify</span>
- <span class="vic">🎯 Your code is now an AMD champion.</span>
  </div>
  <div class="sum-sep"></div>
  <div>
@@ -819,105 +1438,106 @@ function rRes(r, tl) {
  ${r.simplified_explanation ? esc(r.simplified_explanation) : '<em>Simplified explanation will appear here</em>'}
  </div>`;

- // Details tab
- let dh = `<div class="dm">
  <div class="di"><div class="dl">Speedup</div><div class="dv g">${r.speedup}x</div><div class="ds">optimized ROCm vs straight hipify output</div></div>
  <div class="di"><div class="dl">Bandwidth</div><div class="dv c">${bw != null ? bw.toFixed(1) : '—'}%</div><div class="ds">of MI300X 5.3 TB/s HBM3</div></div>
  <div class="di"><div class="dl">Changes</div><div class="dv y">${r.total_changes}</div><div class="ds">hipify + LLM + optimizer changes</div></div>
  <div class="di"><div class="dl">Iterations</div><div class="dv c">${r.iterations || 1}</div><div class="ds">optimizer retry loop count</div></div>
  <div class="di"><div class="dl">Type</div><div class="dv t">${(r.bottleneck || '—').toUpperCase()}</div><div class="ds">workload classification</div></div>
  </div>`;
-
- if (tl.length) {
- dh += '<div class="bk"><div class="bk-t">Benchmark iterations (optimized vs baseline hipify)</div>';
- tl.forEach(d => {
- const pct = Math.min(Math.max((d.speedup / 2) * 100, 3), 95);
- dh += `<div class="br">
  <div class="bl">${esc(d.label)}</div>
  <div class="bt"><div class="bf ${d.good ? 'good' : 'bad'}" style="width: 0" data-w="${pct}%"></div></div>
  <div class="bv ${d.good ? 'good' : 'bad'}">${d.speedup}x</div>
  </div>`;
- });
- dh += '</div>';
  }
-
- document.getElementById('t-det').innerHTML = dh;
- tsm(); // Ensure simple note visibility matches current toggle state
-
- // Progress bar animation
- setTimeout(() => {
- document.querySelectorAll('.bf[data-w]').forEach(b => {
- b.style.width = b.dataset.w;
- });
- }, 100);
- }
-
- function rDiff(o, n) {
- if (!o || !n) return;
- const oe = document.getElementById('d-o'), ne = document.getElementById('d-n');
- if (oe && oe.innerHTML && ne && ne.innerHTML) return; // Already rendered
-
- document.getElementById('t-diff').innerHTML = `<div class="dg">
  <div class="dfs"><div class="dfh"><span class="dft cu">CUDA</span> Original Source</div><pre class="dfp" id="d-o"></pre></div>
  <div class="dfs"><div class="dfh"><span class="dft ro">ROCm</span> Optimized HIP</div><pre class="dfp" id="d-n"></pre></div>
  </div>`;
-
- const oL = o.split('\n'), nL = n.split('\n'), mx = Math.max(oL.length, nL.length);
- let oH = '', nH = '';
- for (let i = 0; i < mx; i++) {
- const a = oL[i] ?? '', b = nL[i] ?? '', c = a !== b;
- oH += `<span class="${c ? 'dlo' : ''}">${esc(a)}\n</span>`;
- nH += `<span class="${c ? 'dln' : ''}">${esc(b)}\n</span>`;
  }
- document.getElementById('d-o').innerHTML = oH;
- document.getElementById('d-n').innerHTML = nH;
- }
-
- function sTimer() { S.iv = setInterval(() => { document.getElementById('pt').textContent = ((Date.now() - S.t0) / 1000).toFixed(1) + 's' }, 100) }
- function xTimer() { clearInterval(S.iv) }
-
- function dlR() {
- const r = S.rep; if (!r) return;
- const md = `# ROCmPort AI — Migration Report\n\n## Results\n- **Speedup**: ${r.speedup}x\n- **Bandwidth**: ${r.bandwidth_utilized ? r.bandwidth_utilized.toFixed(1) : '—'}%\n- **Changes**: ${r.total_changes}\n- **Iterations**: ${r.iterations}\n- **Type**: ${r.bottleneck}\n\n${r.amd_advantage_explanation ? '> ' + r.amd_advantage_explanation + '\n\n' : ''}${r.cost_estimate ? '## Cost Impact\n- Manual: ' + r.cost_estimate.manual_porting_weeks + '\n- ROCmPort: ' + r.cost_estimate.rocmport_minutes + '\n- Savings: ' + r.cost_estimate.estimated_savings + '\n\n' : ''}## ROCm/HIP Code\n\`\`\`cpp\n${r.optimized_code || ''}\n\`\`\`\n\n---\n*Generated by ROCmPort AI*\n`;
- const a = document.createElement('a'); a.href = URL.createObjectURL(new Blob([md], { type: 'text/markdown' })); a.download = 'rocmport-migration-report.md'; a.click();
- }
-
- function om() { if (!S.rep) return alert('No results yet!'); document.getElementById('edt').value = S.rep?.optimized_code || ''; document.getElementById('modal').classList.add('open') }
- function cm() { document.getElementById('modal').classList.remove('open') }
-
- async function rec() {
- const code = document.getElementById('edt').value.trim(); if (!code) return;
- try {
- const res = await fetch(API + '/recompile', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ edited_code: code, kernel_name: S.kn }) });
- const r = await res.json();
- if (r.success) { cm(); if (r.result) rRes(r.result, S.tl); }
- else alert('Failed: ' + (r.detail || 'Unknown'))
- } catch (e) { alert('Error: ' + e.message) }
- }
-
- async function exM() {
- if (!S.rep) return;
- try {
- const res = await fetch(API + '/export', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ original_cuda: S.code, final_rocm: S.rep.optimized_code, migration_report: S.rep }) });
- if (res.ok) { const a = document.createElement('a'); a.href = URL.createObjectURL(await res.blob()); a.download = 'rocmport-migration.zip'; a.click() }
- } catch (e) { alert('Export error') }
- }
-
- function tsm() {
- const sn = document.getElementById('sn');
- if (sn) sn.classList.remove('hide');
- }
-
- function esc(s) { return String(s ?? '').replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;') }
-
- const FB = {
- vector_add: `#include <cuda_runtime.h>\n\n__global__ void vector_add_kernel(float* A, float* B, float* C, int N) {\n int idx = blockIdx.x * blockDim.x + threadIdx.x;\n if (idx < N) {\n C[idx] = A[idx] + B[idx];\n }\n}\n\nint main() {\n int N = 1 << 24;\n size_t size = N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n int threads = 128;\n int blocks = (N + threads - 1) / threads;\n vector_add_kernel<<<blocks, threads>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
- matrix_multiply: `#include <cuda_runtime.h>\n#define WARP_SIZE 32\n\n__global__ void matmul_kernel(float* A, float* B, float* C, int N) {\n int row = blockIdx.y * blockDim.y + threadIdx.y;\n int col = blockIdx.x * blockDim.x + threadIdx.x;\n float sum = 0.0f;\n if (row < N && col < N) {\n for (int k = 0; k < N; k++)\n sum += A[row * N + k] * B[k * N + col];\n C[row * N + col] = sum;\n }\n}\n\n__global__ void warp_reduce(float* data, float* result, int N) {\n int tid = threadIdx.x;\n extern __shared__ float sdata[];\n sdata[tid] = (tid < N) ? data[tid] : 0;\n __syncthreads();\n for (int s = WARP_SIZE/2; s > 0; s >>= 1) {\n if (tid < s) sdata[tid] += sdata[tid + s];\n __syncthreads();\n }\n if (tid == 0) result[blockIdx.x] = sdata[0];\n}\n\nint main() {\n int N = 1024;\n size_t size = N * N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n dim3 block(16, 16);\n dim3 grid((N+15)/16, (N+15)/16);\n matmul_kernel<<<grid, block>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
- convolution_2d: `#include <cuda_runtime.h>\n#define BLOCK_SIZE 16\n\n__global__ void conv2d_kernel(\n float* input, float* kernel, float* output,\n int width, int height\n) {\n int x = blockIdx.x * blockDim.x + threadIdx.x;\n int y = blockIdx.y * blockDim.y + threadIdx.y;\n if (x >= width || y >= height) return;\n float sum = 0.0f;\n for (int ky = -1; ky <= 1; ky++) {\n for (int kx = -1; kx <= 1; kx++) {\n int ix = x + kx, iy = y + ky;\n if (ix >= 0 && ix < width && iy >= 0 && iy < height)\n sum += input[iy * width + ix] * kernel[(ky+1)*3 + (kx+1)];\n }\n }\n output[y * width + x] = sum;\n}\n\nint main() {\n int W = 2048, H = 2048;\n float *d_in, *d_ker, *d_out;\n cudaMalloc(&d_in, W*H*sizeof(float));\n cudaMalloc(&d_ker, 9*sizeof(float));\n cudaMalloc(&d_out, W*H*sizeof(float));\n dim3 block(BLOCK_SIZE, BLOCK_SIZE);\n dim3 grid((W+BLOCK_SIZE-1)/BLOCK_SIZE, (H+BLOCK_SIZE-1)/BLOCK_SIZE);\n conv2d_kernel<<<grid, block>>>(d_in, d_ker, d_out, W, H);\n cudaDeviceSynchronize();\n cudaFree(d_in); cudaFree(d_ker); cudaFree(d_out);\n return 0;\n}`,
- reduction: `#include <cuda_runtime.h>\n#include <stdio.h>\n#include <iostream>\n#include <vector>\n#include <numeric>\n\n// Tree-based reduction kernel\n__global__ void reduction_kernel(float* g_idata, float* g_odata, unsigned int n) {\n extern __shared__ float sdata[];\n unsigned int tid = threadIdx.x;\n unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;\n\n float mySum = (i < n) ? g_idata[i] : 0;\n if (i + blockDim.x < n) mySum += g_idata[i + blockDim.x];\n sdata[tid] = mySum;\n __syncthreads();\n\n for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {\n if (tid < s) sdata[tid] = mySum = mySum + sdata[tid + s];\n __syncthreads();\n }\n\n // DELIBERATE WARP-SIZE BUG: Unroll to 32 instead of 64\n if (tid < 32) {\n volatile float* vsmem = sdata;\n vsmem[tid] = mySum = mySum + vsmem[tid + 32];\n vsmem[tid] = mySum = mySum + vsmem[tid + 16];\n vsmem[tid] = mySum = mySum + vsmem[tid + 8];\n vsmem[tid] = mySum = mySum + vsmem[tid + 4];\n vsmem[tid] = mySum = mySum + vsmem[tid + 2];\n vsmem[tid] = mySum = mySum + vsmem[tid + 1];\n }\n\n if (tid == 0) g_odata[blockIdx.x] = sdata[0];\n}\n\nint main() {\n const int N = 1048576;\n // ... Host code for Parallel Reduction demo\n printf("Parallel Reduction demo loaded.\\n");\n return 0;\n}`
- };
-
- init();
  </script>
  </body>
  </html>
 
  <!DOCTYPE html>
  <html lang="en">
+
  <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>ROCmPort AI</title>
+ <link rel="preconnect" href="https://fonts.googleapis.com">
+ <link
+ href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&family=Space+Grotesk:wght@500;600;700&display=swap"
+ rel="stylesheet">
+ <style>
+ :root {
+ --bg: #030303;
+ --s1: #0a0a0b;
+ --s2: #121214;
+ --s3: #1a1a1e;
+ --b1: rgba(255, 255, 255, 0.08);
+ --b2: rgba(255, 255, 255, 0.15);
+ --red: #ff3344;
+ --red-glow: rgba(255, 51, 68, 0.4);
+ --green: #00ff88;
+ --green-glow: rgba(0, 255, 136, 0.4);
+ --yellow: #ffcc00;
+ --cyan: #00d9ff;
+ --muted: #88888e;
+ --t1: #a1a1aa;
+ --t2: #d4d4d8;
+ --t3: #ffffff;
+ --mono: 'JetBrains Mono', monospace;
+ --sans: 'Space Grotesk', sans-serif;
+ --spring: cubic-bezier(0.34, 1.56, 0.64, 1);
+ }
+
+ * {
+ margin: 0;
+ padding: 0;
+ box-sizing: border-box;
+ cursor: none !important;
+ }
+
+ .hide {
+ display: none !important;
+ }
+
+ body {
+ background: var(--bg);
+ color: var(--t1);
+ font-family: var(--sans);
+ font-size: 14px;
+ line-height: 1.6;
+ overflow-x: hidden;
+ min-height: 100vh;
+ }
+
+ /* Animated Gradient Background */
+ body::before {
+ content: '';
+ position: fixed;
+ inset: 0;
+ background:
+ radial-gradient(circle at 20% 30%, rgba(0, 217, 255, 0.05), transparent 40%),
+ radial-gradient(circle at 80% 70%, rgba(255, 51, 68, 0.05), transparent 40%),
+ radial-gradient(circle at 50% 50%, rgba(0, 255, 136, 0.03), transparent 60%);
+ z-index: -1;
+ animation: bgMove 20s ease-in-out infinite alternate;
+ }
+
+ @keyframes bgMove {
+ 0% {
+ transform: scale(1) translate(0, 0);
+ }
+
+ 50% {
+ transform: scale(1.1) translate(20px, -20px);
+ }
+
+ 100% {
+ transform: scale(1) translate(-20px, 20px);
+ }
+ }
+
+ .w {
+ max-width: 1200px;
+ margin: 0 auto;
+ padding: 32px 24px;
+ position: relative;
+ }
+
+ /* Container Glow */
+ .w::after {
+ content: '';
+ position: absolute;
+ inset: 0;
+ background: radial-gradient(circle at 50% 0%, rgba(255, 51, 68, 0.08), transparent 70%);
+ pointer-events: none;
+ z-index: -1;
+ }
+
+ header {
+ padding-bottom: 24px;
+ border-bottom: 1px solid var(--b1);
+ display: flex;
+ align-items: center;
+ justify-content: space-between;
+ margin-bottom: 24px;
+ }
+
+ .logo {
+ font-weight: 700;
+ font-size: 18px;
+ color: var(--t3);
+ letter-spacing: -0.02em;
+ }
+
+ .logo em {
+ font-style: normal;
+ color: var(--red);
+ text-shadow: 0 0 15px var(--red-glow);
+ }
+
+ .hr {
+ font-size: 12px;
+ color: var(--muted);
+ display: flex;
+ align-items: center;
+ gap: 10px;
+ background: var(--s1);
+ padding: 6px 12px;
+ border-radius: 20px;
+ border: 1px solid var(--b1);
+ }
+
+ .hd {
+ width: 6px;
+ height: 6px;
+ border-radius: 50%;
+ background: var(--green);
+ box-shadow: 0 0 10px var(--green-glow);
+ }
+
+ .hd.on {
+ animation: pulse 2s ease-in-out infinite;
+ }
+
+ @keyframes pulse {
+
+ 0%,
+ 100% {
+ opacity: 1;
+ transform: scale(1);
+ }
+
+ 50% {
+ opacity: 0.4;
+ transform: scale(0.8);
+ }
+ }
+
+ .g {
+ display: grid;
+ grid-template-columns: 1.2fr 0.8fr;
+ gap: 24px;
+ padding: 0;
+ }
+
+ .fs {
+ grid-column: 1 / -1;
+ }
+
+ @media (max-width: 900px) {
+ .g {
+ grid-template-columns: 1fr;
+ }
+ }
+
+ /* Card Styling */
+ .p {
+ background: var(--s1);
+ border: 1px solid var(--b1);
+ border-radius: 12px;
+ overflow: hidden;
+ display: flex;
+ flex-direction: column;
+ box-shadow: 0 4px 20px rgba(0, 0, 0, 0.4);
+ backdrop-filter: blur(10px);
+ transition: transform 0.3s var(--spring), border-color 0.3s ease;
+ }
+
+ .p:hover {
+ border-color: var(--b2);
+ }
+
+ .ph {
+ padding: 12px 16px;
+ border-bottom: 1px solid var(--b1);
+ display: flex;
+ align-items: center;
+ justify-content: space-between;
+ font-size: 12px;
+ color: var(--muted);
+ background: rgba(255, 255, 255, 0.02);
+ }
+
+ .ph b {
+ color: var(--red);
+ font-weight: 600;
+ text-transform: uppercase;
+ letter-spacing: 0.05em;
+ }
+
+ textarea.code {
+ width: 100%;
+ flex: 1;
+ min-height: 300px;
+ background: var(--bg);
+ border: none;
+ color: var(--t2);
+ font-family: var(--mono);
+ font-size: 13px;
+ line-height: 1.7;
+ padding: 20px;
+ resize: vertical;
+ outline: none;
+ caret-color: var(--red);
+ will-change: transform;
+ }
+
+ .db {
+ padding: 12px 16px;
+ border-top: 1px solid var(--b1);
+ display: flex;
+ align-items: center;
+ gap: 8px;
+ background: var(--s1);
+ }
+
+ .db .l {
+ font-size: 11px;
+ color: var(--muted);
+ font-weight: 500;
+ }
+
+ .ch {
+ font-family: var(--sans);
+ font-size: 11px;
+ padding: 4px 12px;
+ background: var(--s2);
+ border: 1px solid var(--b1);
+ border-radius: 6px;
+ color: var(--t1);
+ cursor: pointer;
+ transition: all 0.2s var(--spring);
+ }
+
+ .ch:hover {
+ background: var(--s3);
+ color: var(--t3);
+ transform: translateY(-1px);
+ border-color: var(--b2);
+ }
+
+ .ch.on {
+ background: var(--red);
+ border-color: var(--red);
+ color: #fff;
+ box-shadow: 0 0 15px var(--red-glow);
+ }
+
+ .bg {
+ margin: 16px;
+ padding: 14px;
+ background: var(--red);
+ border: none;
+ border-radius: 8px;
+ color: #fff;
+ font-family: var(--sans);
+ font-size: 14px;
+ font-weight: 700;
+ cursor: pointer;
+ transition: all 0.3s var(--spring);
+ text-transform: uppercase;
+ letter-spacing: 0.05em;
+ box-shadow: 0 4px 15px var(--red-glow);
+ }
+
+ .bg:hover {
+ background: #ff4d5a;
+ transform: translateY(-2px);
+ box-shadow: 0 6px 20px var(--red-glow);
+ }
+
+ .bg:active {
+ transform: translateY(0);
+ }
+
+ .bg:disabled {
+ opacity: 0.4;
+ cursor: not-allowed;
+ transform: none;
+ box-shadow: none;
+ }
+
+ /* Agent log */
+ .al {
+ padding: 12px;
+ display: flex;
+ flex-direction: column;
+ gap: 8px;
+ }
+
+ .ar {
+ padding: 12px 16px;
+ border-radius: 8px;
+ background: rgba(255, 255, 255, 0.03);
+ border: 1px solid transparent;
+ transition: all 0.4s var(--spring);
+ animation: slideIn 0.5s var(--spring) forwards;
+ opacity: 0;
+ transform: translateX(20px);
+ }
+
+ @keyframes slideIn {
+ to {
+ opacity: 1;
+ transform: translateX(0);
+ }
+ }
+
+ .ar.run {
+ border-color: var(--cyan);
+ background: rgba(0, 217, 255, 0.05);
+ }
+
+ .ar.done {
+ border-color: var(--green);
+ background: rgba(0, 255, 136, 0.05);
+ }
+
+ .ar.fail {
+ border-color: var(--red);
+ background: rgba(255, 51, 68, 0.05);
+ }
+
+ .ar.retry {
+ border-color: var(--yellow);
+ background: rgba(255, 204, 0, 0.05);
+ animation: pulse-border 1.5s ease-in-out infinite;
+ }
+
+ @keyframes pulse-border {
+ 50% {
+ border-color: rgba(255, 204, 0, 0.2);
+ }
+ }
+
+ .at {
+ display: flex;
+ align-items: center;
+ gap: 12px;
+ }
+
+ .an {
+ font-size: 10px;
+ font-weight: 700;
+ color: var(--muted);
+ min-width: 90px;
+ text-transform: uppercase;
+ letter-spacing: 0.1em;
+ }
+
+ .am {
+ font-size: 13px;
+ color: var(--t2);
+ font-weight: 500;
+ }
+
+ .ad {
+ font-size: 11px;
+ color: var(--muted);
+ margin-top: 4px;
+ padding-left: 102px;
+ white-space: pre-wrap;
+ line-height: 1.6;
+ max-height: 100px;
+ overflow-y: auto;
+ }
+
+ .ad .w {
+ color: var(--yellow);
+ font-weight: 600;
+ }
+
+ .ad .g {
+ color: var(--green);
+ font-weight: 600;
+ }
+
399
+ /* Horizontal Timeline */
400
+ .timeline {
401
+ display: flex;
402
+ justify-content: space-between;
403
+ padding: 16px 20px;
404
+ background: rgba(255, 255, 255, 0.02);
405
+ border-bottom: 1px solid var(--b1);
406
+ margin-bottom: 8px;
407
+ }
408
+
409
+ .node {
410
+ display: flex;
411
+ flex-direction: column;
412
+ align-items: center;
413
+ gap: 6px;
414
+ position: relative;
415
+ flex: 1;
416
+ }
417
+
418
+ .node::after {
419
+ content: '';
420
+ position: absolute;
421
+ top: 12px;
422
+ left: 50%;
423
+ width: 100%;
424
+ height: 2px;
425
+ background: var(--b1);
426
+ z-index: 0;
427
+ }
428
+
429
+ .node:last-child::after {
430
+ display: none;
431
+ }
432
+
433
+ .ni {
434
+ width: 24px;
435
+ height: 24px;
436
+ border-radius: 50%;
437
+ background: var(--s3);
438
+ border: 2px solid var(--b1);
439
+ display: flex;
440
+ align-items: center;
441
+ justify-content: center;
442
+ font-size: 12px;
443
+ z-index: 1;
444
+ transition: all 0.4s var(--spring);
445
+ }
446
+
447
+ .node.on .ni {
448
+ background: var(--cyan);
449
+ border-color: var(--cyan);
450
+ color: #000;
451
+ box-shadow: 0 0 15px var(--cyan);
452
+ }
453
+
454
+ .node.done .ni {
455
+ background: var(--green);
456
+ border-color: var(--green);
457
+ color: #000;
458
+ box-shadow: 0 0 15px var(--green);
459
+ }
460
+
461
+ .node.fail .ni {
462
+ background: var(--red);
463
+ border-color: var(--red);
464
+ color: #fff;
465
+ }
466
+
467
+ .node.retry .ni {
468
+ animation: pulse-node 1s var(--spring) infinite;
469
+ background: var(--yellow);
470
+ border-color: var(--yellow);
471
+ }
472
+
473
+ @keyframes pulse-node {
474
+
475
+ 0%,
476
+ 100% {
477
+ transform: scale(1);
478
+ }
479
+
480
+ 50% {
481
+ transform: scale(1.2);
482
+ }
483
+ }
484
+
485
+ .nl {
486
+ font-size: 9px;
487
+ font-weight: 700;
488
+ color: var(--muted);
489
+ text-transform: uppercase;
490
+ letter-spacing: 0.05em;
491
+ }
492
+
493
+ .node.on .nl,
494
+ .node.done .nl {
495
+ color: var(--t3);
496
+ }
497
+
498
+ /* Tabs */
499
+ .tabs {
500
+ display: flex;
501
+ gap: 8px;
502
+ }
503
+
504
+ .tab {
505
+ background: var(--s2);
506
+ border: 1px solid var(--b1);
507
+ padding: 6px 16px;
508
+ border-radius: 8px;
509
+ font-family: var(--sans);
510
+ font-size: 12px;
511
+ font-weight: 600;
512
+ color: var(--muted);
513
+ cursor: pointer;
514
+ transition: all 0.2s var(--spring);
515
+ }
516
+
517
+ .tab:hover {
518
+ color: var(--t2);
519
+ background: var(--s3);
520
+ }
521
+
522
+ .tab.on {
523
+ color: var(--t3);
524
+ background: var(--red);
525
+ border-color: var(--red);
526
+ box-shadow: 0 0 10px var(--red-glow);
527
+ }
528
+
529
+ .tc {
530
+ display: none;
531
+ padding: 0;
532
+ animation: fadeIn 0.4s ease;
533
+ }
534
+
535
+ .tc.on {
536
+ display: block;
537
+ }
538
+
539
+ @keyframes fadeIn {
540
+ from {
541
+ opacity: 0;
542
+ transform: translateY(10px);
543
+ }
544
+
545
+ to {
546
+ opacity: 1;
547
+ transform: translateY(0);
548
+ }
549
+ }
550
+
551
+ /* Summary row */
552
+ .sum-row {
553
+ padding: 24px;
554
+ display: flex;
555
+ align-items: center;
556
+ gap: 32px;
557
+ flex-wrap: wrap;
558
+ border-bottom: 1px solid var(--b1);
559
+ background: rgba(0, 255, 136, 0.02);
560
+ }
561
+
562
+ .sum-big {
563
+ font-size: 32px;
564
+ font-weight: 800;
565
+ color: var(--green);
566
+ line-height: 1;
567
+ letter-spacing: -0.02em;
568
+ text-shadow: 0 0 20px var(--green-glow);
569
+ }
570
+
571
+ .sum-big .u {
572
+ font-size: 13px;
573
+ font-weight: 500;
574
+ color: var(--muted);
575
+ margin-left: 4px;
576
+ display: block;
577
+ margin-top: 4px;
578
+ letter-spacing: 0;
579
+ }
580
+
581
+ .sum-big .vic {
582
+ font-size: 11px;
583
+ color: var(--cyan);
584
+ font-weight: 600;
585
+ display: block;
586
+ margin-top: 8px;
587
+ text-shadow: none;
588
+ opacity: 0.8;
589
+ }
590
+
591
+ .sum-sep {
592
+ width: 1px;
593
+ height: 40px;
594
+ background: var(--b1);
595
+ }
596
+
597
+ .sum-chk {
598
+ display: flex;
599
+ align-items: center;
600
+ gap: 8px;
601
+ font-size: 12px;
602
+ color: var(--t2);
603
+ font-weight: 500;
604
+ }
605
+
606
+ .sum-dot {
607
+ width: 8px;
608
+ height: 8px;
609
+ border-radius: 50%;
610
+ flex-shrink: 0;
611
+ }
612
+
613
+ .sum-dot.ok {
614
+ background: var(--green);
615
+ box-shadow: 0 0 8px var(--green-glow);
616
+ }
617
+
618
+ .sum-dot.no {
619
+ background: var(--red);
620
+ box-shadow: 0 0 8px var(--red-glow);
621
+ }
622
+
623
+ .sum-dot.na {
624
+ background: var(--muted);
625
+ box-shadow: none;
626
+ }
627
+
628
+ .sum-type {
629
+ font-size: 11px;
630
+ color: var(--cyan);
631
+ text-transform: uppercase;
632
+ letter-spacing: 0.1em;
633
+ font-weight: 700;
634
+ padding: 4px 10px;
635
+ background: rgba(0, 217, 255, 0.1);
636
+ border-radius: 4px;
637
+ }
638
+
639
+ .sum-bar {
640
+ padding: 16px 24px;
641
+ display: flex;
642
+ align-items: center;
643
+ gap: 12px;
644
+ flex-wrap: wrap;
645
+ border-bottom: 1px solid var(--b1);
646
+ }
647
+
648
+ .bs {
649
+ font-family: var(--sans);
650
+ font-size: 11px;
651
+ font-weight: 700;
652
+ padding: 8px 16px;
653
+ border-radius: 8px;
654
+ border: 1px solid var(--b1);
655
+ background: var(--s2);
656
+ color: var(--t2);
657
+ cursor: pointer;
658
+ transition: all 0.2s var(--spring);
659
+ text-transform: uppercase;
660
+ letter-spacing: 0.05em;
661
+ }
662
+
663
+ .bs:hover {
664
+ border-color: var(--b2);
665
+ transform: translateY(-1px);
666
+ background: var(--s3);
667
+ }
668
+
669
+ .bs.r {
670
+ background: var(--bg);
671
+ border-color: var(--red);
672
+ color: var(--red);
673
+ }
674
+
675
+ .bs.r:hover {
676
+ background: var(--red);
677
+ color: #fff;
678
+ box-shadow: 0 4px 15px var(--red-glow);
679
+ }
680
+
681
+ .bs.gr {
682
+ background: var(--green);
683
+ border-color: var(--green);
684
+ color: #000;
685
+ }
686
+
687
+ .bs.gr:hover {
688
+ box-shadow: 0 4px 15px var(--green-glow);
689
+ transform: translateY(-2px);
690
+ }
691
+
692
+ .sp {
693
+ flex: 1;
694
+ }
695
+
696
+ /* Details tab */
697
+ .dm {
698
+ display: grid;
699
+ grid-template-columns: repeat(5, 1fr);
700
+ border-bottom: 1px solid var(--b1);
701
+ }
702
+
703
+ @media (max-width: 800px) {
704
+ .dm {
705
+ grid-template-columns: repeat(2, 1fr);
706
+ }
707
+ }
708
+
709
+ .di {
710
+ padding: 20px;
711
+ border-right: 1px solid var(--b1);
712
+ background: rgba(255, 255, 255, 0.01);
713
+ }
714
+
715
+ .di:last-child {
716
+ border-right: none;
717
+ }
718
+
719
+ .dl {
720
+ font-size: 10px;
721
+ color: var(--muted);
722
+ text-transform: uppercase;
723
+ letter-spacing: 0.1em;
724
+ margin-bottom: 8px;
725
+ font-weight: 700;
726
+ }
727
+
728
+ .dv {
729
+ font-size: 20px;
730
+ font-weight: 800;
731
+ line-height: 1;
732
+ margin-bottom: 4px;
733
+ color: var(--t3);
734
+ }
735
+
736
+ .dv.g {
737
+ color: var(--green);
738
+ }
739
+
740
+ .dv.c {
741
+ color: var(--cyan);
742
+ }
743
+
744
+ .dv.y {
745
+ color: var(--yellow);
746
+ }
747
+
748
+ .dv.t {
749
+ color: var(--t2);
750
+ font-size: 13px;
751
+ }
752
+
753
+ .ds {
754
+ font-size: 10px;
755
+ color: var(--muted);
756
+ line-height: 1.4;
757
+ }
758
+
759
+ /* Benchmark bars */
760
+ .bk {
761
+ padding: 24px;
762
+ border-bottom: 1px solid var(--b1);
763
+ }
764
+
765
+ .bk-t {
766
+ font-size: 11px;
767
+ color: var(--muted);
768
+ text-transform: uppercase;
769
+ letter-spacing: 0.1em;
770
+ margin-bottom: 16px;
771
+ font-weight: 700;
772
+ }
773
+
774
+ .br {
775
+ display: flex;
776
+ align-items: center;
777
+ gap: 16px;
778
+ margin-bottom: 12px;
779
+ }
780
+
781
+ .br:last-child {
782
+ margin-bottom: 0;
783
+ }
784
+
785
+ .bl {
786
+ font-size: 12px;
787
+ color: var(--t2);
788
+ width: 140px;
789
+ flex-shrink: 0;
790
+ font-weight: 500;
791
+ }
792
+
793
+ .bt {
794
+ flex: 1;
795
+ height: 8px;
796
+ background: var(--bg);
797
+ border-radius: 4px;
798
+ overflow: hidden;
799
+ border: 1px solid var(--b1);
800
+ }
801
+
802
+ .bf {
803
+ height: 100%;
804
+ border-radius: 4px;
805
+ transition: width 1s var(--spring);
806
+ width: 0;
807
+ }
808
+
809
+ .bf.bad {
810
+ background: linear-gradient(90deg, #ff334466, #ff3344);
811
+ box-shadow: 0 0 10px rgba(255, 51, 68, 0.3);
812
+ }
813
+
814
+ .bf.good {
815
+ background: linear-gradient(90deg, #00ff8866, #00ff88);
816
+ box-shadow: 0 0 10px rgba(0, 255, 136, 0.3);
817
+ }
818
+
819
+ .bv {
820
+ font-size: 12px;
821
+ font-weight: 700;
822
+ width: 40px;
823
+ text-align: right;
824
+ flex-shrink: 0;
825
+ }
826
+
827
+ .bv.bad {
828
+ color: var(--red);
829
+ }
830
+
831
+ .bv.good {
832
+ color: var(--green);
833
+ }
834
+
835
+ /* Simple mode note */
836
+ .sn {
837
+ padding: 20px;
838
+ border: 1px solid var(--cyan);
839
+ border-radius: 12px;
840
+ background: rgba(0, 217, 255, 0.05);
841
+ margin: 24px;
842
+ font-size: 13px;
843
+ color: var(--t2);
844
+ line-height: 1.6;
845
+ border-left-width: 4px;
846
+ }
847
+
+ /* Diff */
+ .dg {
+   display: grid;
+   grid-template-columns: 1fr 1fr;
+   background: var(--bg);
+ }
+
+ @media (max-width: 780px) {
+   .dg {
+     grid-template-columns: 1fr;
+   }
+
+   .dfs:first-child {
+     border-right: none !important;
+     border-bottom: 1px solid var(--b1);
+   }
+ }
+
+ .dfs:first-child {
+   border-right: 1px solid var(--b1);
+ }
+
+ .dfh {
+   padding: 10px 16px;
+   border-bottom: 1px solid var(--b1);
+   font-size: 11px;
+   color: var(--muted);
+   display: flex;
+   align-items: center;
+   gap: 8px;
+   font-weight: 600;
+   background: var(--s2);
+ }
+
+ .dft {
+   font-size: 9px;
+   font-weight: 800;
+   padding: 2px 6px;
+   border-radius: 4px;
+   text-transform: uppercase;
+ }
+
+ .dft.cu {
+   background: rgba(255, 51, 68, 0.2);
+   color: var(--red);
+ }
+
+ .dft.ro {
+   background: rgba(0, 255, 136, 0.2);
+   color: var(--green);
+ }
+
+ .dfp {
+   padding: 20px;
+   font-family: var(--mono);
+   font-size: 12px;
+   line-height: 1.7;
+   overflow: auto;
+   max-height: 500px;
+   white-space: pre;
+   color: var(--t2);
+ }
+
+ .dlo {
+   background: rgba(255, 51, 68, 0.1);
+   color: var(--red);
+   text-decoration: line-through;
+   display: block;
+   width: 100%;
+ }
+
+ .dln {
+   background: rgba(0, 255, 136, 0.1);
+   color: var(--green);
+   display: block;
+   width: 100%;
+ }
+
+ /* Loading Skeleton */
+ .skeleton {
+   position: relative;
+   overflow: hidden;
+   background: var(--s2);
+   border-radius: 12px;
+   height: 200px;
+   margin-top: 24px;
+ }
+
+ .skeleton::after {
+   content: '';
+   position: absolute;
+   inset: 0;
+   transform: translateX(-100%);
+   background: linear-gradient(90deg, transparent, rgba(255, 255, 255, 0.05), transparent);
+   animation: shimmer 1.5s infinite;
+ }
+
+ @keyframes shimmer {
+   100% {
+     transform: translateX(100%);
+   }
+ }
+
+ /* Custom Cursor */
+ #cursor {
+   position: fixed;
+   width: 20px;
+   height: 20px;
+   background: rgba(255, 255, 255, 0.2);
+   border: 1px solid rgba(255, 255, 255, 0.4);
+   border-radius: 50%;
+   pointer-events: none;
+   z-index: 9999;
+   transition: transform 0.1s ease, width 0.3s var(--spring), height 0.3s var(--spring), background 0.3s ease;
+   mix-blend-mode: difference;
+ }
+
+ #cursor.active {
+   transform: scale(3);
+   background: rgba(255, 51, 68, 0.3);
+   border-color: var(--red);
+ }
+
+ /* Modal */
+ .mo {
+   display: none;
+   position: fixed;
+   inset: 0;
+   background: rgba(0, 0, 0, 0.85);
+   z-index: 1000;
+   place-items: center;
+   backdrop-filter: blur(8px);
+ }
+
+ .mo.open {
+   display: grid;
+ }
+
+ .mb {
+   background: var(--s1);
+   border: 1px solid var(--b1);
+   border-radius: 16px;
+   width: 90%;
+   max-width: 800px;
+   max-height: 90vh;
+   overflow: hidden;
+   box-shadow: 0 20px 50px rgba(0, 0, 0, 0.6);
+ }
+
+ .mt {
+   padding: 16px 24px;
+   border-bottom: 1px solid var(--b1);
+   display: flex;
+   justify-content: space-between;
+   align-items: center;
+   background: var(--s2);
+ }
+
+ .mt h3 {
+   font-size: 16px;
+   color: var(--t3);
+   font-weight: 700;
+ }
+
+ .mx {
+   background: none;
+   border: none;
+   color: var(--muted);
+   font-size: 24px;
+   cursor: pointer !important;
+   line-height: 1;
+   transition: color 0.2s;
+ }
+
+ .mx:hover {
+   color: var(--t3);
+ }
+
+ .mc {
+   padding: 24px;
+ }
+
+ .mc textarea {
+   width: 100%;
+   height: 400px;
+   background: var(--bg);
+   border: 1px solid var(--b1);
+   border-radius: 8px;
+   padding: 16px;
+   color: var(--cyan);
+   font-family: var(--mono);
+   font-size: 12px;
+   line-height: 1.6;
+   resize: vertical;
+   outline: none;
+ }
+
+ .mc textarea:focus {
+   border-color: var(--cyan);
+   box-shadow: 0 0 10px rgba(0, 217, 255, 0.2);
+ }
+
+ .mf {
+   padding: 16px 24px;
+   border-top: 1px solid var(--b1);
+   display: flex;
+   justify-content: flex-end;
+   gap: 12px;
+   background: var(--s2);
+ }
+
+ ::-webkit-scrollbar {
+   width: 6px;
+   height: 6px;
+ }
+
+ ::-webkit-scrollbar-track {
+   background: transparent;
+ }
+
+ ::-webkit-scrollbar-thumb {
+   background: var(--b1);
+   border-radius: 10px;
+ }
+
+ ::-webkit-scrollbar-thumb:hover {
+   background: var(--b2);
+ }
+
+ footer {
+   padding: 32px 0;
+   border-top: 1px solid var(--b1);
+   display: flex;
+   justify-content: space-between;
+   font-size: 11px;
+   color: var(--muted);
+   font-weight: 500;
+ }
+
+ footer a {
+   color: var(--muted);
+   text-decoration: none;
+   transition: color 0.2s;
+   border-bottom: 1px solid transparent;
+ }
+
+ footer a:hover {
+   color: var(--t2);
+   border-bottom-color: var(--muted);
+ }
+
+ .idle {
+   flex: 1;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   color: var(--b2);
+   font-size: 13px;
+   font-weight: 500;
+   min-height: 100px;
+ }
+ </style>
  </head>
  <div id="cursor"></div>

    <div class="logo">ROCmPort <em>AI</em></div>
    <div class="hr">
      <div class="hd on" id="hdot"></div>
+     <span id="hstat">Ready</span>
    </div>
  </header>

  <div class="g">
    <div class="p">
+     <div class="ph">
+       <div><b>//</b> CUDA source</div>
+       <div id="lc">0 lines</div>
+     </div>
      <textarea class="code" id="inp" spellcheck="false" placeholder="// Paste CUDA code here
      // or pick a demo below

    </div>

    <div class="p">
+     <div class="ph">
+       <div><b>//</b> Pipeline</div>
+       <div id="pt">0.0s</div>
+     </div>
      <div class="timeline" id="tl">
        <!-- Nodes injected by JS -->
      </div>

  <footer>
    <div>ROCmPort AI — AMD Developer Hackathon 2025</div>
+   <div><a href="https://x.com/TazwarEnan" target="_blank">Tazwar Ahnaf Enan</a> · <a
+     href="https://github.com/tazwaryayyyy" target="_blank">GitHub</a></div>
  </footer>
  </div>

  <div class="mo" id="modal">
    <div class="mb">
+     <div class="mt">
+       <h3>Edit ROCm code</h3><button class="mx" onclick="cm()">&times;</button>
+     </div>
      <div class="mc"><textarea id="edt"></textarea></div>
+     <div class="mf"><button class="bs" onclick="cm()">Cancel</button><button class="bs r"
+       onclick="rec()">Re-test</button></div>
    </div>
  </div>
  <script>
+ const API = 'http://localhost:8000';
+ const S = { code: '', kn: 'custom', run: false, t0: null, iv: null, rep: null, tl: [], kernels: {} };
+ const AG = {
+   analyzer: { n: 'ANALYZER', i: '🔍' },
+   translator: { n: 'TRANSLATOR', i: '🔄' },
+   optimizer: { n: 'OPTIMIZER', i: '⚡' },
+   tester: { n: 'TESTER', i: '🧪' },
+   coordinator: { n: 'COORDINATOR', i: '📋' }
  };
+
+ // Custom Cursor Logic
+ const cur = document.getElementById('cursor');
+ document.addEventListener('mousemove', (e) => {
+   cur.style.left = e.clientX + 'px';
+   cur.style.top = e.clientY + 'px';
+   const target = e.target;
+   const isClickable = target.onclick ||
+     target.tagName === 'BUTTON' ||
+     target.tagName === 'A' ||
+     target.tagName === 'TEXTAREA' ||
+     target.classList.contains('ch') ||
+     target.classList.contains('tab');
+
+   if (isClickable) {
+     cur.classList.add('active');
+     if (target.id === 'go') cur.style.background = 'rgba(255, 51, 68, 0.5)';
+     else cur.style.background = 'rgba(255, 255, 255, 0.3)';
+   } else {
+     cur.classList.remove('active');
+     cur.style.background = 'rgba(255, 255, 255, 0.2)';
  }
+ });
+
+ async function init() {
+   const ta = document.getElementById('inp');
+   ta.oninput = () => {
+     document.getElementById('lc').textContent = ta.value.split('\n').length + ' lines';
+     S.code = ta.value;
+   };
+   try {
+     const r = await fetch(API + '/demo-kernels');
+     S.kernels = await r.json();
+   } catch (e) { S.kernels = FB; }
  }
+
+ function lk(n, btn) {
+   document.querySelectorAll('.ch').forEach(c => c.classList.remove('on'));
+   btn.classList.add('on');
+   const code = S.kernels[n] || FB[n] || '', ta = document.getElementById('inp');
+   ta.value = code; S.code = code; S.kn = n;
+   document.getElementById('lc').textContent = code.split('\n').length + ' lines';
+ }
+
+ function stab(id, btn) {
+   document.querySelectorAll('.tab').forEach(t => t.classList.remove('on'));
+   document.querySelectorAll('.tc').forEach(t => t.classList.remove('on'));
+   btn.classList.add('on');
+   document.getElementById('t-' + id).classList.add('on');
+   if (id === 'diff' && S.rep) rDiff(S.code, S.rep.optimized_code);
+ }
+
+ async function go() {
+   if (S.run) return;
+   const code = document.getElementById('inp').value.trim();
+   if (!code) return;
+
+   S.code = code; S.run = true; S.t0 = Date.now(); S.tl = [];
+   const btn = document.getElementById('go');
+   btn.disabled = true;
+   btn.textContent = 'Running pipeline...';
+
+   document.getElementById('hstat').textContent = 'Pipeline running...';
+   document.getElementById('rp').classList.add('hide');
+
+   bLog();
+   sTimer();
+
+   try {
+     const simpleModeCheckbox = document.getElementById('sm');
+     const res = await fetch(API + '/port', {
+       method: 'POST',
+       headers: { 'Content-Type': 'application/json' },
+       body: JSON.stringify({
+         cuda_code: code,
+         kernel_name: S.kn,
+         simple_mode: simpleModeCheckbox ? simpleModeCheckbox.checked : false
+       })
    });
+
+     // Show results panel with loader immediately
+     document.getElementById('rp').classList.remove('hide');
+     document.getElementById('t-loader').classList.remove('hide');
+     document.getElementById('t-sum').classList.remove('on');
+     document.getElementById('t-diff').classList.remove('on');
+     document.getElementById('t-det').classList.remove('on');
+
+     const rd = res.body.getReader(), dc = new TextDecoder();
+     let buf = '';
+     while (true) {
+       const { done, value } = await rd.read();
+       if (done) break;
+       buf += dc.decode(value, { stream: true });
+       const lines = buf.split('\n');
+       buf = lines.pop();
+       for (const ln of lines) {
+         if (!ln.startsWith('data: ')) continue;
+         const raw = ln.slice(6).trim();
+         if (raw === '[DONE]') { done_(); break; }
+         try { hEvt(JSON.parse(raw)); } catch (e) { console.error('Parse error:', e); }
+       }
+     }
+   } catch (e) {
+     document.getElementById('hstat').textContent = 'Pipeline error';
+     document.getElementById('t-loader').classList.add('hide'); // Hide loader on error
+     console.error(e);
+   } finally {
+     xTimer();
+     S.run = false;
+     btn.disabled = false;
+     btn.textContent = 'Port to ROCm';
+     document.getElementById('t-loader').classList.add('hide');
  }
  }
+
+ function hEvt(ev) {
+   uLog(ev.agent, ev.status, ev.message, ev.detail);
+   if (ev.agent === 'tester' && (ev.status === 'done' || ev.status === 'failed')) {
+     const m = ev.message.match(/([\d.]+)x/);
+     if (m) {
+       const sp = parseFloat(m[1]), ok = sp >= 1, im = ev.message.match(/Iteration (\d+)/i);
+       S.tl.push({
+         label: 'Iteration ' + (im ? im[1] : S.tl.length + 1) + (ok ? ' (optimized)' : ' (baseline)'),
+         speedup: sp,
+         good: ok
+       });
+     }
+   }
+   if (ev.agent === 'coordinator' && ev.status === 'done' && ev.detail) {
+     try {
+       const r = JSON.parse(ev.detail);
+       S.rep = r;
+       rRes(r, S.tl);
+     } catch (e) { console.error('Coordinator detail parse error:', e); }
+   }
  }

+ function done_() {
+   document.getElementById('hstat').textContent = 'Pipeline complete';
+   document.getElementById('t-loader').classList.add('hide');
+   if (!S.rep) {
+     document.getElementById('t-sum').innerHTML = '<div class="idle">Migration finished but no report was generated. Check agent logs for details.</div>';
+     document.getElementById('t-sum').classList.add('on');
+   }
  }
+
+ function bLog() {
+   const el = document.getElementById('al');
+   const tl = document.getElementById('tl');
+   el.innerHTML = '';
+   tl.innerHTML = '';
+
+   let i = 0;
+   for (const [k, obj] of Object.entries(AG)) {
+     // Log row
+     const d = document.createElement('div');
+     d.className = 'ar';
+     d.id = 'ar-' + k;
+     d.style.animationDelay = (i * 0.1) + 's';
+     d.innerHTML = `
      <div class="at">
        <span class="an">${obj.n}</span>
        <span class="am" id="am-${k}">Waiting</span>
      </div>
      <div class="ad" id="ad-${k}"></div>`;
+     el.appendChild(d);
+
+     // Timeline node
+     const n = document.createElement('div');
+     n.className = 'node';
+     n.id = 'nd-' + k;
+     n.title = obj.n;
+     n.innerHTML = `<div class="ni">${obj.i}</div><div class="nl">${obj.n.slice(0, 3)}</div>`;
+     tl.appendChild(n);
+     i++;
+   }
  }
+
+ function uLog(a, s, m, d) {
+   const row = document.getElementById('ar-' + a);
+   const node = document.getElementById('nd-' + a);
+   if (!row || !node) return;
+
+   const statusClass = { running: 'run', done: 'done', failed: 'fail', retrying: 'retry' }[s] || '';
+   row.className = 'ar ' + statusClass;
+   node.className = 'node ' + (s === 'running' ? 'on' : s === 'retrying' ? 'retry' : s === 'done' ? 'done' : s === 'failed' ? 'fail' : '');
+
+   const me = document.getElementById('am-' + a);
+   if (me) me.textContent = m;
+
+   // Node tooltip message update
+   node.title = m;
+
+   const de = document.getElementById('ad-' + a);
+   if (de && d) {
+     de.innerHTML = esc(d)
+       .replace(/\u26a0\ufe0f([^\n]*)/g, '<span class="w">⚠️ $1</span>')
+       .replace(/\u2705([^\n]*)/g, '<span class="g">✅ $1</span>');
+     de.scrollTop = de.scrollHeight;
+   }
  }

+ function rRes(r, tl) {
+   // Hide loader, show summary
+   document.getElementById('t-loader').classList.add('hide');
+   document.getElementById('t-sum').classList.add('on');
+
+   const v = r.verification || {}, bw = r.bandwidth_utilized;
+   const dot = ok => `<div class="sum-dot ${ok === true ? 'ok' : ok === false ? 'no' : 'na'}"></div>`;

+   document.getElementById('t-sum').innerHTML = `
    <div class="sum-row">
      <div class="sum-big">
        ${r.speedup}x
        <span class="u">vs baseline hipify</span>
+       <span class="vic">Measured against declared baseline.</span>
      </div>
      <div class="sum-sep"></div>
      <div>

      ${r.simplified_explanation ? esc(r.simplified_explanation) : '<em>Simplified explanation will appear here</em>'}
    </div>`;

+   // Details tab
+   let dh = `<div class="dm">
    <div class="di"><div class="dl">Speedup</div><div class="dv g">${r.speedup}x</div><div class="ds">optimized ROCm vs straight hipify output</div></div>
    <div class="di"><div class="dl">Bandwidth</div><div class="dv c">${bw != null ? bw.toFixed(1) : '—'}%</div><div class="ds">of MI300X 5.3 TB/s HBM3</div></div>
    <div class="di"><div class="dl">Changes</div><div class="dv y">${r.total_changes}</div><div class="ds">hipify + LLM + optimizer changes</div></div>
    <div class="di"><div class="dl">Iterations</div><div class="dv c">${r.iterations || 1}</div><div class="ds">optimizer retry loop count</div></div>
    <div class="di"><div class="dl">Type</div><div class="dv t">${(r.bottleneck || '—').toUpperCase()}</div><div class="ds">workload classification</div></div>
  </div>`;
+
+   if (tl.length) {
+     dh += '<div class="bk"><div class="bk-t">Benchmark iterations (optimized vs baseline hipify)</div>';
+     tl.forEach(d => {
+       const pct = Math.min(Math.max((d.speedup / 2) * 100, 3), 95);
+       dh += `<div class="br">
      <div class="bl">${esc(d.label)}</div>
      <div class="bt"><div class="bf ${d.good ? 'good' : 'bad'}" style="width: 0" data-w="${pct}%"></div></div>
      <div class="bv ${d.good ? 'good' : 'bad'}">${d.speedup}x</div>
    </div>`;
+     });
+     dh += '</div>';
+   }
+
+   document.getElementById('t-det').innerHTML = dh;
+   tsm(); // Ensure simple note visibility matches current toggle state
+
+   // Progress bar animation
+   setTimeout(() => {
+     document.querySelectorAll('.bf[data-w]').forEach(b => {
+       b.style.width = b.dataset.w;
+     });
+   }, 100);
  }
+
+ function rDiff(o, n) {
+   if (!o || !n) return;
+   const oe = document.getElementById('d-o'), ne = document.getElementById('d-n');
+   if (oe && oe.innerHTML && ne && ne.innerHTML) return; // Already rendered
+
+   document.getElementById('t-diff').innerHTML = `<div class="dg">
    <div class="dfs"><div class="dfh"><span class="dft cu">CUDA</span> Original Source</div><pre class="dfp" id="d-o"></pre></div>
    <div class="dfs"><div class="dfh"><span class="dft ro">ROCm</span> Optimized HIP</div><pre class="dfp" id="d-n"></pre></div>
  </div>`;
+
+   const oL = o.split('\n'), nL = n.split('\n'), mx = Math.max(oL.length, nL.length);
+   let oH = '', nH = '';
+   for (let i = 0; i < mx; i++) {
+     const a = oL[i] ?? '', b = nL[i] ?? '', c = a !== b;
+     oH += `<span class="${c ? 'dlo' : ''}">${esc(a)}\n</span>`;
+     nH += `<span class="${c ? 'dln' : ''}">${esc(b)}\n</span>`;
+   }
+   document.getElementById('d-o').innerHTML = oH;
+   document.getElementById('d-n').innerHTML = nH;
  }
+
+ function sTimer() { S.iv = setInterval(() => { document.getElementById('pt').textContent = ((Date.now() - S.t0) / 1000).toFixed(1) + 's' }, 100) }
+ function xTimer() { clearInterval(S.iv) }
+
+ function dlR() {
+   const r = S.rep; if (!r) return;
+   const md = `# ROCmPort AI — Migration Report\n\n## Results\n- **Speedup**: ${r.speedup}x\n- **Bandwidth**: ${r.bandwidth_utilized ? r.bandwidth_utilized.toFixed(1) : '—'}%\n- **Changes**: ${r.total_changes}\n- **Iterations**: ${r.iterations}\n- **Type**: ${r.bottleneck}\n\n${r.amd_advantage_explanation ? '> ' + r.amd_advantage_explanation + '\n\n' : ''}${r.cost_estimate ? '## Cost Impact\n- Manual: ' + r.cost_estimate.manual_porting_weeks + '\n- ROCmPort: ' + r.cost_estimate.rocmport_minutes + '\n- Savings: ' + r.cost_estimate.estimated_savings + '\n\n' : ''}## ROCm/HIP Code\n\`\`\`cpp\n${r.optimized_code || ''}\n\`\`\`\n\n---\n*Generated by ROCmPort AI*\n`;
+   const a = document.createElement('a'); a.href = URL.createObjectURL(new Blob([md], { type: 'text/markdown' })); a.download = 'rocmport-migration-report.md'; a.click();
+ }
+
+ function om() { if (!S.rep) return alert('No results yet!'); document.getElementById('edt').value = S.rep?.optimized_code || ''; document.getElementById('modal').classList.add('open') }
+ function cm() { document.getElementById('modal').classList.remove('open') }
+
+ async function rec() {
+   const code = document.getElementById('edt').value.trim(); if (!code) return;
+   try {
+     const res = await fetch(API + '/recompile', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ edited_code: code, kernel_name: S.kn }) });
+     const r = await res.json();
+     if (r.success) { cm(); if (r.result) rRes(r.result, S.tl); }
+     else alert('Failed: ' + (r.detail || 'Unknown'));
+   } catch (e) { alert('Error: ' + e.message) }
+ }
+
+ async function exM() {
+   if (!S.rep) return;
+   try {
+     const res = await fetch(API + '/export', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ original_cuda: S.code, final_rocm: S.rep.optimized_code, migration_report: S.rep }) });
+     if (res.ok) { const a = document.createElement('a'); a.href = URL.createObjectURL(await res.blob()); a.download = 'rocmport-migration.zip'; a.click() }
+   } catch (e) { alert('Export error') }
+ }
+
+ function tsm() {
+   // Show the simple-mode note only when the Simple Mode toggle is on
+   const sn = document.getElementById('sn');
+   const sm = document.getElementById('sm');
+   if (sn) sn.classList.toggle('hide', !(sm && sm.checked));
+ }
+
+ function esc(s) { return String(s ?? '').replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;') }
+
+ const FB = {
+   vector_add: `#include <cuda_runtime.h>\n\n__global__ void vector_add_kernel(float* A, float* B, float* C, int N) {\n int idx = blockIdx.x * blockDim.x + threadIdx.x;\n if (idx < N) {\n C[idx] = A[idx] + B[idx];\n }\n}\n\nint main() {\n int N = 1 << 24;\n size_t size = N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n int threads = 128;\n int blocks = (N + threads - 1) / threads;\n vector_add_kernel<<<blocks, threads>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
+   matrix_multiply: `#include <cuda_runtime.h>\n#define WARP_SIZE 32\n\n__global__ void matmul_kernel(float* A, float* B, float* C, int N) {\n int row = blockIdx.y * blockDim.y + threadIdx.y;\n int col = blockIdx.x * blockDim.x + threadIdx.x;\n float sum = 0.0f;\n if (row < N && col < N) {\n for (int k = 0; k < N; k++)\n sum += A[row * N + k] * B[k * N + col];\n C[row * N + col] = sum;\n }\n}\n\n__global__ void warp_reduce(float* data, float* result, int N) {\n int tid = threadIdx.x;\n extern __shared__ float sdata[];\n sdata[tid] = (tid < N) ? data[tid] : 0;\n __syncthreads();\n for (int s = WARP_SIZE/2; s > 0; s >>= 1) {\n if (tid < s) sdata[tid] += sdata[tid + s];\n __syncthreads();\n }\n if (tid == 0) result[blockIdx.x] = sdata[0];\n}\n\nint main() {\n int N = 1024;\n size_t size = N * N * sizeof(float);\n float *d_A, *d_B, *d_C;\n cudaMalloc(&d_A, size);\n cudaMalloc(&d_B, size);\n cudaMalloc(&d_C, size);\n dim3 block(16, 16);\n dim3 grid((N+15)/16, (N+15)/16);\n matmul_kernel<<<grid, block>>>(d_A, d_B, d_C, N);\n cudaDeviceSynchronize();\n cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);\n return 0;\n}`,
+   convolution_2d: `#include <cuda_runtime.h>\n#define BLOCK_SIZE 16\n\n__global__ void conv2d_kernel(\n float* input, float* kernel, float* output,\n int width, int height\n) {\n int x = blockIdx.x * blockDim.x + threadIdx.x;\n int y = blockIdx.y * blockDim.y + threadIdx.y;\n if (x >= width || y >= height) return;\n float sum = 0.0f;\n for (int ky = -1; ky <= 1; ky++) {\n for (int kx = -1; kx <= 1; kx++) {\n int ix = x + kx, iy = y + ky;\n if (ix >= 0 && ix < width && iy >= 0 && iy < height)\n sum += input[iy * width + ix] * kernel[(ky+1)*3 + (kx+1)];\n }\n }\n output[y * width + x] = sum;\n}\n\nint main() {\n int W = 2048, H = 2048;\n float *d_in, *d_ker, *d_out;\n cudaMalloc(&d_in, W*H*sizeof(float));\n cudaMalloc(&d_ker, 9*sizeof(float));\n cudaMalloc(&d_out, W*H*sizeof(float));\n dim3 block(BLOCK_SIZE, BLOCK_SIZE);\n dim3 grid((W+BLOCK_SIZE-1)/BLOCK_SIZE, (H+BLOCK_SIZE-1)/BLOCK_SIZE);\n conv2d_kernel<<<grid, block>>>(d_in, d_ker, d_out, W, H);\n cudaDeviceSynchronize();\n cudaFree(d_in); cudaFree(d_ker); cudaFree(d_out);\n return 0;\n}`,
+   reduction: `#include <cuda_runtime.h>\n#include <stdio.h>\n#include <iostream>\n#include <vector>\n#include <numeric>\n\n// Tree-based reduction kernel\n__global__ void reduction_kernel(float* g_idata, float* g_odata, unsigned int n) {\n extern __shared__ float sdata[];\n unsigned int tid = threadIdx.x;\n unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;\n\n float mySum = (i < n) ? g_idata[i] : 0;\n if (i + blockDim.x < n) mySum += g_idata[i + blockDim.x];\n sdata[tid] = mySum;\n __syncthreads();\n\n for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {\n if (tid < s) sdata[tid] = mySum = mySum + sdata[tid + s];\n __syncthreads();\n }\n\n // DELIBERATE WARP-SIZE BUG: Unroll to 32 instead of 64\n if (tid < 32) {\n volatile float* vsmem = sdata;\n vsmem[tid] = mySum = mySum + vsmem[tid + 32];\n vsmem[tid] = mySum = mySum + vsmem[tid + 16];\n vsmem[tid] = mySum = mySum + vsmem[tid + 8];\n vsmem[tid] = mySum = mySum + vsmem[tid + 4];\n vsmem[tid] = mySum = mySum + vsmem[tid + 2];\n vsmem[tid] = mySum = mySum + vsmem[tid + 1];\n }\n\n if (tid == 0) g_odata[blockIdx.x] = sdata[0];\n}\n\nint main() {\n const int N = 1048576;\n // ... Host code for Parallel Reduction demo\n printf("Parallel Reduction demo loaded.\\n");\n return 0;\n}`
+ };
+
+ init();

  </script>
  </body>
+
  </html>