# ROCmPort AI Migration Report: cuda_first_repo

## AMD Readiness Score

- Before deterministic fixes: 51/100
- After deterministic fixes: 100/100

| Category | Before | After |
| --- | ---: | ---: |
| Code portability | 0 | 100 |
| Environment readiness | 0 | 100 |
| Serving readiness | 90 | 100 |
| Benchmark readiness | 65 | 100 |
| Deployment readiness | 100 | 100 |

## Findings

| Severity | Category | Location | Finding | Suggested fix |
| --- | --- | --- | --- | --- |
| high | Benchmark readiness | `benchmarks/benchmark.py:6` | NVIDIA-specific GPU inspection command found. | Use `rocm-smi` for AMD GPU monitoring and benchmark metadata collection. |
| high | Environment readiness | `Dockerfile:1` | Dockerfile uses an NVIDIA CUDA base image. | Use `vllm/vllm-openai-rocm:latest` for vLLM serving or `rocm/pytorch:latest` for PyTorch workloads. |
| medium | Environment readiness | `Dockerfile:8` | NVIDIA container environment variable found. | Use `HIP_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES` for AMD GPU targeting. |
| high | Code portability | `infer.py:6` | `torch.device` is hardcoded to CUDA. | Use `torch.device("cuda" if torch.cuda.is_available() else "cpu")`; ROCm PyTorch reports AMD GPUs through `torch.cuda`. |
| high | Code portability | `infer.py:11` | A PyTorch tensor or module is moved with a hardcoded `.cuda()` call. | Replace `.cuda()` with `.to(_rocmport_device)` and define a runtime device abstraction. |
| high | Code portability | `infer.py:12` | Tensor or module transfer hardcodes the CUDA device string. | Replace `.to("cuda")` with `.to(_rocmport_device)`. |
| low | Code portability | `infer.py:19` | The CUDA availability check may confuse ROCm users because ROCm PyTorch still uses the `torch.cuda` namespace. | Keep the API call but document that it covers AMD GPUs under ROCm PyTorch. |
| medium | Environment readiness | `scripts/serve_vllm.sh:4` | `CUDA_VISIBLE_DEVICES` is used for GPU selection. | Use `HIP_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES` for AMD GPU targeting. |
| high | Environment readiness | `scripts/serve_vllm.sh:5` | NVIDIA-specific GPU inspection command found. | Use `rocm-smi` for AMD GPU monitoring and benchmark metadata collection. |
| low | Serving readiness | `scripts/serve_vllm.sh:6` | vLLM serving command found without explicit ROCm container guidance. | Run vLLM inside `vllm/vllm-openai-rocm` with `/dev/kfd`, `/dev/dri`, host IPC, and `video` group access. |

## Generated Artifacts

- `rocm_patch.diff` contains the deterministic MVP fixes.
- `Dockerfile.rocm` uses the ROCm-enabled vLLM container.
- `amd_developer_cloud_runbook.md` documents the validation path.
- `benchmark_result.json` records the AMD benchmark schema and status.

## Qwen Agent Notes

No Qwen endpoint was configured; this report uses deterministic scanner output only.

## Remaining Risks

- CUDA C++ kernels, custom Triton kernels, and CUDA-only binary dependencies require manual review.
- Uploaded repositories are not executed inside the Space; live validation belongs on AMD Developer Cloud.
- ROCm performance depends on the model, batch shape, vLLM version, ROCm version, and GPU instance configuration.
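The serving findings can be sketched as a ROCm container setup. The fragment below is an assumption-laden illustration of what `Dockerfile.rocm` might look like, based only on the image name and device flags named in the findings; it is not the generated artifact itself.

```dockerfile
# Sketch of a ROCm serving image, assuming vllm/vllm-openai-rocm:latest
FROM vllm/vllm-openai-rocm:latest

# AMD GPU targeting uses HIP_VISIBLE_DEVICES rather than CUDA_VISIBLE_DEVICES
ENV HIP_VISIBLE_DEVICES=0
```

At run time the container needs the AMD device nodes and host IPC listed in the serving finding, roughly: `docker run --device /dev/kfd --device /dev/dri --ipc=host --group-add video ...` (exact flags depend on the host's ROCm installation).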