# ROCmPort AI Migration Report: cuda_first_repo

## AMD Readiness Score

- Before deterministic fixes: 51/100
- After deterministic fixes: 100/100

| Category | Before | After |
| --- | ---: | ---: |
| Code portability | 0 | 100 |
| Environment readiness | 0 | 100 |
| Serving readiness | 90 | 100 |
| Benchmark readiness | 65 | 100 |
| Deployment readiness | 100 | 100 |

## Findings

| Severity | Category | Location | Finding | Suggested fix |
| --- | --- | --- | --- | --- |
| high | Benchmark readiness | `benchmarks/benchmark.py:6` | NVIDIA-specific GPU inspection command found. | Use `rocm-smi` for AMD GPU monitoring and benchmark metadata collection. |
| high | Environment readiness | `Dockerfile:1` | Dockerfile uses an NVIDIA CUDA base image. | Use `vllm/vllm-openai-rocm:latest` for vLLM serving or `rocm/pytorch:latest` for PyTorch workloads. |
| medium | Environment readiness | `Dockerfile:8` | NVIDIA container environment variable found. | Use `HIP_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES` for AMD GPU targeting. |
| high | Code portability | `infer.py:6` | `torch.device` is hardcoded to CUDA. | Use `torch.device("cuda" if torch.cuda.is_available() else "cpu")`; ROCm PyTorch reports AMD GPUs through `torch.cuda`. |
| high | Code portability | `infer.py:11` | A PyTorch tensor or module is moved with a hardcoded `.cuda()` call. | Replace `.cuda()` with `.to(_rocmport_device)` and define a runtime device abstraction. |
| high | Code portability | `infer.py:12` | Tensor or module transfer hardcodes the CUDA device string. | Replace `.to("cuda")` with `.to(_rocmport_device)`. |
| low | Code portability | `infer.py:19` | The CUDA availability check may confuse ROCm users because ROCm PyTorch still uses the `torch.cuda` namespace. | Keep the API call but document that it covers AMD GPUs under ROCm PyTorch. |
| medium | Environment readiness | `scripts/serve_vllm.sh:4` | `CUDA_VISIBLE_DEVICES` is used for GPU selection. | Use `HIP_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES` for AMD GPU targeting. |
| high | Environment readiness | `scripts/serve_vllm.sh:5` | NVIDIA-specific GPU inspection command found. | Use `rocm-smi` for AMD GPU monitoring and benchmark metadata collection. |
| low | Serving readiness | `scripts/serve_vllm.sh:6` | vLLM serving command found without explicit ROCm container guidance. | Run vLLM inside `vllm/vllm-openai-rocm` with `/dev/kfd`, `/dev/dri`, host IPC, and `video` group access. |

## Generated Artifacts

- `rocm_patch.diff` contains the deterministic MVP fixes.
- `Dockerfile.rocm` uses the ROCm-enabled vLLM container.
- `amd_developer_cloud_runbook.md` documents the validation path.
- `benchmark_result.json` records the AMD benchmark schema and status.

## Qwen Agent Notes

No Qwen endpoint was configured; this report uses deterministic scanner output only.

## Remaining Risks

- CUDA C++ kernels, custom Triton kernels, and CUDA-only binary dependencies require manual review.
- Uploaded repositories are not executed inside the Space; live validation belongs on AMD Developer Cloud.
- ROCm performance depends on the model, batch shape, vLLM version, ROCm version, and GPU instance configuration.
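The serving findings can be sketched as a ROCm container setup. The fragment below is an assumption-laden illustration of what `Dockerfile.rocm` might look like, based only on the image name and device flags named in the findings; it is not the generated artifact itself.

```dockerfile
# Sketch of a ROCm serving image, assuming vllm/vllm-openai-rocm:latest
FROM vllm/vllm-openai-rocm:latest

# AMD GPU targeting uses HIP_VISIBLE_DEVICES rather than CUDA_VISIBLE_DEVICES
ENV HIP_VISIBLE_DEVICES=0
```

At run time the container needs the AMD device nodes and host IPC listed in the serving finding, roughly: `docker run --device /dev/kfd --device /dev/dri --ipc=host --group-add video ...` (exact flags depend on the host's ROCm installation).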