ROCmPort AI Migration Report: cuda_first_repo
AMD Readiness Score
- Before deterministic fixes: 51/100
- After deterministic fixes: 100/100
| Category | Before | After |
|---|---|---|
| Code portability | 0 | 100 |
| Environment readiness | 0 | 100 |
| Serving readiness | 90 | 100 |
| Benchmark readiness | 65 | 100 |
| Deployment readiness | 100 | 100 |
Findings
| Severity | Category | Location | Finding | Suggested fix |
|---|---|---|---|---|
| high | Benchmark readiness | benchmarks/benchmark.py:6 | NVIDIA-specific GPU inspection command found. | Use rocm-smi for AMD GPU monitoring and benchmark metadata collection. |
| high | Environment readiness | Dockerfile:1 | Dockerfile uses an NVIDIA CUDA base image. | Use vllm/vllm-openai-rocm:latest for vLLM serving or rocm/pytorch:latest for PyTorch workloads. |
| medium | Environment readiness | Dockerfile:8 | NVIDIA container environment variable found. | Use HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES for AMD GPU targeting. |
| high | Code portability | infer.py:6 | torch.device is hardcoded to CUDA. | Use torch.device("cuda" if torch.cuda.is_available() else "cpu"); ROCm PyTorch reports AMD GPUs through torch.cuda. |
| high | Code portability | infer.py:11 | PyTorch tensor or module is moved with a hardcoded .cuda() call. | Replace .cuda() with .to(_rocmport_device) and define a runtime device abstraction. |
| high | Code portability | infer.py:12 | Tensor or module transfer hardcodes the CUDA device string. | Replace .to("cuda") with .to(_rocmport_device). |
| low | Code portability | infer.py:19 | CUDA availability check may confuse ROCm users because PyTorch ROCm still uses the torch.cuda namespace. | Keep the API call but document that it covers AMD GPUs under ROCm PyTorch. |
| medium | Environment readiness | scripts/serve_vllm.sh:4 | CUDA_VISIBLE_DEVICES is used for GPU selection. | Use HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES for AMD GPU targeting. |
| high | Environment readiness | scripts/serve_vllm.sh:5 | NVIDIA-specific GPU inspection command found. | Use rocm-smi for AMD GPU monitoring and benchmark metadata collection. |
| low | Serving readiness | scripts/serve_vllm.sh:6 | vLLM serving command found without explicit ROCm container guidance. | Run vLLM inside vllm/vllm-openai-rocm with /dev/kfd, /dev/dri, host IPC, and video group access. |
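The serving and environment findings above amount to a container launch roughly like the following sketch. The image tag follows the suggested fix for Dockerfile:1, the device, IPC, and group flags are the ROCm container requirements named in the serve_vllm.sh finding, and the model id is a placeholder.

```shell
# ROCm containers need the AMD GPU device nodes, video group membership,
# and host IPC (shared memory for vLLM worker processes).
docker run --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  -e HIP_VISIBLE_DEVICES=0 \
  -p 8000:8000 \
  vllm/vllm-openai-rocm:latest \
  --model <model-id>

# Benchmark metadata collection: use rocm-smi in place of nvidia-smi.
rocm-smi --showproductname --showuse
```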
Generated Artifacts
- `rocm_patch.diff` contains deterministic MVP fixes.
- `Dockerfile.rocm` uses the ROCm-enabled vLLM container.
- `amd_developer_cloud_runbook.md` documents the validation path.
- `benchmark_result.json` records the AMD benchmark schema and status.
Qwen Agent Notes
The Qwen endpoint was not configured, so this report reflects deterministic scanner output only.
Remaining Risks
- CUDA C++ kernels, custom Triton kernels, and CUDA-only binary dependencies require manual review.
- Uploaded repositories are not executed inside the Space; live validation belongs on AMD Developer Cloud.
- ROCm performance depends on model, batch shape, vLLM version, ROCm version, and GPU instance configuration.