# ROCmPort AI Migration Report: cuda_first_repo

## AMD Readiness Score

- Before deterministic fixes: 53/100
- After deterministic fixes: 100/100
| Category | Before | After |
| --- | ---: | ---: |
| Code portability | 0 | 100 |
| Environment readiness | 8 | 100 |
| Serving readiness | 90 | 100 |
| Benchmark readiness | 65 | 100 |
| Deployment readiness | 100 | 100 |
## Findings
| Severity | Category | Location | Finding | Suggested fix |
| --- | --- | --- | --- | --- |
| high | Benchmark readiness | `benchmarks/benchmark.py:6` | NVIDIA-specific GPU inspection command found. | Use rocm-smi for AMD GPU monitoring and benchmark metadata collection. |
| high | Environment readiness | `Dockerfile:1` | Dockerfile uses an NVIDIA CUDA base image. | Use vllm/vllm-openai-rocm:latest for vLLM serving or rocm/pytorch:latest for PyTorch workloads. |
| medium | Environment readiness | `Dockerfile:8` | NVIDIA container environment variable found. | Use HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES for AMD GPU targeting. |
| high | Code portability | `infer.py:6` | torch.device is hardcoded to CUDA. | Use torch.device("cuda" if torch.cuda.is_available() else "cpu"); ROCm PyTorch reports AMD GPUs through torch.cuda. |
| high | Code portability | `infer.py:11` | PyTorch tensor or module is moved with a hardcoded .cuda() call. | Replace .cuda() with .to(_rocmport_device) and define a runtime device abstraction. |
| high | Code portability | `infer.py:12` | Tensor or module transfer hardcodes the CUDA device string. | Replace .to("cuda") with .to(_rocmport_device). |
| low | Code portability | `infer.py:19` | CUDA availability check may confuse ROCm users because PyTorch ROCm still uses the torch.cuda namespace. | Keep the API call but document that it covers AMD GPUs under ROCm PyTorch. |
| high | Environment readiness | `scripts/serve_vllm.sh:5` | NVIDIA-specific GPU inspection command found. | Use rocm-smi for AMD GPU monitoring and benchmark metadata collection. |
| low | Serving readiness | `scripts/serve_vllm.sh:6` | vLLM serving command found without explicit ROCm container guidance. | Run vLLM inside vllm/vllm-openai-rocm with /dev/kfd, /dev/dri, host IPC, and video group access. |
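The code-portability findings above share one pattern: select the device once at runtime instead of hardcoding CUDA. A minimal sketch of that fix, assuming a PyTorch script like `infer.py` (the `_rocmport_device` name follows the suggested fixes; the `to_device` helper is hypothetical):

```python
import torch

# ROCm builds of PyTorch report AMD GPUs through the torch.cuda namespace,
# so this single check covers NVIDIA GPUs, AMD GPUs, and CPU fallback.
_rocmport_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(obj):
    """Move a tensor or nn.Module to the runtime-selected device.

    Replaces hardcoded .cuda() and .to("cuda") calls.
    """
    return obj.to(_rocmport_device)

x = to_device(torch.ones(2, 2))
print(x.device.type)
```

On a ROCm machine the same script prints `cuda`, which is why the low-severity `infer.py:19` finding recommends keeping the `torch.cuda.is_available()` call and documenting its meaning rather than replacing it.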
## Generated Artifacts

- `rocm_patch.diff` contains deterministic MVP fixes.
- `Dockerfile.rocm` uses the ROCm-enabled vLLM container.
- `amd_developer_cloud_runbook.md` documents the validation path.
- `benchmark_result.json` records the AMD benchmark schema and status.
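A launch command consistent with the serving guidance (ROCm vLLM container with `/dev/kfd`, `/dev/dri`, host IPC, and video group access) might look like the sketch below; the model identifier and port are illustrative placeholders, not values taken from the repository:

```shell
# Sketch only: flags mirror the suggested fix for scripts/serve_vllm.sh.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  -p 8000:8000 \
  vllm/vllm-openai-rocm:latest \
  --model <model-id> --port 8000
```

The `/dev/kfd` and `/dev/dri` device mounts expose the AMD GPU to the container, `--group-add video` grants the container user access to those devices, and `--ipc=host` provides the shared memory vLLM needs for multi-process serving.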
## Qwen Agent Notes

The Qwen endpoint was not configured, so this report contains deterministic scanner output only.
## Remaining Risks

- CUDA C++ kernels, custom Triton kernels, and CUDA-only binary dependencies require manual review.
- Uploaded repositories are not executed inside the Space; live validation belongs on AMD Developer Cloud.
- ROCm performance depends on model, batch shape, vLLM version, ROCm version, and GPU instance configuration.