# AMD Developer Cloud Deployment

## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects a ROCm-enabled PyTorch build; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
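A quick sanity check confirms the GPU is visible before starting the stack. This is a sketch that assumes only a ROCm PyTorch wheel may be installed; `rocm_status` is an illustrative helper, not part of the backend:

```python
def rocm_status() -> str:
    """Report whether PyTorch is a ROCm build and how many GPUs it sees."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # On ROCm builds torch.version.hip is set; on CUDA builds it is None.
    if torch.version.hip is not None:
        return f"ROCm {torch.version.hip}, {torch.cuda.device_count()} GPU(s) visible"
    return "CUDA or CPU-only PyTorch build"

print(rocm_status())
```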
## Environment
Set:

```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
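The variables above map naturally onto a settings object. The helper below is hypothetical (the real backend may use something like pydantic settings instead); the variable names and defaults are taken from this document:

```python
import os

# Hypothetical settings loader; variable names mirror the deployment doc,
# the helper itself is illustrative and not part of the repo.
def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "demo_mode": env.get("DEMO_MODE", "true").lower() == "true",
        "hf_token": env.get("HF_TOKEN"),
        "whisper_model_id": env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3"),
        "qwen_text_model_id": env.get("QWEN_TEXT_MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        "qwen_vl_model_id": env.get("QWEN_VL_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct"),
        "ffmpeg_video_codec": env.get("FFMPEG_VIDEO_CODEC", "h264_amf"),
    }
```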
Build with the ROCm inference extras (quoting the extras spec keeps the shell from globbing the brackets):

```bash
docker compose build --build-arg INSTALL_EXTRAS=".[ai,rocm-inference]" backend
docker compose up
```
The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
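In Compose terms, that description corresponds to a stanza along these lines. This is a minimal sketch assuming the service is named `backend`; check it against the repo's actual compose file:

```yaml
services:
  backend:
    devices:
      - /dev/kfd        # ROCm compute interface
      - /dev/dri        # GPU render nodes
    group_add:
      - video           # grants the container access to the GPU device nodes
    ipc: host           # default shm is too small for large-model inference
```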
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
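Frame sampling for the Qwen2-VL boundary can start as simple uniform index selection over the clip. The helper below is a hypothetical sketch (not in the repo) that picks evenly spaced frame indices:

```python
def sample_frame_indices(total_frames: int, num_samples: int = 8) -> list[int]:
    """Evenly spaced frame indices covering the whole clip, endpoints included."""
    if total_frames <= 0:
        return []
    # If the clip is shorter than the sample budget, take every frame.
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]
```

Decoding only the selected frames (rather than the whole video) keeps the multimodal pass cheap even for long sources.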
## Benchmark
Run the same source twice:

- CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs (e.g. via `HIP_VISIBLE_DEVICES=`), and run `scripts/benchmark.py`.
- AMD GPU run: expose the MI300X devices and run the same command.
Capture the per-stage timings: `input`, `transcription`, `highlight_detection`, `multimodal_analysis`, `clip_generation`, and `total`.
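For the side-by-side comparison, a small summarizer helps. The exact shape of the API timing JSON is assumed here (a flat stage-name-to-seconds mapping); adjust the keys to match the real response:

```python
STAGES = ["input", "transcription", "highlight_detection",
          "multimodal_analysis", "clip_generation"]

def summarize(timings: dict) -> str:
    """Format assumed stage->seconds timing JSON as an aligned table."""
    lines = [f"{stage:22s} {timings.get(stage, 0.0):8.2f}s" for stage in STAGES]
    # Fall back to summing stages if the API omits an explicit total.
    total = timings.get("total", sum(timings.get(s, 0.0) for s in STAGES))
    lines.append(f"{'total':22s} {total:8.2f}s")
    return "\n".join(lines)
```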
For the presentation, show the API timing JSON and the finished clips side by side.