# AMD Developer Cloud Deployment

## Instance

Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
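Before pulling models, a minimal sanity check along these lines (an illustrative snippet, not project code) confirms the ROCm PyTorch build actually sees the GPU:

```python
# Sanity check: on ROCm builds, PyTorch reports AMD GPUs through torch.cuda.
import torch

assert torch.cuda.is_available(), "no ROCm-visible GPU found"
print("HIP version:", torch.version.hip)          # set only on ROCm builds
print("Device:", torch.cuda.get_device_name(0))   # should name the MI300X
print("bfloat16 support:", torch.cuda.is_bf16_supported())
```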
## Environment

Set:

```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```

Build with the ROCm inference extras:

```bash
docker compose build --build-arg INSTALL_EXTRAS=.[ai,rocm-inference] backend
docker compose up
```

The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
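For reference, that device passthrough corresponds to Compose options roughly like the following. This is a hedged sketch, not the project's actual file; the service name and image are placeholders:

```yaml
# Illustrative sketch of the ROCm device passthrough described above.
services:
  backend:
    image: backend:latest   # placeholder
    devices:
      - /dev/kfd            # ROCm kernel driver interface
      - /dev/dri            # GPU render nodes
    group_add:
      - video               # grants the container access to the GPU devices
    ipc: host               # large-model inference needs generous shared memory
```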
## Inference Notes

- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch (see the first sketch after this list).
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend (second sketch below).
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
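A minimal sketch of the Whisper path, using the stock `transformers` pipeline API rather than the project's service code; `sample.wav` is a placeholder input, and the model ID is read from the environment set above:

```python
# Minimal sketch: Whisper Large V3 via the transformers ASR pipeline on
# ROCm PyTorch. "sample.wav" is a placeholder input file.
import os

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model=os.environ.get("WHISPER_MODEL_ID", "openai/whisper-large-v3"),
    torch_dtype=torch.bfloat16,  # preferred dtype on MI300X
    device="cuda:0",             # addresses the AMD GPU on ROCm builds
)

result = asr("sample.wav", return_timestamps=True)
print(result["text"])
```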
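And a corresponding sketch for the Qwen2.5 text path through vLLM; the prompt is illustrative, not the project's actual highlight-detection prompt:

```python
# Minimal sketch: Qwen2.5 text inference through vLLM on the ROCm backend.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.2, max_tokens=256)

# Placeholder prompt; the real one lives in the backend service code.
prompt = "List the three most engaging moments in this transcript: ..."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```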
## Benchmark

Run the same source twice:

1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs, and run `scripts/benchmark.py` (see the wrapper sketch after the capture list).
2. AMD GPU run: expose the MI300X devices and run the same command.

Capture:

- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
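One way to drive both runs is sketched below. Only `scripts/benchmark.py` is project code; the wrapper itself is illustrative, and it assumes ROCm PyTorch honors `HIP_VISIBLE_DEVICES` and `CUDA_VISIBLE_DEVICES` (both of which it respects, with an empty value hiding all devices):

```python
# Illustrative wrapper for the two benchmark runs.
import os
import subprocess

# CPU baseline: hide the GPUs so ROCm PyTorch falls back to CPU.
cpu_env = dict(os.environ, HIP_VISIBLE_DEVICES="", CUDA_VISIBLE_DEVICES="")
subprocess.run(["python", "scripts/benchmark.py"], env=cpu_env, check=True)

# AMD GPU run: same command with the MI300X devices left visible.
subprocess.run(["python", "scripts/benchmark.py"], check=True)
```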
For the presentation, show the API timing JSON and the finished clips side by side.