# AMD Developer Cloud Deployment
## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
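A quick sanity check that the instance's PyTorch build is ROCm-enabled and sees the GPU:
```bash
# On a ROCm build, torch.version.hip is a version string (None on CUDA builds),
# and the MI300X shows up through the torch.cuda API.
python -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```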
## Environment
Set the following environment variables:
```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
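# h264_amf needs an FFmpeg build with AMD AMF support; if yours lacks it,
# fall back to a CPU encoder such as libx264 for clip generation.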
FFMPEG_VIDEO_CODEC=h264_amf
```
Build with the ROCm inference extras:
```bash
docker compose build --build-arg "INSTALL_EXTRAS=.[ai,rocm-inference]" backend
docker compose up
```
The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large model inference.
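For reference, a plain `docker run` sketch of those same settings; the image name is a placeholder:
```bash
# /dev/kfd is the ROCm compute interface and /dev/dri holds the render nodes;
# the video group grants access to them, and host IPC provides the shared
# memory that large-model loading expects.
docker run --rm -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  <backend-image>
```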
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
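Two quick checks tied to the notes above, assuming the ROCm vLLM build installed by the `rocm-inference` extras and a recent vLLM with the `vllm serve` entrypoint:
```bash
# Confirm bfloat16 is usable on the device (expected: True on MI300X).
python -c "import torch; print(torch.cuda.is_bf16_supported())"

# Standalone vLLM smoke test with the same model and dtype the service uses.
vllm serve Qwen/Qwen2.5-7B-Instruct --dtype bfloat16
```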
## Benchmark
Run the same source twice:
1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs from the ROCm runtime, and run `scripts/benchmark.py`.
2. AMD GPU run: expose the MI300X devices and run the same command, as sketched below.
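The two runs differ only in device visibility. On ROCm builds of PyTorch, clearing `HIP_VISIBLE_DEVICES` hides the GPUs; any extra flags `scripts/benchmark.py` takes are omitted here:
```bash
# 1) CPU baseline: an empty HIP_VISIBLE_DEVICES hides all GPUs from the runtime.
HIP_VISIBLE_DEVICES= DEMO_MODE=false python scripts/benchmark.py

# 2) MI300X run: expose device 0 (or a comma-separated list) again.
HIP_VISIBLE_DEVICES=0 DEMO_MODE=false python scripts/benchmark.py
```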
Capture:
- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
For the presentation, show the API timing JSON and the finished clips side by side.
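The exact shape of the timing JSON depends on the API, but a hypothetical payload with the fields above looks like this; the numbers are placeholders, not measurements:
```bash
# Hypothetical timing payload (seconds); field names match the capture list.
cat <<'JSON'
{
  "input": 0.4,
  "transcription": 41.7,
  "highlight_detection": 9.3,
  "multimodal_analysis": 12.8,
  "clip_generation": 6.1,
  "total": 70.3
}
JSON
```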