# AMD Developer Cloud Deployment
## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
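A quick sanity check for this is to inspect `torch.version.hip` and the devices `torch.cuda` reports. This is a minimal sketch (the `describe_accelerator` helper name is an assumption, not part of the backend), guarded so it also runs where PyTorch is absent:

```python
def describe_accelerator():
    """Report whether PyTorch is a ROCm (HIP) build and which devices it sees."""
    try:
        import torch
    except ImportError:
        # PyTorch not installed in this environment.
        return {"available": False, "hip": None, "devices": []}
    hip = getattr(torch.version, "hip", None)  # non-None only on ROCm builds
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if torch.cuda.is_available()
        else []
    )
    return {"available": torch.cuda.is_available(), "hip": hip, "devices": devices}

info = describe_accelerator()
```

On a correctly provisioned MI300X instance, `info["hip"]` should be a `6.x` version string and `info["devices"]` should list the Instinct GPUs.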
## Environment
Set these environment variables:
```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
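On the backend side, reading these variables might look like the following sketch (the `load_settings` helper and its defaults are assumptions for illustration, not the backend's actual config code):

```python
import os

def load_settings(env=os.environ):
    """Read the deployment variables above into a plain settings dict."""
    return {
        "demo_mode": env.get("DEMO_MODE", "true").lower() == "true",
        "hf_token": env.get("HF_TOKEN"),
        "whisper_model_id": env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3"),
        "qwen_text_model_id": env.get("QWEN_TEXT_MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        "qwen_vl_model_id": env.get("QWEN_VL_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct"),
        "ffmpeg_video_codec": env.get("FFMPEG_VIDEO_CODEC", "h264_amf"),
    }

settings = load_settings({"DEMO_MODE": "false", "HF_TOKEN": "hf_example"})
```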
Build with the ROCm inference extras:
```bash
docker compose build --build-arg INSTALL_EXTRAS=".[ai,rocm-inference]" backend
docker compose up
```
The Docker Compose file mounts `/dev/kfd` and `/dev/dri` into the container, adds the container user to the `video` group, and uses host IPC so large-model inference can use shared memory across processes.
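The ROCm-specific parts of the Compose service look roughly like this (an illustrative fragment only; check the repo's actual `docker-compose.yml`):

```yaml
services:
  backend:
    devices:
      - /dev/kfd    # AMD kernel fusion driver (compute)
      - /dev/dri    # direct rendering interface (GPU device nodes)
    group_add:
      - video       # device nodes are group-owned by `video`
    ipc: host       # shared memory for large-model inference
```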
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
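The dtype rule in the last bullet can be expressed as a small helper. This is a sketch; the function name and the float16 fallback for non-MI300X devices are assumptions:

```python
def preferred_torch_dtype(device_name: str) -> str:
    """MI300X (CDNA3) has strong bfloat16 throughput, so prefer it there;
    fall back to float16 elsewhere (the fallback choice is an assumption)."""
    return "bfloat16" if "MI300" in device_name.upper() else "float16"
```

In the container this would typically be driven by `torch.cuda.get_device_name(0)`.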
## Benchmark
Run the same source twice:
1. CPU baseline: set `DEMO_MODE=false`, hide the GPUs to force CPU execution (e.g. run with `HIP_VISIBLE_DEVICES=` set to an empty value), and run `scripts/benchmark.py`.
2. AMD GPU run: expose MI300X devices and run the same command.
Capture these timing fields:
- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
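A minimal timing harness matching the field names above could look like this (a sketch; the actual `scripts/benchmark.py` likely differs):

```python
import json
import time
from contextlib import contextmanager

class StageTimer:
    """Collect per-stage wall-clock durations and emit the timing JSON."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = round(time.perf_counter() - start, 3)

    def report(self, input_name):
        out = {"input": input_name, **self.timings}
        out["total"] = round(sum(self.timings.values()), 3)
        return out

timer = StageTimer()
for name in ("transcription", "highlight_detection", "multimodal_analysis", "clip_generation"):
    with timer.stage(name):
        time.sleep(0.01)  # stand-in for the real pipeline step

report = timer.report("demo.mp4")
print(json.dumps(report, indent=2))
```

Running this once on the CPU baseline and once on the MI300X gives two directly comparable JSON reports.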
For the presentation, show the API timing JSON and the finished clips side by side.