# AMD Developer Cloud Deployment

## Instance

Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
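A quick sanity check of that mapping, written to degrade gracefully when `torch` or a GPU is absent (the helper itself is illustrative, not part of the backend):

```python
import importlib.util

def rocm_status():
    """Report whether a ROCm build of PyTorch can see an AMD GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    hip = getattr(torch.version, "hip", None)
    if hip is None:
        return "torch installed, but not a ROCm build"
    if not torch.cuda.is_available():  # AMD GPUs surface through torch.cuda on ROCm
        return f"ROCm torch (HIP {hip}), but no GPU visible"
    return f"ROCm torch (HIP {hip}): {torch.cuda.get_device_name(0)}"

if __name__ == "__main__":
    print(rocm_status())
```

On a correctly provisioned MI300X instance this should print the HIP version and the device name.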

## Environment

Set:

```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
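A minimal sketch of how the backend might consume these settings; the variable names mirror the block above, but the defaults and the helper are illustrative assumptions, not the backend's actual code:

```python
import os

def load_settings(env=os.environ):
    """Read deployment settings from environment variables (illustrative)."""
    return {
        "demo_mode": env.get("DEMO_MODE", "true").lower() == "true",
        "hf_token": env.get("HF_TOKEN"),
        "whisper_model_id": env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3"),
        "qwen_text_model_id": env.get("QWEN_TEXT_MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        "qwen_vl_model_id": env.get("QWEN_VL_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct"),
        "ffmpeg_video_codec": env.get("FFMPEG_VIDEO_CODEC", "h264_amf"),
    }
```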

Build with the ROCm inference extras:

```bash
docker compose build --build-arg INSTALL_EXTRAS=".[ai,rocm-inference]" backend
docker compose up
```

The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the container to the `video` group, and uses host IPC for large-model inference.
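For orientation, the GPU-related Compose settings described above take roughly this shape (the service name and surrounding structure are assumptions; check the repo's actual compose file):

```yaml
services:
  backend:
    devices:
      - /dev/kfd   # ROCm kernel fusion driver
      - /dev/dri   # direct rendering interface (GPU nodes)
    group_add:
      - video      # grants the container GPU device access
    ipc: host      # shared memory for large-model inference
```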

## Inference Notes

- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
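The dtype note above can be captured in a small picker: prefer `bfloat16` when an AMD GPU is visible, fall back to `float32` on CPU. `preferred_torch_dtype` is the setting named in the notes; the helper itself is an illustrative sketch, not the backend's code:

```python
import importlib.util

def pick_device_and_dtype(preferred_torch_dtype="bfloat16"):
    """Choose (device, dtype) for inference: bfloat16 on GPU, float32 on CPU."""
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():  # AMD GPUs appear here under ROCm
            return "cuda", preferred_torch_dtype
    return "cpu", "float32"
```

MI300X supports bfloat16 natively, so there is no reason to pay the memory cost of float32 on the GPU path.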

## Benchmark

Run the same source twice:

1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding GPUs, and run `scripts/benchmark.py`.
2. AMD GPU run: expose MI300X devices and run the same command.
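One way to "hide GPUs" for step 1 is to blank the visibility variables before launching the benchmark; `HIP_VISIBLE_DEVICES` is the ROCm knob and `CUDA_VISIBLE_DEVICES` the CUDA-named equivalent. The helper is a sketch, and the benchmark invocation shown in the comment assumes `scripts/benchmark.py` takes the source path as its argument:

```python
import os

def cpu_baseline_env():
    """Environment for the CPU baseline: demo mode off, all GPUs hidden."""
    env = dict(os.environ)
    env["DEMO_MODE"] = "false"
    env["HIP_VISIBLE_DEVICES"] = ""   # ROCm: no AMD GPUs visible
    env["CUDA_VISIBLE_DEVICES"] = ""  # same effect via the CUDA-compatible API
    return env

# Launch (assumed CLI shape):
# subprocess.run(["python", "scripts/benchmark.py", source], env=cpu_baseline_env())
```

For the GPU run, use the unmodified environment so the MI300X devices stay visible.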

Capture:

- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
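Given per-stage timings (in seconds) from the two runs, the comparison reduces to a per-stage speedup table. The timing dicts here are illustrative sample data, not measured numbers:

```python
STAGES = ["input", "transcription", "highlight_detection",
          "multimodal_analysis", "clip_generation", "total"]

def speedups(cpu_timings, gpu_timings):
    """Per-stage CPU/GPU speedup factors; skips stages missing a GPU time."""
    return {s: cpu_timings[s] / gpu_timings[s]
            for s in STAGES if gpu_timings.get(s)}
```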

For the presentation, show the API timing JSON and the finished clips side by side.