# AMD Developer Cloud Deployment
## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
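A quick sanity check from inside the backend environment, assuming ROCm PyTorch is installed:

```bash
# ROCm builds of PyTorch report True here and expose the HIP version
# in torch.version.hip (torch.version.cuda is None on ROCm builds).
python -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"

# Driver-level check that the MI300X is visible.
rocm-smi
```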
## Environment
Set:
```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
Build with the ROCm inference extras:
```bash
docker compose build --build-arg "INSTALL_EXTRAS=.[ai,rocm-inference]" backend
docker compose up
```
The compose file passes the `/dev/kfd` and `/dev/dri` devices through to the container, adds it to the `video` group, and uses host IPC so large-model inference can allocate shared memory.
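For reference, the equivalent `docker run` flags look like this (the image name is a placeholder; the compose file remains the source of truth):

```bash
# Pass through the ROCm devices, grant GPU access via the video group,
# and share the host IPC namespace for large shared-memory allocations.
docker run --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  elevenclip-backend:latest
```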
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch (see the smoke-test sketch after this list).
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend (second sketch below).
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X; bfloat16 is well supported on CDNA3 and avoids fp16 overflow issues.
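A minimal smoke test for the Whisper path, assuming the environment above is loaded (the sample file name is hypothetical):

```bash
python - <<'PY'
import torch
from transformers import pipeline

# ROCm GPUs appear as CUDA devices, so device="cuda:0" targets the MI300X.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.bfloat16,  # preferred dtype on MI300X
    device="cuda:0",
)
print(asr("sample.wav")["text"])  # sample.wav is a placeholder input
PY
```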
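And a corresponding `vLLM` sketch for the text model; the prompt and sampling values are illustrative, not the service's actual prompt:

```bash
python - <<'PY'
from vllm import LLM, SamplingParams

# Load the highlight-detection model in bfloat16 on the ROCm backend.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["Rank the three most engaging moments in this transcript: ..."],
    params,
)
print(outputs[0].outputs[0].text)
PY
```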
## Benchmark
Run the same source video twice:
1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs, and run `scripts/benchmark.py`.
2. AMD GPU run: expose the MI300X devices and run the same command (see the sketch below).
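A sketch of the two invocations; the input argument is an assumption, so check `scripts/benchmark.py` for its real CLI:

```bash
# 1) CPU baseline: hide all GPUs from the process. An empty
#    HIP_VISIBLE_DEVICES should hide ROCm devices; CUDA_VISIBLE_DEVICES=""
#    is also honored by ROCm PyTorch builds.
DEMO_MODE=false HIP_VISIBLE_DEVICES= python scripts/benchmark.py sample.mp4

# 2) AMD GPU run: expose the first MI300X and repeat the same command.
DEMO_MODE=false HIP_VISIBLE_DEVICES=0 python scripts/benchmark.py sample.mp4
```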
Capture these stage timings from the benchmark output:
- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
For the presentation, show the API timing JSON and the finished clips side by side.
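One hypothetical way to put the two captured timing payloads side by side (the file names and the flat-JSON shape are assumptions):

```bash
# Merge both runs into one object and compute the end-to-end speedup.
jq -n --slurpfile cpu cpu_run.json --slurpfile gpu gpu_run.json \
  '{cpu: $cpu[0], gpu: $gpu[0], total_speedup: ($cpu[0].total / $gpu[0].total)}'
```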