
AMD Developer Cloud Deployment

Instance

Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible torch.cuda API and reports the HIP version in torch.version.hip.
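
A quick way to confirm the ROCm build is active (a minimal sketch; a single-GPU instance and device index 0 are assumed):

import torch

# On ROCm builds of PyTorch, torch.cuda is the AMD GPU API.
print(torch.version.hip)              # HIP version string, e.g. 6.x
print(torch.cuda.is_available())      # True when the MI300X is visible
print(torch.cuda.get_device_name(0))  # should report the Instinct MI300X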

Environment

Set:

DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf

Build with the ROCm inference extras:

docker compose build --build-arg "INSTALL_EXTRAS=.[ai,rocm-inference]" backend
docker compose up

The Docker Compose file passes through the /dev/kfd and /dev/dri device nodes, adds the container to the video group, and uses host IPC, which large-model inference relies on for shared memory.
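
To verify the container actually received the devices and group membership, run a quick in-container check like this sketch (execute it with python inside the backend container):

import grp, os

# The ROCm device nodes must be visible and the process should belong
# to the video group; unknown group IDs are printed numerically.
print(os.path.exists("/dev/kfd"), os.path.exists("/dev/dri"))
names = []
for gid in os.getgroups():
    try:
        names.append(grp.getgrgid(gid).gr_name)
    except KeyError:
        names.append(str(gid))
print("video" in names, names)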

Inference Notes

  • Whisper Large V3 runs through Hugging Face transformers with ROCm PyTorch (see the first sketch after this list).
  • Qwen2.5 highlight detection is wired for vLLM with the ROCm backend (see the second sketch after this list).
  • Qwen2-VL has a service boundary in backend/app/services/multimodal.py; add frame sampling there when demo time allows.
  • Keep preferred_torch_dtype=bfloat16 on MI300X.
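
How the Whisper stage is typically wired with transformers (a minimal sketch; the model ID matches WHISPER_MODEL_ID above, everything else is an assumption about the service code):

import torch
from transformers import pipeline

# "cuda:0" maps to the MI300X under ROCm PyTorch; bfloat16 matches
# the preferred_torch_dtype note above.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.bfloat16,
    device="cuda:0",
)
result = asr("input.wav", return_timestamps=True)
print(result["text"])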
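
And the highlight-detection side, as a hedged sketch of a vLLM setup (the prompt and sampling values are illustrative, not the service's actual ones):

from vllm import LLM, SamplingParams

# vLLM's ROCm backend picks up the GPU automatically; dtype matches
# the bfloat16 recommendation above.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(
    ["Given this transcript, list the most engaging moments: ..."],
    params,
)
print(outputs[0].outputs[0].text)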

Benchmark

Run the pipeline twice on the same source video:

  1. CPU baseline: set DEMO_MODE=false, force CPU by hiding the GPUs (see the sketch after this list), and run scripts/benchmark.py.
  2. AMD GPU run: expose MI300X devices and run the same command.
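
Hiding the GPUs for the baseline can be done from the benchmark process itself, as in this sketch; the environment variables must be set before torch is imported:

import os

# ROCm PyTorch honors both of these; empty values hide all AMD GPUs.
os.environ["HIP_VISIBLE_DEVICES"] = ""
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
assert not torch.cuda.is_available()  # pipeline now falls back to CPU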

Capture per-stage timings for:

  • input
  • transcription
  • highlight_detection
  • multimodal_analysis
  • clip_generation
  • total
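
One way scripts/benchmark.py could record these stages (a hypothetical helper; the real script may structure this differently):

import json, time

timings = {}

def timed(stage, fn, *args, **kwargs):
    # Record wall-clock seconds for one pipeline stage under its name.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[stage] = round(time.perf_counter() - start, 2)
    return result

# transcript = timed("transcription", transcribe, "input.mp4")
# ... repeat for the remaining stages ...
timings["total"] = round(sum(timings.values()), 2)
print(json.dumps(timings, indent=2))

Keeping stage names identical across the CPU and GPU runs makes the two JSON payloads directly comparable.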

For the presentation, show the API timing JSON and the finished clips side by side.