# AMD Developer Cloud Deployment

## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects a ROCm-enabled PyTorch build; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
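A quick sanity check confirms the GPU is visible before starting the stack. This is a sketch that assumes only a ROCm PyTorch wheel may be installed; `rocm_status` is an illustrative helper, not part of the backend:

```python
def rocm_status() -> str:
    """Report whether PyTorch is a ROCm build and how many GPUs it sees."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # On ROCm builds torch.version.hip is set; on CUDA builds it is None.
    if torch.version.hip is not None:
        return f"ROCm {torch.version.hip}, {torch.cuda.device_count()} GPU(s) visible"
    return "CUDA or CPU-only PyTorch build"

print(rocm_status())
```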
## Environment
Set:

```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
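The variables above map naturally onto a settings object. The helper below is hypothetical (the real backend may use something like pydantic settings instead); the variable names and defaults are taken from this document:

```python
import os

# Hypothetical settings loader; variable names mirror the deployment doc,
# the helper itself is illustrative and not part of the repo.
def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "demo_mode": env.get("DEMO_MODE", "true").lower() == "true",
        "hf_token": env.get("HF_TOKEN"),
        "whisper_model_id": env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3"),
        "qwen_text_model_id": env.get("QWEN_TEXT_MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        "qwen_vl_model_id": env.get("QWEN_VL_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct"),
        "ffmpeg_video_codec": env.get("FFMPEG_VIDEO_CODEC", "h264_amf"),
    }
```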
Build with the ROCm inference extras (quoting the extras spec keeps the shell from globbing the brackets):

```bash
docker compose build --build-arg INSTALL_EXTRAS=".[ai,rocm-inference]" backend
docker compose up
```
The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
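In Compose terms, that description corresponds to a stanza along these lines. This is a minimal sketch assuming the service is named `backend`; check it against the repo's actual compose file:

```yaml
services:
  backend:
    devices:
      - /dev/kfd        # ROCm compute interface
      - /dev/dri        # GPU render nodes
    group_add:
      - video           # grants the container access to the GPU device nodes
    ipc: host           # default shm is too small for large-model inference
```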
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X.
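Frame sampling for the Qwen2-VL boundary can start as simple uniform index selection over the clip. The helper below is a hypothetical sketch (not in the repo) that picks evenly spaced frame indices:

```python
def sample_frame_indices(total_frames: int, num_samples: int = 8) -> list[int]:
    """Evenly spaced frame indices covering the whole clip, endpoints included."""
    if total_frames <= 0:
        return []
    # If the clip is shorter than the sample budget, take every frame.
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]
```

Decoding only the selected frames (rather than the whole video) keeps the multimodal pass cheap even for long sources.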
## Benchmark
Run the same source twice:

- CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs (e.g. via `HIP_VISIBLE_DEVICES=`), and run `scripts/benchmark.py`.
- AMD GPU run: expose the MI300X devices and run the same command.
Capture the per-stage timings: `input`, `transcription`, `highlight_detection`, `multimodal_analysis`, `clip_generation`, and `total`.
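For the side-by-side comparison, a small summarizer helps. The exact shape of the API timing JSON is assumed here (a flat stage-name-to-seconds mapping); adjust the keys to match the real response:

```python
STAGES = ["input", "transcription", "highlight_detection",
          "multimodal_analysis", "clip_generation"]

def summarize(timings: dict) -> str:
    """Format assumed stage->seconds timing JSON as an aligned table."""
    lines = [f"{stage:22s} {timings.get(stage, 0.0):8.2f}s" for stage in STAGES]
    # Fall back to summing stages if the API omits an explicit total.
    total = timings.get("total", sum(timings.get(s, 0.0) for s in STAGES))
    lines.append(f"{'total':22s} {total:8.2f}s")
    return "\n".join(lines)
```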
For the presentation, show the API timing JSON and the finished clips side by side.