# AMD Developer Cloud Deployment
## Instance
Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
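A quick sanity check from inside the backend environment, assuming ROCm PyTorch is installed:

```bash
# ROCm builds of PyTorch report True here and expose the HIP version
# in torch.version.hip (torch.version.cuda is None on ROCm builds).
python -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"

# Driver-level check that the MI300X is visible.
rocm-smi
```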
## Environment
Set:
```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
Build with the ROCm inference extras:
```bash
docker compose build --build-arg "INSTALL_EXTRAS=.[ai,rocm-inference]" backend
docker compose up
```
The compose file passes the `/dev/kfd` and `/dev/dri` devices through to the container, adds it to the `video` group, and uses host IPC so large-model inference can allocate shared memory.
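For reference, the equivalent `docker run` flags look like this (the image name is a placeholder; the compose file remains the source of truth):

```bash
# Pass through the ROCm devices, grant GPU access via the video group,
# and share the host IPC namespace for large shared-memory allocations.
docker run --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  elevenclip-backend:latest
```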
## Inference Notes
- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch (see the smoke-test sketch after this list).
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend (second sketch below).
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X; bfloat16 is well supported on CDNA3 and avoids fp16 overflow issues.
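A minimal smoke test for the Whisper path, assuming the environment above is loaded (the sample file name is hypothetical):

```bash
python - <<'PY'
import torch
from transformers import pipeline

# ROCm GPUs appear as CUDA devices, so device="cuda:0" targets the MI300X.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.bfloat16,  # preferred dtype on MI300X
    device="cuda:0",
)
print(asr("sample.wav")["text"])  # sample.wav is a placeholder input
PY
```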
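And a corresponding `vLLM` sketch for the text model; the prompt and sampling values are illustrative, not the service's actual prompt:

```bash
python - <<'PY'
from vllm import LLM, SamplingParams

# Load the highlight-detection model in bfloat16 on the ROCm backend.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["Rank the three most engaging moments in this transcript: ..."],
    params,
)
print(outputs[0].outputs[0].text)
PY
```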
## Benchmark
Run the same source video twice:
1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding the GPUs, and run `scripts/benchmark.py`.
2. AMD GPU run: expose the MI300X devices and run the same command (see the sketch below).
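A sketch of the two invocations; the input argument is an assumption, so check `scripts/benchmark.py` for its real CLI:

```bash
# 1) CPU baseline: hide all GPUs from the process. An empty
#    HIP_VISIBLE_DEVICES should hide ROCm devices; CUDA_VISIBLE_DEVICES=""
#    is also honored by ROCm PyTorch builds.
DEMO_MODE=false HIP_VISIBLE_DEVICES= python scripts/benchmark.py sample.mp4

# 2) AMD GPU run: expose the first MI300X and repeat the same command.
DEMO_MODE=false HIP_VISIBLE_DEVICES=0 python scripts/benchmark.py sample.mp4
```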
Capture these stage timings from the benchmark output:
- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`
For the presentation, show the API timing JSON and the finished clips side by side.
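One hypothetical way to put the two captured timing payloads side by side (the file names and the flat-JSON shape are assumptions):

```bash
# Merge both runs into one object and compute the end-to-end speedup.
jq -n --slurpfile cpu cpu_run.json --slurpfile gpu gpu_run.json \
  '{cpu: $cpu[0], gpu: $gpu[0], total_speedup: ($cpu[0].total / $gpu[0].total)}'
```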