JakgritB committed
Commit df9eb37 · 1 Parent(s): 09e2e7a

chore(infra): add deployment and benchmark tooling


Add a Docker Compose file with AMD GPU device wiring, AMD Developer Cloud deployment notes, and an API benchmark script for CPU versus MI300X timing comparisons.

Files changed (3)
  1. docker-compose.yml +40 -0
  2. infra/amd-cloud.md +52 -0
  3. scripts/benchmark.py +66 -0
docker-compose.yml ADDED
@@ -0,0 +1,40 @@
+ services:
+   redis:
+     image: redis:7-alpine
+     ports:
+       - "6379:6379"
+
+   backend:
+     build:
+       context: ./backend
+       args:
+         ROCM_PYTORCH_IMAGE: ${ROCM_PYTORCH_IMAGE:-rocm/pytorch:latest}
+     env_file:
+       - .env.example
+     environment:
+       STORAGE_DIR: /app/data
+       REDIS_URL: redis://redis:6379/0
+       FRONTEND_ORIGIN: http://localhost:5173
+     volumes:
+       - ./backend/data:/app/data
+     ports:
+       - "8000:8000"
+     depends_on:
+       - redis
+     devices:
+       - /dev/kfd
+       - /dev/dri
+     group_add:
+       - video
+     ipc: host
+     shm_size: 16gb
+
+   frontend:
+     build:
+       context: ./frontend
+     environment:
+       VITE_API_BASE_URL: http://localhost:8000
+     ports:
+       - "5173:5173"
+     depends_on:
+       - backend
infra/amd-cloud.md ADDED
@@ -0,0 +1,52 @@
+ # AMD Developer Cloud Deployment
+
+ ## Instance
+
+ Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
+
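The `torch.cuda`-on-ROCm behavior described above can be sanity-checked with a short snippet. This is an illustrative sketch, not part of the repo: `describe_accelerator` is a hypothetical helper, and the fake module in the fallback branch only mimics what a ROCm PyTorch build reports when PyTorch is not installed locally.

```python
from types import SimpleNamespace


def describe_accelerator(torch_mod) -> str:
    """One-line summary of the accelerator a torch-like module sees.

    On ROCm builds, torch.cuda.is_available() is True for AMD GPUs and
    torch.version.hip is set; on CUDA builds torch.version.hip is None.
    """
    if not torch_mod.cuda.is_available():
        return "cpu"
    hip = getattr(torch_mod.version, "hip", None)
    if hip:
        return f"amd-gpu (HIP {hip}): {torch_mod.cuda.get_device_name(0)}"
    return f"nvidia-gpu (CUDA {torch_mod.version.cuda}): {torch_mod.cuda.get_device_name(0)}"


if __name__ == "__main__":
    try:
        import torch  # real check when PyTorch is installed
        print(describe_accelerator(torch))
    except ImportError:
        # Fake module mirroring a ROCm PyTorch build, for illustration only.
        fake = SimpleNamespace(
            cuda=SimpleNamespace(
                is_available=lambda: True,
                get_device_name=lambda i: "AMD Instinct MI300X",
            ),
            version=SimpleNamespace(hip="6.2.41133", cuda=None),
        )
        print(describe_accelerator(fake))
```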
+ ## Environment
+
+ Set:
+
+ ```bash
+ DEMO_MODE=false
+ HF_TOKEN=...
+ WHISPER_MODEL_ID=openai/whisper-large-v3
+ QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
+ QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
+ FFMPEG_VIDEO_CODEC=h264_amf
+ ```
+
+ Build with the ROCm inference extras:
+
+ ```bash
+ docker compose build --build-arg INSTALL_EXTRAS=.[ai,rocm-inference] backend
+ docker compose up
+ ```
+
+ The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
+
+ ## Inference Notes
+
+ - Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
+ - Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
+ - Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
+ - Keep `preferred_torch_dtype=bfloat16` on MI300X.
+
+ ## Benchmark
+
+ Run the same source twice:
+
+ 1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding GPUs, and run `scripts/benchmark.py`.
+ 2. AMD GPU run: expose the MI300X devices and run the same command.
+
+ Capture:
+
+ - `input`
+ - `transcription`
+ - `highlight_detection`
+ - `multimodal_analysis`
+ - `clip_generation`
+ - `total`
+
+ For the presentation, show the API timing JSON and the finished clips side by side.
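The two captured timing JSONs can be compared with a small stdlib-only sketch. `speedups` is a hypothetical helper (not in the repo); it assumes the `timings` dict uses the stage keys listed above with per-stage durations in seconds, and the sample numbers below are illustrative, not measured results.

```python
import json

# Stage keys expected in the benchmark output's "timings" dict.
STAGES = ["input", "transcription", "highlight_detection",
          "multimodal_analysis", "clip_generation", "total"]


def speedups(cpu_timings: dict, gpu_timings: dict) -> dict:
    """Per-stage CPU-time / GPU-time ratios (higher = bigger GPU win)."""
    return {
        stage: round(cpu_timings[stage] / gpu_timings[stage], 2)
        for stage in STAGES
        if gpu_timings.get(stage) and stage in cpu_timings
    }


# Illustrative numbers only, not measured results.
cpu = {"input": 12.0, "transcription": 840.0, "highlight_detection": 95.0,
       "multimodal_analysis": 210.0, "clip_generation": 60.0, "total": 1217.0}
mi300x = {"input": 12.0, "transcription": 35.0, "highlight_detection": 6.0,
          "multimodal_analysis": 14.0, "clip_generation": 30.0, "total": 97.0}
print(json.dumps(speedups(cpu, mi300x), indent=2))
```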
scripts/benchmark.py ADDED
@@ -0,0 +1,66 @@
+ import argparse
+ import json
+ import time
+ import urllib.error
+ import urllib.request
+
+
+ def request_json(url: str, method: str = "GET", payload: dict | None = None) -> dict:
+     body = None
+     headers = {}
+     if payload is not None:
+         body = json.dumps(payload).encode("utf-8")
+         headers["Content-Type"] = "application/json"
+     request = urllib.request.Request(url, data=body, headers=headers, method=method)
+     try:
+         with urllib.request.urlopen(request, timeout=30) as response:
+             return json.loads(response.read().decode("utf-8"))
+     except urllib.error.HTTPError as exc:
+         detail = exc.read().decode("utf-8")
+         raise RuntimeError(f"{exc.code}: {detail}") from exc
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(description="Run an AI Clip Studio API benchmark.")
+     parser.add_argument("--api", default="http://localhost:8000")
+     parser.add_argument("--youtube-url", required=True)
+     parser.add_argument("--language", default="Thai")
+     parser.add_argument("--style", default="informative")
+     parser.add_argument("--niche", default="education")
+     parser.add_argument("--clip-length", type=int, default=60)
+     args = parser.parse_args()
+
+     payload = {
+         "youtube_url": args.youtube_url,
+         "profile": {
+             "niche": args.niche,
+             "clip_style": args.style,
+             "clip_length_seconds": args.clip_length,
+             "primary_language": args.language,
+             "target_platform": "tiktok",
+         },
+     }
+     started = time.perf_counter()
+     job = request_json(f"{args.api}/api/jobs/youtube", "POST", payload)
+     while job["status"] in {"queued", "running"}:
+         time.sleep(2)
+         job = request_json(f"{args.api}/api/jobs/{job['id']}")
+
+     elapsed = round(time.perf_counter() - started, 3)
+     print(
+         json.dumps(
+             {
+                 "job_id": job["id"],
+                 "status": job["status"],
+                 "elapsed_wall_seconds": elapsed,
+                 "clips": len(job.get("clips", [])),
+                 "timings": job.get("timings", {}),
+                 "error": job.get("error"),
+             },
+             indent=2,
+         )
+     )
+
+
+ if __name__ == "__main__":
+     main()
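The submit-then-poll flow in `scripts/benchmark.py` can be exercised end to end against a throwaway local stub, with no real backend. The stub below is purely illustrative and assumes only the job JSON shape the script already reads (`id`, `status`, `clips`, `timings`); unlike the real script, the sketch skips the 2-second sleep between polls.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    polls = 0

    def _reply(self, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):  # POST /api/jobs/youtube -> a freshly queued job
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._reply({"id": "job-1", "status": "queued"})

    def do_GET(self):  # GET /api/jobs/<id> -> running once, then succeeded
        StubHandler.polls += 1
        status = "running" if StubHandler.polls < 2 else "succeeded"
        self._reply({"id": "job-1", "status": status, "clips": [], "timings": {}})

    def log_message(self, *_):  # keep the example's output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
api = f"http://127.0.0.1:{server.server_address[1]}"

# Same submit-then-poll shape as main() in scripts/benchmark.py.
request = urllib.request.Request(
    f"{api}/api/jobs/youtube", data=b"{}",
    headers={"Content-Type": "application/json"}, method="POST",
)
with urllib.request.urlopen(request, timeout=5) as response:
    job = json.loads(response.read().decode("utf-8"))
while job["status"] in {"queued", "running"}:
    with urllib.request.urlopen(f"{api}/api/jobs/{job['id']}", timeout=5) as response:
        job = json.loads(response.read().decode("utf-8"))
server.shutdown()
print(job["status"])  # -> succeeded
```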