JakgritB committed
Commit df9eb37 · 1 Parent(s): 09e2e7a

chore(infra): add deployment and benchmark tooling


Add a Docker Compose file with AMD GPU device wiring, AMD Developer Cloud deployment notes, and an API benchmark script for CPU versus MI300X timing comparisons.

Files changed (3)
  1. docker-compose.yml +40 -0
  2. infra/amd-cloud.md +52 -0
  3. scripts/benchmark.py +66 -0
docker-compose.yml ADDED
@@ -0,0 +1,40 @@
+ services:
+   redis:
+     image: redis:7-alpine
+     ports:
+       - "6379:6379"
+
+   backend:
+     build:
+       context: ./backend
+       args:
+         ROCM_PYTORCH_IMAGE: ${ROCM_PYTORCH_IMAGE:-rocm/pytorch:latest}
+     env_file:
+       - .env.example
+     environment:
+       STORAGE_DIR: /app/data
+       REDIS_URL: redis://redis:6379/0
+       FRONTEND_ORIGIN: http://localhost:5173
+     volumes:
+       - ./backend/data:/app/data
+     ports:
+       - "8000:8000"
+     depends_on:
+       - redis
+     devices:
+       - /dev/kfd
+       - /dev/dri
+     group_add:
+       - video
+     ipc: host
+     shm_size: 16gb
+
+   frontend:
+     build:
+       context: ./frontend
+     environment:
+       VITE_API_BASE_URL: http://localhost:8000
+     ports:
+       - "5173:5173"
+     depends_on:
+       - backend
infra/amd-cloud.md ADDED
@@ -0,0 +1,52 @@
+ # AMD Developer Cloud Deployment
+
+ ## Instance
+
+ Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
+
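The `torch.cuda`-on-ROCm behavior described above can be sanity-checked with a short snippet. This is an illustrative sketch, not part of the repo: `describe_accelerator` is a hypothetical helper, and the fake module in the fallback branch only mimics what a ROCm PyTorch build reports when PyTorch is not installed locally.

```python
from types import SimpleNamespace


def describe_accelerator(torch_mod) -> str:
    """One-line summary of the accelerator a torch-like module sees.

    On ROCm builds, torch.cuda.is_available() is True for AMD GPUs and
    torch.version.hip is set; on CUDA builds torch.version.hip is None.
    """
    if not torch_mod.cuda.is_available():
        return "cpu"
    hip = getattr(torch_mod.version, "hip", None)
    if hip:
        return f"amd-gpu (HIP {hip}): {torch_mod.cuda.get_device_name(0)}"
    return f"nvidia-gpu (CUDA {torch_mod.version.cuda}): {torch_mod.cuda.get_device_name(0)}"


if __name__ == "__main__":
    try:
        import torch  # real check when PyTorch is installed
        print(describe_accelerator(torch))
    except ImportError:
        # Fake module mirroring a ROCm PyTorch build, for illustration only.
        fake = SimpleNamespace(
            cuda=SimpleNamespace(
                is_available=lambda: True,
                get_device_name=lambda i: "AMD Instinct MI300X",
            ),
            version=SimpleNamespace(hip="6.2.41133", cuda=None),
        )
        print(describe_accelerator(fake))
```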
+ ## Environment
+
+ Set:
+
+ ```bash
+ DEMO_MODE=false
+ HF_TOKEN=...
+ WHISPER_MODEL_ID=openai/whisper-large-v3
+ QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
+ QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
+ FFMPEG_VIDEO_CODEC=h264_amf
+ ```
+
+ Build with the ROCm inference extras:
+
+ ```bash
+ docker compose build --build-arg INSTALL_EXTRAS=.[ai,rocm-inference] backend
+ docker compose up
+ ```
+
+ The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
+
+ ## Inference Notes
+
+ - Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
+ - Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
+ - Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
+ - Keep `preferred_torch_dtype=bfloat16` on MI300X.
+
+ ## Benchmark
+
+ Run the same source twice:
+
+ 1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding GPUs, and run `scripts/benchmark.py`.
+ 2. AMD GPU run: expose the MI300X devices and run the same command.
+
+ Capture:
+
+ - `input`
+ - `transcription`
+ - `highlight_detection`
+ - `multimodal_analysis`
+ - `clip_generation`
+ - `total`
+
+ For the presentation, show the API timing JSON and the finished clips side by side.
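The two captured timing JSONs can be compared with a small stdlib-only sketch. `speedups` is a hypothetical helper (not in the repo); it assumes the `timings` dict uses the stage keys listed above with per-stage durations in seconds, and the sample numbers below are illustrative, not measured results.

```python
import json

# Stage keys expected in the benchmark output's "timings" dict.
STAGES = ["input", "transcription", "highlight_detection",
          "multimodal_analysis", "clip_generation", "total"]


def speedups(cpu_timings: dict, gpu_timings: dict) -> dict:
    """Per-stage CPU-time / GPU-time ratios (higher = bigger GPU win)."""
    return {
        stage: round(cpu_timings[stage] / gpu_timings[stage], 2)
        for stage in STAGES
        if gpu_timings.get(stage) and stage in cpu_timings
    }


# Illustrative numbers only, not measured results.
cpu = {"input": 12.0, "transcription": 840.0, "highlight_detection": 95.0,
       "multimodal_analysis": 210.0, "clip_generation": 60.0, "total": 1217.0}
mi300x = {"input": 12.0, "transcription": 35.0, "highlight_detection": 6.0,
          "multimodal_analysis": 14.0, "clip_generation": 30.0, "total": 97.0}
print(json.dumps(speedups(cpu, mi300x), indent=2))
```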
scripts/benchmark.py ADDED
@@ -0,0 +1,66 @@
+ import argparse
+ import json
+ import time
+ import urllib.error
+ import urllib.request
+
+
+ def request_json(url: str, method: str = "GET", payload: dict | None = None) -> dict:
+     body = None
+     headers = {}
+     if payload is not None:
+         body = json.dumps(payload).encode("utf-8")
+         headers["Content-Type"] = "application/json"
+     request = urllib.request.Request(url, data=body, headers=headers, method=method)
+     try:
+         with urllib.request.urlopen(request, timeout=30) as response:
+             return json.loads(response.read().decode("utf-8"))
+     except urllib.error.HTTPError as exc:
+         detail = exc.read().decode("utf-8")
+         raise RuntimeError(f"{exc.code}: {detail}") from exc
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(description="Run an AI Clip Studio API benchmark.")
+     parser.add_argument("--api", default="http://localhost:8000")
+     parser.add_argument("--youtube-url", required=True)
+     parser.add_argument("--language", default="Thai")
+     parser.add_argument("--style", default="informative")
+     parser.add_argument("--niche", default="education")
+     parser.add_argument("--clip-length", type=int, default=60)
+     args = parser.parse_args()
+
+     payload = {
+         "youtube_url": args.youtube_url,
+         "profile": {
+             "niche": args.niche,
+             "clip_style": args.style,
+             "clip_length_seconds": args.clip_length,
+             "primary_language": args.language,
+             "target_platform": "tiktok",
+         },
+     }
+     started = time.perf_counter()
+     job = request_json(f"{args.api}/api/jobs/youtube", "POST", payload)
+     while job["status"] in {"queued", "running"}:
+         time.sleep(2)
+         job = request_json(f"{args.api}/api/jobs/{job['id']}")
+
+     elapsed = round(time.perf_counter() - started, 3)
+     print(
+         json.dumps(
+             {
+                 "job_id": job["id"],
+                 "status": job["status"],
+                 "elapsed_wall_seconds": elapsed,
+                 "clips": len(job.get("clips", [])),
+                 "timings": job.get("timings", {}),
+                 "error": job.get("error"),
+             },
+             indent=2,
+         )
+     )
+
+
+ if __name__ == "__main__":
+     main()
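The submit-then-poll flow in `scripts/benchmark.py` can be exercised end to end against a throwaway local stub, with no real backend. The stub below is purely illustrative and assumes only the job JSON shape the script already reads (`id`, `status`, `clips`, `timings`); unlike the real script, the sketch skips the 2-second sleep between polls.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    polls = 0

    def _reply(self, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):  # POST /api/jobs/youtube -> a freshly queued job
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._reply({"id": "job-1", "status": "queued"})

    def do_GET(self):  # GET /api/jobs/<id> -> running once, then succeeded
        StubHandler.polls += 1
        status = "running" if StubHandler.polls < 2 else "succeeded"
        self._reply({"id": "job-1", "status": status, "clips": [], "timings": {}})

    def log_message(self, *_):  # keep the example's output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
api = f"http://127.0.0.1:{server.server_address[1]}"

# Same submit-then-poll shape as main() in scripts/benchmark.py.
request = urllib.request.Request(
    f"{api}/api/jobs/youtube", data=b"{}",
    headers={"Content-Type": "application/json"}, method="POST",
)
with urllib.request.urlopen(request, timeout=5) as response:
    job = json.loads(response.read().decode("utf-8"))
while job["status"] in {"queued", "running"}:
    with urllib.request.urlopen(f"{api}/api/jobs/{job['id']}", timeout=5) as response:
        job = json.loads(response.read().decode("utf-8"))
server.shutdown()
print(job["status"])  # -> succeeded
```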