JakgritB committed on
Commit df9eb37 · Parent(s): 09e2e7a
chore(infra): add deployment and benchmark tooling
Add Docker Compose with AMD GPU device wiring, AMD Developer Cloud deployment notes, and an API benchmark script for CPU versus MI300X timing comparisons.
- docker-compose.yml +40 -0
- infra/amd-cloud.md +52 -0
- scripts/benchmark.py +66 -0
docker-compose.yml
ADDED
@@ -0,0 +1,40 @@
```yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  backend:
    build:
      context: ./backend
      args:
        ROCM_PYTORCH_IMAGE: ${ROCM_PYTORCH_IMAGE:-rocm/pytorch:latest}
    env_file:
      - .env.example
    environment:
      STORAGE_DIR: /app/data
      REDIS_URL: redis://redis:6379/0
      FRONTEND_ORIGIN: http://localhost:5173
    volumes:
      - ./backend/data:/app/data
    ports:
      - "8000:8000"
    depends_on:
      - redis
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    ipc: host
    shm_size: 16gb

  frontend:
    build:
      context: ./frontend
    environment:
      VITE_API_BASE_URL: http://localhost:8000
    ports:
      - "5173:5173"
    depends_on:
      - backend
```
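Once the stack is up, a quick sanity check of the device wiring (a minimal sketch, not part of the commit; `backend` is the service name defined above):

```bash
docker compose up -d
# The container should see the KFD and DRI nodes passed through above.
docker compose exec backend ls -l /dev/kfd /dev/dri
```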
infra/amd-cloud.md
ADDED
@@ -0,0 +1,52 @@

# AMD Developer Cloud Deployment

## Instance

Use an AMD Instinct MI300X instance with ROCm 6.x. The backend expects ROCm-enabled PyTorch; on ROCm, PyTorch exposes AMD GPUs through the CUDA-compatible `torch.cuda` API and reports the HIP version in `torch.version.hip`.
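A quick way to confirm that behavior on the instance (a minimal check, not part of the commit; it assumes only that the ROCm PyTorch wheel is installed):

```bash
# Expect "True" plus a HIP version string (e.g. 6.x) on a healthy setup.
python -c 'import torch; print(torch.cuda.is_available(), torch.version.hip)'
```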
## Environment

Set:

```bash
DEMO_MODE=false
HF_TOKEN=...
WHISPER_MODEL_ID=openai/whisper-large-v3
QWEN_TEXT_MODEL_ID=Qwen/Qwen2.5-7B-Instruct
QWEN_VL_MODEL_ID=Qwen/Qwen2-VL-7B-Instruct
FFMPEG_VIDEO_CODEC=h264_amf
```
Build with the ROCm inference extras:

```bash
docker compose build --build-arg INSTALL_EXTRAS=.[ai,rocm-inference] backend
docker compose up
```

The Docker Compose file mounts `/dev/kfd` and `/dev/dri`, adds the `video` group, and uses host IPC for large-model inference.
## Inference Notes

- Whisper Large V3 runs through Hugging Face `transformers` with ROCm PyTorch.
- Qwen2.5 highlight detection is wired for `vLLM` with the ROCm backend.
- Qwen2-VL has a service boundary in `backend/app/services/multimodal.py`; add frame sampling there when demo time allows.
- Keep `preferred_torch_dtype=bfloat16` on MI300X (see the sketch after this list).
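A hypothetical illustration of what that dtype choice means at load time. `preferred_torch_dtype` is this project's own setting; the sketch assumes it ultimately maps onto the standard `transformers` `torch_dtype` argument:

```bash
python - <<'PY'
import torch
from transformers import AutoModelForCausalLM

# Assumption: preferred_torch_dtype=bfloat16 ends up as torch_dtype here.
# MI300X supports bfloat16 natively, which keeps fp32-like dynamic range
# at half the memory cost.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)
print(model.dtype)  # torch.bfloat16
PY
```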
## Benchmark

Run the same source twice (see the sketch after this list):

1. CPU baseline: set `DEMO_MODE=false`, force CPU by hiding GPUs, and run `scripts/benchmark.py`.
2. AMD GPU run: expose the MI300X devices and run the same command.
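A minimal sketch of the two runs. It assumes ROCm PyTorch honors `HIP_VISIBLE_DEVICES`, that the compose file forwards that variable to the backend service (it would need to be declared under the backend's `environment:` key), and that `$VIDEO_URL` is a placeholder for the test source:

```bash
# 1. CPU baseline: bring the backend up with every GPU hidden, then benchmark.
HIP_VISIBLE_DEVICES="" docker compose up -d backend
python scripts/benchmark.py --api http://localhost:8000 --youtube-url "$VIDEO_URL"

# 2. MI300X run: restart with the devices visible and repeat the same command.
docker compose up -d backend
python scripts/benchmark.py --api http://localhost:8000 --youtube-url "$VIDEO_URL"
```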
Capture:

- `input`
- `transcription`
- `highlight_detection`
- `multimodal_analysis`
- `clip_generation`
- `total`

For the presentation, show the API timing JSON and the finished clips side by side.
scripts/benchmark.py
ADDED
@@ -0,0 +1,66 @@
```python
import argparse
import json
import time
import urllib.error
import urllib.request


def request_json(url: str, method: str = "GET", payload: dict | None = None) -> dict:
    body = None
    headers = {}
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8")
        raise RuntimeError(f"{exc.code}: {detail}") from exc


def main() -> None:
    parser = argparse.ArgumentParser(description="Run an AI Clip Studio API benchmark.")
    parser.add_argument("--api", default="http://localhost:8000")
    parser.add_argument("--youtube-url", required=True)
    parser.add_argument("--language", default="Thai")
    parser.add_argument("--style", default="informative")
    parser.add_argument("--niche", default="education")
    parser.add_argument("--clip-length", type=int, default=60)
    args = parser.parse_args()

    payload = {
        "youtube_url": args.youtube_url,
        "profile": {
            "niche": args.niche,
            "clip_style": args.style,
            "clip_length_seconds": args.clip_length,
            "primary_language": args.language,
            "target_platform": "tiktok",
        },
    }
    started = time.perf_counter()
    job = request_json(f"{args.api}/api/jobs/youtube", "POST", payload)
    while job["status"] in {"queued", "running"}:
        time.sleep(2)
        job = request_json(f"{args.api}/api/jobs/{job['id']}")

    elapsed = round(time.perf_counter() - started, 3)
    print(
        json.dumps(
            {
                "job_id": job["id"],
                "status": job["status"],
                "elapsed_wall_seconds": elapsed,
                "clips": len(job.get("clips", [])),
                "timings": job.get("timings", {}),
                "error": job.get("error"),
            },
            indent=2,
        )
    )


if __name__ == "__main__":
    main()
```
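An example invocation against a local backend (the URL is a placeholder; the remaining flags shown are the script's own defaults):

```bash
python scripts/benchmark.py \
  --api http://localhost:8000 \
  --youtube-url "https://www.youtube.com/watch?v=..." \
  --language Thai --style informative --niche education --clip-length 60
```

The script prints a single JSON object with `job_id`, `status`, `elapsed_wall_seconds`, `clips`, `timings`, and `error`, so the CPU and MI300X runs can be compared field by field.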