jinaai/jina-embeddings-v4 · The official jina embedding API is very slow

#80

by yahc - opened Jan 6

The official rate limit is 2,000 RPM and 5,000,000 TPM, but I found that at only around 100-300 RPM each response takes very long, sometimes 10-20 s.

zac-li

Jina AI org Jan 7

Thanks for the details — that’s helpful. A quick note: jina-embeddings-v4 is currently in a research/experimental mode on our side, so it isn’t provisioned with the same dedicated capacity as our production-tier models. Because of that, latency can be variable and may spike during load or contention, even if the documented RPM/TPM limits are higher.

To help us pinpoint what you’re seeing, could you clarify how you’re measuring “100–300 RPM”?

Measurement window: is that per-second burst converted to RPM, a 1-minute rolling average, or an average over a longer run?
Concurrency: how many requests are in-flight at once (e.g., 1, 10, 50)?
Input shape: average input length (chars/tokens) and batch size per request
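
If it helps to collect those numbers, here is a minimal measurement sketch. It is not an official client: `measure`, `fake_send`, and `run_demo` are hypothetical names, and `fake_send` only sleeps in place of a real request, so you would swap in your own async call to the embedding endpoint. It caps the number of in-flight requests, times each one, and reports the achieved RPM over the whole run alongside mean, p95, and max latency.

```python
import asyncio
import statistics
import time


async def measure(send, payloads, concurrency):
    """Run send(payload) for every payload with at most `concurrency` in flight.

    `send` is a hypothetical async callable that performs one embedding
    request; replace it with a real HTTP client call when measuring for real.
    """
    sem = asyncio.Semaphore(concurrency)
    latencies = []

    async def one(payload):
        async with sem:
            # Timed after the semaphore is acquired, so local queueing
            # is excluded and only the request itself is measured.
            start = time.perf_counter()
            await send(payload)
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one(p) for p in payloads))
    wall = time.perf_counter() - wall_start

    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "requests": len(ordered),
        "concurrency": concurrency,
        "wall_seconds": wall,
        # Average rate over the whole run, not a per-second burst.
        "achieved_rpm": len(ordered) / wall * 60,
        "latency_mean_s": statistics.mean(ordered),
        "latency_p95_s": ordered[p95_index],
        "latency_max_s": ordered[-1],
    }


async def fake_send(payload):
    # Stand-in for the real API call: sleeps 10 ms instead of using the network.
    await asyncio.sleep(0.01)


def run_demo():
    # 20 requests, each carrying a batch of 4 short inputs (assumed shape).
    payloads = [["hello world"] * 4 for _ in range(20)]
    return asyncio.run(measure(fake_send, payloads, concurrency=5))


if __name__ == "__main__":
    print(run_demo())
```

Sharing the resulting dictionary together with the batch size and typical input length would cover all three questions above.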
