The official jina embedding API is very slow
#80
by yahc - opened
The official rate limit is 2,000 RPM and 5,000,000 TPM, but I found that at around 100-300 RPM each response takes a very long time, sometimes 10-20 s.
Thanks for the details, that’s helpful. A quick note: jina-embeddings-v4 is currently in a research/experimental mode on our side, so it isn’t provisioned with the same dedicated capacity as our production-tier models. Because of that, latency can be variable and may spike under load or contention, even at request rates well below the documented RPM/TPM limits.
To help us pinpoint what you’re seeing, could you clarify how you’re measuring “100–300 RPM”?
- Measurement window: is that per-second burst converted to RPM, a 1-minute rolling average, or an average over a longer run?
- Concurrency: how many requests are in-flight at once (e.g., 1, 10, 50)?
- Input shape: what are the average input length (chars/tokens) and the batch size per request?
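
If it helps, here is a minimal sketch of one way to capture those numbers in a single run: it fires a fixed number of requests with a bounded number in flight, then reports the achieved RPM over the whole run alongside latency percentiles. The names `measure`, `timed_call`, and `fake_send` are made up for this example, and `fake_send` is only a stand-in that sleeps; you would swap in your own call to the embedding endpoint. Treat it as a rough measurement harness under those assumptions, not as our recommended benchmark.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send):
    """Run one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    send()
    return time.perf_counter() - start


def measure(send, n_requests, concurrency):
    """Issue n_requests calls with at most `concurrency` in flight at once."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: timed_call(send), range(n_requests)))
    wall = time.perf_counter() - wall_start
    return {
        "requests": n_requests,
        "concurrency": concurrency,
        # Average rate over the whole run, not a per-second burst.
        "achieved_rpm": n_requests / wall * 60,
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * len(latencies))],
        "max_s": latencies[-1],
    }


def fake_send():
    # Hypothetical stand-in for the real embedding request;
    # replace the body with your own client call.
    time.sleep(0.01)


if __name__ == "__main__":
    print(measure(fake_send, n_requests=50, concurrency=10))
```

Running it at a few concurrency levels (for example 1, 10, 50) with your usual batch size, and posting the resulting dictionaries, would answer the first two questions directly.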