pgvector vs Elasticsearch vs Qdrant vs Pinecone vs Weaviate: A 14-Case Benchmark
Vector databases are the backbone of modern RAG, semantic search, and recommendation systems. But when you're choosing one, the marketing pages all say the same thing: "blazing fast, infinitely scalable, production-ready." Useful.
So I built a controlled, reproducible benchmark that compares five of the most popular vector databases - pgvector, Elasticsearch, Qdrant, Pinecone, and Weaviate - across 14 test cases covering the full lifecycle: ingestion, semantic search, filtered search, hybrid search, filter-only queries, batch throughput, concurrency, scaling, top-K sensitivity, mutations, and connection pooling.
The full source, reports, and reproduction instructions are open-source on GitHub: Imran-ml/vector-db-benchmark.
This post walks through the methodology, the headline results, and a few findings that genuinely surprised me.
Why another vector DB benchmark?
Most public benchmarks fall into one of two traps:
- Vendor-run benchmarks that (surprise) conclude the vendor's own product wins.
- Synthetic micro-benchmarks that measure a single dimension - usually raw ANN latency on a random vector dataset - and ignore the stuff that matters in production: filtering, hybrid search, mutation cost, concurrent load, end-to-end ingestion pipelines.
I wanted something that looked more like a real workload: a real dataset (10,000 Amazon products), real sentence-transformer embeddings (all-MiniLM-L6-v2, 384-dim), and the full lifecycle from raw CSV to live queries.
The setup
| Component | Value |
|---|---|
| Dataset | 10,000 Amazon products (title, categories, prices, ratings) |
| Embedding model | sentence-transformers/all-MiniLM-L6-v2 (384-dim) |
| Test system | 54 cores · 96 GB DDR5 · 2× NVIDIA RTX 3090 (24 GB each) |
| Local engines | pgvector (HNSW, m=16, ef=200), Elasticsearch (dense_vector HNSW) |
| Cloud engines | Qdrant Cloud, Pinecone Serverless, Weaviate Cloud (all free/starter tiers) |
| Default top-K | 5 (swept across 5 / 10 / 25 / 50) |
| Concurrency | 1 / 5 / 10 users |
Two critical methodology choices:
- Local vs Cloud are judged separately. Local engines run in Docker with ~0 ms network latency; cloud engines add 50–500 ms of RTT. Comparing them head-to-head on raw latency would be dishonest - so I rank winners within each tier.
- Each query is embedded once and the same vector is reused across every engine. This eliminates encoder noise and ensures we're measuring the database, not the model.
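The embed-once discipline can be sketched with a cached encoder wrapper. The `embed` function below is a hypothetical stand-in for the real sentence-transformers call (the actual harness encodes with all-MiniLM-L6-v2); the point is only the caching pattern, which guarantees every engine receives the identical query vector.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def embed(query: str) -> tuple:
    """Hypothetical stand-in for model.encode(query) with
    sentence-transformers/all-MiniLM-L6-v2; returns a fake
    384-dim vector, computed once per distinct query string."""
    return tuple((hash((query, i)) % 1000) / 1000.0 for i in range(384))

# Every engine gets the identical cached vector, so latency differences
# reflect the database, not encoder noise.
vec_for_pgvector = embed("wireless noise-cancelling headphones")
vec_for_elastic = embed("wireless noise-cancelling headphones")
assert vec_for_pgvector is vec_for_elastic  # lru_cache returns the same object
assert len(vec_for_pgvector) == 384
```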
The headline result
pgvector wins 7/7 local categories. Qdrant wins 6/7 cloud categories.
| Database | Tier | Ingest (rows/s) | Semantic p50 | Peak QPS (10 users) |
|---|---|---|---|---|
| pgvector 🏆 | Local | 1,943.7 | 5.71 ms | 1,212.3 |
| Elasticsearch | Local | 1,307.7 | 9.12 ms | 983.7 |
| Qdrant 🏆 | Cloud | 1,825.2 | 14.30 ms | 80.4 |
| Weaviate | Cloud | 1,574.6 | 44.38 ms | 85.8 |
| Pinecone | Cloud | 256.6 | 300.23 ms | 40.3 |
Let's break that down.
Finding 1: pgvector is absurdly competitive
I expected pgvector to be "good enough" - a pragmatic choice if you already run Postgres. What I found is that on a 10k-row dataset with a properly tuned HNSW index, pgvector beats Elasticsearch on every single local category:
- 5.71 ms p50 semantic search (vs 9.12 ms)
- 1,943 rows/s ingest (vs 1,307)
- 1,212 QPS under 10 concurrent users with zero errors (vs 983)
- 5.54 ms metadata updates (vs 20.49 ms - nearly 4× faster)
The lesson: if you're already running Postgres and your dataset fits in memory, you might not need a dedicated vector database at all. That's a significant architectural simplification.
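For reference, the pgvector setup used here (HNSW, m=16, ef=200) maps onto plain SQL. The statements below are a sketch: the table schema and column names are assumptions, and I'm reading the post's "ef=200" as `ef_construction`, with the session-level `hnsw.ef_search` shown separately.

```python
# DDL sketch for a pgvector HNSW index matching the benchmark's stated
# parameters (m=16, ef_construction=200). Table/column names are
# illustrative, not the harness's actual schema.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (
    id bigserial PRIMARY KEY,
    title text,
    embedding vector(384)
);
CREATE INDEX IF NOT EXISTS products_embedding_hnsw
    ON products USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);
"""

# Per-session search breadth: higher ef_search trades latency for recall.
SEARCH_SETUP = "SET hnsw.ef_search = 200;"

# Top-K cosine search; <=> is pgvector's cosine-distance operator.
QUERY = """
SELECT id, title
FROM products
ORDER BY embedding <=> %(qvec)s::vector
LIMIT %(k)s;
"""
```

Run the `DDL` once, the `SET` per session, and the parameterized `QUERY` per search (e.g. via psycopg).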
Finding 2: Pinecone's network latency is brutal
Pinecone's p50 semantic search clocks in at 300 ms - roughly 50× slower than Qdrant on the same tier. Most of that is network round-trip to us-east-1 from my test location, not Pinecone's indexing. But here's the thing: that RTT is what your users actually experience.
If your app lives in the same region, Pinecone's numbers look very different. If it doesn't, Pinecone's serverless tier is a rough fit for latency-sensitive workloads.
Finding 3: Retrieval quality is not what you'd guess from latency
Because latency comparisons across tiers are meaningless, I measured top-K name overlap between engines on identical queries - a network-independent proxy for recall agreement.
| Query type | pgvector โ Elasticsearch overlap |
|---|---|
| Semantic search | 86% |
| Filtered search | 90% |
| Hybrid search (vector + BM25) | 70% |
Pure semantic and filtered search agree on 86–90% of top-K results between the two engines - strong consensus. Hybrid search drops to 70%, which makes sense: BM25 scoring and fusion strategies differ meaningfully between engines, and that's where you'll see the biggest relevance divergence.
Practical takeaway: if you're evaluating engines on recall, don't trust semantic-search overlap as a proxy for hybrid-search overlap. Test each mode independently.
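The overlap metric itself is simple: the fraction of result names two engines share in their top-K lists, ignoring order. A minimal sketch (the product names below are made up for illustration):

```python
def topk_overlap(a: list[str], b: list[str]) -> float:
    """Fraction of top-K result names shared by two engines
    (order-insensitive) - a network-independent proxy for
    recall agreement."""
    if not a and not b:
        return 1.0
    return len(set(a) & set(b)) / max(len(a), len(b))

# Illustrative top-5 lists from two engines (made-up products):
pg = ["cable A", "cable B", "hub C", "adapter D", "dock E"]
es = ["cable A", "hub C", "adapter D", "dock E", "charger F"]
print(topk_overlap(pg, es))  # 0.8 -> 80% overlap
```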
Finding 4: PgBouncer hurts pgvector under vector-search load
I ran a dedicated Case N to test a common production question: does connection pooling via PgBouncer help pgvector under concurrent vector-search load?
| Concurrency | Direct QPS | PgBouncer QPS |
|---|---|---|
| 5 users | 884.3 | 512.0 |
| 10 users | 1,154.6 | 545.4 |
| 20 users | 1,229.8 | 563.2 |
PgBouncer cut throughput roughly in half at every concurrency level tested.
Why? Pooling only helps when connection-setup cost dominates the query. Vector search queries are short and bursty - the pooling overhead exceeds the savings. If you're pooling pgvector for vector search specifically, you might be paying for nothing.
(Pooling is still worth it for the rest of your OLTP workload - this is a narrow finding about vector queries.)
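For context, the concurrency sweep boils down to firing queries from N workers and dividing total queries by wall time. This is a sketch of that measurement, not the harness's actual code; `run_query` below is a stub standing in for one vector-search round-trip (direct or via PgBouncer).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_qps(run_query, users: int, queries_per_user: int) -> float:
    """Fire queries from `users` concurrent workers and return
    aggregate queries-per-second, mirroring the 5/10/20-user sweep."""
    def worker():
        for _ in range(queries_per_user):
            run_query()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        for _ in range(users):
            pool.submit(worker)
    # Exiting the context manager waits for all workers to finish.
    elapsed = time.perf_counter() - start
    return (users * queries_per_user) / elapsed

# Stub query: ~1 ms of simulated round-trip per search.
qps = measure_qps(lambda: time.sleep(0.001), users=10, queries_per_user=20)
```

Swapping `run_query` between a direct connection and a PgBouncer-fronted one, with everything else held constant, is what produced the table above.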
Finding 5: HNSW top-K scaling is where pgvector really shines
I swept top-K from 5 → 50 and watched how latency grew:
| K | pgvector avg | Elasticsearch avg |
|---|---|---|
| 5 | 5.67 ms | 4.99 ms |
| 10 | 5.85 ms | 6.08 ms |
| 25 | 5.95 ms | 9.11 ms |
| 50 | 6.25 ms | 14.84 ms |
pgvector stays nearly flat across K=5 → K=50 (10% growth). Elasticsearch grows ~3× over the same range. At K=50, pgvector is 2.4× faster.
For reranking workloads that retrieve K=50–100 candidates before a cross-encoder, this difference compounds across every query.
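The K sweep itself is a small loop: for each K, run the same query repeatedly and average the wall time. A sketch of that measurement, with a stub `search` standing in for a real top-K vector query against one engine:

```python
import statistics
import time

def sweep_topk(search, ks=(5, 10, 25, 50), trials=50) -> dict:
    """Mean latency in ms per K; `search(k)` is a hypothetical
    stand-in for one top-K vector query against a single engine."""
    results = {}
    for k in ks:
        samples = []
        for _ in range(trials):
            t0 = time.perf_counter()
            search(k)
            samples.append((time.perf_counter() - t0) * 1000.0)
        results[k] = statistics.mean(samples)
    return results

# Stub whose cost grows with K, just to exercise the sweep.
latencies = sweep_topk(lambda k: sorted(range(k * 1000)))
```

The same loop, pointed at each engine with the same reused query vectors, produced the table above.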
What I didn't test (yet)
Honest caveats matter in benchmarks. Here's what this run does not cover:
- Larger datasets. 10k rows fits entirely in RAM for every engine. Recall and latency characteristics can change significantly at 10M+ rows.
- Quantization. All engines ran with full-precision float32 vectors. Qdrant, Pinecone, and Weaviate all support quantization that would change the latency/memory tradeoff.
- Multi-tenant isolation. Cloud engines ran on free tiers with shared resources - paid tiers will behave differently.
- Geographic parity. Cloud engines were in different regions (Qdrant AWS eu-west-2, Pinecone AWS us-east-1, Weaviate GCP europe-west3), so cloud-vs-cloud comparisons are imperfect.
The harness is designed so you can change any of these with a config edit and re-run. That's the whole point of making it open-source.
Reproducing the benchmark
Everything is on GitHub and runs with either Poetry or Docker Compose:
```bash
git clone https://github.com/Imran-ml/vector-db-benchmark
cd vector-db-benchmark
cp .env.example .env   # add your cloud API keys (optional)
docker compose up --build
```
Then hit `POST /api/benchmark/start` and watch the progress in the web UI at `http://localhost:3000`. Reports are generated as Markdown + JSON in `reports/`.
Swap in your own data: drop a CSV into `data/raw/` and update the dataset row cap in the config. Want to benchmark a different embedding model? Change the model name in the config - dimensions and index schemas are picked up automatically.
Conclusion
If you take only three things away from this:
- pgvector is the pragmatic default for small-to-medium datasets and teams already running Postgres. It beats Elasticsearch on every category I measured and simplifies your stack.
- Qdrant is the cloud winner across nearly every category, with strong concurrency headroom and the lowest mutation latencies in the cloud tier.
- Always benchmark on your own workload. Public benchmarks (including this one) are starting points, not verdicts. Your data distribution, query patterns, and infra location will change the picture.
The full 14-case report - including case-by-case tables, concurrency sweeps, mutation latencies, and the retrieval quality analysis - is in the repository README and in reports/.
If you find this useful, a ⭐ on GitHub helps other people discover it. If you run the benchmark on different hardware or a different dataset and get different results, open an issue - I'd love to see them.
About the author: I'm Muhammad Imran Zaman, a machine learning engineer focused on applied ML systems. Find me on LinkedIn, Kaggle, or Google Scholar.