Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper: arXiv:2506.05176
Qwen/Qwen3-Embedding-8B converted to MLX format in float16 precision for native Apple Silicon inference.
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-8B |
| Parameters | 8B |
| Architecture | Qwen3 (decoder-based) |
| Precision | float16 |
| Model size | ~14 GB |
| Embedding dimensions | 4096 (supports MRL: 32–4096) |
| Max context length | 32,768 tokens |
| Languages | 100+ |
| Pooling | Last-token |
| Converted with | mlx-embeddings v0.1.0 |
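The table lists last-token pooling: the hidden state at each sequence's final non-padding position is taken as the text embedding. As a framework-agnostic illustration (NumPy stand-in for the model's hidden states, right padding assumed; the helper name is ours, not part of the library):

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of each sequence's last non-padding token.

    hidden_states: (batch, seq_len, hidden_dim)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    # Index of the last real token per sequence (assumes right padding).
    last_idx = attention_mask.sum(axis=1) - 1
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy batch: 2 sequences, seq_len 4, hidden_dim 3.
h = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])
pooled = last_token_pool(h, mask)  # shape (2, 3)
```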
```shell
pip install mlx-embeddings
```
```python
from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("bsisduck/Qwen3-Embedding-8B-fp16-mlx")

queries = [
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is MLX?"
]
documents = [
    "MLX is Apple's array framework for machine learning on Apple Silicon.",
    "Python is a programming language.",
]

query_embeds = generate(model, tokenizer, texts=queries).text_embeds
doc_embeds = generate(model, tokenizer, texts=documents).text_embeds

# Cosine similarity (embeddings are L2-normalized)
scores = mx.matmul(query_embeds, doc_embeds.T)
print(scores)
```
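The property table notes Matryoshka (MRL) support from 32 to 4096 dimensions: embeddings can be truncated to a prefix of their dimensions and re-normalized, trading some quality for much smaller vectors. A minimal NumPy sketch of that truncation step (the 4096-dim vectors here are random stand-ins, not real model output):

```python
import numpy as np

def truncate_and_renormalize(embeds: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and L2-normalize each row again."""
    cut = embeds[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(2, 4096)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)  # model output is already L2-normalized

small = truncate_and_renormalize(full, 256)  # 16x smaller vectors
scores = small @ small.T                     # cosine similarities still work directly
```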
Tested on Apple M2 Max (32 GB):
Query-document retrieval (query: "What is Apple MLX framework?"):
| Document | Score |
|---|---|
| "MLX is an array framework for ML on Apple silicon." | 0.844 |
| "Python is a popular programming language..." | 0.260 |
| "To make banana bread, mix ripe bananas..." | 0.183 |
| "The Eiffel Tower is located in Paris..." | 0.091 |
Multilingual (query: "Machine learning is transforming healthcare"):
| Language | Score |
|---|---|
| Polish | 0.831 |
| German | 0.821 |
| French | 0.804 |
| Unrelated (EN) | 0.443 |
Performance: model load takes ~10 s; inference runs in under 1 s for batches of 4–6 texts.
```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```