# Qwen3-Embedding-8B — MLX fp16

Qwen/Qwen3-Embedding-8B converted to MLX format in float16 precision for native Apple Silicon inference.

## Model Details

| Property             | Value                        |
|----------------------|------------------------------|
| Base model           | Qwen/Qwen3-Embedding-8B      |
| Parameters           | 8B                           |
| Architecture         | Qwen3 (decoder-based)        |
| Precision            | float16                      |
| Model size           | ~14 GB                       |
| Embedding dimensions | 4096 (supports MRL: 32–4096) |
| Max context length   | 32,768 tokens                |
| Languages            | 100+                         |
| Pooling              | Last-token                   |
| Converted with       | mlx-embeddings v0.1.0        |
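
The MRL (Matryoshka Representation Learning) range above means embeddings can be truncated to a smaller dimension and re-normalized with modest quality loss. A minimal sketch, assuming the model's L2-normalized 4096-d output; `truncate_embeddings` is a hypothetical helper, not part of mlx-embeddings:

```python
import mlx.core as mx

def truncate_embeddings(embeds: mx.array, dim: int = 256) -> mx.array:
    """Hypothetical MRL truncation: keep the first `dim` components,
    then re-normalize so dot products remain cosine similarities."""
    truncated = embeds[:, :dim]
    norms = mx.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms
```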

## Usage

```bash
pip install mlx-embeddings
```

```python
from mlx_embeddings import load, generate
import mlx.core as mx

model, tokenizer = load("bsisduck/Qwen3-Embedding-8B-fp16-mlx")

queries = [
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is MLX?"
]
documents = [
    "MLX is Apple's array framework for machine learning on Apple Silicon.",
    "Python is a programming language.",
]

query_embeds = generate(model, tokenizer, texts=queries).text_embeds
doc_embeds = generate(model, tokenizer, texts=documents).text_embeds

# Cosine similarity (embeddings are L2-normalized)
scores = mx.matmul(query_embeds, doc_embeds.T)
print(scores)
```
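
Following the Qwen3-Embedding convention used above, queries carry a task instruction while documents are embedded as-is. A small helper mirroring that format (the function name is ours, not part of any package):

```python
def get_detailed_instruct(task: str, query: str) -> str:
    """Wrap a raw query in the instruction format shown above."""
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [get_detailed_instruct(task, "What is MLX?")]
```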

## Verified Results

Tested on an Apple M2 Max (32 GB).

Query–document retrieval (query: "What is Apple MLX framework?"):

| Document | Score |
|----------|-------|
| "MLX is an array framework for ML on Apple silicon." | 0.844 |
| "Python is a popular programming language..." | 0.260 |
| "To make banana bread, mix ripe bananas..." | 0.183 |
| "The Eiffel Tower is located in Paris..." | 0.091 |

Multilingual (query: "Machine learning is transforming healthcare"):

| Language | Score |
|----------|-------|
| Polish | 0.831 |
| German | 0.821 |
| French | 0.804 |
| Unrelated (EN) | 0.443 |

Performance: load time ~10 s; inference under 1 s for batches of 4–6 texts.
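
Ranked lists like the ones above can be reproduced from the Usage snippet with an argsort over a row of the `scores` matrix (this sketch assumes `scores` and `documents` from that snippet):

```python
import mlx.core as mx

# Rank documents for the first query, highest cosine similarity first
order = mx.argsort(-scores[0])
for idx in order.tolist():
    print(f"{scores[0, idx].item():.3f}  {documents[idx]}")
```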

## Hardware Requirements

- Apple Silicon Mac (M1/M2/M3/M4)
- ~16 GB unified memory

## Limitations

- This is a format conversion (bf16 to fp16 MLX), not a fine-tune; any accuracy differences vs. the original come from fp16 precision only. A rough way to spot-check this is sketched below.
- See the original model card for full limitations, biases, and ethical considerations.
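
Since only fp16 rounding separates this conversion from the original, a rough fidelity spot-check is to embed the same texts with both models and compare cosine similarities. A sketch, not a validated procedure, assuming the sentence-transformers loading path from the original model card; it needs enough unified memory to hold both 8B models (well above the ~16 GB minimum):

```python
import mlx.core as mx
from mlx_embeddings import load, generate
from sentence_transformers import SentenceTransformer

texts = ["MLX is Apple's array framework for machine learning on Apple Silicon."]

# Embeddings from this fp16 MLX conversion
mlx_model, tokenizer = load("bsisduck/Qwen3-Embedding-8B-fp16-mlx")
mlx_embeds = generate(mlx_model, tokenizer, texts=texts).text_embeds

# Embeddings from the original bf16 checkpoint
ref_model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
ref_embeds = mx.array(ref_model.encode(texts, normalize_embeddings=True))

# Both sets are L2-normalized, so the per-row dot product is cosine similarity;
# values close to 1.0 mean the fp16 conversion tracks the original closely.
print(mx.sum(mlx_embeds * ref_embeds, axis=-1))
```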

## References

```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```