Qwen3-Reranker-4B — MLX (4bit)
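An MLX conversion of Qwen/Qwen3-Reranker-4B with affine 4-bit quantization (group_size=64). Approximately 2.1 GB.
- Conversion:
mlx_lm convert --hf-path Qwen/Qwen3-Reranker-4B --mlx-path . --quantize --q-mode affine --q-bits 4 --q-group-size 64
- Evaluation: mteb/scidocs-reranking, 30 queries / 897 pairs
- Result: Kendall τ = 0.8894, nDCG@10 Δ = −0.0045 (verdict: BORDER)
- For quality, the 4B affine8 model in the same series (τ = 0.9867, about 4 GB) is the recommended choice. This 4-bit model is the size-first option for cases where you want to cut disk usage roughly in half.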
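To make the affine 4-bit, group_size=64 setting concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration of the general technique, not the MLX implementation: the function names are hypothetical, and the storage overhead assumes one 16-bit scale and one 16-bit bias per group.

```python
import random

def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1                      # 15 steps for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min                    # w_min acts as the bias

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]

def bits_per_weight(bits=4, group_size=64, meta_bits=32):
    """Effective storage cost; meta_bits is an assumed 16-bit scale + 16-bit bias."""
    return bits + meta_bits / group_size

random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(64)]  # toy weights, one group of 64

codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print(f"max abs error: {max_err:.6f} (half a quantization step: {scale / 2:.6f})")
print(f"effective bits per weight: {bits_per_weight()}")
```

Under these assumptions each weight costs 4.5 bits, so a model with roughly 4 billion weights lands a little above 2 GB, which is in the same range as the size stated above.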
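The two reported metrics can be sketched as follows. This is a minimal illustration under assumptions, not the evaluation script used for the numbers above: it assumes τ compares the quantized model's scores against a reference model's scores, and that Δ is the quantized nDCG@10 minus the reference nDCG@10. The scores and labels are made-up toy values, and the tau variant shown (tau-a, no tie correction) is also an assumption.

```python
import math
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a between two equally long score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

def ndcg_at_k(scores, labels, k=10):
    """nDCG@k with linear gain: rank documents by score, compare to the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(order[:k]))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical scores for one query with five candidate passages.
labels = [1, 0, 0, 1, 0]                          # 1 = relevant
ref_scores = [0.92, 0.10, 0.35, 0.80, 0.05]       # reference model
quant_scores = [0.88, 0.15, 0.30, 0.83, 0.20]     # quantized model

tau = kendall_tau(ref_scores, quant_scores)
delta = ndcg_at_k(quant_scores, labels) - ndcg_at_k(ref_scores, labels)

print(f"Kendall tau: {tau:.4f}")
print(f"nDCG@10 delta: {delta:+.4f}")
```

In this toy case the quantized model swaps two irrelevant passages, which lowers τ but leaves nDCG@10 unchanged. That is one way a τ noticeably below 1 can coexist with a small nDCG@10 Δ.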