---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- sentence-similarity
- embeddings
- justembed
base_model: sentence-transformers/all-mpnet-base-v2
library_name: onnxruntime
pipeline_tag: feature-extraction
---
# MPNet INT8 — ONNX Quantized

ONNX INT8 quantized version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) for efficient general-purpose sentence embeddings.
## Model Details
| Property | Value |
|---|---|
| Base Model | sentence-transformers/all-mpnet-base-v2 |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |
## What is this?
This is a quantized ONNX export of all-mpnet-base-v2, one of the best general-purpose sentence embedding models from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. The INT8 quantization reduces model size and improves inference speed while maintaining high accuracy.
## Use Cases
- Semantic text search
- Sentence similarity
- Clustering and topic modeling
- Paraphrase detection
- General-purpose text embeddings
## Files

- `model_quantized.onnx` — INT8 quantized ONNX model
- `tokenizer.json` — fast tokenizer
- `vocab.txt` — vocabulary file
- `config.json` — model configuration
## Usage with JustEmbed
```python
from justembed import Embedder

embedder = Embedder("mpnet-int8")
vectors = embedder.embed(["This is a sentence", "This is another sentence"])
```
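Embeddings like these are typically compared with cosine similarity (the standard metric for sentence-transformers models). Assuming `embedder.embed` returns 768-dimensional vectors convertible to NumPy arrays (an assumption about JustEmbed's return type), a minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With the vectors from embedder.embed(...) above:
#   score = cosine_similarity(vectors[0], vectors[1])
# Self-contained demo with a synthetic 768-dim vector:
v = np.random.default_rng(0).normal(size=768)
print(cosine_similarity(v, v))  # a vector is maximally similar to itself
```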
## Usage with ONNX Runtime
```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("This is a sentence", return_tensors="np")
outputs = session.run(None, dict(inputs))
```
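Note that the raw session output contains token-level hidden states, not sentence embeddings. all-mpnet-base-v2 produces its sentence vector via mask-aware mean pooling followed by L2 normalization. Assuming the first ONNX output is the last hidden state with shape `(batch, seq_len, 768)` (an assumption about this export's output ordering), the pooling step can be sketched as:

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Mask-aware mean pooling over tokens, then L2 normalization."""
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)      # (batch, 768)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid divide-by-zero
    embedding = summed / counts
    return embedding / np.linalg.norm(embedding, axis=1, keepdims=True)

# With `outputs` and `inputs` from the snippet above:
#   sentence_embedding = mean_pool(outputs[0], inputs["attention_mask"])
```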
## Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
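The quantization step described above can be reproduced with ONNX Runtime's `quantize_dynamic` helper. This is a sketch, not the exact script used for this repo; the FP32 input filename is illustrative:

```python
# Dynamic INT8 quantization of an existing FP32 ONNX export.
# "model.onnx" is an assumed filename for the FP32 export of all-mpnet-base-v2.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",             # FP32 ONNX export
    model_output="model_quantized.onnx",  # INT8 result shipped in this repo
    weight_type=QuantType.QInt8,          # quantize weights to signed INT8
)
```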
## License
This model is a derivative work of sentence-transformers/all-mpnet-base-v2.
The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.
## Citation
```bibtex
@inproceedings{song2020mpnet,
  title={MPNet: Masked and Permuted Pre-training for Language Understanding},
  author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
  booktitle={NeurIPS},
  year={2020}
}
```
## Acknowledgments
- Original model by UKP Lab / sentence-transformers
- Quantization and packaging by JustEmbed