# SapBERT INT8 (ONNX Quantized)
ONNX INT8 quantized version of cambridgeltl/SapBERT-from-PubMedBERT-fulltext for efficient biomedical entity embeddings.
## Model Details
| Property | Value |
|---|---|
| Base Model | cambridgeltl/SapBERT-from-PubMedBERT-fulltext |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |
## What is this?
This is a quantized ONNX export of SapBERT, a biomedical entity linking model trained on UMLS concepts. The INT8 quantization reduces model size and improves inference speed while maintaining high accuracy for biomedical text embeddings.
SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.
## Use Cases
- Medical entity linking
- Biomedical concept matching
- Clinical terminology normalization
- Drug name standardization
- Disease concept mapping
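All of the use cases above reduce to the same operation: embed a surface form and find the nearest known concept by cosine similarity. The sketch below shows that matching step with toy low-dimensional vectors standing in for 768-dimensional SapBERT embeddings (the names and vectors are illustrative, not real model output):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_concept(query, catalog):
    # catalog maps concept name -> embedding; return the closest concept.
    return max(catalog, key=lambda name: cosine(query, catalog[name]))

# Toy 3-dim "embeddings" for illustration only.
catalog = {
    "aspirin":   [0.9, 0.1, 0.0],
    "ibuprofen": [0.1, 0.9, 0.0],
    "warfarin":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. an embedding of "acetylsalicylic acid"
print(nearest_concept(query, catalog))  # aspirin
```

In practice the catalog would hold SapBERT embeddings of UMLS concept names, typically indexed with an approximate-nearest-neighbor library rather than a linear scan.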
## Files

- `model_quantized.onnx`: INT8 quantized ONNX model
- `tokenizer.json`: Fast tokenizer
- `config.json`: Model configuration
## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
```
## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("aspirin", return_tensors="np")
# outputs[0] is the last hidden state, shape (batch, seq_len, 768).
# SapBERT uses the [CLS] token (position 0) as the entity embedding.
outputs = session.run(None, dict(inputs))
```
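The session returns the raw last hidden state, so one pooling step remains to get a single vector per input. The snippet below shows that step on a simulated output array (a random array stands in for the real ONNX result so the example runs without the model files):

```python
import numpy as np

# Simulated ONNX output: last_hidden_state of shape (batch, seq_len, hidden).
rng = np.random.default_rng(0)
last_hidden_state = rng.normal(size=(1, 6, 768)).astype(np.float32)

# SapBERT entity embedding: the [CLS] token at position 0.
cls_embedding = last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # (1, 768)
```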
## Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Accuracy: retains roughly 95% or more of FP32 performance on biomedical benchmarks
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
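For intuition, dynamic INT8 quantization stores each weight tensor as 8-bit integers plus a floating-point scale, which is where the ~4x size reduction comes from. The numbers below are a minimal sketch of the symmetric per-tensor scheme (not the actual ONNX Runtime implementation, which is invoked via `onnxruntime.quantization.quantize_dynamic`):

```python
import numpy as np

# Illustrative FP32 weights.
rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=1000).astype(np.float32)

# Symmetric per-tensor INT8: one scale maps the range [-max|w|, max|w|]
# onto [-127, 127].
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize and check the round-trip error.
w_hat = q.astype(np.float32) * scale
max_err = float(np.abs(w - w_hat).max())
print(max_err <= scale / 2 + 1e-6)  # True: error bounded by half a quant step
```

Activations are quantized the same way, but their scales are computed on the fly at inference time, which is what makes the method "dynamic".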
## License
This model is a derivative work of cambridgeltl/SapBERT-from-PubMedBERT-fulltext.
The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.
## Citation

```bibtex
@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of NAACL},
  year={2021}
}
```
## Acknowledgments
- Original model by the Cambridge Language Technology Lab
- Quantization and packaging by JustEmbed