---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- scientific
- embeddings
- justembed
base_model: allenai/scibert_scivocab_uncased
library_name: onnxruntime
pipeline_tag: feature-extraction
---

# SciBERT INT8 — ONNX Quantized
ONNX INT8 quantized version of allenai/scibert_scivocab_uncased for efficient scientific text embeddings.
## Model Details
| Property | Value |
|---|---|
| Base Model | allenai/scibert_scivocab_uncased |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |
## What is this?

This is a quantized ONNX export of SciBERT, a BERT model pretrained by the Allen Institute for AI on a large corpus of scientific text (1.14M papers, 3.1B tokens from Semantic Scholar). INT8 quantization reduces model size and speeds up inference with minimal loss in embedding quality on scientific-domain text.
## Use Cases
- Scientific paper search and retrieval
- Research document similarity
- Academic text classification
- Scientific entity recognition embeddings
- Citation recommendation
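Several of these use cases reduce to comparing embedding vectors, most commonly with cosine similarity. A minimal NumPy sketch (the function name and vector shapes here are illustrative, not part of this package):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Rank candidate documents by `cosine_similarity(query_vec, doc_vec)` to implement search or similarity on top of the embeddings.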
## Files

- `model_quantized.onnx`: INT8 quantized ONNX model
- `tokenizer.json`: fast tokenizer
- `vocab.txt`: scientific vocabulary
- `config.json`: model configuration
## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("scibert-int8")
vectors = embedder.embed(["neural network architectures for NLP"])
```
## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("neural network architectures for NLP", return_tensors="np")
outputs = session.run(None, dict(inputs))
```
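The session returns token-level hidden states, not a single sentence vector. To get one 768-dimensional embedding per input, a common approach is mean pooling over tokens while masking padding. A minimal sketch, assuming the model's first output is the last hidden state of shape `(batch, seq_len, 768)`:

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# embedding = mean_pool(outputs[0], inputs["attention_mask"])  # shape (1, 768)
```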
## Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
## License
This model is a derivative work of allenai/scibert_scivocab_uncased.
The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.
## Citation

```bibtex
@inproceedings{beltagy2019scibert,
  title={SciBERT: A Pretrained Language Model for Scientific Text},
  author={Beltagy, Iz and Lo, Kyle and Cohan, Arman},
  booktitle={Proceedings of EMNLP},
  year={2019}
}
```
## Acknowledgments
- Original model by Allen Institute for AI
- Quantization and packaging by JustEmbed