# cross-encoder/ms-marco-TinyBERT-L-2-v2 - LiteRT Optimized
This is a LiteRT (formerly TensorFlow Lite) export of cross-encoder/ms-marco-TinyBERT-L-2-v2, optimized for on-device inference on mobile and edge targets (Android, iOS, embedded).
## Model Details
| Attribute | Value |
|---|---|
| Task | Passage reranking (cross-encoder) |
| Format | .tflite (Float32) |
| File Size | 16.8 MB |
| Input Length | 512 tokens |
| Output Dim | 1 |
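
The single output is the raw relevance logit for the (query, document) pair; it is unbounded and intended for sorting candidates against each other. If a bounded score is needed, a sigmoid maps the logit to (0, 1). A small sketch (the helper name is illustrative, not part of the exported graph):

```python
import numpy as np

def logit_to_probability(logit: float) -> float:
    """Map an unbounded relevance logit to a (0, 1) score via the sigmoid."""
    return 1.0 / (1.0 + np.exp(-logit))

print(logit_to_probability(0.0))  # a logit of zero maps to 0.5
```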
## Usage
```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter
from transformers import AutoTokenizer

model_path = "cross-encoder_ms-marco-TinyBERT-L-2-v2.tflite"
interpreter = Interpreter(model_path=model_path)
interpreter.allocate_tensors()

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-TinyBERT-L-2-v2")

def compute_score(query, doc):
    # Tokenize the pair: [CLS] query [SEP] doc [SEP]
    inputs = tokenizer(query, doc, max_length=512, padding="max_length",
                       truncation=True, return_tensors="np")

    # Note: this assumes the export's input order is [input_ids, attention_mask];
    # if scores look wrong, check input_details[i]['name'] to confirm the order.
    input_details = interpreter.get_input_details()
    interpreter.set_tensor(input_details[0]['index'], inputs['input_ids'].astype(np.int64))
    interpreter.set_tensor(input_details[1]['index'], inputs['attention_mask'].astype(np.int64))
    interpreter.invoke()

    # The output is a single relevance logit of shape (1, 1)
    output_details = interpreter.get_output_details()
    score = interpreter.get_tensor(output_details[0]['index'])[0][0]
    return score

score = compute_score("What is python?", "Python is a programming language.")
print(f"Relevance Score: {score}")
```
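
Because a cross-encoder scores one (query, document) pair per call, reranking a candidate list means scoring every pair and sorting by score. A minimal sketch, where the `score_fn` parameter stands in for the `compute_score` function above:

```python
def rerank(query, docs, score_fn):
    """Score each (query, doc) pair and return (score, doc) tuples, most relevant first."""
    scored = [(score_fn(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Demo with a toy word-overlap scorer standing in for the model:
toy_score = lambda q, d: len(set(q.lower().split()) & set(d.lower().split()))
for score, doc in rerank("what is python",
                         ["Python is a language.", "Cats sleep a lot."],
                         toy_score):
    print(score, doc)
```

In practice, pass `compute_score` as `score_fn`; note that each document costs a full forward pass, so cross-encoders are typically used to rerank a short list of candidates retrieved by a cheaper first-stage method.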
Converted by Bombek1 using `litert-torch`.
## Model Tree

- Base model: nreimers/BERT-Tiny_L-2_H-128_A-2
- Fine-tuned cross-encoder: cross-encoder/ms-marco-TinyBERT-L-2-v2
- This export: Bombek1/ms-marco-TinyBERT-L-2-v2-litert