# German RoBERTa Sentence Transformer V2 (ONNX)

ONNX export of [T-Systems-onsite/german-roberta-sentence-transformer-v2](https://huggingface.co/T-Systems-onsite/german-roberta-sentence-transformer-v2).

## Model Details

### Note from original authors

> The new [T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) model is slightly better for German. It is also currently the best model for English and works cross-lingually.

## Files

| File | Description | Size |
|------|-------------|------|
| `model.onnx` | Full-precision (FP32) ONNX model | ~1.1 GB |
| `tokenizer.json` | Fast tokenizer (converted from SentencePiece) | ~17 MB |

## Usage with ONNX Runtime

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("vespa-engine/german-roberta-sentence-transformer-v2-ONNX")

texts = ["Berlin ist die Hauptstadt von Deutschland."]
encoded = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="np")

# Feed only the inputs the ONNX graph actually declares.
input_names = {inp.name for inp in session.get_inputs()}
feeds = {k: v for k, v in encoded.items() if k in input_names}
outputs = session.run(None, feeds)

# Mean pooling over real (non-padding) tokens, then L2 normalization.
mask = np.expand_dims(encoded["attention_mask"], axis=-1)
embeddings = np.sum(outputs[0] * mask, axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
```
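The pooling arithmetic can be sanity-checked on dummy data without loading the model. The shapes below (4 tokens, 3 dimensions) are illustrative only; the real model emits 768-dimensional token embeddings:

```python
import numpy as np

# Dummy "token embeddings": batch of 1, 4 tokens, 3 dims.
token_embeddings = np.array([[[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0],
                              [9.0, 9.0, 9.0],    # padding position
                              [9.0, 9.0, 9.0]]])  # padding position
attention_mask = np.array([[1, 1, 0, 0]])

# Same recipe as above: masked mean over the token axis, then L2 normalization.
mask = np.expand_dims(attention_mask, axis=-1)
emb = np.sum(token_embeddings * mask, axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

print(emb)  # roughly [[0.7071, 0.7071, 0.]] — padded positions contribute nothing
```

The padded rows hold large values on purpose: if they leaked into the mean, the result would be far from the expected unit vector.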

## Usage with Vespa

Add the following to your `services.xml`:

```xml
<component id="german-roberta" type="hugging-face-embedder">
    <transformer-model url="https://huggingface.co/vespa-engine/german-roberta-sentence-transformer-v2-ONNX/resolve/main/model.onnx"/>
    <tokenizer-model url="https://huggingface.co/vespa-engine/german-roberta-sentence-transformer-v2-ONNX/resolve/main/tokenizer.json"/>
    <max-tokens>512</max-tokens>
    <pooling-strategy>mean</pooling-strategy>
</component>
```

Tensor type for schema fields:

```
tensor<float>(x[768])
```
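A schema field that stores these embeddings could look like the following sketch. The field name and the choice of `angular` distance metric are assumptions (though `angular` is a natural fit for normalized embeddings); the embedder id must match the component id from `services.xml`:

```
# Hypothetical schema fragment; "text" is assumed to be an input field.
field embedding type tensor<float>(x[768]) {
    indexing: input text | embed german-roberta | attribute | index
    attribute {
        distance-metric: angular
    }
}
```

At query time the same embedder can be invoked with Vespa's `embed` function, e.g. `input.query(q)=embed(german-roberta, "...")`, to produce a query tensor of the same type.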

See Vespa Model Hub for more details.

## Conversion

Converted from the original PyTorch model using Optimum and the conversion script included in this repository (`convert.py`).

The SentencePiece tokenizer was converted to a fast tokenizer (`tokenizer.json`) using `XLMRobertaTokenizerFast`.

To reproduce:

```shell
# Requires uv (https://docs.astral.sh/uv/)
uv run convert.py
```

## Attribution

This model was converted by Vespa.ai from the original model by T-Systems on site services GmbH. All credit for the original model goes to the original authors.
