# German RoBERTa Sentence Transformer V2 (ONNX)

ONNX export of [T-Systems-onsite/german-roberta-sentence-transformer-v2](https://huggingface.co/T-Systems-onsite/german-roberta-sentence-transformer-v2).
## Model Details
- Base model: T-Systems-onsite/german-roberta-sentence-transformer-v2
- Architecture: XLM-RoBERTa
- Language: German
- Dimensions: 768
- Max tokens: 512
- Pooling: Mean
- License: MIT (same as original)
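Mean pooling averages the token embeddings produced by the model, weighted by the attention mask so padding tokens do not dilute the result. A toy numeric sketch (3 dimensions instead of the real 768):

```python
import numpy as np

# Toy token embeddings: batch of 1, 4 token positions, 3 dims (the real model uses 768).
token_embeddings = np.array([[[1.0, 2.0, 3.0],
                              [3.0, 4.0, 5.0],
                              [0.0, 0.0, 0.0],    # padding
                              [0.0, 0.0, 0.0]]])  # padding
attention_mask = np.array([[1, 1, 0, 0]])

# Broadcast the mask over the embedding dimension, then average only real tokens.
mask = np.expand_dims(attention_mask, axis=-1)                   # shape (1, 4, 1)
pooled = np.sum(token_embeddings * mask, axis=1) / mask.sum(axis=1)
# Only the two unmasked tokens are averaged: [[2.0, 3.0, 4.0]]
```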
## Note from original authors

The newer [T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) model is slightly better for German. It is also the current best model for English and works cross-lingually.
## Files

| File | Description | Size |
|---|---|---|
| `model.onnx` | Full-precision (FP32) ONNX model | ~1.1 GB |
| `tokenizer.json` | Fast tokenizer (converted from SentencePiece) | ~17 MB |
## Usage with ONNX Runtime

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained(
    "vespa-engine/german-roberta-sentence-transformer-v2-ONNX"
)

texts = ["Berlin ist die Hauptstadt von Deutschland."]
encoded = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="np")

# Feed only the inputs the ONNX graph actually declares.
input_names = {inp.name for inp in session.get_inputs()}
feeds = {k: v for k, v in encoded.items() if k in input_names}
outputs = session.run(None, feeds)

# Mean pooling over real tokens (padding is masked out), then L2-normalize.
mask = np.expand_dims(encoded["attention_mask"], axis=-1)
embeddings = np.sum(outputs[0] * mask, axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
```
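Once the embeddings are L2-normalized, cosine similarity between two sentences reduces to a plain dot product. A minimal sketch with stand-in vectors (random values in place of real model output):

```python
import numpy as np

# Stand-in for two normalized 768-d sentence embeddings (hypothetical values).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2, 768)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# With unit-length vectors, cosine similarity is just the dot product.
similarity = float(embeddings[0] @ embeddings[1])
```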
## Usage with Vespa

Add the following to your `services.xml`:

```xml
<component id="german-roberta" type="hugging-face-embedder">
  <transformer-model url="https://huggingface.co/vespa-engine/german-roberta-sentence-transformer-v2-ONNX/resolve/main/model.onnx"/>
  <tokenizer-model url="https://huggingface.co/vespa-engine/german-roberta-sentence-transformer-v2-ONNX/resolve/main/tokenizer.json"/>
  <max-tokens>512</max-tokens>
  <pooling-strategy>mean</pooling-strategy>
</component>
```

Tensor type for schema fields:

```
tensor<float>(x[768])
```
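A matching schema field might look like the sketch below. The field name, input field, and embedder id (`german-roberta`, matching the component above) are illustrative; adapt them to your schema:

```
field embedding type tensor<float>(x[768]) {
    indexing: input text | embed german-roberta | attribute | index
    attribute {
        distance-metric: angular
    }
}
```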
See Vespa Model Hub for more details.
## Conversion

Converted from the original PyTorch model using Optimum and the conversion script included in this repository (`convert.py`).
The SentencePiece tokenizer was converted to a fast tokenizer (`tokenizer.json`) using `XLMRobertaTokenizerFast`.
To reproduce:

```shell
# Requires uv (https://docs.astral.sh/uv/)
uv run convert.py
```
## Attribution
This model was converted by Vespa.ai from the original model by T-Systems on site services GmbH. All credit for the original model goes to the original authors.