# Jina v5 Nano ONNX for Vespa
This is a Jina v5 nano embedding model in ONNX format, prepared for use with the Vespa search engine.
## Model Details
| Property | Value |
|---|---|
| Architecture | Jina v5 nano embedding model |
| Embedding Dimension | 768 |
| Input | Token IDs (int64) + Attention Mask |
| Output | `last_hidden_state` (requires mean pooling) |
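
The input and output signature listed above can be checked directly against the ONNX graph with ONNX Runtime. A minimal sketch, assuming `model.onnx` is in the working directory:

```python
import onnxruntime as ort

# Load the model and print the inputs and outputs declared in the graph
session = ort.InferenceSession('model.onnx')

for i in session.get_inputs():
    print('input:', i.name, i.type, i.shape)    # e.g. input_ids / attention_mask, tensor(int64)

for o in session.get_outputs():
    print('output:', o.name, o.type, o.shape)   # e.g. last_hidden_state, (batch, seq_len, 768)
```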
## Files
| File | Description |
|---|---|
| `model.onnx` | ONNX model |
| `tokenizer.json` | HuggingFace tokenizer |
| `special_tokens_map.json` | Special tokens mapping |
## Usage with ONNX Runtime
```python
import onnxruntime as ort
from transformers import PreTrainedTokenizerFast
import numpy as np

# Load ONNX model
session = ort.InferenceSession('model.onnx')

# Load tokenizer
tokenizer = PreTrainedTokenizerFast(tokenizer_file='tokenizer.json')

# Generate embedding
text = 'Your search query'
inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors='np')
outputs = session.run(None, {
    'input_ids': inputs['input_ids'],
    'attention_mask': inputs['attention_mask']
})

# Mean pooling over the sequence dimension, ignoring padding tokens
hidden = outputs[0]                        # (batch, seq_len, 768)
mask = inputs['attention_mask']
mask_expanded = np.expand_dims(mask, -1)   # (batch, seq_len, 1)
pooled = np.sum(hidden * mask_expanded, axis=1) / np.clip(mask_expanded.sum(axis=1), a_min=1e-9, a_max=None)

print(pooled.shape)  # (1, 768)
```
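
If you want to compare embeddings outside Vespa, for example as a quick sanity check, cosine similarity over the mean-pooled vectors is a common choice. The following sketch reuses the `session` and `tokenizer` from above; the `embed` helper and the example documents are illustrative, not part of the model, and the vectors are L2-normalized here so that a plain dot product equals cosine similarity:

```python
def embed(texts):
    # Tokenize a batch of texts and mean-pool the last hidden state
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='np')
    hidden = session.run(None, {
        'input_ids': enc['input_ids'],
        'attention_mask': enc['attention_mask']
    })[0]
    mask = np.expand_dims(enc['attention_mask'], -1)
    pooled = np.sum(hidden * mask, axis=1) / np.clip(mask.sum(axis=1), a_min=1e-9, a_max=None)
    # L2-normalize so the dot product below is cosine similarity
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

query = embed(['Your search query'])                            # (1, 768)
docs = embed(['First document text', 'Second document text'])   # (2, 768)

scores = query @ docs.T   # cosine similarities, shape (1, 2)
print(scores)
```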
## Requirements
- onnx
- onnxruntime
- transformers
- numpy