# Jina v5 Nano ONNX for Vespa

This is a Jina v5 nano embedding model exported to ONNX format, prepared for use with the Vespa search engine.

## Model Details

| Property | Value |
|---|---|
| Architecture | Jina v5 nano embedding model |
| Embedding Dimension | 768 |
| Input | Token IDs (int64) + Attention Mask |
| Output | `last_hidden_state` (needs pooling) |

## Files

| File | Description |
|---|---|
| `model.onnx` | ONNX model |
| `tokenizer.json` | HuggingFace tokenizer |
| `special_tokens_map.json` | Special tokens mapping |

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import PreTrainedTokenizerFast
import numpy as np

# Load ONNX model
session = ort.InferenceSession('model.onnx')

# Load tokenizer
tokenizer = PreTrainedTokenizerFast(tokenizer_file='tokenizer.json')

# Tokenize the input text
text = 'Your search query'
inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors='np')

# Run the model; returns last_hidden_state of shape (batch, seq_len, 768)
outputs = session.run(None, {
    'input_ids': inputs['input_ids'],
    'attention_mask': inputs['attention_mask']
})

# Mean pooling over non-padding tokens
hidden = outputs[0]
mask = inputs['attention_mask']
mask_expanded = np.expand_dims(mask, -1)
pooled = np.sum(hidden * mask_expanded, axis=1) / np.clip(mask_expanded.sum(axis=1), 1e-9, None)

print(pooled.shape)  # (1, 768)
```
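The pooled vectors are not unit-length, so for cosine-style retrieval (e.g. with an angular distance metric on the Vespa side) they are typically normalized before comparison. A minimal sketch using NumPy only; the random vectors below are stand-ins for real model outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Normalize each embedding to unit length, then take the dot product.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.sum(a * b, axis=-1))

# Stand-in embeddings with the model's output shape (1, 768).
rng = np.random.default_rng(0)
query_emb = rng.standard_normal((1, 768))
doc_emb = rng.standard_normal((1, 768))

score = cosine_similarity(query_emb, doc_emb)
assert -1.0 <= score <= 1.0  # cosine similarity is always in [-1, 1]
```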

## Requirements

- onnx
- onnxruntime
- transformers
- numpy