trust_remote_code gets passed twice when using:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano",
    trust_remote_code=True
)

This raises an error.

Hello!

This indeed works with the revision from this PR:

from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    revision="refs/pr/11",
)

query_embeddings = model.encode(
    sentences=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
)
document_embeddings = model.encode(
    sentences=[
        "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
    ],
    task="retrieval",
    prompt_name="document",
)

similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.5529]])

I would recommend merging this. Without the revision argument, i.e. when loading the model from main, the same script fails with: TypeError: transformers.models.auto.tokenization_auto.AutoTokenizer.from_pretrained() got multiple values for keyword argument 'trust_remote_code'. See also https://github.com/huggingface/sentence-transformers/issues/3717
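For context, Python raises this TypeError whenever a keyword argument is passed explicitly and also arrives through a **kwargs dict. The sketch below reproduces that mechanism in plain Python. It is an illustration under assumptions, not the actual code from this PR: from_pretrained, load_tokenizer_buggy, and load_tokenizer_fixed are hypothetical stand-ins, and the real remote code may differ in where the duplicate comes from.

```python
# Hypothetical stand-ins illustrating the failure mode; these are NOT the
# real transformers, sentence-transformers, or Jina remote-code functions.


def from_pretrained(name, *, trust_remote_code=False, **kwargs):
    """Stand-in for a loader such as AutoTokenizer.from_pretrained."""
    return {"name": name, "trust_remote_code": trust_remote_code, **kwargs}


def load_tokenizer_buggy(name, tokenizer_kwargs):
    # If tokenizer_kwargs already contains "trust_remote_code", the keyword is
    # supplied twice (explicitly and via **), so Python raises
    # "TypeError: ... got multiple values for keyword argument 'trust_remote_code'".
    return from_pretrained(name, trust_remote_code=True, **tokenizer_kwargs)


def load_tokenizer_fixed(name, tokenizer_kwargs):
    # Copy the caller's kwargs so the original dict is not mutated, then pop
    # trust_remote_code so it is passed exactly once. Defaulting to False
    # (an assumption) means remote code only runs when the caller opts in.
    kwargs = dict(tokenizer_kwargs)
    trust = kwargs.pop("trust_remote_code", False)
    return from_pretrained(name, trust_remote_code=trust, **kwargs)


if __name__ == "__main__":
    caller_kwargs = {"trust_remote_code": True}
    try:
        load_tokenizer_buggy("some-model", caller_kwargs)
    except TypeError as err:
        print("buggy:", err)
    print("fixed:", load_tokenizer_fixed("some-model", caller_kwargs))
```

Popping the key before forwarding keeps the caller's choice authoritative while guaranteeing a single value reaches the loader.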

  • Tom Aarsen
Jina AI org

Thanks for the fix! I think jina-embeddings-v5-text-small has the same issue. Feel free to open a PR there too, otherwise I'll push the fix shortly.

jupyterjazz changed pull request status to merged

Thanks for reporting! I applied the fix to the small version as well: https://huggingface.co/jinaai/jina-embeddings-v5-text-small/discussions/19
