Update custom_st.py
#11
by mohamed99akram - opened
trust_remote_code gets passed twice when using:
model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano",
    trust_remote_code=True
)
This raises an error.
Hello!
This indeed works:
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    revision="refs/pr/11",
)

query_embeddings = model.encode(
    sentences=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
)

document_embeddings = model.encode(
    sentences=[
        "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
    ],
    task="retrieval",
    prompt_name="document",
)

similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.5529]])
I would recommend merging this: without the revision argument, this script loads the model from main and fails with TypeError: transformers.models.auto.tokenization_auto.AutoTokenizer.from_pretrained() got multiple values for keyword argument 'trust_remote_code'. See also https://github.com/huggingface/sentence-transformers/issues/3717
- Tom Aarsen
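For readers unfamiliar with this class of error, here is a minimal sketch of how a keyword can end up being passed twice, and one possible way to guard against it. This is not the code from custom_st.py: `from_pretrained`, `load_buggy`, and `load_fixed` are hypothetical stand-ins, and the assumption is only that a wrapper forwards `trust_remote_code` explicitly while the same key is also present in a kwargs dict it unpacks.

```python
def from_pretrained(name, trust_remote_code=False, **kwargs):
    """Hypothetical stand-in for a loader that accepts trust_remote_code."""
    return {"name": name, "trust_remote_code": trust_remote_code, **kwargs}


def load_buggy(name, tokenizer_kwargs, trust_remote_code=False):
    # The flag is passed explicitly AND may also be a key in tokenizer_kwargs,
    # so the callee can receive the same keyword twice.
    return from_pretrained(name, trust_remote_code=trust_remote_code, **tokenizer_kwargs)


def load_fixed(name, tokenizer_kwargs, trust_remote_code=False):
    # One possible guard: copy the dict and pop the duplicate key before forwarding.
    tokenizer_kwargs = dict(tokenizer_kwargs)
    trust_remote_code = tokenizer_kwargs.pop("trust_remote_code", trust_remote_code)
    return from_pretrained(name, trust_remote_code=trust_remote_code, **tokenizer_kwargs)


# Assumed scenario: the caller's kwargs dict already carries the flag.
kwargs_with_flag = {"trust_remote_code": True}

try:
    load_buggy("some/model", kwargs_with_flag, trust_remote_code=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(load_fixed("some/model", kwargs_with_flag, trust_remote_code=True))
```

The guard copies the dict before popping so the caller's kwargs are left unchanged. Whether the actual fix in this PR takes the same approach is not shown in this thread.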
Thanks for the fix! I think jina-embeddings-v5-text-small has the same issue. Feel free to open a PR there too, otherwise I'll push the fix shortly.
jupyterjazz changed pull request status to merged
Thanks for reporting! I applied the fix to the small version as well: https://huggingface.co/jinaai/jina-embeddings-v5-text-small/discussions/19