Please add usage scenario examples
Hello.
Thank you for sharing the great new model weights. Your previous work is a piece of art: I use the jina-code-embeddings model and I'm happy with its performance, especially since I can perform two-stage reranking, using short Matryoshka vectors as a coarse filter and the full vectors afterwards.
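For context, my two-stage setup looks roughly like this (a self-contained sketch with made-up 8-dimensional vectors, where the first 4 dimensions stand in for the Matryoshka prefix; the real pipeline uses the model's embeddings instead):

```python
import math

def cos(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Dummy corpus: full 8-dim vectors; the first 4 dims act as the Matryoshka prefix.
docs = [
    [0.9, 0.1, 0.0, 0.2, 0.3, 0.1, 0.0, 0.5],
    [0.1, 0.8, 0.3, 0.0, 0.2, 0.4, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.3, 0.2, 0.0, 0.4, 0.1],
]
query = [0.8, 0.1, 0.1, 0.2, 0.3, 0.1, 0.1, 0.4]

# Stage 1: coarse filter on the truncated (Matryoshka) vectors.
coarse = sorted(range(len(docs)),
                key=lambda i: cos(query[:4], docs[i][:4]),
                reverse=True)[:2]
# Stage 2: rerank the survivors with the full vectors.
ranked = sorted(coarse, key=lambda i: cos(query, docs[i]), reverse=True)
print(ranked)
```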
Please add some usage examples, e.g. the full prompts for each of the newly added models. Do they expect specific tokens, such as "Document: ", at specific positions? It would be great to see the full flows for clustering, retrieval, classification and matching to get a feel for the exact use cases.
Thank you. Best regards.
If you scroll down in the README to the Usage section, there are already short examples that explain how to use the model, e.g., with transformers and sentence-transformers if you want to run it locally (you just need to expand the subsections).
The prompts are added via the prompt name argument (either "query" or "document"); based on it, either "Query: " or "Document: " is prepended to the texts. For retrieval, "query" is used for queries. In every other case, and for every other task, the prompt name should always be "document".
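Spelled out as code (the helper name is just for illustration, not part of the API):

```python
def prompt_name_for(task: str, is_query: bool = False) -> str:
    # "query" only for retrieval queries; "document" everywhere else.
    return "query" if task == "retrieval" and is_query else "document"

print(prompt_name_for("retrieval", is_query=True))  # query
print(prompt_name_for("retrieval"))                 # document
print(prompt_name_for("clustering"))                # document
```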
One thing to keep in mind is that this model requires the peft package to be installed (pip install peft).
Is this what you are looking for?
Here is one of the examples from the README (for transformers):
from transformers import AutoModel
import torch
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v5-text-small",
    trust_remote_code=True,
    _attn_implementation="flash_attention_2",  # Recommended but optional
    dtype=torch.bfloat16,  # Recommended for GPUs
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device=device)
# Optional: set truncate_dim and max_length in encode() to control embedding size and input length
# ========================
# 1. Retrieval Task
# ========================
# Encode query
query_embeddings = model.encode(
    texts=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
)
# Encode document
document_embeddings = model.encode(
    texts=[
        "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
    ],
    task="retrieval",
    prompt_name="document",
)
# ========================
# 2. Text Matching Task
# ========================
texts = [
    "غروب جميل على الشاطئ",  # Arabic
    "海滩上美丽的日落",  # Chinese
    "Un beau coucher de soleil sur la plage",  # French
    "Ein wunderschöner Sonnenuntergang am Strand",  # German
    "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
    "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
    "Un bellissimo tramonto sulla spiaggia",  # Italian
    "浜辺に沈む美しい夕日",  # Japanese
    "해변 위로 아름다운 일몰",  # Korean
]
text_embeddings = model.encode(texts=texts, task="text-matching")
# ========================
# 3. Classification Task
# ========================
texts = [
    "My order hasn't arrived yet and it's been two weeks.",
    "How do I reset my password?",
    "I'd like a refund for my recent purchase.",
    "Your product exceeded my expectations. Great job!",
]
classification_embeddings = model.encode(texts=texts, task="classification")
# ========================
# 4. Clustering Task
# ========================
texts = [
    "We propose a novel neural network architecture for image segmentation.",
    "This paper analyzes the effects of monetary policy on inflation.",
    "Our method achieves state-of-the-art results on object detection benchmarks.",
    "We study the relationship between interest rates and housing prices.",
    "A new attention mechanism is introduced for visual recognition tasks.",
]
clustering_embeddings = model.encode(texts=texts, task="clustering")
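The returned embeddings are plain vectors that feed directly into standard algorithms; for the clustering task, for example, k-means. A tiny pure-Python Lloyd-iteration sketch on dummy 2-D vectors standing in for clustering_embeddings (the data and cluster count are made up):

```python
# Dummy 2-D stand-ins for clustering_embeddings: two obvious groups.
points = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.9, 0.1], [0.8, 0.2]]

def assign(points, centers):
    # Label each point with the index of its nearest center (squared Euclidean).
    return [min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points]

centers = [points[0], points[3]]  # naive init: one seed per expected group
for _ in range(5):                # a few Lloyd iterations
    labels = assign(points, centers)
    for c in range(len(centers)):
        members = [p for p, l in zip(points, labels) if l == c]
        centers[c] = [sum(col) / len(members) for col in zip(*members)]

print(labels)  # [0, 0, 0, 1, 1]
```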
Hello @michael-guenther , thank you for the recommendation.
This is almost everything I need. I think I'll be able to read the transformers source code to understand how the prompts are constructed.
I plan to use GGUF weights with the llama.cpp server, the --pooling last argument and the /embedding endpoint.
Here is how I used it with jina-code-embeddings:
./llama-server --embedding -m ~/jina-code-embeddings-0.5b-IQ4_NL.gguf --host 0.0.0.0 --port 8082 --ctx-size 32768 --ubatch-size 8192 --pooling last -a "jina-code-emb" --api-key "my-secret"
Usage example:
curl -X POST "http://localhost:8082/embedding" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret" \
  -d '{
    "content": "Candidate code snippet:\nprint(\"Hello world\")"
  }'
# or
curl -X POST "http://localhost:8082/embedding" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret" \
  -d '{
    "content": "Find the most relevant code snippet given the following query:\ngreetings in Python"
  }'
Right now it's a bit cryptic how to do the same with the new models, because the actual prompts are hidden behind the transformers abstractions.
Under the "llama.cpp (GGUF)" section you can find links to model versions with GGUF exports that you can use, e.g., this one for retrieval: https://huggingface.co/jinaai/jina-embeddings-v5-text-small-retrieval
The limitation of these exports is that they do not allow you to switch between task categories. There you can also find examples with the prompts ("Query: " or "Document: ").
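For the GGUF route, the same "Query: " / "Document: " prefixes simply go into the request body yourself. A minimal sketch of building the payloads with the standard library, reusing the "content" field from the curl examples above (the texts are placeholders, and the prefix mapping follows the prompt-name rule described earlier in this thread):

```python
import json

# "query" -> "Query: ", "document" -> "Document: "
PREFIXES = {"query": "Query: ", "document": "Document: "}

def build_payload(text: str, prompt_name: str) -> str:
    # JSON body for the llama.cpp /embedding endpoint's "content" field.
    return json.dumps({"content": PREFIXES[prompt_name] + text})

print(build_payload("Overview of climate change impacts on coastal cities", "query"))
print(build_payload("Climate change has led to rising sea levels...", "document"))
```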