Integrate with Sentence Transformers v5.4
Hello!
Pull Request overview
- Integrate E5-V with Sentence Transformers v5.4
Details
The integration uses a Transformer (feature-extraction) -> Pooling (lasttoken) -> Normalize pipeline. A custom `chat_template.jinja` was added that automatically applies the correct instruction suffix depending on whether the input is text-only or contains an image, so users simply pass raw text strings or PIL images to `model.encode()`. The processor config was updated with the `patch_size` and `vision_feature_select_strategy` fields required by newer transformers versions (the model was originally published for transformers 4.41.2).
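For intuition, the Pooling (lasttoken) -> Normalize steps amount to selecting the hidden state of the last non-padded token in each sequence and L2-normalizing it. The following is a minimal sketch with dummy tensors, not the actual Sentence Transformers internals; the helper name and small dimensions are illustrative (the real embedding dimension is 4096):

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the hidden state of the last non-padded token per sequence."""
    # Index of the last attended token in each sequence
    last_idx = attention_mask.sum(dim=1) - 1  # (batch,)
    batch_idx = torch.arange(hidden_states.size(0))
    return hidden_states[batch_idx, last_idx]  # (batch, hidden_dim)

# Dummy batch: 2 sequences, 5 tokens, hidden dim 8
hidden = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])

pooled = last_token_pool(hidden, mask)
# The Normalize module L2-normalizes each embedding
embeddings = torch.nn.functional.normalize(pooled, p=2, dim=1)
print(embeddings.shape)        # (2, 8)
print(embeddings.norm(dim=1))  # each ~1.0
```

Because the embeddings are unit-normalized, cosine similarity reduces to a plain dot product, which is what `model.similarity()` computes below.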
Added files:
- `modules.json`: Defines the Transformer -> Pooling -> Normalize module pipeline
- `config_sentence_transformers.json`: ST model metadata (cosine similarity, no prompts)
- `sentence_bert_config.json`: Transformer config with feature-extraction task, modality config for text/image/message, and `add_generation_prompt` processing kwarg
- `1_Pooling/config.json`: Lasttoken pooling with embedding dimension 4096
- `chat_template.jinja`: Custom chat template that wraps text with "Summary above sentence in one word:" and images with "Summary above image in one word:", using the LLaMA 3 format
- `processor_config.json`: Adds `patch_size: 14` and `vision_feature_select_strategy: "full"` required by transformers v5
- `assets/dog.jpg`, `assets/cat.jpg`: Example images for the usage snippet; these are the same as the Wikipedia ones, but Wikipedia requires User-Agents now
- `test_baseline.py`: Baseline script using transformers directly
- `test_st.py`: Sentence Transformers integration test script
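The wrapping behavior of the chat template can be sketched in plain Python. The LLaMA 3 special tokens below are written from memory and the exact layout is an assumption; `chat_template.jinja` in this PR is the authoritative version, and this function only illustrates the branching on text vs. image inputs:

```python
def wrap_prompt(content: str, is_image: bool) -> str:
    """Approximate the instruction wrapping applied by the chat template.

    The <|start_header_id|>/<|eot_id|> tokens follow the LLaMA 3 chat format;
    the exact layout here is an assumption, see chat_template.jinja for the
    real template.
    """
    if is_image:
        suffix = "\nSummary above image in one word: "
    else:
        suffix = "\nSummary above sentence in one word: "
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        + content
        + suffix
        + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(wrap_prompt("A dog sitting in the grass.", is_image=False))
print(wrap_prompt("<image>", is_image=True))
```

This is what lets `model.encode()` accept raw strings and images directly: the template picks the correct instruction suffix per input, so no manual prompt formatting is needed.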
Modified files:
- `README.md`: Added the `sentence-transformers` library tag, the `sentence-similarity` pipeline tag, and a "Using Sentence Transformers" usage section. Updated image URLs to use valid Wikipedia thumbnail sizes.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("royokong/e5-v")

# Encode text inputs
texts = [
    "A dog sitting in the grass.",
    "A dog standing in the snow.",
    "A cat sitting in the grass.",
    "A cat standing in the snow.",
]
text_embeddings = model.encode(texts)
print(text_embeddings.shape)
# (4, 4096)

# Encode image inputs
images = [
    "https://huggingface.co/royokong/e5-v/resolve/main/assets/dog.jpg",
    "https://huggingface.co/royokong/e5-v/resolve/main/assets/cat.jpg",
]
image_embeddings = model.encode(images)
print(image_embeddings.shape)
# (2, 4096)

# Compute text-image similarities
similarities = model.similarity(text_embeddings, image_embeddings)
print(similarities)
# tensor([[0.7183, 0.3579],
#         [0.5806, 0.5522],
#         [0.4714, 0.6479],
#         [0.4150, 0.8081]])
```
This script won't work until the PR is merged, as the assets don't exist on the main branch yet. To test it locally, replace the image URLs with local paths to the downloaded images, and load the model with `SentenceTransformer("royokong/e5-v", revision="refs/pr/12")` to try it out before merging.
Note: the image similarity scores from the Sentence Transformers snippet differ slightly from the transformers snippet (e.g. 0.7183 vs 0.7275 for the first pair). This is because the transformers example scores were produced with transformers 4.41.2 (the version the model was originally published with), while Sentence Transformers uses a newer transformers version (v5.x) which has changes in the image processing pipeline. Running the transformers example with v5.x produces scores that match the Sentence Transformers output. Text-only embeddings are identical across both versions.
Note that none of the existing behaviour is affected or changed; this PR only adds an additional way to run the model in a familiar and common format.
- Tom Aarsen