Integrate with Sentence Transformers v5.4

#24
by tomaarsen HF Staff - opened

Hello!

Pull Request overview

  • Integrate this model as a Sentence Transformers CrossEncoder

Details

This PR adds the configuration files and code changes needed to load this model directly as a CrossEncoder via Sentence Transformers, with support for all four modality combinations: text-to-text, text-to-image, image-to-text, and image-to-image reranking.

The model's JinaVLForRanking architecture has several quirks that required handling:

  1. The model requires a special score token (ID 100) appended after tokenization. The forward() method now auto-appends it if not already present, so both compute_score (which appended it manually) and Sentence Transformers (which doesn't) work correctly.
  2. The model expects **Document**:\n{doc}\n**Query**:\n{query} formatting rather than the standard Qwen2-VL chat template. A new chat_template.jinja replaces the original template to produce this format from Sentence Transformers' structured query/document messages.
  3. A custom JinaRerankerTransformer module swaps image-image pairs before preprocessing so the processor extracts images in doc-first order, matching the chat template's rendering order.
  4. Moved the sigmoid+bias score normalization (sigmoid(logit - 2.65)) from compute_score into forward(), so Sentence Transformers gets normalized [0, 1] scores directly.
  5. Fixed config.hidden_size access (now via text_config) and tie_word_embeddings (disabled before init to avoid nn.Identity lm_head issues), and extended mm_token_type_ids alongside input_ids/attention_mask when appending the score token.
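Points 1, 2, and 4 above can be illustrated with a minimal pure-Python sketch. The helper names below (`append_score_token`, `render_prompt`, `normalize_score`) are illustrative stand-ins, not the actual `modeling.py` API; the score-token ID and bias constant are the values stated in this PR.

```python
import math

SCORE_TOKEN_ID = 100  # special score token appended after tokenization (point 1)
SCORE_BIAS = 2.65     # bias subtracted before the sigmoid (point 4)

def append_score_token(input_ids, attention_mask):
    """Append the score token only if it is not already last, so callers that
    appended it manually (compute_score) and callers that did not
    (Sentence Transformers) both produce the same sequence."""
    if input_ids[-1] != SCORE_TOKEN_ID:
        input_ids = input_ids + [SCORE_TOKEN_ID]
        attention_mask = attention_mask + [1]
    return input_ids, attention_mask

def render_prompt(query, doc):
    """The document-first prompt format the model expects (point 2),
    which chat_template.jinja produces instead of the Qwen2-VL template."""
    return f"**Document**:\n{doc}\n**Query**:\n{query}"

def normalize_score(logit):
    """sigmoid(logit - 2.65): maps the raw ranking logit into [0, 1] (point 4)."""
    return 1.0 / (1.0 + math.exp(-(logit - SCORE_BIAS)))
```

Note that `append_score_token` is idempotent, which is exactly why both code paths can share one `forward()`: appending twice would shift the position the score is read from.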

Added files:

  • modules.json: pipeline with a single custom JinaRerankerTransformer module
  • sentence_bert_config.json: feature-extraction task (to load via AutoModel), structured message format, custom method_output_name: null to capture raw forward output as scores
  • config_sentence_transformers.json: CrossEncoder model type with Identity activation (model already returns normalized scores)
  • custom_transformer.py: JinaRerankerTransformer that swaps image-image pair order in preprocess to fix image extraction ordering
  • chat_template.jinja: reranking-specific template with query/document roles and vision token support
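The pair-swapping behaviour of custom_transformer.py can be sketched as a standalone helper. This is a rough illustration, not the real module: the actual implementation works on Sentence Transformers' structured messages before preprocessing, and `is_image_ref` here is an invented stand-in for real image detection.

```python
def is_image_ref(x):
    """Hypothetical check: treat URLs and common image-file suffixes as images."""
    return isinstance(x, str) and (
        x.startswith(("http://", "https://"))
        or x.lower().endswith((".png", ".jpg", ".jpeg", ".webp"))
    )

def swap_image_image_pairs(pairs):
    """For image-image (query, doc) pairs, reorder to doc-first so the processor
    extracts images in the same order the chat template renders them
    (document before query). Mixed and text-text pairs pass through unchanged."""
    return [
        (doc, query) if is_image_ref(query) and is_image_ref(doc) else (query, doc)
        for query, doc in pairs
    ]
```

Without this reordering, the processor would feed the query image into the document slot of the rendered prompt (and vice versa), silently degrading image-to-image scores.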

Changed files:

  • modeling.py: auto-append score token in forward(), move sigmoid normalization from compute_score to forward(), fix hidden_size/tie_word_embeddings compatibility, extend mm_token_type_ids on score token append, add explicit vision params to forward() signature
  • preprocessor_config.json: set max_pixels to 602112 to match compute_score's processor settings
  • tokenizer_config.json: point chat_template to new chat_template.jinja
  • README.md: added sentence-transformers tag, added usage section showing all four modality combinations (text-to-text, text-to-image, image-to-text, image-to-image)

Once the Sentence Transformers v5.4 release is out, the model can be used immediately like so:

from sentence_transformers import CrossEncoder

model = CrossEncoder("jinaai/jina-reranker-m0", trust_remote_code=True, revision="refs/pr/24")

query = "slm markdown"
documents = [
    "We present ReaderLM-v2, a compact 1.5 billion parameter language model...",
    "数据提取么?为什么不用正则啊,你用正则不就全解决了么?",
    "During the California Gold Rush, some merchants made more money selling supplies to miners than the miners made finding gold.",
]

# Text-only reranking
rankings = model.rank(query, documents)
print(rankings)
# [{'corpus_id': 0, 'score': 0.6875}, {'corpus_id': 2, 'score': 0.5938}, {'corpus_id': 1, 'score': 0.4434}]

# Text-to-image reranking
image_docs = [
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png",
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png",
]
scores = model.predict([(query, doc) for doc in image_docs])
print(scores)
# [0.7813 0.4980]

Once this PR is merged, the revision argument can be dropped.

Note that none of the existing behaviour is changed; feel free to double-check this. The PR only adds an additional way to run this model in a familiar and common format.

  • Tom Aarsen
tomaarsen changed pull request status to open

LGTM! Really appreciate your effort @tomaarsen to make m0 available in sentence-transformers.

numb3r3 changed pull request status to merged
