Integrate with Sentence Transformers v5.4

#24
by tomaarsen HF Staff - opened

Hello!

Pull Request overview

  • Integrate this model as a Sentence Transformers CrossEncoder

Details

This PR adds the configuration files and code changes needed to load this model directly as a CrossEncoder via Sentence Transformers, with support for all four modality combinations: text-to-text, text-to-image, image-to-text, and image-to-image reranking.

The model's JinaVLForRanking architecture has several quirks that required handling:

  1. The model requires a special score token (ID 100) appended after tokenization. The forward() method now auto-appends it if not already present, so both compute_score (which appended it manually) and Sentence Transformers (which doesn't) work correctly.
  2. The model expects **Document**:\n{doc}\n**Query**:\n{query} formatting rather than the standard Qwen2-VL chat template. A new chat_template.jinja replaces the original template to produce this format from Sentence Transformers' structured query/document messages.
  3. A custom JinaRerankerTransformer module swaps image-image pairs before preprocessing so the processor extracts images in doc-first order, matching the chat template's rendering order.
  4. Moved the sigmoid+bias score normalization (sigmoid(logit - 2.65)) from compute_score into forward(), so Sentence Transformers gets normalized [0, 1] scores directly.
  5. Fixed config.hidden_size access (now via text_config) and tie_word_embeddings (disabled before init to avoid nn.Identity lm_head issues), and extended mm_token_type_ids alongside input_ids/attention_mask when appending the score token.
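Points 1, 2, and 4 above can be illustrated with a minimal pure-Python sketch. The helper names below (`append_score_token`, `render_prompt`, `normalize_score`) are illustrative stand-ins, not the actual `modeling.py` API; the score-token ID and bias constant are the values stated in this PR.

```python
import math

SCORE_TOKEN_ID = 100  # special score token appended after tokenization (point 1)
SCORE_BIAS = 2.65     # bias subtracted before the sigmoid (point 4)

def append_score_token(input_ids, attention_mask):
    """Append the score token only if it is not already last, so callers that
    appended it manually (compute_score) and callers that did not
    (Sentence Transformers) both produce the same sequence."""
    if input_ids[-1] != SCORE_TOKEN_ID:
        input_ids = input_ids + [SCORE_TOKEN_ID]
        attention_mask = attention_mask + [1]
    return input_ids, attention_mask

def render_prompt(query, doc):
    """The document-first prompt format the model expects (point 2),
    which chat_template.jinja produces instead of the Qwen2-VL template."""
    return f"**Document**:\n{doc}\n**Query**:\n{query}"

def normalize_score(logit):
    """sigmoid(logit - 2.65): maps the raw ranking logit into [0, 1] (point 4)."""
    return 1.0 / (1.0 + math.exp(-(logit - SCORE_BIAS)))
```

Note that `append_score_token` is idempotent, which is exactly why both code paths can share one `forward()`: appending twice would shift the position the score is read from.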

Added files:

  • modules.json: pipeline with a single custom JinaRerankerTransformer module
  • sentence_bert_config.json: feature-extraction task (to load via AutoModel), structured message format, custom method_output_name: null to capture raw forward output as scores
  • config_sentence_transformers.json: CrossEncoder model type with Identity activation (model already returns normalized scores)
  • custom_transformer.py: JinaRerankerTransformer that swaps image-image pair order in preprocess to fix image extraction ordering
  • chat_template.jinja: reranking-specific template with query/document roles and vision token support
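The pair-swapping behaviour of custom_transformer.py can be sketched as a standalone helper. This is a rough illustration, not the real module: the actual implementation works on Sentence Transformers' structured messages before preprocessing, and `is_image_ref` here is an invented stand-in for real image detection.

```python
def is_image_ref(x):
    """Hypothetical check: treat URLs and common image-file suffixes as images."""
    return isinstance(x, str) and (
        x.startswith(("http://", "https://"))
        or x.lower().endswith((".png", ".jpg", ".jpeg", ".webp"))
    )

def swap_image_image_pairs(pairs):
    """For image-image (query, doc) pairs, reorder to doc-first so the processor
    extracts images in the same order the chat template renders them
    (document before query). Mixed and text-text pairs pass through unchanged."""
    return [
        (doc, query) if is_image_ref(query) and is_image_ref(doc) else (query, doc)
        for query, doc in pairs
    ]
```

Without this reordering, the processor would feed the query image into the document slot of the rendered prompt (and vice versa), silently degrading image-to-image scores.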

Changed files:

  • modeling.py: auto-append score token in forward(), move sigmoid normalization from compute_score to forward(), fix hidden_size/tie_word_embeddings compatibility, extend mm_token_type_ids on score token append, add explicit vision params to forward() signature
  • preprocessor_config.json: set max_pixels to 602112 to match compute_score's processor settings
  • tokenizer_config.json: point chat_template to new chat_template.jinja
  • README.md: added sentence-transformers tag, added usage section showing all four modality combinations (text-to-text, text-to-image, image-to-text, image-to-image)

Once the Sentence Transformers v5.4 release is out, the model can be used immediately like so:

from sentence_transformers import CrossEncoder

model = CrossEncoder("jinaai/jina-reranker-m0", trust_remote_code=True, revision="refs/pr/24")

query = "slm markdown"
documents = [
    "We present ReaderLM-v2, a compact 1.5 billion parameter language model...",
    "数据提取么?为什么不用正则啊,你用正则不就全解决了么?",
    "During the California Gold Rush, some merchants made more money selling supplies to miners than the miners made finding gold.",
]

# Text-only reranking
rankings = model.rank(query, documents)
print(rankings)
# [{'corpus_id': 0, 'score': 0.6875}, {'corpus_id': 2, 'score': 0.5938}, {'corpus_id': 1, 'score': 0.4434}]

# Text-to-image reranking
image_docs = [
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png",
    "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png",
]
scores = model.predict([(query, doc) for doc in image_docs])
print(scores)
# [0.7813 0.4980]

Once this PR is merged, the revision argument can be dropped.

Note that none of the existing behaviour is changed; feel free to double-check this. The PR only adds an additional way to run this model in a familiar and common format.

  • Tom Aarsen
tomaarsen changed pull request status to open

LGTM! Really appreciate your effort @tomaarsen to make m0 available in sentence-transformers.

numb3r3 changed pull request status to merged
