Integrate with Sentence Transformers v5.4

#1
by tomaarsen HF Staff - opened

Hello!

Preface

Congratulations on the release!
Your paper (https://arxiv.org/pdf/2605.06132) also mentions a 0.6B model; will that be released as well? I'm also looking forward to the README/model card with more details.

Pull Request overview

  • Add Sentence Transformers CrossEncoder support
  • Replace chat_template.jinja with the Qwen3-Reranker chat template

Details

I saw from https://x.com/_reachsumit/status/2052593269528535510 that this model was released and that it is a finetune of Qwen/Qwen3-Reranker-4B using the same yes/no logit-score reranking head, so the Sentence Transformers CrossEncoder configuration can be copied directly from the base model. With these files in place, the model loads as a CrossEncoder in one line and reranks (query, document) pairs with .predict(...) or .rank(...):

import torch
from sentence_transformers import CrossEncoder

model = CrossEncoder(
    "IAAR-Shanghai/MemReranker-4B",
    model_kwargs={"torch_dtype": torch.bfloat16, "device_map": "cuda"},
)

query = "What is the capital of China?"
documents = [
    "The capital of China is Beijing.",
    "Beijing has been the capital of various Chinese dynasties.",
    "Python is a popular programming language.",
]

# Raw logit-difference scores
scores = model.predict([(query, doc) for doc in documents])
# tensor([2.969, 1.328, -3.141])

# 0-1 probability via sigmoid
probs = model.predict(
    [(query, doc) for doc in documents],
    activation_fn=torch.nn.Sigmoid(),
)

# Or get a sorted ranking directly
print(model.rank(query, documents))

The default prompt name is "query", which injects the instruction "Given a web search query, retrieve relevant passages that answer the query" into the chat template. A custom instruction can be passed via prompts={"my_task": "..."} and default_prompt_name="my_task" on CrossEncoder(...).
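For instance (the "my_task" name and the instruction text below are illustrative placeholders, not from the repo; only the prompts/default_prompt_name parameters come from the CrossEncoder API):

model = CrossEncoder(
    "IAAR-Shanghai/MemReranker-4B",
    model_kwargs={"torch_dtype": torch.bfloat16, "device_map": "cuda"},
    # Hypothetical task name and instruction, for illustration only:
    prompts={"my_task": "Given a user question, retrieve memory snippets that answer it"},
    default_prompt_name="my_task",
)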

Files added:

  • 1_LogitScore/config.json: LogitScore module config with true_token_id=9693 (yes) and false_token_id=2152 (no); the score is the log-odds of yes vs. no at the final token position (see the sketch after this list).
  • modules.json: Sentence Transformers pipeline (Transformer -> LogitScore).
  • sentence_bert_config.json: Transformer module config (text-generation task emitting causal_logits).
  • config_sentence_transformers.json: CrossEncoder metadata and the default query prompt.
  • special_tokens_map.json: copied from Qwen/Qwen3-Reranker-4B for completeness.
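In Python terms, the LogitScore head boils down to a logit difference. A minimal sketch of what it computes (not the actual Sentence Transformers module, just the math, with the token ids from 1_LogitScore/config.json):

import torch

TRUE_TOKEN_ID = 9693   # token id of "yes"
FALSE_TOKEN_ID = 2152  # token id of "no"

def logit_score(causal_logits: torch.Tensor) -> torch.Tensor:
    """causal_logits: (batch, seq_len, vocab) output of the causal LM."""
    last = causal_logits[:, -1, :]  # logits at the final token position
    # log p(yes) - log p(no) under a softmax reduces to the raw logit difference
    return last[:, TRUE_TOKEN_ID] - last[:, FALSE_TOKEN_ID]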

The chat_template.jinja was replaced because the file currently in the repo is the base Qwen3 instruct template (the one that ships with the base model), not the reranker template. The reranker template takes system / query / document roles and is what Sentence Transformers' chat-templated CrossEncoder API expects. P.S. We can also put the chat template in an additional_chat_templates folder if you want to preserve the current one.
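For reference, the reranker template renders to roughly the following prompt shape. This is a sketch based on the usage example of the base Qwen/Qwen3-Reranker-4B model (the exact text and whitespace live in chat_template.jinja), reusing query and documents from the snippet above:

instruction = "Given a web search query, retrieve relevant passages that answer the query"
prompt = (
    "<|im_start|>system\n"
    "Judge whether the Document meets the requirements based on the Query "
    'and the Instruct provided. Note that the answer can only be "yes" or "no".'
    "<|im_end|>\n"
    "<|im_start|>user\n"
    f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {documents[0]}"
    "<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n\n</think>\n\n"
)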

I validated the result with CrossEncoderNanoBEIREvaluator on NanoMSMARCO, reranking the top 100 BM25 candidates; a sketch of the evaluation setup follows the table:

Metric    BM25    MemReranker-4B    Δ
NDCG@10   54.04   69.45             +15.41
MAP       48.96   61.63             +12.67
MRR@10    47.75   61.18             +13.43
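Roughly, the evaluation looks like this (a sketch assuming the public CrossEncoderNanoBEIREvaluator API, with model from the first snippet above):

from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

# Evaluate on NanoMSMARCO, reranking the top 100 BM25 candidates per query
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco"],
    rerank_k=100,
    at_k=10,
)
results = evaluator(model)  # dict with NDCG@10, MAP, MRR@10, etc.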

You don't have to adopt this as-is; I simply copied what worked for Qwen3-Reranker, and perhaps you're using different prompts, etc. Either way, I wanted to let you know that these rerankers now work very conveniently with Sentence Transformers (which acts as a convenience wrapper around transformers here).

  • Tom Aarsen
tomaarsen changed pull request status to open
kk04jy changed pull request status to merged
