Fix vLLM usage: add required score template for correct reranking
#13
by nvidia-oliver-holworthy - opened
The existing vLLM usage instructions produce incorrect scores because the `question: ... passage: ...` prompt template is not applied. Without a score template explicitly passed via `--chat-template`, vLLM falls back to simple sep-token concatenation of the query and document, bypassing the prompt format the model was trained with.
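To illustrate the difference, here is a minimal sketch of the two prompt shapes. The separator token and the helper names are assumptions for illustration only; the real template body is the `nemotron-rerank.jinja` file added to the README, not this snippet.

```python
def templated_prompt(query: str, passage: str) -> str:
    # Prompt format the reranker was trained with, per this PR's description.
    return f"question: {query} passage: {passage}"

def fallback_prompt(query: str, passage: str, sep: str = "</s>") -> str:
    # What vLLM does without a score template: plain sep-token concatenation.
    # The actual separator token is an assumption here.
    return f"{query}{sep}{passage}"

q = "what does the reranker score?"
p = "The reranker scores query-passage relevance."
print(templated_prompt(q, p))  # trained format
print(fallback_prompt(q, p))   # fallback format the model never saw in training
```

Because the model only saw the first form during training, scoring inputs in the second form degrades ranking quality, which is why the template must be passed explicitly.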
Changes
- Add the `nemotron-rerank.jinja` score template inline in the README (so users don't need a local vLLM clone)
- Add `--chat-template nemotron-rerank.jinja` to the `vllm serve` command
- Add `chat_template=SCORE_TEMPLATE` to the offline `llm.score()` example
- Add a note explaining that the template is required
nvidia-oliver-holworthy changed pull request status to open
BoLiu changed pull request status to merged