Fix vLLM usage: add required score template for correct reranking
#13
by nvidia-oliver-holworthy - opened
The existing vLLM usage instructions produce incorrect scores because the `question: ... passage: ...` prompt template is not applied. Without a score template explicitly passed via `--chat-template`, vLLM falls back to simple sep-token concatenation of the query and document, bypassing the prompt format the model was trained with.
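To illustrate the difference, here is a minimal sketch of the two prompt shapes. The separator token and the helper names are assumptions for illustration only; the real template body is the `nemotron-rerank.jinja` file added to the README, not this snippet.

```python
def templated_prompt(query: str, passage: str) -> str:
    # Prompt format the reranker was trained with, per this PR's description.
    return f"question: {query} passage: {passage}"

def fallback_prompt(query: str, passage: str, sep: str = "</s>") -> str:
    # What vLLM does without a score template: plain sep-token concatenation.
    # The actual separator token is an assumption here.
    return f"{query}{sep}{passage}"

q = "what does the reranker score?"
p = "The reranker scores query-passage relevance."
print(templated_prompt(q, p))  # trained format
print(fallback_prompt(q, p))   # fallback format the model never saw in training
```

Because the model only saw the first form during training, scoring inputs in the second form degrades ranking quality, which is why the template must be passed explicitly.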
Changes
- Add the `nemotron-rerank.jinja` score template inline in the README (so users don't need a local vLLM clone)
- Add `--chat-template nemotron-rerank.jinja` to the `vllm serve` command
- Add `chat_template=SCORE_TEMPLATE` to the offline `llm.score()` example
- Add a note explaining that the template is required
nvidia-oliver-holworthy changed pull request status to open
BoLiu changed pull request status to merged