nvidia/llama-nemotron-embed-vl-1b-v2 · Add vLLM Usage example to README

Add vLLM Usage example to README

#7

by nvidia-oliver-holworthy - opened 11 days ago

base: refs/heads/main

←

from: refs/pr/7

Discussion Files changed

nvidia-oliver-holworthy

NVIDIA org 11 days ago

•

edited 11 days ago

Add vLLM usage instructions to README covering online serving (/v1/embeddings API with chat template) and offline/in-process (llm.embed()) usage
Examples for all supported modalities: text query, text document, image document, and image+text document
Verified outputs match transformers reference (>0.9999 cosine similarity across all modalities)

Test plan

Offline vLLM embedding produces 2048-dim vectors for all four modalities
Online vLLM server returns 200 on /v1/embeddings for all four modalities
Cosine similarity between transformers and vLLM (offline + online) embeddings >0.9999 for all embedding types

Add vLLM usage instructionsb2ea0b3a

nvidia-oliver-holworthy changed pull request status to open 11 days ago

BoLiu changed pull request status to merged 11 days ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment