Add vLLM Usage example to README

#7
  • Add vLLM usage instructions to README covering online serving (/v1/embeddings API with chat template) and offline/in-process (llm.embed()) usage
  • Examples for all supported modalities: text query, text document, image document, and image+text document
  • Verified outputs match transformers reference (>0.9999 cosine similarity across all modalities)

Test plan

  • Offline vLLM embedding produces 2048-dim vectors for all four modalities
  • Online vLLM server returns 200 on /v1/embeddings for all four modalities
  • Cosine similarity between transformers and vLLM (offline + online) embeddings >0.9999 for all embedding types
nvidia-oliver-holworthy changed pull request status to open
BoLiu changed pull request status to merged

Sign up or log in to comment