Add vLLM Usage example to README
#7
by nvidia-oliver-holworthy - opened
- Add vLLM usage instructions to README covering online serving (
/v1/embeddingsAPI with chat template) and offline/in-process (llm.embed()) usage - Examples for all supported modalities: text query, text document, image document, and image+text document
- Verified outputs match transformers reference (>0.9999 cosine similarity across all modalities)
Test plan
- Offline vLLM embedding produces 2048-dim vectors for all four modalities
- Online vLLM server returns 200 on
/v1/embeddingsfor all four modalities - Cosine similarity between transformers and vLLM (offline + online) embeddings >0.9999 for all embedding types
nvidia-oliver-holworthy changed pull request status to open
BoLiu changed pull request status to merged