Inference is slow with A100

#3
by weisunding - opened

I tested the model with vLLM on an A100 (80 GB), and inference is slow. Is this an issue with the Gemma4 dense model?

Red Hat AI org

Hi @weisunding , please look through the thread here to determine the appropriate vLLM benchmark commands for measuring performance gains: https://github.com/vllm-project/llm-compressor/issues/2549
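For reference, a minimal sketch of how vLLM's built-in benchmark commands are typically invoked (the model ID below is a placeholder; substitute the actual checkpoint you are testing, and check the linked thread for the recommended flags):

```shell
# Offline latency benchmark: measures end-to-end latency for a fixed
# batch of synthetic prompts (placeholder model ID).
vllm bench latency \
  --model <your-model-id> \
  --input-len 1024 \
  --output-len 128 \
  --batch-size 8

# Offline throughput benchmark: measures tokens/s over many prompts.
vllm bench throughput \
  --model <your-model-id> \
  --input-len 1024 \
  --output-len 128 \
  --num-prompts 200
```

Comparing these numbers between the quantized and unquantized checkpoints (same input/output lengths, same batch size) is the usual way to confirm whether the slowdown comes from the model itself or from the serving configuration.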
