Inference is slow with A100

#3
by weisunding - opened

I tested the model with vLLM on an A100 (80 GB), and inference is slow. Is this an issue with the Gemma4 dense model?

Red Hat AI org

Hi @weisunding , please look through the thread here to determine the appropriate vLLM benchmark commands for measuring performance gains: https://github.com/vllm-project/llm-compressor/issues/2549
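For reference, a minimal sketch of how vLLM's built-in benchmark commands are typically invoked (the model ID below is a placeholder; substitute the actual checkpoint you are testing, and check the linked thread for the recommended flags):

```shell
# Offline latency benchmark: measures end-to-end latency for a fixed
# batch of synthetic prompts (placeholder model ID).
vllm bench latency \
  --model <your-model-id> \
  --input-len 1024 \
  --output-len 128 \
  --batch-size 8

# Offline throughput benchmark: measures tokens/s over many prompts.
vllm bench throughput \
  --model <your-model-id> \
  --input-len 1024 \
  --output-len 128 \
  --num-prompts 200
```

Comparing these numbers between the quantized and unquantized checkpoints (same input/output lengths, same batch size) is the usual way to confirm whether the slowdown comes from the model itself or from the serving configuration.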
