A100*4 deployment failed

#4
by Namgnik - opened

Could the experts advise how to deploy this successfully?
My server environment:
NVIDIA A100 SXM4-80G * 4
CPU 192 Cores
Memory 530GB
Ubuntu 24.04.2 LTS
NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8
cuda_12.8.r12.8/compiler.35583870_0
Python 3.12.11
transformers.version 4.54.1
torch.version 2.7.1+cu126
numpy.version 2.2.6
torchvision.version 0.22.1+cu126
vllm version 0.10.0
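Since the failures below all trace back to GPU capability, it helps to check what the hardware reports before launching vLLM. A minimal sketch that encodes vLLM's FP8 requirement as a standalone check (the `supports_fp8` helper is just an illustration, not a vLLM API; on a live machine the `(major, minor)` pair comes from `torch.cuda.get_device_capability(i)`):

```python
def supports_fp8(major: int, minor: int) -> bool:
    # vLLM's FP8 (E4M3) weight loading needs compute capability >= 8.9
    # (Ada Lovelace / Hopper). The A100 is SM 8.0, so it is excluded.
    return (major, minor) >= (8, 9)

# On a live machine, get (major, minor) for each GPU with:
#   torch.cuda.get_device_capability(i)
print(supports_fp8(8, 0))  # A100 -> False
print(supports_fp8(9, 0))  # H100 -> True
```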

The first startup method:
vllm serve "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
(error screenshot attached)

The second startup method:
The model file has been downloaded locally.
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "/home/dfzb/models/Qwen3-235B-A22B-Instruct-2507-FP8" --tensor_parallel_size=4 &
(error screenshots attached)

The third startup method:
(error screenshots attached)
It looks like my device doesn't support running FP8 models.

I humbly ask the experts: where did I go wrong, or is a software dependency missing?

As the screenshot shows, FP8 requires compute capability 8.9 or higher, and the A100 is 8.0. You can try a different quantization or upgrade your hardware.
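For context on why the FP8 checkpoint was attractive in the first place, here is a rough back-of-envelope on weight memory for a 235B-parameter model against the 4x80 GB available (illustrative arithmetic only; real serving also needs KV cache and activation memory):

```python
# Rough weight-memory estimate for a 235B-parameter model on 4x A100-80G.
# Illustrative only: real serving also needs KV cache and activation memory.
PARAMS_B = 235          # total parameters, in billions
GPU_MEM_GB = 4 * 80     # 320 GB aggregate across the four cards

for fmt, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    weights_gb = PARAMS_B * bytes_per_param
    fits = weights_gb < GPU_MEM_GB
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights, fits in {GPU_MEM_GB} GB: {fits}")
```

So BF16 (~470 GB) does not fit at all, FP8 (~235 GB) would fit but is unsupported on the A100's SM 8.0, and a 4-bit quantization (~118 GB) is the practical fallback on this hardware.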
