A100*4 deployment failed

#4
by Namgnik - opened

Could the experts advise how to deploy this successfully?
My server environment:
NVIDIA A100 SXM4-80G * 4
CPU 192 Cores
Memory 530GB
Ubuntu 24.04.2 LTS
NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8
cuda_12.8.r12.8/compiler.35583870_0
Python 3.12.11
transformers.version 4.54.1
torch.version 2.7.1+cu126
numpy.version 2.2.6
torchvision.version 0.22.1+cu126
vllm version 0.10.0
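Since the failures below all trace back to GPU capability, it helps to check what the hardware reports before launching vLLM. A minimal sketch that encodes vLLM's FP8 requirement as a standalone check (the `supports_fp8` helper is just an illustration, not a vLLM API; on a live machine the `(major, minor)` pair comes from `torch.cuda.get_device_capability(i)`):

```python
def supports_fp8(major: int, minor: int) -> bool:
    # vLLM's FP8 (E4M3) weight loading needs compute capability >= 8.9
    # (Ada Lovelace / Hopper). The A100 is SM 8.0, so it is excluded.
    return (major, minor) >= (8, 9)

# On a live machine, get (major, minor) for each GPU with:
#   torch.cuda.get_device_capability(i)
print(supports_fp8(8, 0))  # A100 -> False
print(supports_fp8(9, 0))  # H100 -> True
```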

The first startup method:
vllm serve "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
(error screenshot attached)

The second startup method:
The model file has been downloaded locally.
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "/home/dfzb/models/Qwen3-235B-A22B-Instruct-2507-FP8" --tensor_parallel_size=4 &
(error screenshots attached)

The third startup method:
(error screenshots attached)
It looks like my device doesn't support running FP8 models.

I humbly ask the experts: where did I go wrong, or is a software dependency missing?

As the screenshot shows, FP8 requires compute capability 8.9 or higher, and the A100 is 8.0. You can try a different quantization or upgrade your hardware.
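For context on why the FP8 checkpoint was attractive in the first place, here is a rough back-of-envelope on weight memory for a 235B-parameter model against the 4x80 GB available (illustrative arithmetic only; real serving also needs KV cache and activation memory):

```python
# Rough weight-memory estimate for a 235B-parameter model on 4x A100-80G.
# Illustrative only: real serving also needs KV cache and activation memory.
PARAMS_B = 235          # total parameters, in billions
GPU_MEM_GB = 4 * 80     # 320 GB aggregate across the four cards

for fmt, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    weights_gb = PARAMS_B * bytes_per_param
    fits = weights_gb < GPU_MEM_GB
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights, fits in {GPU_MEM_GB} GB: {fits}")
```

So BF16 (~470 GB) does not fit at all, FP8 (~235 GB) would fit but is unsupported on the A100's SM 8.0, and a 4-bit quantization (~118 GB) is the practical fallback on this hardware.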
