How can I run DeepSeek on Ada GPUs? Mine is an L20.
Does the L20 card not support this model? I am using vLLM.
The L20 GPU has 48 GB of memory, so you don't have enough space to load the DeepSeek-V4 models. From my understanding, you need at least ~158 GB of memory for V4-Flash.
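A quick back-of-the-envelope check of whether the weights fit across multiple cards. This is a hedged sketch: the ~158 GB figure for V4-Flash comes from this thread, the 48 GB per L20 is the card's spec, and the 70% weight-budget fraction is an assumed rule of thumb to leave headroom for KV cache and activations.

```python
# Rough feasibility check: do the model weights fit across the available GPUs?
# ~158 GB for DeepSeek-V4-Flash (figure from this thread); 48 GB per L20.
# weight_fraction = 0.7 is an assumed budget, leaving room for KV cache.

def fits_on_gpus(model_mem_gb: float, num_gpus: int, gpu_mem_gb: float,
                 weight_fraction: float = 0.7) -> bool:
    """True if the weights fit when each GPU reserves `weight_fraction`
    of its memory for weights (the rest goes to KV cache, activations)."""
    usable = num_gpus * gpu_mem_gb * weight_fraction
    return model_mem_gb <= usable

print(fits_on_gpus(158, 1, 48))   # single L20 -> False
print(fits_on_gpus(158, 8, 48))   # 8x L20 -> True, memory-wise
```

So with 8×L20 the raw memory is there; as the replies below note, the blocker is the architecture, not capacity.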
I have 8*L20, so GPU memory is not the problem. The architecture simply doesn't support running it.
This PR may help you: https://github.com/vllm-project/vllm/pull/40906. I haven't tried it yet, but the decoding speed seems unsatisfactory.
Unfortunately, the L20 is SM89, so it will not be officially supported by vLLM. From https://github.com/vllm-project/vllm/issues/40902:
"We don't plan to support hardware below SM90 in the official repo, since that would introduce significant maintenance overhead."
The PR is your best bet. Alternatively, start from the inference code provided with DeepSeek-V4.
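To see why the L20 falls outside official support, you can compare compute capabilities directly. A minimal sketch: in practice you would read a device's capability with `torch.cuda.get_device_capability(i)`; the tuples below are hard-coded for illustration, and the (9, 0) cutoff is the SM90 floor quoted above.

```python
# The L20 reports compute capability (8, 9) -- SM89, Ada -- while vLLM's
# official support for this model starts at (9, 0) -- SM90, Hopper.
# In a real setup, obtain the tuple via torch.cuda.get_device_capability(i).

def meets_sm(capability: tuple[int, int], required: tuple[int, int] = (9, 0)) -> bool:
    """True if a device's compute capability meets the kernel requirement."""
    return capability >= required

print(meets_sm((8, 9)))  # L20 (Ada, SM89) -> False
print(meets_sm((9, 0)))  # H100 (Hopper, SM90) -> True
```

Tuple comparison handles the major/minor split correctly (e.g. SM89 sorts below SM90 even though 89 > 9.0 numerically as a float).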