Can't get vLLM running on 1xRTX 4090

#1
by slyfox1186 - opened

I can't get --cpu-offload-gb to work. Has anyone gotten this working on a 24GB VRAM Nvidia card?

QuantTrio org

For vLLM, you can refer to https://github.com/guqiong96/Lvllm

or

ktransformers via sglang

tclf90 changed discussion title from Can't get vLLM running on RTX 4090 to Can't get vLLM running on 1xRTX 4090

Hi everyone 👋

I’m currently trying to run Qwen3.5-35B-A3B-AWQ locally using vLLM, but I’m running into repeated issues related to KV cache memory and engine initialization.

My setup:
• GPU: NVIDIA RTX 3090 (24GB)
• CUDA: 13.1
• Driver: 590.48.01
• vLLM (latest stable)
• Model: Qwen3.5-35B-A3B-AWQ (downloaded locally)

Typical issues I’m facing:
• Negative or extremely small KV cache memory
• Engine failing during CUDA graph capture
• Assertion errors during warmup
• Instability when increasing max context length

I’ve experimented with:
• --gpu-memory-utilization between 0.70 and 0.96
• --max-model-len from 1024 up to 4096
• --enforce-eager
• Limiting concurrency

But I still haven’t found a stable configuration.
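For what it's worth, the "negative or extremely small KV cache" symptom usually means the weight footprint plus runtime overhead already exceeds the budget that --gpu-memory-utilization allows, leaving nothing for the KV cache. Here is a rough back-of-the-envelope sketch of that arithmetic; all model numbers below are illustrative assumptions (substitute the real values from the model's config.json), not the actual Qwen3.5-35B-A3B-AWQ configuration:

```python
# Rough estimate of why vLLM can report a negative/tiny KV cache on 24 GB.
# All model-shape numbers are ASSUMPTIONS for illustration only.

GIB = 1024**3

vram_gib = 24.0
gpu_memory_utilization = 0.90       # same meaning as the vLLM flag
weights_gib = 19.0                  # assumed AWQ 4-bit weight footprint
overhead_gib = 1.5                  # assumed activations / CUDA graph overhead

budget_gib = vram_gib * gpu_memory_utilization
kv_cache_gib = budget_gib - weights_gib - overhead_gib  # what's left for KV

# Per-token KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes
layers, kv_heads, head_dim = 48, 4, 128   # assumed GQA shape
bytes_per_elem = 2                        # fp16; halves with fp8 KV cache
per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem

max_tokens = int(kv_cache_gib * GIB / per_token_bytes)
print(f"KV budget: {kv_cache_gib:.1f} GiB -> roughly {max_tokens} tokens")
```

If kv_cache_gib comes out near zero or negative under your real numbers, no combination of --max-model-len will stabilize the engine; you have to shrink the weights-plus-overhead side (fp8 KV cache, dropping the vision tower, offload) instead.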

My main questions:
1. Has anyone successfully run Qwen3.5-35B-A3B-AWQ on a single 24GB GPU (like a 3090)?
2. If so, could you share:
• Your full vLLM command
• Max context length used
• Whether you needed swap space
• Any special flags
3. Is this model realistically expected to run reliably on a single 24GB GPU, or is multi-GPU / 48GB+ VRAM effectively required?

Any guidance or known-good configurations would be greatly appreciated 🙏

Thanks in advance!

QuantTrio org
edited Mar 3

You'd probably need to disable the vision part to run this model efficiently on one 24GB card (--language-model-only), and also use an fp8 KV cache (--kv-cache-dtype fp8_e4m3).
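Putting those flags together, a starting point might look like the sketch below. This is only a guess at a working configuration, not a tested command: the context length and utilization value are placeholders to tune downward if initialization still fails.

```shell
# Sketch only: combines the flags suggested above; tune values for your card.
vllm serve QuantTrio/Qwen3.5-35B-A3B-AWQ \
    --language-model-only \
    --kv-cache-dtype fp8_e4m3 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 8192 \
    --enforce-eager
```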

Is this model realistically expected to run reliably on a single 24GB GPU, or is multi-GPU / 48GB+ VRAM effectively required?

2x20GB has been tested and is enough.
