Cannot run on RTX PRO 6000 Blackwell + WSL2: Mamba state cache OOM

#10
by noMugop - opened

Trying to run Qwen3.6-27B-FP8 with vLLM 0.20.0 / 0.17.1 and SGLang 0.5.10 on:

  • GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB VRAM, sm_120)
  • OS: WSL2 Ubuntu 22.04 on Windows 11 host
  • NVIDIA driver: 596.36 (also tested 581.80)
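
For completeness, a quick sanity check that WSL2 actually sees the card (a minimal sketch; the printed values are paraphrased):

  import torch

  # Confirm the Blackwell card is visible inside WSL2 and reports sm_120.
  print(torch.cuda.get_device_name(0))        # "NVIDIA RTX PRO 6000 Blackwell Workstation Edition"
  print(torch.cuda.get_device_capability(0))  # (12, 0) -> sm_120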

Result: model loads successfully (28.5 GB), but Mamba state cache allocation fails with torch.OutOfMemoryError:

  torch.OutOfMemoryError: CUDA out of memory.
  Tried to allocate 3.48 GiB.
  GPU 0 has a total capacity of 95.59 GiB of which 50.40 GiB is free.
  This process has 16 GiB memory in use (non-PyTorch CUDA overhead).
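
For reference, roughly how the model is being launched via vLLM's Python API (a sketch; the repo id is assumed from the model name, and the values shown are representative of what was tried):

  from vllm import LLM

  # Repro sketch -- repo id assumed; these knobs were varied across runs,
  # but every combination hit the same Mamba state cache OOM.
  llm = LLM(
      model="Qwen/Qwen3.6-27B-FP8",    # assumed Hugging Face repo id
      gpu_memory_utilization=0.6,      # tried 0.5 through 0.95
      max_model_len=4096,              # shrinking the context did not help
      max_num_seqs=8,                  # fewer concurrent sequences
      enforce_eager=True,              # skip CUDA graph capture
  )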

8+ hours of testing suggest this is a WSL2 GPU passthrough issue specific to Blackwell + hybrid Mamba models: roughly 16 GiB of VRAM is consumed by overhead that PyTorch cannot account for, leaving too little contiguous space for the Mamba state cache.
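
One way to quantify that hidden overhead from inside the Python process, right after the weights load (a minimal sketch using standard torch.cuda calls):

  import torch

  # Compare the driver's view of GPU memory with PyTorch's own accounting;
  # the gap is CUDA usage PyTorch cannot see (the "hidden" overhead).
  free, total = torch.cuda.mem_get_info(0)
  reserved = torch.cuda.memory_reserved(0)
  hidden = (total - free) - reserved
  print(f"hidden non-PyTorch overhead: {hidden / 2**30:.2f} GiB")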

Same issue also affects:

  • Qwen3.6-35B-A3B-FP8 (MoE version): fails on a 4.99 GiB allocation
  • Both BF16 versions (27B and 35B-A3B): likely fail similarly

Filed bugs

Questions for the community

  1. Has anyone successfully run Qwen3.6 family on Blackwell + WSL2?
  2. If yes: what was your config?
  3. If it only works for you on native Linux, please confirm that too.
  4. Are there plans to support llama.cpp / Ollama / MLC for hybrid Mamba models?

Workarounds tested (none ideal)

  • ❌ All vLLM/SGLang flag combinations (the repro sketch above shows the main vLLM knobs varied)
  • ❌ NVIDIA driver downgrade (596.36 → 581.80)
  • ❌ vLLM downgrade (0.20.0 → 0.17.1)
  • ❌ Tighter Mamba memory ratios in SGLang
  • ✅ Switching to a non-Mamba Qwen (Qwen3-32B-AWQ): works, but loses the Qwen3.6 features
  • ✅ Dual-booting into native Linux: works, but gives up Windows

Currently waiting for one of:

  • vLLM patch to allocate the Mamba state in chunks rather than one contiguous block (idea sketched after this list)
  • WSL2/NVIDIA fix for the hidden ~16 GiB overhead on Blackwell
  • llama.cpp adding Qwen3.6 support
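
To illustrate the first item: several smaller allocations can succeed where one contiguous multi-GiB block fails on a fragmented or WSL2-constrained device. A toy sketch of the idea (not vLLM's actual code):

  import torch

  def alloc_chunked(total_bytes: int, n_chunks: int = 8, device: str = "cuda:0"):
      # Request n_chunks smaller buffers instead of one contiguous block;
      # each chunk only needs a contiguous region of total_bytes / n_chunks.
      chunk = total_bytes // n_chunks
      return [torch.empty(chunk, dtype=torch.uint8, device=device)
              for _ in range(n_chunks)]

  # The failing request here was a single 3.48 GiB allocation.
  state_cache = alloc_chunked(int(3.48 * 2**30))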

Curious whether the Qwen team or the community has any insights.

Thanks for the great model release. Hardware compatibility is the only blocker; the Qwen3.6 architecture is otherwise excellent.
