Cannot run on RTX PRO 6000 Blackwell + WSL2: Mamba state cache OOM
#10
by noMugop - opened
Trying to run Qwen3.6-27B-FP8 with vLLM 0.20.0 / 0.17.1 and SGLang 0.5.10 on:
- GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB VRAM, sm_120)
- OS: WSL2 Ubuntu 22.04 on Windows 11 host
- NVIDIA driver: 596.36 (also tested 581.80)
Result: the model weights load successfully (28.5 GB), but allocating the Mamba state cache fails with `torch.OutOfMemoryError`:

```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.48 GiB.
GPU 0 has a total capacity of 95.59 GiB of which 50.40 GiB is free.
This process has 16 GiB memory in use (non-PyTorch CUDA overhead).
```
8+ hours of testing point to a WSL2 GPU passthrough issue specific to Blackwell + hybrid Mamba models. The hidden 16 GiB overhead consumes VRAM that PyTorch's allocator cannot see, leaving insufficient contiguous space for the Mamba state cache.
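To quantify the hidden overhead yourself, you can subtract PyTorch's own accounting (`torch.cuda.memory_reserved`) from the driver-level usage reported by `torch.cuda.mem_get_info`. A minimal sketch; the GiB figures below are illustrative stand-ins modeled on the OOM message, not measured values:

```python
GIB = 2**30

def hidden_overhead_gib(total_b: int, free_b: int, reserved_b: int) -> float:
    """Driver-visible usage (total - free) minus PyTorch's reserved pool.

    On a real system, obtain the inputs with:
        free_b, total_b = torch.cuda.mem_get_info(0)
        reserved_b = torch.cuda.memory_reserved(0)
    Whatever is left over is memory the driver holds outside PyTorch,
    i.e. the "non-PyTorch CUDA overhead" from the OOM message.
    """
    return (total_b - free_b - reserved_b) / GIB

# Illustrative numbers (roughly matching the error output above):
total = int(95.59 * GIB)
free = int(50.40 * GIB)
reserved = int(28.5 * GIB)  # approximately the loaded model weights

print(f"hidden overhead: {hidden_overhead_gib(total, free, reserved):.2f} GiB")
```

On native Linux the same calculation typically yields a small residual (CUDA context, driver buffers), so a large gap here is a quick way to confirm the WSL2-specific overhead.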
The same issue also affects:
- Qwen3.6-35B-A3B-FP8 (MoE version) → fails with a 4.99 GiB allocation
- Both the 27B and 35B-A3B BF16 versions (likely fail similarly)
Filed bugs
- vLLM: https://github.com/vllm-project/vllm/issues/41619 (main report with full debugging)
Questions for the community
- Has anyone successfully run Qwen3.6 family on Blackwell + WSL2?
- If yes, what was your config?
- If it only works on native Linux for you, please confirm that too.
- Are there plans to support llama.cpp / Ollama / MLC for hybrid Mamba models?
Workarounds tested (none ideal)
- ❌ All vLLM/SGLang flag combinations
- ❌ NVIDIA driver downgrade (596.36 → 581.80)
- ❌ vLLM downgrade (0.20.0 → 0.17.1)
- ❌ Tight Mamba memory ratios in SGLang
- ⚠️ Switching to a non-Mamba Qwen (Qwen3-32B-AWQ) → works, but loses Qwen3.6 features
- ⚠️ Dual-booting native Linux → works, but no Windows
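For anyone who wants to retry the flag sweeps, these are the kinds of invocations involved. The model path and values are illustrative, not a known-working config; `--gpu-memory-utilization` (vLLM) and `--mem-fraction-static` (SGLang) are the standard knobs for shrinking the allocator's share of VRAM:

```shell
# vLLM: cap the allocator's VRAM share and shrink the context/state budget.
# (Values are examples; none of the combinations I tried avoided the OOM.)
vllm serve Qwen/Qwen3.6-27B-FP8 \
  --gpu-memory-utilization 0.70 \
  --max-model-len 8192 \
  --enforce-eager

# SGLang: same idea via its static memory fraction.
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.6-27B-FP8 \
  --mem-fraction-static 0.70
```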
Currently waiting for any of:
- vLLM patch to allocate Mamba state in chunks
- WSL2/NVIDIA fix for hidden 16 GiB overhead on Blackwell
- llama.cpp adding Qwen3.6 support
Curious whether the Qwen team or community has any insights.
Thanks for the great model release. Hardware compatibility is the only blocker; the Qwen3.6 architecture is otherwise excellent.