Thank you!
I have been waiting for this. Thank you!
I've been testing this model / quant for several days now. At the present time, this is the best model that I can run on my two RTX Pro 6000 Blackwell GPUs.
So... ditto, thank you!
What exact settings are you using to run it with vLLM?
Currently running vLLM 0.16.0rc1.dev188+g80f921ba4 using this command:

```shell
VLLM_MARLIN_USE_ATOMIC_ADD=1 \
vllm serve cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit \
  -tp 2 \
  --max-num-seqs 3 \
  --gpu-memory-utilization 0.965 \
  --trust_remote_code \
  --reasoning-parser glm45 \
  --tool-call-parser glm47 --enable-auto-tool-choice \
  --kv-cache-dtype fp8 --calculate-kv-scales \
  --kv_offloading_backend native --kv_offloading_size 64 \
  --disable-hybrid-kv-cache-manager \
  --max-model-len 184992
```
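For anyone new to this: once `vllm serve` is up, the model is exposed through vLLM's OpenAI-compatible API (port 8000 by default; adjust if you changed it). A minimal stdlib-only sketch of building a chat-completions request against that endpoint — the URL, prompt, and `max_tokens` value here are just illustrative assumptions:

```python
import json
import urllib.request

# vLLM's OpenAI-compatible chat endpoint; 8000 is the `vllm serve` default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"

# The model name must match the one passed to `vllm serve`.
payload = {
    "model": "cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit",
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
    "max_tokens": 128,
}

def build_request(url: str, body: dict) -> urllib.request.Request:
    """Build the POST request; actually sending it requires the server to be running."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(BASE_URL, payload)
# With the server running: urllib.request.urlopen(req) returns the completion JSON.
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `base_url="http://localhost:8000/v1"`) works the same way.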
Same here! Exact same specs! This model rocks!
Looking forward to GLM-5-REAP-AWQ-4bit!
Thank you!