Thank you!

#1
by mtcl - opened

I have been waiting for this. Thank you!

I've been testing this model / quant for several days now. At the present time, this is the best model that I can run on my two RTX Pro 6000 Blackwell GPUs.

So... ditto, thank you!

What exact settings are you using to run it with vLLM?

Currently running vLLM 0.16.0rc1.dev188+g80f921ba4 with this command:

```shell
VLLM_MARLIN_USE_ATOMIC_ADD=1 \
vllm serve cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit \
  -tp 2 \
  --max-num-seqs 3 \
  --gpu-memory-utilization 0.965 \
  --trust_remote_code \
  --reasoning-parser glm45 \
  --tool-call-parser glm47 --enable-auto-tool-choice \
  --kv-cache-dtype fp8 --calculate-kv-scales \
  --kv_offloading_backend native --kv_offloading_size 64 \
  --disable-hybrid-kv-cache-manager \
  --max-model-len 184992
```
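For anyone trying this setup: `vllm serve` exposes an OpenAI-compatible API, so once the server is up you can sanity-check it with a chat-completion request. A minimal sketch below; the port (8000, vLLM's default), prompt, and sampling parameters are my assumptions, not something from this thread:

```python
import json

# Chat-completion payload for the OpenAI-compatible endpoint that
# `vllm serve` exposes (by default at http://localhost:8000/v1/chat/completions).
# The model name must match the repo ID passed to `vllm serve`.
payload = {
    "model": "cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit",
    "messages": [{"role": "user", "content": "Hello!"}],  # assumed test prompt
    "max_tokens": 128,      # assumed values, tune to taste
    "temperature": 0.6,
}

# Serialize the request body; POST this to the endpoint with any HTTP client,
# e.g. `curl -H "Content-Type: application/json" -d @- http://localhost:8000/v1/chat/completions`.
body = json.dumps(payload)
print(body)
```

With `--reasoning-parser glm45` and `--tool-call-parser glm47` enabled as above, the response will also carry parsed reasoning content and tool calls in the standard OpenAI response fields.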

Same here, exact same specs! This model rocks!

Looking forward to GLM-5-REAP-AWQ-4bit!

Thank you!
