Thank you!
I have been waiting for this. Thank you!
I've been testing this model / quant for several days now. At the present time, this is the best model that I can run on my two RTX Pro 6000 Blackwell GPUs.
So... ditto, thank you!
What exact settings are you using to run it with vLLM?
Currently running vLLM 0.16.0rc1.dev188+g80f921ba4 using this command:

```shell
VLLM_MARLIN_USE_ATOMIC_ADD=1 \
vllm serve cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit \
  -tp 2 \
  --max-num-seqs 3 \
  --gpu-memory-utilization 0.965 \
  --trust_remote_code \
  --reasoning-parser glm45 \
  --tool-call-parser glm47 --enable-auto-tool-choice \
  --kv-cache-dtype fp8 --calculate-kv-scales \
  --kv_offloading_backend native --kv_offloading_size 64 \
  --disable-hybrid-kv-cache-manager \
  --max-model-len 184992
```
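For anyone new to this: once `vllm serve` is up, the model is exposed through vLLM's OpenAI-compatible API (port 8000 by default; adjust if you changed it). A minimal stdlib-only sketch of building a chat-completions request against that endpoint — the URL, prompt, and `max_tokens` value here are just illustrative assumptions:

```python
import json
import urllib.request

# vLLM's OpenAI-compatible chat endpoint; 8000 is the `vllm serve` default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"

# The model name must match the one passed to `vllm serve`.
payload = {
    "model": "cyankiwi/GLM-4.7-REAP-268B-A32B-AWQ-4bit",
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
    "max_tokens": 128,
}

def build_request(url: str, body: dict) -> urllib.request.Request:
    """Build the POST request; actually sending it requires the server to be running."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(BASE_URL, payload)
# With the server running: urllib.request.urlopen(req) returns the completion JSON.
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `base_url="http://localhost:8000/v1"`) works the same way.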
Same here! Exact same specs! This model rocks!
Looking forward to GLM-5-REAP-AWQ-4bit!
Thank you!