Hello, it's me again, back at it with another GLM model!

#1
by InfernalDread - opened

Hello,

Now that the GLM 4.6 model files have been released, is it possible for you to make the same MXFP4 quant for it as well? I absolutely love your GLM 4.5 version and really hope we can get one for 4.6 too.

Thank you for your time and efforts, we appreciate you!

Seems like you might know. Does this 4.5 model run as-is in vLLM?

Also, +1 to a 4.6 version

I have not tried GGUFs with vLLM, but I do recall there being experimental support for it in the past. However, this is a different quant type than usual, so I am at a bit of a loss on this question. I do hope that it runs, though; the capabilities vLLM provides make it a great inference engine for LLMs.

Yeah, no worries, it's being uploaded now to here, should be available shortly: https://huggingface.co/sm54/GLM-4.6-MXFP4_MOE

You'll need the latest version of llama.cpp to run it; however, I've not tested it yet to see how it performs.
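For anyone wanting to try it, here is a minimal sketch of pulling the repo and serving it with llama.cpp's `llama-server`. The GGUF filename is a placeholder (check the repo's file list for the actual name), and the context size and GPU-layer flags are assumptions to tune for your hardware:

```shell
# Download the quant from the Hugging Face repo (requires the huggingface_hub CLI).
huggingface-cli download sm54/GLM-4.6-MXFP4_MOE --local-dir ./GLM-4.6-MXFP4_MOE

# Serve it with a recent llama.cpp build (older builds lack MXFP4 support).
# Replace model.gguf with the actual filename from the repo; for multi-part
# GGUFs, point at the first shard and llama.cpp loads the rest automatically.
# -c sets the context window, -ngl how many layers to offload to the GPU.
./llama-server -m ./GLM-4.6-MXFP4_MOE/model.gguf -c 8192 -ngl 99
```

Once the server is up, it exposes an OpenAI-compatible API on localhost that most chat clients can point at.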

Thank you so much! I really appreciate the work you're doing!