This is an imatrix MXFP4_MOE quantization of GLM-4.6V, built with the imatrix from unsloth.
Use the latest llama.cpp build to run it.
For the mmproj, use the largest version you can fit in memory for the best results: F32 > BF16 > F16 > Q8_0.
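As a minimal sketch, a multimodal model plus mmproj can be loaded with llama.cpp's `llama-server` via `-m` and `--mmproj`; the filenames below are placeholders, substitute the actual quant and mmproj files you downloaded:

```shell
# Hypothetical filenames -- replace with the files from this repo.
# -m loads the main MXFP4_MOE model; --mmproj loads the vision projector
# (prefer the highest-precision mmproj that fits: F32 > BF16 > F16 > Q8_0).
llama-server \
  -m GLM-4.6V-MXFP4_MOE.gguf \
  --mmproj mmproj-GLM-4.6V-F16.gguf \
  -c 8192
```

The same `--mmproj` flag works with `llama-mtmd-cli` for one-off command-line inference.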