This is an imatrix MXFP4_MOE quantization of GLM-4.6V, built with the imatrix from unsloth.
Use the latest llama.cpp build to run it.
For the mmproj, use the largest version you can fit in memory for the best results: F32 > BF16 > F16 > Q8_0.
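As a minimal sketch, a multimodal model plus mmproj can be loaded with llama.cpp's `llama-server` via `-m` and `--mmproj`; the filenames below are placeholders, substitute the actual quant and mmproj files you downloaded:

```shell
# Hypothetical filenames -- replace with the files from this repo.
# -m loads the main MXFP4_MOE model; --mmproj loads the vision projector
# (prefer the highest-precision mmproj that fits: F32 > BF16 > F16 > Q8_0).
llama-server \
  -m GLM-4.6V-MXFP4_MOE.gguf \
  --mmproj mmproj-GLM-4.6V-F16.gguf \
  -c 8192
```

The same `--mmproj` flag works with `llama-mtmd-cli` for one-off command-line inference.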