These are MXFP4_MOE quantizations of the model GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill.
I have created a standard one, and also an importance-aware MXFP4_MOE quantization that dynamically allocates precision based on per-tensor importance scores from an imatrix I generated with code_tiny.
This coding-optimized quantization is slightly larger than the mainline MXFP4_MOE. Instead of quantizing everything uniformly, it keeps a higher-precision format for each tensor depending on its importance:
- BF16 (16-bit) for highly important tensors (>75% importance)
- Q8_0 (8-bit) for moderately important tensors (>60% importance)
- MXFP4 (4-bit) for less important tensors (<50% importance)
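To illustrate the idea, here is a minimal Python sketch of such a tier assignment. The tensor names and scores are hypothetical, and the actual quantization is done by the llama.cpp tooling; note also that the thresholds above leave the 50-60% band unspecified, so this sketch simply assigns it to MXFP4.

```python
# Minimal sketch of importance-aware tier assignment (illustrative only).
# Importance scores would come from an imatrix, normalized here to [0, 1].

def assign_quant_type(importance: float) -> str:
    """Map a normalized importance score (0.0-1.0) to a quantization type."""
    if importance > 0.75:
        return "BF16"   # highly important: keep 16-bit precision
    if importance > 0.60:
        return "Q8_0"   # moderately important: 8-bit
    return "MXFP4"      # less important: 4-bit (50-60% band assumed here too)

# Hypothetical per-tensor scores for demonstration:
scores = {
    "blk.0.attn_q.weight": 0.82,
    "blk.0.ffn_gate_exps.weight": 0.64,
    "blk.0.ffn_down_exps.weight": 0.41,
}
for name, score in scores.items():
    print(f"{name}: {assign_quant_type(score)}")
```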
As mentioned, this is experimental, and I haven't run any benchmarks yet to see whether it is better than the mainline quantization, but you are free to try it out and report back!
Model tree for noctrex/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill
- Base model: zai-org/GLM-4.7-Flash
- Finetuned: unsloth/GLM-4.7-Flash
