Intel/GLM-5-int4-mixed-AutoRound

text-generation-inference

4-bit precision

vLLM fails to serve Intel/GLM-5-int4-mixed-AutoRound on NVIDIA DGX Spark (GB10, sm121) due to no valid MLA attention backend (qk_nope_head_dim 192)

#2 opened about 1 month ago by oliverjohnwilson

This model always predicts a few nonsense sequences

#1 opened about 1 month ago by CharlesChen2023