vLLM fails to serve Intel/GLM-5-int4-mixed-AutoRound on NVIDIA DGX Spark (GB10, sm121) due to no valid MLA attention backend (qk_nope_head_dim 192)
1
#2 opened about 1 month ago by oliverjohnwilson
This model always predicts a few nonsense sequences
8
#1 opened about 1 month ago by CharlesChen2023