Failed to initialize the context: quantized V cache was requested, but this requires Flash Attention
#4
by SilverJim - opened
When I use the Q4_K_XL in LM Studio and set the V cache quantization type to Q8_0, I get the error "Failed to initialize the context: quantized V cache was requested, but this requires Flash Attention".
You need to enable "Flash Attention". C'mon, the error is self-explanatory.
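For anyone running this model with llama.cpp directly instead of LM Studio, the equivalent fix is to pass the Flash Attention flag alongside the cache-type flags. A minimal sketch, assuming a recent llama.cpp build (the model path is a placeholder):

```shell
# -fa enables Flash Attention; -ctk / -ctv set the K and V cache
# quantization types. Without -fa, llama.cpp fails with the same
# "quantized V cache was requested, but this requires Flash Attention"
# error that LM Studio surfaces.
# ./model.gguf is a placeholder path, not the actual filename.
llama-server -m ./model.gguf -fa -ctk q8_0 -ctv q8_0
```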
Note: GLM-4.7-Flash does not use a V cache, only a K cache.