Failed to initialize the context: quantized V cache was requested, but this requires Flash Attention
#4
by SilverJim - opened
When I use the Q4_K_XL in LM Studio and set the V cache quantization type to Q8_0, I get the error "Failed to initialize the context: quantized V cache was requested, but this requires Flash Attention".
You need to enable "Flash Attention". C'mon, the error is self-explanatory.
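For anyone running this model with llama.cpp directly instead of LM Studio, the equivalent fix is to pass the Flash Attention flag alongside the cache-type flags. A minimal sketch, assuming a recent llama.cpp build (the model path is a placeholder):

```shell
# -fa enables Flash Attention; -ctk / -ctv set the K and V cache
# quantization types. Without -fa, llama.cpp fails with the same
# "quantized V cache was requested, but this requires Flash Attention"
# error that LM Studio surfaces.
# ./model.gguf is a placeholder path, not the actual filename.
llama-server -m ./model.gguf -fa -ctk q8_0 -ctv q8_0
```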
Note: GLM-4.7-Flash does not use a V cache, only a K cache.