LM Studio RotorQuant KV cache setting missing

by RudieVoeller - opened 7 days ago


Hey,
Sorry if this is a stupid question, but I just can't find the RotorQuant KV cache setting you are referencing. Where is it supposed to be? I'm on Windows with LM Studio 0.4.11 (Build 1) and the CUDA 12 llama.cpp (Windows) v2.13.0 runtime, which uses llama.cpp release b8733 (commit d6f3030).

I can't see anything about a RotorQuant KV cache in the logs while loading the model, and the standard LM Studio settings don't seem to have that option either.

majentik

Owner 7 days ago


edited 7 days ago

If you specifically want RotorQuant, you'd need to build the forked llama.cpp yourself from source and run it via CLI, bypassing LM Studio entirely. We will correct the instructions. The GGUF file itself is a standard GGUF. The weight quantization (Q5_K_M) works regardless of which KV cache strategy you use.
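
For anyone going the CLI route, here is a rough sketch of how the KV cache type gets selected on the command line, written as a small Python helper that only assembles the arguments. It uses upstream llama.cpp's `--cache-type-k` and `--cache-type-v` flags with the stock `q8_0` type. Whether the fork exposes RotorQuant through these same flags, and under what type name, is not confirmed here, so treat that as an assumption. The binary path and model filename are placeholders.

```python
import shlex


def build_llama_cli_command(binary, model, cache_type_k="q8_0",
                            cache_type_v="q8_0", ctx_size=8192):
    """Assemble an argument list for a llama.cpp CLI run with explicit KV cache types.

    The flags are upstream llama.cpp options. The cache type names accepted by
    the RotorQuant fork are an assumption and must be checked against the fork.
    """
    return [
        binary,
        "-m", model,                      # path to the GGUF file
        "-c", str(ctx_size),              # context length in tokens
        "--cache-type-k", cache_type_k,   # storage type for the K cache
        "--cache-type-v", cache_type_v,   # storage type for the V cache
    ]


if __name__ == "__main__":
    # Placeholder paths: point these at your own build output and model file.
    cmd = build_llama_cli_command("./build/bin/llama-cli", "model-Q5_K_M.gguf")
    print(shlex.join(cmd))
```

The weight quantization never appears in these arguments. It is baked into the GGUF file, which is why the same file loads with any KV cache type.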

ponytang3

6 days ago

Hi! Thank you for sharing the Qwen3.5-27B-RotorQuant-GGUF quantization — it's a really interesting approach.

I'd love to learn how to create similar quantized GGUF versions myself. Could you point me in the right direction?
