Qwen 3.5 9b
Could you also test Qwen 3.5 9b at Q8? It would be nice to see how it competes against the 27B at 4-bit.
Already in the queue! Will tag you when it lands.
I did some runs with qwen3.5-9b-q8 at the f16 and q8 KV-cache quantization levels. I also re-ran the 27B test to make sure the scores are accurate.
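For anyone who wants to reproduce runs like these: llama.cpp exposes the KV-cache precision through the `--cache-type-k` / `--cache-type-v` flags. A minimal sketch (the model filename and prompt are placeholders, not the exact setup used above):

```shell
# Run llama.cpp with the KV cache quantized to q8_0 (f16 is the default).
# Note: quantizing the V cache typically requires flash attention (-fa).
./llama-cli -m qwen3.5-9b-q8_0.gguf \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -p "Summarize the following document: ..."
```

Swapping `q8_0` for `f16` on both flags gives the unquantized-cache baseline to compare against.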
Have you checked the newly merged PR in llama.cpp that implements KV vector rotation for better accuracy?
Are you talking about this one: https://github.com/ggml-org/llama.cpp/pull/21192 ? I just saw it. I'm not sure it would help much: in my tests I don't see any meaningful difference beyond random noise on the SWE-bench Lite benchmark across KV-cache quantization levels. I've been working on a brand-new benchmark focused on summarization over long context windows, to test performance once the context exceeds 128k.
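In case it's useful, here's the rough shape of how a summarization benchmark like that can score outputs. This is only a sketch with a toy token-overlap F1 (a stand-in for a proper metric like ROUGE); the function name and scoring choice are mine, not from the actual benchmark:

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model summary and a reference summary.
    A crude stand-in for ROUGE-style scoring; real runs would use a
    proper metric over held-out long-context documents."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared token counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Score the same task under different KV-cache settings
# (summaries here are illustrative, not real model outputs).
scores = {
    "f16":  unigram_f1("the cat sat on the mat", "the cat sat on a mat"),
    "q8_0": unigram_f1("a cat sat on the mat", "the cat sat on a mat"),
}
```

Averaging a score like this over many documents past the 128k boundary is what would surface a real (non-noise) gap between cache quantization levels.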