Qwen 3.5 9b
Could you also test Qwen 3.5 9b at Q8? It would be nice to see how it competes against the 27B at 4-bit.
Already in the queue! Will tag you when it lands.
I did some runs with qwen3.5-9b-q8 at the f16 and q8 KV-cache quantization levels. I also re-ran the 27B test to make sure the scores are accurate.
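For anyone who wants to reproduce runs like these: llama.cpp exposes the KV-cache precision through the `--cache-type-k` / `--cache-type-v` flags. A minimal sketch (the model filename and prompt are placeholders, not the exact setup used above):

```shell
# Run llama.cpp with the KV cache quantized to q8_0 (f16 is the default).
# Note: quantizing the V cache typically requires flash attention (-fa).
./llama-cli -m qwen3.5-9b-q8_0.gguf \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -p "Summarize the following document: ..."
```

Swapping `q8_0` for `f16` on both flags gives the unquantized-cache baseline to compare against.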
Have you checked the newly merged PR in llama.cpp that implements KV vector rotation for better accuracy?
Are you talking about this one: https://github.com/ggml-org/llama.cpp/pull/21192 ? I just saw it. I'm not sure it would help much: in my tests I don't see any meaningful difference beyond random noise on the SWE-bench Lite benchmark across KV-cache quantization levels. I've been working on a brand-new benchmark focused on summarization over long context windows, to test performance once the context exceeds 128k.
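In case it's useful, here's the rough shape of how a summarization benchmark like that can score outputs. This is only a sketch with a toy token-overlap F1 (a stand-in for a proper metric like ROUGE); the function name and scoring choice are mine, not from the actual benchmark:

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model summary and a reference summary.
    A crude stand-in for ROUGE-style scoring; real runs would use a
    proper metric over held-out long-context documents."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared token counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Score the same task under different KV-cache settings
# (summaries here are illustrative, not real model outputs).
scores = {
    "f16":  unigram_f1("the cat sat on the mat", "the cat sat on a mat"),
    "q8_0": unigram_f1("a cat sat on the mat", "the cat sat on a mat"),
}
```

Averaging a score like this over many documents past the 128k boundary is what would surface a real (non-noise) gap between cache quantization levels.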