Please update llama.cpp to see improved performance!

#5
by danielhanchen - opened
Unsloth AI org

Hey guys, please update llama.cpp to pick up the changes from 2 days ago. According to many people and our own tests, you should see large improvements in Devstral 2 etc., including for use cases like tool calling. Looping should also happen less.

We'll be reconverting today and all should be reuploaded by tomorrow.

See these 2 pull requests and issues:
https://github.com/ggml-org/llama.cpp/pull/17945
https://github.com/ggml-org/llama.cpp/issues/17980

Good news :)

Unfortunately, Q4_K_XL and Q6_K_XL are not working for me. The model hangs and spams a random sentence in an infinite loop. Meanwhile, Devstral 2 Small is working perfectly.

danielhanchen pinned discussion
Unsloth AI org

Unfortunately, Q4_K_XL and Q6_K_XL are not working for me. The model hangs and spams a random sentence in an infinite loop. Meanwhile, Devstral 2 Small is working perfectly.

Could you try the full precision and see if it happens? We tested it and it doesn't seem to have issues.

{"choices":[{"finish_reason":"stop","index":0,
"message":{"role":"assistant","content":"Hello! How can I assist you today?"}}],
"created":1765810152,"model":"Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf"
 ....

Runs for me with llama.cpp:server-cuda. 👍
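
For anyone who wants to run a similar check, here is a minimal sketch in Python. It assumes llama-server is running locally and exposing its OpenAI-compatible chat endpoint at `http://localhost:8080/v1/chat/completions` (the URL, port, and the helper names `build_request`, `summarize`, and `smoke_test` are illustrative assumptions, not something from this thread). It sends one short prompt with a token cap and reports the reply text and `finish_reason`.

```python
import json
import urllib.request

# Assumption: llama-server is reachable here; adjust host and port for your setup.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt, max_tokens=64):
    """Build a small chat-completion payload; max_tokens caps runaway output."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def summarize(response):
    """Extract the reply text and finish reason from a chat-completion response."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def smoke_test(prompt="Hello", url=SERVER_URL, timeout=120):
    """Send one prompt to the server and return (content, finish_reason)."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Calling `smoke_test()` against a running server should return something like the reply shown above. A `finish_reason` of `"stop"` means the model ended its turn on its own; if it instead hits the token cap on a trivial prompt, that may point to the looping behaviour reported earlier in this thread.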

I've tested 123b Q8_K_XL and it works fine. I am using llama.rocm. Tomorrow I am going to re-test Q6.

I've tested Q6_K_XL and now it is working, so I think the problem was the old version of llama.cpp. Now I have llama.cpp version: 1122 (d6a1e18c).
