Please update llama.cpp to see improved performance!
Hey guys, please update llama.cpp to pick up the changes merged 2 days ago. According to many people and our own tests, you should see large improvements in Devstral 2 etc., including for use cases like tool calling. Looping should also happen less often.
We'll be reconverting today and all should be reuploaded by tomorrow.
See this pull request and issue:
https://github.com/ggml-org/llama.cpp/pull/17945
https://github.com/ggml-org/llama.cpp/issues/17980
Good news :)
Unfortunately, Q4_K_XL and Q6_K_XL are not working for me. Generation hangs and spams a random sentence in an infinite loop. Meanwhile, Devstral 2 Small is working perfectly.
Could you try the full precision and see if it happens? We tested it and it doesn't seem to have issues.
{"choices":[{"finish_reason":"stop","index":0,
"message":{"role":"assistant","content":"Hello! How can I assist you today?"}}],
"created":1765810152,"model":"Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf"
....
Runs for me with llama.cpp:server-cuda.
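If anyone wants to repeat this check on their own quant, here is a minimal sketch in Python. It assumes llama-server is running locally on port 8080 and serving the OpenAI-compatible `/v1/chat/completions` route; the helper names (`build_chat_request`, `extract_reply`, `chat`) are made up for illustration. Capping `max_tokens` means a looping quant should come back with `finish_reason` set to `"length"` instead of running on indefinitely.

```python
import json
import urllib.request

# Assumption: llama-server is listening locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, max_tokens=64):
    """Build a minimal chat payload; max_tokens bounds a runaway generation."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_reply(response):
    """Return (content, finish_reason) from a chat completion response dict."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def chat(prompt, base_url=BASE_URL, timeout=120):
    """POST the prompt to the server and return (content, finish_reason)."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


# Example call (needs a running server):
#   text, reason = chat("Hello")
#   print(reason, text)
```

A `"stop"` reason with a sensible reply, like the output above, suggests the quant is fine; `"length"` with repeated text points at the looping problem.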
I've tested the 123B Q8_K_XL and it works fine. I am using llama.rocm. I'm going to re-test Q6 tomorrow.
I've tested Q6_K_XL and now it is working, so I think the problem was the old version of llama.cpp. Now I have llama.cpp version: 1122 (d6a1e18c).