Please update llama.cpp to see improved performance!


by danielhanchen - opened Dec 13, 2025

Unsloth AI org Dec 13, 2025

Hey guys, please update llama.cpp to pick up the changes from 2 days ago. According to many people and our own tests, you should see large improvements in Devstral 2 and other models, including for use cases like tool calling. Looping should also happen less often.

We'll be reconverting today and all should be reuploaded by tomorrow.

See this pull request and issue:
https://github.com/ggml-org/llama.cpp/pull/17945
https://github.com/ggml-org/llama.cpp/issues/17980

vico44

Dec 14, 2025

Good news :)

puchuu

Dec 14, 2025

Unfortunately, Q4_K_XL and Q6_K_XL are not working for me. The model hangs and spams a random sentence in an infinite loop. Meanwhile, Devstral 2 Small is working perfectly.

danielhanchen pinned discussion Dec 15, 2025

danielhanchen

Unsloth AI org Dec 15, 2025

Unfortunately, Q4_K_XL and Q6_K_XL are not working for me. The model hangs and spams a random sentence in an infinite loop. Meanwhile, Devstral 2 Small is working perfectly.

Could you try the full-precision version and see if it still happens? We tested it and didn't see any issues.

martinsu

Dec 15, 2025

{"choices":[{"finish_reason":"stop","index":0,
"message":{"role":"assistant","content":"Hello! How can I assist you today?"}}],
"created":1765810152,"model":"Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf"
 ....

Runs for me with llama.cpp:server-cuda. 👍
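For anyone who wants to repeat this kind of check after updating, here is a minimal Python sketch. It assumes a llama.cpp server is already running locally and exposing its OpenAI-compatible chat endpoint at `http://localhost:8080/v1/chat/completions`; the URL, the prompt, and the `max_tokens` cap are assumptions to adjust for your setup, and the helper names (`build_request`, `summarize`, `smoke_test`) are made up for illustration. It reads only the fields visible in the response above.

```python
import json
import urllib.request

# Assumed address of a locally running llama.cpp server; change it to match your setup.
DEFAULT_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt, url=DEFAULT_URL, max_tokens=64):
    """Build a POST request carrying a single-turn chat completion payload."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # small cap so a looping model cannot run forever
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def summarize(response):
    """Pull out the fields shown in the response above: model, finish_reason, content."""
    choice = response["choices"][0]
    return {
        "model": response.get("model"),
        "finish_reason": choice["finish_reason"],
        "content": choice["message"]["content"],
    }


def smoke_test(prompt="Hello"):
    """Send one prompt to the server and return the summarized reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return summarize(json.load(resp))
```

Calling `smoke_test()` returns a small dict. A `finish_reason` of `"stop"` with a sensible `content`, as in the output above, suggests the model ended its reply on its own. If a short greeting instead runs into the token cap, that may point to the looping behaviour described earlier, though it is only a rough hint and not a definitive test.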

puchuu

Dec 15, 2025

I've tested 123b Q8_K_XL and it works fine. I am using llama.rocm. Tomorrow I am going to re-test Q6.

puchuu

Dec 16, 2025

I've tested Q6_K_XL and now it is working, so I think the problem was the old version of llama.cpp. Now I have llama.cpp version: 1122 (d6a1e18c).
