Do NOT use CUDA 13.2
Hey guys, please do not use CUDA 13.2 to run any quantized model or GGUF. Using CUDA 13.2 can lead to gibberish or otherwise incorrect outputs, and tool calling may break on all models, including Gemma 4 and GLM-5.1.
We’ve confirmed this internally, and the issue has also been reported in llama.cpp by 30+ users. This is not an Unsloth-GGUF-specific issue. See here.
We notified NVIDIA 5–6 days ago, but the issue does not appear to be fixed yet. This may explain why some of you have been seeing wildly different results with Gemma 4 or with quants in general. It may also explain why some GGUFs seem broken in llama.cpp, leading people to assume it’s a quant/GGUF problem (it's not), while the same models work fine in Unsloth Studio, Ollama, or LM Studio.
For now, you can:
- use our precompiled llama.cpp binary, which uses CUDA 13,
- use Unsloth Studio, which does not use CUDA 13.2, or
- use any CUDA version lower than 13.2.
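If you're not sure which CUDA toolkit your build will pick up, a quick sanity check is to parse the release number out of `nvcc --version` before compiling. A minimal sketch (the `sample` string below stands in for real `nvcc --version` output, which you'd pipe in instead):

```shell
# Parse the CUDA release number from nvcc's version banner.
# On a real machine, replace the sample string with: nvcc --version
sample='Cuda compilation tools, release 13.2, V13.2.91'
ver=$(printf '%s' "$sample" | grep -o 'release [0-9.]*' | awk '{print $2}')
echo "Detected CUDA toolkit: $ver"

# Warn if the affected version is on PATH.
if [ "$ver" = "13.2" ]; then
  echo "CUDA 13.2 detected - use a lower CUDA version for quantized models"
fi
```

This only checks the toolkit on your PATH; the driver's runtime version can differ, so check `nvidia-smi` as well if you build against a system-wide CUDA install.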
Thanks so much and let me know if you have any questions! :)