CUDA 13.2 causes gibberish for UD-IQ4_XS quant on llama.cpp (b8815/latest)
I'm getting gibberish output, and sometimes crashes, when running the UD-IQ4_XS quant with CUDA, but only with the CUDA backend and only on this quant. It works fine with Vulkan, and the UD-Q4_K_XL quant works fine with either backend. I don't know whether this is a quantization issue or a llama.cpp problem.
Example:

```
> PING
[Start thinking]
所在
( 小证2318238MTech工程:01119
269年285394/77.4942S2S20.jpg]
The purpose and
Unf
there
\$$
h_{o}o\orrlist /{}\nb8:3c
2am06de394f9
354cO27z489zY40g8gH16f0006e2564331585968e584e028361a22669_67264
} </FSUB}
```
A second example, which also crashed:

```
> What is your knowledge cutoff date?
[Start thinking]
#|PNSD"}
20206319167
## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)
```

Crash backtrace:

```
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x192b6) [0x7fd96d3502b6]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x203) [0x7fd96d350633]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x315c9) [0x7fd96d3685c9]
/usr/lib/libstdc++.so.6(+0xb3dbc) [0x7fd9602b3dbc]
/usr/lib/libstdc++.so.6(_ZSt10unexpectedv+0x0) [0x7fd960294644]
/usr/lib/libstdc++.so.6(+0xb40c8) [0x7fd9602b40c8]
./llama.cpp/build/bin/llama-cli(+0x71dfb) [0x5610b14dddfb]
./llama.cpp/build/bin/llama-cli(+0x20d5e8) [0x5610b16795e8]
./llama.cpp/build/bin/llama-cli(+0xe9a73) [0x5610b1555a73]
./llama.cpp/build/bin/llama-cli(+0x123859) [0x5610b158f859]
./llama.cpp/build/bin/llama-cli(+0x1373b8) [0x5610b15a33b8]
./llama.cpp/build/bin/llama-cli(+0xe21e8) [0x5610b154e1e8]
./llama.cpp/build/bin/llama-cli(+0xc57ea) [0x5610b15317ea]
/usr/lib/libc.so.6(+0x27c0e) [0x7fd95fe27c0e]
/usr/lib/libc.so.6(__libc_start_main+0x8b) [0x7fd95fe27d4b]
./llama.cpp/build/bin/llama-cli(+0xcd175) [0x5610b1539175]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to parse input at pos 22: <think>
#|PNSD"}
20206319167
## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)� 05,108,106-10 1001-7, 2 43 x,p7709582,78 ��
Aborted (core dumped) ./llama.cpp/build/bin/llama-cli -dev CUDA0 -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS
```
If you're on CUDA 13.2, it's actually neither an Unsloth problem nor a llama.cpp problem, but a CUDA problem.

CUDA 13.2:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0
```
Reportedly 13.3 is going to have a fix.
Yes, sadly CUDA 13.2 is broken :( - see https://github.com/unslothai/unsloth/issues/4849

As a temporary workaround, use https://github.com/unslothai/llama.cpp/releases/tag/b8811, which has pre-compiled Linux binaries built against CUDA 13.0. They run on CUDA 13.2 machines as well (the CUDA runtime is backwards compatible), so there's no need to reinstall CUDA.
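If you drive your llama.cpp builds or benchmark runs from a script, a small guard can catch the known-bad toolkit before you waste a build. This is just a sketch of that idea, not anything from llama.cpp or Unsloth; the regex and the check are my own assumptions:

```python
import re

def cuda_release(nvcc_output: str) -> tuple[int, int]:
    """Extract the (major, minor) release number from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))

# In a real script you would feed in the live output, e.g.:
#   subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample = "Cuda compilation tools, release 13.2, V13.2.51\n"
if cuda_release(sample) == (13, 2):
    print("CUDA 13.2 detected: known to miscompile llama.cpp quant kernels; "
          "use the CUDA 13.0 prebuilt b8811 binaries instead")
```

The exact-match on (13, 2) mirrors the report above: 13.0 builds are fine and 13.3 reportedly will be, so only the one release needs to be flagged.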