CUDA 13.2 causes gibberish for UD-IQ4_XS quant on llama.cpp (b8815/latest)

#3
by CodeFault - opened

I'm getting gibberish, and sometimes crashes, when running the UD-IQ4_XS quant with CUDA, but only with CUDA, and only on this quant. It works fine with Vulkan, and the UD-Q4_K_XL quant works fine with either backend. I don't know whether this is a quantization issue or a llama.cpp problem.

Example:

> PING

[Start thinking]

 所在

( 小证2318238MTech工程:01119

269年285394/77.4942S2S20.jpg]

The purpose and

Unf

there  

\$$
h_{o}o\orrlist /{}\nb8:3c
2am06de394f9

354cO27z489zY40g8gH16f0006e2564331585968e584e028361a22669_67264
} </FSUB}
> What is your knowledge cutoff date?

[Start thinking]

#|PNSD"}

20206319167

## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x192b6) [0x7fd96d3502b6]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x203) [0x7fd96d350633]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x315c9) [0x7fd96d3685c9]
/usr/lib/libstdc++.so.6(+0xb3dbc) [0x7fd9602b3dbc]
/usr/lib/libstdc++.so.6(_ZSt10unexpectedv+0x0) [0x7fd960294644]
/usr/lib/libstdc++.so.6(+0xb40c8) [0x7fd9602b40c8]
./llama.cpp/build/bin/llama-cli(+0x71dfb) [0x5610b14dddfb]
./llama.cpp/build/bin/llama-cli(+0x20d5e8) [0x5610b16795e8]
./llama.cpp/build/bin/llama-cli(+0xe9a73) [0x5610b1555a73]
./llama.cpp/build/bin/llama-cli(+0x123859) [0x5610b158f859]
./llama.cpp/build/bin/llama-cli(+0x1373b8) [0x5610b15a33b8]
./llama.cpp/build/bin/llama-cli(+0xe21e8) [0x5610b154e1e8]
./llama.cpp/build/bin/llama-cli(+0xc57ea) [0x5610b15317ea]
/usr/lib/libc.so.6(+0x27c0e) [0x7fd95fe27c0e]
/usr/lib/libc.so.6(__libc_start_main+0x8b) [0x7fd95fe27d4b]
./llama.cpp/build/bin/llama-cli(+0xcd175) [0x5610b1539175]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to parse input at pos 22: <think>


#|PNSD"}

20206319167

## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)� 05,108,106-10 1001-7, 2 43 x,p7709582,78 ��
Aborted                    (core dumped) ./llama.cpp/build/bin/llama-cli -dev CUDA0 -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS
Unsloth AI org

What CUDA version are you using? @CodeFault

If you're on CUDA 13.2, it's actually neither an Unsloth problem nor a llama.cpp problem but a CUDA problem.

@danielhanchen

CUDA 13.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0

Reportedly 13.3 is going to have a fix.
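As a quick sanity check, whether a machine has the affected toolkit can be read off the `nvcc` version string. A minimal sketch, assuming `nvcc` is on `PATH` and that its output follows the "release X.Y" format shown above:

```shell
# Extract the "13.2" from a line like:
#   "Cuda compilation tools, release 13.2, V13.2.51"
cuda_version() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Suppress the error if nvcc is not installed; ver stays empty in that case.
ver=$(cuda_version "$(nvcc --version 2>/dev/null)")
if [ "$ver" = "13.2" ]; then
  echo "CUDA 13.2 detected; known to miscompile some llama.cpp kernels, use a CUDA 13.0 build instead"
fi
```

This only identifies the toolkit that compiled the binary you build locally; pre-built llama.cpp binaries are unaffected by the local `nvcc`.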

Unsloth AI org

Yes sadly CUDA 13.2 is broken :( - see https://github.com/unslothai/unsloth/issues/4849
Temporarily use https://github.com/unslothai/llama.cpp/releases/tag/b8811, which has pre-compiled Linux binaries built against CUDA 13.0. They also run on CUDA 13.2 machines, so there is no need to reinstall anything, since CUDA is backwards compatible.

shimmyshimmer changed discussion title from UD-IQ4_XS quant with CUDA on llama.cpp (b8815/latest) produces gibberish to CUDA 13.2 causes gibberish for UD-IQ4_XS quant on llama.cpp (b8815/latest)
