CUDA 13.2 causes gibberish for UD-IQ4_XS quant on llama.cpp (b8815/latest)
I'm getting gibberish output, and sometimes crashes, when running the UD-IQ4_XS quant with CUDA, but only with the CUDA backend and only on this quant. It works fine with Vulkan, and the UD-Q4_K_XL quant works fine with either backend. I don't know whether this is a quantization issue or a llama.cpp problem.
Example:

```
> PING
[Start thinking]
所在
( 小证2318238MTech工程:01119
269年285394/77.4942S2S20.jpg]
The purpose and
Unf
there
\$$
h_{o}o\orrlist /{}\nb8:3c
2am06de394f9
354cO27z489zY40g8gH16f0006e2564331585968e584e028361a22669_67264
} </FSUB}
```
A second example, which also crashed:

```
> What is your knowledge cutoff date?
[Start thinking]
#|PNSD"}
20206319167
## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)
```

Crash backtrace:

```
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x192b6) [0x7fd96d3502b6]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x203) [0x7fd96d350633]
/home/kent/Downloads/llama.cpp/build/bin/libggml-base.so.0(+0x315c9) [0x7fd96d3685c9]
/usr/lib/libstdc++.so.6(+0xb3dbc) [0x7fd9602b3dbc]
/usr/lib/libstdc++.so.6(_ZSt10unexpectedv+0x0) [0x7fd960294644]
/usr/lib/libstdc++.so.6(+0xb40c8) [0x7fd9602b40c8]
./llama.cpp/build/bin/llama-cli(+0x71dfb) [0x5610b14dddfb]
./llama.cpp/build/bin/llama-cli(+0x20d5e8) [0x5610b16795e8]
./llama.cpp/build/bin/llama-cli(+0xe9a73) [0x5610b1555a73]
./llama.cpp/build/bin/llama-cli(+0x123859) [0x5610b158f859]
./llama.cpp/build/bin/llama-cli(+0x1373b8) [0x5610b15a33b8]
./llama.cpp/build/bin/llama-cli(+0xe21e8) [0x5610b154e1e8]
./llama.cpp/build/bin/llama-cli(+0xc57ea) [0x5610b15317ea]
/usr/lib/libc.so.6(+0x27c0e) [0x7fd95fe27c0e]
/usr/lib/libc.so.6(__libc_start_main+0x8b) [0x7fd95fe27d4b]
./llama.cpp/build/bin/llama-cli(+0xcd175) [0x5610b1539175]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to parse input at pos 22: <think>
#|PNSD"}
20206319167
## ?_ 查看opetakebieekxzxservicePathGhithang2250. (C4052, 0)� 05,108,106-10 1001-7, 2 43 x,p7709582,78 ��
Aborted (core dumped) ./llama.cpp/build/bin/llama-cli -dev CUDA0 -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS
```
If you're on CUDA 13.2, it's actually neither an Unsloth problem nor a llama.cpp problem, but a CUDA problem.

CUDA 13.2:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0
```
Reportedly 13.3 is going to have a fix.
Yes, sadly CUDA 13.2 is broken :( - see https://github.com/unslothai/unsloth/issues/4849

As a temporary workaround, use https://github.com/unslothai/llama.cpp/releases/tag/b8811, which has pre-compiled Linux binaries built against CUDA 13.0. They run on CUDA 13.2 machines as well (the CUDA runtime is backwards compatible), so there's no need to reinstall CUDA.
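If you drive your llama.cpp builds or benchmark runs from a script, a small guard can catch the known-bad toolkit before you waste a build. This is just a sketch of that idea, not anything from llama.cpp or Unsloth; the regex and the check are my own assumptions:

```python
import re

def cuda_release(nvcc_output: str) -> tuple[int, int]:
    """Extract the (major, minor) release number from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))

# In a real script you would feed in the live output, e.g.:
#   subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample = "Cuda compilation tools, release 13.2, V13.2.51\n"
if cuda_release(sample) == (13, 2):
    print("CUDA 13.2 detected: known to miscompile llama.cpp quant kernels; "
          "use the CUDA 13.0 prebuilt b8811 binaries instead")
```

The exact-match on (13, 2) mirrors the report above: 13.0 builds are fine and 13.3 reportedly will be, so only the one release needs to be flagged.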