New updates with many llama.cpp fixes

#4
by danielhanchen - opened

Please re-download. We just updated them again in response to:

  1. kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
  2. CUDA: check for buffer overlap before fusing (CRITICAL: fixes <unused24> tokens) https://github.com/ggml-org/llama.cpp/pull/21566
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
  4. convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
  5. common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
  6. llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
  7. llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406

Do NOT use CUDA 13.2 to run any quant (this is not an Unsloth issue); see here. You can use our llama.cpp precompiled binary, which uses CUDA 13, or Unsloth Studio, which does not use 13.2.

