New uploads to add llama.cpp fixes

#3
by danielhanchen - opened

New uploads add the following fixes:

  1. vocab: fix Gemma4 tokenizer (#21343) - https://github.com/ggml-org/llama.cpp/pull/21343
  2. fix: gemma 4 template (#21326) - https://github.com/ggml-org/llama.cpp/pull/21326

Some of these simply don't work in LM Studio. With gemma-4-E2B-it-UD-Q8_K_XL:
2026-04-03 23:25:05 [DEBUG]
llama.cpp abort:1276: GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) failed

That said, it looks like you did not in fact add new uploads for E2B specifically six hours ago?

Unsloth AI org

We just updated them again in response to the following upstream changes:

  1. kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
  2. CUDA: check for buffer overlap before fusing - CRITICAL fixes <unused24> tokens https://github.com/ggml-org/llama.cpp/pull/21566
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
  4. convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
  5. common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
  6. llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
  7. llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406
