New uploads to add llama.cpp fixes
#3
by danielhanchen - opened
New uploads add the following fixes:
- vocab: fix Gemma4 tokenizer (#21343) - https://github.com/ggml-org/llama.cpp/pull/21343
- fix: gemma 4 template (#21326) - https://github.com/ggml-org/llama.cpp/pull/21326
Some of these simply don't work in LM Studio. With gemma-4-E2B-it-UD-Q8_K_XL:
2026-04-03 23:25:05 [DEBUG]
llama.cpp abort:1276: GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) failed
That said, you did not, in fact, add new uploads for E2B specifically six hours ago?
We just updated them again in response to:
- kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
- CUDA: check for buffer overlap before fusing - CRITICAL fixes for <unused24> tokens https://github.com/ggml-org/llama.cpp/pull/21566
- vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
- convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
- common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
- llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
- llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406
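For anyone curious what "byte token handling in the detokenizer" refers to: BPE vocabs often contain byte-fallback tokens of the form `<0xNN>` that encode a single raw byte, and the detokenizer has to emit the byte itself rather than the literal string `<0x0A>`. The sketch below is purely illustrative (it is not llama.cpp's actual code, and the token spelling `<0xNN>` is the common SentencePiece-style convention, assumed here):

```python
# Illustrative sketch of byte-token fallback in a BPE detokenizer
# (assumption: byte tokens are spelled "<0xNN>", as in SentencePiece vocabs).
import re

BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")

def detokenize(tokens: list[str]) -> str:
    """Join token strings, decoding <0xNN> byte tokens as raw bytes."""
    buf = bytearray()
    for tok in tokens:
        m = BYTE_TOKEN.match(tok)
        if m:
            # A raw byte; it may be one piece of a multi-byte UTF-8 character.
            buf.append(int(m.group(1), 16))
        else:
            buf.extend(tok.encode("utf-8"))
    # Decode once at the end so multi-byte characters split across several
    # byte tokens reassemble correctly; replace any invalid sequences.
    return buf.decode("utf-8", errors="replace")
```

For example, `detokenize(["Hi", "<0x0A>", "there"])` yields `"Hi\nthere"`, and the three byte tokens `["<0xE2>", "<0x9C>", "<0x93>"]` reassemble into the single character `"✓"`. Without this handling, the literal strings leak into the output, which is the kind of artifact the vocab PR above addresses.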