New uploads to add llama.cpp fixes

#3
by danielhanchen - opened

New uploads add the following fixes:

  1. vocab: fix Gemma4 tokenizer (#21343) - https://github.com/ggml-org/llama.cpp/pull/21343
  2. fix: gemma 4 template (#21326) - https://github.com/ggml-org/llama.cpp/pull/21326

Some of these simply don't work in LM Studio. With gemma-4-E2B-it-UD-Q8_K_XL:
2026-04-03 23:25:05 [DEBUG]
llama.cpp abort:1276: GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) failed

That said, it looks like you did not in fact add new uploads for E2B specifically six hours ago?

Unsloth AI org

We just updated them again in response to the following upstream changes:

  1. kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
  2. CUDA: check for buffer overlap before fusing - CRITICAL fixes <unused24> tokens https://github.com/ggml-org/llama.cpp/pull/21566
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
  4. convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
  5. common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
  6. llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
  7. llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406
