The model stops in the middle of coding and can't use tools properly.
Maybe somebody has a better config?
c:\0_llama_server\llama-server ^
-m a:\0_LM_Studio\unsloth\gemma-4-26B-A4B-it-GGUF\gemma-4-26B-A4B-it-UD-Q5_K_S.gguf ^
-ngl 999 --threads 22 --ctx-size 196610 --alias qwen3.5:27b ^
--flash-attn on ^
--log-prefix ^
--batch-size 1024 ^
--ubatch-size 512 ^
--host 0.0.0.0 ^
--port 8080 ^
--repeat_penalty 1.0 ^
--presence_penalty 1.0 ^
--temp 1.0 --top-p 0.95 --top-k 64 ^
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 ^
--jinja --ctx-checkpoints 128 --reasoning 1
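One thing worth checking: most of those sampling flags only set server-side defaults, and an OpenAI-compatible client can override them per request, so it helps to know what your client actually sends. A sketch of a request body mirroring the flags above (the endpoint path is the standard llama-server one; the message content is just an example):

```python
import json

# Per-request sampling fields mirroring the server flags above.
# A client such as opencode may silently override these with its own defaults.
payload = {
    "model": "qwen3.5:27b",  # the --alias from the command line
    "messages": [{"role": "user", "content": "Write a hello-world in C."}],
    "temperature": 1.0,      # --temp 1.0
    "top_p": 0.95,           # --top-p 0.95
    "top_k": 64,             # --top-k 64 (llama.cpp extension field)
    "presence_penalty": 1.0, # --presence_penalty 1.0
}
body = json.dumps(payload)
# POST this body to http://localhost:8080/v1/chat/completions
```

If tool calls misbehave, comparing this against the client's actual request (llama-server logs it with verbose logging) is a quick way to spot a sampling mismatch.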
Have you tried using it in Unsloth Studio? The tooling might be much better
What perplexity are you getting with the full bf16 and the quants for these recent gemma-4 models? I just did a fresh conversion with mainline llama.cpp, and on the usual wiki.test.raw it looks oddly high. Did you guys check before releasing?
# gemma-4-31B-it bf16 gguf
perplexity: calculating perplexity over 580 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 10.43 seconds per pass - ETA 25.20 minutes
[1]574.4568,[2]4089.3231,[3]4249.0486,[4]5063.7465,[5]4281.4964,[6]4130.3125,[7]3823.6853,[8]3746.9452,[9]4295.3372,[10]4034.1529,[11]4100.3942,[12]4216.5958,[13]4897.4907,[14]4997.9979,[15]4653.1832,[16]5105.5784,[17]4468.6070,[18]4778.2157,[19]4544.0890,[20]4442.8254,[21]4147.2106,
Final estimate: PPL = 2929.9010 +/- 57.85519
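For context on why that number looks broken: llama.cpp reports PPL as the exponential of the mean token-level negative log-likelihood over all evaluated tokens, so a final estimate near 3000 means the model assigns an average probability of roughly 1/3000 to each test token (a healthy model on wiki.test.raw should land in the single digits). A toy illustration of the formula, not a reimplementation of the tool:

```python
import math

def perplexity(token_probs):
    """PPL = exp(mean negative log-likelihood) over the evaluated tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning ~1/3000 probability per token lands near PPL 3000,
# matching the broken estimate above; ~1/8 per token would give PPL 8.
broken = perplexity([1 / 3000.0] * 100)
healthy = perplexity([1 / 8.0] * 100)
```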
Thanks!
EDIT: I see you mentioned high PPL here recently https://github.com/ggml-org/llama.cpp/issues/21321#issuecomment-4180380395
@danielhanchen I would love to use Unsloth Studio; the problem is I'd need to install it from a script, and I'd rather have an executable installer like LM Studio's. Given recent events, I don't trust running installations from scripts that can be edited at any time. When I download an exe file, I know it's saved and won't change. If it's a script online, it can always be changed at any time. :D
And I don't mean intentionally on Unsloth's part, but somebody could get access and change the script :D
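One common mitigation, whatever the installer format: pin a checksum for the script out of band and refuse to run it if the bytes change. A minimal sketch in Python (the digest below is a placeholder, not a real Unsloth value; you'd record the real one once from a trusted source):

```python
import hashlib

# Hypothetical pinned digest, recorded once out of band (e.g. from a
# release announcement). NOT a real checksum for any Unsloth script.
PINNED_SHA256 = "..."

def verify_script(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the pinned digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

With this, a silently edited install script fails verification instead of executing, which addresses the "script changed after I vetted it" concern even without a packaged exe.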
Confirmed. Tool calling is completely broken. It can call some tools, but sometimes it gets stuck in a loop trying to call a tool, hitting the same error again and again. Unusable as an agentic model.
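A cheap client-side guard against that failure mode is to abort (or fall back to plain text) once the model repeats the same failing tool call a few times. A hedged sketch, assuming you control the agent loop; the names here are illustrative, not from any particular framework:

```python
from collections import Counter

class ToolLoopGuard:
    """Stop an agent loop when the model keeps issuing the same failing call."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.failures = Counter()

    def record_failure(self, tool_name: str, arguments: str) -> bool:
        """Register a failed call; return False when the loop should stop."""
        key = (tool_name, arguments)
        self.failures[key] += 1
        return self.failures[key] < self.max_repeats

# In the agent loop: if a tool call errors, consult the guard before retrying.
guard = ToolLoopGuard(max_repeats=3)
for attempt in range(5):
    if not guard.record_failure("read_file", '{"path": "/tmp/x"}'):
        break  # stop asking the model to retry the identical broken call
```

This doesn't fix the model, but it turns an infinite retry loop into a bounded one.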
I didn't check how Unsloth does it, but typically you can just clone the repo to your machine and build it yourself. I get what you mean, though.
See https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/discussions/20 for updated quants
Oh la la :D thank you <3 I found the problem I had: I was running opencode with 2 GB of RAM xD. Changed it to 4 GB and now it's fine.