The model stops in the middle of coding and can't use tools properly.
Maybe somebody has a better config?
c:\0_llama_server\llama-server ^
-m a:\0_LM_Studio\unsloth\gemma-4-26B-A4B-it-GGUF\gemma-4-26B-A4B-it-UD-Q5_K_S.gguf ^
-ngl 999 --threads 22 --ctx-size 196610 --alias qwen3.5:27b ^
--flash-attn on ^
--log-prefix ^
--batch-size 1024 ^
--ubatch-size 512 ^
--host 0.0.0.0 ^
--port 8080 ^
--repeat_penalty 1.0 ^
--presence_penalty 1.0 ^
--temp 1.0 --top-p 0.95 --top-k 64 ^
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 ^
--jinja --ctx-checkpoints 128 --reasoning 1
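One thing worth checking: most of those sampling flags only set server-side defaults, and an OpenAI-compatible client can override them per request, so it helps to know what your client actually sends. A sketch of a request body mirroring the flags above (the endpoint path is the standard llama-server one; the message content is just an example):

```python
import json

# Per-request sampling fields mirroring the server flags above.
# A client such as opencode may silently override these with its own defaults.
payload = {
    "model": "qwen3.5:27b",  # the --alias from the command line
    "messages": [{"role": "user", "content": "Write a hello-world in C."}],
    "temperature": 1.0,      # --temp 1.0
    "top_p": 0.95,           # --top-p 0.95
    "top_k": 64,             # --top-k 64 (llama.cpp extension field)
    "presence_penalty": 1.0, # --presence_penalty 1.0
}
body = json.dumps(payload)
# POST this body to http://localhost:8080/v1/chat/completions
```

If tool calls misbehave, comparing this against the client's actual request (llama-server logs it with verbose logging) is a quick way to spot a sampling mismatch.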
Have you tried using it in Unsloth Studio? The tooling might be much better
What perplexity are you getting with the full bf16 and the quants for these recent gemma-4 models? I just did a fresh conversion with mainline llama.cpp, and on the usual wiki.test.raw it looks oddly high. Did you guys check before releasing?
# gemma-4-31B-it bf16 gguf
perplexity: calculating perplexity over 580 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 10.43 seconds per pass - ETA 25.20 minutes
[1]574.4568,[2]4089.3231,[3]4249.0486,[4]5063.7465,[5]4281.4964,[6]4130.3125,[7]3823.6853,[8]3746.9452,[9]4295.3372,[10]4034.1529,[11]4100.3942,[12]4216.5958,[13]4897.4907,[14]4997.9979,[15]4653.1832,[16]5105.5784,[17]4468.6070,[18]4778.2157,[19]4544.0890,[20]4442.8254,[21]4147.2106,
Final estimate: PPL = 2929.9010 +/- 57.85519
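For context on why that number looks broken: llama.cpp reports PPL as the exponential of the mean token-level negative log-likelihood over all evaluated tokens, so a final estimate near 3000 means the model assigns an average probability of roughly 1/3000 to each test token (a healthy model on wiki.test.raw should land in the single digits). A toy illustration of the formula, not a reimplementation of the tool:

```python
import math

def perplexity(token_probs):
    """PPL = exp(mean negative log-likelihood) over the evaluated tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning ~1/3000 probability per token lands near PPL 3000,
# matching the broken estimate above; ~1/8 per token would give PPL 8.
broken = perplexity([1 / 3000.0] * 100)
healthy = perplexity([1 / 8.0] * 100)
```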
Thanks!
EDIT: I see you mentioned high PPL here recently https://github.com/ggml-org/llama.cpp/issues/21321#issuecomment-4180380395
@danielhanchen I would love to use Unsloth Studio; the problem is I'd need to install it from a script, and I'd rather have an executable installer like LM Studio's. Given recent events, I don't trust running installations from scripts that can be edited at any time. When I download an exe file, I know it's saved and won't change. If it's a script online, it can always be changed at any time. :D
And I don't mean intentionally on Unsloth's part, but somebody could get access and change the script :D
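One common mitigation, whatever the installer format: pin a checksum for the script out of band and refuse to run it if the bytes change. A minimal sketch in Python (the digest below is a placeholder, not a real Unsloth value; you'd record the real one once from a trusted source):

```python
import hashlib

# Hypothetical pinned digest, recorded once out of band (e.g. from a
# release announcement). NOT a real checksum for any Unsloth script.
PINNED_SHA256 = "..."

def verify_script(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the pinned digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

With this, a silently edited install script fails verification instead of executing, which addresses the "script changed after I vetted it" concern even without a packaged exe.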
Confirmed. Tool calling is completely broken. It can call some tools, but sometimes it gets stuck in a loop trying to call a tool, hitting the same error again and again. Unusable as an agentic model.
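A cheap client-side guard against that failure mode is to abort (or fall back to plain text) once the model repeats the same failing tool call a few times. A hedged sketch, assuming you control the agent loop; the names here are illustrative, not from any particular framework:

```python
from collections import Counter

class ToolLoopGuard:
    """Stop an agent loop when the model keeps issuing the same failing call."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.failures = Counter()

    def record_failure(self, tool_name: str, arguments: str) -> bool:
        """Register a failed call; return False when the loop should stop."""
        key = (tool_name, arguments)
        self.failures[key] += 1
        return self.failures[key] < self.max_repeats

# In the agent loop: if a tool call errors, consult the guard before retrying.
guard = ToolLoopGuard(max_repeats=3)
for attempt in range(5):
    if not guard.record_failure("read_file", '{"path": "/tmp/x"}'):
        break  # stop asking the model to retry the identical broken call
```

This doesn't fix the model, but it turns an infinite retry loop into a bounded one.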
I didn't check how Unsloth does it, but typically you can just clone the repo to your machine and build it yourself. I get what you mean, though.
See https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/discussions/20 for updated quants
Oh la la :D thank you <3 I found the problem I had: I was running opencode with 2 GB of RAM xD. Changed it to 4 GB and now it's fine.