I think I have solved the heavy system memory usage of Gemma-4-31B-it

#14
by HougeLangley - opened

The default size of one "checkpoint" is 3600.054 MiB, about 3.5 GiB (a bit under 4G). As everyone knows, the default number of checkpoints for Gemma-4-31B-it is 32. My Arch Linux machine has 128G of memory in total, and 3600.054 MiB × 32 = 115,201.73 MiB, so roughly 112~120G would be occupied on my system.

So I used --swa-checkpoints 10 on my system. Each session is now limited to about 36G of system memory on my Arch Linux machine.
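For anyone who wants to plug in their own numbers, here is a quick Python sketch of the arithmetic above. The function name is only for illustration, and it assumes every checkpoint is the same size as the one I measured, so treat the result as a worst-case estimate rather than an exact figure.

```python
MIB_PER_GIB = 1024


def checkpoint_memory_mib(checkpoint_mib: float, count: int) -> float:
    """Worst-case RAM held by `count` equally sized checkpoints, in MiB."""
    return checkpoint_mib * count


# Size of one checkpoint as reported in this post (assumed constant).
CHECKPOINT_MIB = 3600.054

# Compare the default of 32 checkpoints with the reduced value of 10.
for count in (32, 10):
    total = checkpoint_memory_mib(CHECKPOINT_MIB, count)
    print(f"{count:>2} checkpoints: {total:,.2f} MiB = {total / MIB_PER_GIB:.1f} GiB")
```

With 32 checkpoints this comes to about 112.5 GiB, and with 10 checkpoints about 35.2 GiB.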

But another problem is -cram. As everyone has probably noticed, when you end one session and start a new one, system memory usage does not drop; I mean it does not go back to where it was at the beginning. As the amount of context grows and the number of sessions rises, free system memory steadily shrinks. So I set -cram 0, which solved this problem.
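If you want to check this on your own Linux box, a small Python sketch like the one below can read MemAvailable from /proc/meminfo before and after each session. The helper names are my own and it is Linux-only; it only reports what the kernel says is available and does not know anything about llama-server itself.

```python
def mem_available_mib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / 1024
    raise ValueError("MemAvailable not found")


def read_mem_available_mib(path: str = "/proc/meminfo") -> float:
    """Read the current MemAvailable value from the running system (Linux only)."""
    with open(path) as f:
        return mem_available_mib(f.read())
```

Call read_mem_available_mib() once before the first session and again after ending each session. If the value keeps dropping and never returns to the starting point, you are probably seeing the same behaviour described here.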

llama-server -hf unsloth/gemma-4-31B-it-GGUF:BF16 --jinja -c 0 --host 0.0.0.0 --port 8033 --device CUDA0,CUDA1,CUDA2,CUDA3 --flash-attn on -ngl 999 --no-warmup --direct-io --temp 1.0 --top-p 0.95 --top-k 64 --split-mode row --reasoning on --cache-ram 0 --swa-checkpoints 10 --repeat-penalty 1.0

If you guys have any ideas, please give me some advice. :)

Noticed the same, currently using
--temp 1.0
--top-p 0.95
--top-k 64
--chat-template-kwargs '{"enable_thinking":false}'
-np 2
--no-mmap
--cache-ram 0
--ctx-checkpoints 16

On Strix Halo 128G, RAM usage dropped significantly after the last 2 options were added.

I just finished posting about memory problems I'm having with this model over here. Does my description sound like the same thing that you guys are reporting?
