I think I have solved the heavy system memory usage of Gemma-4-31B-it
The default size of one "checkpoint" is 3600.054 MiB, a little under 4G. As everyone knows, the default number of checkpoints for Gemma-4-31B-it is 32. My Arch Linux machine has 128G of memory in total, and 3600.054 MiB * 32 = 115,201.73 MiB, so about 112.5 GiB (roughly 121 GB) of system memory ends up occupied.
So I used --swa-checkpoints 10 on my system. Each session is now limited to about 36G of system memory (10 * 3600.054 MiB = 36,000.54 MiB) on my Arch Linux machine.
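For anyone who wants to redo this arithmetic for their own setup, here is a minimal Python sketch. It assumes every checkpoint is the same size as the 3600.054 MiB figure I observed; the helper name checkpoint_ram_mib is made up for illustration and is not part of llama.cpp.

```python
# Rough estimate of system RAM held by context checkpoints.
# Assumption: all checkpoints are the same size (3600.054 MiB, as observed above).
CHECKPOINT_MIB = 3600.054


def checkpoint_ram_mib(n_checkpoints: int, size_mib: float = CHECKPOINT_MIB) -> float:
    """Return the total MiB used by n_checkpoints checkpoints of size_mib each."""
    return n_checkpoints * size_mib


# Compare the default count (32) with the reduced count (10).
for n in (32, 10):
    total = checkpoint_ram_mib(n)
    print(f"{n} checkpoints: {total:,.2f} MiB = {total / 1024:.1f} GiB")
# 32 checkpoints: 115,201.73 MiB = 112.5 GiB
# 10 checkpoints: 36,000.54 MiB = 35.2 GiB
```

Swap in your own per-checkpoint size if your model or context length differs; the number above is only what I measured on my machine.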
But another problem is -cram (--cache-ram). Everyone has probably noticed that after you end one session and start a new one, system memory usage does not go back down; I mean it does not return to where it was at the beginning. As the context grows and the number of sessions rises, free system memory steadily shrinks. Setting -cram 0 solved this problem for me.
llama-server -hf unsloth/gemma-4-31B-it-GGUF:BF16 --jinja -c 0 --host 0.0.0.0 --port 8033 --device CUDA0,CUDA1,CUDA2,CUDA3 --flash-attn on -ngl 999 --no-warmup --direct-io --temp 1.0 --top-p 0.95 --top-k 64 --split-mode row --reasoning on --cache-ram 0 --swa-checkpoints 10 --repeat-penalty 1.0
If you guys have any ideas, please give me some advice. :)
Noticed the same, currently using
--temp 1.0
--top-p 0.95
--top-k 64
--chat-template-kwargs '{"enable_thinking":false}'
-np 2
--no-mmap
--cache-ram 0
--ctx-checkpoints 16
On Strix Halo 128G, RAM usage dropped significantly after the last 2 options were added.