Latency is too long in llama.cpp
#5
by AMator - opened
llama-server -m /home/user/Gemma4_Q4_K_M.gguf -fa on -np 2 -ctk q8_0 -ctv q8_0 --host 0.0.0.0 -b 4096 -ub 1024 --port 11434 -ngl 99 -c 16384 --mmproj /home/user/Gemma4_mmproj.gguf
It took about 5 minutes to start generating on each request, compared to less than 1 second with the original Gemma 4.
Solved. It was not the model's fault. Sorry.