It's working, thanks!

#4
by stxpnet888 - opened
killall llama-server 2>/dev/null; sleep 3

cd ~/turbo-tan/build/bin && LLAMA_SET_ROWS=0 ./llama-server \
  -m /data/models/Qwen3.6-35B-A3B-TQ3_4S.gguf \
  --host 0.0.0.0 --port 12026 --fit off \
  --ctx-size 186000 -n -1 \
  --batch-size 4096 --ubatch-size 2048 \
  --cache-type-k q4_0 --cache-type-v tq3_0 --cache-reuse 768 \
  --parallel 1 --threads 3 --temp 0.63 --top_p 0.95 --top_k 23 \
  --reasoning on --reasoning-budget 1024 --reasoning-format deepseek \
  --metrics --jinja --seed 42 --cont-batching --threads-batch 5 \
  -ngl 99 \
  --no-mmap --log-file /tmp/llama-p100.log &
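Once the server has started in the background, a quick smoke test confirms it is serving. This is a sketch assuming the port 12026 from the command above; llama-server exposes a `/health` endpoint and an OpenAI-compatible chat completions API.

curl -s http://localhost:12026/health

curl -s http://localhost:12026/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'

The first call should report the server's load status once the model is in memory; the second returns a short generation, which is also a handy way to eyeball tokens/second from the response timings.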

It works like a charm on my old i5-10600 with 64 GB RAM and a P100 16 GB card, averaging 30 tokens/second! Thank you!

30 tok/s! That's amazing for such an old card!
