EPYC 9355 CPU-only sweep-bench

#6
by sousekd - opened

People on Reddit sometimes ask about EPYC CPU-only performance. My GPUs are currently out of order, so here are CPU-only results from a single Turin 9355 (12x DDR5-6400) running GLM-4.7-Flash IQ5_K with ik_llama.cpp:

./llama-sweep-bench \
    --model "$MODEL_PATH" \
    --no-mmap --merge-qkv \
    -mla 3 -amb 512 \
    -b 2048 -ub 1024 \
    -ctk f16 -ctv f16 -c 131072 \
    --threads 20 \
    --threads-batch 30 \
    --warmup-batch \
    -n 128
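|   PP |  TG |   N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|------|-----|--------|--------|----------|--------|----------|
| 1024 | 128 |      0 |  1.118 |   916.26 |  1.650 |    77.58 |
| 1024 | 128 |  31744 | 11.758 |    87.09 |  4.543 |    28.18 |
| 1024 | 128 |  64512 | 22.428 |    45.66 |  7.834 |    16.34 |
| 1024 | 128 |  97280 | 33.221 |    30.82 | 11.212 |    11.42 |
| 1024 | 128 | 130048 | 43.154 |    23.73 | 14.239 |     8.99 |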
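
For anyone who wants to compare these numbers with their own runs, here is a small Python sketch that re-derives the speed columns and the slowdown relative to an empty KV cache. It assumes S_PP = PP / T_PP and S_TG = TG / T_TG, which matches the reported values to within rounding; the names `ROWS` and `summarize` are made up for this example and are not part of ik_llama.cpp.

```python
# Rows copied from the table above:
# (PP, TG, N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s)
ROWS = [
    (1024, 128, 0, 1.118, 916.26, 1.650, 77.58),
    (1024, 128, 31744, 11.758, 87.09, 4.543, 28.18),
    (1024, 128, 64512, 22.428, 45.66, 7.834, 16.34),
    (1024, 128, 97280, 33.221, 30.82, 11.212, 11.42),
    (1024, 128, 130048, 43.154, 23.73, 14.239, 8.99),
]


def summarize(rows):
    """Recompute tokens/s from the timings and express each row's
    reported speed as a fraction of the first (N_KV = 0) row."""
    base_pp = rows[0][4]  # reported S_PP with an empty KV cache
    base_tg = rows[0][6]  # reported S_TG with an empty KV cache
    out = []
    for pp, tg, n_kv, t_pp, s_pp, t_tg, s_tg in rows:
        out.append({
            "n_kv": n_kv,
            "pp_calc": pp / t_pp,      # assumed definition: tokens / seconds
            "tg_calc": tg / t_tg,      # assumed definition: tokens / seconds
            "pp_rel": s_pp / base_pp,  # fraction of empty-context PP speed
            "tg_rel": s_tg / base_tg,  # fraction of empty-context TG speed
        })
    return out


if __name__ == "__main__":
    for r in summarize(ROWS):
        print(f"N_KV={r['n_kv']:>6}  "
              f"PP {r['pp_calc']:7.2f} t/s ({r['pp_rel']:6.1%})  "
              f"TG {r['tg_calc']:6.2f} t/s ({r['tg_rel']:6.1%})")
```

Going by the reported columns, at about 130k tokens of context TG runs at roughly 12% of its empty-context speed and PP at roughly 2.6%.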

PP: [image]

TG: [image]
