Can confirm that the new uploads seem to work fine. UD-IQ1_S Quant tested on RAM maxed machine

#3
by nimishchaudhari - opened

I can confirm that I tested DeepSeek-V3.2-UD-IQ1_S on a machine with 32GB VRAM (RTX 3090 + 3060 12GB), 192GB of DDR5-5600 RAM, and an i7-14700F.

I recently got these RAM sticks and hadn't tried the big models yet; this works great.
I can run a 65k context, and I get about 5 tps for token generation (tg) and 7 tps for prompt processing (pp). That's not bad at all for my very budget setup 😀
(Got lucky with a store in MX, got 2x 96GB sets for 400€ each.)

Forgot to mention: I use ik_llama.cpp.

This is my command for reference.

"DeepSeek-V3.2-UD-IQ1_S":
  cmd: |
    ${ik_llama_cpp}
    -m ${models_dir}/LLMs//UD-IQ1_S/DeepSeek-V3.2-UD-IQ1_S-00001-of-00004.gguf
    --jinja
    -a 'Deepseek-v3.2'
    -ger --special
    --reasoning-format deepseek
    -ngl 99
    -sm layer
    -ts 0.67,0.33
    -ot exps=CPU
    -mla 3 -fa 1 -amb 512
    -c 65565
    -ctk q8_0
    --temp 0.6
    --parallel 1
    --threads 8
    --top-p 0.95
    --min-p 0.01
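
For anyone adapting the -ts 0.67,0.33 split to other cards: the values look like each GPU's share of the total VRAM. Here is a minimal Python sketch of that arithmetic. It assumes the split was chosen proportional to VRAM and that the RTX 3090 is the standard 24 GB model (the post only states 12 GB for the 3060). The helper name tensor_split is made up for illustration and is not part of ik_llama.cpp.

```python
def tensor_split(vram_gb):
    """Return each GPU's share of the total VRAM, rounded to two decimals."""
    total = sum(vram_gb)
    return [round(v / total, 2) for v in vram_gb]


# Assumed capacities: RTX 3090 = 24 GB, RTX 3060 = 12 GB.
split = tensor_split([24, 12])

# Format as a comma-separated list, the shape used by -ts in the command above.
print(",".join(f"{s:.2f}" for s in split))  # prints 0.67,0.33
```

A proportional split is only a starting point; the share that actually fits best can differ once the KV cache and compute buffers are accounted for.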

prompt eval time = 400335.07 ms / 7179 tokens ( 55.76 ms per token, 17.93 tokens per second)
time = 10465.57 ms / 43 tokens ( 243.39 ms per token, 4.11 tokens per second)

ID  Time     Model                    Cached  Prompt tokens  Generated tokens  Prompt speed  Generation speed  Duration
2   2m ago   DeepSeek-V3.2-UD-IQ1_S   -       7,179          43                17.93 t/s     4.11 t/s          410.80s
1   20m ago  DeepSeek-V3.2-UD-IQ1_S   -       19             690               6.32 t/s      4.33 t/s          162.51s
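
The numbers in the two timing lines and in request 2 above are consistent with each other, which is easy to check. A small Python sketch of the arithmetic follows; the function name throughput is just for illustration, and the inputs are copied from the log lines in this post.

```python
def throughput(elapsed_ms, tokens):
    """Return (ms per token, tokens per second) for one timing line."""
    ms_per_token = elapsed_ms / tokens
    tokens_per_second = tokens / (elapsed_ms / 1000.0)
    return ms_per_token, tokens_per_second


# Values copied from the log: prompt processing, then generation.
pp_ms, pp_tokens = 400335.07, 7179
tg_ms, tg_tokens = 10465.57, 43

pp = throughput(pp_ms, pp_tokens)
tg = throughput(tg_ms, tg_tokens)
total_s = (pp_ms + tg_ms) / 1000.0

print(f"pp: {pp[0]:.2f} ms/token, {pp[1]:.2f} t/s")  # pp: 55.76 ms/token, 17.93 t/s
print(f"tg: {tg[0]:.2f} ms/token, {tg[1]:.2f} t/s")  # tg: 243.39 ms/token, 4.11 t/s
print(f"total: {total_s:.2f}s")                      # total: 410.80s
```

So the 410.80s duration of request 2 is simply prompt time plus generation time, and almost all of it is spent on the 7,179-token prompt.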
