Thanks mradermacher!

#2
by wimmmm - opened

Just wanted to share my appreciation for your quants!

I don't know exactly what's going on, but I get noticeably faster token generation out of your IQ3 quant than out of other Q2_* quants (30 t/s vs 25 t/s on a Ryzen AI 395), even though your i-quant has more bits per weight.

This is my new go-to model for reasoning and coding :)

Hi @wimmmm, it's really interesting to read concrete speed numbers for the Ryzen AI 395.
What would be especially valuable (and can't be found anywhere!) is how fast larger models run on the Ryzen AI 395.
Do you have any experience with how fast models such as GLM-4.7 Q5_K_M (255GB) or IQ4_XS (192GB), or similar ones (DeepSeek), run on a Ryzen AI 395?
Models this large on such limited systems always have to run in hybrid mode, partially from SSD... (ik_llama and the settings for -ngl and -ot exps=CPU are particularly interesting here.)