Thanks mradermacher!

by wimmmm - opened Dec 30, 2025

Dec 30, 2025

Just wanted to share my appreciation for your quants!

I don't know what's going on exactly, but I get noticeably faster token generation (30 t/s vs 25 t/s on a Ryzen AI 395) out of your IQ3 quant compared to other Q2_* quants, despite your i-quant having more bits per weight.

This is my new go-to model for reasoning and coding :)

inputout

Jan 11

> I don't know what's going on exactly, but I get noticeably faster token generation (30 t/s vs 25 t/s on a Ryzen AI 395) out of your IQ3 quant compared to other Q2_* quants, despite your i-quant having more bits per weight.

Hi @wimmmm, it's really interesting to read concrete speed numbers for the Ryzen AI 395.
What would be especially valuable (and which I can't find anywhere!) is how fast larger models run on that hardware.
Do you have any experience with models such as GLM-4.7 Q5_K_M (255GB) or IQ4_XS (192GB), or something similar (Deepseek), and how fast they run on a Ryzen AI 395?
Models that large on a memory-limited system always have to run in hybrid mode, partially from SSD. (ik_llama and the settings for -ngl and -ot exps=CPU are particularly interesting here.)
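
To make the "partially from SSD" point concrete, here is a rough back-of-the-envelope sketch. The model sizes are the ones quoted above; the 128 GB memory figure is only an assumption for illustration (the thread does not state how much memory the machine has), and the function name is made up for this example. It ignores KV cache, context buffers, and OS overhead, so the real shortfall would be larger.

```python
def spill_to_disk_gb(model_gb: float, memory_gb: float) -> float:
    """Return how many GB of model weights cannot stay resident in memory."""
    return max(0.0, model_gb - memory_gb)


# Assumption: 128 GB of total memory (hypothetical, not stated in the thread).
ASSUMED_MEMORY_GB = 128.0

# Model sizes as quoted in the post above.
models = {
    "GLM-4.7 Q5_K_M": 255.0,
    "GLM-4.7 IQ4_XS": 192.0,
}

for name, size_gb in models.items():
    spill = spill_to_disk_gb(size_gb, ASSUMED_MEMORY_GB)
    share = spill / size_gb
    print(f"{name}: {spill:.0f} GB ({share:.0%}) would not fit in memory")
```

Under that assumption, a large share of either file would have to be read from disk during inference, which is why measured numbers for this setup would be so useful.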
