Awesome Quant - great performance on M4 Max

#1
by Narutoouz - opened

I am getting 51 tokens/s at first, which then falls to 15-20 tokens/s.
Local AI coding on Apple silicon has come a long way.

Thanks for making this quant and enabling the MLX community to enjoy a wonderful model on their own hardware!

  • Can you also make a quant of this sort for gemma 26b it?

Thank you
