Awesome quant - great performance on M4 Max
#1
opened by Narutoouz
I'm getting 51 tokens/s at first, then it drops to 15-20 tokens/s.
Local AI coding on Apple Silicon has come a long way.
Thanks for making this quant and enabling the MLX community to enjoy a wonderful model on their own hardware!
- Could you also make a quant of this sort for Gemma 26B IT?
Thank you!
https://huggingface.co/thetom-ai/Gemma-4-26B-A4B-it-ConfigI-MLX
Untested so far; I've got a lot on my plate.