Doesn't seem to work.
On a Mac mini M4 Pro with 24 GB, this just outputs nonsense in MLX v0.3.6.
I’m going to upload an updated quant soon. I don’t know what happened to this one; Q2 is the only quant that spits out garbage on my setup (with models requantized from scratch).
Thanks for letting me know, I wouldn’t have caught it because I don’t use this model in my stack.
It should work now. I just updated and it passed my smoke test.
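For anyone curious, a smoke test for a quant like this can be as simple as generating a short completion and flagging obviously garbled output. Here is a minimal, hypothetical heuristic sketch; the `looks_garbled` name and the thresholds are my own, not part of MLX or any library:

```python
def looks_garbled(text: str, max_repeat_ratio: float = 0.5) -> bool:
    """Crude heuristic: flag output that is empty, dominated by one
    repeated token, or full of non-printable characters.
    Thresholds are hypothetical and tuned only for a quick sanity check."""
    if not text.strip():
        return True
    # Too many non-printable characters suggests broken decoding
    nonprint = sum(1 for c in text if not c.isprintable() and c not in "\n\t")
    if nonprint / len(text) > 0.2:
        return True
    # One token dominating the output suggests a degenerate repeat loop
    tokens = text.split()
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > max_repeat_ratio

# Sanity checks on obvious cases
print(looks_garbled("The quick brown fox jumps over the lazy dog."))  # False
print(looks_garbled("ko ko ko ko ko ko ko ko"))  # True
```

Feeding the model a fixed prompt and running its output through a check like this catches the worst-case "spits garbage" failures, though of course it won't catch subtler quality loss.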
Yep, it works now; thank you!
It still spits out the occasional formatting issue or weird word here and there, but that is most likely due to the 3-bit quant; these Gemma models seem particularly sensitive to such compression.
Maybe the Q4 (which I sadly can't run) or a Q3.5e would be more accurate, but that will have to wait for a hardware upgrade.