Can we get it in 2-bit?
#1
by sabowsla - opened
Or is that too much?
I can try. You have 16 GB RAM, right?
I've done a 2.5-bit quant that works very well after a few tests. It's about 10 GB in size. Will upload soon.
ty! Yes, I do have 16 GB, but I just implemented a new algorithm for MLX that reduces memory usage at inference time by roughly 82%, which allowed me 200k context on this machine with gemma4 e2b and 100k with gemma e4b.
OK, it's uploaded, you can try it here: https://huggingface.co/mlx-community/gemma-4-26B-A4B-it-heretic-msq-2.6bit
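For anyone else who wants to test it, something like this should work with the mlx-lm CLI (the repo name is from the link above; the prompt and `--max-tokens` value are just illustrative, and the first run will download ~10 GB from the Hub):

```shell
pip install mlx-lm

# Pull the quant from the Hub and run a quick generation
mlx_lm.generate \
  --model mlx-community/gemma-4-26B-A4B-it-heretic-msq-2.6bit \
  --prompt "Hello, how are you?" \
  --max-tokens 100
```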
tysm! Will test.
Please leave a comment if it works well for you :)