Can we get it in 2-bit?

#1
by sabowsla - opened

or is that too much?

MLX Community org

I can try. You have 16 GB RAM, right?

MLX Community org

Have done a 2.5-bit quant that works very well after a few tests. It’s about 10 GB in size. Will upload soon
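As a rough sanity check on the reported size: the on-disk footprint of a quantized model is approximately parameter count times bits per weight, divided by 8 (plus a small overhead for quantization scales, ignored here). A minimal sketch; the ~32B parameter count below is a back-of-the-envelope assumption, not a stated fact about this model:

```python
def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model, in gigabytes.

    Ignores the small overhead of group scales/biases that quantization
    formats typically add on top of the packed weights.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical example: a ~32B-parameter model at 2.5 bits per weight
# comes out near the ~10 GB figure mentioned above.
size = quantized_size_gb(32e9, 2.5)
print(f"{size:.1f} GB")
```

This is only an estimate; real quant sizes vary with group size and which layers (e.g. embeddings) are kept at higher precision.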

ty! Yes, I do have 16 GB, but I just implemented a new algorithm for MLX that reduces memory usage at inference time by roughly 82%, which allowed me 200k context on this machine with gemma4 e2b and 100k with gemma e4b
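For context on why long contexts are memory-hungry: at inference time the KV cache dominates, growing linearly with context length. The message above doesn't describe the actual algorithm, so here is only the standard size estimate it would be reducing, sketched with hypothetical model dimensions:

```python
def kv_cache_gb(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> float:
    """Approximate KV-cache size in GB.

    Factor of 2 accounts for storing both keys and values at every layer.
    """
    return (
        2 * num_layers * num_kv_heads * head_dim
        * context_len * bytes_per_elem / 1e9
    )


# Hypothetical config: 24 layers, 4 KV heads, head_dim 128, fp16 cache.
# At 100k tokens this alone is several GB, so an ~82% reduction is what
# makes such contexts feasible on a 16 GB machine.
size = kv_cache_gb(24, 4, 128, 100_000)
print(f"{size:.2f} GB")
```

The numbers plugged in are illustrative only; the real footprint depends on the specific model's layer count, KV-head count, and cache precision.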

MLX Community org

tysm! will test

MLX Community org

Please leave a comment if it works well for you :)
