Can we get it in 2-bit?

#1
by sabowsla - opened

or is that too much?

MLX Community org

I can try. You have 16 GB RAM, right?

MLX Community org

Have done a 2.5-bit quant that works very well after a few tests. It’s about 10 GB in size. Will upload soon
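As a rough sanity check on the reported size: the on-disk footprint of a quantized model is approximately parameter count times bits per weight, divided by 8 (plus a small overhead for quantization scales, ignored here). A minimal sketch; the ~32B parameter count below is a back-of-the-envelope assumption, not a stated fact about this model:

```python
def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model, in gigabytes.

    Ignores the small overhead of group scales/biases that quantization
    formats typically add on top of the packed weights.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical example: a ~32B-parameter model at 2.5 bits per weight
# comes out near the ~10 GB figure mentioned above.
size = quantized_size_gb(32e9, 2.5)
print(f"{size:.1f} GB")
```

This is only an estimate; real quant sizes vary with group size and which layers (e.g. embeddings) are kept at higher precision.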

ty! Yes, I do have 16 GB, but I just implemented a new algorithm for MLX that reduces memory usage at inference time by roughly 82%, which allowed me 200k context on this machine with gemma4 e2b and 100k with gemma e4b
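For context on why long contexts are memory-hungry: at inference time the KV cache dominates, growing linearly with context length. The message above doesn't describe the actual algorithm, so here is only the standard size estimate it would be reducing, sketched with hypothetical model dimensions:

```python
def kv_cache_gb(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> float:
    """Approximate KV-cache size in GB.

    Factor of 2 accounts for storing both keys and values at every layer.
    """
    return (
        2 * num_layers * num_kv_heads * head_dim
        * context_len * bytes_per_elem / 1e9
    )


# Hypothetical config: 24 layers, 4 KV heads, head_dim 128, fp16 cache.
# At 100k tokens this alone is several GB, so an ~82% reduction is what
# makes such contexts feasible on a 16 GB machine.
size = kv_cache_gb(24, 4, 128, 100_000)
print(f"{size:.2f} GB")
```

The numbers plugged in are illustrative only; the real footprint depends on the specific model's layer count, KV-head count, and cache precision.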

MLX Community org

tysm! will test

MLX Community org

Please leave a comment if it works well for you :)
