Suggestion: Add IQ3 variants.

#7
by tarruda - opened

IQ3 quants might retain more of the original model's performance on devices with 128 GB of RAM. For example, I have an M1 Ultra with 128 GB, and it can run MiMo 2.5 (310B parameters) with Unsloth's IQ3_XXS quant and 128k context:
[image]

This is fully GPU offloaded BTW.
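
As a rough sanity check on why this fits, here is a back-of-the-envelope estimate of the weight size. It is only a sketch under stated assumptions: 3.06 bits per weight is the nominal figure llama.cpp lists for IQ3_XXS, real GGUF files mix tensor types so the actual file size will differ, and "128G" is taken to mean 128 GiB. The KV cache for 128k context is not estimated because it depends on architecture details not given here.

```python
def quant_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 310e9      # parameter count quoted in the post
IQ3_XXS_BPW = 3.06    # assumption: nominal llama.cpp bits per weight for IQ3_XXS
RAM_GIB = 128         # assumption: "128G" means 128 GiB of unified memory

weights = quant_weight_gib(N_PARAMS, IQ3_XXS_BPW)
headroom = RAM_GIB - weights  # left for KV cache, compute buffers, and the OS

print(f"weights ~{weights:.1f} GiB, headroom ~{headroom:.1f} GiB")
```

Under these assumptions the weights alone take most of the memory, so the headroom for context and the OS is fairly tight, and a larger quant of the same model would likely not fit.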
