Ablation studies on effects of quantization on SSM weights?

#15
by dinerburger - opened

Has the Unsloth team run any ablation studies on the effects of quantizing the SSM weights? Looking at it, both ssm_ba.weight and ssm_out.weight represent a very small portion of the overall model size, so leaving them in BF16 stands to improve the accuracy of those layers for little cost in file size. However, I'm hesitant to go too hard in the paint on this without better data. As part of your Dynamic Quant 2.0 strategy, it'd be really great to see how these weights contribute to overall accuracy. Thanks again for all your hard work!
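To make the "little cost" claim concrete, here's a minimal back-of-the-envelope sketch of the size overhead from keeping SSM tensors in full BF16 while quantizing everything else to roughly 4 bits per weight. All tensor shapes below are hypothetical placeholders, not measurements from any actual checkpoint, and the ~4.5 bits/param figure is just a typical ballpark for a 4-bit quant with scales:

```python
# Hedged sketch: estimate the file-size cost of keeping an SSM tensor
# in BF16 instead of quantizing it to ~4-bit like the rest of the model.
# Parameter counts are hypothetical, for illustration only.

BYTES_BF16 = 2.0      # BF16: 2 bytes per parameter
BYTES_Q4 = 4.5 / 8    # ~4.5 bits per parameter for a typical 4-bit quant

# Hypothetical per-layer parameter counts (not from a real model).
tensors = {
    "ssm_out.weight": 1024 * 2048,        # assumed small SSM projection
    "attn_qkv.weight": 3 * 4096 * 4096,   # fused QKV projection
    "ffn_up.weight": 4096 * 14336,        # MLP up-projection
    "ffn_down.weight": 14336 * 4096,      # MLP down-projection
}
keep_bf16 = {"ssm_out.weight"}

def model_bytes(keep: set[str]) -> float:
    """Total serialized size, with tensors in `keep` left in BF16."""
    return sum(
        n * (BYTES_BF16 if name in keep else BYTES_Q4)
        for name, n in tensors.items()
    )

all_q4 = model_bytes(set())       # everything quantized
mixed = model_bytes(keep_bf16)    # SSM tensor kept in BF16
overhead = (mixed - all_q4) / all_q4
print(f"size overhead from keeping the SSM tensor in BF16: {overhead:.1%}")
```

With these made-up shapes the overhead works out to about 3%, which is the kind of number an ablation would need to weigh against any accuracy gain; real checkpoints would of course need real tensor shapes plugged in.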