Request: oQ4 quantization

by 0xtimi2233 - opened about 21 hours ago

Hi there, since you've already done the awesome 8-bit MLX conversion, any chance you could help make a standard oQ4 quant for this Hy-MT2-30B model? I'm really eager to try it on Mac, but my local hardware just doesn't have the juice to handle a 30B calibration myself.

For reference, the official workflow can be found here: oMLX oQ Quantization Guide.

It would be awesome if you could look into this when you have some time. Thanks for your great work on the MLX version!

QwQbb changed discussion status to closed about 21 hours ago

QwQbb changed discussion status to open about 21 hours ago

QwQbb

Owner about 21 hours ago

Hi, thanks a lot for the kind words and for the request!

I actually tried to make an oQ quant for Hy-MT2-30B-A3B as well, but I’m currently unable to produce a valid oQ4/oQ6/oQ8 quant with the oMLX oQ pipeline.

Even the oQ8 run failed during the sensitivity-measurement stage with this error:

oQ8: sensitivity measurement produced no scores. Check the preceding log lines for the root cause (model load, calibration data, or layer discovery), and either fix it or pass an explicit sensitivity_model_path.

As far as I understand, oQ needs that calibration/sensitivity-measurement step in order to build the layer-wise quantization plan. Since the sensitivity scores are not produced, I cannot generate a proper standard oQ4 quant for this model at the moment.
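To illustrate why missing scores block the whole process, here is a toy sketch of how per-layer sensitivity scores could drive a layer-wise bit plan. This is not the oMLX implementation: the function name `plan_bits`, the layer names, the score values, and the "top 25% of layers get 8 bits" rule are all assumptions made up for this example.

```python
def plan_bits(scores, base_bits=4, high_bits=8, top_fraction=0.25):
    """Toy mixed-precision plan: the most sensitive layers get more bits.

    `scores` maps a layer name to a sensitivity score (higher = more
    sensitive to quantization error). Hypothetical scheme, for illustration.
    """
    if not scores:
        # Same situation as the error above: with no scores there is
        # nothing to rank, so no layer-wise plan can be built.
        raise ValueError("no sensitivity scores; cannot build a plan")

    # Rank layers from most to least sensitive.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_high = max(1, round(len(ranked) * top_fraction))
    protected = set(ranked[:n_high])

    # Protected layers keep higher precision; the rest use the base width.
    return {name: (high_bits if name in protected else base_bits)
            for name in scores}


# Made-up scores for four hypothetical layers.
scores = {
    "layers.0.mlp": 0.91,
    "layers.1.mlp": 0.12,
    "layers.2.mlp": 0.33,
    "layers.3.mlp": 0.05,
}
plan = plan_bits(scores)
```

With these made-up numbers, only `layers.0.mlp` is kept at 8 bits and the other three layers fall back to 4 bits. Calling `plan_bits({})` raises immediately, which is the toy equivalent of the "produced no scores" failure.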

My current guess is that this is related to the hy_v3 architecture used by Hy-MT2. The 8-bit MLX version works through a custom hy_v3.py MLX-LM adapter, but it looks like the oMLX oQ calibration path may need explicit support or fixes for the hy_v3 architecture before a reliable oQ quant can be produced.

So for now, I think it is better to wait until oMLX / MLX-LM support for this architecture improves, or until a reliable workaround is found.

In the meantime, I’ll make and upload a regular MLX 4-bit version instead. It will not be the same as a standard oQ4 quant, but it should still be useful for running the model on Mac with lower memory usage.
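For anyone who wants to try the same thing locally, a regular 4-bit conversion with MLX-LM can be sketched roughly as below. This is a minimal sketch under assumptions, not the exact command used for the upload: the input and output paths are placeholders, the group size of 64 is simply a common choice, and it assumes an `mlx-lm` install that can load the hy_v3 architecture (for example with the custom `hy_v3.py` adapter mentioned above available to it).

```python
def build_convert_kwargs(hf_path, mlx_path, bits=4, group_size=64):
    """Collect the arguments for a plain (non-oQ) quantized conversion."""
    return {
        "hf_path": hf_path,          # source model: local directory or Hub repo id
        "mlx_path": mlx_path,        # output directory for the converted model
        "quantize": True,            # uniform quantization, no sensitivity plan
        "q_bits": bits,
        "q_group_size": group_size,
    }


def run_conversion(kwargs):
    # Imported lazily so this file can be read and checked on any machine;
    # the conversion itself needs Apple silicon with mlx-lm installed.
    from mlx_lm import convert
    convert(**kwargs)


# Placeholder paths, not real repository names.
kwargs = build_convert_kwargs("path/to/Hy-MT2-30B-A3B", "Hy-MT2-30B-A3B-4bit")

# Uncomment on a Mac with mlx-lm installed and enough memory for the model:
# run_conversion(kwargs)
```

Unlike oQ, this applies the same bit width to every quantized layer, so no calibration or sensitivity pass is involved.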

Thanks again for checking out the model and for the suggestion!

0xtimi2233

about 21 hours ago

Thanks for the detailed explanation! That makes perfect sense regarding the hy_v3 architecture issue.

A regular 4-bit MLX version would be awesome in the meantime to get it running on Mac. Really appreciate your quick reply and help with this!

0xtimi2233 changed discussion status to closed about 21 hours ago
