Request: oQ4 quantization
Hi there, since you've already done the awesome 8-bit MLX conversion, any chance you could help make a standard oQ4 quant for this Hy-MT2-30B model? I'm really eager to try it on Mac, but my local hardware just doesn't have the juice to handle a 30B calibration myself.
For reference, the official workflow can be found here: oMLX oQ Quantization Guide.
It would be awesome if you could look into this when you have some time. Thanks for your great work on the MLX version!
Hi, thanks a lot for the kind words and for the request!
I actually tried to make an oQ quant for Hy-MT2-30B-A3B as well, but I’m currently unable to produce a valid oQ4/oQ6/oQ8 quant with the oMLX oQ pipeline.
Even the oQ8 run failed during the sensitivity-measurement stage with this error:
oQ8: sensitivity measurement produced no scores. Check the preceding log lines for the root cause (model load, calibration data, or layer discovery), and either fix it or pass an explicit sensitivity_model_path.
As far as I understand, oQ needs that calibration/sensitivity-measurement step in order to build the layer-wise quantization plan. Since the sensitivity scores are not produced, I cannot generate a proper standard oQ4 quant for this model at the moment.
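To illustrate why missing scores block the whole run, here is a minimal sketch of a sensitivity-driven bit plan. This is not oMLX's actual code: the function name `plan_bits`, the layer names, and the "top 25% of layers get more bits" rule are all hypothetical and only show the general shape of the idea.

```python
def plan_bits(sensitivity, base_bits=4, high_bits=8, top_fraction=0.25):
    """Hypothetical layer-wise plan: the most sensitive layers get more bits.

    `sensitivity` maps a layer name to a score (higher = more damaged by
    quantization). The allocation rule is an illustration, not oQ's rule.
    """
    if not sensitivity:
        # With no scores there is nothing to rank, so no plan can be built.
        raise ValueError("sensitivity measurement produced no scores")

    # Rank layers from most to least sensitive.
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n_high = max(1, round(len(ranked) * top_fraction))
    protected = set(ranked[:n_high])

    return {
        name: (high_bits if name in protected else base_bits)
        for name in sensitivity
    }


# Made-up scores for four made-up layers.
scores = {"layers.0": 0.9, "layers.1": 0.1, "layers.2": 0.3, "layers.3": 0.2}
print(plan_bits(scores))
```

An empty score table leaves nothing to rank, which is why a failed measurement step stops the pipeline before any quantized weights are written.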
My current guess is that this is related to the hy_v3 architecture used by Hy-MT2. The 8-bit MLX version works through a custom hy_v3.py MLX-LM adapter, but it looks like the oMLX oQ calibration path may need explicit support or fixes for the hy_v3 architecture before a reliable oQ quant can be produced.
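One quick way to check this kind of guess is to see whether the installed package ships a module for the architecture at all. The sketch below assumes that MLX-LM looks up model code as `mlx_lm.models.<model_type>`, with `model_type` read from the model's `config.json`; that matches my understanding of MLX-LM but should be treated as an assumption. The helper names are hypothetical.

```python
import importlib.util
import json
from pathlib import Path


def read_model_type(model_dir):
    """Return the `model_type` field from <model_dir>/config.json, or None."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("model_type")


def has_adapter(model_type, package="mlx_lm.models"):
    """True if `<package>.<model_type>` can be found as an importable module.

    The default package name is an assumption about how MLX-LM lays out its
    architecture modules. Returns False if the package is not installed.
    """
    try:
        return importlib.util.find_spec(f"{package}.{model_type}") is not None
    except ModuleNotFoundError:
        # The parent package itself is missing.
        return False


# Example usage (the local directory name is whatever you downloaded to):
#   model_type = read_model_type("Hy-MT2-30B-A3B-MLX-8bit")
#   print(model_type, has_adapter(model_type))
```

A `False` result only says that the stock package has no module of that name. It does not tell you whether a separately supplied adapter file would be picked up by a calibration pipeline.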
So for now, I think it is better to wait until oMLX / MLX-LM support for this architecture improves, or until a reliable workaround is found.
In the meantime, I’ll make and upload a regular MLX 4-bit version instead. It will not be the same as a standard oQ4 quant, but it should still be useful for running the model on Mac with lower memory usage.
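For a rough sense of the savings, here is a back-of-the-envelope estimate of weight size at 8-bit versus 4-bit. It assumes 30e9 parameters (taken from the "30B" in the model name), that every weight is quantized, and that each group of 64 weights stores one 16-bit scale and one 16-bit bias, which as far as I know are the usual MLX defaults. Real checkpoints keep some tensors at higher precision, and the KV cache and activations need memory on top of this, so treat the numbers as approximate.

```python
def quantized_size_gb(n_params, bits, group_size=64, scale_bits=16, bias_bits=16):
    """Approximate weight size in GB for group-wise affine quantization.

    Each group of `group_size` weights carries one scale and one bias, so the
    effective cost is `bits + (scale_bits + bias_bits) / group_size` per weight.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 30e9  # assumption: taken from "30B" in the model name

for bits in (8, 4):
    print(f"{bits}-bit: ~{quantized_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the per-weight cost is 8.5 bits versus 4.5 bits, so the 4-bit weights come out at a little over half the size of the 8-bit ones rather than exactly half.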
Thanks again for checking out the model and for the suggestion!
Thanks for the detailed explanation! That makes perfect sense regarding the hy_v3 architecture issue.
A regular 4-bit MLX version would be awesome in the meantime to get it running on Mac. Really appreciate your quick reply and help with this!

