Issue with config.json

#1
by onesNzeros - opened

Thanks for sharing this. I think your config.json is missing some quantization details, or it may be the wrong config.json for this quantized model.

What are you using to run inference on the model?

On an M3 Ultra with 512 GB of RAM, using mlx-lm 0.31.1. It throws a KeyError for 'quant_method' when mlx_lm/utils.py executes quant_method = quantization_config["quant_method"]. I was able to get it going by modifying the 'quantization_config' key in config.json like so:
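A minimal sketch of the failure mode (illustrative only, not the actual mlx_lm/utils.py source): the loader indexes the config block directly, so a config.json whose quantization block lacks a "quant_method" field raises KeyError. A defensive .get() with an assumed "affine" default avoids the crash:

```python
import json

# Illustrative config resembling the one shipped with this model:
# quantization details are present, but "quant_method" is absent.
config_text = """
{
    "quantization_config": {
        "group_size": 32,
        "bits": 8
    }
}
"""
config = json.loads(config_text)
quantization_config = config["quantization_config"]

# Direct indexing, as described above, raises KeyError here:
#   quant_method = quantization_config["quant_method"]
# A hedged fallback (the "affine" default is an assumption,
# not documented mlx-lm behavior):
quant_method = quantization_config.get("quant_method", "affine")
print(quant_method)
```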

"quantization": {
        "group_size": 32,
        "bits": 8,
        "mode": "affine",
        "model.layers.0.block_sparse_moe.gate": {
            "group_size": 64,
            "bits": 8
        },
        "model.layers.1.block_sparse_moe.gate": {
            "group_size": 64,
            "bits": 8
        },
