Issue with config.json
#1
by onesNzeros - opened
Thanks for sharing this. I think your config.json is missing some quantization details, or it may be the wrong config.json for this quantized model.
What are you using to run inference on the model?
On an M3 Ultra (512GB) using mlx-lm 0.31.1. It throws a KeyError for 'quant_method' when 'mlx_lm/utils.py' runs quant_method = quantization_config["quant_method"]. I was able to get it going by replacing the 'quantization_config' key in config.json with a 'quantization' key, like so:
"quantization": {
    "group_size": 32,
    "bits": 8,
    "mode": "affine",
    "model.layers.0.block_sparse_moe.gate": {
        "group_size": 64,
        "bits": 8
    },
    "model.layers.1.block_sparse_moe.gate": {
        "group_size": 64,
        "bits": 8
    },