Issue with config.json
#1
by onesNzeros - opened
Thanks for sharing this. I think your config.json is missing some quantization details, or it may be the wrong config.json for this quantized model.
What are you using to run inference on the model?
On an M3 Ultra (512GB) using mlx-lm 0.31.1. It throws a KeyError for 'quant_method' when 'mlx_lm/utils.py' runs quant_method = quantization_config["quant_method"]. I was able to get it going by replacing the 'quantization_config' key in config.json with a 'quantization' key, like so:
"quantization": {
    "group_size": 32,
    "bits": 8,
    "mode": "affine",
    "model.layers.0.block_sparse_moe.gate": {
        "group_size": 64,
        "bits": 8
    },
    "model.layers.1.block_sparse_moe.gate": {
        "group_size": 64,
        "bits": 8
    },