Anybody got this working with vLLM?
I'm trying it like this:
export MODEL_ID=unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit
docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --gpus all \
  --ipc=host \
  -p "${MODEL_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:latest \
  --model "${MODEL_ID}" \
  --enforce-eager \
  --tool-call-parser mistral \
  --config_format mistral \
  --load_format mistral \
  --enable-auto-tool-choice \
  --tokenizer-mode mistral \
  --quantization bitsandbytes \
  --limit-mm-per-prompt 'image=1'
and I'm getting this error:
Value error, Failed to load mistral 'params.json' config for model unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit. Please check if the model is a mistral-format model and if the config file exists. [type=value_error, input_value=ArgsKwargs((), {'model': ..., 'model_impl': 'auto'}), input_type=ArgsKwargs]
I tried downloading params.json from the original model, but vLLM doesn't seem to find it.
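If it helps anyone debug this: vLLM's mistral config format expects a params.json next to the weights, while HF-format repos ship a config.json instead. A quick local check against the downloaded snapshot directory (a sketch; point it at wherever your HF cache resolved the snapshot):

```python
from pathlib import Path

def repo_format(model_dir: str) -> str:
    """Guess whether a downloaded snapshot is mistral- or HF-format,
    based on which config file is present (per the error above)."""
    d = Path(model_dir)
    if (d / "params.json").is_file():
        return "mistral"
    if (d / "config.json").is_file():
        return "hf"
    return "unknown"
```

If this returns "hf", the mistral-specific flags (--config_format mistral, --load_format mistral, --tokenizer-mode mistral) won't match the repo layout.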
I am also unable to get it to work. It fails the same way for me with those parameters.
When running just vllm serve unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit, I get the following error:
ERROR 06-24 09:58:52 [core.py:515] File "...", line 1131, in weight_loader
ERROR 06-24 09:58:52 [core.py:515] assert param_data.shape == loaded_weight.shape
ERROR 06-24 09:58:52 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-24 09:58:52 [core.py:515] AssertionError
However, the older version runs just fine with vllm serve unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit.
I am on vLLM v0.9.1.
Thanks! I really appreciate the Unsloth dynamic quants!
I figured out this model is using the HF repo format, which is why I got the error. But I still couldn't run it after fixing the params accordingly.
However, unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit did work for me. Not sure what the difference is.
Same here.
What were the changes you made to the params?
Thanks!
Hi! I noticed a difference in the config.json file compared to unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit: this Unsloth version sets model_type to pixtral in its config.
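For anyone comparing the two repos, the mismatch shows up in the top-level model_type key of config.json. A minimal sketch (the JSON excerpts below are illustrative assumptions, not the full config files; "mistral3" is my guess at the expected value):

```python
import json

def model_type(config_text: str) -> str:
    """Return the top-level model_type from a config.json blob."""
    return json.loads(config_text).get("model_type", "<missing>")

# Hypothetical minimal excerpts of each repo's config.json
unsloth_cfg = '{"model_type": "pixtral"}'   # unsloth-bnb-4bit variant (per the post above)
other_cfg = '{"model_type": "mistral3"}'    # plain bnb-4bit variant (assumption)

print(model_type(unsloth_cfg))  # pixtral
print(model_type(other_cfg))    # mistral3
```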