Anybody got this working with vLLM?

#1
by alecauduro - opened

trying like this:

export MODEL_ID=unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --gpus all \
  --ipc=host \
  -p "${MODEL_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:latest \
  --model ${MODEL_ID} \
  --enforce-eager \
  --tool-call-parser mistral \
  --config-format mistral \
  --load-format mistral \
  --enable-auto-tool-choice \
  --tokenizer-mode mistral \
  --quantization bitsandbytes \
  --limit-mm-per-prompt 'image=1'

and I'm getting this error:
Value error, Failed to load mistral 'params.json' config for model unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit. Please check if the model is a mistral-format model and if the config file exists. [type=value_error, input_value=ArgsKwargs((), {'model': ..., 'model_impl': 'auto'}), input_type=ArgsKwargs]

I tried downloading params.json from the original model, but vLLM still doesn't seem to find it.
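The error above comes from vLLM expecting a Mistral-native repo (with a `params.json`) while the unsloth repo ships Hugging Face-style files. A quick way to sanity-check which format a repo uses is to look at its file listing; this is only a sketch, with a hypothetical file list — in practice you could fetch the real listing with `huggingface_hub.list_repo_files(repo_id)`:

```python
# Sketch: distinguish a Mistral-native repo (params.json) from an
# HF-format repo (config.json) by its file listing. The file list below
# is hypothetical; a real one could come from
# huggingface_hub.list_repo_files(repo_id).

def detect_repo_format(filenames):
    """Return 'mistral' if the repo ships a Mistral-native params.json,
    'hf' if it ships a Transformers-style config.json, else 'unknown'."""
    names = set(filenames)
    if "params.json" in names:
        # Native format expected by --config-format/--load-format mistral
        return "mistral"
    if "config.json" in names:
        # HF-format repo; the mistral-format flags will fail on it
        return "hf"
    return "unknown"

# Example with a hypothetical bnb-4bit repo listing:
files = ["config.json", "model-00001-of-00003.safetensors", "tokenizer.json"]
print(detect_repo_format(files))  # -> hf
```

If this prints `hf`, the `--config-format mistral` / `--load-format mistral` / `--tokenizer-mode mistral` flags won't match the repo contents.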

I am also unable to get it to work. It fails in the same way for me with those parameters.

When running just vllm serve unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit, I get the following error:

ERROR 06-24 09:58:52 [core.py:515]   File "...", line 1131, in weight_loader
ERROR 06-24 09:58:52 [core.py:515]     assert param_data.shape == loaded_weight.shape
ERROR 06-24 09:58:52 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-24 09:58:52 [core.py:515] AssertionError

However, the older version runs just fine using vllm serve unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit.
I am on vLLM v0.9.1.

Thanks! I greatly appreciate the unsloth dynamic quants!

I figured out that this model uses the HF repo format; that's why I got the error. But I still couldn't run it after fixing the params accordingly.
However, unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit worked for me. Not sure what the difference is.
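For reference, a minimal invocation for that HF-format repo might look like this. This is an untested sketch: only the repo name comes from the thread, and since the repo is HF-format, the Mistral-native flags from the earlier docker command are dropped.

```shell
# Untested sketch: HF-format repo, so the Mistral-native flags
# (--config-format / --load-format / --tokenizer-mode mistral) are omitted.
vllm serve unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit \
  --quantization bitsandbytes \
  --enforce-eager
```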

same here

What were the changes you made to the params?

Thanks!

Hi! I noticed a difference in the config.json compared to unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit: this unsloth version sets model_type to pixtral in its config.
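To see (and, if you want to experiment, change) that field locally, something like the sketch below could be used on a downloaded copy of the repo. The path and the replacement value are assumptions — the thread only says this repo's config declares pixtral while the plain bnb-4bit repo differs, so check the working repo's config.json for the correct value before patching anything.

```python
# Sketch: inspect and optionally patch "model_type" in an HF-style
# config.json. Path and target value are hypothetical examples.
import json

def read_model_type(config_path):
    """Return the model_type declared in a config.json."""
    with open(config_path) as f:
        return json.load(f).get("model_type")

def patch_model_type(config_path, new_type):
    """Rewrite model_type in place; return the previous value."""
    with open(config_path) as f:
        cfg = json.load(f)
    old = cfg.get("model_type")
    cfg["model_type"] = new_type
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return old
```

Usage would be e.g. `read_model_type("Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit/config.json")` to confirm it says `pixtral` before touching it.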
