Anybody got this working with vLLM?

#1
by alecauduro - opened

trying like this:

export MODEL_ID=unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --gpus all \
  --ipc=host \
  -p "${MODEL_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:latest \
  --model ${MODEL_ID} \
  --enforce-eager \
  --tool-call-parser mistral \
  --config-format mistral \
  --load-format mistral \
  --enable-auto-tool-choice \
  --tokenizer-mode mistral \
  --quantization bitsandbytes \
  --limit-mm-per-prompt 'image=1'

and I'm getting this error:
Value error, Failed to load mistral 'params.json' config for model unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit. Please check if the model is a mistral-format model and if the config file exists. [type=value_error, input_value=ArgsKwargs((), {'model': ..., 'model_impl': 'auto'}), input_type=ArgsKwargs]

I tried downloading params.json from the original model, but vLLM still doesn't seem to find it.
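The error above comes from vLLM expecting a Mistral-native repo (with a `params.json`) while the unsloth repo ships Hugging Face-style files. A quick way to sanity-check which format a repo uses is to look at its file listing; this is only a sketch, with a hypothetical file list — in practice you could fetch the real listing with `huggingface_hub.list_repo_files(repo_id)`:

```python
# Sketch: distinguish a Mistral-native repo (params.json) from an
# HF-format repo (config.json) by its file listing. The file list below
# is hypothetical; a real one could come from
# huggingface_hub.list_repo_files(repo_id).

def detect_repo_format(filenames):
    """Return 'mistral' if the repo ships a Mistral-native params.json,
    'hf' if it ships a Transformers-style config.json, else 'unknown'."""
    names = set(filenames)
    if "params.json" in names:
        # Native format expected by --config-format/--load-format mistral
        return "mistral"
    if "config.json" in names:
        # HF-format repo; the mistral-format flags will fail on it
        return "hf"
    return "unknown"

# Example with a hypothetical bnb-4bit repo listing:
files = ["config.json", "model-00001-of-00003.safetensors", "tokenizer.json"]
print(detect_repo_format(files))  # -> hf
```

If this prints `hf`, the `--config-format mistral` / `--load-format mistral` / `--tokenizer-mode mistral` flags won't match the repo contents.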

I am also unable to get it to work. It fails in the same way for me with those parameters.

When running just vllm serve unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit, I get the following error:

ERROR 06-24 09:58:52 [core.py:515]   File "...", line 1131, in weight_loader
ERROR 06-24 09:58:52 [core.py:515]     assert param_data.shape == loaded_weight.shape
ERROR 06-24 09:58:52 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-24 09:58:52 [core.py:515] AssertionError

However, the older version runs just fine using vllm serve unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit.
I am on vLLM v0.9.1.

Thanks! I greatly appreciate the unsloth dynamic quants!

I figured out that this model uses the HF repo format; that's why I got the error. But I still couldn't run it after fixing the params accordingly.
However, unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit worked for me. Not sure what the difference is.
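For reference, a minimal invocation for that HF-format repo might look like this. This is an untested sketch: only the repo name comes from the thread, and since the repo is HF-format, the Mistral-native flags from the earlier docker command are dropped.

```shell
# Untested sketch: HF-format repo, so the Mistral-native flags
# (--config-format / --load-format / --tokenizer-mode mistral) are omitted.
vllm serve unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit \
  --quantization bitsandbytes \
  --enforce-eager
```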

same here

What were the changes you made to the params?

Thanks!

Hi! I noticed a difference in the config.json compared to unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit: this unsloth version sets model_type to pixtral in its config.
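To see (and, if you want to experiment, change) that field locally, something like the sketch below could be used on a downloaded copy of the repo. The path and the replacement value are assumptions — the thread only says this repo's config declares pixtral while the plain bnb-4bit repo differs, so check the working repo's config.json for the correct value before patching anything.

```python
# Sketch: inspect and optionally patch "model_type" in an HF-style
# config.json. Path and target value are hypothetical examples.
import json

def read_model_type(config_path):
    """Return the model_type declared in a config.json."""
    with open(config_path) as f:
        return json.load(f).get("model_type")

def patch_model_type(config_path, new_type):
    """Rewrite model_type in place; return the previous value."""
    with open(config_path) as f:
        cfg = json.load(f)
    old = cfg.get("model_type")
    cfg["model_type"] = new_type
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return old
```

Usage would be e.g. `read_model_type("Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit/config.json")` to confirm it says `pixtral` before touching it.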
