Issues running the model with vLLM's nightly Docker build as described in the README
I tried running the command given in the README on a 5060 Ti (which is also Blackwell), but I'm getting KeyError: 'ministral3'.
So I wanted to clarify: how did you test it?
Looking into it, I found that the latest transformers release, 4.57.3, doesn't have the Ministral3 model integrated; it's only present in 5.0.0rc0.
So how do I run the model using vLLM?
Detailed logs:
sudo docker run --gpus all -p 8001:8000 --ipc=host vllm/vllm-openai:nightly --model Firworks/Ministral-3-14B-Instruct-2512-nvfp4 --dtype auto --gpu-memory-utilization 0.92 --max-model-len 16000
WARNING 12-20 03:39:20 [argparse_utils.py:195] With `vllm serve`, you should provide the model as a positional argument or in a config file instead of via the `--model` option. The `--model` option will be removed in v0.13.
(APIServer pid=1) INFO 12-20 03:39:20 [api_server.py:1262] vLLM API server version 0.14.0rc1.dev26+gff2168bca
(APIServer pid=1) INFO 12-20 03:39:20 [utils.py:253] non-default args: {'model_tag': 'Firworks/Ministral-3-14B-Instruct-2512-nvfp4', 'model': 'Firworks/Ministral-3-14B-Instruct-2512-nvfp4', 'max_model_len': 16000, 'gpu_memory_utilization': 0.92}
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1309, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1328, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 171, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 197, in build_async_engine_client_from_engine_args
(APIServer pid=1) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1344, in create_engine_config
(APIServer pid=1) model_config = self.create_model_config()
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1201, in create_model_config
(APIServer pid=1) return ModelConfig(
(APIServer pid=1) ^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/config/model.py", line 458, in __post_init__
(APIServer pid=1) hf_config = get_config(
(APIServer pid=1) ^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 613, in get_config
(APIServer pid=1) config_dict, config = config_parser.parse(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 148, in parse
(APIServer pid=1) config = AutoConfig.from_pretrained(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1372, in from_pretrained
(APIServer pid=1) return config_class.from_dict(config_dict, **unused_kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 808, in from_dict
(APIServer pid=1) config = cls(**config_dict)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/configuration_mistral3.py", line 113, in __init__
(APIServer pid=1) text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
(APIServer pid=1) ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
(APIServer pid=1) raise KeyError(key)
(APIServer pid=1) KeyError: 'ministral3'
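For context, the failure at the bottom of that traceback is a plain dict lookup: Mistral3Config takes the nested text config's model_type and looks it up in transformers' CONFIG_MAPPING, which has no "ministral3" entry in 4.57.x. A self-contained sketch of that dispatch (the mapping contents here are illustrative, not the real CONFIG_MAPPING):

```python
# Minimal reproduction of the failing dispatch in transformers'
# Mistral3Config.__init__ (mapping contents are illustrative, not the
# real CONFIG_MAPPING).
CONFIG_MAPPING = {"mistral": dict, "mistral3": dict}  # 4.57.x has no "ministral3"

text_config = {"model_type": "ministral3"}  # nested text config from config.json
try:
    CONFIG_MAPPING[text_config["model_type"]]
except KeyError as exc:
    error = f"KeyError: {exc}"
    print(error)  # -> KeyError: 'ministral3'
```

So no vLLM flag will fix this on its own: the transformers inside the image has to know the model type before the config can even be parsed.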
If you look near the top of the model card, I have a note about running into the same issue:
Note: I was not able to get these running with the latest, nightly, or v0.12.0 vLLM Docker container. This might still be useful for someone wanting to run it with transformers, or for anyone who wants to see if they can figure out a tricky vLLM command to get it running. If I figure it out, or someone lets me know a working command for the vLLM Docker container, I'll update the model card.
Hopefully vLLM will get updated to support it soon, but it doesn't yet.
I haven't tried it, but it might be worth basing an image on the existing vLLM Docker container and updating transformers inside it; that's how I got GLM-4.6V-Flash running. You could follow the same steps from the GLM-4.6V-Flash repo, but for this Ministral repo.
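An untested sketch of that approach as a Dockerfile, layering the transformers pre-release mentioned above on top of the nightly image (the 5.0.0rc0 pin comes from the version check earlier in the thread, and upgrading transformers this way may still conflict with vLLM's pinned dependencies or require vLLM-side support for the architecture):

```dockerfile
# Hypothetical workaround: start from the vLLM nightly image and
# install a transformers pre-release that registers "ministral3".
FROM vllm/vllm-openai:nightly
RUN pip install --no-cache-dir "transformers==5.0.0rc0"
```

Build it with something like `docker build -t vllm-ministral3 .` and reuse the same `docker run` flags from the logs above, swapping in the new image tag.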