KeyError: 'mistral4' running vllm-ms4

#9
by TimKoornstra - opened

Running with image mistralllm/vllm-ms4:latest gives me the following stack trace:

(APIServer pid=50) Traceback (most recent call last):
(APIServer pid=50)   File "/usr/local/bin/vllm", line 6, in <module>
(APIServer pid=50)     sys.exit(main())
(APIServer pid=50)              ^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=50)     args.dispatch_function(args)
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=50)     uvloop.run(run_server(args))
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=50)     return __asyncio.run(
(APIServer pid=50)            ^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=50)     return runner.run(main)
(APIServer pid=50)            ^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=50)     return self._loop.run_until_complete(task)
(APIServer pid=50)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=50)     return await main
(APIServer pid=50)            ^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 653, in run_server
(APIServer pid=50)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 667, in run_server_worker
(APIServer pid=50)     async with build_async_engine_client(
(APIServer pid=50)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=50)     return await anext(self.gen)
(APIServer pid=50)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 101, in build_async_engine_client
(APIServer pid=50)     async with build_async_engine_client_from_engine_args(
(APIServer pid=50)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=50)     return await anext(self.gen)
(APIServer pid=50)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 127, in build_async_engine_client_from_engine_args
(APIServer pid=50)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=50)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/engine/arg_utils.py", line 1495, in create_engine_config
(APIServer pid=50)     model_config = self.create_model_config()
(APIServer pid=50)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/engine/arg_utils.py", line 1347, in create_model_config
(APIServer pid=50)     return ModelConfig(
(APIServer pid=50)            ^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=50)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=50)   File "/workspace/vllm/vllm/config/model.py", line 476, in __post_init__
(APIServer pid=50)     hf_config = get_config(
(APIServer pid=50)                 ^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/transformers_utils/config.py", line 643, in get_config
(APIServer pid=50)     config_dict, config = config_parser.parse(
(APIServer pid=50)                           ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/workspace/vllm/vllm/transformers_utils/config.py", line 189, in parse
(APIServer pid=50)     config = AutoConfig.from_pretrained(
(APIServer pid=50)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1484, in from_pretrained
(APIServer pid=50)     return config_class.from_dict(config_dict, **unused_kwargs)
(APIServer pid=50)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 757, in from_dict
(APIServer pid=50)     config = cls(**config_dict)
(APIServer pid=50)              ^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/dataclasses.py", line 279, in init_with_validate
(APIServer pid=50)     initial_init(self, *args, **kwargs)  # type: ignore [call-arg]
(APIServer pid=50)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/dataclasses.py", line 194, in __init__
(APIServer pid=50)     self.__post_init__(**additional_kwargs)
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/configuration_mistral3.py", line 84, in __post_init__
(APIServer pid=50)     self.text_config = CONFIG_MAPPING[self.text_config["model_type"]](**self.text_config)
(APIServer pid=50)                        ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=50)   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1175, in __getitem__
(APIServer pid=50)     raise KeyError(key)
(APIServer pid=50) KeyError: 'mistral4'

Seems like a configuration issue?
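For context on what the last frame is doing: transformers keeps a registry that maps `model_type` strings from `config.json` to config classes, and `CONFIG_MAPPING[...]` raises exactly this `KeyError` when the installed transformers build has no entry for that `model_type`. A minimal sketch of the mechanism (the registry below is illustrative, not the real `CONFIG_MAPPING`):

```python
# Illustrative registry; the real one is
# transformers.models.auto.configuration_auto.CONFIG_MAPPING.
CONFIG_REGISTRY = {
    "mistral": "MistralConfig",
    "mistral3": "Mistral3Config",
    # a transformers build that predates the model has no "mistral4" entry
}

def resolve_config_class(config_dict):
    # AutoConfig dispatches on the model_type declared in config.json;
    # an unknown model_type surfaces as a bare KeyError.
    model_type = config_dict["model_type"]
    return CONFIG_REGISTRY[model_type]

try:
    resolve_config_class({"model_type": "mistral4"})
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'mistral4', matching the traceback
```

So the usual cause is a transformers install that is too old to know the new model type, not a problem with vLLM itself.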

TimKoornstra changed discussion title from KeyError: 'mistral4' to KeyError: 'mistral4' running vllm-ms4
Mistral AI_ org

Checking!

Mistral AI_ org

Hmm I cannot reproduce this - everything works as expected for me with this command:

vllm serve mistralai/Mistral-Small-4-119B-2603 --max-model-len 262144 --tensor-parallel-size 2 --attention-backend FLASH_ATTN_MLA \
  --tool-call-parser mistral --enable-auto-tool-choice --reasoning-parser mistral --max_num_batched_tokens 16384 --max_num_seqs 128 \
  --gpu_memory_utilization 0.8

Could it be that you overrode the default transformers version of this Docker image?

Mistral AI_ org

The transformers version should be 5.3.0.dev.
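A quick way to check this inside the container (the registry import path is the same one that appears in the traceback above):

```python
# Print the transformers version in use, and whether it registers the
# "mistral4" model_type; False on the second line reproduces the KeyError.
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

print(transformers.__version__)
print("mistral4" in CONFIG_MAPPING)
```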

I redownloaded the model and now it's working fine, thanks!

TimKoornstra changed discussion status to closed