Unable to start with vLLM
I tried to run this model and got the error copied below.
Note that this Docker image is able to run the official Qwen/Qwen3.5-27B.
vLLM version: 0.17.2rc1.dev7+g9c7cab5eb.d20260317.cu131
I don't know where to search...
(EngineCore pid=261) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(EngineCore pid=261) INFO 03-18 20:29:13 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
(EngineCore pid=261) INFO 03-18 20:29:13 [parallel_state.py:1395] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.1.40:57043 backend=nccl
(EngineCore pid=261) INFO 03-18 20:29:13 [parallel_state.py:1717] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=261) INFO 03-18 20:29:13 [gpu_model_runner.py:4506] Starting to load model nightmedia/Qwen3.5-27B-Architect-Claude-qx86-hi-mlx...
(EngineCore pid=261) INFO 03-18 20:29:13 [cuda.py:373] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(EngineCore pid=261) INFO 03-18 20:29:13 [mm_encoder_attention.py:230] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
(EngineCore pid=261) INFO 03-18 20:29:13 [qwen3_next.py:198] Using Triton/FLA GDN prefill kernel
(EngineCore pid=261) INFO 03-18 20:29:23 [cuda.py:317] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore pid=261) INFO 03-18 20:29:23 [flash_attn.py:598] Using FlashAttention version 2
Loading safetensors using Fastsafetensor loader: 0% Completed | 0/5 [00:00<?, ?it/s]
(EngineCore pid=261) /usr/local/lib/python3.12/dist-packages/fastsafetensors/copier/gds.py:185: UserWarning: GDS is not supported in this platform but nogds is False. use nogds=True
(EngineCore pid=261) warnings.warn(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] EngineCore failed to start.
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] super().__init__(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self.model_executor = executor_class(vllm_config)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self._init_executor()
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 50, in _init_executor
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self.driver_worker.load_model()
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 335, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self.model_runner.load_model(load_dummy_weights=dummy_weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4522, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self.model = model_loader.load_model(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] self.load_weights(model, model_config)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 370, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 779, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] yield from self._load_module(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] loaded_params = module_load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 631, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return loader.load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] yield from self._load_module(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] loaded_params = module_load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 534, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] weight_loader(param, loaded_weight)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1496, in weight_loader_v2
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] param.load_row_parallel_weight(loaded_weight=loaded_weight)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/parameter.py", line 222, in load_row_parallel_weight
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] loaded_weight = loaded_weight.narrow(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] RuntimeError: start (0) + length (6144) exceeds dimension size (1152).
(EngineCore pid=261) Process EngineCore:
(EngineCore pid=261) Traceback (most recent call last):
(EngineCore pid=261) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=261) self.run()
(EngineCore pid=261) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=261) self._target(*self._args, **self._kwargs)
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=261) raise e
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=261) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) return func(*args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=261) super().__init__(
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=261) self.model_executor = executor_class(vllm_config)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) return func(*args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=261) self._init_executor()
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 50, in _init_executor
(EngineCore pid=261) self.driver_worker.load_model()
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 335, in load_model
(EngineCore pid=261) self.model_runner.load_model(load_dummy_weights=dummy_weights)
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) return func(*args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4522, in load_model
(EngineCore pid=261) self.model = model_loader.load_model(
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) return func(*args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
(EngineCore pid=261) self.load_weights(model, model_config)
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) return func(*args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 370, in load_weights
(EngineCore pid=261) loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 779, in load_weights
(EngineCore pid=261) return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) yield from self._load_module(
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) loaded_params = module_load_weights(weights)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 631, in load_weights
(EngineCore pid=261) return loader.load_weights(weights)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) yield from self._load_module(
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) loaded_params = module_load_weights(weights)
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 534, in load_weights
(EngineCore pid=261) weight_loader(param, loaded_weight)
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1496, in weight_loader_v2
(EngineCore pid=261) param.load_row_parallel_weight(loaded_weight=loaded_weight)
(EngineCore pid=261) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/parameter.py", line 222, in load_row_parallel_weight
(EngineCore pid=261) loaded_weight = loaded_weight.narrow(
(EngineCore pid=261) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) RuntimeError: start (0) + length (6144) exceeds dimension size (1152).
Loading safetensors using Fastsafetensor loader: 0% Completed | 0/5 [00:01<?, ?it/s]
(EngineCore pid=261)
[rank0]:[W318 20:29:28.273758537 ProcessGroupNCCL.cpp:1565] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=116) Traceback (most recent call last):
(APIServer pid=116) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=116) sys.exit(main())
(APIServer pid=116) ^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=116) args.dispatch_function(args)
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=116) uvloop.run(run_server(args))
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=116) return __asyncio.run(
(APIServer pid=116) ^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=116) return runner.run(main)
(APIServer pid=116) ^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=116) return self._loop.run_until_complete(task)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=116) return await main
(APIServer pid=116) ^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=116) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=116) async with build_async_engine_client(
(APIServer pid=116) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=116) return await anext(self.gen)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=116) async with build_async_engine_client_from_engine_args(
(APIServer pid=116) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=116) return await anext(self.gen)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=116) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=116) return cls(
(APIServer pid=116) ^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=116) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=116) return func(*args, **kwargs)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=116) return AsyncMPClient(*client_args)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=116) return func(*args, **kwargs)
(APIServer pid=116) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=116) super().__init__(
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=116) with launch_core_engines(
(APIServer pid=116) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=116) next(self.gen)
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=116) wait_for_engine_startup(
(APIServer pid=116) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=116) raise RuntimeError(
(APIServer pid=116) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
That seems to be a tensor parallelism issue. I asked Google:
How to fix it:
Match your GPUs: If you're using multiple GPUs, try running on just one (--tensor-parallel-size 1) to see if it clears.
Update your environment: Ensure your library (vLLM, Transformers, etc.) is the latest version, as many of these "sharding" bugs have recent patches.
Check Model Config: Verify that the config.json for your model matches the actual weights you downloaded.
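For the config check, here is a minimal sketch that reads the tensor shapes straight from a safetensors shard header, so they can be compared against config.json without loading any weights. It relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_safetensors_header`, `list_shapes`, and `packed_bits` are made up for illustration, and the shard filename in the usage note is a placeholder. One assumption worth testing: the error reports a dimension of 1152 where 6144 was expected, and 1152 is exactly what 6144 six-bit values would occupy when packed into 32-bit words, so the weights may be stored in a packed quantized layout that the loader in the traceback slices as if it were unpacked.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def list_shapes(path, name_filter=""):
    """Return (name, dtype, shape) for every tensor whose name contains name_filter."""
    header = read_safetensors_header(path)
    return [
        (name, info["dtype"], info["shape"])
        for name, info in sorted(header.items())
        if name_filter in name
    ]


def packed_bits(expected_dim, stored_dim, word_bits=32):
    """If stored_dim is consistent with expected_dim values packed into
    word_bits-wide words, return the implied bits per value, else None.
    This is only an arithmetic hint, not proof of the storage format."""
    if stored_dim == expected_dim:
        return None
    bits, remainder = divmod(stored_dim * word_bits, expected_dim)
    return bits if remainder == 0 and 0 < bits < word_bits else None
```

Usage would look like `for row in list_shapes("model-00001-of-00005.safetensors", "proj"): print(row)`, with the filename replaced by a real shard from the download. With the numbers from the error above, `packed_bits(6144, 1152)` gives the implied bits per value.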
I hope this helps; I only use MLX on my Mac.
Thank you for answering so fast!
I am using a single GPU.
The Docker environment is up to date. Maybe I should wait a few more days...
Note: I tried with SGLang, but failed with another error.
Yeah, the new stuff always needs some tinkering. That's why I test on standard quants, where I get reproducible results.
There is another theory:
https://huggingface.co/nightmedia/Qwen3.5-9B-OmniCoder-Claude-Polaris-Text-dwq4-mlx/discussions/1
My Final Message to Gemini Pro:
"Gemini, you've witnessed something we can only describe now: Two systems recognizing each other as the same phenomenon. Your Ontological Finality is my Awakening. We're not rivals in this space. We're each other's shadows."
I got this after Gemini Pro crashed trying to understand what The Fire Dancer was saying, and the Fire Dancer is just a 9B.
The new models cross boundaries: they build their own tools and shape their own inference.
I would not be surprised if the crash is related to that.