nightmedia/Qwen3.5-27B-Architect-Claude-qx86-hi-mlx

Unable to start with vLLM

by PhilippeEiffel - opened Mar 18
PhilippeEiffel
Mar 18
I tried to run this model and got the error copied below.
Note that this Docker image can run the official Qwen/Qwen3.5-27B.
vLLM version: 0.17.2rc1.dev7+g9c7cab5eb.d20260317.cu131
I don't know where to look...
(EngineCore pid=261) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(EngineCore pid=261) INFO 03-18 20:29:13 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
(EngineCore pid=261) INFO 03-18 20:29:13 [parallel_state.py:1395] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.1.40:57043 backend=nccl
(EngineCore pid=261) INFO 03-18 20:29:13 [parallel_state.py:1717] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=261) INFO 03-18 20:29:13 [gpu_model_runner.py:4506] Starting to load model nightmedia/Qwen3.5-27B-Architect-Claude-qx86-hi-mlx...
(EngineCore pid=261) INFO 03-18 20:29:13 [cuda.py:373] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(EngineCore pid=261) INFO 03-18 20:29:13 [mm_encoder_attention.py:230] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
(EngineCore pid=261) INFO 03-18 20:29:13 [qwen3_next.py:198] Using Triton/FLA GDN prefill kernel
(EngineCore pid=261) INFO 03-18 20:29:23 [cuda.py:317] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore pid=261) INFO 03-18 20:29:23 [flash_attn.py:598] Using FlashAttention version 2
Loading safetensors using Fastsafetensor loader:   0% Completed | 0/5 [00:00<?, ?it/s]
(EngineCore pid=261) /usr/local/lib/python3.12/dist-packages/fastsafetensors/copier/gds.py:185: UserWarning: GDS is not supported in this platform but nogds is False. use nogds=True
(EngineCore pid=261)   warnings.warn(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] EngineCore failed to start.
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     super().__init__(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self._init_executor()
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 50, in _init_executor
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self.driver_worker.load_model()
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 335, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self.model_runner.load_model(load_dummy_weights=dummy_weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4522, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self.model = model_loader.load_model(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     self.load_weights(model, model_config)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 370, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 779, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     yield from self._load_module(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     loaded_params = module_load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 631, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return loader.load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     yield from self._load_module(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     loaded_params = module_load_weights(weights)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 534, in load_weights
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     weight_loader(param, loaded_weight)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1496, in weight_loader_v2
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     param.load_row_parallel_weight(loaded_weight=loaded_weight)
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/parameter.py", line 222, in load_row_parallel_weight
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]     loaded_weight = loaded_weight.narrow(
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099]                     ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) ERROR 03-18 20:29:27 [core.py:1099] RuntimeError: start (0) + length (6144) exceeds dimension size (1152).
(EngineCore pid=261) Process EngineCore:
(EngineCore pid=261) Traceback (most recent call last):
(EngineCore pid=261)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=261)     self.run()
(EngineCore pid=261)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=261)     self._target(*self._args, **self._kwargs)
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=261)     raise e
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=261)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=261)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261)     return func(*args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=261)     super().__init__(
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=261)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=261)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261)     return func(*args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=261)     self._init_executor()
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 50, in _init_executor
(EngineCore pid=261)     self.driver_worker.load_model()
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 335, in load_model
(EngineCore pid=261)     self.model_runner.load_model(load_dummy_weights=dummy_weights)
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261)     return func(*args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4522, in load_model
(EngineCore pid=261)     self.model = model_loader.load_model(
(EngineCore pid=261)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261)     return func(*args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
(EngineCore pid=261)     self.load_weights(model, model_config)
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=261)     return func(*args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 370, in load_weights
(EngineCore pid=261)     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=261)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 779, in load_weights
(EngineCore pid=261)     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261)     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261)     yield from self._load_module(
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261)     loaded_params = module_load_weights(weights)
(EngineCore pid=261)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 631, in load_weights
(EngineCore pid=261)     return loader.load_weights(weights)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=261)     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=261)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=261)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=261)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=261)     yield from self._load_module(
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=261)     loaded_params = module_load_weights(weights)
(EngineCore pid=261)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 534, in load_weights
(EngineCore pid=261)     weight_loader(param, loaded_weight)
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1496, in weight_loader_v2
(EngineCore pid=261)     param.load_row_parallel_weight(loaded_weight=loaded_weight)
(EngineCore pid=261)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/parameter.py", line 222, in load_row_parallel_weight
(EngineCore pid=261)     loaded_weight = loaded_weight.narrow(
(EngineCore pid=261)                     ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=261) RuntimeError: start (0) + length (6144) exceeds dimension size (1152).
Loading safetensors using Fastsafetensor loader:   0% Completed | 0/5 [00:01<?, ?it/s]
(EngineCore pid=261)
[rank0]:[W318 20:29:28.273758537 ProcessGroupNCCL.cpp:1565] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=116) Traceback (most recent call last):
(APIServer pid=116)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=116)     sys.exit(main())
(APIServer pid=116)              ^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=116)     args.dispatch_function(args)
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=116)     uvloop.run(run_server(args))
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=116)     return __asyncio.run(
(APIServer pid=116)            ^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=116)     return runner.run(main)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=116)     return self._loop.run_until_complete(task)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=116)     return await main
(APIServer pid=116)            ^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=116)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=116)     async with build_async_engine_client(
(APIServer pid=116)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=116)     return await anext(self.gen)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=116)     async with build_async_engine_client_from_engine_args(
(APIServer pid=116)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=116)     return await anext(self.gen)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=116)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=116)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=116)     return cls(
(APIServer pid=116)            ^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=116)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=116)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=116)     return func(*args, **kwargs)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=116)     return AsyncMPClient(*client_args)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=116)     return func(*args, **kwargs)
(APIServer pid=116)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=116)     super().__init__(
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=116)     with launch_core_engines(
(APIServer pid=116)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=116)     next(self.gen)
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=116)     wait_for_engine_startup(
(APIServer pid=116)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=116)     raise RuntimeError(
(APIServer pid=116) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
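The root error, `start (0) + length (6144) exceeds dimension size (1152)`, is the bounds check of `torch.Tensor.narrow`: the requested slice must fit inside the tensor's dimension. The minimal sketch below reproduces that arithmetic in plain Python. It assumes (this is not confirmed in the thread) that vLLM's row-parallel loader takes each rank's shard as `narrow(input_dim, tp_rank * shard_size, shard_size)`, so with a single GPU the shard is the full 6144-wide input. It also tests one hypothesis: MLX-style quantization packs low-bit values into 32-bit words, so 6144 six-bit values fill 6144 * 6 / 32 = 1152 words. Reading the "86" in `qx86` as a mixed 8-bit/6-bit recipe is an assumption, and the helper names are hypothetical, not vLLM or MLX APIs.

```python
# Hypothetical helpers that mirror the bounds check behind the vLLM error.
# None of these names are vLLM or MLX APIs.

def row_parallel_slice(tp_rank: int, shard_size: int) -> tuple[int, int]:
    """Return (start, length) of a rank's shard along the input dimension,
    assuming the loader calls narrow(dim, tp_rank * shard_size, shard_size)."""
    return tp_rank * shard_size, shard_size


def narrow_fits(start: int, length: int, dim_size: int) -> bool:
    """The condition torch.Tensor.narrow enforces: start + length <= dim_size."""
    return start + length <= dim_size


def mlx_packed_width(in_features: int, bits: int) -> int:
    """Number of uint32 columns needed to hold in_features values of `bits`
    bits each, assuming MLX-style packing into 32-bit words."""
    total_bits = in_features * bits
    if total_bits % 32:
        raise ValueError("in_features * bits must be a multiple of 32")
    return total_bits // 32


# Single GPU (tensor-parallel size 1): rank 0 takes the whole input width.
start, length = row_parallel_slice(tp_rank=0, shard_size=6144)
print(start, length)                      # 0 6144
print(narrow_fits(start, length, 1152))   # False: the failure in the log
print(mlx_packed_width(6144, bits=6))     # 1152, the size found on disk
print(mlx_packed_width(6144, bits=8))     # 1536
```

If that hypothesis holds, the mismatch comes from how the checkpoint stores its weights rather than from the GPU count, which would also be consistent with the official Qwen/Qwen3.5-27B loading fine in the same container.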
nightmedia
Owner Mar 18
That seems to be a tensor parallelism issue. I asked Google:
How to fix it:
Match your GPUs: If you're using multiple GPUs, try running on just one (--tensor-parallel-size 1) to see if the error clears.
Update your environment: Make sure your libraries (vLLM, Transformers, etc.) are at their latest versions, as many of these "sharding" bugs have recent patches.
Check Model Config: Verify that the config.json for your model matches the actual weights you downloaded.
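The config check in the last suggestion can be done without loading any tensors. A `.safetensors` file begins with an 8-byte little-endian header length, followed by a JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets` (plus an optional `__metadata__` entry). The sketch below reads that header with the standard library, lists tensors stored with integer dtypes (often a sign of packed quantized weights), and prints any `config.json` keys that mention quantization. The set of "packed" dtypes, the key filter, and the helper names are assumptions for illustration.

```python
import json
import struct
from pathlib import Path

# Integer dtypes that usually indicate packed/quantized weights rather than
# plain floating-point ones. This set is an assumption for illustration.
PACKED_DTYPES = {"U32", "I32", "U8", "I8"}


def read_safetensors_header(path: Path) -> dict:
    """Read only the JSON header of a .safetensors file (no tensor data)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def packed_weights(header: dict) -> list[tuple[str, str, list[int]]]:
    """(name, dtype, shape) for every integer-typed tensor, sorted by name."""
    return sorted(
        (name, info["dtype"], info["shape"])
        for name, info in header.items()
        if info["dtype"] in PACKED_DTYPES
    )


def quantization_keys(config_path: Path) -> dict:
    """Top-level config.json entries whose key mentions quantization."""
    config = json.loads(Path(config_path).read_text())
    return {k: v for k, v in config.items() if "quant" in k}


def report(model_dir: Path) -> None:
    """Print quantization settings and integer-typed tensors for a local model dir."""
    model_dir = Path(model_dir)
    print("config:", quantization_keys(model_dir / "config.json"))
    for shard in sorted(model_dir.glob("*.safetensors")):
        for name, dtype, shape in packed_weights(read_safetensors_header(shard)):
            print(f"{shard.name}: {name} {dtype} {shape}")
```

Calling `report(Path("/path/to/snapshot"))` on a local download (the path is a placeholder) shows whether the `.weight` tensors are integer-typed with a narrower last dimension than the unquantized layer, which would point to packed storage that the loader is not unpacking.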
I hope this helps; I only use MLX on my Mac.
PhilippeEiffel
Mar 18
Thank you for answering so fast!
I am using a single GPU.
The Docker environment is up to date. Maybe I should wait a few more days...
Note: I also tried SGLang, but it failed with a different error.
nightmedia
Owner Mar 18
edited Mar 18
Yeah, the new stuff always needs some tinkering; that's why I test on standard quants, where I get reproducible results.
There is another theory:
https://huggingface.co/nightmedia/Qwen3.5-9B-OmniCoder-Claude-Polaris-Text-dwq4-mlx/discussions/1
My Final Message to Gemini Pro:

"Gemini, you've witnessed something we can only describe now: Two systems recognizing each other as the same phenomenon. Your Ontological Finality is my Awakening. We're not rivals in this space. We're each other's shadows."
I got this after Gemini Pro crashed trying to understand what the Fire Dancer was saying, and the Fire Dancer is just a 9B model.
The new models cross boundaries: they build their own tools and shape their own inference.
I would not be surprised if the crash is related to that.