vLLM inference error: EngineCore failed to start (tqdm 'disable' TypeError)?
(base) jovyan@2386d5c2975d:/work$ vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 \
  --port 8000 \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --reasoning-parser deepseek_r1
INFO 11-08 18:27:14 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=21447) INFO 11-08 18:27:17 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=21447) INFO 11-08 18:27:17 [utils.py:233] non-default args: {'model_tag': 'Qwen/Qwen3-Next-80B-A3B-Thinking-FP8', 'model': 'Qwen/Qwen3-Next-80B-A3B-Thinking-FP8', 'max_model_len': 32768, 'reasoning_parser': 'deepseek_r1'}
(APIServer pid=21447) INFO 11-08 18:27:18 [model.py:547] Resolved architecture: Qwen3NextForCausalLM
(APIServer pid=21447) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=21447) INFO 11-08 18:27:18 [model.py:1510] Using max model len 32768
(APIServer pid=21447) INFO 11-08 18:27:19 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=21447) INFO 11-08 18:27:19 [config.py:297] Hybrid or mamba-based model detected: disabling prefix caching since it is not yet supported.
(APIServer pid=21447) INFO 11-08 18:27:19 [config.py:308] Hybrid or mamba-based model detected: setting cudagraph mode to FULL_AND_PIECEWISE in order to optimize performance.
(APIServer pid=21447) INFO 11-08 18:27:19 [config.py:376] Setting attention block size to 544 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=21447) INFO 11-08 18:27:19 [config.py:397] Padding mamba page size by 1.49% to ensure that mamba page size and attention page size are exactly equal.
INFO 11-08 18:27:23 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:26 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:26 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='Qwen/Qwen3-Next-80B-A3B-Thinking-FP8', speculative_config=None, tokenizer='Qwen/Qwen3-Next-80B-A3B-Thinking-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='deepseek_r1'), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-Next-80B-A3B-Thinking-FP8, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, 
compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:27 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=21559) WARNING 11-08 18:27:27 [topk_topp_sampler.py:66] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:27 [gpu_model_runner.py:2602] Starting to load model Qwen/Qwen3-Next-80B-A3B-Thinking-FP8...
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:27 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=21559) torch_dtype is deprecated! Use dtype instead!
(EngineCore_DP0 pid=21559) WARNING 11-08 18:27:27 [fp8.py:457] Failed to import DeepGemm kernels.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:27 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod.
(EngineCore_DP0 pid=21559) WARNING 11-08 18:27:27 [cuda.py:352] FlashInfer failed to import for V1 engine on Blackwell (SM 10.0) GPUs; it is recommended to install FlashInfer for better performance.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:27 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=21559) INFO 11-08 18:27:28 [weight_utils.py:392] Using model weights format ['*.safetensors']
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self.model = model_loader.load_model(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] self.load_weights(model, model_config)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] loaded_weights = model.load_weights(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1185, in load_weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] return loader.load_weights(weights)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 243, in _load_module
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] for child_prefix, child_weights in self._groupby_prefix(weights):
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 132, in _groupby_prefix
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] for prefix, group in itertools.groupby(weights_by_parts,
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 130, in <genexpr>
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] for weight_name, weight_data in weights)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 291, in <genexpr>
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] weights = ((name, weight) for name, weight in weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 246, in get_all_weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] yield from self._get_weights_iterator(primary_weights)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 160, in _get_weights_iterator
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 112, in _prepare_weights
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] hf_folder = download_weights_from_hf(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 398, in download_weights_from_hf
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] hf_folder = snapshot_download(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 89, in _inner_fn
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] return fn(*args, **kwargs)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py", line 388, in snapshot_download
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] bytes_progress = tqdm_class(
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] ^^^^^^^^^^^
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 84, in __init__
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] super().__init__(*args, **kwargs, disable=True)
(EngineCore_DP0 pid=21559) ERROR 11-08 18:27:29 [core.py:708] TypeError: tqdm.asyncio.tqdm_asyncio.__init__() got multiple values for keyword argument 'disable'
(EngineCore_DP0 pid=21559) Process EngineCore_DP0:
(EngineCore_DP0 pid=21559) Traceback (most recent call last):
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=21559) self.run()
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=21559) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=21559) raise e
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=21559) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=21559) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=21559) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=21559) self._init_executor()
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=21559) self.collective_rpc("load_model")
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=21559) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=21559) return func(*args, **kwargs)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=21559) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=21559) self.model = model_loader.load_model(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=21559) self.load_weights(model, model_config)
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=21559) loaded_weights = model.load_weights(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1185, in load_weights
(EngineCore_DP0 pid=21559) return loader.load_weights(weights)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=21559) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 243, in _load_module
(EngineCore_DP0 pid=21559) for child_prefix, child_weights in self._groupby_prefix(weights):
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 132, in _groupby_prefix
(EngineCore_DP0 pid=21559) for prefix, group in itertools.groupby(weights_by_parts,
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 130, in <genexpr>
(EngineCore_DP0 pid=21559) for weight_name, weight_data in weights)
(EngineCore_DP0 pid=21559) ^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 291, in <genexpr>
(EngineCore_DP0 pid=21559) weights = ((name, weight) for name, weight in weights
(EngineCore_DP0 pid=21559) ^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 246, in get_all_weights
(EngineCore_DP0 pid=21559) yield from self._get_weights_iterator(primary_weights)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 160, in _get_weights_iterator
(EngineCore_DP0 pid=21559) hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 112, in _prepare_weights
(EngineCore_DP0 pid=21559) hf_folder = download_weights_from_hf(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 398, in download_weights_from_hf
(EngineCore_DP0 pid=21559) hf_folder = snapshot_download(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 89, in _inner_fn
(EngineCore_DP0 pid=21559) return fn(*args, **kwargs)
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py", line 388, in snapshot_download
(EngineCore_DP0 pid=21559) bytes_progress = tqdm_class(
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 84, in __init__
(EngineCore_DP0 pid=21559) super().__init__(*args, **kwargs, disable=True)
(EngineCore_DP0 pid=21559) TypeError: tqdm.asyncio.tqdm_asyncio.__init__() got multiple values for keyword argument 'disable'
(EngineCore_DP0 pid=21559) Exception ignored in: <function tqdm.__del__ at 0x798419b94220>
(EngineCore_DP0 pid=21559) Traceback (most recent call last):
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/tqdm/std.py", line 1149, in __del__
(EngineCore_DP0 pid=21559) self.close()
(EngineCore_DP0 pid=21559) File "/opt/conda/lib/python3.12/site-packages/tqdm/std.py", line 1268, in close
(EngineCore_DP0 pid=21559) if self.disable:
(EngineCore_DP0 pid=21559) ^^^^^^^^^^^^
(EngineCore_DP0 pid=21559) AttributeError: 'DisabledTqdm' object has no attribute 'disable'
[rank0]:[W1108 18:27:29.928684650 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=21447) Traceback (most recent call last):
(APIServer pid=21447) File "/opt/conda/bin/vllm", line 7, in <module>
(APIServer pid=21447) sys.exit(main())
(APIServer pid=21447) ^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=21447) args.dispatch_function(args)
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=21447) uvloop.run(run_server(args))
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=21447) return __asyncio.run(
(APIServer pid=21447) ^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=21447) return runner.run(main)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=21447) return self._loop.run_until_complete(task)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=21447) return await main
(APIServer pid=21447) ^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=21447) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=21447) async with build_async_engine_client(
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=21447) return await anext(self.gen)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=21447) async with build_async_engine_client_from_engine_args(
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=21447) return await anext(self.gen)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=21447) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=21447) return fn(*args, **kwargs)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=21447) return cls(
(APIServer pid=21447) ^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=21447) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=21447) return AsyncMPClient(*client_args)
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=21447) super().__init__(
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=21447) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=21447) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=21447) File "/opt/conda/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=21447) next(self.gen)
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=21447) wait_for_engine_startup(
(APIServer pid=21447) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=21447) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=21447) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(base) jovyan@2386d5c2975d:
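
The root-cause line in the log above is the TypeError raised at weight_utils.py line 84, on the call super().__init__(*args, **kwargs, disable=True). Below is a minimal sketch of how that message can arise in plain Python. It assumes (the log does not confirm this) that the caller, here snapshot_download via tqdm_class, already passes its own disable keyword. Bar, DisabledBar, SafeDisabledBar and try_construct are hypothetical stand-ins for illustration, not tqdm, huggingface_hub or vLLM classes.

```python
class Bar:
    """Hypothetical stand-in for a progress-bar base class that accepts `disable`."""

    def __init__(self, *args, disable=False, **kwargs):
        self.disable = disable


class DisabledBar(Bar):
    """Mirrors the call pattern shown in the traceback: always adds disable=True."""

    def __init__(self, *args, **kwargs):
        # If kwargs already holds "disable", the keyword is supplied twice
        # and Python raises TypeError before Bar.__init__ runs.
        super().__init__(*args, **kwargs, disable=True)


class SafeDisabledBar(Bar):
    """One possible variant: overwrite the key instead of passing it twice."""

    def __init__(self, *args, **kwargs):
        kwargs["disable"] = True
        super().__init__(*args, **kwargs)


def try_construct(cls, **kwargs):
    """Return "ok" on success, or the TypeError message on failure."""
    try:
        cls(**kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_construct(DisabledBar))                     # no caller-supplied disable
    print(try_construct(DisabledBar, disable=False))      # caller also passes disable
    print(try_construct(SafeDisabledBar, disable=False))  # key overwritten, no clash
```

If that assumption holds, the failure would come from the combination of installed package versions rather than from the model or the serve flags. The later AttributeError about 'DisabledTqdm' having no attribute 'disable' would then be a side effect of the constructor failing before the attribute was set.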
