RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

#10
Opened by mancub

I'm having no luck running this DFlash draft model with cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4 as the target model.

Everything loads, but as soon as I send a prompt, it crashes.

Setup: current vLLM nightly, CUDA 13.0 build (cu130), tensor parallel size 2 (TP=2).

(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] EngineCore encountered a fatal error.
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] Traceback (most recent call last):
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1129, in run_engine_core
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     engine_core.run_busy_loop()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1170, in run_busy_loop
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     self._process_engine_step()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1209, in _process_engine_step
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     outputs, model_executed = self.step_fn()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]                               ^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 521, in step_with_batch_queue
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     model_output = future.result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]                    ^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     return super().result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]            ^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     return self.__get_result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     raise self._exception
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     response = self.aggregate(self.get_response())
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138]     raise RuntimeError(
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] RuntimeError: Worker failed with error 'expected mat1 and mat2 to have the same dtype, but got: float != c10::Half', please check the stack trace above for the root cause
(Worker_TP0 pid=19199) INFO 05-07 18:17:48 [multiproc_executor.py:775] Parent process exited, terminating worker queues
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] AsyncLLM output_handler failed.
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] Traceback (most recent call last):
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704]     outputs = await engine_core.get_output_async()
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704]     raise self._format_exception(outputs) from None
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] Error in chat completion stream generator.
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] Traceback (most recent call last):
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 487, in chat_completion_stream_generator
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]     async for res in result_generator:
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 579, in generate
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]     out = q.get_nowait() or await q.get()
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]                             ^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/output_processor.py", line 85, in get
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]     raise output
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]     outputs = await engine_core.get_output_async()
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984]     raise self._format_exception(outputs) from None
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 778, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return self.model_runner.sample_tokens(grammar_output)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4309, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     propose_draft_token_ids(sampled_token_ids)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4270, in propose_draft_token_ids
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     self._draft_token_ids = self.propose_draft_token_ids(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4810, in propose_draft_token_ids
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     draft_token_ids = self.drafter.propose(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]                       ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 450, in propose
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     target_hidden_states = self.model.combine_hidden_states(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_dflash.py", line 588, in combine_hidden_states
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     result = self.model.fc(hidden_states)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 397, in forward
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     output = self.quant_method.apply(self, x, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 231, in apply
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 98, in default_unquantized_gemm
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return torch.nn.functional.linear(x, weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]   File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 126, in __torch_function__
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]     return super().__torch_function__(func, types, args, kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

Worker_TP1 (pid=19200) logged an identical traceback, ending in the same RuntimeError.
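vLLM tags each log line with the process that wrote it, for example (Worker_TP0 pid=19199) or (EngineCore pid=19180). When several tensor-parallel workers fail at once, their tracebacks can be written to the log interleaved line by line. The minimal stdlib sketch below regroups such lines per process and pulls out the last exception message each one logged. It assumes the prefix always has the "(Name pid=N)" shape seen in this report; that is an inference from these lines, not a documented vLLM log format. group_by_process and root_causes are hypothetical helpers.

```python
import re

# Assumed shape of the per-line process tag, inferred from this report:
# "(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [...] message"
PREFIX = re.compile(r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s?(?P<rest>.*)$")
# Matches "SomethingError: message" or "SomethingException: message".
EXC = re.compile(r"\b(\w+(?:Error|Exception)): (.*)$")


def group_by_process(lines):
    """Hypothetical helper: {"Name pid=N": [rest of line, ...]} in first-seen order."""
    groups = {}
    for line in lines:
        m = PREFIX.match(line)
        if m is None:
            continue  # skip lines without a process tag
        key = f"{m['proc']} pid={m['pid']}"
        groups.setdefault(key, []).append(m["rest"])
    return groups


def root_causes(groups):
    """Hypothetical helper: the last "XError: msg" line logged by each process."""
    causes = {}
    for key, body in groups.items():
        for text in reversed(body):  # newest line first
            m = EXC.search(text)
            if m:
                causes[key] = f"{m.group(1)}: {m.group(2)}"
                break
    return causes
```

Usage, assuming the log was saved to a file: root_causes(group_by_process(open("vllm.log").read().splitlines())). Here it would show that both workers and the EngineCore trace back to the same dtype mismatch, while the APIServer only reports the resulting EngineDeadError.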

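The traceback shows the crash inside the DFlash drafter (combine_hidden_states in qwen3_dflash.py, at self.model.fc), where torch.nn.functional.linear receives a float32 input but a float16 (c10::Half) weight. Is the issue the specific cyankiwi quant, or the DFlash model?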
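The same class of error can be reproduced outside vLLM to confirm which side of the mismatch is which. In torch.nn.functional.linear, mat1 is the input and mat2 is the (transposed) weight, so "float != c10::Half" means a float32 input met a float16 weight. The sketch below assumes PyTorch is installed; linear_in_weight_dtype and reproduce are hypothetical helpers, not vLLM code. Casting the activations is only a local illustration of the mismatch. It does not settle whether the fix belongs in the cyankiwi quant's dtype settings or in the DFlash drafter.

```python
import torch
import torch.nn.functional as F


def linear_in_weight_dtype(x, weight, bias=None):
    """Hypothetical workaround: cast activations (and bias) to the weight's
    dtype so F.linear sees matching dtypes."""
    if bias is not None:
        bias = bias.to(weight.dtype)
    return F.linear(x.to(weight.dtype), weight, bias)


def reproduce(weight_dtype):
    """Return (error message or None, output of the dtype-matched call)."""
    torch.manual_seed(0)
    x = torch.randn(4, 8, dtype=torch.float32)  # stands in for float32 hidden states
    w = torch.randn(16, 8).to(weight_dtype)     # stands in for a low-precision fc weight
    try:
        F.linear(x, w)  # mismatched dtypes, as in the traceback
        err = None
    except RuntimeError as e:
        err = str(e)
    return err, linear_in_weight_dtype(x, w)


if __name__ == "__main__":
    # bfloat16 keeps the demo CPU-friendly; on a GPU, torch.float16 mirrors
    # the c10::Half in the report.
    err, out = reproduce(torch.bfloat16)
    print("mismatch error:", err)
    print("matched call:", out.dtype, tuple(out.shape))
```

On CPU the mismatch message names c10::BFloat16 rather than c10::Half, but it is the same check that fails in the drafter's fc layer.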
