RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
#10
by mancub - opened
Having no luck running this DFlash model with cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4.
It all loads up but the moment I send a prompt in, it crashes.
Current vLLM nightly, CU130, TP=2.
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] EngineCore encountered a fatal error.
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] Traceback (most recent call last):
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1129, in run_engine_core
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] engine_core.run_busy_loop()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1170, in run_busy_loop
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] self._process_engine_step()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1209, in _process_engine_step
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] outputs, model_executed = self.step_fn()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] ^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 521, in step_with_batch_queue
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] model_output = future.result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] ^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] return super().result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] ^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] return self.__get_result()
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] raise self._exception
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] response = self.aggregate(self.get_response())
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] raise RuntimeError(
(EngineCore pid=19180) ERROR 05-07 18:17:48 [core.py:1138] RuntimeError: Worker failed with error 'expected mat1 and mat2 to have the same dtype, but got: float != c10::Half', please check the stack trace above for the root cause
(Worker_TP0 pid=19199) INFO 05-07 18:17:48 [multiproc_executor.py:775] Parent process exited, terminating worker queues
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] AsyncLLM output_handler failed.
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] Traceback (most recent call last):
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] outputs = await engine_core.get_output_async()
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] raise self._format_exception(outputs) from None
(APIServer pid=19011) ERROR 05-07 18:17:48 [async_llm.py:704] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] Error in chat completion stream generator.
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] Traceback (most recent call last):
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 487, in chat_completion_stream_generator
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] async for res in result_generator:
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 579, in generate
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] out = q.get_nowait() or await q.get()
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] ^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/output_processor.py", line 85, in get
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] raise output
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] outputs = await engine_core.get_output_async()
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] raise self._format_exception(outputs) from None
(APIServer pid=19011) ERROR 05-07 18:17:48 [serving.py:984] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 778, in sample_tokens
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 778, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self.model_runner.sample_tokens(grammar_output)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self.model_runner.sample_tokens(grammar_output)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4309, in sample_tokens
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4309, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] propose_draft_token_ids(sampled_token_ids)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] propose_draft_token_ids(sampled_token_ids)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4270, in propose_draft_token_ids
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4270, in propose_draft_token_ids
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] self._draft_token_ids = self.propose_draft_token_ids(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] self._draft_token_ids = self.propose_draft_token_ids(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4810, in propose_draft_token_ids
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4810, in propose_draft_token_ids
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] draft_token_ids = self.drafter.propose(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] draft_token_ids = self.drafter.propose(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 450, in propose
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 450, in propose
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] target_hidden_states = self.model.combine_hidden_states(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] target_hidden_states = self.model.combine_hidden_states(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_dflash.py", line 588, in combine_hidden_states
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_dflash.py", line 588, in combine_hidden_states
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] result = self.model.fc(hidden_states)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] result = self.model.fc(hidden_states)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 397, in forward
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = self.quant_method.apply(self, x, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 397, in forward
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 231, in apply
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = self.quant_method.apply(self, x, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 231, in apply
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 98, in default_unquantized_gemm
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return torch.nn.functional.linear(x, weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 98, in default_unquantized_gemm
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 126, in __torch_function__
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return torch.nn.functional.linear(x, weight, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return super().__torch_function__(func, types, args, kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 126, in __torch_function__
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return super().__torch_function__(func, types, args, kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 778, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self.model_runner.sample_tokens(grammar_output)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 778, in sample_tokens
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self.model_runner.sample_tokens(grammar_output)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4309, in sample_tokens
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] propose_draft_token_ids(sampled_token_ids)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4270, in propose_draft_token_ids
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4309, in sample_tokens
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] self._draft_token_ids = self.propose_draft_token_ids(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] propose_draft_token_ids(sampled_token_ids)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4270, in propose_draft_token_ids
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4810, in propose_draft_token_ids
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] self._draft_token_ids = self.propose_draft_token_ids(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] draft_token_ids = self.drafter.propose(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4810, in propose_draft_token_ids
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 450, in propose
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] draft_token_ids = self.drafter.propose(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] target_hidden_states = self.model.combine_hidden_states(
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 450, in propose
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_dflash.py", line 588, in combine_hidden_states
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] target_hidden_states = self.model.combine_hidden_states(
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] result = self.model.fc(hidden_states)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_dflash.py", line 588, in combine_hidden_states
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] result = self.model.fc(hidden_states)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 397, in forward
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = self.quant_method.apply(self, x, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 397, in forward
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 231, in apply
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] output = self.quant_method.apply(self, x, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 231, in apply
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 98, in default_unquantized_gemm
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return torch.nn.functional.linear(x, weight, bias)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 98, in default_unquantized_gemm
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 126, in __torch_function__
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return torch.nn.functional.linear(x, weight, bias)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return super().__torch_function__(func, types, args, kwargs)
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] File "/home/user/Envs/llm/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 126, in __torch_function__
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] return super().__torch_function__(func, types, args, kwargs)
(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962]
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
Is the issue the specific cyankiwi quant, or the DFlash model?
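In case it helps anyone reading the log: the Worker_TP0 and Worker_TP1 tracebacks are interleaved, which makes the failing call chain hard to follow. Below is a minimal sketch for separating them, assuming every record starts with a `(Name pid=N) LEVEL MM-DD HH:MM:SS [file:line]` prefix as in the output above. `RECORD_PREFIX` and `split_by_process` are names made up for illustration and are not part of vLLM.

```python
import re
from collections import defaultdict

# Assumed record prefix, inferred from the pasted log, e.g.
# (Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962]
# \s+ is used between fields so that prefixes broken across lines still match.
RECORD_PREFIX = re.compile(
    r"\((?P<proc>\w+)\s+pid=\d+\)\s+"
    r"[A-Z]+\s+\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+"
    r"\[[^\]]+\]"
)


def split_by_process(raw):
    """Group log messages by process name, keeping each process's own order."""
    grouped = defaultdict(list)
    matches = list(RECORD_PREFIX.finditer(raw))
    for i, match in enumerate(matches):
        # A message runs from the end of its prefix to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        message = " ".join(raw[match.end():end].split())
        grouped[match["proc"]].append(message)
    return dict(grouped)


# Three records copied from the log above, flattened onto one line.
sample = (
    "(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] "
    "result = self.model.fc(hidden_states) "
    "(Worker_TP1 pid=19200) ERROR 05-07 18:17:48 [multiproc_executor.py:962] "
    "result = self.model.fc(hidden_states) "
    "(Worker_TP0 pid=19199) ERROR 05-07 18:17:48 [multiproc_executor.py:962] "
    "RuntimeError: expected mat1 and mat2 to have the same dtype, "
    "but got: float != c10::Half"
)

for proc, messages in split_by_process(sample).items():
    print(proc)
    for message in messages:
        print("   ", message)
```

Read per worker, both tracebacks end in the same place: `self.model.fc(hidden_states)` in `qwen3_dflash.py`, which reaches `torch.nn.functional.linear(x, weight, bias)` with mismatched dtypes.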