mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-FP8-Dynamic

Input tensor shape warnings

by Qnibbles - opened 28 days ago

Loading the model with vllm/vllm-openai:cu130-nightly (0.18.1rc1.dev32+g1f0d21064) on an RTX PRO 6000 produces the following warnings:

vllm | (EngineCore pid=239) INFO 03-25 23:49:57 [monitor.py:48] torch.compile took 97.99 s in total
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (32) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (32) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) INFO 03-25 23:50:53 [monitor.py:76] Initial profiling/warmup run took 55.70 s
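The warning text describes a simple shape heuristic. With head_first=False, the expected layout is [B, T, H, ...], and the check fires whenever the T dimension (seq_len) is smaller than the H dimension (num_heads). Below is a minimal sketch of that kind of check. `check_layout` is a hypothetical name, this is not the actual code in vLLM's `fla/ops/utils.py`, and the trailing head dimension of 128 is an arbitrary placeholder. The sketch shows that a tensor that really is laid out as [B, T, H, ...] still trips the check when a batch holds only 16 or 32 tokens, because 48 heads outnumber the tokens. In other words, the warning on its own does not necessarily indicate a real layout problem.

```python
import warnings


def check_layout(shape):
    """Hypothetical re-creation of the seq_len < num_heads heuristic.

    Assumes the head_first=False layout [B, T, H, ...] from the warning,
    so shape[1] is seq_len and shape[2] is num_heads. Not vLLM's code.
    """
    seq_len, num_heads = shape[1], shape[2]
    suspicious = seq_len < num_heads
    if suspicious:
        # Only dimension sizes are compared; the actual memory layout is never inspected.
        warnings.warn(
            f"Input tensor shape suggests potential format mismatch: "
            f"seq_len ({seq_len}) < num_heads ({num_heads})."
        )
    return suspicious


# Correctly laid out [B, T, H, D] shapes from short batches still trigger the warning.
print(check_layout((1, 16, 48, 128)))    # True
print(check_layout((1, 32, 48, 128)))    # True
print(check_layout((1, 2048, 48, 128)))  # False
```

Because the check only compares two dimension sizes, it cannot tell a short [B, T, H, ...] input apart from a transposed [B, H, T, ...] one.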

Everything seems to work normally, but I wonder whether these warnings are affecting it negatively somehow.
Thanks for the quant!
