Input tensor shape warnings

#2
by Qnibbles - opened

Loading the model with vllm/vllm-openai:cu130-nightly (0.18.1rc1.dev32+g1f0d21064) on an RTX PRO 6000 produces these warnings between the torch.compile step and the end of the profiling/warmup run:

vllm | (EngineCore pid=239) INFO 03-25 23:49:57 [monitor.py:48] torch.compile took 97.99 s in total
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (32) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (32) < num_heads (48). This may indicate the inputs were passed in head-first format [B, H, T, ...] when head_first=False was specified. Please verify your input tensor format matches the expected shape [B, T, H, ...].
vllm | (EngineCore pid=239)   return fn(*contiguous_args, **contiguous_kwargs)
vllm | (EngineCore pid=239) INFO 03-25 23:50:53 [monitor.py:76] Initial profiling/warmup run took 55.70 s
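
Going by the warning text alone, the check looks like a plain shape comparison: with head_first=False it reads dimension 1 as seq_len and dimension 2 as num_heads, and complains when the first is smaller. Below is a minimal sketch of that reading. The function name `looks_head_first` and the trailing dimension 128 are made up for illustration, and this is not the actual code in vllm/model_executor/layers/fla/ops/utils.py, only my guess at what the message describes.

```python
import warnings


def looks_head_first(shape, head_first=False):
    """Hypothetical re-creation of the heuristic described in the warning text.

    `shape` is assumed to be laid out as [B, T, H, ...] when head_first=False.
    Returns True (and emits a UserWarning) when seq_len < num_heads.
    """
    if head_first or len(shape) < 3:
        # Nothing to compare in this sketch.
        return False
    seq_len, num_heads = shape[1], shape[2]
    if seq_len < num_heads:
        warnings.warn(
            f"seq_len ({seq_len}) < num_heads ({num_heads}); "
            "inputs may be in head-first format [B, H, T, ...]",
            UserWarning,
        )
        return True
    return False


if __name__ == "__main__":
    # num_heads=48 and seq_len 16/32 come from the log above; 64 is an extra
    # case added for contrast, and 128 is an arbitrary trailing dimension.
    for seq_len in (16, 32, 64):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # keep the demo output quiet
            flagged = looks_head_first((1, seq_len, 48, 128))
        print(seq_len, flagged)
```

In this sketch a correctly laid out [B, T, H, ...] tensor with 16 or 32 tokens and 48 heads still gets flagged, so a short sequence on its own would be enough to trigger the message. Whether that is what happens in the real warmup run is an assumption on my part.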

Everything seems to work normally, but I wonder whether these warnings point to a real problem that could hurt output quality or performance.
Thanks for the quant!
