The vLLM Docker image vllm/vllm-openai:gemma4 detects the Qwen3ForCausalLM model architecture and only allows max_model_len 40960.
(APIServer pid=1) INFO 04-04 05:23:08 [utils.py:299] [vLLM ASCII banner]
(APIServer pid=1) INFO 04-04 05:23:08 [utils.py:299] version 0.19.1rc1.dev28+g8617f8676
(APIServer pid=1) INFO 04-04 05:23:08 [utils.py:299] model Qwen/Qwen3-0.6B
(APIServer pid=1) INFO 04-04 05:23:08 [utils.py:233] non-default args: {'model_tag': '/app/models/google/gemma-4-31B-it', 'enable_auto_tool_choice': True, 'tool_call_parser': 'gemma4', 'host': '0.0.0.0', 'api_key': ['mykey'], 'trust_remote_code': True, 'max_model_len': 131072, 'served_model_name': ['gemma-4-31B-it'], 'override_generation_config': {'temperature': 1.0, 'top_p': 0.95, 'top_k': 64}, 'reasoning_parser': 'gemma4', 'disable_custom_all_reduce': True, 'gpu_memory_utilization': 0.95, 'limit_mm_per_prompt': {'image': 1}, 'max_num_batched_tokens': 131072, 'max_num_seqs': 8}
(APIServer pid=1) INFO 04-04 05:23:15 [model.py:554] Resolved architecture: Qwen3ForCausalLM
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=1) File "<frozen runpy>", line 88, in _run_code
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 726, in <module>
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 686, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 700, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 101, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 125, in build_async_engine_client_from_engine_args
(APIServer pid=1) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1574, in create_engine_config
(APIServer pid=1) model_config = self.create_model_config()
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1422, in create_model_config
(APIServer pid=1) return ModelConfig(
(APIServer pid=1) ^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=1) Value error, User-specified max_model_len (131072) is greater than the derived max_model_len (max_position_embeddings=40960.0 or model_max_length=None in model's config.json). To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1. VLLM_ALLOW_LONG_MAX_MODEL_LEN must be used with extreme caution. If the model uses relative position encoding (RoPE), positions exceeding derived_max_model_len lead to nan. If the model uses absolute position encoding, positions exceeding derived_max_model_len will cause a CUDA array out-of-bounds error. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
(APIServer pid=1) For further information visit https://errors.pydantic.dev/2.12/v/value_error
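For reference, the failing check can be sketched roughly like this (a simplified sketch, not vLLM's actual implementation): the derived limit comes from fields such as max_position_embeddings in the model's config.json, and a user-specified --max-model-len above it is rejected unless the VLLM_ALLOW_LONG_MAX_MODEL_LEN environment variable is set.

```python
import os

def check_max_model_len(user_max_len: int, hf_config: dict) -> int:
    """Simplified sketch of the max_model_len validation, not vLLM's real code.

    The derived limit comes from keys like max_position_embeddings in the
    model's config.json; a user value above it is rejected unless the
    VLLM_ALLOW_LONG_MAX_MODEL_LEN env var is set to "1".
    """
    derived = min(
        v
        for k in ("max_position_embeddings", "model_max_length")
        if (v := hf_config.get(k)) is not None
    )
    if user_max_len > derived and os.environ.get("VLLM_ALLOW_LONG_MAX_MODEL_LEN") != "1":
        raise ValueError(
            f"User-specified max_model_len ({user_max_len}) is greater than "
            f"the derived max_model_len ({derived})."
        )
    return user_max_len

# Qwen3-0.6B's config.json reports max_position_embeddings=40960,
# so requesting 131072 fails exactly as in the traceback above.
```

Note that the log resolved the architecture as Qwen3ForCausalLM and the banner shows model Qwen/Qwen3-0.6B, so the 40960 limit comes from the Qwen3 config, not from the Gemma model the flags were targeting.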
Rebuilt with this:

uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
  --extra-index-url https://wheels.vllm.ai/nightly/cu129 \
  --extra-index-url https://download.pytorch.org/whl/cu129 \
  --index-strategy unsafe-best-match
uv pip install transformers==5.5.0
Same error.
Found the reason: my Dockerfile used ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"] as the entrypoint, while the official vllm/vllm-openai:gemma4 image uses vllm serve as its entrypoint.
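A minimal sketch of the fix (the exact entrypoint is an assumption based on the official vllm/vllm-openai images, which wrap vllm serve): keep the image's vllm serve entrypoint and pass the model path and flags as arguments, instead of overriding ENTRYPOINT with the raw api_server module.

```dockerfile
# Broken: bypasses the image's `vllm serve` entrypoint, so the positional
# model path is not parsed the same way and vLLM falls back to defaults.
# ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

# Working sketch: mirror the image's own entrypoint and pass args via CMD.
FROM vllm/vllm-openai:gemma4
ENTRYPOINT ["vllm", "serve"]
CMD ["/app/models/google/gemma-4-31B-it", "--served-model-name", "gemma-4-31B-it"]
```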