Error: Cannot set `add_generation_prompt`
#4
by SlavikF - opened
I'm running this:
```yaml
services:
  vllm:
    image: vllm/vllm-openai:v0.13.0
    container_name: vllm-devstral
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']  # 4090D
    ports:
      - "8080:8000"
    environment:
      TORCH_CUDA_ARCH_LIST: "8.9"
    volumes:
      - /home/slavik/.cache:/root/.cache
    command: [
      "--dtype", "half",
      "--enable-auto-tool-choice",
      "--gpu-memory-utilization", "0.95",
      "--host", "0.0.0.0",
      "--kv-cache-dtype", "fp8",
      "--max-model-len", "100000",
      "--max-num-batched-tokens", "10240",
      "--max-num-seqs", "2",
      "--model", "cyankiwi/Devstral-Small-2-24B-Instruct-2512-AWQ-4bit",
      "--quantization", "compressed-tensors",
      "--served-model-name", "local-devstral",
      "--tool-call-parser", "mistral",
      "--tensor-parallel-size", "1"
    ]
    ipc: host
```
I'm using open-webui 0.7.1 with the code interpreter feature.
Query:
use python to count how many r is inside the word “strawberry”
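For reference, the code the interpreter needs to run here is a one-liner, and the expected answer is 3:

```python
# Count how many times "r" appears in "strawberry" — the task the model is given.
word = "strawberry"
print(word.count("r"))  # 3
```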
The model generates correct code and it executes, but the result is never analyzed: right after the code execution, this error appears in the vLLM log:
WARNING 01-09 20:30:50 [protocol.py:116] The following fields were present in the request but ignored: {'num_ctx'}
ERROR 01-09 20:30:50 [chat_utils.py:1883] An error occurred in `mistral_common` while applying chat template
ERROR 01-09 20:30:50 [chat_utils.py:1883] Traceback (most recent call last):
ERROR 01-09 20:30:50 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [chat_utils.py:1883] return tokenizer.apply_chat_template(
ERROR 01-09 20:30:50 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 429, in apply_chat_template
ERROR 01-09 20:30:50 [chat_utils.py:1883] messages, tools = _prepare_apply_chat_template_tools_and_messages(
ERROR 01-09 20:30:50 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 118, in _prepare_apply_chat_template_tools_and_messages
ERROR 01-09 20:30:50 [chat_utils.py:1883] raise ValueError(
ERROR 01-09 20:30:50 [chat_utils.py:1883] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
ERROR 01-09 20:30:50 [serving_chat.py:255] Error in preprocessing prompt inputs
ERROR 01-09 20:30:50 [serving_chat.py:255] Traceback (most recent call last):
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255] return tokenizer.apply_chat_template(
ERROR 01-09 20:30:50 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 429, in apply_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255] messages, tools = _prepare_apply_chat_template_tools_and_messages(
ERROR 01-09 20:30:50 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 118, in _prepare_apply_chat_template_tools_and_messages
ERROR 01-09 20:30:50 [serving_chat.py:255] raise ValueError(
ERROR 01-09 20:30:50 [serving_chat.py:255] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
ERROR 01-09 20:30:50 [serving_chat.py:255]
ERROR 01-09 20:30:50 [serving_chat.py:255] The above exception was the direct cause of the following exception:
ERROR 01-09 20:30:50 [serving_chat.py:255]
ERROR 01-09 20:30:50 [serving_chat.py:255] Traceback (most recent call last):
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 237, in create_chat_completion
ERROR 01-09 20:30:50 [serving_chat.py:255] conversation, engine_prompts = await self._preprocess_chat(
ERROR 01-09 20:30:50 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1146, in _preprocess_chat
ERROR 01-09 20:30:50 [serving_chat.py:255] request_prompt = await self._apply_mistral_chat_template_async(
ERROR 01-09 20:30:50 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
ERROR 01-09 20:30:50 [serving_chat.py:255] result = self.fn(*self.args, **self.kwargs)
ERROR 01-09 20:30:50 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1886, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255] raise ValueError(str(e)) from e
ERROR 01-09 20:30:50 [serving_chat.py:255] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
INFO: 192.168.0.241:51545 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
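As far as I can tell, this happens because open-webui sends the conversation back with the assistant's message last (to have the model analyze the tool output), while vLLM still applies the chat template with `add_generation_prompt=True`. A minimal sketch of the rule the error message describes (a hypothetical helper of mine, not vLLM's actual code):

```python
def chat_template_flags(messages: list[dict]) -> dict:
    """Choose chat-template flags based on the last message's role.

    Sketch of the constraint behind the error: when the final message
    is from the assistant, the template must continue that message
    (continue_final_message=True) rather than open a new generation
    turn (add_generation_prompt=True).
    """
    last_role = messages[-1]["role"] if messages else None
    if last_role == "assistant":
        return {"add_generation_prompt": False, "continue_final_message": True}
    return {"add_generation_prompt": True, "continue_final_message": False}
```

If the client can be modified, vLLM reportedly accepts these two fields per request via the OpenAI client's `extra_body`; whether open-webui exposes a way to set them is a separate question.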
The error above occurred with open-webui's Function calling set to Default.
When I tried Native mode, I got a different error:
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] An error occurred in `mistral_common` while applying chat template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] Traceback (most recent call last):
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] return tokenizer.apply_chat_template(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 433, in apply_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] return self.transformers_tokenizer.apply_chat_template(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_mistral_common.py", line 1504, in apply_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] chat_request = ChatCompletionRequest.from_openai(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/request.py", line 185, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] converted_messages: list[ChatMessage] = convert_openai_messages(messages)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/converters.py", line 31, in convert_openai_messages
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] message = AssistantMessage.from_openai(openai_message)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/messages.py", line 226, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] tools_calls.append(ToolCall.from_openai(openai_tool_call))
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/tool_calls.py", line 168, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] return cls.model_validate(tool_call)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] File "/usr/local/lib/python3.12/dist-packages/pydantic/main.py", line 716, in model_validate
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] return cls.__pydantic_validator__.validate_python(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] pydantic_core._pydantic_core.ValidationError: 1 validation error for ToolCall
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] index
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Error in preprocessing prompt inputs
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Traceback (most recent call last):
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] return tokenizer.apply_chat_template(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 433, in apply_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] return self.transformers_tokenizer.apply_chat_template(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_mistral_common.py", line 1504, in apply_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] chat_request = ChatCompletionRequest.from_openai(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/request.py", line 185, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] converted_messages: list[ChatMessage] = convert_openai_messages(messages)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/converters.py", line 31, in convert_openai_messages
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] message = AssistantMessage.from_openai(openai_message)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/messages.py", line 226, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] tools_calls.append(ToolCall.from_openai(openai_tool_call))
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/tool_calls.py", line 168, in from_openai
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] return cls.model_validate(tool_call)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/pydantic/main.py", line 716, in model_validate
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] return cls.__pydantic_validator__.validate_python(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] pydantic_core._pydantic_core.ValidationError: 1 validation error for ToolCall
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] index
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] The above exception was the direct cause of the following exception:
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Traceback (most recent call last):
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 237, in create_chat_completion
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] conversation, engine_prompts = await self._preprocess_chat(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1146, in _preprocess_chat
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] request_prompt = await self._apply_mistral_chat_template_async(
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] result = self.fn(*self.args, **self.kwargs)
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1886, in apply_mistral_chat_template
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] raise ValueError(str(e)) from e
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ValueError: 1 validation error for ToolCall
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] index
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral | (APIServer pid=1) INFO: 192.168.0.241:9119 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
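The Native-mode failure looks different: open-webui replays the assistant's tool calls including the OpenAI-style `index` field, and `mistral_common`'s `ToolCall` model forbids extra fields. If a shim can be placed between open-webui and vLLM, a sanitizer along these lines (my own sketch, not an official fix) would drop the offending field before forwarding the request:

```python
def strip_tool_call_index(messages):
    """Return a copy of `messages` with the 'index' key removed from
    every assistant tool call, since mistral_common's ToolCall model
    rejects it as an extra field ('Extra inputs are not permitted')."""
    cleaned = []
    for msg in messages:
        msg = dict(msg)  # shallow copy; original request left untouched
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            msg["tool_calls"] = [
                {k: v for k, v in tc.items() if k != "index"}
                for tc in msg["tool_calls"]
            ]
        cleaned.append(msg)
    return cleaned
```

This only works around the symptom; the underlying mismatch between the OpenAI tool-call schema and `mistral_common`'s stricter validation would still need a fix upstream.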