Error: Cannot set `add_generation_prompt`

#4
by SlavikF - opened

I'm running vLLM with this Docker Compose setup:

services:
  vllm:
    image: vllm/vllm-openai:v0.13.0
    container_name: vllm-devstral
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']  # 4090D
    ports:
      - "8080:8000"
    environment:
      TORCH_CUDA_ARCH_LIST: "8.9"
    volumes:
      - /home/slavik/.cache:/root/.cache
    command: [
      "--dtype", "half",
      "--enable-auto-tool-choice",
      "--gpu-memory-utilization", "0.95",
      "--host", "0.0.0.0",
      "--kv-cache-dtype", "fp8",
      "--max-model-len", "100000",
      "--max-num-batched-tokens", "10240",
      "--max-num-seqs", "2",
      "--model", "cyankiwi/Devstral-Small-2-24B-Instruct-2512-AWQ-4bit",
      "--quantization", "compressed-tensors",
      "--served-model-name", "local-devstral",
      "--tool-call-parser", "mistral",
      "--tensor-parallel-size", "1"
    ]
    ipc: host

On the client side, I'm using open-webui 0.7.1 with the code interpreter enabled.

Query:

use python to count how many r is inside the word “strawberry”

The model generates correct code and it executes successfully, but the results are never analyzed, because right after code execution this error appears in the vLLM log:

WARNING 01-09 20:30:50 [protocol.py:116] The following fields were present in the request but ignored: {'num_ctx'}
ERROR 01-09 20:30:50 [chat_utils.py:1883] An error occurred in `mistral_common` while applying chat template
ERROR 01-09 20:30:50 [chat_utils.py:1883] Traceback (most recent call last):
ERROR 01-09 20:30:50 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [chat_utils.py:1883]     return tokenizer.apply_chat_template(
ERROR 01-09 20:30:50 [chat_utils.py:1883]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 429, in apply_chat_template
ERROR 01-09 20:30:50 [chat_utils.py:1883]     messages, tools = _prepare_apply_chat_template_tools_and_messages(
ERROR 01-09 20:30:50 [chat_utils.py:1883]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 118, in _prepare_apply_chat_template_tools_and_messages
ERROR 01-09 20:30:50 [chat_utils.py:1883]     raise ValueError(
ERROR 01-09 20:30:50 [chat_utils.py:1883] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
ERROR 01-09 20:30:50 [serving_chat.py:255] Error in preprocessing prompt inputs
ERROR 01-09 20:30:50 [serving_chat.py:255] Traceback (most recent call last):
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255]     return tokenizer.apply_chat_template(
ERROR 01-09 20:30:50 [serving_chat.py:255]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 429, in apply_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255]     messages, tools = _prepare_apply_chat_template_tools_and_messages(
ERROR 01-09 20:30:50 [serving_chat.py:255]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 118, in _prepare_apply_chat_template_tools_and_messages
ERROR 01-09 20:30:50 [serving_chat.py:255]     raise ValueError(
ERROR 01-09 20:30:50 [serving_chat.py:255] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
ERROR 01-09 20:30:50 [serving_chat.py:255]
ERROR 01-09 20:30:50 [serving_chat.py:255] The above exception was the direct cause of the following exception:
ERROR 01-09 20:30:50 [serving_chat.py:255]
ERROR 01-09 20:30:50 [serving_chat.py:255] Traceback (most recent call last):
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 237, in create_chat_completion
ERROR 01-09 20:30:50 [serving_chat.py:255]     conversation, engine_prompts = await self._preprocess_chat(
ERROR 01-09 20:30:50 [serving_chat.py:255]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1146, in _preprocess_chat
ERROR 01-09 20:30:50 [serving_chat.py:255]     request_prompt = await self._apply_mistral_chat_template_async(
ERROR 01-09 20:30:50 [serving_chat.py:255]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
ERROR 01-09 20:30:50 [serving_chat.py:255]     result = self.fn(*self.args, **self.kwargs)
ERROR 01-09 20:30:50 [serving_chat.py:255]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-09 20:30:50 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1886, in apply_mistral_chat_template
ERROR 01-09 20:30:50 [serving_chat.py:255]     raise ValueError(str(e)) from e
ERROR 01-09 20:30:50 [serving_chat.py:255] ValueError: Cannot set `add_generation_prompt` to True when the last message is from the assistant. Consider using `continue_final_message` instead.
INFO:     192.168.0.241:51545 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

The error above occurred with open-webui's
Function Calling set to Default.
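My guess (not verified against the actual open-webui source) is that the code-interpreter flow appends the executed code's output to the history as a final assistant message, so the conversation the server receives ends with an assistant turn, which the Mistral template rejects when it tries to add a generation prompt. A hypothetical sketch of that request shape, and of the `continue_final_message` workaround the error message suggests (vLLM accepts both flags as extra chat-completion parameters):

```python
# Hypothetical reconstruction of the message history open-webui sends back
# after code execution -- the exact payload is an assumption on my part.
messages = [
    {"role": "user",
     "content": 'use python to count how many r is inside the word "strawberry"'},
    # Code-interpreter output gets folded into an assistant message, so the
    # conversation now *ends* with role == "assistant".
    {"role": "assistant",
     "content": "```python\nprint('strawberry'.count('r'))\n```\nOutput: 3"},
]

# mistral_common refuses to add a generation prompt after an assistant turn;
# the error suggests asking vLLM to continue the final message instead.
needs_continue = messages[-1]["role"] == "assistant"
extra_body = (
    {"continue_final_message": True, "add_generation_prompt": False}
    if needs_continue
    else {}
)
```

The client would need to pass `extra_body` along with the chat-completion request; I don't see a way to make open-webui do that today, so this may need a fix on their side.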

When I tried Native mode instead, I get a different error:

vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] An error occurred in `mistral_common` while applying chat template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] Traceback (most recent call last):
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     return tokenizer.apply_chat_template(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 433, in apply_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     return self.transformers_tokenizer.apply_chat_template(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_mistral_common.py", line 1504, in apply_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     chat_request = ChatCompletionRequest.from_openai(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/request.py", line 185, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     converted_messages: list[ChatMessage] = convert_openai_messages(messages)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/converters.py", line 31, in convert_openai_messages
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     message = AssistantMessage.from_openai(openai_message)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/messages.py", line 226, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     tools_calls.append(ToolCall.from_openai(openai_tool_call))
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/tool_calls.py", line 168, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     return cls.model_validate(tool_call)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   File "/usr/local/lib/python3.12/dist-packages/pydantic/main.py", line 716, in model_validate
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     return cls.__pydantic_validator__.validate_python(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] pydantic_core._pydantic_core.ValidationError: 1 validation error for ToolCall
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883] index
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]   Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [chat_utils.py:1883]     For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Error in preprocessing prompt inputs
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Traceback (most recent call last):
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1866, in apply_mistral_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     return tokenizer.apply_chat_template(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/mistral.py", line 433, in apply_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     return self.transformers_tokenizer.apply_chat_template(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_mistral_common.py", line 1504, in apply_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     chat_request = ChatCompletionRequest.from_openai(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/request.py", line 185, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     converted_messages: list[ChatMessage] = convert_openai_messages(messages)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/converters.py", line 31, in convert_openai_messages
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     message = AssistantMessage.from_openai(openai_message)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/messages.py", line 226, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     tools_calls.append(ToolCall.from_openai(openai_tool_call))
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/mistral_common/protocol/instruct/tool_calls.py", line 168, in from_openai
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     return cls.model_validate(tool_call)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/pydantic/main.py", line 716, in model_validate
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     return cls.__pydantic_validator__.validate_python(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] pydantic_core._pydantic_core.ValidationError: 1 validation error for ToolCall
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] index
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] 
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] The above exception was the direct cause of the following exception:
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] 
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] Traceback (most recent call last):
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 237, in create_chat_completion
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     conversation, engine_prompts = await self._preprocess_chat(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1146, in _preprocess_chat
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     request_prompt = await self._apply_mistral_chat_template_async(
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     result = self.fn(*self.args, **self.kwargs)
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1886, in apply_mistral_chat_template
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     raise ValueError(str(e)) from e
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] ValueError: 1 validation error for ToolCall
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255] index
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]   Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
vllm-devstral  | (APIServer pid=1) ERROR 01-09 21:30:08 [serving_chat.py:255]     For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
vllm-devstral  | (APIServer pid=1) INFO:     192.168.0.241:9119 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
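The Native-mode failure looks like a schema mismatch: the client includes the OpenAI-style `index` field on each tool call, but `mistral_common`'s strict `ToolCall` model forbids extra fields. A possible client-side workaround, sketched below, is to strip that field before the request reaches vLLM (the helper name and placement are my own, not part of any API; I haven't verified this end-to-end):

```python
def strip_tool_call_index(messages):
    """Return a copy of the chat history with the OpenAI 'index' field
    removed from assistant tool_calls, so mistral_common's strict
    ToolCall model doesn't reject the request with extra_forbidden."""
    cleaned = []
    for msg in messages:
        msg = dict(msg)  # shallow copy; original history stays untouched
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            msg["tool_calls"] = [
                {k: v for k, v in tc.items() if k != "index"}
                for tc in msg["tool_calls"]
            ]
        cleaned.append(msg)
    return cleaned

# Hypothetical history of the shape that triggers the validation error:
history = [
    {"role": "user", "content": "count the r's in strawberry"},
    {"role": "assistant",
     "tool_calls": [
         {"index": 0,  # <- the field mistral_common rejects
          "id": "abc123def",
          "type": "function",
          "function": {"name": "run_python", "arguments": "{}"}},
     ]},
]
cleaned = strip_tool_call_index(history)
```

Ideally either open-webui would stop sending `index`, or `mistral_common` would tolerate it, since it's a legitimate field in OpenAI streaming responses.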
