ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported

#1
by Qnibbles - opened

I am unable to start the model with vLLM version 0.18.2rc1.dev54+g73f48ce55 (the vllm/vllm-openai:cu130-nightly image). The log follows:

INFO 04-02 13:58:04 [utils.py:299] 
INFO 04-02 13:58:04 [utils.py:299]        β–ˆ     β–ˆ     β–ˆβ–„   β–„β–ˆ
INFO 04-02 13:58:04 [utils.py:299]  β–„β–„ β–„β–ˆ β–ˆ     β–ˆ     β–ˆ β–€β–„β–€ β–ˆ  version 0.18.2rc1.dev54+g73f48ce55
INFO 04-02 13:58:04 [utils.py:299]   β–ˆβ–„β–ˆβ–€ β–ˆ     β–ˆ     β–ˆ     β–ˆ  model   kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16
INFO 04-02 13:58:04 [utils.py:299]    β–€β–€  β–€β–€β–€β–€β–€ β–€β–€β–€β–€β–€ β–€     β–€
INFO 04-02 13:58:04 [utils.py:299] 
INFO 04-02 13:58:04 [utils.py:233] non-default args: {'model_tag': 'kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'model': 'kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16', 'trust_remote_code': True, 'max_model_len': 131072, 'served_model_name': ['Qwen35_27B_FP8'], 'override_generation_config': {'temperature': 0.6, 'top_p': 0.95, 'top_k': 20, 'min_p': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 1.0}, 'reasoning_parser': 'qwen3', 'gpu_memory_utilization': 0.95, 'language_model_only': True, 'max_num_seqs': 8}
INFO 04-02 13:58:05 [model.py:549] Resolved architecture: Qwen3_5ForConditionalGeneration
INFO 04-02 13:58:05 [model.py:1679] Using max model len 131072
INFO 04-02 13:58:05 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 04-02 13:58:05 [vllm.py:799] Asynchronous scheduling is enabled.
INFO 04-02 13:58:05 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
INFO 04-02 13:58:05 [compilation.py:290] Enabled custom fusions: act_quant
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
    async with build_async_engine_client(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
    return cls(
           ^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 135, in __init__
    self.renderer = renderer = renderer_from_config(self.vllm_config)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/registry.py", line 83, in renderer_from_config
    tokenizer = cached_tokenizer_from_config(model_config, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/registry.py", line 227, in cached_tokenizer_from_config
    return cached_get_tokenizer(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/registry.py", line 210, in get_tokenizer
    tokenizer = tokenizer_cls_.from_pretrained(tokenizer_name, *args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/hf.py", line 110, in from_pretrained
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/hf.py", line 85, in from_pretrained
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1153, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Any ideas as to the cause?
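
For what it's worth, the exception is raised inside `AutoTokenizer.from_pretrained`, not in vLLM itself, so it can be reproduced without starting the server. Below is a minimal sketch; the model ID is taken from the log above, and the interpretation that the repo's `tokenizer_config.json` declares a `tokenizer_class` the installed Transformers doesn't define is my assumption:

```python
# Reproduce the tokenizer failure outside vLLM.
import json

import transformers
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

model_id = "kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16"

print("transformers:", transformers.__version__)

# Assumption: the repo's tokenizer_config.json names a tokenizer_class
# ("TokenizersBackend") that the installed Transformers release doesn't know.
cfg_path = hf_hub_download(model_id, "tokenizer_config.json")
with open(cfg_path) as f:
    print("declared tokenizer_class:", json.load(f).get("tokenizer_class"))

# This is the call that raises the ValueError seen in the server log.
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```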

The same error occurs with vLLM version 0.19.1rc1.dev65+g2df2c85be on an RTX Pro 6000, also with the vllm/vllm-openai:cu130-nightly image:

(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299] 
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299]        β–ˆ     β–ˆ     β–ˆβ–„   β–„β–ˆ
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299]  β–„β–„ β–„β–ˆ β–ˆ     β–ˆ     β–ˆ β–€β–„β–€ β–ˆ  version 0.19.1rc1.dev65+g2df2c85be
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299]   β–ˆβ–„β–ˆβ–€ β–ˆ     β–ˆ     β–ˆ     β–ˆ  model   kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299]    β–€β–€  β–€β–€β–€β–€β–€ β–€β–€β–€β–€β–€ β–€     β–€
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:299] 
(APIServer pid=1) INFO 04-11 06:24:40 [utils.py:233] non-default args: {'model_tag': 'kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'model': 'kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16', 'trust_remote_code': True, 'max_model_len': 147456, 'quantization': 'compressed-tensors', 'served_model_name': ['llm-large'], 'reasoning_parser': 'qwen3', 'gpu_memory_utilization': 0.4, 'enable_prefix_caching': True, 'mm_processor_cache_type': 'shm', 'max_num_seqs': 16, 'async_scheduling': True}
(APIServer pid=1) INFO 04-11 06:24:43 [model.py:554] Resolved architecture: Qwen3_5ForConditionalGeneration
(APIServer pid=1) INFO 04-11 06:24:43 [model.py:1684] Using max model len 147456
(APIServer pid=1) INFO 04-11 06:24:43 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1) WARNING 04-11 06:24:43 [config.py:308] Mamba cache mode is set to 'align' for Qwen3_5ForConditionalGeneration by default when prefix caching is enabled
(APIServer pid=1) INFO 04-11 06:24:43 [config.py:328] Warning: Prefix caching in Mamba cache 'align' mode is currently enabled. Its support for Mamba layers is experimental. Please report any issues you may observe.
(APIServer pid=1) INFO 04-11 06:24:43 [vllm.py:799] Asynchronous scheduling is enabled.
(APIServer pid=1) INFO 04-11 06:24:43 [kernel.py:199] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(APIServer pid=1) INFO 04-11 06:24:43 [compilation.py:290] Enabled custom fusions: act_quant
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 135, in __init__
(APIServer pid=1)     self.renderer = renderer = renderer_from_config(self.vllm_config)
(APIServer pid=1)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/registry.py", line 83, in renderer_from_config
(APIServer pid=1)     tokenizer = cached_tokenizer_from_config(model_config, **kwargs)
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/registry.py", line 227, in cached_tokenizer_from_config
(APIServer pid=1)     return cached_get_tokenizer(
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/registry.py", line 210, in get_tokenizer
(APIServer pid=1)     tokenizer = tokenizer_cls_.from_pretrained(tokenizer_name, *args, **kwargs)
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/hf.py", line 110, in from_pretrained
(APIServer pid=1)     raise e
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tokenizers/hf.py", line 85, in from_pretrained
(APIServer pid=1)     tokenizer = AutoTokenizer.from_pretrained(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1153, in from_pretrained
(APIServer pid=1)     raise ValueError(
(APIServer pid=1) ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

The Kaitchup org

Are you using Transformers < 5.0?
Upgrading should resolve the issue.
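
A quick way to confirm the version installed inside the container before pulling a new image (a generic check, nothing vLLM-specific; the >= 5.0 threshold is taken from the point above):

```python
# Check the installed Transformers version; the TokenizersBackend
# tokenizer class ships with Transformers >= 5.0.
import transformers
from packaging.version import Version

print("transformers:", transformers.__version__)
if Version(transformers.__version__) < Version("5.0.0"):
    # Upgrade from a shell with: pip install -U "transformers>=5.0"
    print("Too old for this model's tokenizer_class; upgrade transformers.")
```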

Indeed I was. I'll try that; thanks for the solution.
