Cannot load the model in vLLM 0.17.1 (latest stable)

#1
by Lukamodeo - opened

I tried to load this model in vLLM 0.17.1 (latest stable) and got this error:
Tokenizer class TokenizersBackend does not exist or is not currently imported.

The original, non-quantized Qwen/Qwen3.5-27B has this in its tokenizer_config.json:
"tokenizer_class": "Qwen2Tokenizer"

This Intel/Qwen3.5-27B-int4-AutoRound has this in its tokenizer_config.json:
"tokenizer_class": "TokenizersBackend"

But vLLM 0.17.1 requires transformers >= 4.56.0, < 5, and the new TokenizersBackend class isn't handled there.

Intel org

Please ignore vLLM's requirement and upgrade transformers to the latest version.

Thanks, but for now I tried a simpler solution.
I cloned the repo locally and replaced "TokenizersBackend" with "Qwen2Tokenizer" in tokenizer_config.json... and vLLM 0.17.1 loaded the model without errors.

Hey, did you try the model? I'm planning to install it for my organization. How's the performance?

Hello
For now I've only tested raw performance (now with vLLM 0.19.0, plus replacing "TokenizersBackend" with "Qwen2Tokenizer" in tokenizer_config.json), and I get nearly 100 tokens/s on an L40S.

Very nice. What about the model quality, does it satisfy you? What do you use it for, and how does it perform on those tasks? Thanks in advance.

Sorry, but I haven't tested its quality yet, since I use Intel/Qwen3.5-27B-int4-AutoRound for real production tasks (in the Italian legal document domain).
My initial experiments with the 9B were for a very fast RAG PoC (work in progress...).
