How to get it to work with vLLM? (I tried GGUF and MLX)
#23
by Timmo - opened
I tried to get it running with vLLM (`vllm/vllm-openai:latest`):

```
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF:Q8_0 --host 0.0.0.0 --port 8000 --enforce-eager --gpu-memory-utilization 0.95 --max-model-len 128k --trust-remote-code
```
But I get this error:

```
(APIServer pid=19) raise RuntimeError(f"Can't get gguf config for {config.model_type}.")
(APIServer pid=19) RuntimeError: Can't get gguf config for qwen3_5.
```
I also tried the MLX 8-bit version and got a different error:

```
ValueError: Invalid type transformers.tokenization_utils_base.BatchEncoding received in array initialization.
```