How to get it to work with vLLM? (I tried GGUF and MLX)

#23
by Timmo - opened

I tried to get it running with vLLM (`vllm/vllm-openai:latest`):

```
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF:Q8_0 --host 0.0.0.0 --port 8000 --enforce-eager --gpu-memory-utilization 0.95 --max-model-len 128k --trust-remote-code
```

But I get this error:

```
(APIServer pid=19) raise RuntimeError(f"Can't get gguf config for {config.model_type}.")
(APIServer pid=19) RuntimeError: Can't get gguf config for qwen3_5.
```
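For what it's worth, that error comes from vLLM's GGUF loader, which only handles architectures it has an explicit GGUF config mapping for, and `qwen3_5` apparently isn't one of them in this image. A possible workaround is to serve the original safetensors weights instead of the GGUF quant, which bypasses the GGUF loader entirely. This is only a sketch: the non-GGUF repo name below is an assumption (the GGUF repo name minus the `-GGUF` suffix), and older vLLM versions want `--max-model-len` as a plain integer token count rather than `128k`:

```shell
# Sketch, not verified against this model: serve the original (non-GGUF)
# weights so vLLM's GGUF code path is never hit.
# The model repo name is an assumption -- substitute the real safetensors repo.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
    --host 0.0.0.0 --port 8000 \
    --enforce-eager \
    --gpu-memory-utilization 0.95 \
    --max-model-len 131072 \
    --trust-remote-code
```

The Q8_0 quality advantage is lost this way, but it at least isolates whether the problem is the GGUF loader or the model itself.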

I also tried the MLX 8-bit version and got a different error:

```
ValueError: Invalid type transformers.tokenization_utils_base.BatchEncoding received in array initialization.
```
