Prerequisite
VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
pip install git+https://github.com/huggingface/transformers.git
Basic Usage
vllm serve naaviii/glm-prun-4bit-32g-nextj \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
vLLM
- using pip (must use pypi.org as the index url):
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
vLLM
vllm serve naaviii/glm-prun-4bit-32g-nextj \
--tensor-parallel-size 4 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 1 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \