## Prerequisites

```shell
VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
pip install git+https://github.com/huggingface/transformers.git
```

## Basic Usage

```shell
vllm serve naaviii/glm-prun-4bit-32g-nextj \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice
```
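Once the server is up (by default at `http://localhost:8000/v1`), any OpenAI-compatible client can talk to it. Below is a minimal sketch of a chat-completions request payload; the `get_weather` tool name and schema are purely illustrative assumptions, not part of the model. With `--enable-auto-tool-choice`, the model decides on its own whether to invoke the tool.

```python
import json

# OpenAI-compatible chat-completions payload for the vLLM server.
# The get_weather tool is a hypothetical example tool.
payload = {
    "model": "naaviii/glm-prun-4bit-32g-nextj",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}

# POST this JSON body to http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```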
## vLLM

- Using pip (you must use pypi.org as the index URL):

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```
The model can also be served with tensor parallelism across 4 GPUs and MTP speculative decoding:

```shell
vllm serve naaviii/glm-prun-4bit-32g-nextj \
     --tensor-parallel-size 4 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice
```
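With `--reasoning-parser glm45` and `--tool-call-parser glm47` enabled, the server separates the model's reasoning into a `reasoning_content` field and parsed tool invocations into `tool_calls` in its OpenAI-compatible responses. A rough sketch of pulling these apart (the response dict below is fabricated for illustration, shaped like the server's output):

```python
def split_response(response: dict):
    """Extract reasoning, final answer, and tool calls from an
    OpenAI-compatible chat-completion response."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")  # filled by --reasoning-parser
    content = message.get("content")              # final user-facing answer
    tool_calls = message.get("tool_calls") or []  # filled by --tool-call-parser
    return reasoning, content, tool_calls

# Fabricated example response for illustration only.
example = {
    "choices": [
        {
            "message": {
                "reasoning_content": "The user wants weather data; call the tool.",
                "content": None,
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": '{"city": "Berlin"}',
                        },
                    }
                ],
            }
        }
    ]
}

reasoning, content, tool_calls = split_response(example)
print(tool_calls[0]["function"]["name"])  # → get_weather
```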