## Prerequisites

```shell
VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
pip install git+https://github.com/huggingface/transformers.git
```

## Basic Usage

```shell
vllm serve naaviii/glm-prun-4bit-32g-nextj \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice
```
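Once the server is up (by default at `http://localhost:8000/v1`), any OpenAI-compatible client can talk to it. Below is a minimal sketch of a chat-completions request payload; the `get_weather` tool name and schema are purely illustrative assumptions, not part of the model. With `--enable-auto-tool-choice`, the model decides on its own whether to invoke the tool.

```python
import json

# OpenAI-compatible chat-completions payload for the vLLM server.
# The get_weather tool is a hypothetical example tool.
payload = {
    "model": "naaviii/glm-prun-4bit-32g-nextj",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}

# POST this JSON body to http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```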
## vLLM

- Using pip (you must use pypi.org as the index URL):

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```
The model can also be served with tensor parallelism across 4 GPUs and MTP speculative decoding:

```shell
vllm serve naaviii/glm-prun-4bit-32g-nextj \
     --tensor-parallel-size 4 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice
```
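With `--reasoning-parser glm45` and `--tool-call-parser glm47` enabled, the server separates the model's reasoning into a `reasoning_content` field and parsed tool invocations into `tool_calls` in its OpenAI-compatible responses. A rough sketch of pulling these apart (the response dict below is fabricated for illustration, shaped like the server's output):

```python
def split_response(response: dict):
    """Extract reasoning, final answer, and tool calls from an
    OpenAI-compatible chat-completion response."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")  # filled by --reasoning-parser
    content = message.get("content")              # final user-facing answer
    tool_calls = message.get("tool_calls") or []  # filled by --tool-call-parser
    return reasoning, content, tool_calls

# Fabricated example response for illustration only.
example = {
    "choices": [
        {
            "message": {
                "reasoning_content": "The user wants weather data; call the tool.",
                "content": None,
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": '{"city": "Berlin"}',
                        },
                    }
                ],
            }
        }
    ]
}

reasoning, content, tool_calls = split_response(example)
print(tool_calls[0]["function"]["name"])  # → get_weather
```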