Qwen3-4B-Thinking-2507-GPTQ-INT8

Model Details

This model is an INT8 GPTQ-quantized version of Qwen/Qwen3-4B-Thinking-2507 with group_size 128, generated by vastai modelzoo. Please follow the license of the original model.
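To make the group_size setting concrete, here is a minimal sketch of group-wise INT8 quantization in plain Python. It uses simple symmetric round-to-nearest, which is an assumption made for illustration only: GPTQ itself additionally compensates for rounding error layer by layer, and the exact scheme used to produce this checkpoint is not described here. The function names `quantize_groups` and `dequantize_groups` are hypothetical.

```python
def quantize_groups(weights, group_size=128, bits=8):
    """Quantize a flat list of floats, one scale per group of `group_size` values.

    Illustrative symmetric round-to-nearest only; this is not the GPTQ algorithm.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; fall back to 1.0 when the group is all zeros.
        scale = (max(abs(w) for w in group) / qmax) or 1.0
        scales.append(scale)
        quantized.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return quantized, scales


def dequantize_groups(quantized, scales, group_size=128):
    """Map integers back to floats using the scale of the group each one belongs to."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]
```

With group_size 128, every 128 consecutive weights share one scale, so a smaller group size tracks the local weight range more closely at the cost of storing more scales.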

vLLM Inference

Requirements:

vLLM >= v0.11.0
VVI >= 26.02

Start an OpenAI-compatible server:

vllm serve vastai-ais/Qwen3-4B-Thinking-2507-GPTQ-INT8 --reasoning-parser qwen3 --served-model-name qwen

Then send a chat completion request:

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d ' {
    "model": "qwen",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ],
    "temperature": 1,
    "max_tokens": 512
  } '
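The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server was started as above and listens on localhost:8000. The helper names `build_payload`, `split_message`, and `chat` are hypothetical. With a reasoning parser enabled, vLLM returns the model's thinking separately from the final answer; the field is read here as `reasoning_content` with a fallback to `reasoning`, since the field name may differ between vLLM versions (an assumption to verify against your installed version).

```python
import json
import urllib.request

# Assumption: default address of `vllm serve`, matching the curl call.
BASE_URL = "http://localhost:8000/v1"


def build_payload(prompt, model="qwen", max_tokens=512, temperature=1):
    """Build the same request body as the curl example."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def split_message(response):
    """Return (reasoning, answer) from a chat completion response dict."""
    message = response["choices"][0]["message"]
    # Assumption: the reasoning field name depends on the vLLM version.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    return reasoning, message.get("content")


def chat(prompt, base_url=BASE_URL):
    """POST the prompt to the chat completions endpoint and split the reply."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return split_message(json.load(reply))
```

Calling `chat("Who are you?")` against a running server returns the reasoning text and the final answer as a pair. Because this is a thinking model, a small `max_tokens` budget may be consumed by the reasoning alone, in which case the answer can come back empty; raising `max_tokens` is the usual remedy.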

Safetensors

Model size: 4B params
Tensor type: I32, BF16
