Qwen3-4B-Thinking-2507-GPTQ-INT8
Model Details
This model is an INT8 GPTQ quantization of Qwen/Qwen3-4B-Thinking-2507 with group_size 128, generated by vastai modelzoo. Please follow the license of the original model.
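"group_size 128" means that each run of 128 consecutive weights in a row shares one quantization scale. The sketch below only illustrates that layout. It makes several assumptions:

- It uses plain symmetric round-to-nearest quantization, which is not GPTQ. GPTQ picks the integer values with an error-compensating procedure that this sketch does not implement.
- Whether this checkpoint is symmetric or asymmetric is not stated here.
- The function names are hypothetical.

```python
from typing import List, Tuple

GROUP_SIZE = 128  # group size stated for this checkpoint


def quantize_groups(weights: List[float], group_size: int = GROUP_SIZE) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest INT8 quantization with one scale per group.

    Illustration only: GPTQ chooses the integer values differently.
    """
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map [-max_abs, max_abs] onto [-127, 127]; avoid dividing by zero.
        scale = max_abs / 127 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-128, min(127, round(w / scale))) for w in group)
    return q, scales


def dequantize_groups(q: List[int], scales: List[float], group_size: int = GROUP_SIZE) -> List[float]:
    """Approximate the original weights: w ~= q * scale of the weight's group."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]
```

Storage arithmetic for this scheme: each weight takes 1 byte, plus one scale per 128 weights. Rounding keeps each reconstructed weight within half a scale step of the original.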
vLLM Inference
- vllm >= v0.11.0
- VVI >= 26.02
vllm serve vastai-ais/Qwen3-4B-Thinking-2507-GPTQ-INT8 --reasoning-parser qwen3 --served-model-name qwen
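With `--reasoning-parser qwen3`, vLLM is asked to return the model's thinking separately from its final answer. The Qwen3 Thinking-2507 chat template already opens the `<think>` block in the prompt, so raw completions usually contain only a closing `</think>` tag.

Suppose the server runs without a reasoning parser, or the thinking still shows up inside `content`. In that case, a client-side split such as this minimal sketch can separate the two. `split_reasoning` is a hypothetical helper, not part of vLLM.

```python
from typing import Tuple

END_THINK = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Only the closing tag is searched for, because the opening <think>
    may have been emitted by the chat template rather than the model.
    """
    head, sep, tail = text.partition(END_THINK)
    if not sep:
        # No closing tag: this model always thinks first, so assume
        # generation stopped mid-reasoning (e.g. max_tokens was reached).
        return text.strip(), ""
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Treating a missing `</think>` as unfinished reasoning is a design choice for a model that always thinks first. With a small `max_tokens`, the reasoning can consume the whole budget before any answer appears.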
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d ' {
"model": "qwen",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who are you?"}
],
"temperature": 1,
"max_tokens": 512
} '
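The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server from the command above is listening on `localhost:8000`.

The name of the reasoning field in the response is also an assumption. Different vLLM releases have used `reasoning_content` or `reasoning`, so the sketch checks both. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

BASE_URL = "http://localhost:8000/v1/chat/completions"  # address used in the curl example


def build_request(question: str, model: str = "qwen") -> Dict[str, Any]:
    """Build the same payload as the curl example."""
    return {
        "model": model,  # must match --served-model-name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 1,
        "max_tokens": 512,
    }


def extract_message(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (reasoning, answer) from a chat-completions response.

    Assumption: the reasoning field is "reasoning_content" or "reasoning",
    depending on the vLLM release.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
    return reasoning, message.get("content") or ""


def chat(question: str) -> Tuple[str, str]:
    """POST the request to the running server and parse the reply."""
    data = json.dumps(build_request(question)).encode("utf-8")
    req = urllib.request.Request(BASE_URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return extract_message(json.load(resp))
```

With the server running, `reasoning, answer = chat("Who are you?")` returns the thinking and the final answer as separate strings.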
Model tree for vastai-ais/Qwen3-4B-Thinking-2507-GPTQ-INT8
Base model
Qwen/Qwen3-4B-Thinking-2507