
Qwen3-VL-2B-Instruct-GPTQ-Int4

GPTQ-Int4 quantized version of Qwen/Qwen3-VL-2B-Instruct.

Quantization Details

  • Method: GPTQ 4-bit with group_size=128
  • Tool: GPTQModel 6.0.3
  • Calibration: 256 samples with random images
  • Base model: Qwen/Qwen3-VL-2B-Instruct
  • Model size: ~2.17 GB (vs ~4.26 GB unquantized)
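With group_size=128, every group of 128 int4 weights shares its own quantization metadata, which adds a small per-weight overhead. A back-of-the-envelope estimate (assuming one fp16 scale and one 4-bit zero point per group; actual on-disk packing may differ):

```python
# Rough effective-bits estimate for GPTQ int4 with group_size=128.
# Assumptions: one fp16 scale and one 4-bit zero point per group of
# 128 weights; real checkpoints may pack metadata differently.
bits_per_weight = 4
group_size = 128
scale_bits = 16   # fp16 scale per group (assumed)
zero_bits = 4     # int4 zero point per group (assumed)

effective_bits = bits_per_weight + (scale_bits + zero_bits) / group_size
print(f"{effective_bits:.3f} bits/weight")  # ≈ 4.156
```

At roughly 4.16 bits per weight, a fully quantized ~2B-parameter model would occupy only about 1 GB, so the ~2.17 GB checkpoint suggests that some components (typically the embeddings and the vision tower) are left unquantized in BF16.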

Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
import requests

model = AutoModelForImageTextToText.from_pretrained(
    "h2oai/Qwen3-VL-2B-Instruct-GPTQ-Int4",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("h2oai/Qwen3-VL-2B-Instruct-GPTQ-Int4")

# Load the image referenced in the chat message.
url = "https://example.com/image.png"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": url},
        {"type": "text", "text": "Describe this image."},
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True))

License

Same as the base model: Apache-2.0
