
Qwen3-VL-2B-Instruct-GPTQ-Int4

GPTQ-Int4 quantized version of Qwen/Qwen3-VL-2B-Instruct.

Quantization Details

  • Method: GPTQ 4-bit with group_size=128
  • Tool: GPTQModel 6.0.3
  • Calibration: 256 samples with random images
  • Base model: Qwen/Qwen3-VL-2B-Instruct
  • Model size: ~2.17 GB (vs ~4.26 GB unquantized)
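With group_size=128, every group of 128 int4 weights shares its own quantization metadata, which adds a small per-weight overhead. A back-of-the-envelope estimate (assuming one fp16 scale and one 4-bit zero point per group; actual on-disk packing may differ):

```python
# Rough effective-bits estimate for GPTQ int4 with group_size=128.
# Assumptions: one fp16 scale and one 4-bit zero point per group of
# 128 weights; real checkpoints may pack metadata differently.
bits_per_weight = 4
group_size = 128
scale_bits = 16   # fp16 scale per group (assumed)
zero_bits = 4     # int4 zero point per group (assumed)

effective_bits = bits_per_weight + (scale_bits + zero_bits) / group_size
print(f"{effective_bits:.3f} bits/weight")  # ≈ 4.156
```

At roughly 4.16 bits per weight, a fully quantized ~2B-parameter model would occupy only about 1 GB, so the ~2.17 GB checkpoint suggests that some components (typically the embeddings and the vision tower) are left unquantized in BF16.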

Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
import requests

model = AutoModelForImageTextToText.from_pretrained(
    "h2oai/Qwen3-VL-2B-Instruct-GPTQ-Int4",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("h2oai/Qwen3-VL-2B-Instruct-GPTQ-Int4")

# Load the image referenced in the chat message.
url = "https://example.com/image.png"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": url},
        {"type": "text", "text": "Describe this image."},
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True))

License

Same as the base model: Apache-2.0
