MedGemmaImpact/medgemma-1.5-4b-it-nvfp4

An NVFP4-quantized checkpoint of google/medgemma-1.5-4b-it, produced with llmcompressor.
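For intuition about what NVFP4 does to the weights, here is a small round-trip sketch of one quantization block. It assumes the usual NVFP4 layout (FP4 E2M1 values with one shared scale per 16-element block); the helper names and the plain-float scale are illustrative only — real NVFP4 stores the block scale itself in FP8 E4M3 with a second-level FP32 scale.

```python
# Non-negative values representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one 16-element block to FP4 E2M1 with a shared scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto +-6
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        out.append(q if x >= 0 else -q)
    return out, scale

def dequantize_block(qblock, scale):
    """Reconstruct approximate values from FP4 codes and the block scale."""
    return [q * scale for q in qblock]

block = [0.03, -0.75, 1.2, 6.0, -3.1, 0.0, 0.4, -0.02,
         2.2, -1.6, 0.9, 5.5, -4.4, 0.11, -0.3, 3.9]
q, s = quantize_block(block)
recon = dequantize_block(q, s)
```

Every stored value lands on the 16-point E2M1 grid, so most of the model's memory footprint comes down to 4 bits per weight plus one scale per block.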

Usage with vLLM

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from transformers import AutoProcessor

model_name = "MedGemmaImpact/medgemma-1.5-4b-it-nvfp4"

# Load image and processor
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Build multimodal prompt
chat = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is in this image?"}]},
]
prompt = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)

# Initialize model
llm = LLM(model=model_name, trust_remote_code=True)

# Run inference
inputs = {"prompt": prompt, "multi_modal_data": {"image": [image]}}
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))

print("RESPONSE:", outputs[0].outputs[0].text)

Calibration Details

  • Dataset: flickr30k
  • Num samples: 256
  • Max sequence length: 1024

Notes

  • Vision tower and multi-modal projector are preserved at full precision
  • Embeddings and layer norms are preserved in original precision
  • Compatible with vLLM's compressed-tensors format
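A quantization recipe along these lines would express the scheme described above. This is a sketch under assumptions, not the exact recipe used for this checkpoint; the module-name regex patterns in `ignore` are illustrative and would need to match the actual model's layer names.

```yaml
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "NVFP4"
      ignore:
        - "lm_head"
        - "re:.*vision_tower.*"
        - "re:.*multi_modal_projector.*"
```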