MedGemmaImpact/medgemma-1.5-4b-it-nvfp4

An NVFP4-quantized checkpoint of google/medgemma-1.5-4b-it, produced with llmcompressor.
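For intuition about what NVFP4 does to the weights, here is a small round-trip sketch of one quantization block. It assumes the usual NVFP4 layout (FP4 E2M1 values with one shared scale per 16-element block); the helper names and the plain-float scale are illustrative only — real NVFP4 stores the block scale itself in FP8 E4M3 with a second-level FP32 scale.

```python
# Non-negative values representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one 16-element block to FP4 E2M1 with a shared scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto +-6
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        out.append(q if x >= 0 else -q)
    return out, scale

def dequantize_block(qblock, scale):
    """Reconstruct approximate values from FP4 codes and the block scale."""
    return [q * scale for q in qblock]

block = [0.03, -0.75, 1.2, 6.0, -3.1, 0.0, 0.4, -0.02,
         2.2, -1.6, 0.9, 5.5, -4.4, 0.11, -0.3, 3.9]
q, s = quantize_block(block)
recon = dequantize_block(q, s)
```

Every stored value lands on the 16-point E2M1 grid, so most of the model's memory footprint comes down to 4 bits per weight plus one scale per block.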

Usage with vLLM

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from transformers import AutoProcessor

model_name = "MedGemmaImpact/medgemma-1.5-4b-it-nvfp4"

# Load image and processor
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Build multimodal prompt
chat = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is in this image?"}]},
]
prompt = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)

# Initialize model
llm = LLM(model=model_name, trust_remote_code=True)

# Run inference
inputs = {"prompt": prompt, "multi_modal_data": {"image": [image]}}
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))

print("RESPONSE:", outputs[0].outputs[0].text)

Calibration Details

  • Dataset: flickr30k
  • Num samples: 256
  • Max sequence length: 1024

Notes

  • Vision tower and multi-modal projector are preserved at full precision
  • Embeddings and layer norms are preserved in original precision
  • Compatible with vLLM's compressed-tensors format
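A quantization recipe along these lines would express the scheme described above. This is a sketch under assumptions, not the exact recipe used for this checkpoint; the module-name regex patterns in `ignore` are illustrative and would need to match the actual model's layer names.

```yaml
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "NVFP4"
      ignore:
        - "lm_head"
        - "re:.*vision_tower.*"
        - "re:.*multi_modal_projector.*"
```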