# MedGemmaImpact/medgemma-1.5-4b-it-nvfp4

A quantized checkpoint of google/medgemma-1.5-4b-it, compressed to NVFP4 with llmcompressor.
## Usage with vLLM
```python
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from transformers import AutoProcessor

model_name = "MedGemmaImpact/medgemma-1.5-4b-it-nvfp4"

# Load a sample image and the model's processor
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Build the multimodal prompt from the chat template
chat = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"},
    ]},
]
prompt = processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Initialize the model
llm = LLM(model=model_name, trust_remote_code=True)

# Run inference
inputs = {"prompt": prompt, "multi_modal_data": {"image": [image]}}
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))
print("RESPONSE:", outputs[0].outputs[0].text)
```
## Calibration Details

- Dataset: flickr30k
- Num samples: 256
- Max sequence length: 1024
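The exact calibration recipe for this checkpoint is not published in the card, but the settings above correspond to llmcompressor's standard one-shot quantization flow. The sketch below follows that pattern; the model-loading class and the `ignore` patterns are assumptions made to match the Notes section, not the recipe actually used.

```python
# Sketch only, assuming llmcompressor's oneshot API and NVFP4 scheme support.
from transformers import AutoModelForImageTextToText
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "google/medgemma-1.5-4b-it"
model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, torch_dtype="auto")

# NVFP4 on the language-model Linear layers; vision tower, projector and
# lm_head are left at original precision (per the Notes section).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

oneshot(
    model=model,
    dataset="flickr30k",          # calibration dataset from the card
    recipe=recipe,
    max_seq_length=1024,          # per Calibration Details
    num_calibration_samples=256,  # per Calibration Details
)

# Save in vLLM-compatible compressed-tensors format
model.save_pretrained("medgemma-1.5-4b-it-nvfp4", save_compressed=True)
```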
## Notes
- Vision tower and multi-modal projector are preserved at full precision
- Embeddings and layer norms are preserved in original precision
- Compatible with vLLM's compressed-tensors format
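NVFP4 represents weights as 4-bit floats (E2M1, representable magnitudes 0 to 6) with a shared scale per small block of elements. The plain-Python sketch below illustrates the round-trip arithmetic for one block; it is simplified for clarity (real NVFP4 also stores the block scales in FP8 plus a second-level per-tensor scale) and the example values are made up.

```python
# Illustrative sketch of NVFP4-style block quantization (simplified:
# float block scale instead of FP8, no second-level per-tensor scale).
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes
FP4_GRID = sorted({v for m in FP4_VALUES for v in (m, -m)})
BLOCK = 16  # elements sharing one scale

def quantize_block(xs):
    """Quantize one block: choose a scale mapping the largest |x| to 6.0,
    then round each value to the nearest representable FP4 number."""
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / 6.0
    codes = [min(FP4_GRID, key=lambda v: abs(x / scale - v)) for x in xs]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate values: each 4-bit code times the block scale."""
    return [scale * c for c in codes]

# One 16-element block of example weights
weights = [0.31, -1.2, 0.05, 2.4, -0.7, 0.0, 1.1, -2.9,
           0.6, -0.45, 1.8, -0.9, 0.12, 2.1, -1.5, 0.8]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f}  max abs error={max_err:.4f}")
```

The worst-case rounding error is half the widest gap in the FP4 grid (between 4 and 6) times the block scale, which is why per-block scales on small blocks keep the error tolerable.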
## Base model

google/medgemma-1.5-4b-it