# Qwen2.5-VL-7B — Cell Captioning

A fine-tune of Qwen2.5-VL-7B-Instruct that generates biomedical descriptions of cells in fluorescence microscopy images. Part of the `biomech-inference-serving` pipeline (an internal research project).

## Training

- **Base model:** `Qwen/Qwen2.5-VL-7B-Instruct`
- **Training data:** `DnaRnaProteins/cell_seg_labeled`
- **Fine-tuning:** QLoRA (4-bit, via PEFT) with the TRL `SFTTrainer`
- **Evaluation:** ROUGE-L on the validation split before pushing
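ROUGE-L scores the longest common subsequence (LCS) of tokens shared between a generated caption and its reference. The evaluation presumably used a standard implementation (e.g. the `rouge_score` or `evaluate` packages); the pure-Python sketch below is only to illustrate what the metric computes:

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: F-measure over the longest common subsequence of tokens."""
    cand, ref = candidate.split(), reference.split()
    m, n = len(cand), len(ref)
    if m == 0 or n == 0:
        return 0.0
    # Classic O(m*n) LCS dynamic program.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if cand[i] == ref[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision = lcs / m   # fraction of candidate tokens in the LCS
    recall = lcs / n      # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("a b c d", "a c")` has LCS length 2, so precision is 0.5, recall is 1.0, and the F1 is 2/3.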

## Usage

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "DnaRnaProteins/qwen2.5-vl-7b-cells-cap"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("cell_image.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": (
            "You are a biomedical imaging expert. Describe what you observe in this "
            "microscopy image of cells. Include cell morphology, density, any visible "
            "structures, and any notable features relevant to biomechanics analysis."
        )},
    ],
}]

# Build the chat prompt and extract the image inputs from the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
caption = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

### Via Modal endpoint

```python
import base64, modal

caption_fn = modal.Function.from_name("biomech-inference-serving", "caption")
with open("cell_image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
result = caption_fn.remote(b64)
# {"caption": "The image shows densely packed epithelial cells..."}
```

## Limitations

- Descriptions are intended as a research aid, not clinical guidance.
- Trained on fluorescence cell images; other imaging modalities are out-of-distribution.