# Qwen2.5-VL-7B — Cell Captioning

A fine-tune of Qwen2.5-VL-7B-Instruct that generates biomedical descriptions of cells in fluorescence microscopy images. Part of the `biomech-inference-serving` pipeline (an internal research project).

## Training

- **Base model:** `Qwen/Qwen2.5-VL-7B-Instruct`
- **Training data:** `DnaRnaProteins/cell_seg_labeled`
- **Fine-tuning:** QLoRA (4-bit, via PEFT) with the TRL `SFTTrainer`
- **Evaluation:** ROUGE-L on the validation split before pushing
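ROUGE-L scores the longest common subsequence (LCS) of tokens shared between a generated caption and its reference. The evaluation presumably used a standard implementation (e.g. the `rouge_score` or `evaluate` packages); the pure-Python sketch below is only to illustrate what the metric computes:

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: F-measure over the longest common subsequence of tokens."""
    cand, ref = candidate.split(), reference.split()
    m, n = len(cand), len(ref)
    if m == 0 or n == 0:
        return 0.0
    # Classic O(m*n) LCS dynamic program.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if cand[i] == ref[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision = lcs / m   # fraction of candidate tokens in the LCS
    recall = lcs / n      # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("a b c d", "a c")` has LCS length 2, so precision is 0.5, recall is 1.0, and the F1 is 2/3.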

## Usage

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "DnaRnaProteins/qwen2.5-vl-7b-cells-cap"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("cell_image.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": (
            "You are a biomedical imaging expert. Describe what you observe in this "
            "microscopy image of cells. Include cell morphology, density, any visible "
            "structures, and any notable features relevant to biomechanics analysis."
        )},
    ],
}]

# Build the chat prompt and extract the image inputs from the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
caption = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

### Via Modal endpoint

```python
import base64, modal

caption_fn = modal.Function.from_name("biomech-inference-serving", "caption")
with open("cell_image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
result = caption_fn.remote(b64)
# {"caption": "The image shows densely packed epithelial cells..."}
```

## Limitations

- Descriptions are intended as a research aid, not clinical guidance.
- Trained on fluorescence cell images; other imaging modalities are out-of-distribution.