Qwen2.5-VL-3B — Cell Detection

A fine-tune of Qwen2.5-VL-3B-Instruct that detects cells in fluorescence microscopy images and returns their bounding boxes. It is part of the biomech-inference-serving pipeline, an internal research project.

Training

Base model: Qwen/Qwen2.5-VL-3B-Instruct
Training data: DnaRnaProteins/cell_seg_labeled
Fine-tuning: QLoRA (4-bit, PEFT) via TRL SFTTrainer
Output format: JSON array of {"label": "cell", "bbox": [x1, y1, x2, y2]}
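
Since the output is a plain JSON array, it can help to sanity-check each entry before using it downstream. The following is a minimal sketch, not part of the released pipeline: the helper name `clean_detections` is hypothetical, and it assumes the boxes are pixel coordinates with the origin at the top-left corner and x1 < x2, y1 < y2, which this card does not state explicitly.

```python
from typing import Any


def clean_detections(items: Any, width: int, height: int) -> list[dict]:
    """Keep well-formed boxes and clamp them to the image bounds.

    Hypothetical helper. Assumes pixel coordinates with a top-left origin.
    """
    cleaned: list[dict] = []
    if not isinstance(items, list):
        return cleaned
    for item in items:
        if not isinstance(item, dict):
            continue
        bbox = item.get("bbox")
        # A box must be exactly four numbers (bool is excluded on purpose).
        if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4):
            continue
        if not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox
        ):
            continue
        x1, y1, x2, y2 = bbox
        # Clamp each coordinate into [0, width] or [0, height].
        x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
        y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
        # Drop boxes that are empty or inverted after clamping.
        if x2 <= x1 or y2 <= y1:
            continue
        cleaned.append(
            {"label": str(item.get("label", "cell")), "bbox": [x1, y1, x2, y2]}
        )
    return cleaned
```

A call such as `clean_detections(detections, image.width, image.height)` would then return only the entries that match the documented schema.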

Usage

import json, torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "DnaRnaProteins/qwen2.5-vl-3b-cells-det"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("cell_image.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": (
            "Detect all cells in this microscopy image. "
            'Return bounding boxes in JSON format: [{"label": "cell", "bbox": [x1, y1, x2, y2]}, ...]'
        )},
    ],
}]

# Build the chat prompt and collect the image inputs referenced in `messages`.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
raw = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
try:
    detections = json.loads(raw)
except json.JSONDecodeError:
    detections = []  # malformed output, see Limitations
# [{"label": "cell", "bbox": [x1, y1, x2, y2]}, ...]

Via Modal endpoint

import base64, modal

detect = modal.Function.from_name("biomech-inference-serving", "detect")
with open("cell_image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
result = detect.remote(b64)
# {"raw_output": "...", "detections": [{"label": "cell", "bbox": [...]}]}

Limitations

  • On rare malformed outputs, JSON parsing falls back to an empty list, so an empty result does not necessarily mean that no cells are present.
  • Trained on a single fluorescence dataset; generalisation to other stains is untested.
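
The empty-list fallback mentioned in the first limitation could be implemented along the following lines. This is a sketch under assumptions, not the pipeline's actual parser: `parse_detections` is a hypothetical name, and the second attempt (extracting the outermost bracketed span) is an added convenience for outputs wrapped in extra text or a Markdown code fence.

```python
import json


def parse_detections(raw: str) -> list:
    """Best-effort parse of the model output; returns [] if nothing usable is found.

    Hypothetical helper illustrating an empty-list fallback.
    """
    text = raw.strip()
    # First try the whole string, then the outermost [...] span, which also
    # covers output wrapped in prose or a Markdown code fence.
    candidates = [text]
    start, end = text.find("["), text.rfind("]")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Only a JSON array matches the documented output format.
        if isinstance(parsed, list):
            return parsed
    return []
```

Output that is cut off by `max_new_tokens` still yields an empty list here rather than a partial result, so it may be worth logging `raw` whenever the list is empty.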