Qwen2.5-VL-3B — Cell Detection

Fine-tuned Qwen2.5-VL-3B-Instruct for bounding-box detection of cells in fluorescence microscopy images. Part of the biomech-inference-serving pipeline (internal research project).

Training


Base model	`Qwen/Qwen2.5-VL-3B-Instruct`
Training data	`DnaRnaProteins/cell_seg_labeled`
Fine-tuning	QLoRA (4-bit, PEFT) via TRL `SFTTrainer`
Output format	JSON array of `{"label": "cell", "bbox": [x1, y1, x2, y2]}`
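
As a rough illustration of consuming this output format, the sketch below parses the JSON array and keeps only well-formed entries. `Detection` and `parse_detections` are hypothetical names, not part of this repository, and the check that `x2 > x1` and `y2 > y1` assumes the box is given as top-left and bottom-right corners, which the format above implies but does not state.

```python
import json
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    bbox: tuple  # (x1, y1, x2, y2)


def parse_detections(raw: str) -> list:
    """Parse the JSON array and drop entries that do not match the format."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of detections")
    result = []
    for item in items:
        if not isinstance(item, dict):
            continue
        bbox = item.get("bbox")
        # A box must be exactly four numbers (bool is excluded on purpose).
        if not (isinstance(bbox, list) and len(bbox) == 4):
            continue
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
            continue
        x1, y1, x2, y2 = bbox
        # Assumption: corners are ordered top-left, bottom-right.
        if x2 <= x1 or y2 <= y1:
            continue
        result.append(Detection(str(item.get("label", "cell")), (x1, y1, x2, y2)))
    return result
```

Entries that fail a check are skipped rather than raising, so one bad box does not discard the rest of the image's detections.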

Usage

import json, torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "DnaRnaProteins/qwen2.5-vl-3b-cells-det"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("cell_image.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": (
            "Detect all cells in this microscopy image. "
            'Return bounding boxes in JSON format: [{"label": "cell", "bbox": [x1, y1, x2, y2]}, ...]'
        )},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512)

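# Decode only the newly generated tokens, skipping the prompt.
raw = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
try:
    detections = json.loads(raw)
except json.JSONDecodeError:
    detections = []  # malformed output, see Limitations
# [{"label": "cell", "bbox": [x1, y1, x2, y2]}, ...]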
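
The snippet above does not say which coordinate frame the boxes use. The base Qwen2.5-VL models report absolute pixel coordinates in the resized image the processor feeds to the model, and this card does not confirm whether the fine-tune kept that convention. If it did, boxes would need rescaling before being drawn on the original image. A minimal sketch, with `rescale_boxes` as a hypothetical helper:

```python
def rescale_boxes(detections, model_size, original_size):
    """Map boxes from the model's input frame to the original image frame.

    Both sizes are (width, height) tuples.
    """
    model_w, model_h = model_size
    orig_w, orig_h = original_size
    sx, sy = orig_w / model_w, orig_h / model_h
    scaled = []
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        scaled.append({**det, "bbox": [x1 * sx, y1 * sy, x2 * sx, y2 * sy]})
    return scaled


# Assumption: with the base processor, the model frame can be derived from
# the patch grid (patch size 14), for example:
#   grid = inputs["image_grid_thw"][0]
#   model_size = (int(grid[2]) * 14, int(grid[1]) * 14)
#   boxes = rescale_boxes(detections, model_size, image.size)
```

Overlaying a few boxes on a known image is a quick way to check whether this rescaling is needed at all.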

Via Modal endpoint

import base64, modal

detect = modal.Function.from_name("biomech-inference-serving", "detect")
with open("cell_image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
result = detect.remote(b64)
# {"raw_output": "...", "detections": [{"label": "cell", "bbox": [...]}]}
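
Because the response carries both `raw_output` and `detections`, a caller can tell an image with no cells apart from a response that probably failed to parse. The sketch below assumes only the two keys shown in the comment above. `check_result` is a hypothetical helper, and treating "empty detections but non-empty raw output" as suspicious is a heuristic, not documented endpoint behaviour.

```python
def check_result(result: dict):
    """Return (detections, suspicious) for one endpoint response."""
    detections = result.get("detections") or []
    raw = (result.get("raw_output") or "").strip()
    # No detections, yet the model wrote something other than an empty array:
    # likely a parse failure rather than an image without cells.
    suspicious = not detections and raw not in ("", "[]")
    return detections, suspicious
```

A caller might log or re-run images flagged as suspicious instead of recording a cell count of zero.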

Limitations

The model occasionally emits malformed JSON. Parsing then falls back to an empty list, so an empty result does not always mean that no cells are present.
Trained on a single fluorescence dataset; generalisation to other stains is untested.
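
One plausible cause of malformed output (an assumption, not something measured here) is generation stopping at `max_new_tokens` partway through the array on images with many cells. Rather than discarding everything, the complete objects can be recovered with the standard library's `json.JSONDecoder.raw_decode`. `salvage_detections` is a hypothetical helper:

```python
import json


def salvage_detections(raw: str) -> list:
    """Collect every complete {"label": ..., "bbox": ...} object found in raw."""
    decoder = json.JSONDecoder()
    found = []
    pos = raw.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            # Incomplete or malformed object: skip this brace and keep scanning.
            pos = raw.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and "bbox" in obj:
            found.append(obj)
        pos = raw.find("{", end)
    return found
```

This also tolerates text around the array, such as a Markdown code fence, because it scans for objects instead of parsing the whole string.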
