OpScribe Captioner Combined v1

LoRA adapter for Qwen2.5-VL-72B-Instruct — surgical frame captioning for the OpScribe pipeline.

Training

  • Base model: Qwen/Qwen2.5-VL-72B-Instruct
  • Dataset: EgoSurgery (9,618 frames) + surgeon-narrated voiceover dataset (3,402 frames, 3x weighted)
  • Total training examples: 13,020 train / 1,893 val
  • Framework: Custom PyTorch training loop, LoRA rank=16, alpha=32, lr=2e-4
  • Epochs: 3 (24h on 4x NVIDIA H200 140GB)
  • Best val_loss: 0.1046
  • Hardware: CHTC bhaskargpu4000
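The 3x weighting of the voiceover frames can be reproduced with a per-example weighted sampler. A minimal sketch using the counts from the table above — the sampler construction is an assumption, since the actual training loop is custom:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Dataset sizes from the training summary above.
N_EGOSURGERY = 9_618  # weight 1x
N_VOICEOVER = 3_402   # weight 3x

# One weight per training example; voiceover frames are drawn
# three times as often as EgoSurgery frames.
weights = torch.cat([
    torch.ones(N_EGOSURGERY),
    torch.full((N_VOICEOVER,), 3.0),
])

sampler = WeightedRandomSampler(
    weights=weights,
    num_samples=N_EGOSURGERY + N_VOICEOVER,  # one "epoch" of 13,020 draws
    replacement=True,
)
```

Passing this sampler to a DataLoader upweights the voiceover split without physically duplicating frames on disk.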

Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Qwen2.5-VL uses its own model class (Qwen2_5_VLForConditionalGeneration),
# not the older Qwen2VLForConditionalGeneration.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "rbryant19/opscribe-captioner-merged-v1")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")
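A full captioning call would follow the standard Qwen2.5-VL chat recipe on top of the loading code above. In this sketch the frame path and the prompt text are illustrative placeholders, not part of the released pipeline:

```python
def build_caption_messages(frame_path: str) -> list:
    """Chat-format request for one surgical frame (prompt text is illustrative)."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": frame_path},
            {"type": "text", "text": "Describe the surgical action in this frame."},
        ],
    }]

# Generation (requires `model` and `processor` loaded as above):
# from PIL import Image
# messages = build_caption_messages("frame_0001.jpg")
# text = processor.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
# inputs = processor(
#     text=[text], images=[Image.open("frame_0001.jpg")], return_tensors="pt"
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# caption = processor.batch_decode(
#     out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
# )[0]
```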

Framework versions

  • PEFT 0.18.1