OpScribe Captioner Combined v1

LoRA adapter for Qwen2.5-VL-72B-Instruct — surgical frame captioning for the OpScribe pipeline.

Training

  • Base model: Qwen/Qwen2.5-VL-72B-Instruct
  • Dataset: EgoSurgery (9,618 frames) + surgeon-narrated voiceover dataset (3,402 frames, 3x weighted)
  • Total training examples: 13,020 train / 1,893 val
  • Framework: Custom PyTorch training loop, LoRA rank=16, alpha=32, lr=2e-4
  • Epochs: 3 (24h on 4x NVIDIA H200 140GB)
  • Best val_loss: 0.1046
  • Hardware: CHTC bhaskargpu4000
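The 3x weighting of the voiceover frames can be reproduced with a per-example weighted sampler. A minimal sketch using the counts from the table above — the sampler construction is an assumption, since the actual training loop is custom:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Dataset sizes from the training summary above.
N_EGOSURGERY = 9_618  # weight 1x
N_VOICEOVER = 3_402   # weight 3x

# One weight per training example; voiceover frames are drawn
# three times as often as EgoSurgery frames.
weights = torch.cat([
    torch.ones(N_EGOSURGERY),
    torch.full((N_VOICEOVER,), 3.0),
])

sampler = WeightedRandomSampler(
    weights=weights,
    num_samples=N_EGOSURGERY + N_VOICEOVER,  # one "epoch" of 13,020 draws
    replacement=True,
)
```

Passing this sampler to a DataLoader upweights the voiceover split without physically duplicating frames on disk.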

Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Qwen2.5-VL uses its own model class (Qwen2_5_VLForConditionalGeneration),
# not the older Qwen2VLForConditionalGeneration.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "rbryant19/opscribe-captioner-merged-v1")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")
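A full captioning call would follow the standard Qwen2.5-VL chat recipe on top of the loading code above. In this sketch the frame path and the prompt text are illustrative placeholders, not part of the released pipeline:

```python
def build_caption_messages(frame_path: str) -> list:
    """Chat-format request for one surgical frame (prompt text is illustrative)."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": frame_path},
            {"type": "text", "text": "Describe the surgical action in this frame."},
        ],
    }]

# Generation (requires `model` and `processor` loaded as above):
# from PIL import Image
# messages = build_caption_messages("frame_0001.jpg")
# text = processor.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
# inputs = processor(
#     text=[text], images=[Image.open("frame_0001.jpg")], return_tensors="pt"
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# caption = processor.batch_decode(
#     out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
# )[0]
```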

Framework versions

  • PEFT 0.18.1