# OpScribe Captioner Combined v1
LoRA adapter for Qwen2.5-VL-72B-Instruct — surgical frame captioning for the OpScribe pipeline.
## Training
- Base model: Qwen/Qwen2.5-VL-72B-Instruct
- Dataset: EgoSurgery (9,618 frames) + surgeon-narrated voiceover dataset (3,402 frames, 3x weighted)
- Total training examples: 13,020 train / 1,893 val
- Framework: Custom PyTorch training loop, LoRA rank=16, alpha=32, lr=2e-4
- Epochs: 3 (24h on 4x NVIDIA H200 140GB)
- Best val_loss: 0.1046
- Hardware: CHTC bhaskargpu4000
## Usage
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Qwen2.5-VL uses Qwen2_5_VLForConditionalGeneration
# (Qwen2VLForConditionalGeneration is the Qwen2-VL class).
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "rbryant19/opscribe-captioner-merged-v1")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")
```
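Once the model and processor are loaded, a single-frame captioning call follows the standard Qwen2.5-VL chat-message format. The sketch below is illustrative: the prompt text and image path are placeholders, not values from the OpScribe pipeline.

```python
def build_messages(image_path: str, instruction: str) -> list:
    """Build a Qwen2.5-VL chat message list for captioning one frame."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]

# Placeholder frame path and prompt, for illustration only.
messages = build_messages(
    "frame_0001.png", "Describe the surgical action in this frame."
)

# With `model` and `processor` loaded as above:
# text = processor.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
# inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# caption = processor.batch_decode(out, skip_special_tokens=True)[0]
```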
## Framework versions
- PEFT 0.18.1