MedBridgeRL
This repository contains the weights for MedBridgeRL, a medical Vision-Language Model (VLM) post-trained with Reinforcement Learning (RL).
Project Page | GitHub | Paper
Description
MedBridgeRL is introduced in the paper "When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains". The model is based on a Qwen2.5-VL architecture, initialized from OctoMed, and post-trained using a boundary-aware RL recipe on a balanced subset of PMC multiple-choice VQA.
The research disentangles the effects of vision, supervised fine-tuning (SFT), and RL. The findings suggest that RL is most effective when the model already has non-trivial support (high Pass@K) induced by SFT; in these cases, RL primarily sharpens the output distribution, improving Accuracy@1 and sampling efficiency.
Evaluation
The model was evaluated using the MedBridgeRL-Eval kit across six medical VQA benchmarks. For detailed evaluation scripts and instructions on how to measure the reasoning boundary using Pass@K, please refer to the official GitHub repository.
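The Pass@K measure used to probe the reasoning boundary can be computed with the standard unbiased estimator (Chen et al., 2021): given n sampled answers per question of which c are correct, it gives the probability that at least one of k draws without replacement is correct. A minimal sketch — the function name and interface here are illustrative, not taken from the MedBridgeRL-Eval kit:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator.

    n: total samples drawn per question
    c: number of correct samples among the n
    k: budget of attempts being evaluated

    Returns the probability that at least one of k samples
    (drawn without replacement from the n) is correct:
        1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Accuracy@1 is the special case k = 1, i.e. simply c / n.
```

Averaging `pass_at_k` over all questions in a benchmark yields the benchmark-level Pass@K; comparing it before and after RL is what separates "sharpening the distribution" (Accuracy@1 rises, Pass@K roughly flat) from genuinely expanding the reasoning boundary (Pass@K rises).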
Citation
If you find this work useful, please cite:
@misc{jeddi2026doesrlhelpmedical,
      title={When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains},
      author={Ahmadreza Jeddi and Kimia Shaban and Negin Baghbanzadeh and Natasha Sharan and Abhishek Moturu and Elham Dolatabadi and Babak Taati},
      year={2026},
      eprint={2603.01301},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.01301},
}