MedBridgeRL

This repository contains the weights for MedBridgeRL, a medical Vision-Language Model (VLM) post-trained with Reinforcement Learning (RL).

Project Page | GitHub | Paper

Description

MedBridgeRL is introduced in the paper "When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains". The model is based on a Qwen2.5-VL architecture, initialized from OctoMed, and post-trained using a boundary-aware RL recipe on a balanced subset of PMC multiple-choice VQA.

The paper disentangles the contributions of vision, supervised fine-tuning (SFT), and RL. The findings suggest that RL helps most when SFT has already given the model non-trivial support (high Pass@K); in that regime, RL primarily sharpens the output distribution, improving Accuracy@1 and sampling efficiency.

Evaluation

The model was evaluated with the MedBridgeRL-Eval kit across six medical VQA benchmarks. For the evaluation scripts and instructions on measuring the reasoning boundary via Pass@K, see the official GitHub repository.
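The Pass@K measurement referenced above can be sketched with the standard unbiased estimator of Chen et al. (2021); this is a minimal illustration, not necessarily the exact implementation in the evaluation kit.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: probability that at least one of k
    samples is correct, given c correct out of n drawn samples."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(per_item: list[tuple[int, int]], k: int) -> float:
    """Average Pass@K over a benchmark; each item is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in per_item) / len(per_item)
```

Note that Pass@1 reduces to c/n, i.e. accuracy under sampling; a large gap between Pass@K and Pass@1 indicates support that RL can sharpen without expanding the reasoning boundary.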

Citation

If you find this work useful, please cite:

@misc{jeddi2026doesrlhelpmedical,
      title={When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains}, 
      author={Ahmadreza Jeddi and Kimia Shaban and Negin Baghbanzadeh and Natasha Sharan and Abhishek Moturu and Elham Dolatabadi and Babak Taati},
      year={2026},
      eprint={2603.01301},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.01301}, 
}
Model details: 8B parameters, BF16, Safetensors format.