---
license: apache-2.0
tags:
- medical-imaging
- vision-language-model
- vlm
- lora
- graph-neural-networks
- zero-shot
metrics:
- accuracy
---

# ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs
[arXiv](https://arxiv.org/pdf/2603.17079)
**ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.

## Model Description

Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **0.95M** trainable parameters to frozen image-text encoders.

### Key Features

* **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
* **Label-Guided InfoNCE Loss:** A specialized loss formulation that suppresses false negatives between semantically related image-text pairs, improving cross-modal alignment.
* **Efficiency:** Achieves state-of-the-art performance across multiple domains while keeping the backbone frozen.

### Environment Setup

The framework was developed with `Python 3.10.18`, `PyTorch 2.1.0`, and `CUDA 11.8`.

```bash
conda create -n ace_lora python=3.10.18
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```

### Inference

We provide an inference code sample (`hf_model_inference.py`) for the RSNA dataset.

## Datasets

**MIMIC-CXR:** For pretraining, we use the MIMIC-CXR dataset, excluding lateral images. The dataset is available at the following link (note that you must satisfy the dataset provider's requirements to download the data): [[`link`](https://physionet.org/content/mimic-cxr-jpg/2.1.0/)]

**NIH Chest X-ray:** For validation, we use the NIH Chest X-ray dataset. The dataset can be accessed at the following link: [[`link`](https://nihcc.app.box.com/v/ChestXray-NIHCC)].
After downloading, run `dataset_prep/chestx-ray_14_prep.py` from our GitHub repo to split the data and prepare it in the required format.

**CheXpert 5×200:** For zero-shot classification, we use the CheXpert 5×200 dataset. The dataset can be accessed at the following link: [[`link`](https://stanfordmedicine.app.box.com/s/j5h7q99f3pfi7enc0dom73m4nsm6yzvh)].

**RSNA:** We use the RSNA dataset for both zero-shot classification and object detection. The dataset can be accessed at the following link: [[`link`](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data)]. After downloading, run `dataset_prep/rsna_dataset_create.py` from our GitHub repo to split the data and prepare it in the required format for both tasks.

**SIIM:** We use the SIIM dataset for both zero-shot classification and semantic segmentation. The dataset can be accessed at the following link: [[`link`](https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data)]. After downloading, run `dataset_prep/SIIM_generate_class_labels.py` from our GitHub repo to prepare the data for zero-shot classification, and `dataset_prep/SIIM_generate_mask.py` for semantic segmentation.

- Code: https://github.com/icon-lab/ACE-LoRA
- Paper: https://arxiv.org/pdf/2603.17079

## 🤝 Acknowledgments

This implementation builds upon [CLIP-LoRA](https://github.com/MaxZanella/CLIP-LoRA) and [LoRA](https://github.com/microsoft/LoRA). We gratefully acknowledge their valuable contributions.
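To make the label-guided InfoNCE idea concrete: a loss of this family treats image-text pairs that share a class label as additional positives, so semantically related pairs are not penalized as false negatives. The sketch below is a minimal NumPy illustration of that general recipe under our own assumptions (function name, single image-to-text direction, averaging over positives); it is not the paper's exact formulation.

```python
import numpy as np

def label_guided_infonce(img_emb, txt_emb, labels, tau=0.07):
    """Minimal sketch of a label-guided InfoNCE loss (hypothetical).

    img_emb, txt_emb: (N, d) paired image/text embeddings.
    labels: (N,) class labels; same-label pairs count as positives,
    so related pairs are not pushed apart as false negatives.
    """
    # L2-normalize both modalities so logits are scaled cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau

    # Positive mask: the matched pair plus every same-label pair.
    pos = labels[:, None] == labels[None, :]

    # Log-softmax over text candidates for each image (image-to-text side).
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-likelihood over all positives of each image.
    loss_per_image = -(log_prob * pos).sum(axis=1) / pos.sum(axis=1)
    return loss_per_image.mean()
```

With plain InfoNCE, the off-diagonal same-label terms in `logits` would be treated purely as negatives; here they contribute to the numerator instead, which is the false-negative suppression the feature list refers to.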
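For the zero-shot classification tasks above, inference follows the standard CLIP-style recipe: encode one text prompt per class with the (frozen, LoRA-adapted) text encoder, then pick the class whose prompt embedding is most cosine-similar to the image embedding. The sketch below illustrates only this matching step with a hypothetical function name; the repository's `hf_model_inference.py` is the reference implementation.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    """Hypothetical CLIP-style zero-shot matching step.

    image_emb: (d,) embedding from the image encoder.
    class_text_embs: (C, d) embeddings of one prompt per class,
    e.g. "chest X-ray showing pneumonia".
    Returns the index of the best-matching class.
    """
    # Normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))
```

In practice the prompt embeddings are computed once per class and reused across the whole test set, so only the image encoder runs per sample.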