---
license: apache-2.0
library_name: peft
pipeline_tag: image-to-text
base_model:
  - lmsys/vicuna-7b-v1.5
  - liuhaotian/llava-v1.5-7b
tags:
  - medical-imaging
  - chest-xray
  - dermoscopy
  - vision-language
  - fairness
  - lora
  - peft
  - mimic-cxr
  - padchest
  - ham10000
datasets:
  - physionet/mimic-cxr-jpg
---

# FairLLaVA — Pretrained Checkpoints

Fairness-aware LoRA adapters for medical vision–language models, from the
[FairLLaVA paper](https://arxiv.org/abs/2603.26008).

- **Code**: [github.com/bhosalems/FairLLaVA](https://github.com/bhosalems/FairLLaVA)
- **Paper**: [arxiv.org/abs/2603.26008](https://arxiv.org/abs/2603.26008)

FairLLaVA minimizes the mutual information between the model's visual
features and patient demographic attributes (age, sex, race),
producing demographic-invariant representations while preserving clinical
accuracy. The adapters here plug into a standard LoRA fine-tuning loop and
are released for three medical benchmarks: MIMIC-CXR, PadChest, and HAM10000.

## Checkpoints

| Subdir | Dataset | Base LLM | Vision Tower | Task |
|---|---|---|---|---|
| [`mimic-cxr/`](https://huggingface.co/mbhosale/FairLLaVA/tree/main/mimic-cxr) | MIMIC-CXR | `lmsys/vicuna-7b-v1.5`     | BiomedCLIP-CXR-518     | Chest X-ray report generation |
| [`padchest/`](./padchest)   | PadChest  | `lmsys/vicuna-7b-v1.5`     | BiomedCLIP-CXR-518     | Chest X-ray report generation |
| [`ham10000/`](./ham10000)   | HAM10000  | `liuhaotian/llava-v1.5-7b` | CLIP ViT-L/14-336      | Dermoscopy VQA |

Each subdirectory contains the LoRA adapter (`adapter_model.safetensors`,
`adapter_config.json`, `non_lora_trainables.bin`), the matching multimodal
projector (`mm_projector.bin`), and the tokenizer files, so the path can be
loaded as `model_path` directly by
`llava.model.builder.load_pretrained_model`.
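
Before loading, it can help to confirm that a downloaded subdirectory is complete. The following is a minimal sketch, not part of the FairLLaVA code: `EXPECTED_FILES` and `missing_checkpoint_files` are hypothetical names, and only the four files named above are checked because the tokenizer file names are not listed in this README.

```python
from pathlib import Path

# File names taken from the description above. Tokenizer file names are not
# enumerated in this README, so they are deliberately not checked here.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "non_lora_trainables.bin",
    "mm_projector.bin",
)


def missing_checkpoint_files(model_path):
    """Return the expected file names that are absent from model_path."""
    root = Path(model_path)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present; anything else names the files that still need to be downloaded.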

## Quick start

```python
from huggingface_hub import snapshot_download
from llava.model.builder import load_pretrained_model

# Download just one dataset's checkpoint
local_dir = snapshot_download(
    repo_id="mbhosale/FairLLaVA",
    allow_patterns="mimic-cxr/*",
)
model_path = f"{local_dir}/mimic-cxr"

tokenizer, model, image_processor, ctx_len = load_pretrained_model(
    model_path,
    model_base="lmsys/vicuna-7b-v1.5",
    model_name="llavarad",
)
```

See the full inference example in [`inference.py`](https://github.com/bhosalems/FairLLaVA/blob/main/inference.py).
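
To fetch one of the other checkpoints, only the subdirectory and the base model change. The sketch below copies those values from the Checkpoints table; `CHECKPOINTS` and `download_args` are hypothetical helper names used for illustration, not part of the FairLLaVA code or of `huggingface_hub`.

```python
# Subdirectory -> base LLM, copied from the Checkpoints table.
CHECKPOINTS = {
    "mimic-cxr": "lmsys/vicuna-7b-v1.5",
    "padchest": "lmsys/vicuna-7b-v1.5",
    "ham10000": "liuhaotian/llava-v1.5-7b",
}


def download_args(subdir, repo_id="mbhosale/FairLLaVA"):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    if subdir not in CHECKPOINTS:
        raise KeyError(
            f"unknown checkpoint {subdir!r}; choose from {sorted(CHECKPOINTS)}"
        )
    # Restrict the download to a single dataset's subdirectory.
    return {"repo_id": repo_id, "allow_patterns": f"{subdir}/*"}
```

For example, `snapshot_download(**download_args("padchest"))` downloads only the PadChest files. Passing `CHECKPOINTS["padchest"]` as `model_base` is an assumption that follows the MIMIC-CXR example above. The `model_name` to use for the other checkpoints is not stated in this README, so check `inference.py` for it.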

## Ethics

These checkpoints are released **for research and educational use only**.
They are **not** approved or validated for clinical or diagnostic use and
must not be used to make medical decisions or to inform patient care. Each
downstream dataset is governed by its own data-use agreement (PhysioNet for
MIMIC-CXR, BIMCV for PadChest, ISIC / Harvard Dataverse for HAM10000).

## Citation

```bibtex
@article{bhosale2026fairllava,
  title={FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants},
  author={Bhosale, Mahesh and Wasi, Abdul and Srivastava, Shantam and Latif, Shifa and Luan, Tianyu and Gao, Mingchen and Doermann, David and Gong, Xuan},
  journal={arXiv preprint arXiv:2603.26008},
  year={2026}
}
```