	---
	license: apache-2.0
	library_name: peft
	pipeline_tag: image-to-text
	base_model:
	- lmsys/vicuna-7b-v1.5
	- liuhaotian/llava-v1.5-7b
	tags:
	- medical-imaging
	- chest-xray
	- dermoscopy
	- vision-language
	- fairness
	- lora
	- peft
	- mimic-cxr
	- padchest
	- ham10000
	datasets:
	- physionet/mimic-cxr-jpg
	---

	# FairLLaVA — Pretrained Checkpoints

	Fairness-aware LoRA adapters for medical vision–language models, from the
	[FairLLaVA paper](https://arxiv.org/abs/2603.26008).

	- Code: [github.com/bhosalems/FairLLaVA](https://github.com/bhosalems/FairLLaVA)
	- Paper: [arxiv.org/abs/2603.26008](https://arxiv.org/abs/2603.26008)

	FairLLaVA minimizes the mutual information between the model's visual
	features and patient demographic attributes (age, sex, race),
	producing demographic-invariant representations while preserving clinical
	accuracy. The method plugs into a standard LoRA fine-tuning loop, and
	adapters are released here for three medical benchmarks.
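As a rough sketch of what such an objective can look like (the notation is assumed here, not taken from the paper): write $Z_\theta$ for the visual features produced with LoRA parameters $\theta$, $A$ for a demographic attribute, $\mathcal{L}_{\text{task}}$ for the usual task loss, and $\lambda \ge 0$ for a trade-off weight. A generic mutual-information-regularized objective is then:

$$
\min_{\theta}\; \mathcal{L}_{\text{task}}(\theta) + \lambda\, I(Z_\theta; A),
\qquad
I(Z; A) = \mathbb{E}_{p(z,a)}\left[\log \frac{p(z,a)}{p(z)\,p(a)}\right].
$$

$I(Z; A) = 0$ exactly when $Z$ and $A$ are independent, which is the sense in which the features become demographic-invariant. How the paper estimates or bounds this term is not described in this README.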

	## Checkpoints

	| Subdir | Dataset | Base LLM | Vision Tower | Task |
	|---|---|---|---|---|
	| [`mimic-cxr/`](https://huggingface.co/mbhosale/FairLLaVA/tree/main/mimic-cxr) | MIMIC-CXR | `lmsys/vicuna-7b-v1.5` | BiomedCLIP-CXR-518 | Chest X-ray report generation |
	| [`padchest/`](./padchest) | PadChest | `lmsys/vicuna-7b-v1.5` | BiomedCLIP-CXR-518 | Chest X-ray report generation |
	| [`ham10000/`](./ham10000) | HAM10000 | `liuhaotian/llava-v1.5-7b` | CLIP ViT-L/14-336 | Dermoscopy VQA |

	Each subdirectory contains the LoRA adapter (`adapter_model.safetensors`,
	`adapter_config.json`, `non_lora_trainables.bin`), the matching multimodal
	projector (`mm_projector.bin`), and the tokenizer files, so the subdirectory
	path can be passed directly as `model_path` to
	`llava.model.builder.load_pretrained_model`.
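A small sanity check can catch an incomplete download before the model is loaded. The sketch below is illustrative and not part of the FairLLaVA code: `missing_checkpoint_files` is a hypothetical helper, and it checks only the four file names listed above, since the tokenizer file names are not enumerated in this README.

```python
from pathlib import Path

# File names taken from the list above. Tokenizer files are not enumerated
# in this README, so they are deliberately not checked here.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "non_lora_trainables.bin",
    "mm_projector.bin",
)


def missing_checkpoint_files(checkpoint_dir):
    """Hypothetical helper: return expected file names absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present in the directory.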

	## Quick start

	```python
	from huggingface_hub import snapshot_download
	from llava.model.builder import load_pretrained_model

	# Download just one dataset's checkpoint
	local_dir = snapshot_download(
	    repo_id="mbhosale/FairLLaVA",
	    allow_patterns="mimic-cxr/*",
	)
	model_path = f"{local_dir}/mimic-cxr"

	tokenizer, model, image_processor, ctx_len = load_pretrained_model(
	    model_path,
	    model_base="lmsys/vicuna-7b-v1.5",
	    model_name="llavarad",
	)
	```

	See the full inference example in [`inference.py`](https://github.com/bhosalems/FairLLaVA/blob/main/inference.py).
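To use one of the other checkpoints, the download pattern and base model change together. The following is a minimal sketch with a hypothetical helper, `checkpoint_args`, that looks both up from the Checkpoints table. Passing the "Base LLM" column as `model_base` for `padchest` and `ham10000` is an assumption extrapolated from the MIMIC-CXR quick start, and the right `model_name` for those checkpoints is not stated in this README, so check `inference.py` before relying on it.

```python
# Subdirectory -> base model, copied from the "Base LLM" column of the
# Checkpoints table. Using it as `model_base` for every row is an assumption.
BASE_MODELS = {
    "mimic-cxr": "lmsys/vicuna-7b-v1.5",
    "padchest": "lmsys/vicuna-7b-v1.5",
    "ham10000": "liuhaotian/llava-v1.5-7b",
}


def checkpoint_args(subdir):
    """Hypothetical helper: download pattern and base model for one subdirectory."""
    if subdir not in BASE_MODELS:
        raise ValueError(
            f"unknown checkpoint {subdir!r}; expected one of {sorted(BASE_MODELS)}"
        )
    return {
        "allow_patterns": f"{subdir}/*",  # value for snapshot_download
        "model_base": BASE_MODELS[subdir],  # value for load_pretrained_model
    }
```

For example, `checkpoint_args("padchest")["allow_patterns"]` gives the pattern to pass to `snapshot_download` in place of `"mimic-cxr/*"`.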

	## Ethics

	These checkpoints are released for research and educational use only.
	They are not approved or validated for clinical or diagnostic use and
	must not be used to make medical decisions or to inform patient care. Each
	downstream dataset is governed by its own data-use agreement (PhysioNet for
	MIMIC-CXR, BIMCV for PadChest, ISIC / Harvard Dataverse for HAM10000).

	## Citation

	```bibtex
	@article{bhosale2026fairllava,
	title={FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants},
	author={Bhosale, Mahesh and Wasi, Abdul and Srivastava, Shantam and Latif, Shifa and Luan, Tianyu and Gao, Mingchen and Doermann, David and Gong, Xuan},
	journal={arXiv preprint arXiv:2603.26008},
	year={2026}
	}
	```