# LAnA

**Layer-Wise Anatomical Attention model**

Best current model in this collection: `manu02/LAnA-Arxiv`

## Overview
LAnA is a medical report-generation project for chest X-ray images. It generates radiology reports with a vision-language model guided by layer-wise anatomical attention built from predicted anatomical masks.
The architecture combines a DINOv3 vision encoder, lung and heart segmentation heads, and a GPT-2 decoder modified so each transformer layer receives a different anatomical attention bias derived from the segmentation mask.
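The per-layer bias idea can be pictured with a short sketch. This is illustrative only, not the repository's actual code: `mask_to_bias`, the linear depth schedule, and the plus/minus bias shape are all assumptions made for exposition.

```python
import torch

def mask_to_bias(mask: torch.Tensor, num_layers: int, scale: float = 1.0) -> list:
    """Illustrative only: turn a binary anatomical mask over image patches
    into one additive attention bias per decoder layer.

    mask: (num_patches,) with 1.0 inside the predicted lung/heart region.
    Returns a list of (num_patches,) biases, one per transformer layer,
    with deeper layers pulled more strongly toward the masked anatomy.
    """
    biases = []
    for layer in range(num_layers):
        # Deeper layers get a larger positive bias on anatomical patches
        # and a larger negative bias elsewhere (hypothetical schedule).
        weight = scale * (layer + 1) / num_layers
        biases.append(weight * (2.0 * mask - 1.0))
    return biases

# Toy 4-patch image: patches 1 and 2 fall inside the predicted lung region.
mask = torch.tensor([0.0, 1.0, 1.0, 0.0])
biases = mask_to_bias(mask, num_layers=2)
# biases[0] = 0.5 * [-1, 1, 1, -1]; biases[1] = 1.0 * [-1, 1, 1, -1]
```

Each bias would then be added to that layer's attention logits before the softmax, so every decoder layer sees a different anatomical emphasis.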
## How to Run

New users should prefer the standard Hugging Face flow below. The legacy snapshot/manual implementation has been moved to the `snapshot-legacy` branch for backward compatibility.
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo_id = "manu02/LAnA-v4"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.move_non_quantized_modules(device)  # custom helper shipped with the repo
model.eval()

image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=150)

report = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(report)
```
Batched inference uses the same path:
```python
batch = processor(images=[image_a, image_b], return_tensors="pt")
batch = {name: tensor.to(device) for name, tensor in batch.items()}
generated = model.generate(**batch, max_new_tokens=150)
reports = processor.batch_decode(generated, skip_special_tokens=True)
```
`HF_TOKEN` is optional for this public standard-loading path. If you do not set one, the model still loads, but Hugging Face may warn about lower anonymous rate limits.
The default main branch is inference-minimal and avoids shipping duplicated component weights.
## Intended Use

- Input: a chest X-ray image resized to `512x512` and normalized with ImageNet mean/std.
- Output: a generated radiology report.
- Best fit: research use, report-generation experiments, and anatomical-attention ablations.
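If you preprocess images yourself rather than through the processor, the stated pipeline (resize to 512x512, ImageNet mean/std normalization) can be approximated as below. This is a sketch and may differ in detail (e.g. resampling filter) from the repo's actual processor.

```python
import numpy as np
import torch
from PIL import Image

# ImageNet statistics, as stated in the model card.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(image: Image.Image) -> torch.Tensor:
    image = image.convert("RGB").resize((512, 512), Image.BILINEAR)
    x = torch.from_numpy(np.array(image)).float() / 255.0  # (512, 512, 3)
    x = x.permute(2, 0, 1)                                 # -> (3, 512, 512)
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.unsqueeze(0)                                  # -> (1, 3, 512, 512)

batch = preprocess(Image.new("RGB", (1024, 800)))  # stand-in for a chest X-ray
```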
## MIMIC Test Results

These comparison tables are refreshed across the full LAnA collection whenever any collection model is evaluated.

### Cross-Model Comparison: All Frontal Test Studies (3041 studies)
| Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 | LAnA-v5 | LAnA-Arxiv |
|---|---|---|---|---|---|---|---|---|
| ROUGE-L | 0.1513 | 0.1653 | 0.1686 | 0.1670 | 0.1745 | 0.1675 | 0.1702 | |
| BLEU-1 | 0.1707 | 0.1916 | 0.2091 | 0.2174 | 0.2346 | 0.2244 | 0.2726 | |
| BLEU-4 | 0.0357 | 0.0386 | 0.0417 | 0.0417 | 0.0484 | 0.0441 | 0.0503 | |
| METEOR | 0.2079 | 0.2202 | 0.2298 | 0.2063 | 0.2129 | 0.2002 | 0.2607 | |
| RadGraph F1 | 0.0918 | 0.0921 | 0.1024 | 0.1057 | 0.0939 | 0.0794 | 0.0853 | |
| RadGraph entity F1 | 0.1399 | 0.1459 | 0.1587 | 0.1569 | 0.1441 | 0.1437 | 0.1481 | |
| RadGraph relation F1 | 0.1246 | 0.1322 | 0.1443 | 0.1474 | 0.1280 | 0.1293 | 0.1308 | |
| CheXpert F1 14-micro | 0.1829 | 0.1565 | 0.2116 | 0.1401 | 0.3116 | 0.2196 | 0.3552 | |
| CheXpert F1 5-micro | 0.2183 | 0.1530 | 0.2512 | 0.2506 | 0.2486 | 0.0538 | 0.3777 | |
| CheXpert F1 14-macro | 0.1095 | 0.0713 | 0.1095 | 0.0401 | 0.1363 | 0.0724 | 0.1790 | |
| CheXpert F1 5-macro | 0.1634 | 0.1007 | 0.1644 | 0.1004 | 0.1686 | 0.0333 | 0.2647 | |
### Cross-Model Comparison: Findings-Only Frontal Test Studies (2210 studies)
| Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 | LAnA-v5 | LAnA-Arxiv |
|---|---|---|---|---|---|---|---|---|
| ROUGE-L | 0.1576 | 0.1720 | 0.1771 | 0.1771 | 0.1848 | 0.1753 | 0.1781 | |
| BLEU-1 | 0.1754 | 0.2003 | 0.2177 | 0.2263 | 0.2480 | 0.2337 | 0.2774 | |
| BLEU-4 | 0.0405 | 0.0449 | 0.0484 | 0.0487 | 0.0573 | 0.0509 | 0.0575 | |
| METEOR | 0.2207 | 0.2347 | 0.2466 | 0.2240 | 0.2310 | 0.2137 | 0.2760 | |
| RadGraph F1 | 0.1010 | 0.1000 | 0.1119 | 0.1181 | 0.1046 | 0.0906 | 0.0938 | 0.1831 |
| RadGraph entity F1 | 0.1517 | 0.1577 | 0.1713 | 0.1739 | 0.1584 | 0.1566 | 0.1580 | 0.1831 |
| RadGraph relation F1 | 0.1347 | 0.1413 | 0.1549 | 0.1628 | 0.1405 | 0.1410 | 0.1395 | 0.1596 |
| CheXpert F1 14-micro | 0.1651 | 0.1442 | 0.1907 | 0.1365 | 0.2921 | 0.2205 | 0.3173 | 0.3228 |
| CheXpert F1 5-micro | 0.2152 | 0.1716 | 0.2415 | 0.2455 | 0.2394 | 0.0555 | 0.3372 | 0.3745 |
| CheXpert F1 14-macro | 0.1047 | 0.0700 | 0.1039 | 0.0381 | 0.1326 | 0.0714 | 0.1632 | 0.2190 |
| CheXpert F1 5-macro | 0.1611 | 0.1112 | 0.1578 | 0.0952 | 0.1636 | 0.0342 | 0.2343 | 0.3354 |
## Data

- Full project datasets: CheXpert and MIMIC-CXR.
- Intended project scope: train on curated chest X-ray/report data from both datasets and evaluate on MIMIC-CXR test studies.
- Current released checkpoint datasets: `MIMIC-CXR (findings-only)` for training and `MIMIC-CXR (findings-only)` for validation.
- Current published evaluation: MIMIC-CXR test split, frontal-only (PA/AP) studies.
## Evaluation

- Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
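The micro/macro distinction matters for reading the tables above: micro F1 pools true/false positives and false negatives across all labels before computing F1, while macro F1 averages per-label F1 scores, weighting rare labels equally. A minimal sketch of both, where `chexpert_f1` is an illustrative helper and not the repository's implementation:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def chexpert_f1(pred, gold):
    """pred, gold: lists of binary label vectors, one vector per study.
    Returns (micro_f1, macro_f1) over however many labels are given
    (e.g. all 14 CheXpert labels, or the 5-label subset)."""
    num_labels = len(gold[0])
    tps = [0] * num_labels
    fps = [0] * num_labels
    fns = [0] * num_labels
    for p, g in zip(pred, gold):
        for i in range(num_labels):
            tps[i] += int(p[i] == 1 and g[i] == 1)
            fps[i] += int(p[i] == 1 and g[i] == 0)
            fns[i] += int(p[i] == 0 and g[i] == 1)
    micro = f1(sum(tps), sum(fps), sum(fns))
    macro = sum(f1(tps[i], fps[i], fns[i]) for i in range(num_labels)) / num_labels
    return micro, macro

# Two studies, two labels: one false positive on label 0, label 1 perfect
# on the study where it appears.
micro, macro = chexpert_f1(pred=[[1, 0], [1, 1]], gold=[[1, 0], [0, 1]])
# micro = 0.8, macro = (2/3 + 1) / 2 ≈ 0.8333
```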
## Experiment Model Descriptions

- **LAnA-MIMIC-CHEXPERT**: trained on a combined dataset of `CheXpert` and `MIMIC-CXR` using LoRA fine-tuning with the `AdamW` optimizer.
- **LAnA-MIMIC**: trained on the `MIMIC-CXR (findings-only)` dataset using LoRA fine-tuning with the `AdamW` optimizer.
- **LAnA**: trained on the `MIMIC-CXR (findings-only)` dataset using full-model optimization with `AdamW` instead of LoRA.
- **LAnA-v2**: keeps the same training setup as `LAnA`, but increases the effective global batch size from `16` to `128`.
- **LAnA-v3**: keeps the same training setup as `LAnA`, including the effective global batch size of `16`, but changes EOS handling so training and generation follow the same behavior: the EOS token is no longer used during training, and generation is greedy and does not stop when an EOS token is produced. In the previous setup, decoding was also greedy, stopped at EOS, and used a maximum of `128` new tokens.
- **LAnA-v4**: keeps the same decoding behavior as `LAnA-v3`, but increases the effective global batch size from `16` to `128`.
- **LAnA-v5**: uses the training recipe from the original `LAnA` paper, while switching to the legacy `CXR-Findings-AI` generation behavior.
- **LAnA-Arxiv**: the report-generation model created in the arXiv paper, packaged locally with its original legacy generation code.
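With a local batch size of 1, an effective global batch size of 128 (as in the v2/v4 runs) implies gradient accumulation over 128 micro-batches per optimizer step. A minimal sketch of that pattern, using a hypothetical toy model and stand-in data rather than the actual training code:

```python
import torch

# Hypothetical toy model: the point is the accumulation pattern, not the network.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

local_batch_size = 1
effective_batch_size = 128
accum_steps = effective_batch_size // local_batch_size  # 128 micro-batches per update

# Stand-in data: 256 micro-batches of (input, target).
batches = [(torch.randn(local_batch_size, 8), torch.randn(local_batch_size, 1))
           for _ in range(256)]

updates = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so gradients average over the window
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one "effective batch 128" optimizer step
        optimizer.zero_grad()
        updates += 1
```

This keeps per-step memory at the local batch size while matching the gradient statistics of the larger effective batch.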
## Training Snapshot

This section describes the completed public training run.

- Run: `LAnA-v4`
- Method: `full_adamw`
- Vision encoder: `facebook/dinov3-vits16-pretrain-lvd1689m`
- Text decoder: `gpt2`
- Visual projection: `linear`
- Segmentation encoder: `facebook/dinov3-convnext-small-pretrain-lvd1689m`
- Image size: `512`
- Local batch size: `1`
- Effective global batch size: `128`
- Scheduler: `cosine`
- Warmup steps: `165`
- Weight decay: `0.01`
- Steps completed: `3289`
- Planned total steps: `3297`
- Images seen: `421707`
- Total training time: `8.0982` hours
- Hardware: `NVIDIA GeForce RTX 5070`
- Final train loss: `1.9641`
- Validation loss: `1.6446`
## Status

- Project status: `Training completed`
- Release status: `Completed training run`
- Current checkpoint status: `Final completed run`
- Training completion toward planned run: `100.00%` (`3/3` epochs)
## Notes

- `HF_TOKEN` is optional for this public repo and only helps with Hugging Face rate limits.
- `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
- `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.