Update README.md
README.md
CHANGED
|
@@ -1,3 +1,130 @@
|
|
| 1 |
---
|
| 2 |
license: mit
| 3 |
---
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
+
tags:
|
| 4 |
+
- ophthalmology
|
| 5 |
+
- object-detection
|
| 6 |
+
- image-classification
|
| 7 |
+
- medical-imaging
|
| 8 |
+
- fundus
|
| 9 |
+
- oct
|
| 10 |
+
- detectron2
|
| 11 |
+
- figure-parsing
|
| 12 |
+
- pytorch
|
| 13 |
---
|
| 14 |
+
|
| 15 |
+
# PubMed-Ophtha Detection Models
|
| 16 |
+
|
| 17 |
+
This repository contains the three detection and classification models used in the
|
| 18 |
+
[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
|
| 19 |
+
ophthalmological figures from scientific publications.
|
| 20 |
+
|
| 21 |
+
**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
|
| 22 |
+
for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).
|
| 23 |
+
|
| 24 |
+
## Models
|
| 25 |
+
|
| 26 |
+
The repository contains three model checkpoints under `models/`:
|
| 27 |
+
|
| 28 |
+
| Directory | Checkpoint | Framework | Architecture | Task | Classes |
|
| 29 |
+
|---|---|---|---|---|---|
|
| 30 |
+
| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
|
| 31 |
+
| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
|
| 32 |
+
| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |
|
| 33 |
+
|
| 34 |
+
Each Detectron2 model directory also contains a `config.yaml` required for inference.
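As a quick sanity check after downloading, the layout in the table above can be verified with the standard library alone. This is a minimal sketch and not part of the `pubmed-ophtha` package: the helper name `missing_model_files` is hypothetical, and it assumes the weights were downloaded so that `models/` sits directly under the chosen root directory.

```python
from pathlib import Path

# Expected files per model directory, taken from the table above.
# The README only mentions config.yaml for the two Detectron2 models,
# so none is checked for the mark status classifier.
EXPECTED_FILES = {
    "imaging_type_detection_1515892632": ["model_0003909.pth", "config.yaml"],
    "panel_detection_1020880423": ["model_0026865.pth", "config.yaml"],
    "mark_status_classifier_482239176": ["model_epoch_7.pth"],
}


def missing_model_files(root="."):
    """Return the expected model files that are absent under <root>/models/."""
    models_dir = Path(root) / "models"
    missing = []
    for directory, filenames in EXPECTED_FILES.items():
        for filename in filenames:
            path = models_dir / directory / filename
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Print every expected file that could not be found in the current directory.
    for path in missing_model_files("."):
        print(f"missing: {path}")
```

An empty return value means all five expected files are present; any listed path points at an incomplete download.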
|
| 35 |
+
|
| 36 |
+
### Panel Detection Model
|
| 37 |
+
|
| 38 |
+
Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures.
|
| 39 |
+
Trained on the PubMed-Ophtha-Annotation dataset merged with
|
| 40 |
+
[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
|
| 41 |
+
[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
|
| 42 |
+
ImageCLEF2016-pretrained checkpoint.
|
| 43 |
+
|
| 44 |
+
- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
|
| 45 |
+
- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)
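To make the two thresholds concrete, the sketch below computes intersection over union (IoU) for axis-aligned boxes. It assumes the numbers in the metric names are IoU thresholds between predicted and ground-truth boxes, as is conventional for detection mAP, and that boxes are given as `(x1, y1, x2, y2)`; neither detail is spelled out in this README. The box sizes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The same one-pixel horizontal offset applied to a large and a small box
# (illustrative sizes, not taken from the dataset).
panel_iou = iou((0, 0, 200, 200), (1, 0, 201, 200))
label_iou = iou((0, 0, 10, 10), (1, 0, 11, 10))

print(f"200x200 box shifted by 1 px: IoU = {panel_iou:.3f}")
print(f"10x10 box shifted by 1 px:   IoU = {label_iou:.3f}")
```

The large box stays at an IoU of about 0.99, while the small one drops to about 0.82 and would no longer count as a match at a 0.95 threshold. The README does not give a reason for the gap between the panel and identifier scores; the sketch only shows that small boxes are more sensitive to localization offsets.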
|
| 46 |
+
|
| 47 |
+
### Image Type Detection Model
|
| 48 |
+
|
| 49 |
+
Detects individual images within a panel and assigns each a retinal imaging modality:
|
| 50 |
+
color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
|
| 51 |
+
(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).
|
| 52 |
+
|
| 53 |
+
- **mAP@0.50:** 0.892
|
| 54 |
+
- **mAP@0.95:** 0.558
|
| 55 |
+
|
| 56 |
+
### Mark Status Classifier
|
| 57 |
+
|
| 58 |
+
A ResNet-50 binary classifier applied to cropped image regions detected by the image
|
| 59 |
+
type model. Predicts whether an image contains annotation marks such as arrows, dots,
|
| 60 |
+
or bounding boxes.
|
| 61 |
+
|
| 62 |
+
- **Accuracy:** 89.5% on the held-out test set
|
| 63 |
+
|
| 64 |
+
## Usage
|
| 65 |
+
|
| 66 |
+
Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
|
| 67 |
+
Python package. Download all weights with:
|
| 68 |
+
|
| 69 |
+
```bash
|
| 70 |
+
pip install pubmed-ophtha
|
| 71 |
+
pubmed-ophtha-split pull-models --local-dir .
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
Or download directly via `huggingface_hub`:
|
| 75 |
+
|
| 76 |
+
```python
|
| 77 |
+
from huggingface_hub import snapshot_download
|
| 78 |
+
|
| 79 |
+
snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
After downloading, run inference via the `DetectronFigureSplitter`:
|
| 83 |
+
|
| 84 |
+
```python
|
| 85 |
+
from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
|
| 86 |
+
from pubmed_ophtha.const.models import get_default_model_args
|
| 87 |
+
|
| 88 |
+
splitter = DetectronFigureSplitter(**get_default_model_args())
|
| 89 |
+
|
| 90 |
+
with open("figure.png", "rb") as f:
|
| 91 |
+
    image_bytes = f.read()
|
| 92 |
+
|
| 93 |
+
predictions = splitter.predict(image_bytes)
|
| 94 |
+
# Keys: pred_boxes, pred_classes, scores,
|
| 95 |
+
# secondary_pred_classes, secondary_scores, keep_after_nms
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
`pred_classes` contains Panel/Label detections from the panel detection model followed
|
| 99 |
+
by CFP/OCT/Retinal Imaging/Other detections from the image type model.
|
| 100 |
+
`secondary_pred_classes` contains the Plain/Annotated mark status for each image
|
| 101 |
+
detection (set to `"None"` for panel detections).
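Because both models write into the same lists, a small post-processing step can separate panel-level from image-level detections. The following is a rough sketch under several assumptions that the README does not confirm: that `predictions` is a dict of equal-length sequences, that class names arrive as strings, and that `keep_after_nms` holds one boolean per detection marking the ones to keep. The helper `split_detections` and the example values are hypothetical.

```python
# Classes produced by the panel detection model, per the README.
PANEL_CLASSES = {"Panel", "Label"}


def split_detections(predictions):
    """Split a prediction dict into panel-level and image-level records."""
    panel_level, image_level = [], []
    rows = zip(
        predictions["pred_boxes"],
        predictions["pred_classes"],
        predictions["scores"],
        predictions["secondary_pred_classes"],
        predictions["keep_after_nms"],
    )
    for box, cls, score, mark_status, keep in rows:
        if not keep:  # assumption: falsy means suppressed by NMS
            continue
        record = {"box": list(box), "class": str(cls), "score": float(score)}
        if record["class"] in PANEL_CLASSES:
            panel_level.append(record)
        else:
            # Only image detections carry a Plain/Annotated mark status.
            record["mark_status"] = str(mark_status)
            image_level.append(record)
    return panel_level, image_level


# Hand-written stand-in for the output of splitter.predict(); values are made up.
# secondary_scores is omitted because the helper does not use it.
example = {
    "pred_boxes": [[0, 0, 400, 300], [5, 5, 25, 25], [10, 30, 390, 290], [12, 32, 388, 288]],
    "pred_classes": ["Panel", "Label", "OCT", "OCT"],
    "scores": [0.98, 0.91, 0.88, 0.40],
    "secondary_pred_classes": ["None", "None", "Annotated", "Plain"],
    "keep_after_nms": [True, True, True, False],
}

panels, images = split_detections(example)
print(len(panels), "panel-level detections,", len(images), "image-level detections")
```

If the real output uses tensors or integer class ids, the conversion to plain Python values would need to be adapted accordingly.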
|
| 102 |
+
|
| 103 |
+
## Training
|
| 104 |
+
|
| 105 |
+
Both RetinaNet models use a ResNet backbone with FPN, fine-tuned from an
|
| 106 |
+
ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
|
| 107 |
+
dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
|
| 108 |
+
for 35 epochs with random cropping, flips, affine transformations, and color
|
| 109 |
+
augmentation.
|
| 110 |
+
|
| 111 |
+
## Dataset
|
| 112 |
+
|
| 113 |
+
The ground-truth annotations used for training are available as part of the
|
| 114 |
+
PubMed-Ophtha dataset:
|
| 115 |
+
[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)
|
| 116 |
+
|
| 117 |
+
## Citation
|
| 118 |
+
|
| 119 |
+
```bibtex
|
| 120 |
+
@article{hallitschke2026pubmed,
|
| 121 |
+
title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
|
| 122 |
+
author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
|
| 123 |
+
journal={arXiv preprint arXiv:2605.02720},
|
| 124 |
+
year={2026}
|
| 125 |
+
}
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
## License
|
| 129 |
+
|
| 130 |
+
MIT; see [LICENSE](LICENSE).
|