---
license: mit
tags:
- ophthalmology
- object-detection
- image-classification
- medical-imaging
- fundus
- oct
- detectron2
- figure-parsing
- pytorch
---

# PubMed-Ophtha Detection Models

This repository contains the three detection and classification models used in the
[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
ophthalmological figures from scientific publications.

**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).

## Models

The repository contains three model checkpoints under `models/`:

| Directory | Checkpoint | Framework | Architecture | Task | Classes |
|---|---|---|---|---|---|
| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |

Each Detectron2 model directory also contains a `config.yaml` required for inference.
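
To check that a download is complete, the layout from the table can be written out as a small lookup. This is a minimal sketch: it assumes the local directory mirrors the repository, with `models/` at its root, and the helper names `expected_files` and `missing_files` are hypothetical, not part of the `pubmed-ophtha` package.

```python
from pathlib import Path

# Directory and checkpoint names copied from the table above.
MODEL_FILES = {
    "imaging_type_detection": ("imaging_type_detection_1515892632", "model_0003909.pth"),
    "panel_detection": ("panel_detection_1020880423", "model_0026865.pth"),
    "mark_status_classifier": ("mark_status_classifier_482239176", "model_epoch_7.pth"),
}

# Only the Detectron2 models ship a config.yaml next to the checkpoint.
DETECTRON2_MODELS = {"imaging_type_detection", "panel_detection"}


def expected_files(root, name):
    """Return the files expected for one model (hypothetical helper)."""
    directory, checkpoint = MODEL_FILES[name]
    model_dir = Path(root) / "models" / directory  # assumed local layout
    files = [model_dir / checkpoint]
    if name in DETECTRON2_MODELS:
        files.append(model_dir / "config.yaml")
    return files


def missing_files(root="."):
    """List expected files that are not present under `root`."""
    return [
        path
        for name in MODEL_FILES
        for path in expected_files(root, name)
        if not path.exists()
    ]
```

Calling `missing_files(".")` after a download should return an empty list if the assumed layout holds.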

### Panel Detection Model

Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures.
Trained on the PubMed-Ophtha-Annotation dataset merged with
[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
ImageCLEF2016-pretrained checkpoint.

- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)
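
Both metrics count a detection as correct only if its intersection over union (IoU) with a ground-truth box reaches the stated threshold. The sketch below uses made-up box sizes to show how the same two-pixel offset affects a large panel and a small identifier label. It illustrates the metric only; the source does not state that this is why identifier scores drop at the stricter threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Illustrative sizes (not taken from the dataset): the prediction is
# shifted by 2 pixels in x and y relative to the ground truth.
panel_gt, panel_pred = (0, 0, 400, 300), (2, 2, 402, 302)
label_gt, label_pred = (0, 0, 20, 20), (2, 2, 22, 22)

print(f"panel IoU: {iou(panel_gt, panel_pred):.3f}")  # about 0.977
print(f"label IoU: {iou(label_gt, label_pred):.3f}")  # about 0.681
```

In this example the panel prediction still counts as a match at an IoU threshold of 0.95, while the label prediction counts only at 0.50.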

### Image Type Detection Model

Detects individual images within a panel and assigns each a retinal imaging modality:
color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).

- **mAP@0.50:** 0.892
- **mAP@0.95:** 0.558

### Mark Status Classifier

A ResNet-50 binary classifier applied to cropped image regions detected by the image
type model. Predicts whether an image contains annotation marks such as arrows, dots,
or bounding boxes.

- **Accuracy:** 89.5% on the held-out test set

## Usage

Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
Python package. Download all weights with:

```bash
pip install pubmed-ophtha
pubmed-ophtha-split pull-models --local-dir .
```

Or download directly via `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
```
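
If only one model is needed, `snapshot_download` accepts an `allow_patterns` argument that restricts the download to matching files. The following is a sketch under the assumption that the repository stores each model under `models/<directory>/`; `patterns_for` and `download_models` are hypothetical helper names.

```python
def patterns_for(model_dirs):
    """Build glob patterns that match every file in the given model directories."""
    return [f"models/{directory}/*" for directory in model_dirs]


def download_models(model_dirs, local_dir="."):
    """Download only the selected model directories (requires network access)."""
    # Imported here so that patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="pubmed-ophtha/detection-models",
        allow_patterns=patterns_for(model_dirs),
        local_dir=local_dir,
    )


# Example: fetch only the panel detection checkpoint and its config.
# download_models(["panel_detection_1020880423"])
```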

After downloading, run inference via the `DetectronFigureSplitter`:

```python
from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
from pubmed_ophtha.const.models import get_default_model_args

splitter = DetectronFigureSplitter(**get_default_model_args())

with open("figure.png", "rb") as f:
    image_bytes = f.read()

predictions = splitter.predict(image_bytes)
# Keys: pred_boxes, pred_classes, scores,
#       secondary_pred_classes, secondary_scores, keep_after_nms
```

`pred_classes` contains Panel/Label detections from the panel detection model followed
by CFP/OCT/Retinal Imaging/Other detections from the image type model.
`secondary_pred_classes` contains the Plain/Annotated mark status for each image
detection (set to `"None"` for panel detections).
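
The combined output can be separated into panel-level and image-level detections. The sketch below runs on a hand-written mock dictionary and makes several assumptions that the real output may not satisfy: that all values are parallel sequences of equal length, that classes are given as string names, and that `keep_after_nms` holds booleans. If the package returns tensors or integer class ids, they would need converting first. The function `summarize` and the `min_score` cutoff are hypothetical.

```python
IMAGE_CLASSES = {"CFP", "OCT", "Retinal Imaging", "Other"}


def summarize(predictions, min_score=0.5):
    """Split detections into panel-level and image-level records."""
    rows = zip(
        predictions["pred_boxes"],
        predictions["pred_classes"],
        predictions["scores"],
        predictions["secondary_pred_classes"],
        predictions["keep_after_nms"],
    )
    panels, images = [], []
    for box, cls, score, mark, keep in rows:
        if not keep or score < min_score:
            continue  # suppressed by NMS or below the (assumed) score cutoff
        if cls in IMAGE_CLASSES:
            images.append(
                {"box": box, "modality": cls, "mark_status": mark, "score": score}
            )
        else:  # "Panel" or "Label" from the panel detection model
            panels.append({"box": box, "kind": cls, "score": score})
    return panels, images


# Mock data for illustration only; not real model output.
mock = {
    "pred_boxes": [[0, 0, 200, 150], [5, 5, 20, 20], [10, 25, 190, 140], [12, 27, 188, 138]],
    "pred_classes": ["Panel", "Label", "OCT", "OCT"],
    "scores": [0.98, 0.91, 0.88, 0.40],
    "secondary_pred_classes": ["None", "None", "Annotated", "Plain"],
    "secondary_scores": [0.0, 0.0, 0.93, 0.55],
    "keep_after_nms": [True, True, True, False],
}

panels, images = summarize(mock)
print(len(panels), "panel-level detections;", len(images), "image detections")
```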

## Training

Both RetinaNet models use a ResNet backbone with FPN and were fine-tuned from an
ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
for 35 epochs with random cropping, flips, affine transformations, and color
augmentation.

## Dataset

The ground-truth annotations used for training are available as part of the
PubMed-Ophtha dataset:
[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)

## Citation

```bibtex
@article{hallitschke2026pubmed,
  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
  journal={arXiv preprint arXiv:2605.02720},
  year={2026}
}
```

## License

MIT; see [LICENSE](LICENSE).