---
license: mit
tags:
- ophthalmology
- object-detection
- image-classification
- medical-imaging
- fundus
- oct
- detectron2
- figure-parsing
- pytorch
---
# PubMed-Ophtha Detection Models
This repository contains the three detection and classification models used in the
[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
ophthalmological figures from scientific publications.

**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).
## Models
The repository contains three model checkpoints under `models/`:

| Directory | Checkpoint | Framework | Architecture | Task | Classes |
|---|---|---|---|---|---|
| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |

Each Detectron2 model directory also contains a `config.yaml` required for inference.
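
After downloading, you can confirm that the weights landed where the table says they should by checking for each checkpoint and, for the two Detectron2 models, its `config.yaml`. The sketch below is a hypothetical helper, not part of the `pubmed-ophtha` package. It assumes the repository was downloaded into the current directory so that its `models/` folder sits at `./models`, and it only checks the files listed in the table.

```python
from pathlib import Path
from typing import List, Union

# Expected layout under models/, copied from the table above.
# Only the two Detectron2 models are documented as shipping a config.yaml.
EXPECTED_FILES = {
    "imaging_type_detection_1515892632": ["model_0003909.pth", "config.yaml"],
    "panel_detection_1020880423": ["model_0026865.pth", "config.yaml"],
    "mark_status_classifier_482239176": ["model_epoch_7.pth"],
}


def missing_model_files(root: Union[str, Path] = ".") -> List[Path]:
    """Return every expected checkpoint/config path that is absent under root/models."""
    models_dir = Path(root) / "models"
    missing = []
    for directory, filenames in EXPECTED_FILES.items():
        for name in filenames:
            path = models_dir / directory / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    problems = missing_model_files(".")
    if problems:
        print("Missing files:")
        for p in problems:
            print(f"  {p}")
    else:
        print("All model files present.")
```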
### Panel Detection Model
Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures.
Trained on the PubMed-Ophtha-Annotation dataset merged with
[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
ImageCLEF2016-pretrained checkpoint.
- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)
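
The drop from mAP@0.50 to mAP@0.95 for panel identifiers follows from how detections are scored: a predicted box only counts as a true positive when its intersection over union (IoU) with a ground-truth box reaches the threshold. Reading mAP@0.95 as AP at an IoU threshold of 0.95, the minimal sketch below (plain Python, boxes as `(x0, y0, x1, y1)` pixel tuples) shows that the same 4-pixel offset leaves a 200x200 panel box above 0.95 IoU but drops a 20x20 identifier box to about 0.67. The helpers `iou` and `matches_at` are illustrative and the box sizes are invented, not dataset statistics; this is one plausible contributor to the low identifier score, not an analysis from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred, gt, threshold):
    """A detection is a true positive at this threshold if IoU >= threshold."""
    return iou(pred, gt) >= threshold


# The same 4-pixel horizontal offset on a large panel and a small identifier box.
panel_gt, panel_pred = (0, 0, 200, 200), (4, 0, 204, 200)
label_gt, label_pred = (0, 0, 20, 20), (4, 0, 24, 20)

for name, pred, gt in [("panel", panel_pred, panel_gt), ("label", label_pred, label_gt)]:
    print(name, round(iou(pred, gt), 3), matches_at(pred, gt, 0.50), matches_at(pred, gt, 0.95))
# prints:
# panel 0.961 True True
# label 0.667 True False
```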
### Image Type Detection Model
Detects individual images within a panel and assigns each a retinal imaging modality:
color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).
- **mAP@0.50:** 0.892
- **mAP@0.95:** 0.558
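
Both reported numbers are mean average precision: AP is computed per class at a fixed IoU threshold and then averaged over the classes (here CFP, OCT, Retinal Imaging, Other). The sketch below is a simplified, self-contained illustration of that computation, using greedy matching by descending score and all-point interpolation; it is not the evaluation code behind the reported figures. COCO-style evaluation (as used by Detectron2's COCO evaluator) interpolates precision at 101 fixed recall points, so values from this sketch can differ slightly. The function names and toy data are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_threshold):
    """All-point interpolated AP for one class.

    preds: list of (image_id, score, box); gts: list of (image_id, box).
    Predictions are matched greedily by descending score; each ground-truth
    box can be claimed by at most one prediction.
    """
    if not gts:
        return 0.0
    matched = set()
    tp_flags = []
    for image_id, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_image, gt_box) in enumerate(gts):
            if gt_image != image_id or idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        tp_flags.append(is_tp)

    # Precision and recall after each ranked prediction.
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Interpolate: each precision becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: precision times each recall step.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(preds_by_class, gts_by_class, iou_threshold):
    """Mean of per-class AP over the classes listed in gts_by_class."""
    aps = [average_precision(preds_by_class.get(c, []), gts, iou_threshold)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps)


# Toy data for a single class: two OCT images in one figure.
gts = {"OCT": [("fig1", (0, 0, 100, 100)), ("fig1", (200, 0, 300, 100))]}
preds = {"OCT": [
    ("fig1", 0.9, (0, 0, 100, 100)),      # exact match
    ("fig1", 0.8, (500, 500, 600, 600)),  # false positive
    ("fig1", 0.7, (204, 0, 304, 100)),    # IoU ~0.92 with the second box
]}
print(round(mean_average_precision(preds, gts, 0.50), 3))  # prints: 0.833
print(round(mean_average_precision(preds, gts, 0.95), 3))  # prints: 0.5
```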
### Mark Status Classifier
A ResNet-50 binary classifier applied to cropped image regions detected by the image
type model. Predicts whether an image contains annotation marks such as arrows, dots,
or bounding boxes.
- **Accuracy:** 89.5% on the held-out test set
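
At inference time the classifier's output for each crop has to be turned into one of the two labels, and the reported accuracy is the fraction of test crops whose predicted label matches the annotation. The sketch below assumes the final layer emits two logits in the order Plain, Annotated; both that label order and the helper names (`predict_mark_status`, `accuracy`) are assumptions for illustration, so check them against the `pubmed-ophtha` code before reuse.

```python
import math

# Assumed output order of a two-logit head; verify before relying on it.
MARK_STATUS_LABELS = ("Plain", "Annotated")


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_mark_status(logits):
    """Return (label, probability) for one crop's two logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return MARK_STATUS_LABELS[best], probs[best]


def accuracy(predicted, expected):
    """Fraction of crops whose predicted label equals the ground-truth label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


label, prob = predict_mark_status([0.2, 1.5])
print(label, round(prob, 3))  # prints: Annotated 0.786
print(accuracy(["Plain", "Annotated", "Plain", "Plain"],
               ["Plain", "Annotated", "Annotated", "Plain"]))  # prints: 0.75
```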
## Usage
Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
Python package. Download all weights with:
```bash
pip install pubmed-ophtha
pubmed-ophtha-split pull-models --local-dir .
```
Or download directly via `huggingface_hub`:
```python
from huggingface_hub import snapshot_download
snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
```
After downloading, run inference via the `DetectronFigureSplitter`:
```python
from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
from pubmed_ophtha.const.models import get_default_model_args
splitter = DetectronFigureSplitter(**get_default_model_args())
with open("figure.png", "rb") as f:
image_bytes = f.read()
predictions = splitter.predict(image_bytes)
# Keys: pred_boxes, pred_classes, scores,
# secondary_pred_classes, secondary_scores, keep_after_nms
```
`pred_classes` lists the Panel/Label detections from the panel detection model, followed
by the CFP/OCT/Retinal Imaging/Other detections from the image type model.
`secondary_pred_classes` holds the Plain/Annotated mark status for each image
detection and is set to `"None"` for detections from the panel detection model.
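
Because panel and image detections share one set of flat lists, downstream code usually needs to separate them again. The sketch below is a hypothetical post-processing helper, not part of the `pubmed-ophtha` API. It assumes each value in `predictions` is a sequence of equal length, that `pred_classes` holds class names as strings, and that `keep_after_nms` is a boolean mask; the real return types (for example tensors, arrays, or integer class ids) may differ, and the `min_score` default is an arbitrary illustrative cutoff.

```python
from collections import Counter

# Class names emitted by the panel detection model (from the Models table).
PANEL_CLASSES = {"Panel", "Label"}


def split_predictions(predictions, min_score=0.5):
    """Hypothetical helper: separate panel-model and image-model detections.

    Drops detections not kept after NMS or scoring below min_score.
    """
    panels, images = [], []
    rows = zip(
        predictions["pred_boxes"],
        predictions["pred_classes"],
        predictions["scores"],
        predictions["secondary_pred_classes"],
        predictions["secondary_scores"],
        predictions["keep_after_nms"],
    )
    for box, cls, score, mark, mark_score, keep in rows:
        if not keep or score < min_score:
            continue
        det = {"box": tuple(box), "class": cls, "score": score}
        if cls in PANEL_CLASSES:
            panels.append(det)
        else:
            # Image-type detections also carry the Plain/Annotated mark status.
            det["mark_status"] = mark
            det["mark_score"] = mark_score
            images.append(det)
    return panels, images


# Synthetic example in the assumed format (values invented for illustration).
example = {
    "pred_boxes": [[0, 0, 400, 300], [5, 5, 25, 25], [10, 30, 390, 290], [12, 32, 388, 288]],
    "pred_classes": ["Panel", "Label", "OCT", "CFP"],
    "scores": [0.98, 0.91, 0.88, 0.40],
    "secondary_pred_classes": ["None", "None", "Annotated", "Plain"],
    "secondary_scores": [0.0, 0.0, 0.83, 0.61],
    "keep_after_nms": [True, True, True, False],
}

panels, images = split_predictions(example)
print(len(panels), Counter(d["class"] for d in images))  # prints: 2 Counter({'OCT': 1})
```

If the values turn out to be tensors or NumPy arrays, converting them with `.tolist()` first keeps the helper unchanged.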
## Training
Both RetinaNet models use a ResNet backbone with FPN and were fine-tuned from an
ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
dataset (merged with PanelSeg and ImageCLEF2016 for the panel detection model).
The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
for 35 epochs with random cropping, flips, affine transformations, and color
augmentation.
## Dataset
The ground-truth annotations used for training are available as part of the
PubMed-Ophtha dataset:
[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)
## Citation
```bibtex
@article{hallitschke2026pubmed,
title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
journal={arXiv preprint arXiv:2605.02720},
year={2026}
}
```
## License
MIT. See [LICENSE](LICENSE).