---
license: mit
tags:
- ophthalmology
- object-detection
- image-classification
- medical-imaging
- fundus
- oct
- detectron2
- figure-parsing
- pytorch
---

# PubMed-Ophtha Detection Models

This repository contains the three detection and classification models used in the
[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
ophthalmological figures from scientific publications.

**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).

## Models

The repository contains three model checkpoints under `models/`:

| Directory | Checkpoint | Framework | Architecture | Task | Classes |
|---|---|---|---|---|---|
| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |

Each of the two Detectron2 model directories also contains a `config.yaml` that is required for inference.

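After downloading, it can help to confirm that the files listed in the table above are where the pipeline expects them. The following is a minimal sketch using only the standard library. It assumes each checkpoint and `config.yaml` sits directly inside its model directory under `models/`; the names `EXPECTED_FILES` and `find_missing_files` are illustrative and not part of the `pubmed-ophtha` package.

```python
from pathlib import Path

# File names copied from the table above. The two Detectron2 directories are
# also expected to hold a config.yaml; the classifier directory is not.
# Assumption: every file sits directly inside its model directory.
EXPECTED_FILES = {
    "imaging_type_detection_1515892632": ["model_0003909.pth", "config.yaml"],
    "panel_detection_1020880423": ["model_0026865.pth", "config.yaml"],
    "mark_status_classifier_482239176": ["model_epoch_7.pth"],
}


def find_missing_files(root):
    """Return the expected files that are absent under <root>/models/."""
    models_dir = Path(root) / "models"
    missing = []
    for directory, filenames in EXPECTED_FILES.items():
        for filename in filenames:
            path = models_dir / directory / filename
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `find_missing_files(".")` from the download directory returns an empty list when all five files are present, and otherwise the paths that still need to be fetched.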
### Panel Detection Model

Detects panels and panel identifier labels (e.g., "A", "B") within multi-panel figures.
Trained on the PubMed-Ophtha-Annotation dataset merged with
[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
ImageCLEF2016-pretrained checkpoint.

- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)

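The two thresholds above are read here as intersection-over-union (IoU) cutoffs, the usual meaning of this notation: a detection counts as correct only if its IoU with the ground-truth box reaches the threshold. This card does not explain the low identifier score at 0.95. One plausible contributor, stated as an assumption and not as a finding of the paper, is box size: a small localization error costs a small box far more IoU than a large one. The sketch below illustrates this with made-up boxes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes (not taken from the dataset): a 20 px identifier and a
# 400 px panel, each predicted one pixel too far to the right.
label_gt, label_pred = (0, 0, 20, 20), (1, 0, 21, 20)
panel_gt, panel_pred = (0, 0, 400, 400), (1, 0, 401, 400)

label_iou = iou(label_gt, label_pred)  # 380 / 420
panel_iou = iou(panel_gt, panel_pred)  # 159600 / 160400

for name, value in [("identifier", label_iou), ("panel", panel_iou)]:
    print(f"{name}: IoU={value:.3f} "
          f"hit@0.50={value >= 0.50} hit@0.95={value >= 0.95}")
```

With these boxes the one-pixel shift keeps both detections above 0.50, but only the large panel box stays above 0.95.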
### Image Type Detection Model

Detects individual images within a panel and assigns each a retinal imaging modality:
color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).

- **mAP@0.50:** 0.892
- **mAP@0.95:** 0.558

### Mark Status Classifier

A ResNet-50 binary classifier applied to the cropped image regions detected by the image
type model. It predicts whether an image contains annotation marks such as arrows, dots,
or bounding boxes.

- **Accuracy:** 89.5% on the held-out test set

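Because the classifier operates on crops, a detected box has to be turned into integer pixel bounds that lie inside the figure. The sketch below shows one way to do that. It assumes boxes are absolute-pixel `(x1, y1, x2, y2)` floats, which is Detectron2's default box convention but is not confirmed by this card for the pipeline's output; `to_crop_box` is a hypothetical helper, not a function of the `pubmed-ophtha` package.

```python
import math


def to_crop_box(box, image_width, image_height):
    """Convert a float (x1, y1, x2, y2) box to integer crop bounds.

    The box is expanded outward to whole pixels and clamped to the image.
    Returns None when nothing of the box remains inside the image.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1))
    top = max(0, math.floor(y1))
    right = min(image_width, math.ceil(x2))
    bottom = min(image_height, math.ceil(y2))
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


# Hypothetical detection that slightly overshoots a 240 x 180 figure.
print(to_crop_box((10.4, -3.2, 250.7, 199.1), 240, 180))  # (10, 0, 240, 180)
```

The returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts, so it can be passed along directly if Pillow is used for cropping.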
## Usage

The models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
Python package. Download all weights with:

```bash
pip install pubmed-ophtha
pubmed-ophtha-split pull-models --local-dir .
```

Or download them directly via `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
```

After downloading, run inference via the `DetectronFigureSplitter`:

```python
from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
from pubmed_ophtha.const.models import get_default_model_args

# Build the splitter with the package's default model arguments
splitter = DetectronFigureSplitter(**get_default_model_args())

# The splitter takes the raw bytes of the figure file
with open("figure.png", "rb") as f:
    image_bytes = f.read()

predictions = splitter.predict(image_bytes)
# Keys: pred_boxes, pred_classes, scores,
#       secondary_pred_classes, secondary_scores, keep_after_nms
```

`pred_classes` contains the Panel/Label detections from the panel detection model, followed
by the CFP/OCT/Retinal Imaging/Other detections from the image type model.
`secondary_pred_classes` contains the Plain/Annotated mark status for each image
detection (set to `"None"` for panel detections).

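The combined output can be separated back into panel-level and image-level detections by class name. The sketch below runs on a mocked dictionary and does not call the package. It assumes that the values under each key are parallel sequences and that class entries are plain strings matching the names above; the real return value may use tensors or integer class IDs, in which case a conversion step is needed first. `split_predictions` and `PANEL_CLASSES` are illustrative names.

```python
# Classes produced by the panel detection model, as named in this card.
PANEL_CLASSES = {"Panel", "Label"}


def split_predictions(predictions):
    """Separate panel-model detections from image-type detections."""
    panels, images = [], []
    rows = zip(
        predictions["pred_boxes"],
        predictions["pred_classes"],
        predictions["scores"],
        predictions["secondary_pred_classes"],
    )
    for box, cls, score, mark_status in rows:
        record = {"box": box, "class": cls, "score": score}
        if cls in PANEL_CLASSES:
            panels.append(record)
        else:
            # Only image detections carry a Plain/Annotated mark status.
            record["mark_status"] = mark_status
            images.append(record)
    return panels, images


# Mocked output with made-up values; only the keys used above are included.
example = {
    "pred_boxes": [[0, 0, 400, 300], [5, 5, 25, 25], [30, 20, 380, 280]],
    "pred_classes": ["Panel", "Label", "OCT"],
    "scores": [0.98, 0.91, 0.87],
    "secondary_pred_classes": ["None", "None", "Annotated"],
}

panels, images = split_predictions(example)
print(len(panels), "panel-level detections;", len(images), "image detections")
```

Splitting on the class name avoids relying on the ordering of the two groups, although the ordering described above would also allow slicing by index.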
## Training

Both RetinaNet models use a ResNet backbone with FPN and were fine-tuned from an
ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
for 35 epochs with random cropping, flips, affine transformations, and color
augmentation.

## Dataset

The ground-truth annotations used for training are available as part of the
PubMed-Ophtha dataset:
[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)

## Citation

```bibtex
@article{hallitschke2026pubmed,
  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
  journal={arXiv preprint arXiv:2605.02720},
  year={2026}
}
```

## License

MIT. See [LICENSE](LICENSE).