 ---
 license: mit
+tags:
+- ophthalmology
+- object-detection
+- image-classification
+- medical-imaging
+- fundus
+- oct
+- detectron2
+- figure-parsing
+- pytorch
 ---
+# PubMed-Ophtha Detection Models
+This repository contains the three detection and classification models used in the
+[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
+ophthalmological figures from scientific publications.
+**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
+for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).
+## Models
+The repository contains three model checkpoints under `models/`:
+| Directory | Checkpoint | Framework | Architecture | Task | Classes |
+|---|---|---|---|---|---|
+| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
+| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
+| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |
+Each Detectron2 model directory also contains a `config.yaml` required for inference.
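As a quick sanity check after downloading, the layout in the table above can be verified with a few lines of standard-library Python. This is only a sketch: it assumes the repository was downloaded so that `models/` sits directly under the chosen local directory, and the helper name `missing_model_files` is hypothetical, not part of the `pubmed-ophtha` package.

```python
from pathlib import Path

# Expected files per model directory, taken from the table above.
# config.yaml is listed only for the two Detectron2 models.
MODEL_FILES = {
    "imaging_type_detection_1515892632": ["model_0003909.pth", "config.yaml"],
    "panel_detection_1020880423": ["model_0026865.pth", "config.yaml"],
    "mark_status_classifier_482239176": ["model_epoch_7.pth"],
}


def missing_model_files(root):
    """Return the expected files that are absent under <root>/models/.

    Hypothetical helper: assumes the download mirrors the repository layout.
    """
    models_dir = Path(root) / "models"
    return [
        models_dir / directory / name
        for directory, names in MODEL_FILES.items()
        for name in names
        if not (models_dir / directory / name).is_file()
    ]


# Report anything missing relative to the current working directory.
for path in missing_model_files("."):
    print(f"missing: {path}")
```

An empty result means all five files are in place; it does not validate the checkpoint contents.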
+### Panel Detection Model
+Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures.
+Trained on the PubMed-Ophtha-Annotation dataset merged with
+[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
+[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
+ImageCLEF2016-pretrained checkpoint.
+- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
+- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)
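To make the thresholds concrete: assuming the value after `@` is an intersection-over-union (IoU) threshold, as is conventional for detection metrics, a predicted box counts as correct only if its IoU with a ground-truth box reaches that value. The sketch below is illustrative only; the `(x1, y1, x2, y2)` box format and the function names are assumptions, not the evaluation code used for the reported numbers.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """True if the prediction overlaps the ground truth at or above the threshold."""
    return iou(pred_box, gt_box) >= threshold


# A large panel whose prediction is off by 10 px on two sides.
panel_iou = iou((10, 10, 100, 100), (0, 0, 100, 100))
print(f"{panel_iou:.2f}")  # 0.81: a match at 0.50, not at 0.95

# A small 20x20 identifier whose prediction is shifted by 2 px.
label_iou = iou((2, 2, 22, 22), (0, 0, 20, 20))
print(f"{label_iou:.2f}")  # 0.68: a match at 0.50, not at 0.95
```

The second case shows how a small offset on a small box already costs a large share of IoU, which is one general reason strict thresholds are hard to meet for small objects such as panel identifiers.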
+### Image Type Detection Model
+Detects individual images within a panel and assigns each a retinal imaging modality:
+color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
+(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).
+- **mAP@0.50:** 0.892
+- **mAP@0.95:** 0.558
+### Mark Status Classifier
+A ResNet-50 binary classifier applied to cropped image regions detected by the image
+type model. Predicts whether an image contains annotation marks such as arrows, dots,
+or bounding boxes.
+- **Accuracy:** 89.5% on the held-out test set
+## Usage
+Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
+Python package. Download all weights with:
+```bash
+pip install pubmed-ophtha
+pubmed-ophtha-split pull-models --local-dir .
+```
+Or download directly via `huggingface_hub`:
+```python
+from huggingface_hub import snapshot_download
+snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
+```
+After downloading, run inference via the `DetectronFigureSplitter`:
+```python
+from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
+from pubmed_ophtha.const.models import get_default_model_args
+splitter = DetectronFigureSplitter(**get_default_model_args())
+with open("figure.png", "rb") as f:
+    image_bytes = f.read()
+predictions = splitter.predict(image_bytes)
+# Keys: pred_boxes, pred_classes, scores,
+#       secondary_pred_classes, secondary_scores, keep_after_nms
+```
+`pred_classes` contains Panel/Label detections from the panel detection model followed
+by CFP/OCT/Retinal Imaging/Other detections from the image type model.
+`secondary_pred_classes` contains the Plain/Annotated mark status for each image
+detection (set to `"None"` for panel detections).
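The description above can be turned into a small post-processing step that separates panel-level detections from image-level ones. This is a minimal sketch under several assumptions: the prediction values are parallel sequences of equal length, class names are plain strings as listed in the model table, and `split_predictions` and `Detection` are hypothetical names introduced here. The `keep_after_nms` key is left out because its exact format is not described in this document. The dictionary below is mock data, not real model output.

```python
from dataclasses import dataclass
from typing import Optional

PANEL_CLASSES = {"Panel", "Label"}  # classes of the panel detection model


@dataclass
class Detection:
    box: tuple
    label: str
    score: float
    mark_status: Optional[str] = None   # "Plain" / "Annotated" for images only
    mark_score: Optional[float] = None


def split_predictions(predictions):
    """Split a predictions dict into (panel_detections, image_detections)."""
    panels, images = [], []
    rows = zip(
        predictions["pred_boxes"],
        predictions["pred_classes"],
        predictions["scores"],
        predictions["secondary_pred_classes"],
        predictions["secondary_scores"],
    )
    for box, label, score, mark, mark_score in rows:
        if label in PANEL_CLASSES or mark == "None":
            # Panel/Label detections carry no mark status.
            panels_or_images = panels if label in PANEL_CLASSES else images
            panels_or_images.append(Detection(tuple(box), label, float(score)))
        else:
            images.append(
                Detection(tuple(box), label, float(score), mark, float(mark_score))
            )
    return panels, images


# Mock data shaped like the documented keys (values are made up).
mock_predictions = {
    "pred_boxes": [[0, 0, 400, 300], [5, 5, 30, 30], [40, 20, 380, 280]],
    "pred_classes": ["Panel", "Label", "OCT"],
    "scores": [0.98, 0.91, 0.87],
    "secondary_pred_classes": ["None", "None", "Annotated"],
    "secondary_scores": [0.0, 0.0, 0.76],
}

panels, images = split_predictions(mock_predictions)
print(len(panels), len(images))
```

If the real output uses tensors or integer class ids instead of strings, the comparison against `PANEL_CLASSES` would need to be adapted accordingly.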
+## Training
+Both RetinaNet models use a ResNet backbone with FPN, fine-tuned from an
+ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
+dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
+for 35 epochs with random cropping, flips, affine transformations, and color
+augmentation.
+## Dataset
+The ground-truth annotations used for training are available as part of the
+PubMed-Ophtha dataset:
+[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)
+## Citation
+```bibtex
+@article{hallitschke2026pubmed,
+  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
+  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
+  journal={arXiv preprint arXiv:2605.02720},
+  year={2026}
+}
+```
+## License
+MIT; see [LICENSE](LICENSE).