pubmed-ophtha committed on
Commit 16643f3 · verified · 1 Parent(s): c95b5bb

Update README.md

Files changed (1): README.md (+127 -0)

---
license: mit
tags:
- ophthalmology
- object-detection
- image-classification
- medical-imaging
- fundus
- oct
- detectron2
- figure-parsing
- pytorch
---

# PubMed-Ophtha Detection Models

This repository contains the three detection and classification models used in the
[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
ophthalmological figures from scientific publications.

**Paper:** Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).

## Models

The repository contains three model checkpoints under `models/`:

| Directory | Checkpoint | Framework | Architecture | Task | Classes |
|---|---|---|---|---|---|
| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |

Each Detectron2 model directory also contains a `config.yaml` required for inference.
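
Because inference depends on these exact paths, it can help to verify a download before running the splitter. The following is a minimal sketch assuming the weights were placed under `local_dir` with the `models/` layout shown in the table. `missing_model_files` is a hypothetical helper, and the list assumes the ResNet-50 classifier directory has no `config.yaml`, since only the Detectron2 directories are said to include one.

```python
from pathlib import Path
from typing import List

# Expected layout, copied from the model table (relative to the download directory).
EXPECTED_FILES = [
    "models/imaging_type_detection_1515892632/model_0003909.pth",
    "models/imaging_type_detection_1515892632/config.yaml",
    "models/panel_detection_1020880423/model_0026865.pth",
    "models/panel_detection_1020880423/config.yaml",
    "models/mark_status_classifier_482239176/model_epoch_7.pth",
]


def missing_model_files(local_dir: str = ".") -> List[Path]:
    """Hypothetical helper: return expected checkpoint/config paths that are not regular files."""
    root = Path(local_dir)
    return [root / rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All model files found.")
```

Checking with `is_file()` rather than `exists()` also flags the case where a path exists but is a directory.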

### Panel Detection Model

Detects panels and panel identifier labels (e.g., "A", "B") within multi-panel figures.
Trained on the PubMed-Ophtha-Annotation dataset merged with
[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
ImageCLEF2016-pretrained checkpoint.

- **mAP@0.50:** 0.909 (panels), 0.903 (panel identifiers)
- **mAP@0.95:** 0.532 (panels), 0.018 (panel identifiers)
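
Both metrics rest on intersection over union (IoU) between predicted and ground-truth boxes. At mAP@0.50 a prediction needs an IoU of at least 0.50 to count as a match. This README does not say whether @0.95 denotes a single 0.95 threshold or the COCO-style average over thresholds from 0.50 to 0.95. Either way, stricter thresholds punish small localization errors on small boxes. The minimal sketch below assumes `(x0, y0, x1, y1)` corner boxes and compares the same 2-pixel offset on a label-sized and a panel-sized box. The box sizes are illustrative, not taken from the dataset.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The same 2-pixel diagonal offset applied to a small and a large box.
small = iou((0, 0, 20, 20), (2, 2, 22, 22))      # label-sized box
large = iou((0, 0, 400, 400), (2, 2, 402, 402))  # panel-sized box
print(f"small box IoU: {small:.3f}")
print(f"large box IoU: {large:.3f}")
```

The small box drops to an IoU of about 0.68. That still matches at 0.50 but not at 0.95, while the large box stays at about 0.98. This is one plausible contributor to the low strict-threshold score for panel identifiers, not a documented explanation.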

### Image Type Detection Model

Detects individual images within a panel and classifies each into one of four image types:
color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).

- **mAP@0.50:** 0.892
- **mAP@0.95:** 0.558

### Mark Status Classifier

A ResNet-50 binary classifier applied to cropped image regions detected by the image
type model. It predicts whether an image contains annotation marks such as arrows, dots,
or bounding boxes.

- **Accuracy:** 89.5% on the held-out test set
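
Because the classifier sees only the regions found by the image type model, each detected box has to be turned into a pixel crop first. The sketch below assumes float `(x0, y0, x1, y1)` boxes in pixel coordinates and converts them to integer boxes clamped to the image bounds, the `(left, upper, right, lower)` form accepted by Pillow's `Image.crop`. The helper `to_crop_box` is hypothetical and not part of the `pubmed-ophtha` package.

```python
import math
from typing import Optional, Sequence, Tuple


def to_crop_box(
    box: Sequence[float], width: int, height: int
) -> Optional[Tuple[int, int, int, int]]:
    """Hypothetical helper: convert a float (x0, y0, x1, y1) box into an integer
    crop box clamped to the image, or None if the box lies entirely outside it."""
    x0, y0, x1, y1 = box
    # Round outward to whole pixels so no part of the detection is cut off.
    left = max(0, math.floor(x0))
    top = max(0, math.floor(y0))
    right = min(width, math.ceil(x1))
    bottom = min(height, math.ceil(y1))
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


# Example usage with Pillow (assumed installed):
# from PIL import Image
# image = Image.open("figure.png").convert("RGB")
# crop_box = to_crop_box(detected_box, *image.size)  # image.size is (width, height)
# if crop_box is not None:
#     crop = image.crop(crop_box)  # region to pass to the mark status classifier
```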

## Usage

Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
Python package. Download all weights with:

```bash
pip install pubmed-ophtha
pubmed-ophtha-split pull-models --local-dir .
```

Or download directly via `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
```

After downloading, run inference via the `DetectronFigureSplitter`:

```python
from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
from pubmed_ophtha.const.models import get_default_model_args

splitter = DetectronFigureSplitter(**get_default_model_args())

with open("figure.png", "rb") as f:
    image_bytes = f.read()

predictions = splitter.predict(image_bytes)
# Keys: pred_boxes, pred_classes, scores,
#       secondary_pred_classes, secondary_scores, keep_after_nms
```

`pred_classes` contains Panel/Label detections from the panel detection model followed
by CFP/OCT/Retinal Imaging/Other detections from the image type model.
`secondary_pred_classes` contains the Plain/Annotated mark status for each image
detection (set to `"None"` for panel detections).
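
Given this ordering, the two models' outputs can be separated by class name. The sketch below makes several assumptions that the package documentation does not confirm. It assumes that `predict` returns a dict whose values are equal-length sequences, that `pred_classes` holds class-name strings, and that `keep_after_nms` is a boolean mask. If the package returns tensors or other container types, convert them to plain lists first. The helper `split_predictions` and its `min_score` threshold are hypothetical.

```python
from typing import Any, Dict, List

# Class names as listed in the model table; assumed to appear as strings.
PANEL_CLASSES = {"Panel", "Label"}
IMAGE_TYPE_CLASSES = {"CFP", "OCT", "Retinal Imaging", "Other"}


def split_predictions(predictions: Dict[str, Any], min_score: float = 0.5) -> Dict[str, List[dict]]:
    """Hypothetical helper: separate panel-model and image-type-model detections.

    Assumes every value is a sequence with one entry per detection and that
    `keep_after_nms`, if present, is a boolean mask.
    """
    keep = predictions.get("keep_after_nms")
    panels: List[dict] = []
    images: List[dict] = []
    for i, cls in enumerate(predictions["pred_classes"]):
        score = float(predictions["scores"][i])
        # Drop low-confidence detections and those removed by NMS.
        if score < min_score or (keep is not None and not keep[i]):
            continue
        det = {"box": tuple(predictions["pred_boxes"][i]), "class": cls, "score": score}
        if cls in PANEL_CLASSES:
            panels.append(det)
        elif cls in IMAGE_TYPE_CLASSES:
            # The Plain/Annotated mark status only exists for image detections.
            det["mark_status"] = predictions["secondary_pred_classes"][i]
            det["mark_score"] = float(predictions["secondary_scores"][i])
            images.append(det)
    return {"panels": panels, "images": images}


# Illustrative dummy input, not real model output.
example = {
    "pred_boxes": [(0, 0, 100, 80), (5, 5, 20, 20), (10, 10, 90, 70)],
    "pred_classes": ["Panel", "Label", "OCT"],
    "scores": [0.98, 0.91, 0.88],
    "secondary_pred_classes": ["None", "None", "Annotated"],
    "secondary_scores": [0.0, 0.0, 0.77],
    "keep_after_nms": [True, True, True],
}
result = split_predictions(example)
for det in result["images"]:
    print(det["class"], det["mark_status"], det["score"])
```

Keeping panel and image detections in separate lists keeps the secondary mark-status fields off panel detections, where they only carry the `"None"` placeholder.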

## Training

Both RetinaNet models use a ResNet backbone with FPN and were fine-tuned from an
ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
dataset (merged with PanelSeg and ImageCLEF2016 for the panel detection model).
The ResNet-50 classifier was initialized from ImageNet-pretrained weights and trained
for 35 epochs with random cropping, flips, affine transformations, and color
augmentation.

## Dataset

The ground-truth annotations used for training are available as part of the
PubMed-Ophtha dataset:
[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)

## Citation

```bibtex
@article{hallitschke2026pubmed,
  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
  journal={arXiv preprint arXiv:2605.02720},
  year={2026}
}
```

## License

MIT. See [LICENSE](LICENSE).