	---
	license: mit
	tags:
	- ophthalmology
	- object-detection
	- image-classification
	- medical-imaging
	- fundus
	- oct
	- detectron2
	- figure-parsing
	- pytorch
	---

	# PubMed-Ophtha Detection Models

	This repository contains the three detection and classification models used in the
	[PubMed-Ophtha](https://arxiv.org/abs/2605.02720) dataset pipeline for parsing
	ophthalmological figures from scientific publications.

	Paper: Hallitschke V.J., Eickhoff C., Berens P. *PubMed-Ophtha: An open resource
	for training ophthalmology vision-language models on scientific literature.* arXiv:2605.02720 (2026).

	## Models

	The repository contains three model checkpoints under `models/`:

	| Directory | Checkpoint | Framework | Architecture | Task | Classes |
	|---|---|---|---|---|---|
	| `imaging_type_detection_1515892632/` | `model_0003909.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Image type detection | CFP, OCT, Retinal Imaging, Other |
	| `panel_detection_1020880423/` | `model_0026865.pth` | PyTorch (Detectron2) | RetinaNet + ResNet FPN | Panel & identifier detection | Panel, Label |
	| `mark_status_classifier_482239176/` | `model_epoch_7.pth` | PyTorch | ResNet-50 | Mark status classification | Plain, Annotated |

	Each Detectron2 model directory also contains a `config.yaml` required for inference.
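
The table above can be turned into a small path lookup for use after downloading. This is a minimal sketch, assuming checkpoints sit at `models/<directory>/<checkpoint>` under the download directory and that each Detectron2 `config.yaml` sits next to its checkpoint. The `MODEL_FILES` mapping, its short keys, and the `model_paths` helper are hypothetical names for illustration and are not part of the `pubmed-ophtha` package.

```python
from pathlib import Path

# Directory and checkpoint names copied from the table above.
# The short keys ("imaging_type", ...) are hypothetical labels for this sketch.
# The last element records whether a config.yaml is expected (Detectron2 only).
MODEL_FILES = {
    "imaging_type": ("imaging_type_detection_1515892632", "model_0003909.pth", True),
    "panel": ("panel_detection_1020880423", "model_0026865.pth", True),
    "mark_status": ("mark_status_classifier_482239176", "model_epoch_7.pth", False),
}


def model_paths(name, root="."):
    """Return (checkpoint_path, config_path_or_None) for one model.

    Assumes the layout <root>/models/<directory>/<checkpoint>; only the
    Detectron2 models are stated to ship a config.yaml.
    """
    directory, checkpoint, has_config = MODEL_FILES[name]
    model_dir = Path(root) / "models" / directory
    config = model_dir / "config.yaml" if has_config else None
    return model_dir / checkpoint, config
```

For example, `model_paths("panel", root="weights")` would point at `weights/models/panel_detection_1020880423/model_0026865.pth` and the `config.yaml` in the same directory.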

	### Panel Detection Model

	Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures.
	Trained on the PubMed-Ophtha-Annotation dataset merged with
	[PanelSeg](https://doi.org/10.1145/3331184.3331253) and
	[ImageCLEF2016](https://www.imageclef.org/2016/medical), starting from an
	ImageCLEF2016-pretrained checkpoint.

	- mAP@0.50: 0.909 (panels), 0.903 (panel identifiers)
	- mAP@0.95: 0.532 (panels), 0.018 (panel identifiers)
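
Assuming the usual convention that the number after `@` is the minimum intersection-over-union (IoU) a predicted box must reach to count as a match, the sketch below illustrates how much stricter 0.95 is than 0.50. It computes IoU for axis-aligned `(x1, y1, x2, y2)` boxes. The box sizes are made-up examples rather than values from the dataset, and the `iou` helper is not part of the `pubmed-ophtha` package.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the same 1-pixel horizontal shift on a small and a large box.
small_truth, small_pred = (0, 0, 20, 20), (1, 0, 21, 20)
large_truth, large_pred = (0, 0, 400, 400), (1, 0, 401, 400)

small_iou = iou(small_truth, small_pred)  # 380 / 420, about 0.905
large_iou = iou(large_truth, large_pred)  # 159600 / 160400, about 0.995
```

In this made-up case the small box clears the 0.50 threshold but misses 0.95 after a one-pixel shift, while the large box clears both. Small boxes are in general more sensitive to localization error at strict thresholds.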

	### Image Type Detection Model

	Detects individual images within a panel and assigns each a retinal imaging modality:
	color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging
	(ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).

	- mAP@0.50: 0.892
	- mAP@0.95: 0.558

	### Mark Status Classifier

	A ResNet-50 binary classifier applied to cropped image regions detected by the image
	type model. Predicts whether an image contains annotation marks such as arrows, dots,
	or bounding boxes.

	- Accuracy: 89.5% on the held-out test set

	## Usage

	Models are consumed by the [`pubmed-ophtha`](https://github.com/berenslab/pubmed-ophtha)
	Python package. Download all weights with:

	```bash
	pip install pubmed-ophtha
	pubmed-ophtha-split pull-models --local-dir .
	```

	Or download directly via `huggingface_hub`:

	```python
	from huggingface_hub import snapshot_download

	snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
	```
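
To fetch a single model instead of the whole repository, `snapshot_download` accepts an `allow_patterns` filter. The following is a sketch, assuming the repository files are laid out as `models/<directory>/...`; `patterns_for` and `download_models` are hypothetical helper names introduced here.

```python
def patterns_for(directories):
    """Build glob patterns matching every file under the given model directories."""
    return [f"models/{d.strip('/')}/*" for d in directories]


def download_models(directories, local_dir="."):
    """Download only the selected model directories (requires network access)."""
    # Imported lazily so the pattern helper can be used without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="pubmed-ophtha/detection-models",
        allow_patterns=patterns_for(directories),
        local_dir=local_dir,
    )
```

Calling `download_models(["panel_detection_1020880423"])` would then pull only the panel detection checkpoint and its config.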

	After downloading, run inference via the `DetectronFigureSplitter`:

	```python
	from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
	from pubmed_ophtha.const.models import get_default_model_args

	splitter = DetectronFigureSplitter(**get_default_model_args())

	# Read the figure as raw bytes; the splitter takes the encoded image.
	with open("figure.png", "rb") as f:
	    image_bytes = f.read()

	predictions = splitter.predict(image_bytes)
	# Keys: pred_boxes, pred_classes, scores,
	# secondary_pred_classes, secondary_scores, keep_after_nms
	```

	`pred_classes` contains Panel/Label detections from the panel detection model followed
	by CFP/OCT/Retinal Imaging/Other detections from the image type model.
	`secondary_pred_classes` contains the Plain/Annotated mark status for each image
	detection (set to `"None"` for panel detections).
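
Because the outputs are parallel fields, it can help to zip them into one record per detection. The sketch below is illustrative only. It assumes each key holds a plain Python sequence of equal length and that `keep_after_nms` is a per-detection boolean mask; neither assumption is confirmed by this README. The `to_records` helper and the sample values are hypothetical.

```python
def to_records(predictions):
    """Zip parallel prediction fields into a list of per-detection dicts.

    Assumes equal-length sequences and a boolean keep_after_nms mask
    (both are assumptions of this sketch); suppressed detections are skipped.
    """
    fields = ("pred_boxes", "pred_classes", "scores",
              "secondary_pred_classes", "secondary_scores", "keep_after_nms")
    rows = zip(*(predictions[name] for name in fields))
    records = []
    for box, cls, score, mark, mark_score, keep in rows:
        if not keep:
            continue
        records.append({
            "box": tuple(box),
            "class": cls,
            "score": score,
            # Panel/Label detections carry the string "None" as mark status.
            "mark_status": None if mark == "None" else mark,
            "mark_score": mark_score,
        })
    return records


# Made-up example values, not real model output.
example = {
    "pred_boxes": [[0, 0, 100, 80], [5, 5, 60, 70], [6, 6, 61, 71]],
    "pred_classes": ["Panel", "OCT", "OCT"],
    "scores": [0.98, 0.91, 0.40],
    "secondary_pred_classes": ["None", "Annotated", "Plain"],
    "secondary_scores": [0.0, 0.87, 0.55],
    "keep_after_nms": [True, True, False],
}
records = to_records(example)
```

If the real outputs turn out to be tensors or index lists, the zipping step would need adjusting before use.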

	## Training

	Both RetinaNet models use a ResNet backbone with FPN, fine-tuned from an
	ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation
	dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint
	for 35 epochs with random cropping, flips, affine transformations, and color
	augmentation.

	## Dataset

	The ground-truth annotations used for training are available as part of the
	PubMed-Ophtha dataset:
	[huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha](https://huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha)

	## Citation

	```bibtex
	@article{hallitschke2026pubmed,
	  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
	  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
	  journal={arXiv preprint arXiv:2605.02720},
	  year={2026}
	}
	```

	## License

	MIT — see [LICENSE](LICENSE).