Envision Eye Imaging Classifier

SetFit binary classifier for identifying eye imaging datasets from scientific metadata.

Developed by: FAIR Data Innovations Hub in collaboration with the EyeACT Study

Model Description

Uses sentence-transformers/all-mpnet-base-v2 as the embedding backbone and assigns each record one of two labels:

  • EYE_IMAGING (1): Actual ophthalmic imaging datasets (fundus, OCT, OCTA, corneal imaging)
  • NEGATIVE (0): Everything else (software, non-imaging eye data, unrelated records)
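A SetFit model pairs a sentence-transformer body with a small classification head. SetFit's default head is scikit-learn's LogisticRegression, but this card does not say which head was used. The sketch below assumes a logistic-regression head and shows how one embedding becomes one of the two labels above. The helpers `logistic_head` and `classify`, the 0.5 cut-off, and the toy 3-dimensional vectors and weights are all hypothetical. They stand in for the real 768-dimensional embeddings and the trained parameters.

```python
import math
from typing import Sequence

# Label ids as given in the Model Description.
LABELS = {0: "NEGATIVE", 1: "EYE_IMAGING"}


def logistic_head(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Return P(EYE_IMAGING) for one embedding under a logistic-regression head."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    z = sum(e * w for e, w in zip(embedding, weights)) + bias
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(embedding: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Map the head's probability to a label name with a 0.5 cut-off."""
    return LABELS[1 if logistic_head(embedding, weights, bias) >= 0.5 else 0]


# Hypothetical toy parameters: real embeddings are 768-dimensional and the
# trained head's weights live inside the released model.
toy_weights = [1.5, -0.5, 0.8]
toy_bias = -0.5
print(classify([0.9, 0.1, 0.7], toy_weights, toy_bias))    # EYE_IMAGING
print(classify([-0.6, 0.8, -0.2], toy_weights, toy_bias))  # NEGATIVE
```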

Validation

Spot-check (33 expert-verified Zenodo records)

Metric           Score
Accuracy         0.939 (31/33)
Macro F1         0.923
EYE_IMAGING F1   0.889 (P=0.889, R=0.889)
NEGATIVE F1      0.958 (P=0.958, R=0.958)

P = precision, R = recall.
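The spot-check scores can be rebuilt from a confusion matrix. The counts below are inferred from the reported numbers and were not taken from the evaluation logs. A score of 31/33 correct leaves two errors. Equal precision and recall for EYE_IMAGING (0.889 = 8/9) then imply one false positive and one false negative, which gives 9 EYE_IMAGING and 24 NEGATIVE records. `ClassCounts` and `prf` are illustrative helpers.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    tp: int  # records predicted as this class that truly belong to it
    fp: int  # records predicted as this class that belong to the other class
    fn: int  # records of this class predicted as the other class


def prf(c: ClassCounts) -> tuple[float, float, float]:
    """Precision, recall and F1 for one class (0.0 when undefined)."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Counts inferred from the reported scores (not from the evaluation logs).
eye = ClassCounts(tp=8, fp=1, fn=1)
neg = ClassCounts(tp=23, fp=1, fn=1)  # NEGATIVE's FP/FN mirror EYE_IMAGING's FN/FP

eye_p, eye_r, eye_f1 = prf(eye)
neg_p, neg_r, neg_f1 = prf(neg)
total = (eye.tp + eye.fn) + (neg.tp + neg.fn)  # all records, 9 + 24
accuracy = (eye.tp + neg.tp) / total
macro_f1 = (eye_f1 + neg_f1) / 2  # unweighted mean over the two classes

print(f"accuracy={accuracy:.3f} ({eye.tp + neg.tp}/{total}) macro_f1={macro_f1:.4f}")
print(f"EYE_IMAGING P={eye_p:.3f} R={eye_r:.3f} F1={eye_f1:.3f}")
print(f"NEGATIVE    P={neg_p:.3f} R={neg_r:.3f} F1={neg_f1:.3f}")
```

Accuracy and both per-class scores match the table. The macro F1 comes out at about 0.9236, while the card reports 0.923. The 0.001 gap is most likely rounding, because averaging the already-rounded per-class scores (0.889 and 0.958) gives 0.9235.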

Held-out test set (20% stratified split)

Metric           Score
Accuracy         0.940
Macro F1         0.936
EYE_IMAGING F1   0.922 (P=0.887, R=0.959)
NEGATIVE F1      0.951 (P=0.975, R=0.929)
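The card does not say which tool produced the split. In a stratified split each class is sampled separately, so the held-out set keeps roughly the same EYE_IMAGING:NEGATIVE ratio as the full data. The sketch below assumes the 20% is drawn from the 994 labeled examples listed under Training. The `stratified_split` helper and its seed are hypothetical, and the resulting test-set size is only what this sketch produces. It is not a reported figure.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), sampling test_fraction of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class sizes from the Training section: 365 EYE_IMAGING (1), 629 NEGATIVE (0).
labels = [1] * 365 + [0] * 629
train_idx, test_idx = stratified_split(labels)
test_pos = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_pos)  # 795 199 73
```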

Multi-repository spot-check (6,833 records across 6 sources)

Source     Records  EYE_IMAGING F1  Precision  Recall
Zenodo     514      0.677           0.537      0.917
DataCite   1,836    0.866           0.858      0.874
Figshare   2,000    0.833           0.788      0.884
Kaggle     732      0.739           0.939      0.610
Dryad      89       0.764           0.750      0.778
NEI        1,662    0.814           0.931      0.724
Overall    6,833    0.822           0.845      0.800
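Each row's F1 should equal the harmonic mean of its precision and recall, F1 = 2PR / (P + R). The check below copies the values from the table and recomputes F1 for every row. The names `TABLE`, `OVERALL` and `harmonic_f1` are illustrative only.

```python
from statistics import mean

# (records, EYE_IMAGING F1, precision, recall), copied from the table above.
TABLE = {
    "Zenodo": (514, 0.677, 0.537, 0.917),
    "DataCite": (1836, 0.866, 0.858, 0.874),
    "Figshare": (2000, 0.833, 0.788, 0.884),
    "Kaggle": (732, 0.739, 0.939, 0.610),
    "Dryad": (89, 0.764, 0.750, 0.778),
    "NEI": (1662, 0.814, 0.931, 0.724),
}
OVERALL = (6833, 0.822, 0.845, 0.800)


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for source, (_, f1, p, r) in TABLE.items():
    print(f"{source:<9} reported={f1:.3f} recomputed={harmonic_f1(p, r):.3f}")

# The per-source record counts should add up to the Overall row.
assert sum(n for n, _, _, _ in TABLE.values()) == OVERALL[0]

pooled_f1 = harmonic_f1(OVERALL[2], OVERALL[3])
per_source_mean = mean(f1 for _, f1, _, _ in TABLE.values())
print(f"Overall F1 from overall P and R: {pooled_f1:.3f}")
print(f"Unweighted mean of per-source F1: {per_source_mean:.3f}")
```

Every reported F1 matches the recomputed value to within 0.001, a rounding effect of the three-decimal P and R. The Overall F1 (0.822) is well above the unweighted mean of the per-source scores (about 0.782). That pattern fits an Overall row computed over all 6,833 records pooled, with larger sources such as Figshare and DataCite carrying more weight than Zenodo.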

Training

  • Base model: sentence-transformers/all-mpnet-base-v2 (768-dimensional embeddings)
  • Training data: 994 examples (365 EYE_IMAGING, 629 NEGATIVE) from multi-repository sources (Zenodo, Figshare, Dryad, Kaggle, NEI)
  • Dataset: fairdataihub/envision-eye-imaging-training-data
  • Epochs: up to 10, with early stopping (patience = 3)
  • Batch size: 16
  • Learning rate: 2e-5 (default)
  • Scheduler: linear with 10% warmup
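The training script is not shown in the card. The sketch below illustrates what the listed schedule and stopping rule mean. It assumes "Epochs: 10" is an upper bound and that early stopping watches a validation loss. Under those assumptions, the learning rate rises linearly from 0 to 2e-5 over the first 10% of optimizer steps and then decays linearly to 0. Training halts once three consecutive epochs fail to beat the best loss. The function names, step counts and example losses are hypothetical.

```python
from typing import Sequence

# Values from the Training section.
BASE_LR = 2e-5
WARMUP_RATIO = 0.1
PATIENCE = 3
MAX_EPOCHS = 10


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = BASE_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


def early_stop_epoch(val_losses: Sequence[float],
                     patience: int = PATIENCE, max_epochs: int = MAX_EPOCHS) -> int:
    """1-based epoch after which training stops, given per-epoch validation losses."""
    best = float("inf")
    stale_epochs = 0  # consecutive epochs without a new best loss
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale_epochs = loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Hypothetical run: 1,000 optimizer steps, so warmup covers the first 100.
print(linear_warmup_lr(50, 1000), linear_warmup_lr(100, 1000), linear_warmup_lr(1000, 1000))
# Hypothetical losses: best at epoch 3, then three non-improving epochs.
print(early_stop_epoch([0.50, 0.42, 0.40, 0.41, 0.43, 0.45, 0.39]))  # 6
```

The warmup function has the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers. The difference is that it returns an absolute learning rate, whereas Transformers applies the schedule as a multiplier on the optimizer's rate.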

Usage

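from setfit import SetFitModel

# Download the fine-tuned classifier from the Hugging Face Hub
model = SetFitModel.from_pretrained("fairdataihub/envision-eye-imaging-classifier")

# One prediction per input text (EYE_IMAGING = 1, NEGATIVE = 0)
predictions = model.predict(["Retinal OCT dataset for diabetic retinopathy"])
print(predictions)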
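`SetFitModel` also provides `predict_proba`, which returns class probabilities instead of hard labels. Assume its columns follow the label ids above, with column 0 for NEGATIVE and column 1 for EYE_IMAGING. A custom cut-off then trades precision for recall. That could matter for sources such as Kaggle or NEI, where recall trails precision in the multi-repository table. The `label_with_threshold` helper, the probability values and the 0.35 cut-off are hypothetical. Any real threshold should be tuned on held-out data.

```python
from typing import List, Sequence

LABELS = ("NEGATIVE", "EYE_IMAGING")  # index 0 and 1, as in the Model Description


def label_with_threshold(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Turn per-text [P(NEGATIVE), P(EYE_IMAGING)] rows into label names."""
    return [LABELS[1] if row[1] >= threshold else LABELS[0] for row in probas]


# Hypothetical probabilities standing in for model.predict_proba(...) output.
probas = [[0.08, 0.92], [0.61, 0.39], [0.70, 0.30]]
print(label_with_threshold(probas))        # ['EYE_IMAGING', 'NEGATIVE', 'NEGATIVE']
print(label_with_threshold(probas, 0.35))  # ['EYE_IMAGING', 'EYE_IMAGING', 'NEGATIVE']
```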

Citation

  • EyeACT Envision project
  • FAIR Data Innovations Hub (fairdataihub.org)
  • sentence-transformers/all-mpnet-base-v2

Contact

EyeACT team: eyeactstudy.org

Model Files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32