Model Card for vit_b16_pn_bioclip-v1

A ViT-B/16 image encoder from BioCLIP (Stevens et al.), converted to the Birder format for image feature extraction. The conversion preserves the original model weights and architecture. Trained on the large-scale TreeOfLife-10M dataset, the encoder is intended as a foundation for downstream computer vision tasks and is particularly suited to biological imagery spanning diverse taxonomic groups.

See https://huggingface.co/imageomics/bioclip for further details on the original BioCLIP model.

Model Details

  • Model Type: Image classification and detection backbone

  • Model Stats:

    • Params (M): 86.2
    • Input image size: 224 x 224
  • Dataset: Trained on the TreeOfLife-10M dataset

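  • Papers:

    • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929
    • BioCLIP: A Vision Foundation Model for the Tree of Life: https://arxiv.org/abs/2311.18803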
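As a rough sanity check on the "Params (M)" figure, the arithmetic below tallies the layers of a generic ViT-B/16 encoder. It is a sketch under assumptions that do not come from the Birder implementation: 12 transformer blocks, width 768, MLP width 3072, a class token, learned positional embeddings, biases on every linear layer, and no classification head. The helper names are hypothetical. It lands at roughly 85.8M; this card does not say what accounts for the gap to 86.2M, which may come from components the sketch leaves out (for example extra normalization or projection layers).

```python
# Rough parameter count for a generic ViT-B/16 encoder.
# Assumed layout (not taken from Birder): 12 blocks, width 768, MLP width 3072,
# class token, learned positional embeddings, biased linear layers, no head.


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm_params(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def vit_param_count(image_size: int = 224, patch_size: int = 16, channels: int = 3,
                    width: int = 768, depth: int = 12, mlp_width: int = 3072) -> int:
    num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196
    # Patch embedding: a conv with kernel = stride = patch_size, i.e. one linear map per patch
    patch_embed = linear_params(channels * patch_size * patch_size, width)
    class_token = width
    pos_embed = (num_patches + 1) * width  # +1 position for the class token
    block = (
        layer_norm_params(width)
        + linear_params(width, 3 * width)  # fused q, k, v projections
        + linear_params(width, width)  # attention output projection
        + layer_norm_params(width)
        + linear_params(width, mlp_width)  # MLP expansion
        + linear_params(mlp_width, width)  # MLP contraction
    )
    final_norm = layer_norm_params(width)
    return patch_embed + class_token + pos_embed + depth * block + final_norm


total = vit_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 85,798,656 parameters (~85.8M)
```

Nearly all of the count sits in the 12 transformer blocks (about 7.09M each), so the input resolution only affects the small positional-embedding term.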

Model Usage

Image Embeddings

import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("vit_b16_pn_bioclip-v1", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array of shape (1, 768)
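Since this model is used for feature extraction, a typical next step is comparing embeddings, for example to find the most similar image in a gallery. A minimal sketch, assuming each embedding is the (1, 768) NumPy array described above; `cosine_similarity` and `nearest_neighbor` are hypothetical helpers, not part of Birder, and the random arrays only stand in for real `infer_image` outputs.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two (1, D) embedding arrays."""
    a = a.reshape(-1)
    b = b.reshape(-1)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest_neighbor(query: np.ndarray, gallery: np.ndarray) -> int:
    """Index of the gallery row (N, D) most similar to a (1, D) query."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q.T  # (N, 1) cosine similarities
    return int(np.argmax(scores[:, 0]))


# Random stand-ins for embeddings returned by infer_image (illustration only)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((1, 768)).astype(np.float32)
emb_b = rng.standard_normal((1, 768)).astype(np.float32)
gallery = np.concatenate([emb_b, emb_a], axis=0)  # (2, 768)

print(cosine_similarity(emb_a, emb_a))  # ~1.0 for identical vectors
print(nearest_neighbor(emb_a, gallery))  # 1, the row holding emb_a
```

Normalizing the gallery once and using a single matrix product keeps retrieval cheap when many images are compared against the same query.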

Detection Feature Map

from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("vit_b16_pn_bioclip-v1", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('neck', torch.Size([1, 768, 14, 14]))]
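The 14 x 14 spatial size in the example output follows from the input geometry: a 224 x 224 image cut into 16 x 16 patches gives 224 / 16 = 14 patches per side, and each patch token carries 768 channels. The sketch below shows one plausible way a ViT token sequence can be rearranged into such a map, assuming a single leading class token that is discarded and row-major patch ordering. It only illustrates the shape arithmetic and is not taken from Birder's `detection_features` implementation.

```python
import numpy as np

# Assumed geometry: 224 x 224 input, 16 x 16 patches, width 768 (ViT-B/16)
image_size, patch_size, width = 224, 16, 768
grid = image_size // patch_size  # 14 patches per side
num_patches = grid * grid  # 196 patch tokens

# Stand-in for encoder output: one class token followed by the patch tokens,
# laid out as (batch, tokens, channels)
tokens = np.random.default_rng(0).standard_normal((1, 1 + num_patches, width))

patch_tokens = tokens[:, 1:, :]  # drop the class token -> (1, 196, 768)
# Row-major order: token i maps to grid cell (i // 14, i % 14)
feature_map = patch_tokens.reshape(1, grid, grid, width).transpose(0, 3, 1, 2)
print(feature_map.shape)  # (1, 768, 14, 14)
```

The final transpose moves channels ahead of the spatial axes, matching the (batch, channels, height, width) layout shown for the `neck` output.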

Citation

@misc{dosovitskiy2021imageworth16x16words,
      title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
      author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
      year={2021},
      eprint={2010.11929},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2010.11929},
}

@misc{stevens2024bioclipvisionfoundationmodel,
      title={BioCLIP: A Vision Foundation Model for the Tree of Life},
      author={Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
      year={2024},
      eprint={2311.18803},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2311.18803},
}

@software{ilharco_gabriel_2021_5143773,
      author={Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
      title={OpenCLIP},
      year={2021},
      doi={10.5281/zenodo.5143773},
}