Image Classification · AIoT · QNN

Model Details

The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general model deployment - to deploy models like CLIP, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within.

Model Type

The base model uses a ResNet50 with several modifications as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. There is also a variant of the model where the ResNet image encoder is replaced with a Vision Transformer.
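
For intuition, here is a minimal zero-shot classification sketch, assuming the public CLIP package from the OpenAI repository rather than the converted QNN artifact; the image path and label set are illustrative placeholders. The image and candidate captions are embedded into the same space, and the scaled cosine similarities act as classification logits.

```python
# Minimal zero-shot classification sketch, assuming the public CLIP
# package (https://github.com/openai/CLIP). The image path and label
# set below are illustrative placeholders.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # logits_per_image holds the scaled cosine similarity between the
    # image embedding and each text embedding; softmax yields per-label
    # probabilities.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the label set is supplied as free-form text at inference time, the same weights can classify against arbitrary categories, which is what the zero-shot claim above refers to.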

Model Versions

Initially, we released one CLIP model based on the Vision Transformer architecture equivalent to ViT-B/32, along with the RN50 model, which uses an architecture equivalent to ResNet-50.

As part of the staged release process, we have also released the RN101 model, as well as RN50x4, a RN50 scaled up 4x according to the EfficientNet scaling rule. In July 2021, we additionally released the RN50x16 and ViT-B/16 models, and in January 2022, the RN50x64 and ViT-L/14 models were released. Lastly, the ViT-L/14@336px model was released in April 2022.

Please see the CLIP paper, Learning Transferable Visual Models From Natural Language Supervision (arXiv:2103.00020), for further details about their specification.

Source Model

  • Input shape: [1x3x224x224] (image), [1x77] (tokenized text)
  • Number of parameters: 82.25M (image encoder), 60.61M (text encoder)
  • Model size: 329.00 MB (image encoder), 242.44 MB (text encoder)
  • Output shape: [1x512] (image embedding), [1x512] (text embedding)

The source model can be found in the official OpenAI CLIP repository at https://github.com/openai/CLIP.
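
Assuming the source model matches the public ViT-B/16 checkpoint, the shapes listed above can be sanity-checked as in the sketch below; the two encoders can also be called separately when only one modality is needed.

```python
# Shape check for the I/O listed above, assuming the public ViT-B/16
# checkpoint from https://github.com/openai/CLIP.
import clip
import torch

model, _ = clip.load("ViT-B/16", device="cpu")

image = torch.randn(1, 3, 224, 224)         # image input:  [1x3x224x224]
text = clip.tokenize(["a photo of a cat"])  # text input:   [1x77]

with torch.no_grad():
    image_features = model.encode_image(image)  # image output: [1x512]
    text_features = model.encode_text(text)     # text output:  [1x512]

print(tuple(image_features.shape), tuple(text_features.shape))  # (1, 512) (1, 512)
```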

Performance Reference

Please search for this model by name in Model Farm.

Inference & Model Conversion

Please search for this model by name in Model Farm.
