This model is similar to https://huggingface.co/nlpconnect/vit-gpt2-image-captioning, but it uses DistilGPT2 instead of GPT-2 as the text decoder.
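As a rough illustration of that architecture, the sketch below assembles an equivalent encoder-decoder configuration with the Hugging Face `transformers` library: a ViT image encoder feeding a 6-layer GPT-2-style text decoder, which is the shape of DistilGPT2 (same width and vocabulary as GPT-2 small, half the layers). This is an assumption-based sketch built from default configs, not the author's actual training or conversion code, and it builds configurations only, so no weights are downloaded.

```python
from transformers import GPT2Config, ViTConfig, VisionEncoderDecoderConfig

# Image encoder: ViTConfig defaults correspond to ViT-Base
# (12 layers, hidden size 768, 16x16 patches, 224x224 input).
encoder_config = ViTConfig()

# Text decoder: DistilGPT2 keeps GPT-2's width (768) and vocabulary (50257)
# but uses 6 transformer layers instead of GPT-2 small's 12.
decoder_config = GPT2Config(n_layer=6)

# Combining the two marks the GPT-2 config as a decoder and enables
# cross-attention so the text decoder can attend to the ViT patch embeddings.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)

print("encoder layers:", config.encoder.num_hidden_layers)
print("decoder layers:", config.decoder.n_layer)
print("decoder cross-attention:", config.decoder.add_cross_attention)
```

Passing this `config` to `VisionEncoderDecoderModel(config=config)` would give a randomly initialized model with the same layout; in practice the pretrained weights would be loaded from the model repository instead.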