arxiv:2407.07726

PaliGemma: A versatile 3B VLM for transfer

Published on Jul 10, 2024
Submitted by AK on Jul 11, 2024
#1 Paper of the day

Abstract

AI-generated summary: PaliGemma, a versatile Vision-Language Model based on SigLIP-So400m and Gemma-2B, demonstrates strong performance across numerous open-world tasks, including specialized areas like remote sensing and segmentation.

PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile, broadly knowledgeable base model that transfers effectively. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks but also more specialized tasks such as remote sensing and segmentation.
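The abstract describes PaliGemma as a SigLIP-So400m vision encoder coupled to the Gemma-2B language model. The sketch below illustrates, in plain Python, one common way such a coupling is wired: the encoder turns a square image into one token per patch, a linear layer projects each patch vector to the language model's width, and the projected image tokens are placed ahead of the text embeddings. It is a minimal illustration under stated assumptions, not the authors' implementation or any library's API: the patch size (14) and widths (1152, 2048) are assumed from the public SigLIP-So400m/14 and Gemma-2B configurations, and `num_image_tokens`, `project`, and `assemble_input` are hypothetical names.

```python
# Minimal sketch of a vision-encoder -> language-model coupling.
# ASSUMPTIONS: the constants are taken from the public SigLIP-So400m/14 and
# Gemma-2B configs; all function names are hypothetical, not PaliGemma's code.
SIGLIP_PATCH = 14    # assumed patch size (pixels per side)
SIGLIP_WIDTH = 1152  # assumed vision-encoder output width
GEMMA_WIDTH = 2048   # assumed language-model embedding width


def num_image_tokens(resolution: int, patch: int = SIGLIP_PATCH) -> int:
    """Patch tokens emitted for a square image of side `resolution` pixels."""
    if resolution % patch != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    side = resolution // patch
    return side * side


def project(vec: list[float], weight: list[list[float]]) -> list[float]:
    """Hypothetical linear projector: `weight` has d_out rows of length d_in."""
    for row in weight:
        if len(row) != len(vec):
            raise ValueError("weight rows must match the input width")
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def assemble_input(
    image_feats: list[list[float]],
    text_embeds: list[list[float]],
    weight: list[list[float]],
) -> list[list[float]]:
    """Project each image patch vector to LM width and prepend it to the text embeddings."""
    image_tokens = [project(f, weight) for f in image_feats]
    return image_tokens + [list(t) for t in text_embeds]


if __name__ == "__main__":
    print(num_image_tokens(224))  # 224 // 14 = 16 patches per side
    # Toy 2 -> 3 projection standing in for a GEMMA_WIDTH x SIGLIP_WIDTH matrix.
    toy_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    seq = assemble_input([[1.0, 0.0], [0.0, 1.0]], [[9.0, 9.0, 9.0]], toy_weight)
    print(seq)
```

For example, `num_image_tokens(224)` gives 16 × 16 = 256 image tokens. Because the count grows with the square of the resolution, doubling the input side length quadruples the number of image tokens the language model has to process.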

Community

Paper submitter


Also read hf.co/blog/paligemma

Are the fine-tuned models going to be available on Hugging Face?



Models citing this paper: 177
Datasets citing this paper: 1
Spaces citing this paper: 113
Collections including this paper: 24