Segment images into objects, instances, or scenes
Generate depth maps and 3D views from photos
Annotate and describe images with text prompts
A tiny vision-language model