Datasets used in "Understanding the Design Space and Cross-Modality Transfer for Vision-Language Models"