Papers - Image - Encoders - Clip
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper
• 2403.18978
• Published • 15
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper
• 2404.02733
• Published • 22
OmniFusion Technical Report
Paper
• 2404.06212
• Published • 77
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper
• 2404.07448
• Published • 12
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper
• 2404.09204
• Published • 11
MoDE: CLIP Data Experts via Clustering
Paper
• 2404.16030
• Published • 15
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper
• 2404.17672
• Published • 19
Stylus: Automatic Adapter Selection for Diffusion Models
Paper
• 2404.18928
• Published • 15
Data curation via joint example selection further accelerates multimodal learning
Paper
• 2406.17711
• Published • 3
MAVIS: Mathematical Visual Instruction Tuning
Paper
• 2407.08739
• Published • 32
Law of Vision Representation in MLLMs
Paper
• 2408.16357
• Published • 95
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper
• 2411.04709
• Published • 27
SLIP: Self-supervision meets Language-Image Pre-training
Paper
• 2112.12750
• Published • 1
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Paper
• 2411.02327
• Published • 11
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper
• 2203.03897
• Published • 1