Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks • Paper 2602.23898 • Published Feb 27
Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning • Paper 2602.09439 • Published Feb 10
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions • Paper 2409.18042 • Published Sep 26, 2024