Multimodal
Visual Representation Alignment for Multimodal Large Language Models
Paper
• 2509.07979
• Published • 84
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper
• 2509.05263
• Published • 11
Symbolic Graphics Programming with Large Language Models
Paper
• 2509.05208
• Published • 47
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper
• 2509.12201
• Published • 107
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
Paper
• 2509.06079
• Published • 6
Lost in Embeddings: Information Loss in Vision-Language Models
Paper
• 2509.11986
• Published • 29
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Paper
• 2509.11362
• Published • 5
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper
• 2509.11543
• Published • 50
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Paper
• 2509.14142
• Published • 10
Qwen3-Omni Technical Report
Paper
• 2509.17765
• Published • 153