Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 12 days ago • 36
Welcome Gemma 4: Frontier multimodal intelligence on device Article • Published 13 days ago • 841
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published 19 days ago • 15
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published 22 days ago • 26
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 22 days ago • 123
Hidden Dynamics of Massive Activations in Transformer Training Paper • 2508.03616 • Published Aug 5, 2025 • 19
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation Paper • 2410.17799 • Published Oct 23, 2024 • 12
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 29 days ago • 153
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning Paper • 2401.08553 • Published Jan 16, 2024 • 2
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization Paper • 2303.14189 • Published Mar 24, 2023 • 5