Collections
Discover the best community collections!
Collections including paper arxiv:2603.23386

- Utonia: Toward One Encoder for All Point Clouds
  Paper • 2603.03283 • Published • 185
- SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM
  Paper • 2603.23386 • Published • 40
- Learn2Fold: Structured Origami Generation with World Model Planning
  Paper • 2603.29585 • Published • 16
- ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
  Paper • 2604.07882 • Published • 9

- OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
  Paper • 2601.15369 • Published • 21
- Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
  Paper • 2601.15892 • Published • 53
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
  Paper • 2601.16208 • Published • 55
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
  Paper • 2601.11004 • Published • 30

- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
  Paper • 2507.07982 • Published • 34
- MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
  Paper • 2508.01242 • Published • 11
- Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
  Paper • 2603.18002 • Published • 13
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
  Paper • 2603.19235 • Published • 95

- Beyond Language Modeling: An Exploration of Multimodal Pretraining
  Paper • 2603.03276 • Published • 103
- Qwen3-Coder-Next Technical Report
  Paper • 2603.00729 • Published • 64
- Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
  Paper • 2603.03205 • Published • 13
- AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
  Paper • 2602.23166 • Published • 45

- MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
  Paper • 2508.14879 • Published • 69
- VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
  Paper • 2508.19247 • Published • 43
- Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
  Paper • 2508.17437 • Published • 37
- Multi-View 3D Point Tracking
  Paper • 2508.21060 • Published • 23