- BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
  Paper • 2401.17053 • Published • 33
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
  Paper • 2509.21657 • Published • 4
- VGGT: Visual Geometry Grounded Transformer
  Paper • 2503.11651 • Published • 37
- GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing
  Paper • 2508.02831 • Published • 12
Collections
Collections including paper arxiv:2507.07982
- Energy-Based Transformers are Scalable Learners and Thinkers
  Paper • 2507.02092 • Published • 69
- MOSPA: Human Motion Generation Driven by Spatial Audio
  Paper • 2507.11949 • Published • 25
- Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
  Paper • 2507.09751 • Published • 2
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
  Paper • 2507.07982 • Published • 34

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
  Paper • 2507.07982 • Published • 34
- MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
  Paper • 2508.01242 • Published • 11
- Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
  Paper • 2603.18002 • Published • 13
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
  Paper • 2603.19235 • Published • 95