Collections

Discover the best community collections!

Collections including paper arxiv:2507.07982

Unbounded 3D worlds

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Paper • 2401.17053 • Published Jan 30, 2024 • 33
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction

Paper • 2509.21657 • Published Sep 25, 2025 • 4
VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14, 2025 • 37
GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing

Paper • 2508.02831 • Published Aug 4, 2025 • 12

Energy-Based Transformers are Scalable Learners and Thinkers

Paper • 2507.02092 • Published Jul 2, 2025 • 69
MOSPA: Human Motion Generation Driven by Spatial Audio

Paper • 2507.11949 • Published Jul 16, 2025 • 25
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations

Paper • 2507.09751 • Published Jul 13, 2025 • 2
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
4KAgent: Agentic Any Image to 4K Super-Resolution

Paper • 2507.07105 • Published Jul 9, 2025 • 107
Depth Anything at Any Condition

Paper • 2507.01634 • Published Jul 2, 2025 • 49

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Paper • 2603.18002 • Published Mar 18 • 13
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95
