WorldScore: A Unified Evaluation Benchmark for World Generation Paper • 2504.00983 • Published Apr 1, 2025 • 1
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks Paper • 1810.10191 • Published Oct 24, 2018 • 1
Deep Visual-Semantic Alignments for Generating Image Descriptions Paper • 1412.2306 • Published Dec 7, 2014
Dynamic-Resolution Model Learning for Object Pile Manipulation Paper • 2306.16700 • Published Jun 29, 2023 • 6
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 24
Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy Paper • 1912.07726 • Published Dec 16, 2019 • 1
Unsupervised Learning of Long-Term Motion Dynamics for Videos Paper • 1701.01821 • Published Jan 7, 2017
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Paper • 1602.07332 • Published Feb 23, 2016
Perceptual Losses for Real-Time Style Transfer and Super-Resolution Paper • 1603.08155 • Published Mar 27, 2016