- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 20
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48
Collections
Collections including paper arxiv:2511.01678
- vrgamedevgirl84/Wan14BT2VFusioniX
  Text-to-Video • Updated • 605
- TheStageAI/Elastic-mochi-1-preview
  Text-to-Video • Updated • 24 • 2
- nesaorg/animatediff-base
  Text-to-Video • Updated • 140
- 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
  Paper • 2506.18839 • Published • 13

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 68
- nablaNABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 126
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 92

- Colorful Diffuse Intrinsic Image Decomposition in the Wild
  Paper • 2409.13690 • Published • 13
- Latent Intrinsics Emerge from Training to Relight
  Paper • 2405.21074 • Published • 1
- Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
  Paper • 2409.14677 • Published • 15
- SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
  Paper • 2501.09756 • Published • 20