Collections
Collections including paper arxiv:2604.28190
-
- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 178
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 72
- Post-LayerNorm Is Back: Stable, Expressive, and Deep
  Paper • 2601.19895 • Published • 27
-
- Depth Anything V2
  Paper • 2406.09414 • Published • 103
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 39
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 122
-
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
  Paper • 2605.00658 • Published • 80
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Paper • 2604.28185 • Published • 86
- Representation Fréchet Loss for Visual Generation
  Paper • 2604.28190 • Published • 28
- Co-Evolving Policy Distillation
  Paper • 2604.27083 • Published • 61
-
- Seedream 4.0: Toward Next-generation Multimodal Image Generation
  Paper • 2509.20427 • Published • 84
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Paper • 2511.22699 • Published • 245
- RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
  Paper • 2512.00473 • Published • 27
- Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
  Paper • 2602.03139 • Published • 44