-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
Collections
Discover the best community collections!
Collections including paper arxiv:2511.19365
-
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
The Collapse of Patches
Paper • 2511.22281 • Published • 6 -
Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories
Paper • 2511.23342 • Published • 15 -
Glance: Accelerating Diffusion Models with 1 Sample
Paper • 2512.02899 • Published • 30
-
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13 -
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 68 -
nablaNABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 126 -
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 92
-
Causal Diffusion Transformers for Generative Modeling
Paper • 2412.12095 • Published • 23 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 31 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 28
-
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper • 2602.02493 • Published • 46 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
Paper • 2603.10744 • Published • 7 -
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration
Paper • 2603.24800 • Published • 68
-
DoPE: Denoising Rotary Position Embedding
Paper • 2511.09146 • Published • 98 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Latent Collaboration in Multi-Agent Systems
Paper • 2511.20639 • Published • 127 -
Video Generation Models Are Good Latent Reward Models
Paper • 2511.21541 • Published • 47
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 77 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 56 -
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 22 -
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
-
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper • 2602.02493 • Published • 46 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
Paper • 2603.10744 • Published • 7 -
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration
Paper • 2603.24800 • Published • 68
-
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
The Collapse of Patches
Paper • 2511.22281 • Published • 6 -
Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories
Paper • 2511.23342 • Published • 15 -
Glance: Accelerating Diffusion Models with 1 Sample
Paper • 2512.02899 • Published • 30
-
DoPE: Denoising Rotary Position Embedding
Paper • 2511.09146 • Published • 98 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Latent Collaboration in Multi-Agent Systems
Paper • 2511.20639 • Published • 127 -
Video Generation Models Are Good Latent Reward Models
Paper • 2511.21541 • Published • 47
-
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13 -
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 68 -
nablaNABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 126 -
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 92
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 77 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 56 -
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 22 -
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34
-
Causal Diffusion Transformers for Generative Modeling
Paper • 2412.12095 • Published • 23 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 31 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 28
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23