-
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper • 2511.22699 • Published • 245 -
A Survey on Diffusion Language Models
Paper • 2508.10875 • Published • 34 -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper • 2403.03206 • Published • 71
Collections including paper arxiv:2212.09748
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 31 -
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 697k • 12.7k -
Qwen/Qwen2-VL-7B-Instruct
Image-Text-to-Text • 8B • Updated • 1.41M • 1.27k -
zer0int/CLIP-GmP-ViT-L-14
Zero-Shot Image Classification • Updated • 24.5k • 516
-
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Paper • 2311.15127 • Published • 15 -
Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 21 -
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 18
-
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 87 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 85 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 29
-
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
Paper • 2409.07452 • Published • 21 -
Generating 3D-Consistent Videos from Unposed Internet Photos
Paper • 2411.13549 • Published -
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Paper • 2411.04928 • Published • 56 -
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Paper • 2412.12093 • Published
-
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Paper • 2306.00989 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 25
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 150 -
Elucidating the Design Space of Diffusion-Based Generative Models
Paper • 2206.00364 • Published • 18 -
GLU Variants Improve Transformer
Paper • 2002.05202 • Published • 5 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 156
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 45 -
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Paper • 2303.00848 • Published -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
High-Resolution Image Synthesis with Latent Diffusion Models
Paper • 2112.10752 • Published • 17