-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
Collections
Collections including paper arxiv:2508.10711
-
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Paper • 2505.19602 • Published • 13 -
DiSA: Diffusion Step Annealing in Autoregressive Image Generation
Paper • 2505.20297 • Published • 3 -
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 28 -
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Paper • 2507.01957 • Published • 23
-
ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published • 16 -
FonTS: Text Rendering with Typography and Style Controls
Paper • 2412.00136 • Published • 1 -
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 98 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 163
-
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Paper • 2508.10711 • Published • 146 -
LongLive: Real-time Interactive Long Video Generation
Paper • 2509.22622 • Published • 189 -
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 224
-
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Paper • 2506.07977 • Published • 40 -
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper • 2506.07986 • Published • 19 -
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Paper • 2506.06276 • Published • 26 -
Aligning Latent Spaces with Flow Priors
Paper • 2506.05240 • Published • 27
-
yandex/stable-diffusion-3.5-medium-alchemist
Text-to-Image • Updated • 17 • 7 -
Ovis-U1 Technical Report
Paper • 2506.23044 • Published • 61 -
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Paper • 2507.01953 • Published • 18 -
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Paper • 2507.01945 • Published • 76
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 82 -
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Paper • 2408.02657 • Published • 35 -
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Paper • 2508.10711 • Published • 146 -
Qwen3-Omni Technical Report
Paper • 2509.17765 • Published • 153