- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 78
Collections
Collections including paper arxiv:2411.14793
- Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
  Paper • 2411.07126 • Published • 30
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 39
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
  Paper • 2411.09449 • Published
- OminiControl: Minimal and Universal Control for Diffusion Transformer
  Paper • 2411.15098 • Published • 61

- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 35
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 17
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 63

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 23
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 31
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 134
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41

- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 39
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 22
- Stylecodes: Encoding Stylistic Information For Image Generation
  Paper • 2411.12811 • Published • 12
- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 57

- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 57
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 71
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 27
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 43

- Magic Insert: Style-Aware Drag-and-Drop
  Paper • 2407.02489 • Published • 21
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling
  Paper • 2408.05492 • Published • 7
- CSGO: Content-Style Composition in Text-to-Image Generation
  Paper • 2408.16766 • Published • 18
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 39