- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 78
Collections including paper arxiv:2409.13346
- Imagine yourself: Tuning-Free Personalized Image Generation
  Paper • 2409.13346 • Published • 69
- YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
  Paper • 2409.13592 • Published • 50
- V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
  Paper • 2409.13648 • Published • 11

- SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
  Paper • 2408.14176 • Published • 62
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 63
- OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
  Paper • 2409.01199 • Published • 14

- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 41
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 69
- Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
  Paper • 2406.14544 • Published • 35
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 96

- Agent Workflow Memory
  Paper • 2409.07429 • Published • 32
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
  Paper • 2409.04593 • Published • 26
- Imagine yourself: Tuning-Free Personalized Image Generation
  Paper • 2409.13346 • Published • 69

- MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  Paper • 2407.08083 • Published • 34
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 63
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models
  Paper • 2408.15237 • Published • 42
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
  Paper • 2409.11355 • Published • 30

- Rich feature hierarchies for accurate object detection and semantic segmentation
  Paper • 1311.2524 • Published • 1
- DeepPose: Human Pose Estimation via Deep Neural Networks
  Paper • 1312.4659 • Published • 1
- Generative Adversarial Networks
  Paper • 1406.2661 • Published • 5
- scikit-image: Image processing in Python
  Paper • 1407.6245 • Published • 1