-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
Collections
Discover the best community collections!
Collections including paper arxiv:2512.22323
-
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Paper • 2511.08521 • Published • 39 -
Black-Box On-Policy Distillation of Large Language Models
Paper • 2511.10643 • Published • 52 -
Depth Anything 3: Recovering the Visual Space from Any Views
Paper • 2511.10647 • Published • 101 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 37
-
BrushEdit: All-In-One Image Inpainting and Editing
Paper • 2412.10316 • Published • 36 -
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815 • Published • 26 -
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 11 -
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper • 2412.07517 • Published • 11
-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 93 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
-
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Paper • 2506.07977 • Published • 40 -
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper • 2506.07986 • Published • 19 -
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Paper • 2506.06276 • Published • 26 -
Aligning Latent Spaces with Flow Priors
Paper • 2506.05240 • Published • 27
-
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52 -
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 31 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
Zero Bubble Pipeline Parallelism
Paper • 2401.10241 • Published • 25
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 93 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
-
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Paper • 2511.08521 • Published • 39 -
Black-Box On-Policy Distillation of Large Language Models
Paper • 2511.10643 • Published • 52 -
Depth Anything 3: Recovering the Visual Space from Any Views
Paper • 2511.10647 • Published • 101 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 37
-
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Paper • 2506.07977 • Published • 40 -
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper • 2506.07986 • Published • 19 -
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Paper • 2506.06276 • Published • 26 -
Aligning Latent Spaces with Flow Priors
Paper • 2506.05240 • Published • 27
-
BrushEdit: All-In-One Image Inpainting and Editing
Paper • 2412.10316 • Published • 36 -
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815 • Published • 26 -
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 11 -
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper • 2412.07517 • Published • 11
-
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52 -
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 31 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
Zero Bubble Pipeline Parallelism
Paper • 2401.10241 • Published • 25