EchoGen: Cycle-Consistent Learning for Unified Layout-Image Generation and Understanding Paper • 2603.18001 • Published 29 days ago • 2
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 17 days ago • 57
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 18 days ago • 143
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising Paper • 2603.08703 • Published Mar 9 • 32
ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published Dec 26, 2025 • 18
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published Dec 9, 2025 • 18
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published Dec 5, 2025 • 38
Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training Paper • 2308.06689 • Published Aug 13, 2023 • 1
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input Paper • 2411.11934 • Published Nov 18, 2024
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 34
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Paper • 2506.21356 • Published Jun 26, 2025 • 22
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Paper • 2510.13759 • Published Oct 15, 2025 • 11
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published Mar 27, 2025 • 33
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025 • 29