RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details Paper • 2604.06870 • Published 6 days ago • 36
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On Paper • 2604.08526 • Published 5 days ago • 19
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published 6 days ago • 34
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization Paper • 2604.04787 • Published 8 days ago • 12
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation Paper • 2603.26661 • Published 17 days ago • 25
DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data Paper • 2604.01666 • Published 12 days ago • 10
PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models Paper • 2603.28763 • Published 14 days ago • 7
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Paper • 2603.29029 • Published 14 days ago • 13
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published 14 days ago • 48
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 17 days ago • 62
AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published 19 days ago • 26
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published 18 days ago • 15
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published 20 days ago • 33
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 22 days ago • 123
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 27 days ago • 109
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Paper • 2603.19227 • Published 25 days ago • 42