WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 9 days ago • 237
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published 19 days ago • 30
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published 24 days ago • 35
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 26 days ago • 123
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction Paper • 2602.13294 • Published Feb 9 • 13
VideoMaMa: Mask-Guided Video Matting via Generative Prior Paper • 2601.14255 • Published Jan 20 • 15
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published Dec 18, 2025 • 48
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 170
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17, 2025 • 36
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17, 2025 • 51
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models Paper • 2503.18886 • Published Mar 24, 2025 • 24
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation Paper • 2503.10636 • Published Mar 13, 2025 • 3
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published Feb 26, 2025 • 47
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 164