Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27 • 59
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published Feb 2 • 46
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published Jan 12 • 52
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published Nov 24, 2025 • 66
RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models Paper • 2510.25257 • Published Oct 29, 2025 • 6
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians Paper • 2505.11934 • Published May 17, 2025 • 1
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025 • 91
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 30