Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling • Paper • 2604.05072 • Published 7 days ago • 4
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation • Paper • 2604.10098 • Published 6 days ago • 72
GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos • Paper • 2604.07273 • Published 9 days ago • 4
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding • Paper • 2604.00886 • Published 15 days ago • 6
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model • Paper • 2603.21986 • Published 24 days ago • 123
AVControl: Efficient Framework for Training Audio-Visual Controls • Paper • 2603.24793 • Published 22 days ago • 26
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting • Paper • 2603.25745 • Published 21 days ago • 15
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video • Paper • 2603.21618 • Published 24 days ago • 15
Grounding World Simulation Models in a Real-World Metropolis • Paper • 2603.15583 • Published about 1 month ago • 153
See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis • Paper • 2602.20951 • Published Feb 24 • 14
SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking • Paper • 2602.20792 • Published Feb 24 • 3
SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads • Paper • 2602.07449 • Published Feb 7 • 4